Episode 003 · June 4, 2026 · 01:04:53

Are We All Managers Now?

Dan Gerlanc

Podcast Host


Angie Jones

VP, Agentic AI Foundation

Demetrios Brinkman

Founder, MLOps Community


Dan, Angie, and Demetrios explore how the latest frontier models have turned software engineering into something closer to managing a team of agents, and why governance, tokenomics, and design now matter more than typing code.


Show notes

Dan, Angie Jones, and Demetrios Brinkman open with a discussion of the Agentic AI Foundation (“AAIF”), founded by Anthropic, OpenAI, and Block in December 2025 and now home to roughly 180 member companies. AAIF recently launched an ambassador program (apply here) and has upcoming events across the globe from AGNTCon in San Jose to gatherings in Amsterdam, India, Tokyo, and Seoul.

A recurring theme is that the whole industry is learning agentic engineering together. So get out of your “lab” and compare notes! You don’t have to do all of this R&D on your own (well, maybe some of it, but it doesn’t hurt to collaborate).

Everything is changing. And quickly. Angie marks the release of Claude Opus 4.5 as when agentic engineering became viable. Where engineers once obsessed over context engineering and priming a repo so an agent had a chance, the latest frontier models often just need to be pointed at a codebase and told the problem.

Drawing on her time leading agentic AI at Block, Angie describes the agent they built that can hold a world model across 25,000 codebases. They paired this agent with cloud workstations where an agent picks up a Jira ticket, clones the repo, and opens a PR without anyone babysitting a terminal.

With this kind of firepower come new problems that look less like coding and more like management. Demetrios argues that the unglamorous work of governance (keeping teams aligned, codifying security practices, deciding what belongs in “the harness”) is the new challenge companies are grappling with. Sandboxes and cloud workers have gone mainstream.

The group pushes back on the wave of AI-justified layoffs, worrying that companies are cutting the very mentorship and middle-layer “glue” needed to steer agents. They also dig into tokenomics: budgets blown by mid-year, tools that can cost more than the engineer using them, and Angie’s hard-won lesson at Block that getting 95% of engineers onto coding agents produced no velocity until she funded a small group of “AI champions” to learn the tools properly. Tokens, everyone agrees, are not the same as value.

As to what the group has found effective for agentic engineering, Angie makes the case for RPI (Research, Plan, Implement) from HumanLayer and for adversarial review. A 32-file refactor that earned a clean pass from Codex made her a believer. Alongside review skills, the Council of Mine MCP server, and Jesse Vincent’s Superpowers skill pack, Dan adds Wes McKinney’s roborev for continuous background review.

The episode closes on the human side: whether “we’re all managers now,” the identity crisis facing engineers who loved the craft, how Angie found the same flow state building agents that she once found writing code, and how all of this democratizes building for non-engineers. A few quick stops to discuss the token-saving Caveman skill, naming your agents, and a duck-themed calendar app. There’s still no free lunch, Dan notes, but the price has come down. At least until the next model drops.


From this episode

We've all become these decentralized R&D departments.
— Demetrios Brinkman

You can spend a lot of tokens and get absolutely nothing done. You can't equate tokens with value at all.
— Demetrios Brinkman

Now I spend more on the tool than the actual engineer that uses the tool.
— Angie Jones

We need to kind of slow down to speed up.
— Angie Jones

I do not miss coding, which kind of blows my mind. I just want to get to solutions.
— Angie Jones

Claude isn't gonna tell you what you have to do next in your career.
— Dan Gerlanc

You've invented employees again.
— Dan Gerlanc

Mentioned

Agentic AI Foundation: open foundation for agentic AI that both guests work with
RPI (Research, Plan, Implement): research-plan-implement coding workflow from Dexter Horthy at HumanLayer
Superpowers: Jesse Vincent's Claude skills framework for structured agentic development
roborev: Wes McKinney's continuous background code-review tool for agents
Council of Mine: MCP server where AI personas debate and vote on an answer
Caveman: skill that makes a model talk like a caveman to save tokens
cmux: macOS terminal for running multiple coding agents in parallel
context rot: Chroma research on LLMs degrading as input grows; sparked 'caveman rot' idea
LLM Council (Andrej Karpathy): Karpathy's project where multiple models answer and rank each other
MLOps Community: Demetrios's community, now the AAIF's official user group
Davis Treybig: Innovation Endeavors VC whose blog on how agents differ from humans came up
Quakpit: menu-bar app that flies a duck across your screen before meetings
Flying Toasters (After Dark): classic Berkeley Systems screensaver Dan recalls fondly
Broomy: Rob Ennals's open-source IDE for working with many agents

Transcript

00:13 Dan

I’m Dan Gerlanc and welcome to Agents and Engineers. Today we have Angie Jones, who is the VP of the Agentic AI Foundation. Previously, she was VP of Engineering at Block, where she led Agentic AI, Senior Director at Applitools, and a software engineer at Twitter, LexisNexis, Teradata, and IBM.

Welcome Angie. And we also have Demetrios Brinkman, who is the founder of the MLOps community and recently joined the Linux Foundation to represent the user community at the Agentic AI Foundation. Welcome Demetrios. Great to have you both here today. And so I actually

00:43 Angie Jones

Thank you.

01:03 Demetrios Brinkman

Good to see you again, Dan.

01:10 Dan

If y’all haven’t seen, I was, I think, one of the earliest guests on the MLOps Community. I looked back, and it was in March of 2020, where I talked about Dask. Now we’re talking about AI and agents all the time. So I guess first, you guys are doing a lot of events and also have an AI ambassador program that you have opened up at the Agentic AI Foundation. Do you wanna tell our listeners a little bit about that?

01:49 Angie Jones

Yeah, I can start. So the Agentic AI Foundation started in December of 2025, so it’s just roughly six months old. Pretty young, but getting pretty mature. We have over 180 members, which are companies.

It was founded by Anthropic, OpenAI, and Block, and our member companies include the likes of Google and AWS. So everyone who is shaping this agentic AI space is doing so at AAIF.

We have an ambassador program because everybody’s trying to figure out how to do this stuff. As an industry, we’re trying to learn how to do all of this together, right? It’s a totally new way of doing software engineering. Even outside of engineering, people are building agents to help operationalize some of their work and things like that. And so the ambassador program is a way for those who are really passionate about agentic AI, who have figured some things out and wanna share that with others, to do that, and that could be via conference talks or blog posts, workshops, things like that.

And so yeah, we launched that yesterday. We had a lot of interest; hundreds of applications came in for our initial cohort. So I think this is gonna be really exciting. Demetrios, you want to talk about events?

03:38 Demetrios Brinkman

Yeah, and I’ll just dovetail off of the ambassador program. I think it is super cool to bring together folks who are knee deep in this. If they’re sharing their knowledge out, it’s almost a way for them to have an excuse to put the time in and learn things, or share what they are learning with the greater community. And that’s kind of why we had that inspiration: don’t let that knowledge just sit there. Let it get out. Let others learn from it or be inspired by it.

And as far as events go, we’ve got so many, I don’t even know where to start, but

04:20 Dan

Ha ha ha.

04:22 Angie Jones

The globe, like

04:23 Demetrios Brinkman

Yeah,

it is wild. It is truly global when it comes to the events, because there’s stuff happening in India in June. Then we’ve got stuff in Shanghai and Tokyo and Seoul. But what’s probably more relevant to the listener right now is there is a San Jose AGNTCon that is happening in October. I think it’s October 22nd.

And that’s gonna be a pretty big event. We’re expecting a lot of people, we’ll say. We’ll put it at that. And we also have an Amsterdam event on September 17th and 18th. That’s like the AGNTCon of Europe. And again, these are the bigger events. We’re doing citywide meetups too. So it’s not just these big one-off events in different continents; for anybody that is interested in going to something that’s more regular in your city or in your local geo, we’re doing that too. So that’s kind of the TL;DR of our events.

05:39 Dan

That’s great. Don’t just be building agents at your computer. Go and actually meet other people who are doing it.

05:43 Angie Jones

Get outside.

05:47 Demetrios Brinkman

Yeah.

05:49 Angie Jones

No, but I think that’s one of the biggest things that I’m seeing right now in the space: everyone is learning kinda in their lab, and no one wants to say, I don’t know how to do this stuff, you know. And so you kinda quietly try to figure it out on your own. Maybe you’re watching some YouTube videos or scrolling on Twitter or something, trying to get a clue. Yeah, yeah. Listen to this podcast. So these events, I think, are a great way to kinda

06:12 Dan

Listening to this podcast.

06:19 Angie Jones

come out and share what you’re learning and learn from others, right? As much as I’m in this space, I’m learning from other people every day. They tried some hacky thing and it’s like, wow, I never even thought about that. I can’t wait to get back to my lab and try that out, you know? Or now we’re doing it on our phone, so I don’t even have to wait anymore. Really? Yeah.

06:38 Demetrios Brinkman

And I

06:41 Dan

Ha ha ha.

06:41 Demetrios Brinkman

Yeah, send a few texts. Yeah, and I

think it’s fascinating too because we’ve all become these decentralized R&D departments. Everyone now is doing R&D when they’re experimenting with agentic engineering. And so being able to get together and compare notes with these little nodes of the system is very useful.

07:13 Dan

I’ll include all the links for the applications and the events in the show notes so folks can find those or search the Agentic AI Foundation.

So there’s always a lot to talk about in the world of agents. One thing I always like to ask folks, especially since things have moved really quickly in the last six months, is: what are you doing differently today, or see people doing differently today, than they were doing six months ago?

08:03 Angie Jones

I can start. So six months ago was probably right around the time that the world shifted with the drops of, like, Opus 4.7, right? And that really changed the way we were doing things. So when we were doing engineering, let’s say, seven months ago,

there was a lot of emphasis on context engineering, on prompting your agent just right, on making sure you had everything in place in your repo so that the agents could have the slightest chance of helping you actually do something instead of crapping out, right?

And then everything changed with the drop of the latest premier frontier models. And so the way we do engineering has changed now as well. There’s not as much you need to do to get started. You can kind of just point it at your repo as is, tell it your problem, and it just does it, right?

And so that’s been interesting, to watch this shift. I also think it’s brought a lot more people into the fold, whereas before they were either, one, this is all hype, this stuff sucks, when I ask it to do something it doesn’t do well, or two, if I gotta lay out all this context, don’t worry about it, I’ll just do it myself, right? So now I think people who were skeptical

are giving it another look, or they have in these last six months, and have realized, oh crap, this is actually pretty useful, right? And so they are now slowly but surely getting more involved.

10:16 Dan

And

How does that look? You were in charge of AI at Block, so in terms of doing this at a larger corporation, with lots of different teams working together versus individual engineers, how have you seen that look or differ?

10:41 Angie Jones

Yeah,

what it’s done is unlock a lot of capability, right? So for example, before, it was every team trying to figure out how to get these agents to work for their repo. Since the models have gotten so much better, now we can move that up a layer and have a common

agent that can just work across any of these repos. In fact, it can now kind of hold a world model of all of our 25,000 code bases, right? As well as customer requests and scenarios and things like that. And now that’s gotten to the point where everybody doesn’t necessarily have to figure out how to use the agents themselves. We can kind of

abstract that out and bake it into the tools themselves with skills and things like that. And so now an engineer can just delegate work to an agent. Like, hey, I’m in my sprint. The agent is now a member of that sprint team, and here are your tickets for this sprint, right? Without babysitting at all. I don’t need to open a terminal. I don’t need to open an IDE. I don’t even need to watch the agent work.

We’ve created these cloud workstations where the agent will just go. You gave it the task, assigned it a Jira issue or something like that. It goes off into that cloud workstation, clones the repo, gets the work done, puts up a PR, right? And so that’s what this has unlocked, I think, for enterprise teams.

12:32 Dan

So is that colo... sorry, go ahead, Demetrios.

12:32 Demetrios Brinkman

Yeah.

I think one thing that you’re seeing too now, as opposed to six months ago, is things becoming more popular because there’s a need for them. So the term, and I joke it’s the least sexy thing in the world, is governance: how all of our teams are using these agentic engineers. How do you get these agentic engineers to work with each other? So if I have a lot of different agents

for myself, how can I make sure that our team works well together? It’s not just that I’m off on my own with my twenty thousand agents working and burning fifty grand a month on my Anthropic bill. Yeah, chilling it. Like, what does that mean for the rest of the team? Yeah.

13:09 Angie Jones

Yeah.

Kill him.

13:27 Dan

Token maxing.

13:28 Angie Jones

Yeah, talk about a leaderboard.

13:31 Demetrios Brinkman

Exactly. And in my opinion, it’s more than that. An easy place where I think a lot of the attention is going is, well, yeah, now we’re generating all this code and now we have to review this code, and so that’s now the bottleneck. That’s an easy narrative that you hear folks say. I think it’s much more than that because of

human dynamics, and also things like: how are we making sure that our teams are focusing on the right things at the right time and all rowing in the right direction if we’ve got this extra firepower? So governance is one that I think became very popular in the last six months. I also think there is a lot of conversation now happening around the harness.

What do I do, and what do I not do, with the harness? Where do I optimize the harness versus what I’m using if I’m using some external tool? What constitutes the harness that I want to use? All of that has come into question now. And then along the harness lines, sandboxes now are huge. And Angie said it: kicking off cloud workers is becoming a common pattern now, and successfully, I think.

Six months ago it was very nascent still.

15:06 Angie Jones

Yeah.

15:10 Dan

And there are two things here, actually, that struck me with what you both said. One, a lot of the challenges are almost outside of the coding, the delivery of code. And yet a lot of these big tech companies now, they’re laying off the managers, they’re removing those layers that direct this.

Do you think we’re actually going and eliminating the part that we actually need more of in this case in order to be able to direct these agents? We actually need humans who can review this part.

15:39 Demetrios Brinkman

The glue, yeah.

15:57 Angie Jones

I think we definitely need that layer, and I’m really sensitive to this, because I can see where it’s like, oh, okay, middle-layer management is kind of a friction point, right? And so you’re slowing us down a bit. But I think some friction is necessary, you know. Some, maybe not all, right? But

to Demetrios’ point, how do we make sure that we’re steering in the right direction if everybody’s just off doing whatever they want with their five agents or whatever? Like, man, I got an idea, whatever. And as someone who has led engineering teams, I know that not every engineer is necessarily motivated by the customer’s need, right? Some of them are in this for the craft.

And, you know, it is what it is. But yeah, I just like building stuff, so I’m not necessarily in tune with what’s gonna sell or what’s gonna improve customer satisfaction or things like that, right? And so I think this is where you kind of need someone making sure that it’s not just, I had a great idea, I’m just gonna build this today

and ship it, and it’s something that no one asked for, right? So I don’t know. What’s your take?

17:28 Demetrios Brinkman

And also I think there’s the features. So there’s folks that can go off on their own and build features, but then there’s also the piece where it’s how you provision resources in your company. And if an agent is left to do that at will, they may have one way of doing it that is totally against the security team’s recommendations. And so how do you codify that into their way of

17:54 Angie Jones

True.

17:59 Demetrios Brinkman

acting.

Right. So this is why it’s governance stuff. It’s not that new and it’s also not that exciting, but it is a huge theme that comes up now that you have almost everyone using some kind of a coding agent at their job. Well, I say almost everyone; I realize that we are very much inside of a bubble, because I have a friend who is also an engineer here in Europe where I live, and just the other day I was talking to him about, you know,

18:21 Angie Jones

Yeah.

18:30 Demetrios Brinkman

using Claude Code, and he said, yeah, my company gives me GitHub Copilot, but I never use it. It’s not that good. And it’s like, you might wanna try again, dude. It got better.

18:40 Angie Jones

Yeah. Yeah. The other piece to that whole we-don’t-need-managers thing: I saw a tweet this week from someone who said, like, yeah, we don’t even need them to help you advance your career. And it’s just like, wait, what? So because we have a tool that’s pretty capable of helping us do our jobs.

18:57 Demetrios Brinkman

Yeah.

19:08 Angie Jones

We no longer need mentors, like we no longer need people to help like coach us or like help us develop. Like that doesn’t seem right to me.

19:24 Dan

I would think it’d be even more important because Claude isn’t gonna tell you what you have to do next in your career.

19:27 Angie Jones

Exactly.

19:27 Demetrios Brinkman

Yeah.

19:31 Angie Jones

It’s gonna tell you whatever you wanna hear, whatever is gonna

make you smile, which might not... Like, as a people lead, I have some pretty tough conversations with folks that they appreciate a lot, right? You can trust what I say. I’m not blowing smoke up your butt, right? I’m helping you become a better professional. You don’t get that. Like, I get so sick of my agents just like, Yeah, yeah, you’re right.

19:48 Dan

Ha ha ha.

20:00 Angie Jones

No, actually it’s today is not that day. And they go, Yeah, you’re right. Like come on. Like this is this is who you think is gonna help us become better as people, as professionals. I don’t know.

20:00 Demetrios Brinkman

Yeah.

20:06 Dan

Ha ha ha

20:14 Demetrios Brinkman

I think

there’s two pieces to this too. One is, it feels like a lazy out when folks say, all right, we’re laying off X percent of the team because of AI. I think that is an easy out, and a lot of times there’s more to that story than just AI, quote unquote. And the other thing is that

it’s an experiment right now, it feels like, and we’re gonna see in a few years, or maybe not even a few years, maybe six months or a year from now, how that experiment pans out and if it is actually a good idea.

20:57 Angie Jones

Yeah. Hopefully sooner than later. But yeah, the shift is moving fast, so I don’t know. I think it’ll course correct. That’s my theory.

21:10 Demetrios Brinkman

Yeah.

21:14 Dan

Yeah, the bill will come due for both the tokens and the choices of today.

21:16 Angie Jones

Yeah.

I think people are already seeing that. We’ve seen some companies go, like, whoa. Now I’ll tell you, at Block, I was the one responsible for buying our tools and things like that. So you have a budget, which is finite, and you allocate it toward this. And so I had multi-million dollar deals with the frontier labs

for their tooling. The problem is, my budget is set in January, and now let’s say, March, April, May, some new crazy model has just launched. Every time these models launch and they’re so much more capable than the last, they’re much more expensive than the last as well, right?

And so you’re seeing articles now from folks like Microsoft and other companies, Uber, who have blown through the budget they had allocated for these AI tools. And we’re not even at the halfway point of the year, right? And so I think now the companies start to realize that, wait a minute, now I spend more on the tool than the actual engineer that uses the tool.

22:22 Demetrios Brinkman

Never.

22:43 Angie Jones

Right, it’s like whoa whoa whoa and people are starting to kinda second guess this a bit.

22:48 Demetrios Brinkman

Did you see the meme where somebody was saying, We’ve now started to hire junior engineers to do the work of what our agents can’t do because it’s too costly? And somebody just wrote on top of that, like, great, we’ve now come full circle. You know? And

22:55 Dan

Mm-hmm.

23:10 Angie Jones

AHHH

23:10 Dan

You’ve invented employees again.

23:13 Demetrios Brinkman

Yeah.

100%. But also there is something to be said too when it comes to this whole tokenomics discussion that I think is being left out of the conversation. And that’s you can spend a lot of tokens and get absolutely nothing done. And so like you can’t equate tokens with value at all.

23:39 Angie Jones

Yeah. I’ll tell you, when we first started our agentic engineering journey, right, the goal was to get people using the tools. And we had gotten to where 95% of our engineers were using some coding agent, right? But we didn’t see any velocity at all with

features being shipped or PRs or anything like that. And so then I had to dig deeper and say, okay, how do we actually get a return on our investment? How do we make our engineers more efficient using these tools? And we were able to improve that quite a bit. But still, you’re absolutely right. I don’t think there’s a guarantee that just because I use these tools,

it’s gonna increase my velocity at the rate at which you’re spending for it, you know?

24:41 Dan

Is there a difference or what is the difference you think between being able to get value from these versus not at different levels of token use or are there patterns you’ve seen in engineers or organizations where it goes from delivering more velocity or more value to having not done that?

25:12 Angie Jones

I think that we’re going to soon see where this is all going to reverse. So right now, I don’t know if these CEOs have a little group chat or what, where they’re just like, man, my people are doing this, and the other CEO goes back like, we need to do this. So now they have these token leaderboards, which I think are absolutely ridiculous.

So now you’re just token maxing for the sake of token maxing. Again, the bill comes and you realize, oh crap, this is not great. I think what we’re gonna see is a shift to becoming more efficient with your token usage. We still want the velocity, but we also don’t want you, I don’t know, using

five five, like, high reasoning for a simple file edit, you know what I mean? So I think there’ll be this whole wave of figuring out how to be more efficient with our tokens while still increasing velocity. And I think there’s something to that. No one really wants to invest in training their folks on how

to do this. And some of it is, you don’t even know yourself, right? Where is the training? But what I did to accomplish this was set time out. I formed a group of what I call AI champions, which were, like, fifty engineers, and we had thirty-five hundred. So it’s a small subset, but

small but mighty. These fifty engineers represented our largest code bases, our most important ones from a business perspective. I got clearance with their managers: I need 30% of their time dedicated to actually exploring the tools, the models, the context engineering, and agentic engineering as a whole,

and then bringing those lessons back to their teams. And that paid off quite a bit, because now they had the space to do that, right? If you don’t have time and people are just like, I want 20 PRs a day, when do I have time to learn how to use the tools efficiently, or try new things that I learned at a conference or saw on Twitter? I don’t have time because I just gotta crank stuff out, right? And so that’s when you get to a point where it’s slop.

The PRs are just sitting there because nobody has time to review them, you know. So even though you increased velocity, so what? It’s still blocked. You know what I mean? So I think we need to kind of slow down to speed up, and just take a beat and figure out how to use these tools effectively for your use cases and your environment. And then we probably could see the velocity that we’re looking for.

28:25 Demetrios Brinkman

Yeah, I’ve seen a lot of people find success when they front load the hard work and the thinking and the scaffolding, making sure that the environments are set up for success, making sure you’ve got the MCP servers connected and everything that you need to let the agents run. You wanna have that ready. Like, are all the naming conventions in the code base and the code base structure

all consistent? How can you do the work up front so that when you let the agents go wild, you actually have the ability to walk away and then come back? I feel like

you want to try and think through and give everything you need to the agents, or think through as much as you can. You can’t always know what’s going to come around the corner. But if you can do more of that work in the beginning, then it allows the agents to execute much more on the tail end.

29:41 Dan

I don’t remember who said this, but the difference between vibe coding and agentic engineering is doing this actual design process, either with or without AI tools, but you have to. The engineering is actually figuring out what you want to do.

30:12 Demetrios Brinkman

Exactly.

30:15 Dan

And is there tooling or patterns you’ve seen people use to accelerate how they work with agents, or, I’ll say, to do agentic engineering instead of vibe coding? What have you seen people do to be more successful in that?

30:42 Angie Jones

Yeah, we adopted RPI, which is just Research, Plan, Implement, from Dexter at HumanLayer, and we found a lot of success with that. But that was pre the Opus 4.7 movement. So we found great success then. It’s still pretty good now, but not necessary as much

when it’s smaller changes. But if you have a big code change, like I’m trying to refactor this entire code base, or I’m trying to migrate from this language to another or this framework to another, or just, I don’t know, a big, huge, complex feature, then I still find I’ll reach for RPI as a technique, right? And so that’s basically where you have the agent research

the code base first, figure out what’s possible, where things are, and stuff like that. So you haven’t given it anything to do just yet. You’re just saying, research this area. Like, I don’t know,

the cart feature. Go research that, right? And so it finds all of the areas, all of the integration points, and stuff like that, and writes that to a research document. And then in a fresh session, you would have an agent plan. Now you tell it what you actually want to do, right? If you tell it in the research phase, it’s gonna try to do it. You don’t want to muddy it. So in the plan phase you say, okay, here’s what I want to do. I don’t want you

to implement it, but based on this research, can you now make a plan? And so it then does all of that, writes that to a markdown file, and then you can go into an implement session where it’s like, all right, here’s the plan, just go implement it. So we were finding really great success with that. The quality of the code was what was great, right? So instead of the agent just rushing, finding a spot, and just

doing its business in that spot and pushing that up, now we saw where it accounted for all of these various integration points and everything like that. It was wild the first time I tried that. I needed to essentially remove this really complex, intertwined feature from the code base. And so I tried it out with that. And this was like

32:52 Dan

Mm-hmm.

33:18 Angie Jones

a PR that touched, like, thirty-two files, right? And so when I put that PR up, we’ve got Codex running as the code reviewer. Codex is great at code reviewing, by the way. And so it always

33:31 Dan

So you were using

Claude for the planning and then codex to review.

33:37 Angie Jones

Well, yeah, I was, but you could have did it with anything, right? It could have been the same model. But then Codex just automatically did the review. And it always finds really great stuff. It found nothing, just a thumbs up. And I was shocked. On a thirty-two-file PR, I was shocked. So

33:57 Demetrios Brinkman

No way.

34:03 Angie Jones

That’s when I became an RPI believer.
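The research, plan, implement flow Angie describes can be sketched as three separate agent calls that hand off through markdown files. This is a rough illustration under stated assumptions, not the setup used at Block: the `Agent` callable, the function name, the file names, and the prompt wording are all hypothetical. The one detail taken from the conversation is that the goal is withheld during research and only introduced in the plan phase.

```python
from pathlib import Path
from typing import Callable

# Hypothetical agent interface: each call is a fresh session, prompt in, text out.
Agent = Callable[[str], str]


def research_plan_implement(agent: Agent, area: str, goal: str, workdir: Path) -> str:
    """Run the three phases in order, writing each handoff to a markdown file."""
    # Research: the goal is deliberately withheld so the agent only maps the code.
    research = agent(
        f"Research the {area} area of this codebase. "
        "List the relevant files and integration points. Do not change anything."
    )
    (workdir / "research.md").write_text(research)

    # Plan: state the goal now, but ask for a plan rather than an implementation.
    plan = agent(
        f"Goal: {goal}\nDo not implement anything yet. "
        f"Based on this research, write a step-by-step plan.\n\n{research}"
    )
    (workdir / "plan.md").write_text(plan)

    # Implement: a third fresh session that only sees the plan.
    return agent(f"Here is the plan. Implement it.\n\n{plan}")
```

Keeping each phase in its own call means the implement session never sees the raw research, only the plan that was distilled from it.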

34:05 Demetrios Brinkman

It commented... all it left was LGTM?

34:11 Dan

Hm.

34:11 Angie Jones

Yeah, yeah, basically, like, yeah.

34:15 Demetrios Brinkman

That’s amazing. Yeah. Just a little side note, have either of you played with CMUX, where you can have the two agents or two different models argue with each other or work together?

34:28 Angie Jones

I haven’t done it with CMUX, but yeah, I’ve done that kind of adversarial thing. In fact, I made a skill for reviewing, where the agent will basically call a sub agent, and that could be with a different model or the same model, whatever. But you don’t review the code, like, have it review the code, and then you two talk amongst yourselves. Don’t bother me with “he said that I messed this up.” Like, just go fix it and let me know once y’all

34:54 Dan

Mm-hmm.

34:58 Angie Jones

are on the same page. So that’s worked pretty well. And there’s this other really cool MCP server that I actually love. It’s a council, kind of similar to Karpathy’s council thing, but it’s a council of models. Sometimes when I just need a lot of different opinions, my orchestrator agent will use this MCP server. It’ll tell it my question. And then all of these models have a different persona. So one is playing devil’s advocate, one is the optimist, one is the pragmatic one, or whatever. And so they kind of all argue about it, and then they vote on each other’s arguments, and you can’t vote for yourself. And then it comes up with a result. Like, okay, here is what they all converged on.
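The voting rule Angie mentions (each persona votes on the others’ arguments, with no self-votes) can be sketched roughly as follows. This is a minimal illustration, not how Council of Mine is actually implemented; the `tally_votes` function, the persona names, and the ballots are hypothetical.

```python
from collections import Counter


def tally_votes(ballots: dict[str, str]) -> str:
    """Return the persona whose argument received the most votes.

    `ballots` maps each voting persona to the persona it voted for.
    Self-votes are rejected, mirroring the "you can't vote for yourself" rule.
    """
    for voter, choice in ballots.items():
        if voter == choice:
            raise ValueError(f"{voter} cannot vote for itself")
        if choice not in ballots:
            raise ValueError(f"{choice} is not a council member")
    counts = Counter(ballots.values())
    # most_common(1) returns the top entry; ties go to the first one counted.
    winner, _ = counts.most_common(1)[0]
    return winner


# Hypothetical three-persona council.
ballots = {
    "devils_advocate": "pragmatist",
    "optimist": "pragmatist",
    "pragmatist": "optimist",
}
print(tally_votes(ballots))  # pragmatist
```

A real council would also need a tie-breaking policy and a step that summarizes the winning argument; both are left out here.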

36:00 Dan

Is this an MCP server called Council?

36:03 Angie Jones

It’s called Council of Mine. Yeah.

36:05 Dan

Cool.

There was someone I interviewed who said that when he has a question, he consults the Council of Elders: Gemini, Claude, and ChatGPT.

36:08 Demetrios Brinkman

Yeah, I

No.

36:19 Angie Jones

ha ha

36:21 Demetrios Brinkman

Yeah.

Nice. Yeah. I was gonna say, back to your question from earlier, Dan, on what’s helping have more success. In my experience, and it’s in a similar vein to what Angie was saying, just I think a different implementation, it’s the Superpowers skills. Shout out to Jesse, who created that skill pack, because it’s kind of that same vibe where there’s a brainstorming skill and it will just go back and forth with you a bunch. There’s a planning skill, then there’s an implementation skill. And Jesse has done the hard work to create those skills and really optimize them. And then he’s keeping them up to date. So I kind of outsourced that whole skill and everything to him. And then on the other side, once you have... yeah, that’s the wild thing. Yeah, Superpowers.

36:53 Angie Jones

Mm-hmm.

37:12 Dan

Yeah, it’s only getting better in my experience.

37:15 Angie Jones

What’s that one called? Super Superpower? What’s it called?

37:19 Dan

Superpowers.

37:20 Angie Jones

I’m late to the party. I gotta get up on that.

37:23 Demetrios Brinkman

Amazing pack. Yeah, you’ll like that one, Angie. And Jesse is a really deep thinker on this stuff. Actually, I interviewed him on

37:26 Angie Jones

Okay.

37:34 Demetrios Brinkman

my podcast, and he was talking about how the inspiration for the brainstorming skill and the planning skill came from him working with MIT interns. And he was like, these folks, they had so much raw intelligence. They just had all this horsepower, but you really had to just go back and forth with them to make sure they knew what they were building before they went out and built it. And so that was the inspiration behind it. But anyway, I digress.

And another one that is super valuable that I found, again not my own, but I will happily use it and steal it since it’s out there, is from Rob, the creator of Broomy. He mentioned that anytime he submits a PR, he’s set up this skill which is like a checker. I don’t know what the proper name or terminology is for it. I can’t even remember what I saved it as, but it’s basically a verification skill. And what these agents will go and do is, you put up the PR, it’ll click on all the links and all the buttons and whatever, and take a screenshot. And then for each screenshot it will write a report verifying why this is correct, and if it’s not correct it’ll go and, you know, do the loop and try and fix it.

39:00 Dan

What is this one called?

39:01 Angie Jones

Whoa.

39:02 Demetrios Brinkman

This one, I’ll have to give it to you later so you can put it in the show notes. But we put it on the MLOps Community, because Rob told me about this skill when we had our MLOps Community coding agent conference, and then we took the transcript of that conference and had Claude create skills from that live stream. And so then we have it on the MLOps Community repo. I’ll send you a link to it though, because I can’t remember the name exactly.

39:34 Dan

Yeah, there’s another one. I use Superpowers, I’m a big fan. And actually, I heard about this one from Hugo Bowne-Anderson and Thomas Wiecki, who interviewed a few folks, and Wes McKinney was talking about one of his projects called roborev. And basically, if you run it in the background, any time there’s a commit in one of the repos you’ve set it up in, it will kick off a review for that specific commit. It can use different models, but I mainly used it with Claude and then had Codex do the reviews.

I’ve been pretty amazed at what it’s found in a lot of these. Even in cases where I’ve made pretty extensive plans up front, it will still find issues, sometimes in the plan or sometimes in the commits as it goes through.

40:39 Demetrios Brinkman

Yeah, the QA agent. It’s crucial. It is amazing for saving time.

40:48 Dan

Yeah, I think of it as the adversarial review pattern. Is that something you both have seen?

41:05 Demetrios Brinkman

It’s like, go out there, break my stuff, try and poke holes in what’s going on, make it not work. And I actually just saw a cool project. I can’t remember the name. Again, apparently that’s a common theme today for me. But it is a simulation of cloud environments, so that instead of having to push something to production before finding out that, oops, we didn’t catch that, you can replicate your whole cloud environment and it will simulate it. I think it’s called like Vera. Somebody can fact check me on that. And so that’s another cool way of doing it.

There’s a great blog recently from Davis, he’s a VC at Innovation Endeavors, and he was just talking about all the different ways that agents are different than humans. And one of them, which I think you’ll appreciate with the data background, Dan, is how agents are much more okay with writing really long and complex queries. The whole argument was like, all of a sudden we went from humans writing one query, getting the data, trying to figure something out with it, maybe writing another query, to agents fanning out, running a bunch of queries, maybe super complex queries. And some of these queries might be able to piggyback off of each other, or for some of them you already have a subset, and so you can just run the query inside of that slice of the data. But we don’t really have the mechanisms to do that right now. So again, going back to the tokenomics thing, we’re spending a lot of money that potentially we don’t need to be.

42:58 Dan

Yeah, and I think also my experience has been, especially with data analysis type problems, that the specification of the problem is often very subtle and not something that’s probably in the training data. This is always a problem with data-driven problems: your code may run, but whether it gives you the right answer is a very different thing. And I think agents have, in some sense, especially for data-driven problems, only made that more challenging.

43:38 Demetrios Brinkman

Yeah.

Yeah, Shreya. Do you know Shreya Shankar? She just wrote a great blog post too about how she was trying to identify, basically, what she does as a postdoc, and I think it’s qualitative research that she’s doing. And one of the things

44:04 Dan

Yeah, I’ve seen some of her work, and with Hamel, does she teach with Hamel? I think, yeah.

44:12 Demetrios Brinkman

Yeah, they have an evals course on Maven, I think, or a cohort. I’m not sure, but

44:16 Angie Jones

Yeah.

44:19 Demetrios Brinkman

the recent blog that she wrote was really speaking to that point of specifying the problem and how easy it is for the agent to... and we see this all over the place, but she was just talking about how she did four or five different ways of trying to tag data and then come to conclusions off of that data. So that’s what a lot of her specific research was on, and I mean, I guess, recent research in quotations. But it was, hey, people are in this thread, there’s like hundreds of replies to this tweet or X post, and it was a question about why are you moving off of or away from Claude Code?

And so generally, the process is that a researcher will go through all of that data and then tag different buckets, and then maybe you merge these buckets if they’re similar enough. And then you’ll try and form conclusions from all of this and try and say, well, it looks like there’s a lot of folks that are doing it for these reasons, and then other reasons are important, right? So she was like, well, I can identify this. But then she did five different ways of identifying it, from very naive to very sophisticated. And the results were pretty stunning, and she has them all there, along with what the output was from the actual LLM when you go through the process.

And you just see it goes back to our point earlier on how if you front load that design and the work of actually setting up the agents for success, you get much better output. Which is not a novel concept, but it is very, very clear in this context.

46:15 Dan

You still have to do the work. It’s just the work is different.

46:20 Demetrios Brinkman

Yeah.

46:22 Angie Jones

And I think it’s the work that people don’t necessarily enjoy doing. It’s the part of the job that you don’t really want to do, you know.