Are We All Managers Now?

Dan, Angie, and Demetrios explore how the latest frontier models have turned software engineering into something closer to managing a team of agents, and why governance, tokenomics, and design now matter more than typing code.
Show notes
Dan, Angie Jones, and Demetrios Brinkman open with a discussion of the Agentic AI Foundation (“AAIF”), founded by Anthropic, OpenAI, and Block in December 2025 and now home to roughly 180 member companies. AAIF recently launched an ambassador program (apply here) and a global slate of events, from AGNTCon in San Jose to gatherings in Amsterdam, India, Tokyo, and Seoul.
A recurring theme is that the whole industry is learning agentic engineering together. So get out of your “lab” and compare notes! You don’t have to do all of this R&D on your own (well, maybe some of it, but it doesn’t hurt to collaborate).
Everything is changing. And quickly. Angie marks the release of Claude Opus 4.5 as the moment the work changed. Where engineers once obsessed over context engineering and priming a repo so an agent had a chance, the latest frontier models often just need to be pointed at a codebase and told the problem.
Drawing on her time leading agentic AI at Block, Angie describes the agent they built that can hold a world model across 25,000 codebases. Pair this agent with cloud workstations where an agent picks up a Jira ticket, clones the repo, and opens a PR without anyone babysitting a terminal.
As they say, more tokens, more problems. And some of these problems look less like coding and more like management. Demetrios argues that the unglamorous topic of governance (keeping teams aligned, codifying security practices, deciding what belongs in “the harness”) is the new challenge companies are grappling with. Sandboxes and cloud workers have gone mainstream.
The group pushes back on the wave of AI-justified layoffs, worrying that companies are cutting the very mentorship and middle-layer “glue” needed to steer agents. They also dig into tokenomics: budgets blown by mid-year and tools that can cost more than the engineer using them. Angie describes a hard-won lesson at Block - getting 95% of engineers onto coding agents produced no velocity until she sponsored a small group of “AI champions” to learn the tools properly. Tokens, everyone agrees, are not the same as value.
On what actually works, Angie makes the case for RPI (Research, Plan, Implement) from HumanLayer and for adversarial review. A 32-file refactor that earned a clean pass from Codex made her a believer. Also discussed are review skills, the Council of Mine MCP server, and Jesse Vincent’s Superpowers skill pack; Dan adds Wes McKinney’s RoboRev for continuous background review.
The episode closes on the human side: whether “we’re all managers now,” the identity crisis facing engineers who loved the craft, how Angie found the same flow state building agents that she once found writing code, and how all of this democratizes building for non-engineers. Along the way there are quick stops to discuss the token-saving Caveman skill, naming your agents, and a duck-themed calendar app. There’s still no free lunch, Dan notes, but the price has come down. At least until the next model drops.
From this episode
We've all become these decentralized R&D departments.
You can spend a lot of tokens and get absolutely nothing done. You can't equate tokens with value at all.
Now I spend more on the tool than the actual engineer that uses the tool.
We need to kind of slow down to speed up.
I do not miss coding, which kind of blows my mind. I just want to get to solutions.
Claude isn't gonna tell you what you have to do next in your career.
You've invented employees again.
Mentioned
- Agentic AI Foundation: open foundation for agentic AI that both guests work with
- RPI (Research, Plan, Implement): research-plan-implement coding workflow from Dexter Horthy at HumanLayer
- Superpowers: Jesse Vincent's Claude skills framework for structured agentic development
- roborev: Wes McKinney's continuous background code-review tool for agents
- Council of Mine: MCP server where AI personas debate and vote on an answer
- Caveman: skill that makes a model talk like a caveman to save tokens
- cmux: macOS terminal for running multiple coding agents in parallel
- context rot: Chroma research on LLMs degrading as input grows; sparked 'caveman rot' idea
- LLM Council (Andrej Karpathy): Karpathy's project where multiple models answer and rank each other
- MLOps Community: Demetrios's community, now the AAIF's official user group
- Davis Treybig: Innovation Endeavors VC whose blog on how agents differ from humans came up
- Quakpit: menu-bar app that flies a duck across your screen before meetings
- Flying Toasters (After Dark): classic Berkeley Systems screensaver Dan recalls fondly
- Broomy: Rob Ennals's open-source IDE for working with many agents
Transcript
I’m Dan Gerlanc and welcome to Agents and Engineers. Today we have Angie Jones, who is the VP of the Agentic AI Foundation. Previously, she was VP of Engineering at Block, where she led Agentic AI, Senior Director at Applitools, and a software engineer at Twitter, LexisNexis, Teradata, and IBM.
Welcome, Angie. And we also have Demetrios Brinkman, who is the founder of the MLOps Community and recently joined the Linux Foundation to represent the user community at the Agentic AI Foundation. Welcome, Demetrios. Great to have you both here today. And so I actually,
if y’all haven’t seen, I was, I think, one of the earliest guests on the MLOps Community. I looked back, and it was in March of 2020, where I talked about Dask. Now we’re talking about AI and agents all the time. So I guess first, you guys
are doing a lot of events and also have an AI ambassador program that you have opened up at the Agentic AI Foundation. Do you wanna tell our listeners a little bit about that?
Yeah, I can start. So the Agentic AI Foundation started in December of 2025, so it’s just roughly six months old. Pretty young, but getting pretty mature. We have over 180 members, which are companies.
It was founded by Anthropic, OpenAI, and Block. And our member companies include the likes of Google and AWS, so everyone who is shaping this agentic AI space is doing so at AAIF. So yeah, we
have an ambassador program because everybody’s trying to figure out how to do this stuff. As an industry, we’re trying to learn how to do all of this together, right? It’s a totally new way of doing software engineering. Even outside of engineering, people are building agents to help operationalize some of their work and things like that. And so the ambassador program is a way for those who are really passionate about agentic AI, who
have figured some things out and wanna share that with others, and that could be via conference talks or blog posts, workshops, things like that. And so yeah, we launched that yesterday. We had a lot of interest; hundreds of applications came in for our initial cohort. So yeah, I think this is gonna be really exciting. Demetrios, you want to talk about events?
Yeah, and I’ll just dovetail off of the ambassador program. I think it is super cool to bring together folks who are
knee-deep in this and sharing their knowledge out. It’s almost a way for them to have an excuse to put the time in and learn things, or share what they are learning with the greater community. And that’s kind of why I think we had that inspiration for, like, don’t let that knowledge just sit there. Let it get out. Let others learn from it or be inspired by it.
And as far as events go, we’ve got so many, I don’t even know where to start, but
Yeah,
it is wild. It is truly global when it comes to the events because there’s stuff happening in India in June. Then we’ve got stuff in Shanghai and Tokyo and Seoul. But what’s probably more relevant to the listener right now is there is an AGNTCon in San Jose that is happening in October. I think it’s October 22nd.
And that’s gonna be a pretty big event. I think we’re expecting a lot of people, we’ll say. We’ll put it at that. And we also have an Amsterdam event on September 17th and 18th. That’s like the AGNTCon of Europe. And again, these are the bigger events. We’re doing
citywide meetups too. So it’s not just these big one-off events in different continents, but for anybody that is interested in going to something that’s more regular in your city or in your local geo, we’re doing that too. So that’s kind of the TL;DR of our events.
Yeah, send a few texts. Yeah, and I
think it’s fascinating too because we’ve all become these decentralized R&D departments. Everyone now is doing R&D when they’re experimenting with agentic engineering. And so being able to get together and compare notes with these little nodes of the system is very useful.
I’ll include all the links for the applications and the events in the show notes so folks can find those or search the Agentic AI Foundation.
So there’s always a lot to talk about in the world of agents. One thing I always like to ask folks, especially since things have moved really quickly, I think, in the last six months, is what are you doing differently today, or see people doing differently today, than they were doing six months ago?
I can start. So six months ago was probably right around the time that the world shifted with the drop of Opus 47, right? And that really changed the way we were doing things. So when we were doing engineering, let’s say seven months ago,
there was a lot of emphasis on context engineering, on prompting your agent just right, on making sure you had everything in place in your repo so that the agents could have the slightest chance of helping you actually do something instead of crapping out, right?
And then everything changed with the drop of the latest premier frontier models. And so the way we do engineering has changed now as well. There’s not as much you need to do to get started. You could kind of just point it at your repo as is, tell it your problem, and it just
does it, right? And so that’s been interesting, to watch this shift. And also I think that it’s brought a lot more people into the fold, whereas before they were either, one, this is all hype, this stuff sucks, when I ask it to do something it doesn’t do it well, or two, if I gotta lay out all this context, don’t worry about it, I’ll just do it myself, right? So now I think people who were skeptical
are now giving it another look, or they have in these last six months, and have realized, oh crap, this is actually pretty useful, right? And so they are now slowly but surely getting more involved.
And
How does that look? You were in charge of AI at Block, so in terms of doing this at a larger corporation with lots of different teams working together versus individual engineers, how have you seen that look or differ?
Yeah,
what it’s done is unlock a lot of capability, right? So for example, before, it was every team trying to figure out how to get these agents to work for their repo. Since the models have gotten so much better, now we can kind of move that up a layer and have a common
agent that can just work across any of these repos. In fact, it can now kind of hold a world model of all of our 25,000 code bases, right? As well as customer requests and scenarios and things like that. And now that’s gotten to the point where everybody doesn’t necessarily have to figure out how to use the agents themselves. We can kind of
abstract that out and bake it into the tools themselves with skills and things like that. And so now an engineer can just delegate work to an agent. Like, hey, I’m in my sprint. The agent is now a member of that sprint team. And here’s your tickets for this sprint, right? Without babysitting at all. I don’t need to open a terminal. I don’t need to open an IDE. I don’t even need to watch the agent work.
We’ve created these cloud workstations where the agent will just go. You gave it the task, assigned it a Jira issue or something like that. It goes off into that cloud workstation, clones the repo, does the work, puts up a PR, right? And so that’s what this has unlocked, I think, for enterprise teams.
Yeah.
I I
I think one thing that you’re seeing too now, as opposed to six months ago, is things becoming more popular because there’s a need for them. So the term, which, I joke, is the least sexy thing in the world, is governance, and how all of our teams are using these agentic engineers. How do we get these agentic engineers to work with each other? So if I have a lot of different agents
for myself, how can I make sure that our team works well together? It’s not just that I’m off on my own with my twenty thousand agents working and burning fifty grand a month on my Anthropic bill. Yeah, chilling it. Like, what does that mean for the rest of the team? Yeah.
Yeah.
Kill him.
Exactly. And in my opinion, it’s more than that. An easy place that I think a lot of the attention is going is, well, now we’re generating all this code and now we have to review this code, and so that’s now the bottleneck. That’s an easy narrative that you hear folks say. I think it’s much more than that because of
human dynamics and also things like, how are we making sure that our teams are focusing on the right things at the right time and all rowing in the right direction if we’ve got this extra firepower? So governance is one that I think became very popular in the last six months. I also think there is a lot of conversation now happening around the harness.
And what
do I not do with the harness? Where do I optimize the harness versus what I’m using if I’m using some external tool? What constitutes the harness that I want to use? All of that has come into question now. And then along the harness lines, sandboxes now are huge. And Angie said it, kicking off cloud workers is becoming a common pattern now, which I think
six months ago was very nascent still.
And I mean, there are two things here actually that struck me with what you both said. One, a lot of the challenges are almost outside of the coding, the delivery of code. And yet a lot of these big tech companies are now laying off the managers; they’re removing those layers that direct this.
Do you think we’re actually going and eliminating the part that we actually need more of in this case in order to be able to direct these agents? We actually need humans who can review this part.
I think we definitely need that layer, and I’m really sensitive to this, because I can see where it’s like, okay, middle-layer management is kind of a friction point, right? And so you’re slowing us down a bit. But I think some friction is necessary, you know, some, maybe not all, right? But
to Demetrios’s point, how do we make sure that we’re steering in the right direction if everybody’s just off doing whatever they want with their five agents or whatever? Like, man, I got an idea, whatever. And as someone who has led engineering teams, I know that not every engineer is necessarily motivated by the customer’s need, right? Some of them are in this for the craft.
And you know, whatever, it is what it is. But yeah, I just like building stuff. So I’m not necessarily in tune with what’s gonna sell or what’s gonna improve customer satisfaction or things like that, right? And so I think this is where you kind of need someone making sure that we’re not just, I had a great idea, I’m just gonna build this today
and ship it, and it’s something that no one asked for, right? So I don’t know. What’s your take?
acting.
Right. So this is why it’s governance stuff. It’s not that new and it’s also not that exciting, but it is a huge theme that comes up now that you have almost everyone using some kind of a coding agent at their job. Well, I mean, I say almost everyone, I realize that we are very much inside of a bubble, because I have a friend who is also an engineer here in Europe where I live, and just the other day I was talking to him about you.
It’s gonna tell you whatever you wanna hear, whatever is gonna
make you smile, which might not... like, as a people lead, I have some pretty tough conversations with folks that they appreciate a lot, right? You can trust what I say. I’m not blowing smoke up your butt, right? But I’m helping you become a better professional. You don’t get that. Like, I get so sick of my agents just like, yeah, yeah, you’re right.
Ha ha ha.
Ha ha ha.
I think
there’s two pieces to this too. One is that it feels like a lazy out when folks say, all right, we’re laying off X percent of the team because of AI. I think that that is an easy out, and a lot of times there’s more to that story than just AI, quote unquote. And the other thing to this is that
It’s an experiment right now, it feels like, and we’re gonna see in a few years how, or maybe not even a few years, maybe six months, a year from now, how that experiment pans out and if it is actually a good idea.
Yeah.
Yeah.
yeah.
I think people are already seeing that. Like we’ve seen some companies go, like, whoa. Now I’ll tell you, at Block, I was the one responsible for buying our tools and things like that. So you have a budget, which is finite, and you allocate it toward this. And so I had multi-million dollar deals with the frontier labs
for their tooling. The problem is, if my budget is set in January, and now let’s say, you know, March, April, May, some new crazy model has just launched. Every time these models launch and they’re so much more capable than the last, they’re much more expensive than the last as well, right?
And so you’re seeing articles now from folks like Microsoft, Uber, and other companies who have blown through the budget they had allocated for these AI tools. And we’re not even at the halfway point of the year, right? And so I think now the companies start to realize that, wait a minute, now I spend more on the tool than the actual engineer that uses the tool.
Yeah.
100%. But also there is something to be said too when it comes to this whole tokenomics discussion that I think is being left out of the conversation. And that’s you can spend a lot of tokens and get absolutely nothing done. And so like you can’t equate tokens with value at all.
Yeah. I’ll tell you, when we first started our agentic engineering journey, right, the goal was to get people using the tools. And we had gotten to where like 95% of our engineers were using some coding agent, right? But we didn’t see any velocity at all with, you know,
features being shipped or PRs or anything like that. And so then I had to dig deeper and say, okay, how do we actually get a return on our investment? How do we make our engineers more efficient using these tools? And we were able to improve that quite a bit. But still, you’re absolutely right. I don’t think there’s a guarantee that just because I use these tools,
it’s gonna increase my velocity at the rate at which you’re spending for it, you know?
I think that we’re soon going to see where this is all going to reverse. So right now, I don’t know if these CEOs have a little group chat or what, where they’re just kind of like, man, my people are doing this. And so the other CEO goes back like, we need to do this. So now they have these token leaderboards, which I think are absolutely ridiculous. And
so now you’re just kind of token maxing for the sake of token maxing. Then the bill comes and you realize, oh crap, this is not great. I think what we’re gonna see is a shift to become more efficient with your token usage. And so we still want the velocity, but we also don’t want you, I don’t know, using
five five, like, high reasoning for a simple file edit, you know what I mean? So I think there’ll be this whole wave of figuring out how to be more efficient with our tokens while still increasing velocity. And I think there’s something to that. No one wants to really invest in training their folks on how
to do this. And some of it is, you don’t even know yourself, right? Where is the training? But what I did to accomplish this was set time out. So I formed a group of what I call AI champions, which were like 50 engineers, and we had 3,500. So it’s a small subset, but
small but mighty. These 50 engineers represented our largest code bases, our most important ones from a business perspective. And so they would dedicate, and I got clearance with their managers, I need 30% of their time dedicated to actually exploring the tools, the models, the context engineering, and agentic engineering as a whole,
and then bring those lessons back to their teams. And that paid off quite a bit because now they had the space to do that, right? If you don’t have time and people are just like, I want 20 PRs a day, when do I have time to learn how to use the tools efficiently or try new things that I learned at a conference or saw on Twitter? I don’t have time because I just gotta crank stuff out, right? And so that’s when you get to a point where it’s slop.
The PRs are just sitting there because nobody has time to review them, you know. So even though you increased velocity, so what, it’s still blocked. You know what I mean? So I think we need to kind of slow down to speed up and just take a beat and figure out how to use these tools effectively for your use cases and your environment. And then we probably could see the velocity that we’re looking for.
Yeah, I think I’ve seen a lot of people find success when they front-load the hard work and the thinking and the scaffolding, making sure that the environments are set up for success, making sure you’ve got the MCP servers connected and everything that you need to let the agents run. You wanna have that ready. Like, are all the naming conventions in the code base and the code base structure, is that
all consistent? And how can you do the work up front so that when you let the agents run wild, you actually have the ability to walk away and then come back? I feel like
you want to try and think through and give everything you need to the agents, or try and think through as much as you can. You can’t always know what’s going to come around the corner. But if you can do more of that work in the beginning, then it allows for the agents to execute much more on the tail end.
Yeah, we adopted RPI, which is just research, plan, implement, from Dexter at HumanLayer, and we found a lot of success with that. But that was pre the Opus 47 movement. So we found great success then. It’s still pretty good now, but not as necessary
when it’s smaller changes. But if you have a big code change, like I’m trying to refactor this entire code base, or I’m trying to migrate from this language to another or this framework to another, or just, I don’t know, a big, huge, complex feature, then I still find I’ll reach for RPI as a technique, right? And so that’s basically where you have the agent first research
the code base, figure out what’s possible, where things are, and stuff like that. So you haven’t given it anything to do just yet. You’re just saying, research this area. Like, I don’t know,
the cart feature. Go research that, right? And so it finds all of the areas, all of the integration points, and stuff like that, and writes that to a research document. And then in a fresh session, you would have an agent plan. Now you tell it what you actually want to do, right? If you tell it in the research phase, it’s gonna try to do it. You don’t want to muddy it. So in the plan phase you say, okay, here’s what I want to do. I don’t want you
to implement it, but based on this research, can you now make a plan? And so it then does all of that, writes that to a markdown file, and then you can go into an implement session where it’s like, all right, here’s the plan, just go implement it. So we were finding really great success with that. The quality of the code was great, right? So instead of the agent just rushing, finding a spot, and just
doing its business in that spot and pushing that up, now we saw where it accounted for all of these various integration points and everything like that. It was wild the first time I tried that. I needed to essentially remove this really kind of complex, intertwined feature from the code base. And so I tried it out with that. And this was like
So you were using
Claude for the planning and then Codex to review.
Yeah,
yeah, basically, like yeah.
are on the same page. So that’s worked pretty well. And there’s this other really cool MCP server that I actually love. Essentially it’s a council, kind of similar to Karpathy’s council thing, but it’s a council of models, where sometimes when I just need a lot of different opinions, my orchestrator agent will
use this MCP server. It’ll tell it my question. And then all of these models have a different persona. So one is playing devil’s advocate, one is the optimist, one is the pragmatic one, or whatever. And so they kinda all argue about it, and then they vote on each other’s arguments, and you can’t vote for yourself. And then it comes up with
a result. Like, okay, here is what they all converged on.
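The voting step Angie describes can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not how the Council of Mine server is actually implemented: the persona names, the `tally_votes` function, and the alphabetical tie-break are all hypothetical.

```python
from collections import Counter


def tally_votes(ballots: dict[str, str]) -> str:
    """Return the persona whose argument received the most votes.

    `ballots` maps each voting persona to the persona it voted for.
    Self-votes are rejected, mirroring the "you can't vote for
    yourself" rule from the conversation.
    """
    for voter, choice in ballots.items():
        if voter == choice:
            raise ValueError(f"{voter} cannot vote for itself")
        if choice not in ballots:
            raise ValueError(f"{choice} is not a council member")
    counts = Counter(ballots.values())
    # Assumed tie-break: most votes first, then alphabetical order.
    return min(counts, key=lambda name: (-counts[name], name))


# Hypothetical personas; the real server's personas may differ.
ballots = {
    "devils_advocate": "pragmatist",
    "optimist": "pragmatist",
    "pragmatist": "optimist",
}
winner = tally_votes(ballots)
```

In a real council each ballot would come from a model call that reads the other personas' arguments; here the ballots are hard-coded so the tally logic stands on its own.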
cool.
There was someone I interviewed who said that when he has a question, he consults the Council of Elders: Gemini, Claude, and ChatGPT.
Yeah, I
No.
Ha
ha ha
Yeah.
Nice. Yeah. I was gonna say, back to your question, Dan, from earlier on what’s helping have more success. In my experience, and it’s in a similar vein to what Angie was saying, just I think a different implementation, it’s the Superpowers skills. Shout out to Jesse, who created that skill pack, because it’s kind of that same vibe where there’s a brainstorming skill and it will just go back and
forth with you a bunch. There’s a planning skill, then there’s an implementation skill. And Jesse has done the hard work to create those skills and really optimize them. And then he’s keeping them up to date. So I kind of outsourced that whole skill and everything to him. And then on the other side, once you have... yeah, that’s the wild thing. Yeah, Superpowers.
Amazing
pack. Yeah, you’ll like that one, Angie. And Jesse is a really deep thinker on this stuff. Actually, I interviewed him
on my podcast, and he was talking about how the inspiration for the brainstorming skill and the planning skill came from him working with MIT interns. And he was like, these folks, they had so much raw intelligence. They just had all this horsepower, but you really had to just go back and forth with them to make sure they knew what they were building before they went out and built it. And so that was the inspiration behind it. But anyway, I digress. And another one that is
super valuable that I found, again not my own, but I will happily use it and steal it since it’s out there, is from Rob, the creator of Broomy. He mentioned that anytime he submits a PR, he has set up this skill, which is like a
checker. I don’t know what the proper name or terminology is for it. I can’t even remember what I saved it as, but it’s basically a verification skill. And what these agents will go and do is take screenshots. So you put up the PR, it’ll take a screenshot, and then it’ll write a report as to why this should be like this. So it’ll click on all the links and all the buttons and whatever, take a screenshot, and then for each screenshot it will write a report justifying and
verifying why this is correct, and if it’s not correct, it’ll go and do the loop and try and fix it.
Yeah, there’s another one. I use Superpowers, I’m a big fan. And actually, I heard about this from
Hugo Bowne-Anderson and Thomas Wiecki, who interviewed a few folks, and Wes McKinney was talking about one of his projects called roborev. And basically, if you run it in the background, any time there’s a commit in one of the repos you’ve set it up in, it will kick off a review for that specific commit. It can use different models, but
I mainly used it with Claude and then had Codex do the reviews.
Pretty amazed at what it’s found in a lot of these, even in cases where I’ve made pretty extensive plans up front, it will still find either issues sometimes in the plan or sometimes in the commits as it goes through.
It’s like, go out there, break my stuff, try and poke holes in what’s going on, make it not work. And I actually just saw a cool project. I can’t remember the name. Again, apparently that’s a common theme today for me, but it is a
simulation of cloud environments so that instead of having to push something to production before finding out that, oops, we didn’t catch that, you can replicate your whole cloud environment and it will simulate it. I think it’s called like Vera. Somebody can fact check that. And so that’s another cool way of doing it. There’s a great blog recently from
Davis, he’s a VC at Innovation Endeavors, and he was just talking about all the different ways that agents are different than humans. And one of them, I think you’ll appreciate with the data background, Dan, is how agents are much more okay with writing really long and complex queries. But the whole argument was like,
all of a sudden we went from humans writing one query, getting the data, trying to
figure something out with it, maybe writing another query, to agents fanning out, running a bunch of queries, maybe super complex queries. And some of these queries might be able to piggyback off of each other, or for some of them you already have a subset, and so you can just run the query inside of that slice of the data. But we don’t really have the mechanisms to do that right now. So again, going back to the tokenomics thing, we’re spending a lot of money that potentially
we don’t need to be.
Yeah, and I think also my experience has been, especially with data analysis type problems, the specification of the problem in those is often very subtle and not something that’s probably in the training data. So it’s very easy to...
this is always a problem with data-driven problems: your code may run, but whether it gives you the right answer is a very different thing. And I think agents have, in some sense, especially for data-driven problems, only made that more challenging.
Yeah.
Yeah, Shreya. Do you know Shreya Shankar? She just wrote a great blog post too about how she was trying to identify, basically, what she does as a postdoc, and I think it’s qualitative research that she’s doing. And one of the things
Yeah, I’ve seen some of her
work, and with Hamel, she teaches with Hamel, I think, yeah.
The recent blog that she wrote was really speaking to that point of specifying the problem and how easy it is for the agent to... and we see this all over the place, but she was just talking about how she did four or five different ways of trying to tag data and then come to conclusions off of that data. So a lot of her specific, and I mean, I guess, recent “research” in quotations, was: hey, there’s this thread with hundreds of replies to this tweet, or X post, and it was a question about why are you moving off of, or away from, Claude Code? And so generally, the process is that a researcher will go through all of that data and then tag different buckets, and then maybe you merge these buckets if they’re similar enough.
And then you’ll try and form conclusions from all of this and try and say, well, it looks like there’s a lot of folks that are doing it for these reasons, and then other reasons are important, right? So she was like, “Well, I can identify this.” But then she did five different ways of identifying it, from very naive to very sophisticated. And the results were pretty stunning, and she has them all there, and what the output was from the actual LLM when you go through the process. And you just see, it goes back to our point earlier on how if you front-load that design and the work of actually setting up the agents for success, you get much better output. Which is not a novel concept, but it is very, very clear in this context.
Yeah.
Mm.
Say... it’s not... the model’s not more efficient if you say you’re gonna switch to another model.
So outrageous. You can imagine that was somebody in San Francisco that pitched me that idea. Of course.
I immediately cut a check. You know me too well.
Ian, yeah.
Ha ha ha.
No, that’s a really interesting concept. Like, my annoyance with the models is that they’re too verbose, right? And now that you mention this, I’m thinking kind of behind the scenes of how this all works. And maybe a lot of their output is not necessarily just for me, but it’s for the next turn, when the harness needs to send this back to the model, right? And so the model itself needs some kind of context on “what have I already learned,” and spits all of that out to me. And so if you were to caveman it, and it’s just minimal sentences throughout, I imagine, one, the model probably has to kind of redo a lot of the same research or thinking or whatever, or reasoning... people get really upset when I say human traits about agents... but reasoning about things over and over on every turn, because it doesn’t have a lot of the context from its past turn where it already did that reasoning. That would be a really good study. Send it to me if somebody does it. I want to see.
Hmm. Yeah, exactly. I wanna see that and hear about it too.
No, I agree, that’s important, ’cause a big part of it, I think, is building up what parts of the context are important. And if you can eliminate filler words like “the” or whatever, how much of that do you actually need?
Yeah.
Exactly. Probably the majority don’t. Yeah, but where does it fall off the cliff? That’s what I wanna know.
Does caveman only, like, kinda take out the unnecessary words but is still verbose, or is it not verbose at all? Like, every response is a sentence or two?
It’s so funny too when it comes to the convention where folks will name their agents. Going back to Ian, Ian was talking about how he names all of his different agents. Like, he’s got his technical PM agent named... I can’t remember what the name was. But that’s my, I don’t know if it’s a pet peeve, but I just feel like it’s a missed opportunity for us to come up with really cool names that aren’t human names that we can call our agents. Like, why are we not using R2-D2 and C-3PO here?
Man.
Have y’all seen, or used, the pets feature in Codex? Is it in Codex or Claude, which one is it in? It’s something where you can give your agent a pet. I haven’t tried it at all. I haven’t really looked into it. I’ve seen a couple of tweets and I’m just like, what is this foolishness? But I’m wondering, what does that do? Does that make your agent happier because it has, like, a dog or something?
Is that the one where you get a Tamagotchi?
And so what does it do?
I did see this. Yeah, it’s to know if Codex is up, or to know if you need human intervention on your agent.
Nice.
Yeah. Not agent related at all, but I just downloaded the most ridiculous thing that makes me so happy right now as far as software goes. And I don’t know if you saw this yet, but it is a little duck flying a plane with a banner behind it, and it goes across my screen, all of my screen, over top of all of the apps, and it tells me when I have a meeting coming up in five minutes. So instead of getting the typical notification from Google Calendar that pops up on the right of your screen, it is a full-on duck flying across the top half of my screen. It’s a mini duck. And it makes me so happy. Yeah. She’s like, “I ain’t never been to a meeting with you. You never make the meetings I’m in.” Does it actually work now? Yeah, good.
What is it called? The Tony Soprano ducks.
It’s pretty important.
Reminds me of the Flying Toasters of After Dark, the screensavers back in the day.
So another thing I like to ask everyone, and this gets back to how the job with agents has changed. It’s different than writing code. It’s kind of more like management. How does that feel, managing agents versus writing the code yourself, or working with people, or the combination?
Mm-hmm.
So I’ll say, I absolutely loved writing code. Like, I was one of those people who wanted to code on the weekends, when there aren’t people bugging me on Slack all day and I can kinda cozy up, give me, like, a cup of tea or something, and put on my lo-fi music, and I’m just all into a zone of coding, right?
... this little game I have on the side. And I was like, this would have taken me, like, a day to build out this feature. And look, I started an hour ago. And this was back and forth, refinements and things with the agent. But this took an hour versus a day, right? And so that gives me joy. That makes me happy. But you know, a lot of people are having, like, an identity crisis, right? And I will say, I think there’s kind of buckets of folks. I was always the person that’s kind of in tune with what the customer wants, and then I go find joy in building that. Some people are like, “I don’t care what the customer wants.” “Well, we got customers?” Like, who knew? They just love the craft of it all, so those folks are having a pretty tough time right now.
But you know what, Dan, what I found recently? So at the AAIF, we’re new, we’re the Agentic AI Foundation. So of course we’re building as an AI-native organization, right? And so I’m building agents all the time. And I found myself in that flow state that I used to be in when coding, with actually building agents. So not coding with agents. Different: I’m building an agent to, like, whatever, do some operational tasks, and this thing needs to then be deployed to Demetrios and everybody else on the team, right? And so in that I was in the same grooves and found those same highs, and exercising some... I gotta put this together and make this a...
But, like, some of the same things, where you’re ensuring things are decomposed properly. That’s not something that I delegate to the agent itself to build. I have to think that through, right? And so I found a lot of my architectural and development muscles coming out then. I gotta flesh this out. This is gonna be a new talk I write.
I liked it.
And you also have superpowers too.
Yeah, I got the superpower skills too on top of that. So all of that combines to make me feel so amazing, like I’m not working that hard at all, you know, and I’m getting this done and I’m getting, like, real output.
So yeah, that’s my take on it. I do feel for the folks who really love coding and loved the craft of it. That just wasn’t me.
Yeah, I think it really democratizes building. Like, I have a friend who works at Google in marketing, and he’s been able to build out automation software and chatbots, and he’s not a software engineer. But I think sometimes that approach, coming in from “we need to solve a problem” versus “we just need to do something technical,” can almost have the most leverage. Not that you don’t need both in certain cases, but I kind of like seeing that people aren’t just constrained by the club of “you need a software engineer.” Now you can actually go do it. Maybe you shouldn’t deploy it to a million people on the internet without having someone review it for security, but you can build your idea.
Yeah, if it’s a little small app for yourself, yeah, why not, right?
Yeah, I had a whole thing on how software was moving local. And we now don’t necessarily have to think as much about this idea of going to production if a lot of the stuff that we’re creating is bespoke for us. And even if you take that to the next extreme, it’s not even going local, it’s just becoming skills.
And that could be the next thing. But I also understand that, again, it’s a subset of use cases for a subset of people. You’re still gonna have these big products and projects that need to be battle-hardened and tested for scale.

