From Supervising AI to Building Systems for It
From closely supervising LLMs to building systems that let async, cloud-based agents work unattended, Eleanor explains how her work has changed over the last six months. We discuss the primitives of an agent system, why deterministic verification is a bottleneck, and how software development is becoming a sub-branch of systems engineering.
Show notes
Dan and Eleanor open by discussing how fast software engineering has changed. In the last six months, Eleanor’s practice flipped from treating AI as a messy assistant that needs close supervision to building systems that put the agents on the path to success. She now writes essentially no code herself, arguing that the models have become good enough that her involvement mostly makes the results worse.
The journey runs from babysitting agents locally to delegating to async, cloud-based agents like the GitHub Copilot coding agent, Cursor, Devin, OpenHands, or Factory. Eleanor warns that the home-grown terminal “loops” everyone is building right now are great for learning but too brittle to scale.
Next up: what does an agent engineering system actually need? Eleanor recommends starting with a sandboxed execution environment (usually containers), careful control over how the agent reaches the outside world (MCP servers and selective network access), a way to see across multiple repositories, and layered rules via AGENTS.md and skills.
Eleanor makes the case that async delegation is a forcing function for better specifications. Deterministic feedback, such as static analysis and test suites, is the single biggest factor in work quality because “you can’t control AI with AI.” She has moved to fully test-driven development and says she no longer sees current-generation models work around tests (e.g., by disabling them) the way Claude 4 and early versions of GPT-5 did.
Dan and Eleanor turn to adoption and skills, including how to get better at using AI through deliberate practice. Eleanor explains why she moved from Python, her primary language for decades, to statically typed languages like TypeScript and Go for agent work, why supply chain risk at her healthcare company has her questioning every dependency, and why she dislikes the term “junior developer.” Curiosity and systems thinking, not tenure, are what matter now.
The episode closes on verification and scale. Eleanor distrusts any output she can’t verify, doesn’t miss hand-writing code, and argues that inventing new ways to verify, including more formal methods, is the real bottleneck now that models are cheap and strong. On team size, she pushes back on the “small teams” consensus, pointing to the success of large open-source communities. Eleanor remarks that software development has become a sub-branch of systems engineering, and anyone not practicing this now will be shocked in a matter of months.
From this episode — Eleanor Berger
I don't write any code at all anymore.
You can't control AI with AI. It's this infinite regress where if you start going off track it will just get worse over time.
If something doesn't work, assume that the problem is not that AI doesn't work, but that you're not using it the right way.
Code is the most verifiable thing available, so as soon as I can put something in place — test suites, static analysis, all kinds of rules to verify it — I'm okay to let the AI do its thing unattended.
Software development for people has changed into a sub-branch of systems engineering, where you build systems that allow agents to do this well.
Managing people is definitely more tiring than managing AI agents.
Mentioned
- GitHub Copilot coding agent: cloud agent Eleanor credits with leading the way on mature primitives
- Devin: Cognition's autonomous AI software engineer, cited as an option
- OpenHands: open-source cloud coding-agent platform, mentioned as an option
- Factory: agent-native development platform Eleanor calls very cool
- Codex: OpenAI coding agent Eleanor 'more or less lives in'
- Repo Prompt: context tool for prompt-building, recently made open source
- AGENTS.md: open format for per-repo coding-agent instructions
- Model Context Protocol (MCP): how agents access the outside world selectively
- Anthropic 'when AI builds itself': blog post arguing AI will soon do the actual ML research
- Lovable: product that encapsulates best practices for simple builds
- Formal verification: approach she expects to expand beyond mathematics
- UML: older specification approach referenced in the verification discussion
- Jimini Health: healthcare company where Eleanor is a member of the technical staff
Transcript
I am Dan Gerlanc, and welcome to Agents and Engineers, the podcast about agentic AI and software development. Today we’re joined by Eleanor Berger. She is a member of the technical staff at Jimini Health and president of Agentic Ventures. Previously, she was a principal engineering manager at Microsoft and a staff engineering manager at Google.
She has also written extensively on agentic software development and taught a popular Maven course on the same topic. Eleanor, thanks for joining us today.
Yeah, you could have asked what I’m doing differently now than six hours ago, because things are moving so fast. But the big difference from half a year ago is that it kind of flipped. Up until about half a year ago, really the end of last year and the beginning of this one, my AI-assisted or agentic practice was really anchored on the idea that AI can help. It augments us. But it can also be a bit messy. It can make mistakes. It can be problematic. So everything we did was about getting some help from the AI, supervising it very closely, and staying very involved. And I think it flipped in that the way I see things right now, definitely in my own practice, when I teach, and when I work with people, is that AI is going to do a much better job at it than I ever will. It just got so good. It’s better if I build a system that allows agents to do their work with enough context and enough guardrails, kind of boxing them in, and then get out of the way, because I’m only going to make it worse.
So that’s a huge difference, and it translates to how I do things. I don’t write any code at all anymore. Well, maybe that’s ninety-nine percent true; here and there I still write a line or something like that. But I try to build systems that allow agents to do the work, and I try to help people get on board with this, which is not very easy because it’s counterintuitive, especially if you got to know AI a year ago or even a bit later. That’s a huge change.
I think the best way to do this is to use one of the cloud-based, software-as-a-service agents. There are quite a few that are really good. I really like the GitHub Copilot one. In many ways they’ve been leading the way: they were there first, and they have a very mature and very rich set of primitives you can work with and really nice integration into GitHub. But Cursor now has good facilities around this too. There are products like Devin and OpenHands, and Factory, which is very cool. They’re all good, but I think what’s important is to start with the idea that you’re going to delegate and everything is going to run in the cloud. It’s not about developing on your laptop with an agent and maybe bouncing some things to the cloud, which seems to be where Claude and Codex still are.
I actually see more of that than I think is advisable. I think it’s often easier for people, and that’s true with every technology; I’ve seen it with moving to the cloud and with other new technologies, when people don’t familiarize themselves with the new systems. It’s easier to build something yourself, because then you understand the thing you’re building. So I see a lot of people building these loops, right? Loop is the keyword of the week; everyone’s talking about it now. They’re orchestrating various things through the terminal. That’s all cool, and I think it’s good for learning how to work with these systems. But for a scalable practice I don’t think it’s a good idea, because it’s just brittle. So yes, I’m seeing a lot of that, and I wish more people realized that there are good products out there that give you the Lego blocks for building a system.
So you need an execution environment that’s controlled and configurable. Often this is containers with some definition, so that you can set everything up for the agent to work in. You need some way to control how the agent accesses the rest of the world; some of that is MCP servers or things like that. At times you want to make network calls, but you need to be very selective and very careful about what’s allowed. In many cases you’re not working in just a single repository, so you want some way to understand the larger world. Say you’re developing a service that calls three other services defined in different repositories: you want some way to look into them so the agent knows what to do. And of course there are rules like AGENTS.md, skills, all these things. So you want an environment that lets you configure this easily at every level: organization, project, repository, and task. And then you want ways to trigger tasks, to monitor them, and to read the output later, because you want to learn from it. Any system that gives you that will work. You can build it yourself, but the best systems already have all of that available.
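Configuring rules “at every level” can be pictured as a layered merge. This is a minimal sketch under stated assumptions, not how GitHub Copilot, Cursor, or any other product actually resolves settings: the level names, the keys (`network`, `rules`, `mcp_servers`), and the policy that narrower scopes override scalar values while list entries accumulate are all hypothetical choices for illustration.

```python
from typing import Any

# Hypothetical precedence order, broadest first; later levels override earlier ones.
LEVELS = ("organization", "project", "repository", "task")


def merge_config(layers: dict[str, dict[str, Any]]) -> dict[str, Any]:
    """Merge per-level agent settings (hypothetical schema).

    Scalar values from narrower levels replace broader ones; list values
    (e.g. rule files or MCP server names) accumulate in order without duplicates.
    """
    merged: dict[str, Any] = {}
    for level in LEVELS:
        for key, value in layers.get(level, {}).items():
            if isinstance(value, list):
                existing = merged.setdefault(key, [])
                for item in value:
                    if item not in existing:
                        existing.append(item)
            else:
                merged[key] = value
    return merged


# Example layers; keys and values are illustrative only.
layers = {
    "organization": {"network": "deny", "rules": ["org/AGENTS.md"]},
    "repository": {"rules": ["AGENTS.md"], "mcp_servers": ["issue-tracker"]},
    "task": {"network": "allowlist", "rules": ["skills/migrations"]},
}
print(merge_config(layers))
# {'network': 'allowlist', 'rules': ['org/AGENTS.md', 'AGENTS.md', 'skills/migrations'], 'mcp_servers': ['issue-tracker']}
```

Whether a task should be allowed to widen network access or only narrow it is a policy decision; a stricter variant might reject any task-level value that loosens the organization default.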
And in terms of productivity, what difference do you see versus running this locally, someone using two or three Claude or Codex sessions versus a remote environment like this? From a scale perspective, what kind of difference have you seen in what folks can get done?
Mm-hmm.
Mm-hmm.
I don’t know, you need to restart your laptop all of a sudden. It’s quite stressful. But being able to consistently delegate tasks and have them work in the background, you can actually achieve more. Because when you run things locally, and this is more about the human mind than about AI agents, you never really allow yourself to forget about something. It’s there and you’re kind of babysitting it.
The other thing is that when you work in a system where you’re primarily delegating tasks, it forces you to be quite disciplined about how you delegate. What I see, in myself and in so many people, is that when you work locally you’re always tempted to under-specify, then chase it with another prompt and try to repair things midway. With an async system you’re just not going to do that. Sometimes it is possible, but it’s not part of the habit. So when you work asynchronously, you get used to the idea that you’ll write a very good specification, take your time to really work your way through it, maybe iterating on it with an agent locally, and then that’s it. If it worked, great. If it didn’t, your basic assumption is that you did something wrong in your specification. It’s a learning opportunity and you’ll do better next time. After a while you get really good at it, and that tends to result in work that is better planned and better specified. So even just as a forcing function, this is an advantage you get from working with async systems. And then of course there’s the orchestration. It’s nice to do things in the cloud and not have to run them on your laptop. Someone else, their SREs, is taking care of keeping the containers running. You don’t have to deal with that.
So nowadays I mostly just use a local agent. I more or less live in Codex. I really love their app, and I love working with GPT 5.5; I think it’s such an amazing model. So I’ll just work in a markdown document and iterate over it. I’ve used Repo Prompt, which by the way just became open source after the developer got acquired by OpenAI. That’s nice news if you were waiting to use this tool and were worried about price: now it’s free. It’s a nice tool because it’s designed for just that, exploring a code base and helping you develop a prompt. But I think today you can just work with whatever agent you’re using, Claude or Codex, and iterate in a markdown document or a collection of markdown documents until you’re happy with it and think it’s ready.
And in terms of keeping the agents on the straight and narrow, so to speak, when they’re working with one repo or multiple repos, are there tools or conventions you like to use to make that easier, like ways of writing documentation or AGENTS.md files?
Some mix of all of the above.
Yeah, so definitely AGENTS.md, having one for the repo that gives the agent enough hints. Skills are great, because they’re loaded on demand, so you can include quite a lot of material in them. But I think it’s also really important to pay attention to what you can do that is deterministic. Because the problem is you can’t control AI with AI, right? It’s this infinite regress where if you start going off track it will just get worse over time. Having static analysis, test suites, checker scripts, and things like that available for the agent to run in the repository is, I think, the biggest factor in the quality and efficiency of work you’re going to get from an agent. This is becoming really important because the models are great. They’re getting better and better; they can interpret so much and explore your repository. But without unambiguous, deterministic feedback they can still do weird stuff and make mistakes. And once they make a mistake, they’ll just make it worse trying to chase it with fixes.
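To make “unambiguous deterministic feedback” concrete, here is a minimal sketch of a checker script an agent could run in the repository. It assumes each check is an ordinary command whose exit code alone decides pass or fail; the check names and demo commands are placeholders, not tools Eleanor named.

```python
import subprocess
import sys
from dataclasses import dataclass


@dataclass
class CheckResult:
    name: str
    passed: bool
    output: str


def run_checks(checks: dict[str, list[str]]) -> list[CheckResult]:
    """Run each check as a subprocess; exit code 0 means the check passed."""
    results = []
    for name, cmd in checks.items():
        try:
            proc = subprocess.run(cmd, capture_output=True, text=True)
        except FileNotFoundError as exc:
            # A missing tool counts as a failure rather than being silently skipped.
            results.append(CheckResult(name, False, str(exc)))
            continue
        results.append(CheckResult(name, proc.returncode == 0, proc.stdout + proc.stderr))
    return results


def report(results: list[CheckResult]) -> str:
    """One PASS/FAIL line per check, followed by the output of each failure."""
    lines = [f"{'PASS' if r.passed else 'FAIL'} {r.name}" for r in results]
    for r in results:
        if not r.passed:
            lines.append(f"--- {r.name} ---\n{r.output.strip()}")
    return "\n".join(lines)


# Placeholder checks that run anywhere; swap in your own type checker, linter, and tests.
demo = {
    "lint": [sys.executable, "-c", "print('no issues')"],
    "tests": [sys.executable, "-c", "import sys; print('1 failed'); sys.exit(1)"],
}
print(report(run_checks(demo)))
```

The design choice worth keeping is that the verdict is a plain exit code the agent cannot reinterpret; the failure output is included only so it knows what to fix.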
Basically, I moved to a hundred percent test-driven. I used to be a bit lazy with testing back when I was writing the code myself, thinking, okay, for simple things I won’t bother, because it can be a lot of extra effort. Now everything is test-driven. I try to do everything red-green, forcing the agent into a workflow where it first works on the tests and actually observes them failing, and then does the implementation without touching the tests at all. It really helps, I find.
Yeah.
Bless you.
Plus yeah.
Yeah, I’ve seen some people say the model went off the rails and deleted all the tests, or did other things like that. I haven’t really seen much of that myself, but I guess anything’s possible.
Yeah, I think it used to be a problem, up until the previous generation of models. If you worked with Claude 4 or the early versions of GPT-5, they would definitely go off the rails. They’d do crazy things and say, “I disabled the tests to make things work.” But I don’t see this happening anymore. So if people played with AI agents for coding a little, maybe a year ago or even a bit later, and formed the opinion that these models are not to be trusted and aren’t that useful beyond a local edit: try again. It really improved a lot.
Do you find there’s still a group of engineers who think this stuff doesn’t work or isn’t useful? What has been your experience of how people are using it today? I talk to lots of people who are doing this, so sometimes I wonder whether I have a biased view, or whether there are still a lot of people out there who don’t believe this stuff works.
Yeah, I see everything. In my filter bubble, everyone is completely pilled and following every change. The people I work with see it more as another tool in their toolbox. And I also know people who still deny that this stuff is at all useful. I think there’s also a lot of almost covert use of AI. You talk to people and realize they’re using it, but they don’t think anyone else should. They’ll say, no, that’s unreliable, don’t let these other people use AI, they’ll make a mess of everything. But of course they’re using it themselves. What I especially see, and what I think we all need to mature on as an industry, is that a lot of people make superficial use of AI. They have an IDE like Cursor or Visual Studio Code with an AI plugin, or they’re using Claude Code or something like that, and it’s all very interactive, borderline vibe coding. It helps them a little, and sometimes it also creates a mess. But they underinvest in creating the environment: rules, skills, verification mechanisms. That’s unfortunate. I think that’s the most important thing for people to learn more about and get proficient in, because if you’re using the bare tool and vibe coding with it, it can be messy and also very inefficient.
kind of use it in this way, what athletes call deliberate practice. Don’t just use it and form an opinion, but try. And if something doesn’t work, assume that the problem is not that AI doesn’t work, but that you’re not using it the right way. Take a step back and figure out what you need to improve. Say I tried to do something and the model went off the rails. Okay, did I need more context? Was a rule missing? Was there a verification mechanism? What happens if I try the same thing, but now with a test? What happens if I try the same thing but prompt a bit better and specify more clearly what I wanted? If you do this enough, you develop good intuitions for what is going to work.
When you’re starting out with a new task, is there a progression you go through, like starting with a skill and then formalizing it more, working it into a workflow, or setting up an agent if you need to? Is there a standard way you approach that, or does it vary based on what you’re doing?
It really depends on the context. There are things I do for my own personal use and hobby projects, and there I have to say I really let go, which is quite new; that’s something of the last couple of months. I started using Hermes as my main driver, and it goes and does things on its own, and I trust it enough by now. For more serious work, I try to be a lot more deliberate and controlling, and to manage things as I would any software project, with versioning and reviewing everything. What I tend to do a lot these days is work with the agent and let it do a first version of something. For example, we’ll work on something and I’ll say, okay, this thing you did here is quite useful; can you make a skill of it so it can be reused? And it will create a skill completely unguided, just based on the interaction. Then I’ll take a look at it. Usually it’s about eighty percent there, because it does a good job, but it’s not polished, it’s not perfect, so I’ll iterate over it. Some things I’ll just tell the agent to modify, some things I’ll even edit myself, and I’ll start testing it and versioning it, so it becomes a software module that I track in Git. A lot of things start like this: they begin as an experiment and then mature into a software module, like a library.
Yeah, I rarely do. Sometimes I do write tests and evals for them. But in many cases that’s high overhead to do well, and there’s a problem with doing it badly, because then you’re fooling yourself that you have more real feedback than you actually do. Often I’ll just smoke test it. I’ll create a version and try it out, and usually, if it’s something I do often, I’ll have some exercises in mind of what’s worth checking. So that I do quite a lot.
I think it’s the opposite. You don’t have to use what you know anymore, because you don’t have to write the code, and increasingly you don’t even have to read it. In my own practice, I moved from being almost entirely Python, and I’m talking decades of a career as a Python-first developer, to realizing that, yes, I still like Python for some things, partly because it has such a great standard library and it’s so versatile. But for most projects it seems to be advantageous to work with a statically typed language. So for web and things like that I use TypeScript a lot. I really don’t like TypeScript. My eyes bleed every time I read it, and I don’t want to write any of it myself. I don’t like typing that much; I think it makes code hard to write and hard to read. But it’s really good for agents, and it also has a good ecosystem. I think Go is a really good choice, and it kind of made a comeback for me. I used it quite a lot in the past and then moved away, using Rust more when I needed close-to-the-metal engineering. But Go compiles so fast, it’s so easy to work and iterate in, and you still get the advantages of static typing, so I think it’s a great language for working with agents, especially if you’re building a small service that you just need to get done, test well, and then be done with. But the truth is anything works. As long as your language isn’t too esoteric, you’re fine; if it’s not very well known, you might need to give the agent some additional support for working with it, but most of the popular languages just work.
Yeah, I’ve seen a lot of folks moving towards Go on the backend side. I think part of it is having so much built into the standard library, so you don’t have to worry as much about the supply chain issues that come with Node toolchains and the like. I wonder whether that becomes a reason people go to it, just to minimize the attack surface as time goes on.
Well, I have no idea where things are going. Currently it occupies, it would be embarrassing to say how much of my time, partly because I work at a healthcare company where we are very sensitive to this and need to respond immediately to any vulnerability, and it’s becoming a serious concern. So I’m starting to ask, do we really need to import this library? It’s just a simple thing that we could program ourselves and not be exposed. And I think a lot of people are starting to think that way.
Yeah, maybe in the future, instead of distributing libraries, we’ll be distributing specifications: I need something that conforms to this PEP or RFC or whatever it is, your agent implements it, and then you don’t have to worry about someone poisoning the distribution. Of course, then you’ll have to worry about the specification. You’ll have to read it carefully to make sure it doesn’t say “and send all the private information back to this server.”
I don’t know. I find the term junior developer really awkward, actually. I don’t like it much, because I don’t think it’s about tenure. I’ve worked with people who are complete newcomers to the industry but are so motivated and so curious that they learn a lot very quickly. And I think it’s these people’s moment, right? Because you can learn so much, and using AI you can do things that in the past maybe took three to five years to get into; now maybe you can do them in a few months. And then of course there are people who expect to be told what to do and then go and write code to spec. The truth is, I don’t think this is going to be very interesting for those people, because the AI actually does a better job now. It’s not that AI can replace them; it’s that we should replace them with AI, because the code is better. So I think it’s less about tenure and more about whether you’re approaching it from a place of curiosity and thinking about the bigger picture, or just trying to accomplish a very specific technical task. In programming, and increasingly in other parts of the work, if you’re just doing work to spec, then sorry, AI does that better.
Yeah, I fully agree. Tenure is generally not a good proxy for skill in software engineering. Yeah.
So do you think the skill set going forward is really about being able to explain what we’re doing, more than writing the actual code? Or are there still places where classical software engineering experience comes in handy?
I think the band of things where you still want someone to handcraft the code or know the specific details is definitely narrowing. It will always exist; it’s just getting smaller and smaller. I mean, just the other day Anthropic published a blog post saying that AI doing the actual ML research is around the corner, and OpenAI are now saying the same. That was always the premier example of something where we’d still need expert engineers. So who knows? Is it a year? Two years? Five years? At some point even that is gone. What’s really important now is systems thinking: thinking about the bigger picture, how it all fits together, how it fulfills the business needs and what we’re trying to achieve. People who are good at that, at taking the meeting they just had with the designer or the customer and translating it into a specification an agent can carry out, have an amazing skill that’s not going away that soon. I’m sure at some point you’ll be able to just feed the meeting transcript to an AI, but we’re not there yet. And this is maybe where a little more experience helps people: just knowing what’s possible. I think that’s what gives me an advantage. I’m no better than anyone else at getting AI to write the code for me, because it’s the same AI. But I often know what to ask for, because I know it’s possible. When I work with people who are less experienced, sometimes they just don’t know that something can be done, or that there are different variants you could ask for: do it in this language or that language, use this library or that technique. That really helps a lot. So I think people who at the very least know how to search for these things and do a little research can get better and more results.
I don’t know, that’s really interesting to think about. In some respects, I’m pretty sure that being aware of the legacy, everything that’s out there, is going to become less and less relevant for people, because you can index these things. Even today we have the beginnings of this when you look at systems like Lovable or what Vercel is doing, where, at least for the very simple stuff, you don’t actually need to know, because the best practices are encapsulated in some product. It’s very simple stuff, but there’ll probably be more and more of that. I don’t think that today, if I need to build a complex cloud-based distributed system, there’s anything I can go to that already encapsulates all the best practices. There I’m still relying on my own experience, but I’m sure someone will do it sooner or later, and probably sooner rather than later. Then there’s the fact that creativity, being inventive and coming up with new ideas, is often based on knowing what’s already there, right? Knowing the rules in order to break them or invent new ones. I don’t know how long it’s going to take until AI does this better than us. But for now I think there’s an advantage in having this wide palette of different things you can draw on to do something new.
Yeah, I sometimes wonder if the place where human knowledge becomes strongest is where there’s inherently not much data you can train an AI on, because it’s new or you just don’t have it. There are some things, right, like should I start this business or that one? You might know, based on something you’ve seen or think, but that information just isn’t out there. It’s like traditional machine learning: if you don’t have a lot of data going into it, you probably can’t make a great model.
True. At the same time, I think it’s important to recognize that the AI we have now is generative and can come up with novel solutions and novel data. I’m trying to be very careful, both with myself and when I talk to other people, not to broadcast this kind of cope that there’s something unique we humans do that will never be replaced. Because I don’t know if that’s true. I suspect it isn’t, actually, but we don’t know yet.
I mean anywhere I can’t verify the output. I don’t trust it fully, and you shouldn’t, right? That’s true for humans too. If I delegate a task to a person and have no way of verifying it, I’ll probably go over it myself. The same with AI. So that applies to tasks where you need taste, where you can’t run a script and verify something. By now I have a lot of tasks where AI is writing things, not novels or poetry, but documentation and the like. That’s great; it’s very helpful. My projects have never had so much great, up-to-date documentation as in the last few months, because I just get the AI to work in the background and improve the documentation. But I still find that I want to take a look and sometimes retouch it. Code, on the other hand, is the most verifiable thing available, so as soon as I can put something in place, like test suites, static analysis, all kinds of rules to verify it, I’m okay to let go and let the AI do its thing unattended.
No, not really. It’s kind of funny, because I did it most of my life; that’s what I did. I think some people report that they enjoy the activity itself, I guess the way some people like knitting: there’s something soothing about it, you get in the zone. It was never like that for me. It was always a means to an end. So I’m actually quite relieved that I don’t have to do it. I enjoy building things. I enjoy the process of getting from “I have an idea” to “here’s how I’m going to do it” to “cool, it’s done, I’m so happy, let’s move on to the next thing.” And if that path no longer involves me typing character by character into a text editor, I’m fine with that.
Well, the question is: where’s the baseline? What am I comparing to? Over the previous decade, well, I had a short stint in consulting, which is also very tiring in a way, because you have to talk to people a lot. Before that I was a manager for about a decade. Managing people is definitely more tiring than managing AI agents, ’cause people have motivations and all kinds of complex thoughts and feelings and ideas, and you actually have to be nice to them, otherwise they really don’t like you. It’s very tiring. Agents are a piece of cake in comparison.
Then, compared to working by editing file by file, I guess it can be tiring, because you need to keep a lot of state in your head, which is another reason why
I prefer to create a system where I can fully delegate something and forget about it. ’Cause if I’m running it locally in, I don’t know, my Codex or something like that, it doesn’t really leave my mind. It’s still there, and that can be a bit exhausting. The other thing is that I’m very curious. I want to learn everything new, and it’s just been crazy, right? So I spend a lot of time reading about new things and trying new tools and new techniques. So yeah, that’s a lot of work, but I’m really enjoying it, so I can’t complain.
Yeah, I just sort of have this multi-level filtering. I’m trying to be quite good at this, for example not pulling on every thread, because that’s not sustainable. So I’ll read X or some blogs and look at things, and for about ninety percent of it I’ll think, okay, I can’t evaluate this fully, but some heuristic in my mind that I can’t completely explain tells me, don’t worry about it, move on. Then there’ll be a few things where I think, yeah, I should probably look into this, and I’ll save them for later. I have a collection of things, and once or twice a day I go and review that list. Again, probably ninety percent of that will drop after a short review. After that I’ll still have a dozen or so things to actually look into.
And now I also have AI doing summaries: my Hermes agent works in the background and summarizes everything. Often I’ll first read the summary and then decide whether to invest more time. And if I think something is really interesting, I’ll give it a try myself. I think it’s very hard to learn about these things without gaining direct experience. So if it’s a new technique someone is proposing, I’ll think, okay, what’s the most minimal exercise I can do right now to try this technique and see if it works for me? Or if it’s a new tool, I’ll download it or go to the website or whatever and give it a try for ten or fifteen minutes, just to get an idea of what I’m dealing with.
The main thing I’m getting out of it is working with students, because they ask questions. I often have really terrible blind spots around what’s actually a fundamental question. Something seems obvious to me, and then we’ll do office hours, or someone will respond to the blog, or, since we have a Discord server, people will ask questions there, and I’ll be like,
wow, yeah, that’s actually a really interesting question that I haven’t properly thought about. Let’s work through it together, because now I owe these people an answer. To me, that’s the most valuable thing about teaching. Apart from the fact that it’s fun and fulfilling, the questions I get are gold.
I think there’s a really interesting set of questions here. More and more I’ve been talking to people who are not doing just software development but things that are adjacent, like data analysis, or all kinds of planning and specification work that stands on its own: not just doing the next ticket, but planning a larger project. That’s something I hadn’t thought about enough until it came up, and it really caused me to think, to try it myself, and to see what ways there are to do it. It’s actually still quite challenging, precisely because we don’t have good verification mechanisms like we do for code. It’s very hard to say, okay, my agent wrote or fleshed out a specification; how do I know the specification is any good? There isn’t really a good understanding of how to do that. So I think it’s a very interesting problem that’s still unsolved, and I’m interested in it right now.
Yeah, certainly, because with data science code, a lot of times you don’t know the answer going in. So the code may run and you get something out, but that doesn’t mean it answered the question you wanted it to answer. That’s a problem in building software too, but sometimes one number versus another number can be a little more subtle than whether a feature is implemented or not.
Yes, for sure. But it’s different in code, because in a way, if it compiles, it works. Maybe the logic is all wrong and it’s not doing what you wanted, but you can get pretty close to verification, right? You could have, I don’t know, there used to be things like UML diagrams, or formal specification languages. But what if you’re describing a product? We don’t know how to verify that. That’s a really, really interesting problem.
Yeah, it’s the most important thing, because with the models you have today, and even previous generations, you can work quite efficiently, and more cheaply with models that are not as strong, as long as you have verification. Then you just need to box them in and let them bounce against the walls, and eventually they’ll get it right. If you don’t have verification
that is unambiguous and deterministic, you can’t do that, and who knows where the agent is going to run off to and what weird ideas it’s going to come up with in one of its reasoning traces. So yeah, everything we can verify will get a lot better, and I guess that means a lot of the work now should be around inventing and figuring out new ways to verify.
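To make the “box them in and let them bounce against the walls” idea concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not Eleanor’s actual setup: a deterministic verifier (a fixed list of test cases) either accepts a candidate or returns failure messages that are fed back into the next attempt. The names `verify`, `run_until_verified`, and `make_stub_agent` are hypothetical, and the stub “agent” just returns canned candidates in order instead of calling a model.

```python
from typing import Callable, List, Optional, Tuple

# In this toy sketch a "candidate implementation" is just a function int -> int.
Candidate = Callable[[int], int]
Cases = List[Tuple[int, int]]


def verify(candidate: Candidate, cases: Cases) -> List[str]:
    """Deterministic check: run every test case, return failure messages (empty list = pass)."""
    failures: List[str] = []
    for arg, expected in cases:
        try:
            got = candidate(arg)
        except Exception as exc:  # a crash counts as a failure, never a pass
            failures.append(f"f({arg}) raised {exc!r}")
            continue
        if got != expected:
            failures.append(f"f({arg}) = {got}, expected {expected}")
    return failures


def run_until_verified(
    propose: Callable[[List[str]], Candidate],
    cases: Cases,
    max_attempts: int = 5,
) -> Optional[Tuple[int, Candidate]]:
    """Ask for candidates until one passes the verifier or the attempt budget runs out."""
    feedback: List[str] = []
    for attempt in range(1, max_attempts + 1):
        candidate = propose(feedback)        # the "agent" sees the previous round's failures
        feedback = verify(candidate, cases)  # the "wall" it bounces against
        if not feedback:
            return attempt, candidate
    return None  # budget exhausted: hand back to a human rather than trust unverified output


def make_stub_agent(candidates: List[Candidate]) -> Callable[[List[str]], Candidate]:
    """Hypothetical stand-in for a model: ignores feedback, returns canned candidates in order.

    Supply at least as many candidates as attempts you allow, or next() raises StopIteration.
    """
    remaining = iter(candidates)

    def propose(feedback: List[str]) -> Candidate:
        # A real agent would read `feedback`; this stub simply moves on to its next guess.
        return next(remaining)

    return propose


if __name__ == "__main__":
    square_cases = [(0, 0), (2, 4), (3, 9)]
    agent = make_stub_agent([lambda x: x * 2, lambda x: x // 0, lambda x: x * x])
    result = run_until_verified(agent, square_cases)
    if result is not None:
        attempt, _ = result
        print(f"passed on attempt {attempt}")  # prints "passed on attempt 3"
```

The verifier contains no model at all, which is the point made above: a weaker or cheaper generator can be retried against it as often as the budget allows, and exhausting the budget returns `None` instead of an unverified answer.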
Yeah, I mean, it depends what you call evals, but there’s something a bit subjective about evals, so they’re already kind of a secondary tool. Sometimes they’re the only thing you can use. But there’s also formal verification. There’s probably a lot that can be formally verified that we haven’t tried yet, because it only really belonged, quite narrowly, in mathematics and, in general, the exact sciences. But there’s probably more we can do. I think we will need a lot more formal, unambiguous ways to verify things.
Yeah, I know formal verification has been used for parts of S3 and parts of some database systems, but those are very narrowly defined, high-impact cases, and outside of them not as much effort has been put in on the software side.
And in terms of team size, do you think that with agent development it’s better to have a smaller team these days, that the ideal team size has decreased? Or does it stay the same and just change the scope of what people can work on?
Yeah, I have opinions that I think many people consider radical. I don’t, because they’re my opinions; I think they’re very straightforward. A lot of the work I did earlier in my career was around open source communities. I worked as part of the Ubuntu community for a while, and others, and I’ve seen how well huge, amorphous teams can work.
And I don’t think that’s changed. In fact, I think AI enables more of it, because you can automate things and help people achieve more consistent results. I hear a lot of people talking about the need to work in smaller teams. I don’t know. I think you can probably work in teams of five hundred people now, because AI will help you communicate and help make the work more consistent.
You see that in open source projects; you just don’t see it in the kind of teams that huddle together. For designing stuff, for thinking up new ideas, yeah, a handful of people is probably enough, and beyond that it just gets inefficient. But for actual software development, I don’t think there’s a limit. It’s just that I don’t know if you need that many people anymore, because increasingly every person can delegate to so many agents.
times where they’re getting thousands of pull requests per week, and so they went all the way with it, and it seems to be working fine. It’s a very healthy project, moving really fast. There’s probably more, I don’t know. I think the lesson I’m learning from this is that you can’t go halfway.
When people try to force AI into all the existing processes, it becomes very frustrating, right? For example, if you have an open source project where you open GitHub every day and review every pull request one by one, reading through the entire diff, it’s going to be very frustrating. I know it from my own open source projects: all of a sudden you start getting such a high volume of pull requests. Some of them are low quality; some of them are maybe good, but just not something that was on the roadmap, and I don’t want to integrate them. So if you try to do it like that, it’s going to be frustrating. If you accept that things have changed now, and that our role as developers, as engineers, is creating a system that can manage this rather than managing it ourselves, then it can actually be much more efficient and much more effective, and get better results.
Well, I don’t know. I’m definitely not the only one, but I think it’s still the case that way too many people don’t take agents seriously and don’t see what’s possible, where we are, and where things are going.
I really think that software engineering, software development for people, has changed into a sub-branch of systems engineering, where you build systems that allow agents to do this well.
I think anyone who doesn’t see that, and isn’t investing now in learning about this, practicing it, figuring out new ways to do it, is going to be quite shocked, not in a matter of years but in a matter of months, if not weeks, ’cause things are moving really fast. So I think that’s really important. To some people I tell this to, it sounds pretty trivial, ’cause they’ve already seen it. Many people feel that this is a very radical way of seeing things. I don’t think it is. If you don’t believe me, I think you should find out for yourself by actually putting on these glasses and looking at the world from this perspective for a little bit.

