Episode 005 · July 2, 2026 · 01:07:43

When Software Gets Cheap, Focus Gets Expensive

Dan Gerlanc
Podcast Host
Greg Ceccarelli
Co-Founder & CPO, SpecStory

Dan and Greg discuss the evolution of agentic engineering and the biggest challenges teams face as they become agent-native. Learn about Greg’s new book, “25 Patterns in Agentic Engineering,” Greg’s preference for local agents, and why writing copy and judging product quality remain difficult to delegate.


Dan and Greg open with how agentic development has changed since the early days of Copilot. At the time, Greg was at GitHub, and he saw AI mostly help with boilerplate and editor completions. Cursor-style agents were the next widely-used advancement bringing session history and integrated team-wide practices. By June 2026, capable models and harnesses are common inside engineering teams, so the gap between teams increasingly comes from context engineering, repository structure, and whether old team shapes still align with the new ways of building software.

For small teams and startups, the leverage of AI is a double-edged sword. Greg describes how SpecStory’s original extensions required real sweat equity to reverse engineer chat-log formats across Cursor, Copilot, Claude Code, Amp, and other tools. Much of that surface can now be maintained by a fraction of one person’s time. The danger is that easy MVPs can trick founders into believing they have validated a market. When the marginal cost of software falls, founders have to spend more of their scarce attention on demand, willingness to pay, distribution, and the routes to customers.

The conversation turns to Greg’s book, 25 Patterns in Agentic Engineering. He explains how he mined roughly 1,300 preserved SpecStory sessions and nearly 5,000 commits to extract durable patterns from his own agentic practice. Two patterns stand out. First, when code becomes free, verification becomes the bottleneck. Second, between agent turns, docs are the persistent API of the system. For Greg, as-built architecture documents are practical maps that let both humans and agents recover the shape of a subsystem without re-reading the entire codebase every time.

Greg’s development practice has changed accordingly. He favors trunk-based development and says his team uses almost no pull requests for everyday development, partly because agent-generated diffs arrive at a volume he does not want to review line by line. He prefers local agents over cloud agents that containerize the repo and open PRs later, because steering an agent while it runs keeps his mental model intact. Long unattended runs still make sense to him, but only when they start from a clear goal and a more detailed rider, with phased commits and verification points he can inspect after a walk or a night away.

Dan and Greg also dig into coordination at larger scale. Greg is skeptical that issue trackers were ever clean or current enough to describe day-to-day engineering, but he sees issues becoming useful as specs with provenance and evidence that can be handed to agents. Personally, he runs several projects at once, usually three to five, with local agents in permissive modes, and rotates attention while long runs execute. That power is not free. He describes the dopamine loop of watching ideas come to life, the temptation to keep agents busy overnight, and the scarcity mindset created by subsidized access to frontier models.

The episode closes with where Greg still does not trust the tools. Copywriting and visual design still require heavy human intervention because the models can blur rather than sharpen the message. He frames taste less as a mystical trait and more as a selection among trade-offs and the ability to connect ideas in understandable ways. Coding has benefited from benchmarks and verifiable answers; much of the rest of the world is less tractable because there is no single ground truth for what “good” means.

The most experienced engineers realize that code is in fact a liability. It's going to decay, and you want to write the least amount possible.

Verification is the job.

Writing is one of the highest leverage things that you can do.

Being able to steer agents as they are actually running is one of the most effective ways to keep your mental model intact.

Most people are more enamored with the sophistication of their setup versus building something that's useful or people want.

What constitutes good at any given time is not something that's very easy to actually verify.

Hardcore Agentic Engineering for builders who ship
Greg's upcoming agentic engineering course
SpecStory
Greg's company preserving AI coding context and session history
Stoa
SpecStory build room for collaborative agentic engineering sessions
25 Patterns in Agentic Engineering
Greg's book mined from agent sessions and Git history
AI Essentials for Tech Executives
O'Reilly book Greg co-authored for technology executives
Meditations on Tech
Greg's weekly blog mixing philosophy and technology discourse
Beyond Code-Centric
Greg's white paper on specification clarity for agentic teams
Goal Engineering
Greg's practice of pairing concise goals with detailed riders
WebRTC
Browser real-time communication used in Stoa's sandbox subsystem
CRDT
Data structure approach for Stoa's shared local file versions
Trunk-based development
Workflow Greg favors over PR-heavy branch processes
Steve Yegge's Gas Town
Software-factory essay Dan asks Greg to compare against
Dead Reckon
Greg's harness for running and verifying unattended agent runs
Devin
Cloud coding agent Greg cites as a shared Slack-style agent
DORA
Engineering delivery metrics discussed through GitPrime's category
Jellyfish
Engineering intelligence tool in the GitPrime-style category
LinearB
Engineering measurement product Greg names with Jellyfish and DX
DX
Developer intelligence company cited in software measurement discussion
GitHub Issues
Issue tracker Greg tried through GitHub CLI automation
Bear
Greg's personal note-taking app on Mac
DeepSeek
Cost-efficient model family Greg mentions for coding tasks
Qwen
Open model family Greg cites as cost-efficient competition
Deepgram
Voice AI provider Greg mentions as task-specific model choice
Nano Banana
Gemini image model example for task-specific model choice
Yann LeCun
AI researcher Greg mentions regarding world-model approaches
Transcript

I’m Dan Gerlanc and welcome to Agents and Engineers, the podcast about agentic AI and software development.

Today we’re joined by Greg Ceccarelli, the co-founder and CPO of SpecStory. Previously, he was Chief Product Officer at Pluralsight and Director of Data Science at GitHub. He also worked at the League, Dropbox, and Google before that.

He is the author of AI Essentials for Tech Executives and most recently 25 Patterns in Agentic Engineering. Greg, great to have you today.

Thanks for having me, Dan.
So what do you think are the hardest problems facing teams that are trying to use agentic development or LLMs today?

That’s a great question. I’ll start, because it’s like multifaceted, I’ll start a little bit earlier than where we are. And it will be a lead up to, I guess, where we are. And I say this coming from experience, having worked at GitHub, being there whenever Copilot was first being developed, having a hand in that, and seeing what the reception was to all of a sudden you have AI augmentation to…

chat completions or text completions in your editor, right? I think initially, you know, the adoption of that was very enthusiastic, but wasn’t really doing net new real work. Like a lot of the use cases were around boilerplate type completions. And that sort of made, you know, adoption spotty, right? You have people that have extreme amounts of their identity and craft tied into the

role and they might be the source of authority on teams. And all of a sudden you have people being able to kind of tab, tab, tab their way to something that was going to get reviewed in a PR. And you sort of fast forward a little bit and I’m trying to think back to probably when this happened, you know, 2025 kind of October-ish, you had the sort of introduction of, you know,

Claude 3.5 Sonnet at the time, which was sort of a step leap over prior generations of models. And that’s where you first got agents, right? You got agents within Cursor, Cursor Composer. And that’s where the current crop of problems probably first originated. And I think when something kind of like this, I think the first challenge was how do I actually drive this model,

this agent, what is an effective way to prompt? And I say this coming from experience based on the company that I co-founded and sort of the initial problem we set out to solve, which was to make your agent session log durable so that you could actually share learnings. And we saw this with a lot of very early users where people just really didn’t know the threshold and the ceiling.

in terms of how to give work to models and have it return useful output. And this is before outcomes where it would just one shot it. So the first problem, which I still think is a problem, is shared learnings and best practices in terms of how to even drive the model or harness effectively. I’d say now, fast forward again, it’s June, 2026.

We have much, much more capable models, much, much more capable harnesses. And I think the sort of penetration and adoption in engineering teams, you know, is probably at least greater than 80%, maybe 90%. Like if you’re working at any SMB or larger enterprise, you clearly have access to one or more of the ChatGPTs or…

Claude, Anthropic’s models, or some other set. And now I think because more and more people are becoming agentically-pilled, it’s moved from how do I prompt better, people all understand context engineering. I think now the dynamics that separate people who are perceived as 100x engineers versus 10x versus one are probably starting to compound of

a different set of problems because people have had enough time and reps to be able to actually kind of learn, you know, this is how I need to organize my repository. This is where I’m going to put the context I’m going to reference in my harness. And I think now the question is like, do we actually need this number of people working on this set of functionality or this set of problems relative to, you know, what we had in the past because

the amount of work that you can get done if you know the direction you’re heading is rapidly progressing. And so I think now it’s a question of like, well, what do I do if before I was on a team of eight, like the canonical one PM, one designer, six engineers. And now really this product only needs maybe a half of a designer, a PM and two engineers.

And so I think the problem is like, so what do I do? And I’m talking about existing companies. I’ll take a breath though, because you might have some questions.

Yeah.

Yeah, so, right, ‘cause I think if you’re in an early stage company, you’re starting from scratch, so then you might be trying different things. But yeah, within an existing company, where do you go? Is it you’re just maintaining the existing product or are you adding new features to it? How have you seen that

with your work? Because, I mean, you guys started with SpecStory, which you’re now maintaining, but have also built out a new product, Stoa. How have you seen that even within your own work, for what you’re maintaining versus starting new? Have you experienced that team size kind of dynamic?

Yeah, to a certain extent. I mean, we don’t have that many people that work with us as a small team. I would say the question or the problem or the challenge now with agentic engineering, with agents, you know, sort of being able to fill in where you might have needed many more hands on keyboard is sort of different for a very small startup. I think in my experience where it enables you to, of course, like

have a lot of potential ideas get to rough MVP form way faster, that can be dangerous in its own right because you might end up convincing yourself that a potential direction you might want to build towards is something that’s viable when it’s not because you actually haven’t done the right market research and talked to enough customers or potential customers to sort of validate willingness to pay or the true feasibility and usability of what

you think they want versus what they want. But I definitely have seen, even within our own small arc, the ability for, you know, individuals to take on more surface area that’s accumulated, which is the flip side, which is good, right? We used to have a few people sort of working on the early SpecStory extensions because the whole work there had been sort of reverse engineering all the different formats between Cursor, Copilot, Claude Code,

you know, Amp, like Gemini, like all the different ways that chat logs were stored. That was before the models became much more capable and a lot of the tool uses themselves evolved. That was a lot of actual like sweat equity we put in to like getting that right. But now like, you know, that whole surface area, while it was built by a few people, can be managed like, you know, in half a person’s time. So it has certainly created leverage, but also in some cases, distraction.

Because it is easy to just, you know, find something else to do at any given point.

What is the answer to that? I mean, I guess it might be obvious, just try to stay focused, but I guess when it becomes so much easier to build software, how should we be thinking differently about what we’re doing every day?

That is the existential question of the day, right? I sort of, not claiming credit, but I foresaw this. There’s a video of me whenever I was the CPO of Pluralsight talking about the marginal cost of software coming down very rapidly. And I think that’s net good, not necessarily for emerging,

you know, startups or SaaS businesses that don’t want to just be a thin wrapper on top of models. But I think that, you know, there’s so many problems to solve out there specifically in niches, vertical domains that can really benefit from, you know, more custom application. I think the hard part is one, first understanding where those opportunities lie and then validating that, you know, you have

you know, not just the willingness to pay, the demand, the sort of preconditions, the product market fit. And doing that work, I think, becomes not only harder, but sort of in some ways, less enjoyable. Because if you think about starting a company before, you know, like, let’s just say, 2024, as a cutoff point, both things were really hard. So you didn’t think about

the sort of distribution side as much because building the product itself was really hard and there wasn’t so much product being built. So I think that’s, you know, I don’t have the answer to what do you do, but I think the focus should be on, you know, figuring out how to reach the people that might pay for the thing that you’re building unless you’re just doing it out of the generosity of your own heart or you’re…

independently wealthy or something, right? And just maniacally focusing on that and figuring out how to maybe engineer the rest of the parts of the business, whether it be, you know, how you create top of funnel marketing to, you know, how you find people to testing creative angles for social content, like, right, the whole gamut that probably needs to be custom to you.

So it seems like if personally you’re interested in doing the whole stack of things now, there’s opportunity to do it on your own, or there’s more opportunities in early stage companies, though. If you just want to be out writing code, like is that

going to be a job that still exists at a big company, or do you think that’s just gonna be less and less of what people are doing, except in very narrow areas that LLMs aren’t working well on for whatever reason.

Yeah. And I assume when you say writing code, you mean like handwriting code or using maybe assisted generation, not like full agentic, you know, development. Yeah. I mean, I have to think, I don’t think, I mean, part of the problem is, I think both of us exist in a bit of a bubble, right? A little bit on the bleeding edge of what exists in the real world, right? Cause I can walk outside and I’m sure you can and, you know, walk down the street and no one’s heard of Claude or Anthropic before they’ve heard of ChatGPT.
Yeah.

You know, just given where I live, I’m in the DC area. I know a number of sort of people in tech slash software that work in regulated industries for defense contractors, that sort of thing. I know from talking to them, they’re not yet using very much AI assisted programming. Some of them are working on things like chips, for example. So I think there’s always going to be, you know, some pocket

corners of industry that are slower on the uptake. But eventually when, you know, private models are more permeated, it’s not just the frontier, open source is more widely adopted and sort of customized for, you know, that specific industry’s need. Like even the people that right now, I won’t say are holdouts, but maybe are just limited by, you know, organizational procurement rules or data privacy concerns or, you know, security clearance will eventually

not probably be writing most of the code that they are responsible for. I don’t personally think that that’s a bad thing. I know that there’s an arc of conversation on the internet right now about like, you know, the death of craftsmanship, you know, people who had their identity tied to their ability to master obscure languages, really feeling, you know, sort of distraught that that sort of skill feels like it’s, you know, being obviated.

And I can totally like relate to that, but I think at least in my mind, sort of being focused more on the product and business side of things, like, yes, your job was maybe to write code, but to write code for an outcome, to serve a need, right? Like software is normally not generated to, just, be this magical, you know,

perfect apparatus expression of logic, it needs to make contact with the real world. So I think more experienced engineers understand that, right? They’re trying to write code to solve a problem. And the most experienced engineers realize that code is in fact a liability, right? And it’s going to decay. And it’s like you want to write the least amount possible. LLMs are not that great at writing

non-verbose code at the moment, but you know, you get the drift.

Yeah, for sure. I think viewing code from the asset liability framework is always an important way to think about it. Is having this code worth more to you than the cost of writing it and maintaining it?

So you recently released a book, 25 Patterns in Agentic Engineering. How did you get to writing this, and what are the patterns that, if you had to pick two or three to say, here’s where you start, where would you start, of the 25?

Yeah. So first I’ll answer the question about like, how did I get around to writing this thing? And then we can get into the other questions. I like to write just in general, I have a weekly habit of writing this blog that I’ve done for a long time called Meditations on Tech, which is sort of like philosophy meets the present days, like confluence of Hacker News discourse, what I’m thinking about, what I’m feeling about.

And I’ve written some other short books, like this is a short book. And one of the things that I wanted to reflect on after spending six to seven months deep in the hole of bringing our new product to life was, which by the way was built completely agentically. So a mix between.

Opus 4.6, 4.7, 4.8, as well as the Codex models on the other side of the fence. I wanted to have a way of actually sort of memorializing a number of the heuristics that through so many reps and sessions, I felt would be maybe beneficial to other people. And so the way I actually brought the book to life was

So at SpecStory, our extensions allow you to basically preserve all of your chat sessions as Markdown files at the most basic level. When Workflows released for Opus 4.8, which was very recently, I was like, huh, I wonder if I can mine. Yeah, it was like two, it was literally two weeks ago. I was like, I wonder if I can mine, extract and understand.

Yeah, like two weeks ago, right?

like through pattern analysis and obviously augmented by LLMs, you know, like what I actually did that could be sort of durable patterns, right? Because I just had so much exhaust. And so that’s literally what I did. The day it came out, the workflows sort of command, I pointed it at 1300 odd sessions and my entire Git log of…

It’s like going on almost 5,000 commits at this point and had it run for like, I think it ran for like 12 hours or something. And it gave me a huge candidate list of patterns that then I reviewed, curated, copy edited down to make them make sense, that sort of thing. And that’s how I brought it to life. It would never have been possible to make.

Nice.

something like that through probably self introspection and like metacognition. Like I just wouldn’t have had the head space to do it. But it was very useful in this case, because if you think about a long enough timeframe, like temporally, like you end up sort of subconsciously repeating a lot of behaviors. like, you know, the book starts, it has like six parts and there’s, you know, 25 odd patterns, right?

But the thing that I know I did, it was very obvious whenever I saw this sort of evidence, is verification is the job. Most of what my loop looks like, and I’m not talking about slash loops, I’m talking about what the human loop looks like, is either I’ll design a design file that I use instead of any of the embedded plan functionality.

It might be based off that, but I like to actually preserve it because again, it helps with recall. It helps with being able to remember what you did, explain it to others. I designed one of those for a piece of functionality. Like for example, one of the things in Stoa is a fully self-contained sandbox agent environment that allows for collaborative chat in a way that like most products don’t. That’s like a full subsystem of Stoa, which combines WebRTC,

like CRDT based, like local versioning of files that can also be shared on the web, right? And as you’re building something like this sandboxed agent container, you know, you’re constantly, you know, making changes with an agent on the code base, looking at the logs, trying stuff out sort of in the runtime, you know, understanding if it actually now does what you thought it did, if it matches your intent.

and then, you know, correcting any errors, you know, through that process. And I like, I mean, the number of times I did that for any part of the product is like innumerable. Almost it’s like all I did. So I think that’s like, just as a pattern, like realizing that interactive sort of verification of the thing that you can, you know, test against, I don’t mean necessarily running tests, although there are plenty of those,

is why it’s number one, because that’s the most consistent thing. I think another thing that I firmly believe in, and this is one of the reasons I just write generally, is that I think writing is one of the highest leverage things that you can do. And there’s a section and a pattern about how docs are the APIs between turns. And I think about that a lot.

One of the things that I’m sort of like most, I don’t know if the right word is proud of, excited about, think is the most useful thing, is what we call these as-built architecture documents. So the way to think about it is like, instead of having to ask the agent to go and fan out a bunch of sub agents and read all your code over and over again, as the system grows in complexity, you can maintain

an as-built architecture document for every subsystem of the system that is like a map of the territory that you can keep fresh so that instead of always having to expend a ton of tokens whenever you start a new session, you can first point it at that thing so that it can understand, okay, you know, what’s the project structure of this? Like, what are the routes? What is the data model? Right. And, you know, get up to speed without having to explain a lot to the model.

That used to matter a lot more, I think, before the introduction of, you know, sub-agents and agent teams and workflows, right? Because you just didn’t have the ability to, you know, point the main session orchestrator at everything. But I think it is just generally a good practice because it’s something that of all the documentation I produce, I actually do think about those documents a lot because it helps me keep my mental model of,

I want to change this part of the client UI. What do I have to remember in terms of the vocabulary and the way that this is organized in order to effectively write a prompt or a series of prompts or write a document to give to an agent to work? I’ll stop there right now, but those two things, I think, verification is the job and keeping your mental model up to date, are some of the most important patterns that are…

also sort of part of the mindset of doing agentic engineering well.

And how do you set up your docs? Do you start out writing them manually or iterate on it with an LLM, a mix of both?

Yeah, I almost always start iterating on them with an LLM. Yeah, like almost always. So, the start might be from a variety of different angles. Like I might do a bunch of like web research through one of the harnesses, consolidate that, review that, think about it, and then sort of plan from some number of angles, you know, that emerge through that. But

Whenever I’m thinking about building something, I’m very rarely handwriting the doc. Very, very rarely. It’s always like a conversation. It’s a conversation that produces the doc that I then review and either discard or say, okay, we’re going to go in this direction.

And will you have them updated automatically through something like CI hooks, or when an agent session finishes, kick off a check, or do you just run it manually as you’re going through?

Yeah, so I’ve built a number of skills that, for example, can keep the as-built architecture docs up to date. Those are the primary ones that I care about keeping up to date because a lot of the other things are more like historical artifacts. If you think of either a design plan that got implemented and then an implementation doc, unless you’re going to go and modify that feature set, it’s not changing as much as

you know, the total architecture of the system for like that one specific thing. But all of those docs, in this, your question was about like automatic, freshness, I guess all of those docs though, follow a very standard format, like, and they’re all versioned obviously in like Git and GitHub in our repo and a docs folder. So there’s like a design, there’s a docs folder, like there’s, you know, the as-built architecture docs, some of which are actually in.

You know, we have a monorepo for this project, are actually in the sort of different, like subsystem folders, but then there’s implementation docs and then there’s like all of our raw, like SpecStory histories. Like there’s at this point, like thousands of docs, but it’s pretty scalable. Like it hasn’t slowed us down. Like it’s always needed a reference. And I think

A little hack is like, you know, committing those things, those docs with verbose commit messages, in the sort of batch or sequence of commits that you’re making as part of some phased implementation plan, because LLMs are really good at grepping Git logs and then finding like the documentation that will complement, you know, a verbose commit message. So you can remember

five months from now, like what you did and what the trade-offs were, that sort of thing. Yeah.

Yeah, because of the agent git grepping, I’ve switched to semantic commit logs to make it easier to find all of those, or just search for docs or bug fixes, things like that.
Mm-hmm.
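
As a rough illustration of the semantic commit logs Dan mentions, the sketch below groups commit subjects by a leading type such as `docs` or `fix`. It assumes the Conventional Commits style of `type(scope): summary`, which is a guess at what "semantic" means here, and the sample subjects and the `group_by_type` helper are hypothetical rather than anything from Dan's or Greg's repositories.

```python
import re
from collections import defaultdict

# Hypothetical commit subjects. The "type(scope): summary" shape follows the
# Conventional Commits style, which is an assumption about the format in use.
SUBJECTS = [
    "docs(sandbox): add as-built architecture doc",
    "fix(client): handle empty session list",
    "feat(sandbox): phase 2 of shared file versions",
    "update readme",
]

# type, optional (scope), optional "!" for breaking changes, then ": summary"
PATTERN = re.compile(
    r"^(?P<type>[a-z]+)(?:\((?P<scope>[^)]+)\))?!?: (?P<summary>.+)$"
)


def group_by_type(subjects):
    """Group commit subjects by semantic type; unparseable ones go to 'other'."""
    groups = defaultdict(list)
    for subject in subjects:
        match = PATTERN.match(subject)
        key = match.group("type") if match else "other"
        groups[key].append(subject)
    return dict(groups)


if __name__ == "__main__":
    for commit_type, items in group_by_type(SUBJECTS).items():
        print(commit_type, items)
```

In a real repository the subjects could come from `git log --format=%s`, and `git log --grep='^docs'` gives a similar filter without any script.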
How has your coding development workflow changed as the models have progressed kind of from November through to today?

Yeah, the one practice I wrote, I guess you could call it a white paper. It was in June of last year, actually, June 2025, called Beyond Code Centric, which laid out a number of different sort of theses on some different premises. But the one thing that I think is probably most radical from where we started to where we are now is almost the complete

obviation of PRs. Like, we don’t do any PRs at all. I mean, there are some for different reasons from like releasing from dev to main, for example, in like a structured way. But when we’re developing, you know, I think that like trunk-based workflows, which it’s not like they’re not popular, they’re just not as in vogue as, you know, Git flow or GitHub flow are,

is the way to go with LLMs because in my mind, it’s so cheap and easy now to kind of reverse mistakes. And it’s not to say that we don’t work on branches because we do work on branches. We just integrate them into the trunk very frequently. Or if it’s some experimental thing, we’ll have some like dev branch, depending on the environment we’re deployed to, you can go see the app with.

that current state of change or whatnot. But I think that PRs at this point, if you’re having an agent write the majority of your code are kind of pointless in a lot of cases because I’m not reading, it’s impossible to read the volume of the code that’s getting produced anyway. And if you can just rebase or semantically rebase, which is kind of like a term we made up.

you know, changes by replaying your intent on top of, you know, your working trunk, it tends to work a lot better than trying to gate things on PRs getting merged that just take a lot of time. This is probably not a practice that works well everywhere, but if you’re a small, closely knit team that kind of knows like what’s going on, I think it’s extremely effective. And it’s probably the most radical thing from when we started to where we are now that we do way differently.

Yeah, I think there’s with agents there’s less and less excuse not to rebase.
Yeah. And I also just want to comment on that. I, the thing, the thing that’s also, it’s not like different in my development workflow, but the thing that I guess I have a strong opinion on, is I really don’t like, cloud-based agents. Maybe my tune will change, but there was a point where, you know, I think Cursor first came out with cloud agents and then like Jules from Gemini
So that was our god.

you know, made some noise and of course there’s all of like Claude web and all of that. I’m, I’m not a fan of delegating tasks to agents that then because of their very nature have to, you know, containerize your repo and then open PRs on your behalf, because I don’t want to read those PRs. Also I find that being able to steer agents, you know, as they are actually running

is one of the most effective ways to keep your mental model intact. And if you spawned off like 10 or 20, you know, remote tap and then you have 20 different PRs that land at different times to look at, like, I just, for me, it’s not, it’s not sustainable. I’d much rather be in the flow and sort of be in the moment at the moment, or at least proximate to the moment.

So do you think things like Steve Yegge’s Gas Town, or ones where it’s kind of more software factory directed, do you know people who’ve had success with them? Or how do you feel relative to working locally, steering it? That’s kind of the opposite, where it’s just send the agents off YOLO and either check or just merge stuff in?

Yeah.

Well, so I guess I’ll caveat it and say, I have no problem with like unassisted long runs, assuming that you formulated a plan that you’re confident in before you actually go and give it to the agent to go do that, however it’s going to orchestrate it. And normally the way that I do that now,

and it wasn’t always this way, is with like a goal and writer pair, right? Because there’s now this new goal subcommand that allows you to define like an up to 4,000-character goal that should have, you know, what the goal is, what the sort of posture is, what your constraints are, what the invariants are, all those things. And then like attach it to what you can think of as like a very comprehensive spec. So I really like that sort of paradigm. I think it works.

If you, for example, in your goal or your writer, ask your agent to do phased commits that you can go back and look at the span of and kind of ask, like, you know, what did you do? Like, how did the user functionality change? How did the data change after the fact? So you could run one of those, go on a walk, come back. But what I normally do, and the way I sort of get the best of both worlds, is I’m working on multiple separate projects at a single time.

And so it’s like, you know, that’s how I make it work. Now on the question of, like, Gas Town and some of these like dark factory patterns, I haven’t personally used Gas Town. I’ve read about it. I know that there’s mayors and all these certain different characters. I think the Starcraft- or Warcraft-ification of agentic development is slightly weird. Like I appreciate that like

Mm-hmm.

Codex and ChatGPT give their subagents funny names. I know like Codex does, like I’ve seen Feynman and Prometheus and stuff like that. But I can’t imagine actually using one of those systems right now. And I also think that whenever I’ve tried to kind of break out of the, at least I verify stuff that I can see or run or use, mode,

it’s very easy to slide from, I actually have an intent that I’m trying to express that’s going to make a change that I can reason about, to, I’m just going to start vibe coding and completely delegating all of my decision-making, which is a spectrum, right? It’s very easy to run into that whenever you’re like, I’ve built this complex orchestrator system so that, you know, I don’t need to know what happened, because then how do you

choose the next thing to do. And so I’m sure some people are finding some amount of success with those things, but I think most people are also more enamored with just the sophistication of their setup versus building something that’s useful or people want. And that’s part of the problem because it’s easy to go down that path, right?
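
As a rough illustration of the goal-plus-spec pattern described in this answer, here is a minimal Python sketch. It is not the goal subcommand of any real tool: the `Goal` class, its field names, the rendered text layout, and the example file path are all hypothetical, and the only detail taken from the conversation is the list of parts (goal, posture, constraints, invariants, attached spec) and the roughly 4,000-character ceiling.

```python
from dataclasses import dataclass, field

# Ceiling mentioned in the conversation; how a real tool counts characters is an assumption.
MAX_GOAL_CHARS = 4000


@dataclass
class Goal:
    """Hypothetical container for the parts of a goal listed in the conversation."""
    objective: str
    posture: str
    constraints: list[str] = field(default_factory=list)
    invariants: list[str] = field(default_factory=list)
    spec_path: str = ""  # hypothetical path to the longer spec the goal is attached to

    def render(self) -> str:
        """Build the goal text and refuse anything over the character ceiling."""
        lines = [f"Goal: {self.objective}", f"Posture: {self.posture}"]
        if self.constraints:
            lines.append("Constraints:")
            lines += [f"- {c}" for c in self.constraints]
        if self.invariants:
            lines.append("Invariants:")
            lines += [f"- {i}" for i in self.invariants]
        if self.spec_path:
            lines.append(f"Spec: {self.spec_path}")
        text = "\n".join(lines)
        if len(text) > MAX_GOAL_CHARS:
            raise ValueError(f"goal is {len(text)} characters; limit is {MAX_GOAL_CHARS}")
        return text


# Illustrative values only; none of these come from the conversation.
goal = Goal(
    objective="Add usage-based billing to the API",
    posture="Work in phases; commit after each phase and summarize user-facing and data changes",
    constraints=["Do not change public route names"],
    invariants=["All existing tests keep passing"],
    spec_path="docs/specs/billing.md",
)
print(goal.render())
```

Asking for phased commits in the posture line is what makes the after-the-fact review described above possible: each phase leaves a commit whose span can be inspected on its own.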

Yeah. How do you keep track of what you’re going to work on next? Do you still use kind of standard issue GitHub issue tracking or something like that or another method?

That

I should mention that we’ve almost never used any issue or task tracking system at all. And every time we have tried to do that, it’s sort of just failed. At one point, I was implementing like payments and billing for Stoa. And I was not using it to decide what I was going to do next, but just attempting to use it to see if it would make sense. I was using the GitHub CLI to automatically create

GitHub issues that were tracking the progress of every part and piece of Stoa that needed to be instrumented from a cost perspective. Then, like when we were setting up the billing side of the house, all of that, because there were just a lot of moving parts to it. Because we had sort of infused LLMs and different agent runtimes in a lot of parts of the thing, there were different models being used from different providers.

Tracking all that was complicated. And then it was like, why am I doing this? Because we’ve already versioned the docs and stuff in our repo. And so prior to doing anything, the way that we work is we normally have like a weekly document that’s markdown that we just update and say, we’re gonna do X, Y, Z thing. Not even with a lot of detail, just to make sure we all know what we’re doing. And that’s kind of how we keep

track of stuff to do. I also just keep a lot of stuff in my head, which is like not the best productivity hack, but it’s just kind of, it’s just, it’s just how I work personally. I mean, I have, you know, a note taking app, like I use Bear like I’ve used Bear forever on Mac. But from a team coordination standpoint, we normally meet in the morning.

you know, founders, we talk about what we’re going to do, you know, whether it’s on the product and tech side or the go to market side and write it down and then talk in Slack. And yeah, I mean, it works for us. I don’t think it works for, I don’t think that sort of process works for a much larger team. Um, but that’s, that’s my experience at the moment.

What do you think, if you were within a larger team, how would you organize this? I mean, of course, Stoa is built for teams to work on these kinds of problems, so

It is. It is. I

mean, it’s interesting. Coming from both the technical side of the house as well as sort of the product side of the house, one thing that might have been a little bit under the radar, but when I was at Pluralsight before I was the CPO, I actually ran a business unit. And that business unit was an acquisition that Pluralsight made of a company called GitPrime, if you’re familiar with

those types of products, like software engineering measurement products that normally hook into your ticketing systems and your version control systems and in some cases your build systems and whatnot and attempt to measure metrics like DORA metrics, for example, right? Like there’s a whole crop of these tools, Jellyfish, LinearB, DX, all of them are in the same category. And

one of the things that I found very interesting, because I had a little bit of an insider view across many, many companies that, you know, had engineering teams prior to LLMs, was like, yes, people use Jira. Yes, people use GitHub issues. Yes, people use other things like Planview, whatever. Is any of that data ever really kept up to date, a well-manicured, accurate representation of what’s happening? And I was

like sort of really perplexed, whatever, you know, you can kind of see our own issue tracking in the lack of verbosity, lack of detail in these things. And so I know your question was like, how would you organize this if you were like working on a larger team? I’d say like prior to LLMs and agents, people weren’t even organizing it well at all to begin with, like the amount or lack of labels, status updates,

assignees on things is just pervasive. The lack of backlog grooming, depending on the way that you work, if you’re working in sprints or in Kanban or whatever your workflow is, just horrible. But that’s all to say, I think if you’re on a scaled team, one way that we’ve seen some partners that we know, sort of friends of SpecStory, work, and I think your mileage varies, is for net-new greenfield projects,

and this was about eight months to a year ago, they had been attempting to leverage issue tracking, not as an assignee to an engineer to go tackle, but as the unit and medium and mechanism to collaborate on effectively at that time specs that they could hand off to cloud agents to actually do all the coding. And I think

that was effective for this particular team. I don’t know how well that scales per se. You know, there was sort of like a benevolent dictator in the mix that would like sort of judge all of these issues that were actually just specs on the web to hand off to the sort of agents and coordinate, you know, intent from various contributors to this project from the product side, design side, engineering side. I don’t think

that’s probably what’s happening today. I think more and more people are probably using integrations through like Slack to use like shared agents, like the Devins of the world or, you know, anything that you can integrate in Slack. I know Linear has a Slack integration that, you know, kind of parallels what I just described. I just don’t think that like issues as a format for assigning to people matters very much anymore. It’s like issues

that have some documented provenance and evidence that you hand to an agent is what probably matters. And being able to surface those over time, recall them, find them, some durable log is what matters, not necessarily the tool that you use.

And you’d mentioned before, when you’re working on something, you’re usually managing a few different projects at once. Have you found an ideal count for how many things you can do at once? Or does it vary with complexity?

Yeah.

I don’t think a lot of the way that I do things is ideal. But I would say that, you know, it depends on maybe how new the project is to me compared to the rest of the things that I’m trying to parallel process. If it’s a brand new project or a brand new concept, I probably wouldn’t be

running that many other things in parallel because there’s just more cognitive load. But let’s say, you know, I’m working in my spare time on this harness that I’m calling Dead Reckon, which is like a harness for harnesses actually. And that thing’s kind of far along. And then I also have my personal site I just updated yesterday. And then obviously I have Stoa and then there’s a couple of new things. You know, I’d say between like

three and five is probably where I find myself most days, but it doesn’t mean my concentration is equally split, if that makes sense. And so I think at any given time, you can probably only concentrate well on one thing. But the way I think the pattern works is if you’re running agents across multiple disparate projects, what you’re doing is basically waiting for

And the only way you can do this effectively is not to be interactively conversationally prompting, it’s to be defining and refining plans that then you give to agents and you tab away and you say, okay, that’s probably going to take 30 minutes or something. I can go plan this other unit of work over here. Right? Like that’s, that’s how I find myself working.

And do you run these all on your local machine with dangerously skip permissions, or are you running it somewhere else?

Yeah. No,

yeah. I mean, per the prior bit of conversation, almost 100% local. I can’t remember the last time I did a cloud run for anything that wasn’t like, you know, a GitHub action build that’s just sort of deterministic, even if it’s leveraging an agent. All local, almost always dangerously skip permissions or auto mode or whatever the equivalent YOLO mode is for the other harness.

And I think I do that now way more than I used to because the worst thing would be to tab away and realize that the agent running over here had been prompting you with an ask user question for the last 20 minutes and you didn’t realize it. And it was just to have access for something, like outside of its sandbox or something. It’s like, I don’t care. Like whatever. Yeah, of course I’m going to hit yes on that. So yeah, always now in those modes. And I think,

with all the cool allowlist configurations you can do, I’ve never once had a horror story situation like you see on X of, like, Claude deleted my production database. Never even anything close to that. So I’m personally just not worried about, you know, dangerously skip and, you know, if I was, I would be more cautious about

what I allowed it to have in its environment, but it’s never really happened to me. Although I know it happens to others.

Right, yeah, I’m always saying, why are you giving your agent unrestricted access to your prod database?

Yeah,

I mean, there’s just some, you know, that’s where I think experience levels and judgment sort of come into play. I mean, it’s, you know, that helps having been there, done that a little bit.

Do you find managing a group of agents more or less tiring than writing the code yourself?

Well, at this point, honestly, across the languages and tech stacks that I work in, I would never be able to write all this code myself. I used to be fairly proficient at Python, but I don’t even work in Python at all anymore. So I guess it’s kind of an interesting question. I do find the general practice of managing

you know, the cognitive work that goes into turning your intent into something useful in terms of an output to be demanding, and it can be draining. And I do check myself sometimes to say, okay, I got to get up from the computer and go walk outside and, you know, eat some food and, you know, have a glass of water or whatever. It is, I think, very easy to get addicted to this because of the sort of like dopamine reward of seeing

Yeah.

you know, your idea come to life. I think that’s actually more of the risk for a lot of people using these products than maybe they even consciously realize, because I’ve been using these products since their very origination. And I have felt the way I think Marc Andreessen described in a recent podcast, he described this concept of being like an AI vampire, like

you know, like people staying up really late into the night because they felt like they were being so productive, because of the dopamine reward that they were getting, that they like just didn’t sleep. That has certainly happened to me. And I think for all of us that are using these types of products on the daily, it feels, it might not be in reality, but it feels so much more rewarding, and that’s what I’m worried about more than, like, how mentally taxing it is. It’s like, can I even stop? I can always

come up with a reason for something I should go do, right? And that’s partially one of the reasons, outside of my desire to be in the flow, I don’t use cloud agents, because if I did, I’d be teleporting into, you know, Claude on the web through my phone and typing at night to Claude what to do. And so I’m like, I don’t need that because I already spend 12 hours a day looking at my computer screen, right?

Right. That it gives you a clean separation between the two.
Yeah, not that the computer is that far out of reach though. It gets easy to get there. Yeah.

Yeah, there’s always that danger.

You’ve gotta put one of those locks, like they have on a vault, on your office. It only opens every so many hours, something like that.

Yeah, yeah, yeah. Yeah, I used to have one of those for my iPhone, but it was like literally a physical device, like I have this fridge over here. It’s a physical device that, like, would brick your phone. I think it might have been called Brick. I don’t know what happened to it. I’m like looking for it. Anyway, yeah.

Yeah,

I think I’ve seen it. It’s something like Brick or something like that, that, yeah, stops you from using it. Yeah, sometimes I’ll turn my phone off and put it upstairs, and that extra two minutes of having to wait for it to turn on is enough to beat the temptation.

Yeah.

Yeah.

Yep.

Mm-hmm.

Yeah, I know from my experience, especially in the early days, thinking, I’ve gotta make sure Claude has something to do overnight, but

On that note, I mean, recently with like, you know, the emergence of goal engineering, right, the ability to basically have a number of gates, even if they’re model-judged, that will keep the model running for a long time. I have basically been up at like 11 or 12 o’clock at night and, you know, kicked off a goal. And sometimes those goals, depending on the tech stack, like

building this one thing in Rust and so the compile loops are very long and you know whatever it takes a long time. I’ve woken up and it was still running like you know eight or ten hours later but yeah this is a big risk I think for all of us because it’s you know as you get more agent-pilled like I can you know I can build anything and I should just spend all my time like every time every part of the day I’m not in front of the

you know, LLM, sending it instructions is wasted, but it’s not true in reality.

Right, I wonder, is it better to just be more focused in that time and then let it be, right? Or avoid the productivity addiction. It’s not easy, but it’s something that becomes more important in this kind of world.

It is, and you know, I was thinking about this specifically yesterday because Anthropic dropped Fable 5 and they’re like, Max subscribers get access to this until June 22nd. And then you might not get access to it and you might have to pay for usage until we bring out more capacity. Immediately, you know, I’m like, I know the input output token rates of this, it’s unsustainable to actually pay the API costs. And I’m like,

How much can I get done in the next 12 days before it might or might not actually get taken away? Right. And so like they’re clearly not doing anything to limit the fear of missing out. Right. They’re trying to drive as much subsidized adoption as possible, which, everyone knows. So you have a bunch of different forces working against you, right? You have the intrinsic, you know, this is really fun.

I was never able to do this much this quickly before. I have all these ideas. I need to get them in front of me. You also have the sort of scarcity mindset of compute and rising costs and subscriptions that may or may not exist forever. And then you also have whatever other external demands are on you, especially if you’re in a startup situation or something, it’s like, I gotta be fast anyway, because speed is maybe one of the only advantages that I have.

So yeah, it’s certainly challenging across a number of dimensions.

And what do you think is gonna happen once all these token costs get

Yeah.

Well, maybe that’s the forcing function we actually need in order to be more mindful of what we choose to work on. I mean, inherently, these subsidized subscriptions, if you’re able to have one at your company or personally, are part of the problem in terms of, well, the worst that can happen to me is I just have

exhausted my usage limit and I have to wait five hours or whatever. Maximally, I can just buy another subscription. I know there are people online, especially on Twitter, like there’s this guy Doodlestein. I don’t know if you follow this guy. I think he literally has like 10 or 20 Claude Max subscriptions or something. I don’t really know what he’s producing, but it sounds cool. Yeah. What’s going to happen when the token costs, you know, aren’t subsidized effectively to the rate that they are now? I mean, I think

Yeah.

That might be net good because the amount of app creation and software creation right now is just obviously explosive. But I think the actual adoption and activation of a lot of products is not following suit. Like there’s a limited attention span and utility for a lot of things. One of the, I guess, delineations in my mind right now about like what products even make sense to build in a world where

you know, you have to get benchmarked against a Claude or ChatGPT is, you know, a product that kind of works in the background and does stuff frictionlessly for you or something that does actual work for you. And that’s a hard, that’s a really hard bar to pass, given the fact that agents can do anything on the computer and, you know, cost what they do at this point. I guess a side thought.

to follow up and finish up on that thread is I think the good thing is for coding applications, which is obviously the premise of this entire podcast and what a lot of your listeners and other speakers are gonna be thinking about, is now there is enough competition and there are enough open source models like Qwen and other sort of frontier models like DeepSeek and whatnot that are

extremely cost efficient from a token economic standpoint that people will just migrate to. It’s like, I don’t need Claude Opus 4.8 or Fable 5 or even ChatGPT 5.5 to do the majority of what I do. But why wouldn’t I use extra high intelligence? Like, why would I choose to use low intelligence if it’s effectively free for me to use extra high intelligence? It’s like, what’s the point of that? But I think

you know, when token costs become less subsidized, as they likely will, people will actually put more thought into not only building the right thing, but finding the right model for the task, which right now it’s almost like, why would I put in that work? As an individual building software with agents, I think that question is already on the minds of many for AI powered products, right? Like if you…

If you are paying the cost yourself because you can’t use the subscription, like you’re going to find the most efficient or the best suited model, whether it’s Gemini for video or, sorry, Gemini’s Nano Banana for image generation or, you know, a very specific voice-to-text model, like Deepgram offers, you’re going to find the right thing.

Yeah, I think I always find it interesting how it’s changed from building applications with LLMs to coding where you have the super high powered loop. Like everything you’re using is like a battleship first. If you’re building an app, you wanna do it in the least expensive way possible. So

Yeah. Yeah.

Yeah.

I don’t need to run my skill to like rename a file with Opus, but I can, so why not?
Yeah, like I just use Fable 5 to recenter my divs. Like, any model at this point can do that. You know, yeah.
Well, I mean, we’re almost at an hour. Is there anything else you wanna chat about or go over?
I mean, unless you do, I’m all ears. I mean, was there anything you wanted to go deep on? Or that was like an annoyingly long answer because I tend to talk a lot.

Yeah, well, I guess there’s one question I do like, which

is where do you still not trust these tools?

Mmm.

I don’t know if it’s a question of trust, but it’s a question of where I find myself doing the most manual labor, I guess, where you could conceive of it as trust. Anything to do with copy editing, copywriting, or visual design, still, I don’t really trust any model to ever be able to, quote unquote, one-shot it or get it right without a lot

of intervention. You know, if you compare that to some sort of like backend tasks, like building a bunch of routes that connect to a persistence store, like a DB or something, more often than not, if you have your spec or your intent well-defined, it’s going to get that right. But anything that’s like, you know, and this is obviously a cliched answer because I think a lot of people agree with this, but you know,

even through ideation with LLMs on what I’ll call creative, like copy. It’s sometimes not even worth, it’s almost sometimes not even worth the effort. I find myself like, I’m going to go and test a bunch of potential H1s for this marketing copy, right? And then I’m like, I don’t even

And I go through this long process and I’m like, I need to just write this myself. Like I can’t even use the model to do this because it’s like polluting my mind. It’s not that they’re that sycophantic anymore, but it almost pollutes your own ability to reason about what is the actual message you’re trying to convey or like, you know, how should this thing look? Right. So that’s where I definitely don’t put full trust in, and where I spend an

outsized amount of time still on sort of manual or human-driven effort, you know, fixing.

And I have seen people say this, but do you think that taste really becomes one of the things that these models still aren’t good at and I mean as I’m saying that I’m thinking of some of the things that people posted in the last day with Fable from a design standpoint. I’m like, this got really good but until

Yeah.

I mean, I think it’s like so easy to say that like, taste is all that matters. It’s like, what is taste? then like, you know, it’s like, what does that even mean? Right? Like, what is the definition even? I think, I think, I think about it maybe more about making things legible, like, like easy to understand, relatable, as being something that humans are

maybe inordinately good at. So you can think about just like, how do you convey a message and value in as few seconds as possible, right? And I think that puts a little bit of a point on it, because I think that requires taste, but that’s like a concrete example of like what taste might be. I do think that that matters. I also think that connecting the dots between ideas or things or

products or systems is something that like LLMs are never going to be good at because you need to like have some idea what the possibility set is. Idea generation is something that like really obviously matters. I think ultimately, even though there’s all this talk about designing for the agent experience and designing things for agents and the agent economy and whatever, like, you know,

still we live in a world dominated by humans selling things to humans and so like at some point you have to relate to other humans in some way shape or form that is going to obviously still matter greatly. Yeah so I do think that taste matters but I like to put a point on what that means.

Yeah.

Yeah, taste. And I mean, as we’re talking about it here, I’m thinking one aspect of it is also trade-offs, because taste, I think, also speaks in some sense to how you have to pick among these different competing things you can do.

Yeah, exactly. Exactly. This is a little bit of an aside, a little digression, but I’ve tried to use agents to come up with names. It’s like an effort in futility, right? Like not even just copy, just like names for things, right? I guess that is the cliched joke about the hard things in programming, right?

Like naming things and cache invalidation or whatever the statement is. And then off-by-one errors is like the

The two things

that are the two hard things, that and off-by-one errors.

Yeah,

yeah, exactly. Those things. I mean, it’s like naming is really hard. It’s like impossible. I don’t know. Yeah.

Well, and it’s possible someday, sure, that the model has access to all the information there is, but I think a big part of the issue is always that what the scope is defined to be makes a big difference in the answer you get. Like there’s no pure

fact, right? There’s here’s the facts or constraints we present. Right. There’s in theory could be an infinite number of things that might be relevant. So at some point we have to make that decision.

Yeah. And

I mean, I think that’s the last maybe comment on this is, you know, compared to something like coding, which has the ability to be benchmarked and has verifiably right answers, which is obviously why it’s dominated the application of LLMs to this point, much of the world is pretty illegible by definition. It’s like, you know, what constitutes a good book?

Right? Is it its popularity? Is it the eloquence of its writing? Is it, you know, the environment that it was born from in some historical era? Like, what constitutes good at any given time is not something that’s very easy to actually verify, because good means something very different to a lot of different people. And I think that’s where it’s like, as long as we have autoregressive next-token-predictor

type models, you know, it’s like, you can’t really encode some of those things in its weights and you can’t verify it through post-training extremely accurately, because the solution set is almost infinite in terms of what the ground truth verification is for that particular task. And so I think, you know, I’m not qualified at all to speak on this topic because I know very little, but I believe that’s why like Yann LeCun and

you know, the world model idea of trying to build a different type of AI is at least getting airtime, because there are just inherent limitations to the actual technology in terms of how you can score it, train it, pre-train it, right? Like there are limitations. It can’t do everything, even though it feels in some cases like it can do most creative work that’s out there, whether it’s

video generation, image generation, text, coding, right? Like most of the things that humans have spent a lot of labor and education getting good at.

Well Greg, thanks again for joining us. It was great to have you.
Thanks so much Dan, it was a pleasure. I really enjoyed the conversation.