<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0" xmlns:itunes="http://www.itunes.com/dtds/podcast-1.0.dtd" xmlns:googleplay="http://www.google.com/schemas/play-podcasts/1.0"><channel><title><![CDATA[Growth Dispatch]]></title><description><![CDATA[Field notes on AI, product-led growth, data, and go-to-market strategy.]]></description><link>https://www.piyush.cc</link><image><url>https://substackcdn.com/image/fetch/$s_!bqsk!,w_256,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb82e0afa-555b-4ee2-af3d-b47261a3166e_302x302.png</url><title>Growth Dispatch</title><link>https://www.piyush.cc</link></image><generator>Substack</generator><lastBuildDate>Wed, 06 May 2026 11:18:03 GMT</lastBuildDate><atom:link href="https://www.piyush.cc/feed" rel="self" type="application/rss+xml"/><copyright><![CDATA[piyush sagar mishra]]></copyright><language><![CDATA[en]]></language><webMaster><![CDATA[datathon@substack.com]]></webMaster><itunes:owner><itunes:email><![CDATA[datathon@substack.com]]></itunes:email><itunes:name><![CDATA[piyush sagar mishra]]></itunes:name></itunes:owner><itunes:author><![CDATA[piyush sagar mishra]]></itunes:author><googleplay:owner><![CDATA[datathon@substack.com]]></googleplay:owner><googleplay:email><![CDATA[datathon@substack.com]]></googleplay:email><googleplay:author><![CDATA[piyush sagar mishra]]></googleplay:author><itunes:block><![CDATA[Yes]]></itunes:block><item><title><![CDATA[Who is making decisions around AI in your Org?]]></title><description><![CDATA[What AI is exposing about seniority, decision-making, and who actually belongs in the room.]]></description><link>https://www.piyush.cc/p/who-is-making-decisions-around-ai</link><guid 
isPermaLink="false">https://www.piyush.cc/p/who-is-making-decisions-around-ai</guid><dc:creator><![CDATA[piyush sagar mishra]]></dc:creator><pubDate>Sun, 26 Apr 2026 16:44:46 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/ad4b96d1-30a4-45eb-9af0-cec072f34de4_1298x670.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p><em>Disclaimer: The views here are my own and do not represent my employer or anyone else.</em></p><p>There&#8217;s a line in a <a href="https://x.com/JayaGup10/status/2047508230813917600">piece i read this week</a> that i haven&#8217;t been able to stop thinking about.</p><blockquote><p>&#8220;the people with authority over how organizations adopt AI are the people with the least firsthand experience of what these tools can actually do.&#8221;</p></blockquote><p>I&#8217;ve sat in enough rooms to know this is true, and i think the second-order consequences for enterprises are more serious than most leadership teams are willing to say out loud.</p><p>The traditional argument for seniority rests on three things: <strong>pattern recognition</strong> built over years, the ability to retrieve the <strong>right analogy</strong> at the right moment, and the judgment to know <strong>when to commit</strong> and <strong>when to hold</strong>. 
These were genuinely scarce skills for a long time, and they were scarce because building them was expensive - in time, in exposure, in the cost of being wrong enough times to learn something.</p><p>AI is compressing all three simultaneously, and faster than most senior operators have noticed.</p><p>A junior analyst who can generate five competitive positioning scenarios by end of day isn&#8217;t slower than the senior strategist who&#8217;s seen this before; a second-year lawyer who can surface every relevant precedent in minutes isn&#8217;t at the retrieval disadvantage she used to be; and a product manager who can launch, kill, and relaunch in an afternoon doesn&#8217;t need to spend six months building the case before she earns the right to try something.</p><p>What&#8217;s left, then, is the part of seniority that was never really about skill - the accumulated credibility that makes public wrongness expensive, the identity tied to past decisions, the unconscious filter that discards honest insights before they finish forming because the environment has trained you to run it automatically.</p><p>The enterprise implication is <strong>uncomfortable</strong> - most organizations are making their most consequential AI decisions through people whose daily experience of the technology is the furthest from the frontier. The CIO setting the AI strategy who hasn&#8217;t opened the tool, the CMO approving the roadmap who learned  &#8220;openclaw&#8221; from a conference deck, and the CFO evaluating AI ROI using frameworks built for a different kind of investment entirely.</p><p>This isn&#8217;t an argument that experience is worthless - Real judgment (the kind that comes from having been genuinely wrong in consequential situations and having learned something from it) is still rare and valuable. But it arrives in the same package as accumulated aversion to risk, protected decisions, and the version of the story already told to the board. 
And i see it around - even the person carrying it can&#8217;t always tell which is which.</p><p>The organizations that will navigate this well are the ones that find ways to get their most experienced decision-makers into genuine daily contact with these tools - not demos, not summaries, not filtered briefings, but actual use. And the ones that create enough psychological safety for what a 22-year-old figured out in an afternoon to actually reach the room where the decision gets made.</p><p>The gap between the two groups is widening by the month, and the decisions being made in that gap are not small ones.</p>]]></content:encoded></item><item><title><![CDATA[The most underrated job in enterprise AI]]></title><description><![CDATA[What you should be doing over the next 18 months]]></description><link>https://www.piyush.cc/p/the-most-underrated-job-in-enterprise</link><guid isPermaLink="false">https://www.piyush.cc/p/the-most-underrated-job-in-enterprise</guid><dc:creator><![CDATA[piyush sagar mishra]]></dc:creator><pubDate>Sun, 05 Apr 2026 12:31:01 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/b8ac808b-dde4-47bd-b7ee-b16297a5e665_1200x628.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>At a founder workshop this weekend, someone asked what the most underrated role in an enterprise would be 18 months from now. I said it was the person who knows how to rebuild workflows for agents from scratch, and it sparked more conversation than I expected. Not the AI strategist or the prompt engineer, but the person who actually goes into an organization, understands how work really flows through it, and redesigns that work for a world where agents are doing meaningful parts of it.</p><p>I&#8217;ve been doing this hands-on for the past year, and what strikes me is how much genuine effort it requires and how rarely anyone talks about that honestly. 
Like a lot of things in life, the outcome gets all the attention while the setup gets none of it.</p><p>Before an agent can do meaningful work inside any business function, someone has to make the unstructured data legible - years of documents, emails, notes, and tribal knowledge that lives nowhere an agent can read. Someone has to map the actual workflow, not the sanitized version on the org chart but the one that really runs, with all its workarounds and undocumented judgment calls baked in over years. Someone has to figure out which parts the agent handles well, where it breaks down, and where a human needs to stay in the loop. Someone has to connect systems that were never designed to talk to each other. And then someone has to rebuild the process itself, because the old one was designed around human constraints that no longer apply.</p><p>This is what the conversation in that room kept coming back to. Everyone is focused on what agents can do, and very few people are thinking seriously about the work required to make them actually useful inside a real organization. It doesn&#8217;t make for a good conference talk, and it won&#8217;t show up in anyone&#8217;s AI transformation case study. But it&#8217;s the work that determines whether any of the AI investment produces something real, and the people who can do it well are genuinely rare right now.</p><p>What I find most exciting is what this means for people early in their careers. This is one of those unusual moments where curiosity and willingness to get into the operational weeds matter more than seniority. 
The person who spends the next 18 months going deep on this - learning how to set up and redesign workflows for agents across any business function - is going to be valuable in a way that compounds pretty quickly.</p><p>Every organization is going to need people who can do for AI agents what good engineers did for software in the early 2000s: go in, understand the domain, and rebuild the infrastructure from the ground up. That wave created a generation of people who became indispensable fast, and this one will too.</p>]]></content:encoded></item><item><title><![CDATA[there is something wrong with the "AI productivity" conversation]]></title><description><![CDATA[Every AI ROI conversation in marketing right now counts time saved on existing tasks. hours reduced, variants multiplied, briefs accelerated. these are real gains but also the wrong thing to measure.]]></description><link>https://www.piyush.cc/p/there-is-something-wrong-with-the</link><guid isPermaLink="false">https://www.piyush.cc/p/there-is-something-wrong-with-the</guid><dc:creator><![CDATA[piyush sagar mishra]]></dc:creator><pubDate>Mon, 16 Mar 2026 11:38:21 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/b178a2a0-9506-43d1-b274-6bc9aebad000_592x374.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p><em>Disclaimer: i work at twilio. 
views here are my own.</em></p><div><hr></div><p>Every AI ROI conversation in marketing (and I am sure pretty much all of GTM) right now sounds roughly the same - &#8220;<em>we used to spend eight hours on this report, now it takes two</em>&#8221;, &#8220;<em>we used to produce four content variants, now we produce forty</em>&#8221;, &#8220;<em>we used to take three days to build a campaign brief, now it&#8217;s three hours</em>&#8221;.</p><p>These are real numbers and certainly represent real value, but they are also almost entirely the wrong thing to be measuring.</p><blockquote><p>Time saved on existing tasks is the floor of what AI makes possible in marketing, not the ceiling. The teams treating it as the ceiling are optimizing themselves into a comfortable version of the same limitations they&#8217;ve always had.</p></blockquote><p><strong>When you apply AI to an existing task, you implicitly accept that the task was the right one to begin with - </strong> that the workflow was correctly designed and that the output was the right output and the only question was how fast you could produce it.</p><p><strong>But, most marketing workflows were not correctly designed.</strong> </p><p>They were designed around constraints - of time, of headcount, of data access, of technical capacity - that shaped what was even considered possible. You didn&#8217;t build a process to analyze every support ticket for product marketing signals because no team could do that at scale. You didn&#8217;t build a process to research every target account before outreach because the math didn&#8217;t work. You didn&#8217;t build a feedback loop between win-loss data and campaign messaging because nobody had the bandwidth to close that loop weekly.</p><p>Those weren&#8217;t strategic choices but rather capacity choices that hardened into process.</p><p>AI doesn&#8217;t just speed up what you were already doing but makes previously impossible tasks viable. 
The teams that understand this aren&#8217;t asking &#8220;how do we do our current work faster?&#8221;, but instead &#8220;what work could we now do that we couldn&#8217;t before?&#8221; - and then rebuilding their function around the answer.</p><p>The marketing teams that are genuinely ahead right now share one characteristic: they&#8217;ve identified at least one capability they now have that simply did not exist for them twelve months ago. This isn&#8217;t a faster version of something old but a new thing entirely.</p><p>Signal coverage at a scale that wasn&#8217;t workable before - e.g. monitoring buying intent across hundreds of accounts and routing it in real time. Content personalization that wasn&#8217;t economically viable - tailored by industry, role, buying stage, and account history without a dedicated writing team. Research depth that wasn&#8217;t feasible - competitive and account intelligence synthesized before every meaningful sales interaction, not just the big ones.</p><p>They are new capabilities, not just efficiency gains, and they require not just new tools but new process logic - designed around what AI can actually do rather than retrofitted onto what humans were already doing.</p><p>The reason most teams stay stuck in efficiency framing is that efficiency is easy to measure and capability expansion is not. Hours saved has a number - &#8220;we can now do something we fundamentally couldn&#8217;t before&#8221; doesn&#8217;t fit neatly into a productivity dashboard (which your CXOs care about a lot ;) )</p><p>But that&#8217;s the metric that will actually separate the teams that look transformed in three years from the ones that just look faster. 
Not how much time they saved on existing tasks but what new tasks they built their function around.</p><p>Most teams are optimizing for the former - i hope you are not :) </p>]]></content:encoded></item><item><title><![CDATA[Your AI memory is portable now, but the platforms are about to make that very complicated]]></title><description><![CDATA[...and we've seen every move before - just not played with this kind of asset]]></description><link>https://www.piyush.cc/p/your-ai-memory-is-portable-now-but</link><guid isPermaLink="false">https://www.piyush.cc/p/your-ai-memory-is-portable-now-but</guid><dc:creator><![CDATA[piyush sagar mishra]]></dc:creator><pubDate>Thu, 05 Mar 2026 01:45:38 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/a198b978-6138-46aa-bd68-4fb12214d36f_1388x972.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p><strong>Disclaimer: The views here are my own and do not represent my employer or anyone else.</strong></p><p>There&#8217;s a Claude feature that shipped recently that i think has more enterprise implications than most people are giving it credit for yet.</p><p>Claude&#8217;s <a href="https://claude.com/import-memory">memory import</a> - you run a prompt inside Chatgpt or Gemini, it exports everything those models have <em>learned about you</em> - your communication style, your projects, your preferences, your working context - and you paste it into Claude. 
One copy-paste and claude picks up where another AI left off.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!nGP7!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4b294dae-b4cb-409d-a75c-50298166e7f6_3840x2160.webp" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!nGP7!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4b294dae-b4cb-409d-a75c-50298166e7f6_3840x2160.webp 424w, https://substackcdn.com/image/fetch/$s_!nGP7!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4b294dae-b4cb-409d-a75c-50298166e7f6_3840x2160.webp 848w, https://substackcdn.com/image/fetch/$s_!nGP7!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4b294dae-b4cb-409d-a75c-50298166e7f6_3840x2160.webp 1272w, https://substackcdn.com/image/fetch/$s_!nGP7!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4b294dae-b4cb-409d-a75c-50298166e7f6_3840x2160.webp 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!nGP7!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4b294dae-b4cb-409d-a75c-50298166e7f6_3840x2160.webp" width="1456" height="819" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/4b294dae-b4cb-409d-a75c-50298166e7f6_3840x2160.webp&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:819,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!nGP7!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4b294dae-b4cb-409d-a75c-50298166e7f6_3840x2160.webp 424w, https://substackcdn.com/image/fetch/$s_!nGP7!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4b294dae-b4cb-409d-a75c-50298166e7f6_3840x2160.webp 848w, https://substackcdn.com/image/fetch/$s_!nGP7!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4b294dae-b4cb-409d-a75c-50298166e7f6_3840x2160.webp 1272w, https://substackcdn.com/image/fetch/$s_!nGP7!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4b294dae-b4cb-409d-a75c-50298166e7f6_3840x2160.webp 1456w" sizes="100vw" fetchpriority="high"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" 
xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p><strong>On the surface this looks like a switching cost feature, a consumer convenience, or competition between AI vendors fighting for retention. </strong></p><p><strong>I think it is more than that</strong> &#8594; at least three things simultaneously - a governance problem, a competitive intelligence risk, and the opening move in a platform war that is going to get ugly in ways the tech industry has seen before.</p><h4><strong>Memory is not a knowledge base</strong></h4><p>This is the distinction most enterprise AI conversations are still wrapping their head around, and it matters enormously for what comes next. 
</p><p>A knowledge base is <strong>explicit</strong> - the system prompt your team wrote, the style guide you uploaded, the instructions someone encoded: <em>always write in second person, never use jargon, do not use choppy sentences, our ICP is mid-market SaaS.</em> You can read a knowledge base, audit it, version control it, and hand it to a new employee on day one.</p><p><strong>Memory is different</strong> - it is what the model infers from watching you work over hundreds of conversations - not what you told it, but what it <em>noticed</em>. The way you tend to reframe a question before you answer it, the fact that you write three drafts before you&#8217;re satisfied and you always cut the first paragraph, that you soften feedback with context before you deliver it, that when you say &#8220;interesting&#8221; you usually mean &#8220;i disagree&#8221;, and that your strategic instincts run ahead of your data and you know it, so you ask for pushback.</p><p>None of that is in your skills.md or system prompt. You could not encode it if you tried - partly because you&#8217;re not fully aware of it yourself, and partly because the <strong>value is in the accumulation and texture, not in any single rule.</strong></p><p>This is what twelve months of AI interaction actually produces &#8594; not a better chatbot, but a model of how you think, and now that model is portable.</p><h4><strong>Umm, how is this an enterprise problem? what do we call it?</strong></h4><p>Most enterprise AI deployments assume a relatively clean boundary between the tool and the user. 
The company licenses the platform, sets the guardrails, owns the data layer, and the employee uses it within that context.</p><p><strong>Memory portability breaks that model in at least three ways:</strong></p><ol><li><p>The first is <strong>context leakage</strong>: when an employee exports their AI memory from a company-licensed Chatgpt enterprise instance into a personal Claude account, what exactly are they moving? Technically it&#8217;s their preferences and communication patterns, but <em>practically</em>, those patterns are built on months of work conversations, internal project context, proprietary framing, strategic language absorbed from working on confidential things. <strong>The memory is not the data, but the memory is shaped by the data.</strong> That distinction is going to get tested in legal contexts that i don&#8217;t think have good precedent yet.</p></li><li><p>The second is <strong>institutional context loss</strong>: enterprises have spent the last two years trying to capture organizational knowledge - what sales learned from a lost deal, what a departing engineer knew about a system, what a senior marketer&#8217;s instincts were built on. AI memory is, quietly, becoming one of the richest repositories of individual working context that has ever existed. When someone leaves and takes that memory with them, or when a team migrates tools and the memory doesn&#8217;t transfer cleanly, enterprises lose something they don&#8217;t have good language to describe yet - <strong>it&#8217;s not a file, or a document, but closer to losing the person&#8217;s judgment</strong> - a distilled version of how they approached work, sitting inside a model they&#8217;re taking with them.</p></li><li><p>The third is <strong>governance without visibility</strong>: most enterprise AI policies are tool-level policies, i.e. which platforms are approved, what data can be uploaded, which outputs need review etc. 
<strong>Memory portability makes the unit of governance the individual rather than the tool</strong> - and enterprises are nowhere near equipped for that. Your CIO/CISO can audit what your employees uploaded to Chatgpt but cannot audit what Chatgpt learned about your employees over eighteen months of conversations, or where that learned context went when someone switched platforms last Friday.</p></li></ol><h4><strong>Wait.. is there a GTM implication we need to talk about?</strong></h4><p>Of course we do..and i think sales and marketing are probably the most exposed here, and not for the obvious reasons. </p><p><strong>The obvious reason feels like customer context</strong>: e.g. if a sales rep builds twelve months of deal history, objection patterns, and relationship nuance into their AI memory, and then leaves - or switches tools - that context walks out the door in a way that is harder to track than a downloaded Salesforce export.</p><p><strong>The less obvious reason is that AI-assisted communication is becoming personalized at a level that reflects the individual, not the company</strong>. When a sales rep uses an AI that knows them deeply - their persuasion style, their customer vocabulary, their instinct for when to push and when to wait - the output starts to reflect a cognitive fingerprint as much as a company playbook. The line between &#8220;<em>company voice</em>&#8221; and &#8220;<em>individual voice enhanced by AI</em>&#8221; is dissolving; that has implications for brand consistency, for training, for what you actually lose when someone leaves.</p><p><strong>The even less obvious reason is competitive intelligence</strong> - your AI memory reflects what you worked on. 
If a competitor hires your VP of Demand Generation and that person imports their AI memory into the new company&#8217;s tools, you have not lost a Google Slides or PowerPoint deck, but potentially a distilled model of how your best marketer thinks about your category, your customers, and your strategy. The subtle stuff - the framing instincts, the prioritization patterns, the things that made them <strong>good</strong> - compressed into a transferable file.</p><p>It is already happening right now, invisibly, at companies that have no policy framework for it, and it&#8217;s only going to get harder from here.</p><h4><strong>Here&#8217;s the platform war i think this is about to trigger</strong></h4><p>This gets interesting at the vendor level, because the AI companies are about to face a strategic tension that every major platform company has faced before - and most have handled badly.</p><p>Right now Claude is making memory import frictionless because they are the challenger - they want to make it easy to leave Chatgpt and that is rational. Every platform that is behind on market share has played the interoperability card. Google made it easy to <a href="https://techgage.com/article/how-to-migrate-yahoo-mail-to-gmail-and-outlook/">import</a> contacts from Yahoo Mail, Spotify made it easy to transfer playlists from iTunes, Notion made it easy to import from Evernote and so on. The message is always the same: <em>your data belongs to you, switching is painless, come on over.</em></p><p>What happens next is also predictable, because we have seen this movie before :) </p><p>Once the challenger becomes the incumbent - or even before, once they have enough users invested - the export experience quietly degrades. Note that the export isn&#8217;t eliminated, it just degrades. 
The import button stays prominent, but the export button moves two menus deep, the data format becomes slightly proprietary, the exported file works technically but loses fidelity, etc.</p><p>Facebook made it easy to import contacts for years, then made the exported data progressively less useful anywhere else. LinkedIn imports your resume beautifully and exports a PDF that no other platform reads cleanly. Apple&#8217;s ecosystem is the canonical example of this - every piece of hardware imports from competitors gracefully, and exports to them in formats that technically comply with data portability regulations while practically making migration painful enough that most people don&#8217;t bother.</p><p>The AI memory version of this is going to be more subtle and more consequential than any of those. Because <strong>what degrades in export is not a contact list or a playlist, but an inference layer</strong> - the subtle cognitive pattern that made the memory valuable, and is genuinely hard to serialize cleanly. Which means vendors have a convenient technical excuse for lossy exports that will be very difficult to distinguish from deliberate friction.</p><p>The savvy AI vendors will also start building memory experiences that are structurally hard to replicate elsewhere - <strong>and i don&#8217;t think it will be done through lock-in of data, but through lock-in of depth.</strong> </p><p>The longer you stay, the more the model understands the things that cannot be encoded in a prompt, and thus the more it knows, the worse the cold-start problem feels when you switch. That is not a moat built on data, but built on accumulated inference, and it is significantly harder to regulate than traditional data portability because nobody can agree on what you would even export.</p><p>Enterprises buying AI platforms right now are making decisions that will look very different in three years when the export experience has quietly evolved. 
<strong>The procurement checklist for AI tools is going to need a new section: not just &#8220;can we export our data&#8221; but &#8220;can we export the thing the model learned, and does the export actually work?&#8221;</strong></p><h4><strong>So, what does good look like?</strong></h4><p>Most enterprises are not ready for this conversation, but i think there are a few things the fast movers would do:</p><ol><li><p>They would treat AI memory as an asset class, not a feature - this means asking: what is accumulating inside these tools, who owns it, and what happens to it when people and platforms change?</p></li><li><p>They would update their offboarding processes: the same way a thoughtful legal team asks a departing employee to return physical materials and revoke system access, some companies would begin to ask: what did you export, and what did you bring with you? This sounds invasive today, but is probably going to become standard.</p></li><li><p>They would start thinking about memory architecture at the team level - the goal won&#8217;t be to prevent employees from having useful AI context, but to make sure that <strong>context doesn&#8217;t live entirely in individual memory stores.</strong> For example, shared projects, shared prompts, shared context documents that sit at the team layer and survive individual transitions. The challenge is that the most valuable parts of memory - the subtle inference layer, the cognitive pattern - would resist this. You can share a knowledge base, but cannot easily share what the model inferred from watching someone think.</p></li></ol><p>And the smarter procurement teams would start asking vendors harder questions about memory portability before they sign. Not &#8220;do you support export&#8221; - every vendor will say yes. 
But &#8220;show me what the export looks like after 18 months of usage, and show me whether another platform can actually use it&#8221;, &#8220;Can i control what the export looks like in Enterprise versions vs. Consumer/personal versions&#8221;, &#8220;Can i track how many employees have hit the export/import button in the last 3 months?&#8221;</p><p>I think the last question is going to separate vendors who believe in portability from vendors who are using portability as a growth lever while quietly building the walls.</p><h4><strong>But, oh the irony</strong></h4><p>The feature that makes AI more useful - continuity, context, not having to start over - is the same feature that makes the governance problem harder and the platform lock-in deeper. <strong>An AI that truly knows how you work is more valuable to you precisely because it captured the things you couldn&#8217;t have written down yourself</strong>; that&#8217;s what makes it powerful and that&#8217;s also what makes it ungovernable by conventional means, and what makes the cold-start cost of switching grow invisibly over time.</p><p>We have spent decades arguing about data portability at the file level, and now we are about to have a much stranger version of that argument at the cognition level.</p><p>The policy frameworks, the legal precedents, the offboarding checklists - none of them exist yet for this. For sure, the platform playbooks exist, and we&#8217;ve seen every move before - it&#8217;s just that we haven&#8217;t seen them played with <em>this kind of asset.</em></p>]]></content:encoded></item><item><title><![CDATA[We spent twenty years keeping bots out. 
Now they're the ones we need to let in]]></title><description><![CDATA[the highest-volume users of your software in three years can't fill out your signup form, and that's not their problem]]></description><link>https://www.piyush.cc/p/we-spent-twenty-years-keeping-bots</link><guid isPermaLink="false">https://www.piyush.cc/p/we-spent-twenty-years-keeping-bots</guid><dc:creator><![CDATA[piyush sagar mishra]]></dc:creator><pubDate>Mon, 02 Mar 2026 01:16:04 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/928bb69c-6f9b-4ad2-988c-95e3b56ee6e0_1406x986.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p><strong>Disclaimer: The views here are my own and do not represent my employer or anyone else.</strong></p><p>There&#8217;s a quiet assumption baked into almost every B2B SaaS product ever built, and nobody wrote it down because it was pretty obvious: the customer is a human. </p><p>You can see that assumption everywhere once you look:</p><ul><li><p>signup flow that requires a verified email and a credit card (<em>oh yes, that card has to be linked to a human</em>)</p></li><li><p>onboarding sequence that sends three welcome emails and waits for someone to click through them to decide what&#8217;s the next best <em>event-based</em> nurture they should put you in (<em>we&#8217;ve all built event-based nurtures, haven&#8217;t we?</em>)</p></li><li><p>the permission prompt written for someone who is reading it carefully</p></li><li><p>billing page designed around a person who will notice the charge, react to an anomaly, and make a decision</p></li></ul><p>Every friction point, every design choice, every assumption about how a new user enters and moves through a product - <strong>all of it was built for someone with eyes, opinions, and a mouse.</strong></p><blockquote><p><strong>But, that assumption is quietly becoming the most expensive technical debt in B2B software.</strong></p></blockquote><p>For most of the history of 
the internet, bots attempting to automate account creation were a <em>problem to be eliminated, not a segment to be served</em>. They were spammers, scrapers, fraudsters running volume plays, so the industry responded rationally: CAPTCHA, email verification, manual review queues, credit card requirements on free tiers, and tons of startups and enterprises that helped you solve for it. There were still gaps in the tooling, and that&#8217;s why you saw your monthly activations climb to 25% Y/Y growth on Sep 1, but after 6 weeks of manual bot/fraud tagging and threshold adjustments, it&#8217;d dip to 12%. All of that tooling was designed to ensure a human being was on the other end of every new account.</p><p><em>This is what Jared Friedman shared last week:</em></p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!QcxD!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa7f7d447-41f7-4e4c-8fce-7852cc09a8d2_1216x584.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!QcxD!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa7f7d447-41f7-4e4c-8fce-7852cc09a8d2_1216x584.png 424w, https://substackcdn.com/image/fetch/$s_!QcxD!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa7f7d447-41f7-4e4c-8fce-7852cc09a8d2_1216x584.png 848w, https://substackcdn.com/image/fetch/$s_!QcxD!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa7f7d447-41f7-4e4c-8fce-7852cc09a8d2_1216x584.png 1272w, 
https://substackcdn.com/image/fetch/$s_!QcxD!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa7f7d447-41f7-4e4c-8fce-7852cc09a8d2_1216x584.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!QcxD!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa7f7d447-41f7-4e4c-8fce-7852cc09a8d2_1216x584.png" width="1216" height="584" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/a7f7d447-41f7-4e4c-8fce-7852cc09a8d2_1216x584.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:584,&quot;width&quot;:1216,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:142060,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://www.piyush.cc/i/189529687?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa7f7d447-41f7-4e4c-8fce-7852cc09a8d2_1216x584.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!QcxD!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa7f7d447-41f7-4e4c-8fce-7852cc09a8d2_1216x584.png 424w, https://substackcdn.com/image/fetch/$s_!QcxD!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa7f7d447-41f7-4e4c-8fce-7852cc09a8d2_1216x584.png 848w, https://substackcdn.com/image/fetch/$s_!QcxD!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa7f7d447-41f7-4e4c-8fce-7852cc09a8d2_1216x584.png 
1272w, https://substackcdn.com/image/fetch/$s_!QcxD!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa7f7d447-41f7-4e4c-8fce-7852cc09a8d2_1216x584.png 1456w" sizes="100vw" fetchpriority="high"></picture></div></a></figure></div><p>Agents are being deployed inside enterprises to research vendors, evaluate tools, spin up trial accounts, run integration tests, and return a recommendation. They are doing work that a junior analyst or solutions engineer used to do, acting as the first touch in a procurement motion before a human ever gets involved. 
And they are hitting walls at every step - because the entire infrastructure of B2B software was built for someone who can read a modal, click a verification link, and fill out a form.</p><p>The shift that is coming is not about adding an API to a product that was built for humans - most SaaS products already have APIs. </p><p><strong>The gap is everything around the API</strong>: the human-assumed infrastructure that agents cannot navigate because it was never meant for them:</p><ul><li><p>Account creation requires clicking a verification link sent to a human inbox</p></li><li><p>Core functionality sits behind an onboarding flow designed for people who need to be taught</p></li><li><p>Scoped access requires navigating a permissions UI that assumes someone is reading and deciding</p></li><li><p>Billing requires a credit card attached to a human account</p></li><li><p>Error states return modal dialogs written for a person to read and respond to, not structured data a system can parse and act on.</p></li></ul><p>An agent encounters all of this and stops; not because it isn&#8217;t capable - but because the product wasn&#8217;t designed to be used by anything other than a person. 
The product either supports programmatic interaction from the first touchpoint or it doesn&#8217;t, and there is no patient middle ground where an agent eventually figures out your welcome email sequence.</p><p>Agent-native software is built differently from the ground up - (a) account creation via API without human verification steps, (b) scoped, programmable access controls configurable without a UI, (c) usage-based billing attached to an agent identity rather than a human credit card, (d) onboarding expressed as documentation and endpoint behavior rather than guided tours and drip campaigns, (e) errors that return structured, actionable data rather than messages written for human eyes, and so on.</p><p>For most SaaS companies, closing that gap means revisiting assumptions that run years deep - the signup architecture, the billing model, the permission layer, the way errors surface - none of it is cosmetic, ha! All of it was designed around a user who shows up, clicks around, develops preferences over time, and has a relationship with the product. Alas, agents don&#8217;t do any of that.</p><p>The second-order implication is the one most product teams aren&#8217;t thinking about yet: <strong>who gets chosen</strong>. In a world where agents are assembling their own stacks, the selection dynamic changes fundamentally.</p><p>A human evaluating tools, for example, does research, reads reviews, sits through demos, and builds vendor relationships over weeks. An agent picking a stack reaches for the tool it can use without friction - fully API-accessible from the first interaction, programmable, no human-in-the-loop required to get started. The decision happens in a planning loop, <em><strong>not a buying committee,</strong></em> and it is based entirely on what is available and what works.</p><p>The developer tools that understand this are not treating agent-readiness as a roadmap item for next year. 
They are rebuilding the front door - signup, access, billing, error handling - as if the first user through it might be a system rather than a person. <strong>That is not a small change in how a product is built; it is a change in who the product is built for.</strong></p><p>The companies that move fastest here don&#8217;t just acquire agent users. They get <strong>chosen</strong> first, repeatedly, at scale, by systems that don&#8217;t browse review sites or respond to outbound sequences. That is a fundamentally different kind of product-market fit than anything B2B SaaS has optimized for, and the window to build for it before it becomes table stakes is closing faster than most roadmaps reflect.</p><p>Your best future customer might not be human&#8230; the question is whether your product is aware of that.</p>]]></content:encoded></item><item><title><![CDATA[What happens when your AI tokens talk?]]></title><description><![CDATA[AI inference data is quietly going to become the most unforgiving performance review in history]]></description><link>https://www.piyush.cc/p/what-happens-when-your-ai-tokens</link><guid isPermaLink="false">https://www.piyush.cc/p/what-happens-when-your-ai-tokens</guid><dc:creator><![CDATA[piyush sagar mishra]]></dc:creator><pubDate>Tue, 24 Feb 2026 01:30:22 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/ce87fac3-deff-4fd8-b69d-a769aad93490_1536x1024.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Disclaimer: The views here are my own and do not represent my employer or anyone else.</p><p>Tom Tunguz wrote something <a href="https://tomtunguz.com/inference-as-compensation/">quietly important</a> last week - his inference spend went from $7k to $100k annualized in two quarters, and he framed it as the emergence of a fourth compensation component: salary, bonus, equity, and now <strong>tokens</strong>. 
</p><p>As i finished reading the article, thinking about how inference/tokens are potentially turning into soft-compensation data, it also occurred to me that <strong>for the first time in history, thinking is leaving a trace.</strong></p><p>Every prior attempt to measure knowledge work collapsed into proxies, outputs, and a lot of theater. For example, OKRs measured what you shipped, not how you thought. Stack rankings, something i continue to <a href="https://news.ycombinator.com/item?id=23860510">detest</a> to date, measured perception as much as performance. Even the most sophisticated people analytics tools were fundamentally backward-looking, reconstructing cognitive effort from artifacts that were already weeks old by the time anyone looked at them. </p><p><strong>The work itself, the actual motion of a mind engaging with a problem, left nothing behind. Inference spend changes the texture of that problem in a way that&#8217;s easy to underestimate.</strong> </p><p>For the growing layer of work that runs through observable infrastructure, what is emerging is a <strong>real-time record of cognitive activity</strong>:</p><ul><li><p>A solutions engineer who spends three hours in Claude iterating on a technical proof-of-concept, running fifteen variations before landing on the right architecture, leaves a very different token signature than someone who generates one response and pastes it into a deck unchanged</p></li><li><p>Or, a marketing strategist who uses AI to pull competitive intelligence, stress-test messaging against six different buyer personas, and rewrite a positioning brief four times before sending it to the CMO looks nothing like someone who asks AI to clean up a paragraph</p></li></ul><p>The signature reveals how fast someone moves from problem to action, how deeply they iterate, and whether their tool usage reflects genuine problem-solving or the appearance of it. 
<strong>These signals are imperfect, but once signals exist, organizations find ways to use them</strong> (e.g. GitHub commit logs for engineering productivity, <a href="https://medium.com/@duncanjwatts/the-organizational-spectroscope-7f9f239a897c">email response time</a> as a performance and team satisfaction signal, <a href="https://www.ciodive.com/news/slack-metrics-engagement-productivity/588426/">Slack activity metrics</a> becoming a management tool, and so on).</p><p><strong>This lineage adds a lot of resolution to the gap between perceived and actual contribution.</strong> Every organization carries this gap: between the person whose perceived contribution matched their actual output, and the person who was believed to be indispensable based on presence, confidence, and the complexity they added to simple things. That gap persisted because knowledge work was unobservable. Inference data doesn&#8217;t close it overnight, but it introduces a pressure that compounds quietly over time.</p><p>I think this matters most for the <strong>knowledge worker whose value lived in the white space, the person who coordinated, held institutional memory, and shaped decisions without owning them</strong>. That contribution is real, but it was evaluated mainly through social proof rather than evidence, and opacity was very protective. It&#8217;s becoming something you now have to actively maintain.</p><p><strong>The other, deeper irony</strong> worth talking about is that the same AI infrastructure compressing the cost of execution is simultaneously making the quality of judgment more legible than it has ever been. These two forces are usually discussed separately, one as a productivity story and the other as a surveillance concern, but they are the same phenomenon viewed from different angles. 
The token is both the unit of production and the unit of measurement, and that duality is entirely new.</p><p>Tom got to 12% of his original inference cost over a weekend with identical performance. <strong>The person still burning $100K in tokens had better be producing something meaningfully different, and now there are tools to check.</strong></p><p>We spent fifty years trying to measure knowledge work and failed because thinking left no trace. The token doesn&#8217;t solve that completely, but it solves enough that the underlying assumptions of how careers are built and how value is attributed are quietly due for revision. Worth thinking carefully about what those tokens will say about you &#129302;</p>]]></content:encoded></item><item><title><![CDATA[Your attribution model is perfectly measuring the wrong journey]]></title><description><![CDATA[Best-in-class B2B teams have never been more sophisticated at measuring something that matters less and less]]></description><link>https://www.piyush.cc/p/your-attribution-model-is-perfectly</link><guid isPermaLink="false">https://www.piyush.cc/p/your-attribution-model-is-perfectly</guid><dc:creator><![CDATA[piyush sagar mishra]]></dc:creator><pubDate>Sat, 21 Feb 2026 11:07:02 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/5ae424a1-6f66-4374-84b4-23c25c6be746_1264x644.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p><em>Disclaimer: The views here are my own and do not represent my employer or anyone else.</em></p><p>Marketing attribution has always been a mess, but it was a manageable mess. You had a finite number of channels, a rough methodology, and an unspoken agreement across the organization not to look too hard at the math &#129323;. 
Multi-touch models distributed credit across touchpoints, position-based models weighted the first and last interactions more heavily, time-decay models favored recency, and the most sophisticated teams experimented with causal and incrementality frameworks to get closer to the truth. Everyone had a preferred model, and everyone moved on. Budgets got allocated, campaigns got greenlit, and the imperfection was tolerable because at least it was consistent.</p><p>AI has broken that truce, and it has done it in several directions at once. </p><p><strong>Let&#8217;s start with a buyer journey you and I can relate to.</strong></p><p>A VP at a mid-market company starts researching your category. She opens ChatGPT and asks it to summarize the competitive landscape. Your brand appears in the output, framed in language you didn&#8217;t write and can&#8217;t control <em>yet</em>, and she forms an initial impression before ever touching anything your team can measure. That interaction never appears in your attribution stack; it happened, it mattered, and to every platform in your revenue stack it simply doesn&#8217;t exist.</p><p>A week later she visits your website after a colleague mentions your company in a Slack channel. She lands on a page dynamically generated by your new AI content engine based on her firmographic profile. She spends four minutes on it, reads the case study section, and scrolls to pricing. Your CMS logs the session but has no way of knowing the experience she had was entirely different from what your last visitor saw. The personalization worked, but the signal is lost.</p><p>Your AI SDR then sends her a sequence two days later, triggered by an intent signal from a third-party provider. 
<strong>She doesn&#8217;t reply, but she reads it carefully and forwards it to a colleague with a note saying &#8220;this is exactly what we need.&#8221; That forward, arguably the most valuable signal in the entire journey, generates no data whatsoever.</strong> Three weeks later, after reading an AI-summarized recap of a virtual panel <em>she never actually watched live </em>because the agenda felt too boring, she books a discovery call. Your W-shaped or time-decay attribution model assigns the majority of credit to the content syndication touchpoint that happened to fire the day before she converted.</p><p><strong>That is not a hypothetical anymore. That is Tuesday.</strong></p><p>The most sophisticated revenue teams today have obviously moved well beyond single-touch models. Not promoting any of these, but several platforms such as Hockeystack, Dreamdata, and Segmentstream have done some really good work around account-level journey mapping, connecting CRM data to multi-touch influence across buying committees rather than individual contacts. The shift from contact-level to account-level attribution was the right move, and the teams that made it early are measurably better at understanding pipeline than those still working at the lead level.</p><p>But pretty much all major platforms are running into a structural wall that better tooling alone cannot solve. They can track what happens inside the observable infrastructure (your website, your CRM, your ad platforms, your marketing automation, and so on). What they cannot see is the rapidly expanding layer of AI-mediated research that happens before a buyer ever touches anything you own. Estimates from research firms vary, but the median range suggests B2B buyers complete somewhere around 60-70 percent, in some cases up to 75 percent, of their evaluation before engaging with a vendor directly. 
AI tools are accelerating that shift, and the gap between where influence actually happens and where your attribution platform looks for it is widening every quarter.</p><p>The deeper problem is a closed loop that is going to compound over time. <strong>AI-powered attribution platforms are being used to measure journeys that are increasingly shaped by AI-generated touchpoints.</strong> The model tells you what&#8217;s working, you invest more in it, the model gets trained on that investment pattern, and the cycle reinforces itself regardless of whether the underlying causal logic holds. An account-level journey platform might correctly identify that accounts engaging with your thought leadership content convert at a higher rate. What it cannot tell you is whether your thought leadership is influencing those accounts, or whether accounts that were already inclined to buy are simply more likely to consume content along the way. The correlation is real, but the causality is still pretty much assumed.</p><h4><strong>The reframe i think matters more</strong></h4><p>The instinct when measurement breaks down is to find a better measurement tool; that instinct is understandable but increasingly insufficient. <strong>The more durable shift is a different philosophy entirely - one that accepts more uncertainty at the individual touchpoint level while deliberately getting sharper at the account and revenue level.</strong></p><p>What does that actually mean in practice? It means treating account-level engagement velocity as a more reliable signal than any individual touchpoint. When multiple stakeholders at a target account are consuming content, responding to outreach, and engaging with your SDR motion within the same thirty-day window, that cluster of signals tells you something meaningful that no single attributed touchpoint can. 
<strong>The question worth asking isn&#8217;t &#8220;which channel sourced this opportunity&#8221; but &#8220;what combination of signals, across which roles, over what time horizon, correlates with accounts that close and then expand.</strong>&#8221;</p><p>It also means reorienting around <strong>outcomes you can measure with higher confidence rather than influence you can only approximate.</strong> Demand lift from ICP, pipeline velocity, and the rate at which accounts move from first engagement to opportunity to close are more honest than channel attribution because they reflect the aggregate effect of everything marketing did rather than a model&#8217;s best guess at decomposing it. Accounts that experience coordinated, multi-threaded engagement across marketing, SDR, and content tend to move faster. That observation is actionable even though as of today you can&#8217;t fully attribute why.</p><p>It also means figuring out innovative ways to invest in qualitative feedback loops that no platform can replace. Win-loss interviews conducted within two weeks of a decision remain one of the richest sources of influence data available to any marketing team - what buyers say shaped their decision, which competitors they seriously evaluated, where they first heard about you, and so on. Most teams treat these as a nice-to-have, but the best ones are building systematic programs around them in close partnership with sales.</p><p>Unfortunately, i know very few teams doing genuinely good work here - and the ones that are have stopped arguing about which metrics to pick for annual planning, or how to combine MMM, MTA and other frameworks to do channel allocation in isolation.</p><h4><strong>The budget meeting nobody is ready for</strong></h4><p>Q1 is about to end, and before we know it, we will be in Q3. 
At that point, most marketing teams will start thinking about FY2027 planning with attribution data that is confidently directional but structurally incomplete in ways they cannot detect. The channels that receive the most credit will be the ones easiest to observe, not necessarily the ones doing the most work. There will be an ask from the CMO to cut the budget, and when it comes to annual budget reductions, no one will care about the attribution model anyway - the investments most at risk of being cut will be the ones operating furthest from conversion, building the brand presence and thought leadership that shapes AI summaries, peer conversations, and dark funnel research months before a buyer ever raises their hand.</p><p>The measurement gap is real and it will not be closed by the next generation of attribution tooling alone, at least not in the near term. </p><p>I am getting back to the world of marketing strategy and analytics after a bit of focus on Growth and Innovation. And, i am very excited to build novel approaches for thinking about where $$ should go and how it should be tracked, and to validate some of the <a href="https://www.piyush.cc/p/while-the-world-waits-for-agi-lets">new world hypotheses</a> i have, along with my marketing, sales, and broader GTM friends. 
</p><p>Will share more as i learn more.</p>]]></content:encoded></item><item><title><![CDATA[Why are enterprise GenAI deployments hard]]></title><description><![CDATA[It's not the AI - it's the knowledge layer]]></description><link>https://www.piyush.cc/p/why-are-enterprise-genai-deployments</link><guid isPermaLink="false">https://www.piyush.cc/p/why-are-enterprise-genai-deployments</guid><dc:creator><![CDATA[piyush sagar mishra]]></dc:creator><pubDate>Tue, 20 Jan 2026 12:03:03 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!c4xS!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0fe73097-e747-4350-9491-20ad7b50a97f_902x984.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Most enterprises have a similar GenAI story - a senior exec sees a demo on Twitter and declares &#8220;we need this for our business.&#8221; A team builds a prototype in two weeks that looks incredible.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!5WWi!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1dad1287-4a7f-4a2b-913b-1eb537ab6bbd_902x952.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!5WWi!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1dad1287-4a7f-4a2b-913b-1eb537ab6bbd_902x952.png 424w, https://substackcdn.com/image/fetch/$s_!5WWi!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1dad1287-4a7f-4a2b-913b-1eb537ab6bbd_902x952.png 848w, 
https://substackcdn.com/image/fetch/$s_!5WWi!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1dad1287-4a7f-4a2b-913b-1eb537ab6bbd_902x952.png 1272w, https://substackcdn.com/image/fetch/$s_!5WWi!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1dad1287-4a7f-4a2b-913b-1eb537ab6bbd_902x952.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!5WWi!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1dad1287-4a7f-4a2b-913b-1eb537ab6bbd_902x952.png" width="902" height="952" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/1dad1287-4a7f-4a2b-913b-1eb537ab6bbd_902x952.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:952,&quot;width&quot;:902,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:200055,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://www.piyush.cc/i/184883371?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1dad1287-4a7f-4a2b-913b-1eb537ab6bbd_902x952.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!5WWi!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1dad1287-4a7f-4a2b-913b-1eb537ab6bbd_902x952.png 424w, https://substackcdn.com/image/fetch/$s_!5WWi!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1dad1287-4a7f-4a2b-913b-1eb537ab6bbd_902x952.png 848w, 
https://substackcdn.com/image/fetch/$s_!5WWi!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1dad1287-4a7f-4a2b-913b-1eb537ab6bbd_902x952.png 1272w, https://substackcdn.com/image/fetch/$s_!5WWi!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1dad1287-4a7f-4a2b-913b-1eb537ab6bbd_902x952.png 1456w" sizes="100vw" fetchpriority="high"></picture></div></a></figure></div><p>Three months later, it&#8217;s still not in production. 
The team is firefighting issues nobody anticipated, and the business is wondering why this &#8220;simple&#8221; AI project - the kind that goes live on Twitter in a week - has consumed so much time. </p><p>GenAI models are powerful, but they know nothing about your business. They don&#8217;t know your products, customers, processes, or institutional knowledge. That information has to be found, organized, kept current, and delivered to the AI at exactly the right moment.</p><p><strong>This is the knowledge layer problem. And it&#8217;s the #1 reason production deployments take longer than anyone expects.</strong></p><p>The pattern is familiar across enterprise technology. New capabilities arrive with impressive demos. Then teams discover the real work is in integration, data, and processes - not the technology itself. </p><p><strong>The technology always works. The integration work is where teams spend their time.</strong></p><h3><strong>We&#8217;ve seen this movie before, haven&#8217;t we?</strong></h3><p>There&#8217;s a pattern in technology that keeps repeating - a powerful new capability arrives and early demos are magical. Then reality sets in, and the bottleneck turns out to be something unglamorous that has nothing to do with the technology itself.</p><p>The internet democratized information, but we got more confusion alongside more access. Cloud computing made servers trivial to provision, but companies found their AWS bills spiraling and their systems just as complex as before.</p><p>GenAI is following this exact path. The models are incredible. <strong>Using them in production requires solving a massive knowledge management problem that most companies didn&#8217;t know they had. </strong></p><p>We&#8217;re calling this &#8220;AI deployment&#8221; but the AI is actually the easiest part. <strong>What we&#8217;re really doing is confronting decades of poor knowledge management. 
The AI just makes the problem impossible to ignore.</strong></p><h3><strong>The five knowledge layer problems breaking GenAI</strong></h3><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!NjU9!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc496e572-b65f-4179-a3e7-fc5aea740665_892x540.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!NjU9!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc496e572-b65f-4179-a3e7-fc5aea740665_892x540.png 424w, https://substackcdn.com/image/fetch/$s_!NjU9!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc496e572-b65f-4179-a3e7-fc5aea740665_892x540.png 848w, https://substackcdn.com/image/fetch/$s_!NjU9!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc496e572-b65f-4179-a3e7-fc5aea740665_892x540.png 1272w, https://substackcdn.com/image/fetch/$s_!NjU9!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc496e572-b65f-4179-a3e7-fc5aea740665_892x540.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!NjU9!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc496e572-b65f-4179-a3e7-fc5aea740665_892x540.png" width="892" height="540" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/c496e572-b65f-4179-a3e7-fc5aea740665_892x540.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:540,&quot;width&quot;:892,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!NjU9!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc496e572-b65f-4179-a3e7-fc5aea740665_892x540.png 424w, https://substackcdn.com/image/fetch/$s_!NjU9!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc496e572-b65f-4179-a3e7-fc5aea740665_892x540.png 848w, https://substackcdn.com/image/fetch/$s_!NjU9!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc496e572-b65f-4179-a3e7-fc5aea740665_892x540.png 1272w, https://substackcdn.com/image/fetch/$s_!NjU9!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc496e572-b65f-4179-a3e7-fc5aea740665_892x540.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path 
d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><h4><strong>(1) your knowledge lives everywhere (and nowhere)</strong></h4><p>Companies look at their documentation and think &#8220;Oh, the knowledge problem is solved. We have Confluence, CRM, support tickets. Just point the AI at it.&#8221;</p><p>This is like looking at books thrown across different buildings with no catalog system and thinking you have an organized library.</p><p>What usually ends up happening is that the product documentation in confluence hasn&#8217;t been updated since the last release eight months ago, while the latest features exist only in Jira tickets and Slack channels.</p><p>Sales methodology is scattered across google drive with 47 folders. Files named &#8220;Sales Playbook Final v3&#8221; and &#8220;Final ACTUALLY FINAL.&#8221; Nobody knows which is current. Different regions have contradicting versions.</p><p>Customer success processes are in Notion. But the real knowledge - the workarounds that work - lives in Slack. 
</p><blockquote><p>&#8220;Ignore the official process, here&#8217;s what you actually do.&#8221;</p></blockquote><p>Pricing is in spreadsheets by region. Special exceptions in email threads. Enterprise deals have custom terms in PDFs in someone&#8217;s drive.</p><p>Marketing campaign performance is split between google analytics, hubspot, salesforce, and that custom tableau or looker dashboard. Campaign strategies? Buried in quarterly review slides nobody looks at again.</p><p>Legal policies are in sharepoint or confluence behind access controls. Product roadmap exists in four places with four different versions.</p><p><strong>Your GenAI needs all of this to answer one question. But first you need to connect 15 systems, each with its own API and authentication, figure out what&#8217;s current, resolve conflicts, and handle the fact that half of them don&#8217;t have usable APIs at all.</strong></p><p>You spend three months getting access. Three more building connectors. Then you discover the real knowledge isn&#8217;t in any system - it&#8217;s in people&#8217;s heads and slack conversations.</p><h4><strong>(2) when you do find it, it contradicts itself</strong></h4><p>Product marketing says you integrate with &#8220;all major CRM systems.&#8221; Technical docs list salesforce, hubspot, pipedrive. Sales materials say &#8220;salesforce and others.&#8221; A case study mentions custom dynamics integration as well.</p><p><strong>Which version should your AI tell customers?</strong></p><p>Your return policy says 30 days on the website. Support knowledge base says 30 for standard, 90 for enterprise. An internal memo extended it to 60 for a promotion that maybe ended. Customer success tells people 45 days as a compromise.</p><p>These contradictions exist everywhere. Policies change faster than documentation. Different teams optimize for different things. Special cases create exceptions never properly codified.</p><p>Humans navigate this through judgment and knowing who to ask. 
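</p><p>A pipeline can&#8217;t resolve these disagreements by judgment, but it can at least surface them instead of letting retrieval silently pick one source. A minimal sketch - the topics, sources, and values below are invented for illustration:</p>

```python
from collections import defaultdict

# Hypothetical snapshot of "the same fact" as four systems state it.
# In a real audit these rows would be extracted from the website,
# knowledge base, internal memos, and customer-success scripts.
statements = [
    {"topic": "refund_window_days", "source": "website",       "value": 30},
    {"topic": "refund_window_days", "source": "support_kb",    "value": 90},
    {"topic": "refund_window_days", "source": "internal_memo", "value": 60},
    {"topic": "refund_window_days", "source": "cs_scripts",    "value": 45},
    {"topic": "crm_integrations",   "source": "tech_docs",  "value": "salesforce, hubspot, pipedrive"},
    {"topic": "crm_integrations",   "source": "sales_deck", "value": "salesforce and others"},
]

def find_conflicts(statements):
    """Group statements by topic; flag topics where sources disagree."""
    by_topic = defaultdict(list)
    for s in statements:
        by_topic[s["topic"]].append(s)
    conflicts = {}
    for topic, group in by_topic.items():
        values = {str(s["value"]) for s in group}
        if len(values) > 1:  # more than one distinct answer = conflict
            conflicts[topic] = {s["source"]: s["value"] for s in group}
    return conflicts

for topic, sources in find_conflicts(statements).items():
    print(f"CONFLICT on {topic}: {sources}")
```

<p>The hard part in practice is the grouping itself - deciding that a web page, a memo, and a CS script are all talking about the same fact - but even a crude check like this turns invisible contradictions into a reviewable list someone can be made to own.</p><p>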
Your AI sees conflicting sources and guesses. <strong>The AI isn't always hallucinating - it's accurately reporting the conflicting mess you've been papering over with &#8220;just ask Sarah&#8221; for three years.</strong></p><p><strong>Most companies don&#8217;t have knowledge; they have information.</strong> Knowledge implies organization and a single source of truth. What they have is a pile of information with no systematic way to separate signal from noise. So, every company thinks their knowledge problem is &#8220;we need better search.&#8221; The real problem is you're searching through garbage.</p><p>Building a knowledge layer means someone makes hard decisions about what&#8217;s true and authoritative. This isn&#8217;t technical. It&#8217;s organizational - confronting how poorly managed your information actually is.</p><h4><strong>(3) keeping it current is harder than building it</strong></h4><p>Let&#8217;s say you do get brave and manage to clean up the contradictions, create one authoritative knowledge base, and now everything is consistent and up to date.</p><p>It&#8217;s out of date in a week.</p><p>Product launches a feature &#8594; pricing changes &#8594; legal updates a policy &#8594; you close a deal with special terms that support needs to know about.</p><p>There&#8217;s no pull request for &#8220;update the AI&#8217;s understanding of our pricing.&#8221; No automated test to verify the knowledge base reflects reality. No deployment pipeline.</p><p>Instead, someone is supposed to remember to update confluence, training materials, the support knowledge base, tell the AI team, update sales decks, and notify everyone. Except they&#8217;re busy. They forget. Or they update one place but not the others.</p><p>The product team launches a feature and updates their docs. They don&#8217;t tell the AI team. Two months later, a customer asks your AI about it. The AI says it doesn&#8217;t exist. 
The customer emails support saying your AI is useless.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!c4xS!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0fe73097-e747-4350-9491-20ad7b50a97f_902x984.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!c4xS!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0fe73097-e747-4350-9491-20ad7b50a97f_902x984.png 424w, https://substackcdn.com/image/fetch/$s_!c4xS!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0fe73097-e747-4350-9491-20ad7b50a97f_902x984.png 848w, https://substackcdn.com/image/fetch/$s_!c4xS!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0fe73097-e747-4350-9491-20ad7b50a97f_902x984.png 1272w, https://substackcdn.com/image/fetch/$s_!c4xS!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0fe73097-e747-4350-9491-20ad7b50a97f_902x984.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!c4xS!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0fe73097-e747-4350-9491-20ad7b50a97f_902x984.png" width="902" height="984" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/0fe73097-e747-4350-9491-20ad7b50a97f_902x984.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:984,&quot;width&quot;:902,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:409750,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.piyush.cc/i/184883371?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0fe73097-e747-4350-9491-20ad7b50a97f_902x984.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!c4xS!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0fe73097-e747-4350-9491-20ad7b50a97f_902x984.png 424w, https://substackcdn.com/image/fetch/$s_!c4xS!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0fe73097-e747-4350-9491-20ad7b50a97f_902x984.png 848w, https://substackcdn.com/image/fetch/$s_!c4xS!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0fe73097-e747-4350-9491-20ad7b50a97f_902x984.png 1272w, https://substackcdn.com/image/fetch/$s_!c4xS!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0fe73097-e747-4350-9491-20ad7b50a97f_902x984.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 
20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>Or marketing runs a promotion changing pricing, updates the website, but forgets the knowledge base. Your AI quotes old pricing to prospects for three weeks.</p><p>This happens constantly. The knowledge layer degrades daily as the business evolves faster than documentation.</p><p>By the time teams reach stage 3 (if they reach it at all), they realize that keeping a knowledge layer <em>current</em> is harder than <em>building</em> it. It requires processes, ownership, and ongoing maintenance. Most companies are terrible at this even without AI.</p><h4><strong>(4) different teams have different documentation standards</strong></h4><p>Engineering documents well. They have to. Code breaks if documentation doesn&#8217;t match. 
They use git, track changes, write specs, and all the other good stuff.</p><p><strong>Sales and marketing are the opposite.</strong> And this is where customer-facing AI struggles the most with information.</p><p>Marketing launches campaigns constantly. Where&#8217;s the documentation of what worked? Scattered across post-mortem slides that were looked at once. Campaign briefs in one place, creative assets elsewhere, performance data in a third system, strategic reasoning nowhere.</p><p>Sales is not much better (umm, sorry my friends). Deal info is in salesforce but incomplete. Reps log the minimum required. The real information - why deals were won or lost - is in their heads or in email threads that never make it into the CRM.</p><p>You want your AI to suggest sales strategies based on similar deals? The information doesn&#8217;t exist in consumable form. You have data points - deal size, industry, timeline. Not the story - what customers cared about, what objections arose, what messaging resonated. Getting at that story is possible, but integrating it is hard. Maintaining that integration is harder.</p><p>In the past a lot of teams got away with not documenting because they worked through relationships and conversations. Engineering can&#8217;t because code either works or doesn&#8217;t. <strong>Sales and marketing could because success is fuzzier and knowledge is personal.</strong></p><p>GenAI is forcing these teams to document things they never had to. They&#8217;re discovering it&#8217;s incredibly hard to retroactively capture years of institutional knowledge that only exists in heads.</p><h4><strong>(5) retrieval is a really hard optimisation problem</strong></h4><p>Even if your knowledge is centralized, consistent, current, and comprehensive - you still have retrieval.</p><p>Your knowledge base has 10,000+ documents. 
Someone asks: &#8220;What&#8217;s our refund policy for enterprise customers in europe?&#8221;</p><p>Your system must understand what they&#8217;re really asking, identify that &#8220;enterprise&#8221; means specific customer segments, recognize that &#8220;Europe&#8221; might have regional requirements, search 10,000 documents, find the right policy (not consumer returns, not general terms), find the section on geographical variations, check for recent updates, rank by relevance, assemble into context within token limits, and hope it&#8217;s right.</p><p>This fails constantly:</p><p>You retrieve a pricing document mentioning refunds in passing. It matches search terms but isn&#8217;t about refunds. The AI uses it and gives incomplete answers. </p><p>You miss the actual policy titled &#8220;Customer success remediation guidelines&#8221; that doesn&#8217;t say &#8220;refund&#8221; prominently.</p><p>You get the right document but the wrong section - the policy differs for hardware vs software. The question didn&#8217;t specify.</p><p>You retrieve too much context and hit token limits. You drop the section about european requirements because it seemed less relevant, but that was the key part.</p><p>The policy was updated three months ago but the old version remains. Your retrieval finds both. The AI picks the wrong one.</p><p>Companies spend months tuning retrieval - trying different embedding models, adjusting ranking algorithms, experimenting with chunking strategies. You fix retrieval for one question type and break it for another. The issue is that you can't patch institutional knowledge debt with a better embedding model. <strong>This is like trying to fix a hoarder's house with a better filing system.</strong></p><p>And what nobody talks about is that even with perfect retrieval, you&#8217;re constrained by context windows. 
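</p><p>You can see the trade-off in even the simplest context assembler: rank the chunks, then pack them until the budget runs out. A sketch with made-up relevance scores and token counts - whatever falls below the cut line is invisible to the model, however important it was:</p>

```python
def assemble_context(chunks, token_budget):
    """Greedily pack the highest-scoring chunks into a fixed token budget.

    chunks: list of (score, tokens, text) tuples, scores from a retriever.
    Anything that doesn't fit is silently dropped - which is exactly how
    a key section (say, the European variant of a policy) goes missing.
    """
    picked, used = [], 0
    for score, tokens, text in sorted(chunks, key=lambda c: -c[0]):
        if used + tokens <= token_budget:
            picked.append(text)
            used += tokens
    return picked

# Hypothetical retrieval results for "enterprise refund policy in Europe".
chunks = [
    (0.91, 700, "Enterprise refund policy (general terms)"),
    (0.88, 650, "Pricing doc that mentions refunds in passing"),
    (0.84, 500, "Consumer returns FAQ"),
    (0.79, 600, "EU-specific refund requirements"),  # the part that mattered
]

context = assemble_context(chunks, token_budget=2000)
print(context)  # the EU section doesn't make the cut
```

<p>Every knob you can turn - chunk size, ranking, budget - just moves which failure you get, because the ranking that decides what survives has no idea which sentence actually answers the question.</p><p>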
This is definitely getting better, but for at least the next 12-18 months you&#8217;re going to be trading comprehensive context against some constraint - window size, pricing, latency.</p><h3><strong>Intelligence without knowledge is just expensive guessing </strong></h3><p>Truly understanding this takes doing a lot of enterprise AI deployments &#128071;: </p><p><em>Human expertise is valuable because humans have compressed years of knowledge into intuition. They don&#8217;t retrieve documentation every time they answer a question. They just know.</em></p><p>AI doesn&#8217;t work that way computationally - every answer needs explicit context retrieved fresh. No intuition, no compression, no shortcuts. Intelligence without internalized knowledge is inherently expensive and fragile.</p><p>Every interaction made without hardwired knowledge costs money - not just the API call, but the entire context assembly as well. Search your vector database - compute cost. Retrieve documents - storage and processing. Rank and filter - more compute. Assemble into context - 5k tokens before the AI thinks.</p><p>That $500 pilot becomes $15,000 monthly at scale. The context tax is real and unavoidable.</p><h3><strong>Knowledge is also political</strong></h3><p>Deciding what goes in the knowledge layer is political. Sales wants competitive positioning emphasized. Legal wants compliance disclaimers. Marketing wants brand consistency. Customer success wants accurate timelines over optimistic ones.</p><p>When marketing says you &#8220;work with all major platforms&#8221; and engineering says &#8220;we support these three integrations,&#8221; which is authoritative?</p><p><strong>The dirty secret is that companies don&#8217;t have a single source of truth because different groups benefit from ambiguity.</strong> Sales can promise features &#8220;on the roadmap&#8221; without committing. 
Marketing can claim capabilities that technically exist but aren&#8217;t production-ready. The reason sales doesn't document is the same reason they'll resist AI: the ambiguity is a feature, not a bug. Pinning things down reduces flexibility.</p><p>Making this explicit forces conversations organizations have avoided for years. The AI project stalls not from technical problems but because nobody wants to resolve the underlying ambiguity.</p><h3><strong>So how are some companies making it work so far?</strong></h3><p>The companies succeeding aren&#8217;t boiling the ocean; they&#8217;re picking very narrow domains where knowledge is manageable. Instead of &#8220;answer any question,&#8221; they start with &#8220;help sales reps with pricing questions.&#8221; The knowledge layer for pricing is finite - 50 documents that change quarterly. They can keep this current.</p><p>They&#8217;re realistic about accuracy: they build human review into important decisions, accept that the AI will sometimes be wrong, and design workflows that catch or mitigate errors.</p><p>They (and the <strong>leaders</strong> in the company) invest as much in knowledge management as in AI, create ownership for the knowledge base, build update processes, and treat documentation as a product requiring ongoing investment.</p><p><strong>Most importantly, they stop calling it an AI project. They call it what it is: a knowledge management project that happens to use AI.</strong></p><p><strong>GenAI deployment is exposing what companies ignored for years: terrible knowledge management.</strong> Information scattered across incompatible systems. Documentation contradicting itself. Nobody knowing what&#8217;s current. Critical knowledge only in people&#8217;s heads.</p><p>This was always a problem. <strong>You got away with it because humans navigate ambiguity through relationships.</strong> AI can&#8217;t. 
It needs explicit, structured, current, consistent information.</p><p>GenAI is forcing companies to confront their knowledge management debt. The bill is higher than anyone expected. The demos work because oftentimes they use curated information in controlled settings. Production fails because reality is messy and nobody solved the underlying knowledge problem. We&#8217;re not in an AI deployment crisis. We&#8217;re in a knowledge management crisis that AI made impossible to ignore.</p><p>i have said this before as well - companies that succeed won&#8217;t have the best models or biggest budgets. They&#8217;ll be the ones that finally do the unglamorous work of organizing and maintaining their institutional knowledge.</p><p>The AI is the easy part. Everything else is hard.</p>]]></content:encoded></item><item><title><![CDATA[How does SaaS evolve when building takes a weekend]]></title><description><![CDATA[How AI compressed the SaaS evolution cycle from years to weeks, and what it means for the future]]></description><link>https://www.piyush.cc/p/how-does-saas-evolve-when-building</link><guid isPermaLink="false">https://www.piyush.cc/p/how-does-saas-evolve-when-building</guid><dc:creator><![CDATA[piyush sagar mishra]]></dc:creator><pubDate>Sat, 17 Jan 2026 06:07:47 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/21806ba5-16a4-4d60-a8b9-122da9e5e6d4_1536x1024.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Aaron's observation <a href="https://x.com/levie/status/2009318318084403472?s=20">here</a> captures a shift I've watched unfold firsthand. When I started working in tech in the early 2010s, building software meant something completely different than it does today. 
I watched engineering teams routinely spend months building internal tools because packaged software never quite fit their specific needs.<br></p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!124G!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6affd54d-e693-4969-a20c-a291e0f0655f_938x536.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!124G!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6affd54d-e693-4969-a20c-a291e0f0655f_938x536.png 424w, https://substackcdn.com/image/fetch/$s_!124G!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6affd54d-e693-4969-a20c-a291e0f0655f_938x536.png 848w, https://substackcdn.com/image/fetch/$s_!124G!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6affd54d-e693-4969-a20c-a291e0f0655f_938x536.png 1272w, https://substackcdn.com/image/fetch/$s_!124G!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6affd54d-e693-4969-a20c-a291e0f0655f_938x536.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!124G!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6affd54d-e693-4969-a20c-a291e0f0655f_938x536.png" width="938" height="536" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/6affd54d-e693-4969-a20c-a291e0f0655f_938x536.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:536,&quot;width&quot;:938,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:144382,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://www.piyush.cc/i/184329454?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6affd54d-e693-4969-a20c-a291e0f0655f_938x536.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!124G!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6affd54d-e693-4969-a20c-a291e0f0655f_938x536.png 424w, https://substackcdn.com/image/fetch/$s_!124G!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6affd54d-e693-4969-a20c-a291e0f0655f_938x536.png 848w, https://substackcdn.com/image/fetch/$s_!124G!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6affd54d-e693-4969-a20c-a291e0f0655f_938x536.png 1272w, https://substackcdn.com/image/fetch/$s_!124G!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6affd54d-e693-4969-a20c-a291e0f0655f_938x536.png 1456w" sizes="100vw" fetchpriority="high"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" 
viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>Take project management for example - Microsoft project and Basecamp existed and worked fine for traditional project planning. But engineering teams doing agile development needed something quite different - sprint planning, story points, burndown charts, continuous deployment tracking. The existing tools were built for waterfall, not iterative development. So I saw teams build their own: custom jira plugins, homegrown dashboards, internal sprint management systems.</p><p>By 2012-2013, enough companies had built similar solutions that entrepreneurs recognized the pattern. Atlassian evolved jira to support agile workflows. Asana and Monday.com launched with native sprint support. The custom solutions had revealed what the market actually wanted. 
This first cycle took roughly 5-7 years from widespread custom building (2006-2011) to mature packaged alternatives (2012-2014).</p><p>Then the cycle repeated: By 2016-2017, even these agile-native tools weren&#8217;t enough. Teams needed cross-functional coordination - linking github commits to jira tickets to production incidents to slack notifications. So they built custom integrations again: bots, webhooks, Zapier workflows, internal dashboards pulling data from five different tools. The pattern emerged again, and by 2019-2020, companies such as Linear, Height, and Shortcut launched as &#8220;engineering-first&#8221; tools with native git integration and workflow automation built in. Another 5-year cycle.</p><p><strong>For decades, this is how SaaS evolved through a predictable cycle</strong>: customers built custom solutions to fill gaps, entrepreneurs spotted patterns across these custom builds, and new packaged software emerged to serve the market. It was methodical and slow. That world is gone. AI is compressing these cycles from years into months - or in some cases even weeks.</p><h2><strong>The AI compression</strong></h2><p>AI hasn&#8217;t eliminated this evolutionary cycle but has dramatically accelerated it and changed how it manifests. Here&#8217;s what&#8217;s different now:</p><ul><li><p><strong>Micro-SaaS explosions are happening in weeks, not years.</strong> Someone tweets &#8220;I built a GPT wrapper for sales emails&#8221; and within days, fifty similar apps launch. The market that took years to develop now consolidates to 2-3 winners within months. Look at AI SDRs for example: companies were building custom AI email workflows in early 2023, and by late 2023, 11x, Artisan, and AiSDR had all launched as packaged products. 
Six months later, companies are already building custom layers on top again - proprietary data enrichment, vertical-specific messaging, multi-channel orchestration.</p></li><li><p><strong>Configuration is becoming the new customization.</strong> You don&#8217;t need to build custom software anymore - you configure AI behavior through prompts and context. But this creates new standardization opportunities. Someone will package &#8220;the 50 best Claude prompts for sales teams,&#8221; people will customize those for their specific needs, and someone else will package those vertical-specific versions. The cycle runs faster because the barrier to both customization and packaging has collapsed.</p></li><li><p><strong>The cycle is oscillating continuously instead of proceeding in clean stages.</strong> Take tools built on Claude - companies build custom workflows, share what works on Twitter, others package those patterns into templates, users immediately extend those templates with their own modifications, and the cycle repeats weekly. Companies and their users are iterating together in real-time, over weeks instead of years, with each side learning from the other continuously.</p></li><li><p><strong>Distribution is becoming the moat, not product differentiation.</strong> When anyone can spin up an AI sales email tool over a weekend, Lavender, Instantly, and Smartlead all do roughly the same thing. Custom solutions using Claude and a CSV file work fine technically. But companies buy Lavender anyway not because it&#8217;s uniquely capable, but because enterprises trust known vendors. The cycle shifts from &#8220;build better capabilities&#8221; to &#8220;build trust and integration ecosystems.&#8221;</p></li></ul><p>The fundamental insight remains true: packaged software emerges when enough customers want the same thing. 
<strong>But &#8220;enough customers&#8221; now means dozens instead of hundreds, and &#8220;emerges&#8221; means months instead of years.</strong> The cycle hasn&#8217;t disappeared but rather fractalized into multiple layers iterating simultaneously, each moving at AI speed.</p><h2><strong>So, what happens when </strong><em><strong>building</strong></em><strong> becomes free?</strong></h2><p>We&#8217;re not just seeing faster product cycles now but also watching the <strong>line between </strong><em><strong>using software</strong></em><strong> and </strong><em><strong>building software</strong></em><strong> disappear</strong> entirely. When you can describe what you want in plain English and have it exist seconds later, the old startup question &#8220;what should we build?&#8221; stops making sense. The hard part isn&#8217;t building anymore. It&#8217;s getting people to pay attention to what you built.</p><p><strong>Software is becoming so easy to create that the code itself barely matters</strong>. What matters is whether people trust you enough to use your version instead of building their own.</p><p><strong>Think about why Salesforce still dominates CRM despite countless competitors with better features - </strong>not because Salesforce has the best UI (umm!) or the fastest performance. It&#8217;s because your sales team already knows it, your CS team has dashboards built in it, your revenue ops team has years of data there, and your finance team&#8217;s commission calculations depend on it. Switching would mean retraining 200 people, rebuilding 50 integrations, and risking your 2026 pipeline during migration. The switching cost is so high that even a meaningfully better product can&#8217;t win.</p><p>This is what &#8220;becoming the place everyone agrees to meet&#8221; means in practice. 
Products win by accumulating gravity - more users means more integrations, which means more consultants who know the tool, which means more templates and best practices, which means more new users. Once this flywheel starts, it&#8217;s almost impossible to stop.</p><p><strong>The same logic applies to trust </strong>- when your procurement team evaluates AI tools, they&#8217;re not just comparing features. They&#8217;re asking: Will this vendor exist in two years? Will they keep our data secure? Do other enterprise customers use them? Are they SOC 2 certified? Will our auditors approve this? A startup with better AI might lose to an established vendor simply because the buyer needs to justify the purchase to their boss, and &#8220;nobody ever got fired for buying [established vendor]&#8221; still holds true.</p><p>This dynamic creates several specific implications for what comes next:</p><ul><li><p><strong>Vertical software </strong><em><strong>might</strong></em><strong> disappear.</strong> Why would a hospital buy &#8220;AI medical scribe for radiology&#8221; when they can take ChatGPT Enterprise and load it with radiology protocols in an afternoon? The value shifts from vendors who hard-code industry knowledge into software to platforms that make it trivially easy to add any context. Doximity and Athenahealth spent years building healthcare-specific features. 
Now a general AI platform with the right prompt library might be good enough.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!GwX1!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4f875f4d-5e7a-4cbf-b226-5d9dd9815ffe_880x1090.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!GwX1!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4f875f4d-5e7a-4cbf-b226-5d9dd9815ffe_880x1090.png 424w, https://substackcdn.com/image/fetch/$s_!GwX1!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4f875f4d-5e7a-4cbf-b226-5d9dd9815ffe_880x1090.png 848w, https://substackcdn.com/image/fetch/$s_!GwX1!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4f875f4d-5e7a-4cbf-b226-5d9dd9815ffe_880x1090.png 1272w, https://substackcdn.com/image/fetch/$s_!GwX1!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4f875f4d-5e7a-4cbf-b226-5d9dd9815ffe_880x1090.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!GwX1!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4f875f4d-5e7a-4cbf-b226-5d9dd9815ffe_880x1090.png" width="880" height="1090" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/4f875f4d-5e7a-4cbf-b226-5d9dd9815ffe_880x1090.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1090,&quot;width&quot;:880,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:486887,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.piyush.cc/i/184329454?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4f875f4d-5e7a-4cbf-b226-5d9dd9815ffe_880x1090.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!GwX1!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4f875f4d-5e7a-4cbf-b226-5d9dd9815ffe_880x1090.png 424w, https://substackcdn.com/image/fetch/$s_!GwX1!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4f875f4d-5e7a-4cbf-b226-5d9dd9815ffe_880x1090.png 848w, https://substackcdn.com/image/fetch/$s_!GwX1!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4f875f4d-5e7a-4cbf-b226-5d9dd9815ffe_880x1090.png 1272w, https://substackcdn.com/image/fetch/$s_!GwX1!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4f875f4d-5e7a-4cbf-b226-5d9dd9815ffe_880x1090.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" 
viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div></li><li><p><strong>Professional services will become way more valuable than software.</strong> If the code is commoditized, the humans who configure it capture more margin. Look at what&#8217;s happening with Zapier and Make - the platforms are cheap, but companies pay consultants $200/hour to design the right automation workflows. As AI makes building easier, implementation and strategy consulting could become more lucrative than selling licenses. 
The <a href="https://openai.com/index/accenture-partnership/">Accentures of the world</a> might matter more than the SaaS vendors.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!DYjh!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8dff3684-dca7-4cc3-b399-3f9aee7b8135_1764x1264.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!DYjh!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8dff3684-dca7-4cc3-b399-3f9aee7b8135_1764x1264.png 424w, https://substackcdn.com/image/fetch/$s_!DYjh!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8dff3684-dca7-4cc3-b399-3f9aee7b8135_1764x1264.png 848w, https://substackcdn.com/image/fetch/$s_!DYjh!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8dff3684-dca7-4cc3-b399-3f9aee7b8135_1764x1264.png 1272w, https://substackcdn.com/image/fetch/$s_!DYjh!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8dff3684-dca7-4cc3-b399-3f9aee7b8135_1764x1264.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!DYjh!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8dff3684-dca7-4cc3-b399-3f9aee7b8135_1764x1264.png" width="1456" height="1043" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/8dff3684-dca7-4cc3-b399-3f9aee7b8135_1764x1264.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1043,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:1041797,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.piyush.cc/i/184329454?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8dff3684-dca7-4cc3-b399-3f9aee7b8135_1764x1264.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!DYjh!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8dff3684-dca7-4cc3-b399-3f9aee7b8135_1764x1264.png 424w, https://substackcdn.com/image/fetch/$s_!DYjh!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8dff3684-dca7-4cc3-b399-3f9aee7b8135_1764x1264.png 848w, https://substackcdn.com/image/fetch/$s_!DYjh!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8dff3684-dca7-4cc3-b399-3f9aee7b8135_1764x1264.png 1272w, https://substackcdn.com/image/fetch/$s_!DYjh!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8dff3684-dca7-4cc3-b399-3f9aee7b8135_1764x1264.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" 
height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div></li><li><p><strong>Integration access will become the real moat.</strong> When twenty AI SDR tools use the same underlying models, the winner is whoever has the deepest access to Salesforce, Outreach, LinkedIn, and ZoomInfo APIs. The product moves beyond AI to having OAuth connections to 50 enterprise systems and the ability to read context from all of them. This is why Salesforce and HubSpot are building AI features aggressively: they already own the data pipes.</p></li><li><p><strong>Software will become ephemeral and generated on-demand.</strong> Instead of maintaining one product for thousands of customers, why not generate a custom version for each buyer? Imagine an agentic layer that spins up a bespoke CRM for your specific sales process, generates it fresh when you sign up, and throws it away when you churn. 
Software starts looking more like consulting deliverables than traditional SaaS - built once per customer, not built once for everyone.</p></li><li><p><strong>Compliance timelines will become the biggest bottleneck.</strong> In regulated industries, getting SOC 2, HIPAA, or FedRAMP certification still takes 12-18 months no matter how fast you ship features. First movers who clear compliance hurdles get 12-month head starts that are almost impossible to overcome. By the time competitors get certified, the market has already moved to the next thing. Harvey (legal AI) and Nabla (medical AI) are winning because they got through compliance first, not because they have better models.</p></li><li><p><strong>Community becomes uncloneable.</strong> I love this one! When your product can be replicated in a weekend, the community around it can&#8217;t be. Notion&#8217;s 10,000 template creators, Airtable&#8217;s consultant network, Figma&#8217;s design system libraries - these took years to build and can&#8217;t be copied by a competitor with better features. The software becomes the excuse to join the community, not the product itself.</p></li></ul><p>I believe this evolutionary cycle will keep going. But when evolution happens this fast, we&#8217;re not watching slow adaptation over years - we&#8217;re watching new species appear every week.</p><p>The companies that survive won&#8217;t necessarily be the best builders or the fastest movers. They&#8217;ll be the ones that understand what actually matters has shifted: from &#8220;can we build this?&#8221; to &#8220;do customers trust us enough to use our version instead of the fifty others launched this month?&#8221; Building the product is table stakes now. 
Everything else - the integrations, the compliance certifications, the consultant networks, the community - is what&#8217;s going to determine who wins.</p>]]></content:encoded></item><item><title><![CDATA[While the world waits for AGI, let us get B2B funnel metrics optimized in 2026]]></title><description><![CDATA[How we built strategies around database limitations and other constraints, and what should replace them over next 24 months]]></description><link>https://www.piyush.cc/p/while-the-world-waits-for-agi-lets</link><guid isPermaLink="false">https://www.piyush.cc/p/while-the-world-waits-for-agi-lets</guid><dc:creator><![CDATA[piyush sagar mishra]]></dc:creator><pubDate>Sun, 04 Jan 2026 15:51:10 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/65f76109-6f4a-45c7-a709-4f068d7dfc6e_986x1181.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Something always bothered me about the marketing funnel - the way we built entire strategies around Lead volume, velocity, MQL-to-SQL conversion (or MQA-to-SQA, for my more sensible friends) as if these stages reflected how people buy, when really they just reflected what our systems could track.</p><p>Here&#8217;s what actually happened:</p><ul><li><p>Relational databases in the early 2000s couldn&#8217;t efficiently store unstructured relationship data, so CRM platforms adopted stage gates from traditional sales methodologies as the core data model. 
We invented <em>discrete</em> stages - MQL, SAL, SQL, Opportunity - because that&#8217;s what our systems could process and what aligned with how databases needed to structure information.</p></li><li><p>Sales teams couldn&#8217;t engage with thousands of inbound leads simultaneously, so we created lead scoring that assigned points (50 for a whitepaper, 100 for a pricing-page visit) - later graduating to more sophisticated models - as a prioritization mechanism, even though these scores often didn&#8217;t correlate well with actual purchase intent.</p></li><li><p>Marketing automation platforms had no way to generate personalized content at scale, so we created segmentation frameworks and called them &#8220;buyer personas.&#8221;</p></li></ul><p><strong>These were reasonable engineering solutions to real constraints. The problem is we forgot they were engineering solutions.</strong></p><p>We optimized everything around these constraints. Lead scoring became increasingly sophisticated - ensemble propensity models with demographic, behavioral, engagement, firmographic, and third-party signals (guilty) - but it remained just a prioritization mechanism for scarce sales capacity. Nurture tracks became elaborate multi-touch sequences, but they were still batch processing. A VP of Engineering at a fintech company and a CFO at a healthcare provider would both get the same lead score and enter the same nurture track, receiving identical emails over six weeks even though one cares about technical architecture while the other cares about ROI.</p><p>We knew this was suboptimal, but our systems couldn&#8217;t handle more. 
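</p><p>The point-based scoring described above really was this simple under the hood. Here is a minimal sketch of that constraint-era mechanism (the event weights and the MQL threshold are illustrative, not from any real platform):</p>

```python
# Constraint-era lead scoring: additive points per tracked event,
# with a fixed threshold that flips a lead to "MQL".
# Weights and threshold are hypothetical, for illustration only.
EVENT_POINTS = {
    "whitepaper_download": 50,
    "pricing_page_visit": 100,
    "webinar_attended": 75,
    "email_opened": 5,
}

MQL_THRESHOLD = 150  # arbitrary cutoff, as these systems typically had

def score_lead(events):
    """Sum points for every tracked event; unknown events score zero."""
    return sum(EVENT_POINTS.get(e, 0) for e in events)

def stage(events):
    """Collapse a rich behavioral history into a single discrete stage."""
    return "MQL" if score_lead(events) >= MQL_THRESHOLD else "Lead"

# A whitepaper download plus a pricing-page visit crosses the line,
# regardless of who the person is or why they visited.
print(stage(["whitepaper_download", "pricing_page_visit"]))  # MQL
print(stage(["email_opened", "email_opened"]))               # Lead
```

<p>Notice what the sketch cannot express: role, context, timing, or intent. That flatness, not laziness, is why the VP of Engineering and the CFO ended up in the same nurture track.</p><p>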
Platforms slowly evolved - we added personas and channel mix, built dashboards to show funnel by those dimensions - but these were incremental and siloed improvements within the same constraint-based framework.</p><p>With what I&#8217;m seeing at Twilio and elsewhere, I&#8217;m hopeful we&#8217;ll replace this narrative soon.</p><p>AI systems can now maintain complete contextual memory for thousands of accounts simultaneously - every interaction, signal, and conversation across years. They engage at whatever velocity each account needs and generate genuinely personalized narratives. The technical limitations that necessitated the funnel are disappearing.</p><p>What I&#8217;m most excited about is the shift to continuous relationship orchestration. Instead of &#8220;this account hit grade A, route to SDR,&#8221; we&#8217;re seeing systems that understand complete account context and determine optimal next actions dynamically:</p><ul><li><p>One account needs a technical architecture discussion because their engineering team is evaluating alternatives and their previous vendor implementation failed due to integration complexity.</p></li><li><p>Another needs CFO-focused ROI content because they just entered budget planning season and historically make purchasing decisions in Q4.</p></li><li><p>A third needs implementation case studies from their specific vertical because their new VP of Engineering is particularly risk-averse after a bad competitor experience.</p></li><li><p>A fourth account that&#8217;s been quiet for 6 months suddenly shows API documentation traffic from multiple IP addresses - their dev team is actively evaluating, even though no one filled out a form.</p></li></ul><p>The system knows all of this not because someone manually updated Salesforce, but because it&#8217;s maintaining continuous context across every touchpoint.</p><p><strong>What should we start measuring soon?</strong></p><p>Sales and marketing leaders still need to run businesses, 
forecast revenue, measure productivity, and justify budgets, don&#8217;t they? The question isn&#8217;t whether we measure; it&#8217;s what we measure.</p><p><strong>Here are my top metrics and ideas that I think should slowly replace the traditional funnel metrics:</strong></p><ol><li><p><strong>Stakeholder coverage</strong>: Are we connected to the right people for this account&#8217;s decision process? For enterprise deals, you need procurement, IT, security, finance, and business stakeholders. Coverage metrics should show gaps in relationship mapping. Having a heatmap that shows % of accounts with 1/2/3+ contacts is pretty &#8216;90s. So is adding <em>random</em> contacts to Salesforce from your whitepaper downloads list.</p></li><li><p><strong>Account engagement depth</strong>: Not &#8220;did they download something&#8221; but &#8220;how complete is our understanding of their buying context?&#8221; This could be measured by some sort of <strong>context completeness score</strong> - do we know their technical requirements, budget constraints, decision timeline, key stakeholders, and past evaluation patterns?</p></li><li><p><strong>Intent signal strength</strong>: This already exists in most larger B2B/SaaS companies as some form of lead scoring model, but it would be interesting to have more third-party signals feed into a real-time composite score based on actual buying behavior - industry changes, company org structure, technical documentation access, multi-stakeholder engagement, pricing page visits, API evaluation activity, competitive research patterns. <em>The sort of signals that actually correlate with purchase intent.</em></p></li><li><p><strong>Time-to-relevant-engagement</strong>: How quickly can we get the right message to the right stakeholder based on their actual needs? 
This would likely replace &#8220;time through funnel stages,&#8221; measuring speed to value rather than speed through arbitrary stages.</p></li><li><p><strong>Account readiness scores</strong>: Dynamic assessment of buying signals across the entire account, not individual lead scores. Is there budget movement? Are multiple stakeholders engaging? Is there technical evaluation activity? This would tell reps which accounts to prioritize right <em>now</em>.</p></li><li><p><strong>Pipeline quality indicators</strong>: Predictive close probability based on engagement patterns, relationship depth, and historical win rates for similar accounts - not which arbitrary stage they&#8217;re in. An account in &#8220;discovery&#8221; with strong multi-stakeholder engagement and clear budget might have higher close probability than an account in &#8220;proposal&#8221; with a single champion and unclear timeline.</p></li><li><p><strong>Revenue per account engaged</strong>: A sort of ARPU (average revenue per user), but for seller effort: how much pipeline and revenue are reps generating relative to the accounts they&#8217;re working and the AI-enabled interactions they&#8217;re making? This would be a proxy for SQL-to-opportunity conversion but would measure actual revenue efficiency.</p></li><li><p><strong>Context accuracy</strong>: How well do our systems actually understand each account&#8217;s situation? Best measured by sales feedback on AI-generated account insights, accuracy of next-best-action recommendations, and relevance of automated outreach. Over time, this can evolve to more sophisticated A/B testing with reinforcement learning from sales feedback.</p></li><li><p><strong>Relationship velocity</strong>: How quickly are we deepening relationships and moving toward purchase decisions? 
Measured by stakeholder engagement expansion, technical validation progress, and commercial discussion advancement - not movement between stages.</p></li><li><p><strong>Revenue influenced by orchestration</strong>: What percentage of closed deals had meaningful AI-driven engagement that advanced the relationship? This becomes the ROI metric for the entire system.</p></li></ol><p>So, while traditional marketing would report &#8220;generated 500 MQAs this month, 20% converted to SQL/A,&#8221; the new approach would (very simplistically) report:</p><blockquote><p>&#8220;Engaged 200 target accounts, buying committee coverage up 5% MoM, achieved meaningful stakeholder conversations with 45 accounts (22.5% interaction-to-meeting rate), identified 12 accounts showing strong intent signals (technical evaluation + multi-stakeholder engagement), sales is actively working 8 of those with average context completeness of 85% and 72% context accuracy&#8221;</p></blockquote><p><strong>For the marketing analytics and RevOps folks, the job should become more like managing a trading algorithm than helping teams run campaigns - </strong>they should be setting parameters, monitoring performance across thousands of concurrent threads, and optimizing based on signals that actually predict revenue rather than executing pre-defined sequences and counting form fills. And no more lead scoring, please.</p><p>What will stay the same is that buyers still need progressive trust-building, evaluation time, and consensus development. Complex B2B purchases won't suddenly become impulse decisions, even though we are seeing the buyer journey collapse into fewer screens and touchpoints. 
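</p><p>A number like the &#8220;context completeness&#8221; figure in that report doesn&#8217;t need exotic tooling to start with - a first cut can be a weighted checklist over what we actually know about an account. A minimal sketch (the field names and weights are hypothetical, and a real version would be tuned per segment):</p>

```python
# First-cut "context completeness" for an account: the weighted share
# of buying-context fields we actually know. Fields and weights are
# hypothetical, for illustration only.
REQUIRED_CONTEXT = {
    "technical_requirements": 0.25,
    "budget_constraints": 0.20,
    "decision_timeline": 0.20,
    "key_stakeholders": 0.20,
    "past_evaluation_patterns": 0.15,
}

def context_completeness(account):
    """Weighted fraction of required context fields that are populated."""
    known = sum(
        weight
        for field, weight in REQUIRED_CONTEXT.items()
        if account.get(field)  # present and non-empty
    )
    return round(known, 2)

# A hypothetical account where everything except past evaluation
# patterns is known.
acme = {
    "technical_requirements": "needs SSO + EU data residency",
    "budget_constraints": "renewal budget, capped at last year",
    "decision_timeline": "Q4 decision",
    "key_stakeholders": ["VP Eng", "CFO"],
}

print(context_completeness(acme))  # 0.85
```

<p>The point of even a crude version like this is that it measures what we understand about an account, not what forms its contacts filled out - which is the direction all of the metrics above push in.</p><p>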
Enterprise software deals will still take months because you're coordinating across procurement, IT, security, finance, and business stakeholders - <strong>AI doesn't eliminate that complexity yet.</strong> We'll likely need agent-to-agent interactions between vendors and buyers for deal cycles to compress dramatically, and that's probably a 2028 story. For 2026, let's focus on getting the funnel metrics right while everyone else waits for AGI.</p>]]></content:encoded></item><item><title><![CDATA[How do we bridge the gap between enterprise demand and AI supply]]></title><description><![CDATA[What I learned this year, and what enterprises actually need in 2026]]></description><link>https://www.piyush.cc/p/how-do-we-bridge-the-gap-between</link><guid isPermaLink="false">https://www.piyush.cc/p/how-do-we-bridge-the-gap-between</guid><dc:creator><![CDATA[piyush sagar mishra]]></dc:creator><pubDate>Sat, 06 Dec 2025 11:50:44 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/f57cf0d3-6d4a-4174-980c-82566a187241_1024x1536.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p><em>Disclaimer: The views here are my own. They don&#8217;t represent my employer or anyone else. Just a personal take based on what I&#8217;ve been building, breaking, and fixing across teams and companies throughout 2025.</em></p><p>2025 went by too fast, and it was a big one. The global AI market hit $391B and forecasts now point toward $2T by 2030. Cursor went from niche to absurd growth, hitting $65M ARR with 6,400% YoY expansion. GPT crossed its three-year mark (I know, right!) with GPT 5.1 dropping a month ago. World Labs started generating full 3D worlds from text. The tools jumped several leagues in a single year.</p><p>And yet, enterprise progress stayed weirdly slow relative to the foundational growth. 
That is what this year taught me - we unlocked insane front-end and generation capabilities while the backend, the workflows, the operations, and the enterprise plumbing stayed stuck in 2024. We built great demos, but not the systems we could have. We focused tons on generation, not enough on integration. We hyped autonomy while enterprises desperately needed reliability. The model layer evolved faster than the actual organizational ability to use it.</p><p>I spent this entire year building AI workflows across marketing, ops, engineering, and product. I watched agents break in every possible way, developers use AI daily without trusting it, and non-technical teams struggle to convert their knowledge into working instructions. <a href="https://cloud.google.com/blog/products/ai-machine-learning/announcing-the-2025-dora-report">Google&#8217;s DORA report </a>showed ninety percent adoption but only twenty-four percent trust, and <a href="https://survey.stackoverflow.co/2025/">Stack Overflow</a> showed sixty-six percent frustration with agents that are almost right but not quite. That matches everything I saw firsthand &#129394;</p><p>So instead of another hype post, here is the wishlist for next year - the infrastructure gaps, the workflow gaps, and a bit of the cultural gaps (more on those in another post). The things that actually broke in 2025 across the five stages of the real lifecycle: Build, Deploy, Distribute, Engage, Scale.</p><h2><strong>Build: the context collapse and the &#8220;build for non-programmers&#8221; problem</strong></h2><p>The build phase went through a total reset in 2025 - tools like Cursor, Replit, and GitHub Copilot made &#8220;type a sentence, get code&#8221; boring. The year started with strong adoption of Cursor&#8217;s 0.43 update and its Composer agent, Windsurf added voice interaction, and Claude Sonnet 4.5 hit 75%+ SWE-bench scores.</p><p>But as I have mentioned in the past as well, generation was never the hard part - review was, still is. 
And as more low/no code tooling has evolved, allowing non-engineers to build applications, the problem has gotten much more severe. The cost to write code has dropped to near zero, but the cost to understand what got written went through the roof. I&#8217;ve seen folks in my team personally debug 600-line agent-generated orchestration logic where no one remembered the original assumption. That&#8217;s not development; it&#8217;s borderline archaeology.</p><p>The enterprise reality check from Google&#8217;s 2025 DORA report was sobering - 90% of developers now use AI (up 14% from last year), but only 24% trust it. Stack Overflow&#8217;s survey shows 52% of developers either don&#8217;t use agents or stick to simpler tools. We&#8217;re in this bizarre &#8220;trust paradox&#8221; - using tools we don&#8217;t trust because they boost productivity by 80%.</p><p>And even before you get to code, you hit the &#8220;translation gap&#8221; I kept running into all year. The people who know what the system should do - marketing managers, ops leads, customer success folks - still can&#8217;t turn that knowledge into prompts that produce working software. Gap Inc partnered with Google Cloud to embed AI across operations, but they&#8217;re the exception, not the rule.</p><p>For 2026, we need to solve this. Not with better generators, but with better reviewers. Here are some of the big gaps i think would matter a lot - </p><ul><li><p><strong>AI-native code review:</strong> Tools like Cursor&#8217;s <a href="https://cursor.com/bugbot">Bugbot</a> and <a href="https://www.coderabbit.ai/">CodeRabbit</a> are pushing boundaries - Bugbot now generates PR summaries automatically and achieves 42% accuracy in detecting runtime bugs, while CodeRabbit hits 46%. <a href="https://www.qodo.ai/">Qodo</a> (formerly CodiumAI) is my personal favorite for now, offering context-aware, test-aware, and standards-aware reviews that actually understand our codebase patterns very well. 
But i think what&#8217;s still missing is ensuring regression errors don&#8217;t leak through working features, and abstracting code into business context for reviews - who decided what, and why that choice mattered.</p></li><li><p><strong>Guardrails that understand AI-written code:</strong> I love <a href="https://www.sonarsource.com/">SonarQube</a> - it remains a solid enterprise standard with AI-enhanced detection capabilities, while newer tools like <a href="https://deepsource.com/">DeepSource</a> and <a href="https://graphite.com/">Graphite</a> are evolving toward modern PR workflows with AI-powered context management. The next step in my mind is continuous, contextual validation that goes beyond just linting - tools need to understand architectural decisions and team-specific patterns.</p></li><li><p><strong>Diffs that explain themselves:</strong> When multiple agents propose changes, I want to see not only what changed, but why, and what it affects downstream; crucial for iterative builds, especially as the codebase is refactored from upstream low/no code tools.</p></li><li><p><strong>Business logic abstraction:</strong> Still no good way for non-engineers to express &#8220;if X happens, do Y&#8221; at scale in plain language and have it generate maintainable logic without breaking something else. Specificity in scoping is hard. When they were released in October, i found the new MCP projects sponsored by the GitHub Copilot and VS Code teams very interesting - they&#8217;re building frameworks that let AI interact with tools, codebases, and browsers in revolutionary ways.</p></li></ul><h2><strong>Deploy: the configuration explosion</strong></h2><p>Deployment used to be predictable - now every app is a mess of model endpoints, vector stores, auth layers, API keys, and observability hooks. 
Vercel makes it smooth for our standard Next.js setups, but the moment you need custom routing, compliance, or multi-region data handling, you&#8217;re in the weeds again.</p><p>Everyone&#8217;s building AI infrastructure like it&#8217;s the new arms race - which honestly, it kind of is, but getting &#8220;Infrastructure as prompt&#8221; right is one of the big dreams for making upstream dev work easier - describing what you want (&#8220;a multi-region setup with SOC2-compliant storage and automatic rollback&#8221;) and letting the system generate the boilerplate to manage integrations. <a href="https://aws.amazon.com/blogs/aws/accelerate-ai-agent-development-with-the-nova-act-ide-extension/">Amazon&#8217;s new Nova Act IDE extension</a> is interesting, Salesforce&#8217;s MuleSoft Agent Fabric for orchestrating AI agents across enterprise systems is also promising, but there&#8217;s lots to be done. These layers in the stack would be very useful:</p><ul><li><p><strong>Infrastructure-as-prompt:</strong> Everyone&#8217;s building demos, but no one&#8217;s cracked the version that&#8217;s secure, predictable, and enterprise-ready - and fits into custom infra</p></li><li><p><strong>Verification-as-a-service:</strong> <a href="https://www.microsoft.com/en-us/research/blog/mmctagent-enabling-multimodal-reasoning-over-large-video-and-image-collections/">Microsoft&#8217;s MMCTAgent</a>, which can reason over hours of video and massive image collections, is interesting - imagine that level of verification for generated code. Tools like <a href="https://www.qodo.ai/blog/qodo-gen-1-0-evolving-ai-test-generation-to-agentic-workflows/">Qodo&#8217;s</a> test generation and <a href="https://codescene.com/">CodeScene&#8217;s</a> behavioral analytics are getting closer, but we need this as a standalone API any generator could call.</p></li><li><p><strong>Enterprise guardrails:</strong> Rollbacks, audit trails, compliance policies - every team rebuilds these from scratch, all custom-built. 
Such a waste.</p></li><li><p><strong>Data residency automation:</strong> Still no simple way to describe regional rules and have infra handle routing, caching, and storage automatically from a knowledge base. A lot of custom dev is still needed for separate instantiation and deployments.</p></li></ul><h2><strong>Distribute: the discovery layer inversion</strong></h2><p>This is the part most people underestimate. AI-native distribution doesn&#8217;t look like sales, SEO, or app store listings anymore. It&#8217;s about being discoverable by other AIs. When someone asks an AI assistant, &#8220;what&#8217;s the best way to automate approvals,&#8221; the model&#8217;s answer is your new channel. That means your documentation - not your ad copy - is the entry point.</p><p>The October launch of GPT-5 with its 400K context window and multi-modal capabilities was a decent step up - models can now process text, images, audio, and video simultaneously, so they&#8217;re not just reading your API docs but understanding your entire product ecosystem. <a href="https://ernie.baidu.com/">Baidu&#8217;s ERNIE 5.0</a> is also neat, claiming to beat GPT-5 on visual understanding benchmarks, while <a href="https://blog.google/technology/google-labs/notebooklm-deep-research-file-types/">Google added Deep Research to NotebookLM</a>, turning it into an autonomous research assistant.</p><p>Most of the docs online today are human-friendly and LLM-hostile. They&#8217;re full of adjectives, testimonials, and vague promises, but LLMs want clean schemas, examples, and integration specs. Some of the new MCP projects are addressing this - fastapi_mcp was interesting but slowed down, while <a href="https://context7.com/">context7</a> pulls version-specific documentation straight from code into AI prompts. But there&#8217;s still a lot of missing standardisation across tools and interoperability. 
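</p><p>As one illustration, an agent-readable doc page might boil down to a structured capability descriptor like the sketch below. This is hypothetical - every field name here is invented, not an existing standard - but it captures the shape LLMs want: explicit input/output examples, auth, pricing, and compliance metadata instead of adjectives.</p>

```python
import json

# Hypothetical capability descriptor - a sketch, not an existing standard.
# The idea: swap adjective-heavy marketing copy for clean schemas, explicit
# input/output examples, and transparent auth and pricing metadata.
capability = {
    "name": "approval-automation",
    "description": "Route and auto-approve requests against a policy.",
    "auth": {"type": "api_key", "header": "Authorization"},
    "endpoints": [
        {
            "method": "POST",
            "path": "/v1/approvals",
            "input_example": {"request_id": "req_123", "amount": 4200},
            "output_example": {"decision": "approved", "policy": "under_5k"},
        }
    ],
    "pricing": {"model": "per_call", "usd": 0.002},
    "compliance": ["SOC2", "GDPR"],
}

def is_agent_readable(doc: dict) -> bool:
    """Minimal check an agent could run before recommending a vendor."""
    if not {"name", "auth", "endpoints", "pricing", "compliance"} <= doc.keys():
        return False
    return all(
        {"method", "path", "input_example", "output_example"} <= e.keys()
        for e in doc["endpoints"]
    )

print(json.dumps(capability, indent=2))
print(is_agent_readable(capability))
```

<p>The point is less the exact schema than the contract: a buyer-side agent can parse, validate, and compare descriptors like this programmatically, which ad copy will never allow.</p><p>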
Some thoughts for this phase:</p><ul><li><p><strong>Agent-readable documentation standards:</strong> We NEED a shared format for docs that AIs can consume - think JSON-like schemas, explicit input/output examples, and transparent auth steps.</p></li><li><p><strong>Programmatic capability marketplaces:</strong> App store built for agents, not people. <a href="https://www.notion.com/blog/introducing-notion-3-0">Notion 3.0&#8217;s AI Agents</a> with memory and connectors show what&#8217;s possible, but are too limiting (and i am not a big fan of the review system they have in place)</p></li><li><p><strong>LLM visibility ops:</strong> <a href="https://about.fb.com/news/2025/09/accelerating-ai-adoption-across-federal-government/">Meta bringing Llama to federal agencies</a> shows how critical government and enterprise visibility is becoming.</p></li><li><p><strong>Procurement bridges:</strong> Enterprises will expect structured metadata for security, pricing, and compliance that their internal agents can parse when deciding which vendors to recommend.</p></li></ul><h2><strong>Engage: so much that could be personalized, but should it be?</strong></h2><p>Every AI-generated app can personalize itself infinitely - which sounds great until you try to measure engagement. Traditional metrics like MAU or session duration don&#8217;t mean much when each user&#8217;s product behaves differently. Legacy tools like Customer.io and Braze (I love them both though) assume uniformity, and don&#8217;t know what to do when each user gets their own app logic.</p><p>According to Stack Overflow&#8217;s 2025 survey, developers are most resistant to using AI for high-responsibility tasks like deployment and monitoring (76% don&#8217;t plan to) and project planning (69% don&#8217;t plan to). The biggest frustration? 
66% cite &#8220;AI solutions that are almost right, but not quite.&#8221; We&#8217;re personalizing the wrong things while avoiding the areas where AI could actually help.</p><p>Very soon, we&#8217;re going to see engagement handled directly by AI agents - personalized outreach, onboarding, and lifecycle management - all grounded in live behavioral data. Perplexity&#8217;s new app connectors and enhanced memory, along with Enterprise Max features, show where this is heading. I&#8217;d love to see some of these:</p><ul><li><p><strong>Variant-aware orchestration:</strong> Engagement systems that understand product usage/engagement/variance per user and can adapt flows dynamically</p></li><li><p><strong>Agent-owned communication:</strong> WhatsApp rolled out <a href="https://blog.whatsapp.com/introducing-message-translations?lang=en">message translations </a>earlier this year - imagine agents that can communicate across languages and contexts under brand guardrails.</p></li><li><p><strong>Knowledge that stays alive:</strong> <a href="https://elevenlabs.io/blog/introducing-scribe-v2-realtime">ElevenLabs&#8217; Scribe v2</a> Realtime with sub-150ms transcription across 90+ languages shows what&#8217;s possible for real-time knowledge systems</p></li><li><p><strong>Rethinking metrics:</strong> The shift from &#8220;how many users came back&#8221; to &#8220;did this user achieve their intended goal.&#8221;</p></li></ul><h2><strong>Scale: where all the generated code breaks</strong></h2><p>This is where everything that looked fine in dev blows up - code that worked for ten users collapses at a hundred. AI-generated code (so far) tends to optimize for &#8220;it runs now,&#8221; not &#8220;it scales later.&#8221; Current tools like <a href="https://www.sonarsource.com/">SonarQube</a> and <a href="https://aws.amazon.com/codeguru/profiler/">CodeGuru</a> help with static analysis, but they&#8217;re reactive, not predictive. 
The newer wave - Qodo with its enterprise-grade reviews, CodeScene with behavioral analytics - is getting better at flagging scalability risks before they happen.</p><p>The <a href="https://x.com/LogRocket/status/1986531851364684073">November LogRocket rankings</a> show GLM-4.5 debuting at $0.35/$0.39 pricing with MIT license and self-hosting capabilities - a 90.6% tool-use success rate beating Claude 4 Sonnet. The economics of AI are changing fast. When you can get frontier-level capabilities for pennies, the bottleneck isn&#8217;t compute anymore - it&#8217;s understanding what actually scales.</p><p>Then there&#8217;s multi-region deployment, consistency, compliance - all the enterprise stuff AI doesn&#8217;t understand yet. Oracle&#8217;s new <a href="https://securitybrief.com.au/story/oracle-launches-ai-powered-platform-to-boost-government-food-security">AI platform</a> for government agri risk forecasting and <a href="https://www.suki.ai/news/suki-launches-nursing-consortium-with-a-broad-coalition-of-health-systems/">Suki&#8217;s nursing consortium</a> for healthcare workflows show some interesting sector-specific trends emerging. But we need more general solutions:</p><ul><li><p><strong>Predictive scalability checks:</strong> Tools that simulate real usage before launch and surface bottlenecks proactively. 
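</p><p>Even a back-of-envelope model shows why these checks matter. The sketch below uses a textbook M/M/1 queue approximation; the service rate (100 req/s per instance) and per-user load (0.09 req/s) are invented numbers for illustration, not benchmarks.</p>

```python
# Back-of-envelope predictive scalability check via an M/M/1 queue model.
# Assumed numbers: one instance serving 100 req/s, ~0.09 req/s per active user.
def mean_latency_ms(req_per_s: float, service_rate: float = 100.0) -> float:
    if req_per_s >= service_rate:
        return float("inf")  # saturated: the queue grows without bound
    # M/M/1 mean time in system = 1 / (mu - lambda), converted to ms
    return 1000.0 / (service_rate - req_per_s)

for users in (10, 100, 1000):
    load = users * 0.09
    print(f"{users} users -> ~{mean_latency_ms(load):.1f} ms mean latency")
```

<p>Latency barely moves between ten and a hundred users, then multiplies as load approaches capacity - exactly the failure mode that only shows up after launch unless something simulates it first.</p><p>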
Qualcomm&#8217;s <a href="https://www.qualcomm.com/news/releases/2025/09/new-snapdragon-x2-elite-extreme-and-snapdragon-x2-elite-are-the-">Snapdragon X2 Elite Extreme</a> shows the hardware is there - we need the software to catch up.</p></li><li><p><strong>Data-policy compilers:</strong> Declarative frameworks that encode legal and regional constraints into routing and caching decisions automatically</p></li><li><p><strong>Continuous performance governance:</strong> Systems that enforce latency and cost budgets at PR time, not after users start complaining</p></li><li><p><strong>Security and provenance by default:</strong> Traceable lineage for every artifact or decision an agent produces - not optional, not bolted on later.</p></li></ul><h2><strong>The obvious opportunities for 2026</strong></h2><p>A lot of the gaps I&#8217;ve called out above will definitely be tackled in 2026, especially the ones which have built a good foundation and will easily manage the distribution layer too - </p><ul><li><p><strong>Dev tools for AI-generated code will expand far beyond generation.</strong> The shift to &#8220;review-first&#8221; development is already happening. Cline offers open-source BYOK (bring your own key) flexibility, while tools like serena provide semantic retrieval and editing capabilities. Qodo achieving enterprise-grade context awareness, CodeRabbit with 46% bug detection rates, and Graphite with modern PR workflows all show that review and verification are becoming more valuable than generation itself, and will see several BIG evolutions in the next 12 months</p></li><li><p><strong>Enterprise knowledge infrastructure will mature rapidly.</strong> Companies are already building AI agents for customer service, sales, and support. Google NotebookLM&#8217;s Deep Research feature that browses hundreds of sites and creates comprehensive reports shows the direction. 
We need model-agnostic middleware that allows non-programmers to manage knowledge.</p></li><li><p><strong>Context engineering platforms will expand AI capabilities accessible to non-technical users.</strong> Microsoft&#8217;s Copilot Studio Wave 2 for no-code agent building is just the beginning. Stability AI&#8217;s Image Services on Amazon Bedrock deliver professional-grade editing as APIs. I am pretty sure good things will ship here.</p></li><li><p><strong>Operational excellence automation will abstract away complexity.</strong> GitHub turning Teams conversations into code with Copilot shows how seamless this can become. Amazon, Oracle, and Salesforce all launched enterprise AI platforms in the last few months alone.</p></li><li><p><strong>Engagement infrastructure for adaptive applications needs a rethink.</strong> 75% of developers say they&#8217;d still ask a human &#8220;when I don&#8217;t trust AI&#8217;s answers&#8221; according to Stack Overflow. Building that trust layer is the opportunity.</p></li></ul><h2><strong>What 2026 needs to deliver</strong></h2><p>Looking back at 2025, the pattern is clear: we built amazing generators but (almost) forgot about everything else. The AI value chain keeps getting more lopsided - the build stage gets easier while complexity cascades downstream into deployment, distribution, engagement, and scale.</p><p>The market will grow from $391 billion to a projected $1.81 trillion by 2030, but most of that value won&#8217;t come from better code generation. It&#8217;ll mostly come from solving the unglamorous problems - the integration nightmares, the trust gaps, the review bottlenecks, the scaling failures.</p><p>My prediction for 2026 is that the winners won&#8217;t be the companies with the best models or the fastest generation, but the ones who finally crack the &#8220;last mile&#8221; problems that 2025 exposed. 
The ones who build:</p><ul><li><p><strong>Trust infrastructure</strong> that makes that 24% trust number climb to 80%</p></li><li><p><strong>Review systems</strong> that actually understand context and architectural decisions</p></li><li><p><strong>Deployment platforms</strong> that work with real enterprise constraints, not just demo apps</p></li><li><p><strong>Integration layers</strong> that let <strong>non-technical teams</strong> actually use these tools without breaking production</p></li><li><p><strong>Scaling intelligence</strong> that predicts and prevents the failures before they happen</p></li></ul><p>2025 taught us that anyone can generate code, but 2026 needs to teach us how to ship it. And since everything is plateauing based on where the stack sits today, foundational companies will either evolve to solve this, or go vertical-first (where they&#8217;ll surely make a lot of money, but maybe not solve enterprise problems)</p><p>Amidst all that&#8217;s happening, one of the uncomfortable questions I keep coming back to is &#8220;if creation cost truly drops to zero and anyone can build software, should we even be building software the way we do today? 
If the constraint isn&#8217;t &#8220;can we build it&#8221; but &#8220;should we build it&#8221; - then what?&#8221;</p><p>Maybe a big part of 2026 won&#8217;t be about better tools, but about better judgment about when to use them and who should truly use them.</p>]]></content:encoded></item><item><title><![CDATA[Retention: the growth work nobody wants]]></title><description><![CDATA[Long feedback loops, hard problems, and exponential returns- why companies chase vanity metrics instead of building lasting value]]></description><link>https://www.piyush.cc/p/retention-the-growth-work-nobody</link><guid isPermaLink="false">https://www.piyush.cc/p/retention-the-growth-work-nobody</guid><dc:creator><![CDATA[piyush sagar mishra]]></dc:creator><pubDate>Sun, 28 Sep 2025 06:31:36 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/01bb74eb-7f2b-479a-952f-cd9249e1a4c9_5462x3072.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Growth feels good because it's <em>visible</em> - more users signing up, more logos on the homepage, more pipeline, more charts trending upward. But most PLG businesses are running on a treadmill - users leak out as quickly as they come in, so acquisition just papers over churn while the underlying machine runs harder to stay in the same place.</p><p><strong>This is why retention is different from every other metric companies obsess over.</strong> Acquisition <em>can</em> be manufactured through Ads, content marketing, SEO optimization, and referral programs; expansion <em>can</em> be engineered with freemium upgrades, upselling/cross-selling, clever pricing; but retention asks a simple question: <em>did you build something people actually want to keep using?</em> It determines whether you are compounding or just backfilling losses. 
Acquisition can make you feel faster, but retention determines whether that speed has direction; whether each new user becomes a permanent part of your business or just a temporary visitor you'll need to <em>replace</em> next month.</p><p>The math is simple, and it makes this concrete - think of your user base as </p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;Ut+1 = (Ut &#215; R) + A + (Ut &#215; R &#215; V)&quot;,&quot;id&quot;:&quot;DOESDRRPFD&quot;}" data-component-name="LatexBlockToDOM"></div><p>where Ut is your current users, R is retention, A is acquisition, and V is virality. </p><p><strong>At low retention rates, that first term collapses and you start from scratch every month</strong>. <strong>At high retention rates, the base compounds: each month builds on the last</strong>. For example, consider two companies, both adding 1000 new users monthly. Company A has 50% monthly retention. Company B has 80% retention. After 12 months, Company A has about 2000 users total - barely double its monthly acquisition despite a full year of growth, because it has already hit its steady-state ceiling of A/(1-R) = 2000. Company B has roughly 4650 users and is still climbing toward a ceiling of 5000. Same acquisition, well over double the result, entirely due to retention.</p><p>PLG makes this equation more brutal - the funnel is intentionally frictionless: no demos, no contracts, no sales reps filtering for intent. This is PLG's superpower - anyone can try the product in seconds, but it's also the harshest test. <strong>The same open doors that eliminate barriers to entry also eliminate barriers to exit. </strong>Students, hobbyists, competitors, and casual browsers all count in your denominator, making your retention curves look worse than those of sales-led businesses where reps pre-qualify commitment. 
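</p><p>The recurrence above is easy to sandbox in a few lines. A minimal sketch - the 1000-users-per-month and retention figures come from the example, and virality is set to zero to keep the comparison clean:</p>

```python
# Simulating the recurrence above: U_{t+1} = (U_t x R) + A + (U_t x R x V),
# with A = 1000 new users/month, R = 50% vs 80%, and V (virality) = 0.
def simulate(retention: float, acquisition: int, months: int,
             virality: float = 0.0) -> float:
    users = 0.0
    for _ in range(months):
        retained = users * retention
        users = retained + acquisition + retained * virality
    return users

print(round(simulate(0.5, 1000, 12)))  # already at its A/(1-R) ceiling of 2000
print(round(simulate(0.8, 1000, 12)))  # ~4656, still climbing toward 5000
```

<p>Run it for longer horizons and the gap keeps widening: the 50% company is stuck at its ceiling forever, while the 80% company keeps compounding toward a base 2.5x larger.</p><p>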
But that harsh reality is actually useful - <strong>PLG shows you immediately whether people are getting value; if they're not, they disappear, and you see the truth in real time.</strong></p><h3><strong>why teams abandon retention work</strong></h3><p><strong>Retention optimization feels uniquely frustrating because it lacks the fast feedback loops that make other growth work satisfying</strong> - launch a Facebook ad campaign and you'll see signups in no time, complete with click-through rates and cost per acquisition data; roll out a new pricing tier with annual discounts or clever bundling/pricing and expansion revenue appears in the next billing cycle. But ship an improved onboarding flow today and you won't know whether it actually improved 90-day retention until three months pass.</p><p><strong>This delay creates a measurement problem that compounds over time</strong> - unlike acquisition, where you can run controlled experiments with statistical significance in days, retention optimization often requires making multiple simultaneous bets across different parts of the user journey. You might improve page load speeds from 4s to 2s, simplify signup from eight fields to three, redesign onboarding with interactive tutorials, and launch behavioral email sequences all in the same quarter. When retention improves six months later, isolating which changes actually mattered becomes nearly impossible - <strong>not because the math is hard, but because the experiment data lives with different teams who tracked different metrics.</strong></p><p><strong>The systemic nature makes retention especially challenging organizationally</strong> - acquisition can be optimized channel by channel, e.g. Google Ads has its own team optimizing keyword bids, Facebook has specialists running creative tests, content marketing has dedicated writers tracking organic conversion rates. 
Each channel has clear ownership and measurable daily outcomes, but retention emerges from the intersection of product design, backend engineering, customer success, content strategy, and email marketing. A user might churn because signup is confusing (design team), APIs are slow (backend engineering), they can't find help when stuck (content team), they never discovered key features (product marketing), or they hit JavaScript errors (frontend engineering). <strong>No single team can own retention because improving it requires coordinating across every touchpoint simultaneously.</strong></p><p>On top of that, <strong>human behavior</strong> adds complexity that product optimization can't solve either - users change jobs, budgets get cut, teams consolidate tools, new management arrives with different vendor preferences. The 2022 economic downturn saw many PLG companies experience retention drops not because their products got worse, but because customers scrutinized every software expense more carefully. This creates false-signal problems, e.g. users might appear retained because they log in weekly to check notifications, but they're not creating projects or building anything meaningful.</p><p>Lastly, and this bit is fairly crucial, <strong>the organizational dynamics make retention work politically difficult</strong> - acquisition improvements generate <strong>visible dashboard spikes</strong> that executives celebrate - &#8220;Wow, our signups doubled Q/Q with the new campaign release&#8221;. Expansion ties directly to revenue growth that makes finance teams happy. Retention work is slower, <em><strong>less dramatic</strong></em>, and often requires saying no to exciting features so engineering can focus on mundane problems like reducing signup abandonment or fixing edge case bugs. 
These improvements don't generate press releases but quietly determine whether unit economics work.</p><p><strong>Most retention problems are actually activation failures</strong>; users who never send their first message, build their first dashboard, or connect their first integration were never truly retained - they were just temporarily present. This is why successful PLG companies obsess over time-to-first-value, removing every unnecessary step from onboarding and personalizing flows so a hobbyist doesn't see the same experience as an enterprise buyer.</p><p><strong>Expansion drives retention as much as revenue</strong> - when users invite teammates, connect integrations, or upgrade plans, they embed the product deeper into their workflow. Slack reported that teams with 2k+ messages sent had 93% retention rates, while teams with fewer than 2k messages had much lower retention. The difference isn't just usage but rather organizational embedding - once Slack becomes how a team communicates, switching away requires changing fundamental workflows for dozens of people.</p><h3><strong>measuring what matters</strong></h3><p>Companies track retention in different ways, but not all metrics tell the same story. The standard approach focuses on <strong>percentages</strong> - "<em>our 30-day retention is 42%</em>" - which captures whether users return but not whether they're getting value. <strong>Dollar-based retention</strong> sounds sophisticated but can be misleading for PLG businesses; a small startup might get massive value from your free tier while a large enterprise might pay thousands but barely use the product. <strong>Login-based retention</strong> is the most common metric but also the most deceptive; users might check in weekly out of habit without accomplishing anything meaningful. 
<strong>Feature usage retention</strong> gets closer to value but can optimize for the wrong behaviors - if you track users who create five documents per month, you might encourage busy work instead of solving real problems.</p><p>The most predictive retention metrics track completion of core workflows rather than raw activity - e.g. Figma measures users who share their first design with teammates, Notion tracks users who create their first database. These behavioral milestones correlate strongly with long-term retention because they indicate that users have experienced the product's core value proposition.</p><h3><strong>the retention advantage</strong></h3><p>Retention buys you pricing power and competitive moats - when users depend on your product daily, they become less price-sensitive. Salesforce can raise prices because switching CRM systems is painful, Slack can get to per-member pricing for the same reasons; <strong>high retention creates switching costs that competitors can't easily overcome.</strong> Several pieces of research show that increasing retention rates by just 5% can increase profits by 25-95%. <strong>The math works because retained customers cost nothing to acquire and typically expand their spending over time</strong>. Companies with strong retention can afford to invest in longer-term bets because they're not constantly fighting to replace churned users. 
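</p><p>The low end of that range is just geometric-series math: under constant monthly churn, expected customer lifetime is 1/(1-R), so even a 5% relative lift in retention moves per-customer profit by 25%. A minimal sketch - the $100 monthly margin is an invented figure for illustration:</p>

```python
# Why small retention gains compound: under constant churn, expected customer
# lifetime is 1 / (1 - R) months, so lifetime profit is margin / (1 - R).
# The $100/month margin is invented for illustration.
def lifetime_value(monthly_margin: float, monthly_retention: float) -> float:
    return monthly_margin / (1.0 - monthly_retention)

base = lifetime_value(100.0, 0.80)    # ~$500 per customer
better = lifetime_value(100.0, 0.84)  # ~$625 - a 5% relative retention lift
print(round(base), round(better), f"{better / base - 1:.0%}")
```

<p>Retention enters the denominator, which is why the effect steepens as R rises: the same absolute improvement is worth far more to a company already near 90% than to one at 50%.</p><p>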
This is why retention ultimately determines market position - companies that solve retention can focus on <strong>product depth instead of marketing spend</strong>, <strong>build for power users instead of casual browsers</strong>, and <strong>create features that increase switching costs</strong> instead of just driving trial conversion.</p>]]></content:encoded></item><item><title><![CDATA[Product ≠ Software]]></title><description><![CDATA[Why solopreneurs can thrive building profitable products through vibe coding while enterprise systems struggle.]]></description><link>https://www.piyush.cc/p/product-software</link><guid isPermaLink="false">https://www.piyush.cc/p/product-software</guid><dc:creator><![CDATA[piyush sagar mishra]]></dc:creator><pubDate>Sun, 03 Aug 2025 16:31:47 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/cd1b96f1-e970-4283-a38e-f240eee62876_1232x928.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>I came across a <a href="https://x.com/stevekrouse/status/1950663984379756856">tweet</a> from Steve Krouse recently that captured something important about the current state of AI-assisted programming. We&#8217;re in the middle of a vibe coding explosion, where AI tools excel at producing locally coherent code snippets. 
But many are misreading this progress, extrapolating it into grand claims about the end of human programmers</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!4L2l!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F344b30a7-dffa-4933-85db-e0ab775a026a_976x1480.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!4L2l!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F344b30a7-dffa-4933-85db-e0ab775a026a_976x1480.png 424w, https://substackcdn.com/image/fetch/$s_!4L2l!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F344b30a7-dffa-4933-85db-e0ab775a026a_976x1480.png 848w, https://substackcdn.com/image/fetch/$s_!4L2l!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F344b30a7-dffa-4933-85db-e0ab775a026a_976x1480.png 1272w, https://substackcdn.com/image/fetch/$s_!4L2l!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F344b30a7-dffa-4933-85db-e0ab775a026a_976x1480.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!4L2l!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F344b30a7-dffa-4933-85db-e0ab775a026a_976x1480.png" width="706" height="1070.5737704918033" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/344b30a7-dffa-4933-85db-e0ab775a026a_976x1480.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1480,&quot;width&quot;:976,&quot;resizeWidth&quot;:706,&quot;bytes&quot;:344227,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://www.piyush.cc/i/169814291?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F344b30a7-dffa-4933-85db-e0ab775a026a_976x1480.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!4L2l!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F344b30a7-dffa-4933-85db-e0ab775a026a_976x1480.png 424w, https://substackcdn.com/image/fetch/$s_!4L2l!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F344b30a7-dffa-4933-85db-e0ab775a026a_976x1480.png 848w, https://substackcdn.com/image/fetch/$s_!4L2l!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F344b30a7-dffa-4933-85db-e0ab775a026a_976x1480.png 1272w, https://substackcdn.com/image/fetch/$s_!4L2l!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F344b30a7-dffa-4933-85db-e0ab775a026a_976x1480.png 1456w" sizes="100vw" fetchpriority="high"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" 
viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p></p><p>There's a hidden trap in this apparent productivity boom: <strong>the cost to write code has collapsed, but the cost to review code has exploded</strong>. In the past, if someone submitted a 200-line diff, you knew roughly how they got there. Now, people paste in a vague prompt, and out comes a 600-line orchestration module with imported packages no one asked for and assumptions no one agreed to. Reviewing that output is like <a href="https://www.youtube.com/shorts/x6LAzXknjbc">spelunking</a> in someone else's hallucinated architecture. Every review turns into <em><strong>vibe debugging</strong></em>, where you're not fixing a bug, you're fixing a <em>mood</em>.</p><p>This mismatch between generation speed and review complexity leads to two dangerous outcomes. 
First, <strong>systemic fragility</strong>: codebases fill with low-context patches that work locally but struggle to compose globally. The AI makes choices no one remembers making. Second, <strong><a href="https://www.sciencedirect.com/science/article/abs/pii/S0950584915001287">architectural debt by accretion</a></strong>: teams conflate "code that runs" with "code that scales." Vibe-coded systems optimize for the constraints they have encountered, oftentimes resulting in momentary shippability, not maintainability. Try prompting Claude to generate a Seaborn plot for churn insights - you&#8217;ll get 500 lines of syntactically perfect code, followed by 100 minutes of debugging just to make it compose properly and make business sense.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!sVAJ!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1a54d882-57d2-4bf2-bc87-d8474d8ddfd5_1200x977.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!sVAJ!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1a54d882-57d2-4bf2-bc87-d8474d8ddfd5_1200x977.jpeg 424w, https://substackcdn.com/image/fetch/$s_!sVAJ!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1a54d882-57d2-4bf2-bc87-d8474d8ddfd5_1200x977.jpeg 848w, https://substackcdn.com/image/fetch/$s_!sVAJ!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1a54d882-57d2-4bf2-bc87-d8474d8ddfd5_1200x977.jpeg 1272w, 
https://substackcdn.com/image/fetch/$s_!sVAJ!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1a54d882-57d2-4bf2-bc87-d8474d8ddfd5_1200x977.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!sVAJ!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1a54d882-57d2-4bf2-bc87-d8474d8ddfd5_1200x977.jpeg" width="458" height="372.8883333333333" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/1a54d882-57d2-4bf2-bc87-d8474d8ddfd5_1200x977.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:977,&quot;width&quot;:1200,&quot;resizeWidth&quot;:458,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;Image&quot;,&quot;title&quot;:&quot;Image&quot;,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Image" title="Image" srcset="https://substackcdn.com/image/fetch/$s_!sVAJ!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1a54d882-57d2-4bf2-bc87-d8474d8ddfd5_1200x977.jpeg 424w, https://substackcdn.com/image/fetch/$s_!sVAJ!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1a54d882-57d2-4bf2-bc87-d8474d8ddfd5_1200x977.jpeg 848w, https://substackcdn.com/image/fetch/$s_!sVAJ!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1a54d882-57d2-4bf2-bc87-d8474d8ddfd5_1200x977.jpeg 1272w, 
https://substackcdn.com/image/fetch/$s_!sVAJ!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1a54d882-57d2-4bf2-bc87-d8474d8ddfd5_1200x977.jpeg 1456w" sizes="100vw"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Sourced from Steve K&#8217;s <a href="https://x.com/stevekrouse/status/1950663984379756856">tweet</a></figcaption></figure></div><p>The fantasy of spec-to-code automation imagines programmers as mere translators who mechanically convert requirements into implementation. But this strawman programmer doesn't exist in practice. 
Every competent engineer I've encountered actively participates in problem definition, challenges assumptions, and iteratively refines solutions through the act of building them. Simon Willison has a brilliant <a href="https://x.com/simonw/status/1950706954420318234">articulation</a> on this topic (the entire thread is gold).</p><h2>Local optimization vs global coherence</h2><p>I really liked the concept of local optimization vs global coherence when I first came across it (<a href="https://www.researchgate.net/publication/220874592_Sentence_Ordering_Driven_by_Local_and_Global_Coherence_for_Summary_Generation">here</a>). Extrapolating it to what we&#8217;re witnessing now, vibe coding excels at <em>local optimization</em>: solving bounded problems with minimal dependencies where correctness is contextually obvious and the blast radius remains small - consider writing a webhook handler, formatting API output, creating SQL queries, or building a frontend that ingests a handful of user inputs, makes a few OpenAI calls, and recommends the best cafes and restaurants. The problem space is well-defined, success criteria are explicit, and validation is immediate.</p><p>Complex engineering represents a fundamentally different class of problem: achieving <em>global coherence</em> across multiple interacting components under shared constraints like scalability, security, and failure recovery. The difference isn't merely one of scale; it's categorical.</p><p>The core issue is the <em>context explosion</em> problem. While local optimization requires understanding a bounded problem space, <strong>global coherence demands reasoning about exponentially expanding interaction surfaces</strong>. Consider building a compiler, where each decision creates cascading constraints: the lexer's token design affects parser complexity, which influences AST structure, which determines optimization possibilities. 
A locally optimal choice at any level might make global optimization impossible.</p><p>Vibe coding doesn&#8217;t do great here because it operates through pattern matching against previously seen solutions. But truly complex systems often require novel constraint satisfaction, finding solutions that balance competing demands in ways that have no direct precedent.</p><p>Engineering at scale is often constraint-oriented: you're not only teaching the system how to do something, but you're also defining what must be true (latency &lt; 300ms, idempotent retries, zero-downtime deploys). These constraints interact across components and force tradeoffs that can't be easily vibe-coded. They need negotiation, abstraction, and architecture, not only generation. Good engineers solve for this by exhibiting <em>mechanical sympathy</em> - an intuitive alignment between the software they write and the underlying systems. Vibe coding has no such sympathy. It can write valid Kafka consumers that don't understand backpressure, or spawn microservices without accounting for network costs. It treats code as text, not as behavior in a live system.</p><p>This explains why solopreneurs can thrive building profitable products through vibe coding while enterprise systems struggle. The distinction is fundamental: <strong>products optimize for user value and market success; software systems optimize for reliability, maintainability, and evolution</strong>. The solopreneur model works through <em>problem curation</em>, selecting challenges solvable via local optimization while <em>outsourcing</em> global coherence problems (or, in many instances, simply ignoring them, shipping demos that work on local machines and look good on AI-created landing pages). 
Crucially, solopreneurs can afford exploded review costs because they're reviewing their own AI output; that personal context doesn't transfer to teams, where <strong>code review becomes reverse-engineering someone else's AI conversation.</strong></p><h2>Beyond code generation</h2><p>The misconception stems from conflating code generation with software engineering. <strong>Code is the artifact, not the activity</strong> - real software engineering emerges from the dialectical relationship between problem and solution domains, where implementing solutions reveals new aspects of the problem.</p><p>I remain deeply optimistic about AI's potential to accelerate software delivery and help more people ship products quickly. But the timeline for truly autonomous software engineering is longer than current hype suggests. The reasons are fundamental: mechanical sympathy can't be pattern-matched, constraint satisfaction requires domain expertise that emerges from years of operational experience, and architectural judgment develops through repeated exposure to system failures and recovery. </p><p>Obviously, we will train the models to build <em>sympathy</em> in due time, but even then some aspects of software engineering will never disappear: the need to understand how code behaves in live systems, the ability to reason about failure modes that haven't been documented, and the creativity required to find novel solutions to unprecedented <em>constraint</em> combinations. I don&#8217;t think these are implementation details that better models will eventually subsume. 
They're the irreducible core of what makes software engineering an enduring human discipline.</p>]]></content:encoded></item><item><title><![CDATA[How should your Team work with AI?]]></title><description><![CDATA[The foundation for agentic workflows, collaborative systems, and shared intelligence for teams.]]></description><link>https://www.piyush.cc/p/how-should-your-team-work-with-ai</link><guid isPermaLink="false">https://www.piyush.cc/p/how-should-your-team-work-with-ai</guid><dc:creator><![CDATA[piyush sagar mishra]]></dc:creator><pubDate>Sun, 20 Jul 2025 10:24:05 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/46c6c8c1-d32a-4096-8473-8402897173e3_6720x4480.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>The internet is flooded with articles about prompt engineering, and more recently about context engineering (which should have been the de facto standard since day one), but they're all focused on the same things - either (a) helping <em>you</em> get better outputs, optimizing <em>your</em> personal AI workflow, making <em>you</em> more productive, or (b) building complex full-stack LLM infra for enterprises.</p><p><strong>Almost nothing addresses the intermediate, messier, and more interesting challenge: how can teams actually work with AI together?</strong></p><p>It goes without saying that AI is a huge leveler - it speeds things up, eliminates grunt work, and gives everyone superpowers - but there's wild variance in how people within the same organization, or even the same team, use it. While some folks ask brilliant, strategic questions, others treat it like Google. The query and context patterns are all over the place. <strong>And if you're a manager trying to upskill your workforce or get your team an edge, you quickly realize the real question isn't "how do I use AI better?" It's "how should my team work with AI?"</strong></p><p>This braindump is my attempt at filling that gap. 
The principles we'll explore are the building blocks for everything from team collaboration to sophisticated AI agents and the complex LLM infrastructure that organizations are building.</p><p>We'll go through these progressively - none works ideally in isolation, but together they create the foundation for systematic AI capabilities. We'll keep "<a href="https://x.com/karpathy/status/1937902205765607626">context engineering</a>" at the core, since prompt engineering without proper context is pretty useless for real work. I always use this metaphor: a poorly crafted context is like sending the world's best translator into a room with muffled audio - the output will only be as clear as what they hear.</p><p>So, here's what I've learned about making this work - some of this may sound obvious, but bear with me.</p><h2><strong>Principle 1: Define clear team goals with measurable success criteria</strong></h2><p>Teams need shared definitions of what success looks like, but <strong>most teams have never actually written down their objectives in a way that's both human-readable and machine-actionable</strong>. 
The difference between "improve conversion rates" and having structured goals that capture constraints, context, and success metrics is massive.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!zZ2O!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdb2d4c73-e75a-46e7-b228-afc3c594f9bf_872x842.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!zZ2O!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdb2d4c73-e75a-46e7-b228-afc3c594f9bf_872x842.png 424w, https://substackcdn.com/image/fetch/$s_!zZ2O!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdb2d4c73-e75a-46e7-b228-afc3c594f9bf_872x842.png 848w, https://substackcdn.com/image/fetch/$s_!zZ2O!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdb2d4c73-e75a-46e7-b228-afc3c594f9bf_872x842.png 1272w, https://substackcdn.com/image/fetch/$s_!zZ2O!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdb2d4c73-e75a-46e7-b228-afc3c594f9bf_872x842.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!zZ2O!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdb2d4c73-e75a-46e7-b228-afc3c594f9bf_872x842.png" width="616" height="594.8073394495412" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/db2d4c73-e75a-46e7-b228-afc3c594f9bf_872x842.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:842,&quot;width&quot;:872,&quot;resizeWidth&quot;:616,&quot;bytes&quot;:110066,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://www.piyush.cc/i/168699765?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdb2d4c73-e75a-46e7-b228-afc3c594f9bf_872x842.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!zZ2O!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdb2d4c73-e75a-46e7-b228-afc3c594f9bf_872x842.png 424w, https://substackcdn.com/image/fetch/$s_!zZ2O!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdb2d4c73-e75a-46e7-b228-afc3c594f9bf_872x842.png 848w, https://substackcdn.com/image/fetch/$s_!zZ2O!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdb2d4c73-e75a-46e7-b228-afc3c594f9bf_872x842.png 1272w, https://substackcdn.com/image/fetch/$s_!zZ2O!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdb2d4c73-e75a-46e7-b228-afc3c594f9bf_872x842.png 1456w" sizes="100vw" fetchpriority="high"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" 
viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>The magic happens when teams stop assuming everyone understands what "good performance" means and start having explicit conversations about it. Your growth lead might think success means hitting conversion targets, while engineering thinks it means not breaking anything, and product thinks it means user experience improvements. These aren't wrong perspectives - they just need to be reconciled into shared definitions.</p><p>Start by having each team member write down what they think the goal is, what constraints they're operating under, and what context affects success. Then spend time aligning on these definitions until everyone agrees on what you're optimizing for. Most teams discover they've been working toward slightly different objectives without realizing it. 
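To make "machine-actionable" concrete, here is a minimal sketch of what a written-down team goal could look like - every field name and value is an illustrative assumption, not a prescribed schema:

```python
# A minimal, machine-actionable team goal. All names and numbers here are
# illustrative assumptions, not a prescribed schema.
team_goal = {
    "objective": "Improve trial-to-paid conversion",
    "metric": "trial_to_paid_rate",
    "baseline": 0.12,
    "target": 0.18,  # the agreed definition of success
    "deadline": "2026-09-30",
    "constraints": [
        "engineering has ~4 hours/week for optimization work",
        "mobile app launch affects web conversion priorities",
    ],
}

def goal_briefing(goal: dict) -> str:
    """Render the shared goal as a text block to prepend to any AI prompt."""
    constraints = "\n".join(f"- {c}" for c in goal["constraints"])
    return (
        f"Objective: {goal['objective']}\n"
        f"Success: {goal['metric']} from {goal['baseline']:.0%} to "
        f"{goal['target']:.0%} by {goal['deadline']}\n"
        f"Constraints:\n{constraints}"
    )

print(goal_briefing(team_goal))
```

The point is not the format - YAML, a Notion page, or a plain dict all work - but that the same definition is rendered into every prompt, so the whole team optimizes against one target.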
When engineering mentions they only have 4 hours per week for optimization work, and product shares that the mobile app launch affects web conversion priorities, suddenly your AI recommendations can factor in these real constraints instead of suggesting impossible solutions.</p><p>The key insight is making this centralized and accessible to everyone. When anyone on the team prompts AI for analysis or recommendations, they're pulling from the <strong>same shared understanding of what success looks like and what constraints exist</strong>.</p><h2><strong>Principle 2: Engineer context like infrastructure</strong></h2><p>Think of LLMs as exceptionally knowledgeable consultants who know nothing about your specific business unless you explicitly brief them. They weren't in last quarter's strategy meeting, don't know your CEO's pricing philosophy, and have <strong>no understanding of what "activation" means in your product context</strong>.</p><p>Most teams treat context as optional background information rather than essential infrastructure. They feed AI raw data and expect meaningful insights, forgetting that data without narrative is just noise. This leads to what context engineering experts call "<a href="https://www.dbreunig.com/2025/06/22/how-contexts-fail-and-how-to-fix-them.html">context failure modes</a>" - e.g. context poisoning, distraction, confusion, and clash. LLMs are like sous-chefs who don't know what's in your fridge - you must inventory your ingredients, write the precise recipe, and keep the knives within arm's reach.</p><p>But AI needs structured access to this intelligence to generate useful insights. 
Think of this as building a centralized context repository that everyone contributes to and benefits from.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!bzTG!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0b5486f9-e6ce-4988-b468-fdcd8ced3368_942x1002.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!bzTG!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0b5486f9-e6ce-4988-b468-fdcd8ced3368_942x1002.png 424w, https://substackcdn.com/image/fetch/$s_!bzTG!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0b5486f9-e6ce-4988-b468-fdcd8ced3368_942x1002.png 848w, https://substackcdn.com/image/fetch/$s_!bzTG!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0b5486f9-e6ce-4988-b468-fdcd8ced3368_942x1002.png 1272w, https://substackcdn.com/image/fetch/$s_!bzTG!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0b5486f9-e6ce-4988-b468-fdcd8ced3368_942x1002.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!bzTG!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0b5486f9-e6ce-4988-b468-fdcd8ced3368_942x1002.png" width="942" height="1002" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/0b5486f9-e6ce-4988-b468-fdcd8ced3368_942x1002.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1002,&quot;width&quot;:942,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:136068,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.piyush.cc/i/168699765?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0b5486f9-e6ce-4988-b468-fdcd8ced3368_942x1002.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!bzTG!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0b5486f9-e6ce-4988-b468-fdcd8ced3368_942x1002.png 424w, https://substackcdn.com/image/fetch/$s_!bzTG!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0b5486f9-e6ce-4988-b468-fdcd8ced3368_942x1002.png 848w, https://substackcdn.com/image/fetch/$s_!bzTG!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0b5486f9-e6ce-4988-b468-fdcd8ced3368_942x1002.png 1272w, https://substackcdn.com/image/fetch/$s_!bzTG!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0b5486f9-e6ce-4988-b468-fdcd8ced3368_942x1002.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" 
viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>The beautiful thing about building shared context is that you're probably already tracking most of this data - it's just scattered across different systems and people. Your CRM has sales cycle data, your customer success platform has churn indicators, your project management tools have engineering capacity information. 
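As a sketch (the team names, metrics, and numbers below are hypothetical), a shared context repository can start as nothing more than a structured document that every prompt gets prefixed with:

```python
# A shared context repository: each function contributes a section, and every
# AI prompt is prefixed with the same merged briefing. All entries are
# hypothetical examples, not real data.
context_repository = {
    "sales": {"avg_cycle_days": 42, "source": "CRM"},
    "customer_success": {"churn_signal": "support tickets up 15% QoQ"},
    "engineering": {"capacity": "85% allocated through next sprint"},
}

def build_briefing(repo: dict) -> str:
    """Flatten the repository into a consistent context block."""
    lines = [
        f"[{team}] {key}: {value}"
        for team, facts in sorted(repo.items())
        for key, value in sorted(facts.items())
    ]
    return "\n".join(lines)

def ask_with_context(task: str, repo: dict) -> str:
    """Compose the final prompt; the actual LLM call is out of scope here."""
    return f"Business context:\n{build_briefing(repo)}\n\nTask: {task}"

print(ask_with_context("Analyze last month's campaign performance", context_repository))
```

Because every function writes into one place and every prompt reads from it, the marketing manager's campaign analysis automatically sees engineering's capacity and sales' cycle lengths without anyone re-briefing the model.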
The challenge isn't generating new data; it's extracting insights from what you already have and making it accessible.</p><p><strong>If you don&#8217;t have a great analytics team, or lack the engineering support to automate these data-to-context pipelines, start by having each functional area pull their key metrics and insights from existing systems.</strong> Sales can share average cycle lengths from Salesforce, customer success can identify churn patterns from support tickets and usage data, engineering can assess capacity from sprint planning tools. The goal isn't perfection - it's creating a shared repository of organizational intelligence that everyone contributes to and uses.</p><p>This becomes incredibly powerful when it's centralized. <strong>Instead of each team member having their own understanding of "what's happening in the business," everyone works from the same context base.</strong> When your marketing manager asks AI to analyze campaign performance, it automatically knows that engineering is at 85% capacity and competitor A is pressuring on price, so recommendations fit organizational reality instead of being theoretically optimal but practically impossible.</p><p><strong>The context structure doesn't need to be complex - just consistent.</strong> Teams that invest in building this shared intelligence find that their AI interactions become dramatically more useful because every recommendation accounts for real business constraints and opportunities.</p><h2><strong>Principle 3: Build as systems, not one-off solutions: </strong>create reusable prompt templates your team can share</h2><p>Most teams treat AI like a series of casual chats rather than a scalable system. 
<strong>Every prompt starts from scratch, past work gets lost, and there&#8217;s no compounding intelligence.</strong> While memory and reflection help with inter-turn context, a stronger foundation comes from treating prompts and context like code: structured, reusable, and constantly improvable. This means moving from ad-hoc prompting to modular design. Think in components, not conversations, and design each prompt to stack, evolve, and transfer across use cases.</p><p><strong>Individual prompt optimization is like everyone writing their own Excel formulas instead of sharing the good ones.</strong> Teams that build libraries of proven prompts capture institutional knowledge and let anyone access sophisticated analysis frameworks.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!ogm0!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F93f4e365-05b7-4ad3-88e0-05685e57ac4e_1390x1168.png"><div class="image2-inset"><picture><img src="https://substackcdn.com/image/fetch/$s_!ogm0!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F93f4e365-05b7-4ad3-88e0-05685e57ac4e_1390x1168.png" width="1390" height="1168" class="sizing-normal" alt="" loading="lazy"></picture></div></a></figure></div><p>The most effective approach is identifying the analysis types your team does repeatedly - competitive response, campaign
optimization, customer research, product launch planning - and building templates that capture how your best thinkers approach these problems. Start by having team members share prompts that have generated particularly useful insights, then work together to identify the common elements and structural patterns.</p><p>What's powerful about the template approach is that it democratizes sophisticated thinking. <strong>When your junior growth marketer needs to analyze competitive threats, they can use the same analytical framework that your senior strategist would use, complete with the organizational context and constraints that make recommendations actually implementable.</strong></p><p>When someone discovers a better way to structure competitive analysis or finds that adding specific context dramatically improves recommendations, updating the template benefits everyone's AI interactions.</p><p>This also creates consistency in how your team approaches problems. Rather than getting wildly different analytical approaches depending on who's doing the analysis, the templates ensure a baseline level of rigor and organizational awareness across all AI-assisted work.</p><h2><strong>Principle 4: Start with questions, not answers: </strong>get your team practicing inquiry-first patterns before execution</h2><p>Most teams use AI like a more sophisticated Google - asking specific questions and expecting specific answers. But the real power comes from using AI to help your team identify what questions you should be exploring together, especially questions that emerge from the intersection of different functional perspectives.</p><p>Traditional interactions follow a command-response pattern: human asks, AI answers. Advanced practitioners implement an inquiry-first approach where AI helps clarify objectives, question assumptions, and explore alternatives before providing analysis. 
<strong>Even in real-life teams, the best partnerships start with "What should we be asking?" and "How should we be working together?", not "Here's what I need you to do."</strong></p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!xlY4!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb708f634-d8dd-4869-ae24-3839f5501c9f_1158x902.png"><div class="image2-inset"><picture><img src="https://substackcdn.com/image/fetch/$s_!xlY4!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb708f634-d8dd-4869-ae24-3839f5501c9f_1158x902.png" width="1158" height="902" class="sizing-normal" alt="" loading="lazy"></picture></div></a></figure></div><p>The shift from individual questioning to collaborative inquiry happens when teams start AI sessions with context-setting rather than specific requests. This approach surfaces blind spots and connections that individual team members might miss. When AI suggests exploring how customer success retention insights could inform acquisition targeting, it's connecting dots between functional areas that might not naturally collaborate. When it identifies the tension between engineering bandwidth constraints and optimization opportunities, it's highlighting trade-offs that need explicit team decision-making.</p><p>The most productive teams make this a regular practice. 
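</p><p>As a concrete sketch - the field names here are illustrative, not a standard schema - a reusable inquiry-first template for such a session might look like:</p><pre><code class="language-json">{
  "template": "strategic_inquiry_session",
  "context": [
    "Q3 goal: grow qualified pipeline 20% without increasing CAC",
    "Engineering at 85% capacity; competitor A pressuring on price"
  ],
  "instruction": "Before proposing any analysis or plan, list the questions this team should be asking right now, including at least two that cut across functions.",
  "constraints": ["flag assumptions explicitly", "surface trade-offs, not just recommendations"]
}</code></pre><p>The session opens with context and a request for questions, not a request for answers - and the template itself can live in the shared prompt library from Principle 3.</p><p>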
<strong>Hold monthly strategic inquiry sessions where the team uses AI to surface questions it should be investigating together, and quarterly sessions to explore longer-term uncertainties and opportunities.</strong> The key is treating AI as a strategic thinking partner that helps teams ask better questions, not just answer the questions they already know to ask.</p><h2><strong>Principle 5: Be great at role-based model configurations</strong> (i.e. setting persona-specific system prompts with domain constraints)</h2><p>AI that adopts specific professional identities - with the mental models, priorities, and constraints of those roles - generates substantially more relevant and actionable outputs. Sander Schulhoff, the OG prompt engineer, has weighed in on the impact of role-based prompting on output quality (jump to the 18-minute mark to hear it from him directly), and he&#8217;s right that it may not make sense in many scenarios - but when the role is supplied with enough context, there are plenty of instances of remarkable gains in the output.</p><div id="youtube2-eKuFqQKYRrA" class="youtube-wrap" data-attrs="{&quot;videoId&quot;:&quot;eKuFqQKYRrA&quot;,&quot;startTime&quot;:&quot;1054&quot;,&quot;endTime&quot;:null}" data-component-name="Youtube2ToDOM"><div class="youtube-inner"><iframe src="https://www.youtube-nocookie.com/embed/eKuFqQKYRrA?start=1054&amp;rel=0&amp;autoplay=0&amp;showinfo=0&amp;enablejsapi=0" frameborder="0" loading="lazy" gesture="media" allow="autoplay; fullscreen" allowautoplay="true" allowfullscreen="true" width="728" height="409"></iframe></div></div><p>Getting different expert viewpoints on the same challenge reveals trade-offs and opportunities that single-lens analysis misses. 
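</p><p>One lightweight way to capture such a configuration - an illustrative structure, not a vendor schema - is a persona block your team reuses as a system prompt:</p><pre><code class="language-json">{
  "persona": "growth_strategist",
  "mental_models": ["unit economics", "payback period", "channel saturation"],
  "priorities": ["efficient pipeline growth", "LTV:CAC above target"],
  "constraints": ["recommendations must fit current engineering capacity"],
  "output_style": "state assumptions first, then recommendation, then confidence"
}</code></pre><p>Swapping only the persona block while holding the question constant is what makes running the same challenge through several lenses cheap.</p><p>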
The goal isn't to get one "right" answer, but to understand how the same situation looks through different strategic lenses.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!uYNN!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F88a74c4c-a3af-469e-9167-a2a6ac9cf365_1044x1020.png"><div class="image2-inset"><picture><img src="https://substackcdn.com/image/fetch/$s_!uYNN!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F88a74c4c-a3af-469e-9167-a2a6ac9cf365_1044x1020.png" width="1044" height="1020" class="sizing-normal" alt="" loading="lazy"></picture></div></a></figure></div><p>Teams should run the same strategic challenge through multiple analytical lenses and personas simultaneously, then discuss where the perspectives align and where they create tension. Start by defining what each perspective should focus on based on your team's actual functional expertise - your growth strategist cares about unit economics, your competitive analyst focuses on market positioning, your customer success lead thinks about lifecycle optimization.</p><p>When all three perspectives agree on a recommendation, you have high confidence. When they conflict - like growth strategy saying "increase spend" while customer lifecycle says "focus on quality" - you have identified the strategic trade-offs that need explicit team decision-making. 
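</p><p>Laying the lenses side by side - again an illustrative structure, not a prescribed format - makes those tensions explicit enough to bring to a team discussion:</p><pre><code class="language-json">{
  "question": "Should we increase paid acquisition spend next quarter?",
  "perspectives": {
    "growth_strategist": {"view": "increase spend", "basis": "CAC still under target in two channels"},
    "competitive_analyst": {"view": "hold", "basis": "price pressure risks commoditizing the category"},
    "customer_lifecycle": {"view": "focus on quality", "basis": "different channels bring different-quality users"}
  },
  "tensions": ["volume vs. cohort quality", "share defense vs. unit economics"]
}</code></pre><p>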
<strong>These tensions aren't problems to solve; they're the interesting strategic questions that determine your approach.</strong></p><p>The key insight is that different functional perspectives reveal different aspects of the same strategic reality. Your growth strategist sees that rising acquisition costs are pressuring unit economics, your competitive analyst sees that pricing pressure could commoditize the market, your customer lifecycle expert sees that different channels bring different quality users. All three are right - the strategic question is how to balance these realities.</p><h2><strong>Principle 6: Make AI reasoning transparent and checkable</strong></h2><p>I&#8217;ve previously written about this (<a href="https://www.piyush.cc/p/beyond-the-prompt-smell-test-and">How the hidden hurdle to scaling vibe is architecting human nose for error</a>): the most dangerous outputs aren't obviously wrong - they're confidently wrong in ways that bypass human skepticism. LLMs can generate sophisticated analyses with fictional citations, reasonable-sounding stats from non-existent studies, and logical conclusions based on flawed assumptions. 
<em>This becomes exponentially more complex in agentic workflows, where multiple AI systems make sequential decisions and errors compound rapidly without systematic verification.</em></p><p>Building transparency into AI reasoning means your team can validate assumptions, check logic, and trust recommendations because they understand how conclusions were reached.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!0bzH!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F79f7f401-e6f9-4a9f-84bb-941331614b5d_1046x1256.png"><div class="image2-inset"><picture><img src="https://substackcdn.com/image/fetch/$s_!0bzH!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F79f7f401-e6f9-4a9f-84bb-941331614b5d_1046x1256.png" width="1046" height="1256" class="sizing-normal" alt="" loading="lazy"></picture></div></a></figure></div><p><strong>Transparency starts with teams demanding that AI show its work, not just its conclusions.</strong> When AI recommends shifting budget allocation, teams need to see the data foundation (LinkedIn CPA vs paid social CPA), understand the assumptions (that LinkedIn performance will scale), and know what validation is needed (account manager confirmation of inventory availability).</p><p><strong>The real power (and what closes the build loop) comes from having team members with domain expertise validate different aspects of the reasoning.</strong> Your paid social expert can assess whether audience saturation explains performance decline, your LinkedIn specialist can evaluate scalability assumptions, your analytics lead can confirm data quality and statistical significance. 
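</p><p>This can be enforced with a simple response contract; the shape below is one possible sketch (the specifics are placeholders), echoing the budget example above:</p><pre><code class="language-json">{
  "recommendation": "shift part of the paid-social budget to LinkedIn",
  "data_foundation": ["LinkedIn CPA vs. paid-social CPA over the trailing 90 days"],
  "assumptions": ["LinkedIn performance holds as spend scales"],
  "validation_needed": [
    "account manager confirms inventory availability",
    "analytics lead confirms data quality and statistical significance"
  ],
  "confidence": "medium"
}</code></pre><p>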
Each team member contributes their expertise to evaluating the recommendation's foundation.</p><p>This creates a culture where AI recommendations become starting points for informed team discussion rather than final answers. When the reasoning is transparent, teams can identify where they agree with the logic, where they have concerns, and what additional information they need to make confident decisions.</p><h2><strong>Principle 7: Learn from what works and what doesn't - </strong><em>Set up feedback and error-handling mechanisms</em></h2><p>The best AI outputs aren&#8217;t the most confident; they&#8217;re the most honest about uncertainty. False confidence is dangerous and compounds quickly in agentic systems, turning small errors into systemic failures. Smart teams distinguish between<a href="https://en.wikipedia.org/wiki/Uncertainty_quantification"> types of uncertainty</a> - epistemic uncertainty can be reduced with more data, aleatoric uncertainty must be managed, and ambiguity demands human judgment. 
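</p><p>One way to make that distinction operational - an illustrative schema, not a standard - is to have responses label their uncertainty and how to reduce it:</p><pre><code class="language-json">{
  "answer": "Churn looks driven by onboarding drop-off, but the data is incomplete",
  "uncertainty": {
    "type": "epistemic",
    "how_to_reduce": "cohort analysis of users who never completed setup",
    "note": "aleatoric uncertainty would instead be managed with ranges; ambiguity escalates to human judgment"
  }
}</code></pre><p>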
The highest-value AI response - the one systems should aspire to - is often, &#8220;I don&#8217;t know, but here&#8217;s how we can find out.&#8221;</p><p>By systematically tracking what works, what doesn't, and why, teams build institutional memory that compounds over time.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!voD4!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F979b9578-750a-488c-8e46-764c833ae53a_1246x1122.png"><div class="image2-inset"><picture><img src="https://substackcdn.com/image/fetch/$s_!voD4!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F979b9578-750a-488c-8e46-764c833ae53a_1246x1122.png" width="1246" height="1122" class="sizing-normal" alt="" loading="lazy"></picture></div></a></figure></div><p>Learning happens when teams systematically track the outcomes of AI-assisted decisions and extract patterns from what works. Start by documenting significant recommendations and their results - not just whether they succeeded or failed, but what contextual factors contributed to the outcome.</p><p>The goal is building institutional memory that survives team changes and compounds organizational intelligence. When your team discovers that including customer success insights about churn predictors improves campaign targeting accuracy by 18%, that learning should benefit all future AI interactions. When a competitive response strategy works particularly well, the specific approach and context should become part of your team's strategic playbook.</p><p>This creates a feedback loop where your team's AI capabilities improve over time. 
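</p><p>A minimal decision-log entry - field names illustrative - is enough to make that tracking systematic:</p><pre><code class="language-json">{
  "date": "2026-04-10",
  "recommendation": "target churn-risk segments in campaign copy",
  "context_used": ["customer success churn predictors", "last quarter's campaign benchmarks"],
  "outcome": "targeting accuracy improved",
  "contextual_factors": ["competitor price cut mid-campaign"],
  "follow_up": "fold the approach into the shared template library"
}</code></pre><p>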
Context gets richer as you learn what organizational intelligence is most valuable. <strong>Prompts and contexts get more sophisticated as you understand what analytical frameworks produce the best results</strong>. Decision-making gets better as you accumulate patterns about what works in specific contexts.</p><p>The key is making this learning systematic rather than ad hoc. <strong>Think monthly retrospectives on AI-assisted decisions, quarterly reviews of which contexts and prompts are performing best, and annual assessments of how your team's AI capabilities have evolved.</strong> Teams that invest in systematic learning build AI capabilities that compound and become increasingly difficult for competitors to replicate.</p><h2><strong>Building systematic team AI capabilities</strong></h2><p>The progression from individual AI use to sophisticated team capabilities happens when teams build shared infrastructure: common goals, centralized context, reusable prompts, collaborative inquiry, multi-perspective analysis, transparent reasoning, and systematic learning.</p><p>The JSON examples throughout these principles aren't meant to be intimidating or examples of best-in-class; they're just illustrations of how to structure information so AI can use it effectively and teams can maintain it over time. Most teams can start with simple shared documents that capture the essential elements, then evolve toward more sophisticated systems as their capabilities develop.</p><p>The key insight is that effective team AI isn't about having the smartest individual prompts or the most advanced technical setup. It's about building systematic approaches to capturing and leveraging collective intelligence. 
<strong>If you're just starting, don&#8217;t chase complexity; focus on mastering context engineering - how to write, select, compress, and isolate context</strong> (good starter kits and intros <a href="https://blog.langchain.com/context-engineering-for-agents/">here</a> from LangChain, and <a href="https://www.promptingguide.ai/guides/context-engineering-guide">here</a> from DAIR.ai). These skills are the scaffolding for everything that follows. Teams that get this right early will build AI capabilities that compound over time and become increasingly hard to copy.</p><p><strong>Further reading:</strong></p><ul><li><p><a href="https://blog.langchain.dev/context-engineering-for-agents/">https://blog.langchain.dev/context-engineering-for-agents/</a></p></li><li><p><a href="https://simonwillison.net/2025/Jun/27/context-engineering/">https://simonwillison.net/2025/Jun/27/context-engineering/</a></p></li><li><p><a href="https://blog.langchain.com/how-and-when-to-build-multi-agent-systems/">https://blog.langchain.com/how-and-when-to-build-multi-agent-systems/</a></p></li><li><p><a href="https://www.anthropic.com/engineering/built-multi-agent-research-system">https://www.anthropic.com/engineering/built-multi-agent-research-system</a></p></li><li><p><a href="https://www.datacamp.com/blog/context-engineering">https://www.datacamp.com/blog/context-engineering</a></p></li><li><p><a href="https://github.blog/ai-and-ml/llms/the-architecture-of-todays-llm-applications/">https://github.blog/ai-and-ml/llms/the-architecture-of-todays-llm-applications/</a></p></li><li><p><a href="https://www.martinfowler.com/articles/engineering-practices-llm.html">https://www.martinfowler.com/articles/engineering-practices-llm.html</a></p></li><li><p><a href="https://www.llamaindex.ai/blog/context-engineering-what-it-is-and-techniques-to-consider">https://www.llamaindex.ai/blog/context-engineering-what-it-is-and-techniques-to-consider</a></p></li><li><p><a 
href="https://www.philschmid.de/context-engineering">https://www.philschmid.de/context-engineering</a></p></li><li><p><a href="https://blog.langchain.com/exa/">https://blog.langchain.com/exa/</a></p></li><li><p><a href="https://www.innoq.com/en/blog/2025/07/context-engineering-powering-next-generation-ai-agents/">https://www.innoq.com/en/blog/2025/07/context-engineering-powering-next-generation-ai-agents/</a></p></li><li><p><a href="https://www.akira.ai/blog/context-engineering">https://www.akira.ai/blog/context-engineering</a></p></li></ul>]]></content:encoded></item><item><title><![CDATA["Why did you charge me $100 without telling me?"]]></title><description><![CDATA[How can companies in the LLM space survive and communicate effectively during the next pricing reset?]]></description><link>https://www.piyush.cc/p/why-did-you-charge-me-100-without</link><guid isPermaLink="false">https://www.piyush.cc/p/why-did-you-charge-me-100-without</guid><dc:creator><![CDATA[piyush sagar mishra]]></dc:creator><pubDate>Wed, 09 Jul 2025 09:54:59 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!Uds0!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdfa21618-0a75-4f74-bf2f-442c49c0fb59_948x1578.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!Uds0!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdfa21618-0a75-4f74-bf2f-442c49c0fb59_948x1578.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!Uds0!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdfa21618-0a75-4f74-bf2f-442c49c0fb59_948x1578.png 424w, 
https://substackcdn.com/image/fetch/$s_!Uds0!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdfa21618-0a75-4f74-bf2f-442c49c0fb59_948x1578.png 848w, https://substackcdn.com/image/fetch/$s_!Uds0!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdfa21618-0a75-4f74-bf2f-442c49c0fb59_948x1578.png 1272w, https://substackcdn.com/image/fetch/$s_!Uds0!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdfa21618-0a75-4f74-bf2f-442c49c0fb59_948x1578.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!Uds0!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdfa21618-0a75-4f74-bf2f-442c49c0fb59_948x1578.png" width="948" height="1578" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/dfa21618-0a75-4f74-bf2f-442c49c0fb59_948x1578.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1578,&quot;width&quot;:948,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:332811,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://www.piyush.cc/i/167891240?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdfa21618-0a75-4f74-bf2f-442c49c0fb59_948x1578.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!Uds0!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdfa21618-0a75-4f74-bf2f-442c49c0fb59_948x1578.png 
424w, https://substackcdn.com/image/fetch/$s_!Uds0!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdfa21618-0a75-4f74-bf2f-442c49c0fb59_948x1578.png 848w, https://substackcdn.com/image/fetch/$s_!Uds0!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdfa21618-0a75-4f74-bf2f-442c49c0fb59_948x1578.png 1272w, https://substackcdn.com/image/fetch/$s_!Uds0!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdfa21618-0a75-4f74-bf2f-442c49c0fb59_948x1578.png 1456w" sizes="100vw" fetchpriority="high"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" 
y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p></p><p>The backlash around Cursor&#8217;s pricing wasn&#8217;t just a one-off mistake. To me, it feels like a sign of something deeper, a challenge that a lot of companies in the LLM space are quietly dealing with. Getting pricing right is honestly just as tough, maybe even tougher, than building a great product. You can make something super impressive, but if the price doesn&#8217;t match how users actually experience value, the whole thing can feel shaky. At the same time, I think we&#8217;re seeing a bigger shift in this space - the early days of generative AI were all about being generous: free tokens, high usage limits, and growth that looked amazing but was really just burning cash. That&#8217;s changing now - the economic reality is starting to hit, and it reminds me a bit of how cloud infrastructure pricing evolved - more metered usage, more cost visibility, and a lot more pressure to optimize. I&#8217;m sure flat pricing and classic SaaS-style tiers aren&#8217;t disappearing overnight, but the trend is clearly toward models that reflect real usage and are harder to game.</p><p>Right now, most vendors seem to be settling on two main price points. There&#8217;s a hobbyist/prosumer tier in the ~$10/$25 per month band, and then a &#8220;max&#8221;/power user tier that&#8217;s ~$100/$200. I don&#8217;t think of these as old-school SaaS plans, but more like prepaid usage brackets, and I think they reflect the underlying costs that providers are juggling. What&#8217;s interesting is how quickly the lower tier is coming under pressure. Open-source and local models are improving fast, and honestly, it&#8217;s getting harder to justify paying for something when the free alternatives are good enough. 
The primary reason the hobbyist band is still doing so well is that the &#8220;consumer&#8221; layer on the free alternatives isn&#8217;t very solid today, but i wouldn&#8217;t be surprised if the hobbyist tier keeps getting squeezed, very quickly getting to ~$0 for most use cases. That puts more weight on the higher tier, where people are still willing to pay, but only if it&#8217;s really worth it. And at the core, the overlap between tools becomes a real issue - if I&#8217;m already paying for Claude pro max, it&#8217;s tough to justify another subscription just to try something similar for an adjacent use case. And it&#8217;s not just about the money: switching is a hassle, especially when your data and context are stuck in one tool. But I think that&#8217;ll also change pretty soon as &#8220;context&#8221; becomes more portable.</p><p>Personally, I keep experimenting a lot - I&#8217;ve probably switched between 3-4 foundational stacks in the last couple of months. But companies are different - once a tool is embedded in a team&#8217;s workflow, with approvals, compliance, and integration into internal systems, it&#8217;s not so easy to pull it out. That&#8217;s where I think most retention will happen, not just by delighting solo users, but by helping teams actually get work done with less friction.</p><p><strong>So, as companies focus on building for that higher tier, the real question is, how do you stay sticky when the hobbyist end falls away? 
And what does it take to build a moat when free tools are getting better every week, and paid tools face more scrutiny than ever?</strong></p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!i96J!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F28b4eb71-61db-4e6f-829b-eb6c20571f1b_1080x1080.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!i96J!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F28b4eb71-61db-4e6f-829b-eb6c20571f1b_1080x1080.jpeg 424w, https://substackcdn.com/image/fetch/$s_!i96J!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F28b4eb71-61db-4e6f-829b-eb6c20571f1b_1080x1080.jpeg 848w, https://substackcdn.com/image/fetch/$s_!i96J!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F28b4eb71-61db-4e6f-829b-eb6c20571f1b_1080x1080.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!i96J!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F28b4eb71-61db-4e6f-829b-eb6c20571f1b_1080x1080.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!i96J!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F28b4eb71-61db-4e6f-829b-eb6c20571f1b_1080x1080.jpeg" width="1080" height="1080" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/28b4eb71-61db-4e6f-829b-eb6c20571f1b_1080x1080.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1080,&quot;width&quot;:1080,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;Image&quot;,&quot;title&quot;:&quot;Image&quot;,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Image" title="Image" srcset="https://substackcdn.com/image/fetch/$s_!i96J!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F28b4eb71-61db-4e6f-829b-eb6c20571f1b_1080x1080.jpeg 424w, https://substackcdn.com/image/fetch/$s_!i96J!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F28b4eb71-61db-4e6f-829b-eb6c20571f1b_1080x1080.jpeg 848w, https://substackcdn.com/image/fetch/$s_!i96J!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F28b4eb71-61db-4e6f-829b-eb6c20571f1b_1080x1080.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!i96J!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F28b4eb71-61db-4e6f-829b-eb6c20571f1b_1080x1080.jpeg 1456w" sizes="100vw"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" 
xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>I keep coming back to the same ideas (and i know it sounds pretty obvious the way i frame it), but IMO:</p><ul><li><p><strong>the tools that last will be the ones that move from being interesting to being embedded</strong> - so, it&#8217;s not enough to just be a good chatbot or conversation layer, or to drive clean UIs and repurpose assets - the winning products will tie directly into the user&#8217;s workflow, and they won&#8217;t just answer questions, but actually take action. If a model can file a Jira ticket, update Salesforce, or spin up Terraform, it&#8217;s not just responding, it&#8217;s actually operating. Once you have that, the cost of switching isn&#8217;t just financial, it&#8217;s operational. 
Integration is a big piece of this, deep, messy, real-world integration; if your tool can read and write in github, snowflake, notion, slack, not just through surface-level APIs, but in ways that actually fit how those systems are used (and how people use them), you start to build something that&#8217;s hard to replace. <strong>People underestimate how sticky complexity can be - once a product touches enough parts of the stack, pulling it out becomes a project no one wants to take on just to save a bit of money.</strong></p></li><li><p>There&#8217;s also a growing need for flexibility with the models themselves. <strong>More teams want to bring their own weights, or fine-tune on private data without sending it to an external provider</strong>. Not everyone wants to run their own stack, and most won&#8217;t, but the desire to control costs and protect sensitive data is real. Vendors that make this easy, without needing deep ML expertise, will earn a lot of trust. On the financial side, folks from finance are asking tougher questions now, so flat caps and unlimited plans don&#8217;t work when spend starts to scale. Transparent pricing, with clear breakdowns per run or per token, becomes a real differentiator. <strong>Not because it&#8217;s exciting, but mainly because someone in procurement needs to make the numbers work</strong>. Being able to forecast spend, set alerts, and tie usage back to outcomes makes those conversations a lot easier.</p></li><li><p>Personalization is another lever, but i don&#8217;t mean it in the generic marketing sense - basically, if a tool learns from its users quickly, improves every day, and starts to mirror the language and patterns of an organization, it creates a sense of ownership. 
That feedback loop becomes a kind of organizational memory, <strong>and at that point you&#8217;re not just using a tool, you&#8217;re shaping it</strong>, and that kind of embeddedness (Fun fact: <em>substack doesn&#8217;t recognize it as a word</em>) doesn&#8217;t copy-paste easily to a competitor. All of this really assumes a team-centric lens, i.e., role-based workflows, approvals, audit logs, version control - these aren&#8217;t features that attract individual hackers, but they&#8217;re essential for teams doing real work. They&#8217;re often what separates a hobby project from something a business truly depends on.</p></li><li><p>And finally, needless to say, distribution matters more than we admit. The smartest model in the world loses out to the tool that&#8217;s right there where the work happens. <strong>If you&#8217;re one click away in VS Code, or a slash command in Slack, you&#8217;re part of the flow, and once you&#8217;re in the flow, you&#8217;re in the decision loop</strong>. There&#8217;s also a governance layer that becomes more important as tools scale inside companies. Pre-built guardrails, safe defaults, red-teaming, usage analytics - these don&#8217;t make headlines, but they make executive buy-in possible. These are the features that get you through security reviews, not the ones that get you trending on Twitter.</p><p></p><p>So, I don&#8217;t think the moat will come from having the biggest model or the lowest latency, but it&#8217;ll come from being useful, trusted, and deeply woven into how a team works. That&#8217;s harder to build, but it&#8217;s also much harder to rip out. 
<strong>We&#8217;re exiting out of the playground phase of LLMs, and the infrastructure bill is here.</strong> The tools that survive will be the ones that can justify their cost, not just with benchmarks, but with real outcomes.</p></li></ul>]]></content:encoded></item><item><title><![CDATA[Soloists to Ensembles: the evolving debate between single-agent and multi-agent systems]]></title><description><![CDATA[and why it's going to be an AND versus an OR down the line]]></description><link>https://www.piyush.cc/p/soloists-to-ensembles-the-evolving</link><guid isPermaLink="false">https://www.piyush.cc/p/soloists-to-ensembles-the-evolving</guid><dc:creator><![CDATA[piyush sagar mishra]]></dc:creator><pubDate>Sun, 29 Jun 2025 14:45:02 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/514eceb9-c0d9-44c3-a271-be567f25c996_1536x1024.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>When I first began working in tech, data science meant wrestling with tools like R and SAS on single-core machines. Scale quickly became an insurmountable obstacle - simple regression models were manageable, but any meaningful analysis on large datasets turned overnight jobs into multi-day marathons. Then came Hadoop and distributed computing, fundamentally altering our relationship with data. Suddenly, problems once deemed impossible became routine, reshaping entire industries.</p><p>Spend enough decades wrangling software at scale, and an inevitable arc emerges -  single-threaded CPUs gradually evolved into multi-core architectures. Monolithic databases became harder to manage as systems grew, leading to the rise of microservices and data meshes. Centralized data warehouses eventually gave way to distributed lakes. </p><p>Today, large language models are facing a similar transition. 
We have increased token limits, adopted retrieval augmentation, and improved compression, and still, practical limits remain - relying on a single model is unlikely to be the final stage of artificial intelligence. This growing awareness has sparked intense discussion across the field about whether to keep refining single agents with enriched context, or move toward ensembles of specialized agents. <strong>What we&#8217;ve seen over the years is that in artificial intelligence, as in civilization, the true leap forward arrives not when one mind grows larger, but when many minds learn to synchronize their strengths and doubts.</strong></p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!1mso!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F00708047-0e90-451b-b926-94266a22a5dc_958x828.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!1mso!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F00708047-0e90-451b-b926-94266a22a5dc_958x828.png 424w, https://substackcdn.com/image/fetch/$s_!1mso!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F00708047-0e90-451b-b926-94266a22a5dc_958x828.png 848w, https://substackcdn.com/image/fetch/$s_!1mso!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F00708047-0e90-451b-b926-94266a22a5dc_958x828.png 1272w, https://substackcdn.com/image/fetch/$s_!1mso!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F00708047-0e90-451b-b926-94266a22a5dc_958x828.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!1mso!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F00708047-0e90-451b-b926-94266a22a5dc_958x828.png" width="958" height="828" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/00708047-0e90-451b-b926-94266a22a5dc_958x828.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:828,&quot;width&quot;:958,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:244202,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://www.piyush.cc/i/167101596?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F00708047-0e90-451b-b926-94266a22a5dc_958x828.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!1mso!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F00708047-0e90-451b-b926-94266a22a5dc_958x828.png 424w, https://substackcdn.com/image/fetch/$s_!1mso!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F00708047-0e90-451b-b926-94266a22a5dc_958x828.png 848w, https://substackcdn.com/image/fetch/$s_!1mso!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F00708047-0e90-451b-b926-94266a22a5dc_958x828.png 1272w, https://substackcdn.com/image/fetch/$s_!1mso!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F00708047-0e90-451b-b926-94266a22a5dc_958x828.png 1456w" sizes="100vw" 
fetchpriority="high"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><h3>the limits of single-agent reasoning</h3><p>The debate crystallised dramatically on X, albeit for less than 72 hours, when two industry titans took opposing stances two weeks ago. Walden Yan at Cognition AI published &#8220;<a href="https://cognition.ai/blog/dont-build-multi-agents">Don&#8217;t Build Multi-Agents</a>,&#8221; arguing that multi-agent architectures create fragile systems due to poor context sharing and conflicting decisions. 
Days later, Anthropic countered by releasing their <a href="https://www.anthropic.com/engineering/built-multi-agent-research-system">multi-agent research feature</a>, claiming their system outperformed single agents by over 90% on complex tasks (if you find time, i like this <a href="https://simonwillison.net/2025/Jun/14/multi-agent-research-system/">recap</a> of the Anthropic paper by Simon Willison). This was not mere academic posturing; it represented a fundamental philosophical divide about AI&#8217;s architectural future. </p><p>Consider the elegant simplicity of a single LLM: it is a bit like a genius researcher moving through a vast library, checking one book at a time. This sequential process means that as new information arrives, older details get pushed aside. You can expand the context and memory of the researcher to capture it all, but research is starting to measure this limit more precisely. Cognitive scientists have found a clear parallel between LLMs and human working memory. Just as a person can only keep track of a few ideas at once, an LLM has a bounded capacity. When that capacity is exceeded, performance reliably drops.</p><p>Several recent studies grounded in cognitive load theory confirm that LLMs exhibit human-like failure modes under overload. When tasked with integrating too many facts, a lone model loses the plot: it forgets earlier instructions, fixates on recent details, or averages everything into lukewarm responses. The <a href="https://arxiv.org/abs/2504.12516">BrowseComp benchmark</a>, which Anthropic used in its evaluation, illustrated this limitation vividly. When asked to enumerate the entire board roster of the S&amp;P 500&#8217;s tech sector, a single Claude Opus trudged along linearly and stalled out before even reaching companies starting with &#8220;C.&#8221; It wasn&#8217;t hardware memory that failed. 
It was the model&#8217;s cognitive maneuvering room.</p><p>Similarly, LangChain&#8217;s <a href="https://blog.langchain.com/benchmarking-multi-agent-architectures/">experiments on the tau-bench benchmark</a> showed that beyond seven concurrent tool calls, single-agent accuracy nosedived as context became bloated and focus fragmented. Without getting too much into the technicalities, a lone model reasoning through problems resembles someone walking a tightrope - one step at a time, no/low branching, no revisiting paths without starting over. If crucial insights lie down unexplored avenues, the single LLM will miss them entirely. This tunnel vision means models often zero in on particular interpretations and persevere with them even when suboptimal.</p><p></p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!D1ov!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd06373ab-8f75-45ca-b24d-8a89ea8dd308_958x450.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!D1ov!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd06373ab-8f75-45ca-b24d-8a89ea8dd308_958x450.png 424w, https://substackcdn.com/image/fetch/$s_!D1ov!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd06373ab-8f75-45ca-b24d-8a89ea8dd308_958x450.png 848w, https://substackcdn.com/image/fetch/$s_!D1ov!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd06373ab-8f75-45ca-b24d-8a89ea8dd308_958x450.png 1272w, 
https://substackcdn.com/image/fetch/$s_!D1ov!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd06373ab-8f75-45ca-b24d-8a89ea8dd308_958x450.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!D1ov!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd06373ab-8f75-45ca-b24d-8a89ea8dd308_958x450.png" width="958" height="450" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/d06373ab-8f75-45ca-b24d-8a89ea8dd308_958x450.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:450,&quot;width&quot;:958,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:76312,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.piyush.cc/i/167101596?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd06373ab-8f75-45ca-b24d-8a89ea8dd308_958x450.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!D1ov!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd06373ab-8f75-45ca-b24d-8a89ea8dd308_958x450.png 424w, https://substackcdn.com/image/fetch/$s_!D1ov!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd06373ab-8f75-45ca-b24d-8a89ea8dd308_958x450.png 848w, https://substackcdn.com/image/fetch/$s_!D1ov!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd06373ab-8f75-45ca-b24d-8a89ea8dd308_958x450.png 1272w, 
https://substackcdn.com/image/fetch/$s_!D1ov!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd06373ab-8f75-45ca-b24d-8a89ea8dd308_958x450.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p></p><h3>two approaches to scaling intelligence</h3><p>When Yan unintentionally (or, perhaps deliberately) exaggerated the concept of &#8220;context engineering,&#8221; he signaled a shift beyond prompt engineering toward complete context sharing and unified decision-making. 
Cognition&#8217;s approach follows two core principles - (a) share context across all system components and (b) avoid conflicting decisions by ensuring every action is informed by the full record of prior actions. Their single-agent approach for Devin prioritized reliability and context continuity, embodying the KISS principle. For deep, narrow tasks like programming where memory consistency and logical coherence are paramount, this architecture shows clear advantages. A coding agent needs to remember variable names from hundreds of lines earlier, maintain architectural decisions across multiple files, and ensure that each change does not break prior functionality.</p><p>Anthropic took the opposite stance. Their orchestrator-worker model employs Claude Opus as conductor, dissecting high-level goals and delegating subtasks to specialized Claude Sonnets, each with its own token budget and domain focus. Rather than straining one model with millions of tokens, the system distributes work across many agents operating in parallel. <strong>Every additional agent in a system is not merely another processor, but a new perspective, reminding us that diversity of thought is as vital to machines as it is to human progress.</strong></p><p>The performance gains were striking. In Anthropic&#8217;s evaluation, their multi-agent system achieved over 90% higher success on the most difficult research tasks than a single Opus. They did this by rethinking work organization rather than merely scaling parameters.</p><p>The Cognition-Anthropic debate revealed an important nuance: architecture choice depends fundamentally on the nature of the task. Multi-agent systems excel in wide and shallow scenarios, such as market research, data gathering, or comparative analysis, where subtasks proceed independently and results are merged later. 
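<p>A breadth-first task like these maps naturally onto the orchestrator-worker pattern. Here is a minimal sketch (all names are illustrative; a thread pool stands in for parallel model calls with separate token budgets):</p>

```python
# Hypothetical sketch of the orchestrator-worker pattern: a lead "conductor"
# splits a broad goal into independent subtasks, fans them out to workers in
# parallel, then merges the results. call_worker() stands in for a real
# per-worker LLM call.
from concurrent.futures import ThreadPoolExecutor

def call_worker(subtask: str) -> str:
    # Placeholder for a worker model call with its own context and budget.
    return f"findings for: {subtask}"

def orchestrate(goal: str, subtasks: list[str]) -> str:
    # The lead agent defines scope and limits; workers never see each
    # other's context, only their own slice of the problem.
    with ThreadPoolExecutor(max_workers=len(subtasks)) as pool:
        results = list(pool.map(call_worker, subtasks))
    # Merge step: the orchestrator synthesizes worker outputs into one answer.
    return f"{goal}:\n" + "\n".join(results)

report = orchestrate(
    "Survey S&P 500 tech boards",
    ["companies A-H", "companies I-Q", "companies R-Z"],
)
```

<p>The key property is that each subtask proceeds independently, so wall-clock time is bounded by the slowest worker rather than the sum of all of them.</p>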
These breadth-first exploration problems benefit enormously from parallel processing and varied viewpoints.</p><p>Conversely, single-agent architectures retain an edge in deep and narrow domains, like programming, legal analysis, or long-form creative writing, where sustained memory and logical consistency matter more than sheer throughput. When Facebook&#8217;s legal team analyzed merger contracts, they found that a single specialized model better maintained understanding of contractual relationships than a multi-agent system, which fragmented the analysis.</p><p></p><h3>the economics of running many minds</h3><p>And we are at a point where the benefits of multi-agent setups are no longer just academic theorizing. DHL uses a multi-agent approach to optimize logistics (separate agents for monitoring weather forecasts, port congestion, inventory buffers etc.). Deutsche Telekom processes millions of daily support interactions through a sophisticated multi-agent pipeline (orchestrator agent triaging queries and delegating to specialist agents: billing agents, networking agents, guidance agents etc.). Their VP summed it up simply: this is distributed computing with LLMs on the edges. Bank of America&#8217;s Erica operates similarly - agents for fraud patterns, policy information retrieval, and conversation management. The result in all these instances is faster, more accurate responses.</p><p>For months, critics have argued that multi-agent systems are too expensive. Anthropic reported their multi-agent system consumed 15 times more tokens than standard chats, a serious cost multiplier. Running one gpt-4 was expensive enough; running five simultaneously looked untenable. But this is changing fast. Model efficiency improves exponentially, not incrementally. gpt-4.1 models, for example, deliver higher output quality at a fraction of the prior cost. OpenAI&#8217;s gpt-4.1 mini matches older gpt-4 performance while costing 83% less. 
Meta&#8217;s Llama 4 can run at a fraction of the price on affordable infrastructure. Groq&#8217;s Llama-based systems cost just $0.05 per million input tokens. You could feed 20 million tokens for a single dollar, and i have a feeling that by the time i hit send on this essay, prices will have dropped even further.</p><p>Selective participation mechanisms also cut costs by activating only the agents necessary for a given task. When smart orchestration prevents redundant agents from spinning up, the economics improve even further. <strong>As the cost of cognition approaches zero, the frontier of AI will be defined less by raw power and more by the elegance with which we orchestrate memory, context, and collective reasoning.</strong></p><h3>the road ahead for distributed intelligence</h3><p>Multi-agent architectures bring real challenges though - Cognition AI&#8217;s criticism about fragmented contexts and conflicting decisions was not unfounded. Without disciplined orchestration, shared scratchpads, and clear state management, multi-agent systems can collapse into chaos. Anthropic learned this the hard way. Early prototypes created dozens of subagents even for simple queries and had agents distracting each other with redundant work. They solved this by enforcing strict boundaries: the lead agent sets goals, assigns tasks, and defines limits for each worker.</p><p>Shared memory channels have become essential. Without them, the orchestrator bottlenecks every merge. Most successful systems now rely on common scratchpads where agents contribute interim findings, reading each other&#8217;s work like researchers sharing notes. 
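<p>A shared scratchpad can be as simple as a common store that every agent can append to and read from. A minimal sketch (names are illustrative, not any particular framework):</p>

```python
# Minimal sketch of a shared scratchpad: agents append interim findings to
# a common store and read each other's notes, so the orchestrator is no
# longer the bottleneck for every merge.
from collections import defaultdict

class Scratchpad:
    def __init__(self):
        self._notes = defaultdict(list)

    def write(self, agent: str, note: str) -> None:
        self._notes[agent].append(note)

    def read_all(self) -> dict[str, list[str]]:
        # Any agent can review the full record of prior findings
        # before committing to its own next action.
        return dict(self._notes)

pad = Scratchpad()
pad.write("weather-agent", "storm expected at port of Rotterdam")
pad.write("inventory-agent", "buffer stock covers 4 days")
shared = pad.read_all()
```

<p>In production systems the store would be durable and concurrent-safe, but the principle is the same: every action is informed by the full record of prior actions.</p>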
I don&#8217;t intend to make it dramatically poetic, but the architecture of intelligence is no longer a contest between giants; rather, it is a choreography of specialists, each agent a thread woven together into a tapestry of insight.</p><p>Looking ahead, two trends are converging to redefine AI architecture:</p><ul><li><p>Economic barriers are collapsing as token costs drop. When running ten agents costs the same as one, the focus will shift fully to effectiveness. </p></li><li><p>Simultaneously, context engineering and orchestration will become the core skillset. Just as protocol design transformed the chaos of early networking into the internet&#8217;s seamless flow, context engineering is poised to become the lingua franca of AI collaboration.</p></li></ul><p>This convergence means multi-agent architectures will move from experimental to default. <em>The question will not be whether we can afford multi-agent systems but whether we can afford not to use them.</em> </p><p>Every computational revolution faces the same crossroads: single-unit complexity versus distributed intelligence. History favors distributed systems refined through careful orchestration. Multi-agent systems are not complexity for its own sake - they acknowledge a simple truth: distributed intelligence scales where soloists stall out.</p><p>If you&#8217;ve read <a href="https://www.goodreads.com/book/show/475.Collapse">Collapse</a> by Jared Diamond, or <a href="https://www.goodreads.com/book/show/61535.The_Selfish_Gene">The Selfish Gene</a> by Richard Dawkins, you&#8217;d see a common theme: our civilisation thrived not because individual minds grew larger, but because they learned to collaborate. The dawn of multi-agent intelligence reflects our oldest cognitive strategy: many minds working as one. 
<strong>When coordination becomes seamless and economics irrelevant, which it definitely will in a few months, collaboration will triumph.</strong> The lone genius has had its run, but i feel fairly certain that the future belongs to the organized collective, architected with the precision of distributed systems and the humility to recognize that no single agent holds all the answers.</p>]]></content:encoded></item><item><title><![CDATA[Beyond the prompt: "smell" test and the hidden cost of verification ]]></title><description><![CDATA[How the hidden hurdle to scaling vibe is architecting human nose for error]]></description><link>https://www.piyush.cc/p/beyond-the-prompt-smell-test-and</link><guid isPermaLink="false">https://www.piyush.cc/p/beyond-the-prompt-smell-test-and</guid><dc:creator><![CDATA[piyush sagar mishra]]></dc:creator><pubDate>Sun, 15 Jun 2025 10:51:28 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!AXqS!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F29a4d6b1-1e8d-4af1-ba4c-9030bfaff4a8_950x910.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Imagine this scenario - you are racing a month end deadline, you paste a three-hundred-line Python script into Claude and ask it to tidy the data wrangling loops and sort out the joins. Thirty seconds later the model returns a neat rewrite, the unit tests stay green, and the BI dashboard shows no alarms. A teammate scans the totals and murmurs, "umm, the numbers look light." Ten minutes of digging uncovers the culprit, deep in one aggregation, the model has swapped a SUM for an AVG, trimming every cohort&#8217;s revenue by nearly eighty percent. 
The static linter is pleased, the pipeline stays green, and without that quick human sniff test the mistake would slide straight into the board deck.</p><p>I am sure we&#8217;ve all encountered some form of this workflow and gone hard at the keyboard: &#8220;<em>I said replace em dashes</em>&#8221;, &#8220;<em>Please don&#8217;t use any article older than 3 months for this research</em>&#8221;, &#8220;<em>Why can&#8217;t you write this simple code without an issue</em>&#8221;, often including a swear word or two, not only to feel in control, but also to reinforce to the machine that since you&#8217;re paying $25 a month and Twitter says the latest version beat some benchmark, it should operate as intended, especially for a task as simple as the one at hand.</p><p>That quiet <em>moment</em> captures the tension Balaji distills when he <a href="https://x.com/balajis/status/1930156049065246851">contrasts </a><em><a href="https://x.com/balajis/status/1930156049065246851">typing prompts</a></em><a href="https://x.com/balajis/status/1930156049065246851"> with </a><em><a href="https://x.com/balajis/status/1930156049065246851">verifying answers</a></em>. Typing costs nothing beyond curiosity; verification costs time, nuance, and often domain-specific instrumentation. 
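<p>Part of that verification can be made mechanical. A cheap "smell test" as code - a hypothetical reconciliation check against a trusted total from the source system - catches exactly the SUM-for-AVG swap from the opening story before anything reaches a dashboard:</p>

```python
# Illustrative sanity check: reconcile a refactored aggregation against a
# known-good total before publishing. A SUM silently swapped for an AVG
# fails the check immediately instead of sliding into the board deck.
def aggregate_revenue(rows, agg):
    values = [r["revenue"] for r in rows]
    return agg(values)

rows = [{"revenue": 100.0}, {"revenue": 250.0}, {"revenue": 650.0}]
trusted_total = 1000.0  # known-good figure from the source system

good = aggregate_revenue(rows, sum)                        # correct rewrite
bad = aggregate_revenue(rows, lambda v: sum(v) / len(v))   # the AVG bug

assert abs(good - trusted_total) < 1e-6   # passes: totals reconcile
assert abs(bad - trusted_total) > 1e-6    # the swap is caught, not shipped
```

<p>Unit tests that only check shapes and types stay green through this class of bug; a reconciliation against an external reference is what actually smells the error.</p>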
</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!AXqS!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F29a4d6b1-1e8d-4af1-ba4c-9030bfaff4a8_950x910.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!AXqS!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F29a4d6b1-1e8d-4af1-ba4c-9030bfaff4a8_950x910.png 424w, https://substackcdn.com/image/fetch/$s_!AXqS!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F29a4d6b1-1e8d-4af1-ba4c-9030bfaff4a8_950x910.png 848w, https://substackcdn.com/image/fetch/$s_!AXqS!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F29a4d6b1-1e8d-4af1-ba4c-9030bfaff4a8_950x910.png 1272w, https://substackcdn.com/image/fetch/$s_!AXqS!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F29a4d6b1-1e8d-4af1-ba4c-9030bfaff4a8_950x910.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!AXqS!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F29a4d6b1-1e8d-4af1-ba4c-9030bfaff4a8_950x910.png" width="484" height="463.62105263157895" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/29a4d6b1-1e8d-4af1-ba4c-9030bfaff4a8_950x910.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:910,&quot;width&quot;:950,&quot;resizeWidth&quot;:484,&quot;bytes&quot;:188387,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://www.piyush.cc/i/165989080?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F29a4d6b1-1e8d-4af1-ba4c-9030bfaff4a8_950x910.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!AXqS!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F29a4d6b1-1e8d-4af1-ba4c-9030bfaff4a8_950x910.png 424w, https://substackcdn.com/image/fetch/$s_!AXqS!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F29a4d6b1-1e8d-4af1-ba4c-9030bfaff4a8_950x910.png 848w, https://substackcdn.com/image/fetch/$s_!AXqS!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F29a4d6b1-1e8d-4af1-ba4c-9030bfaff4a8_950x910.png 1272w, https://substackcdn.com/image/fetch/$s_!AXqS!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F29a4d6b1-1e8d-4af1-ba4c-9030bfaff4a8_950x910.png 1456w" sizes="100vw" fetchpriority="high"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" 
viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>Terence&#8239;Tao similarly explains the issue with his <a href="https://x.com/vitrupo/status/1934098165025935868">eye test versus smell test</a> analogy. The eye test checks surface polish, whereas the smell test taps an intuitive sensor built from years of living with the data, the code, or the proofs.  
(Watch the <a href="https://www.youtube.com/watch?v=HUkBz-cdB-k">full interview</a>, it's excellent)</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!yP-d!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0ace81a5-1296-4f4d-937b-d09f99b38108_950x1252.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!yP-d!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0ace81a5-1296-4f4d-937b-d09f99b38108_950x1252.png 424w, https://substackcdn.com/image/fetch/$s_!yP-d!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0ace81a5-1296-4f4d-937b-d09f99b38108_950x1252.png 848w, https://substackcdn.com/image/fetch/$s_!yP-d!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0ace81a5-1296-4f4d-937b-d09f99b38108_950x1252.png 1272w, https://substackcdn.com/image/fetch/$s_!yP-d!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0ace81a5-1296-4f4d-937b-d09f99b38108_950x1252.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!yP-d!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0ace81a5-1296-4f4d-937b-d09f99b38108_950x1252.png" width="462" height="608.8673684210527" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/0ace81a5-1296-4f4d-937b-d09f99b38108_950x1252.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1252,&quot;width&quot;:950,&quot;resizeWidth&quot;:462,&quot;bytes&quot;:759753,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.piyush.cc/i/165989080?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0ace81a5-1296-4f4d-937b-d09f99b38108_950x1252.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!yP-d!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0ace81a5-1296-4f4d-937b-d09f99b38108_950x1252.png 424w, https://substackcdn.com/image/fetch/$s_!yP-d!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0ace81a5-1296-4f4d-937b-d09f99b38108_950x1252.png 848w, https://substackcdn.com/image/fetch/$s_!yP-d!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0ace81a5-1296-4f4d-937b-d09f99b38108_950x1252.png 1272w, https://substackcdn.com/image/fetch/$s_!yP-d!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0ace81a5-1296-4f4d-937b-d09f99b38108_950x1252.png 1456w" sizes="100vw"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" 
fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p></p><p>It&#8217;s an over simplification, but at heart a large language model is a conditional probability machine - given tokens t&#8321;&#8230;t&#8342; it produces a distribution over t&#8342;&#8330;&#8321;, an operation repeated until the stop signal fires. The model has no internal table of verified facts, only a multidimensional map that associates contexts with <em>likely</em> continuations. When the distribution flattens because the prompt pushes beyond the support of the training corpus, <strong>the sampler must still commit to a token.</strong> The gap between the prompt and the training data is bridged with the most statistically compatible guess. In practice, the guess can be a phantom citation, a fabricated function name, or a proof step that skips three lemmas and declares victory. 
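<p>That forced commitment can be sketched with a toy softmax over a tiny vocabulary (no real model involved; scores and tokens are made up). Whether the distribution is peaked or nearly flat, greedy decoding still emits a single confident-looking token:</p>

```python
# Toy next-token step: softmax over scores, then greedy choice. Even when
# the distribution is nearly flat - the model is effectively guessing -
# the sampler must still commit to one token, and the output carries no
# marker of how thin the evidence behind it was.
import math

def softmax(scores):
    exps = [math.exp(s) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

vocab = ["lemma_3", "QED", "cite_smith_2021", "therefore"]

confident = softmax([5.0, 0.1, 0.1, 0.1])    # peaked: one clear continuation
uncertain = softmax([1.0, 1.01, 0.99, 1.0])  # flat: nothing is well supported

# Greedy decoding picks the arg-max either way.
pick = vocab[uncertain.index(max(uncertain))]
```

<p>Here the "winning" token under the flat distribution holds barely a quarter of the probability mass, yet it is emitted with exactly the same fluency as the near-certain one.</p>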
The literature labels the phenomenon hallucination, but the term is slightly misleading. <strong>The model does not see a false world; it simply predicts the next token under uncertainty in the only way it knows.</strong></p><p>The confidence that accompanies the fabrication is baked in as well. During supervised fine&#8209;tuning the loss function punishes low log&#8209;likelihood on reference tokens, so the network learns to assign high probability mass to fluent sequences. At inference, the sampler often operates in <em>greedy</em> or <em>top&#8209;p</em> mode, concentrating on the upper rim of that distribution. <strong>The result is prose that sounds certain even when the epistemic footing is weak.</strong> Whether you clip the temperature or add nucleus randomness, the tone remains smooth because the density of the manifold itself is smooth.</p><p>Researchers have been witnessing and measuring these failures across domains. In mathematics the LeanDojo benchmark shows transformer proofs that look impeccable yet omit indispensable sub&#8209;lemmas. In software security audits, tools like CodeQL find synthetic package names slipped into import statements that compile but at runtime resolve to nothing. In scientific writing, several ACM and other papers report that a large percentage of citations produced by unrestricted generation point to non&#8209;existent papers. The pattern repeats because the underlying mechanism repeats: every time the model ventures outside the core of its training distribution, it continues the sentence anyway.</p><p>The teams i work with also keep a similar log - we have seen several instances where the models invent Salesforce object names, mislabel ISO date offsets, misapply SQL dialects across Presto/Trino/Dremio and hallucinate rate&#8209;limits on APIs that never existed. 
Each error on its own is small, but the verification cost compounds.</p><p><strong>GenAI feels miraculous because it collapses drafting latency</strong> - a marketing email, a legal rider, or a database schema arrives in seconds. <strong>That miracle ends when the verification cycle starts</strong>. Engineering rituals that once belonged to the author now move to the reviewer; integration tests, redlined citations, and adversarial QA pile up. In analytics pipelines we find that every minute saved at generation returns as three minutes spent on validation queries and anomaly plots.</p><p>Things are getting much better with newer models featuring advanced reasoning and deep tuning, which go on to set new coding and reasoning benchmarks, but the overall narrative in organisations stays similar - teams report a predictable trajectory. <strong>In the first sprint productivity leaps because the backlog is draft heavy. By the fourth sprint the ratio between creation and audit normalizes, sometimes inverting when downstream consumers tighten controls.</strong> The tool has not degraded; it&#8217;s just that the entropy introduced by fast drafts has surfaced.</p><p>The bottleneck is being tackled on multiple fronts. One is neural-symbolic loops, which tie a transformer to a proof assistant such as Coq: the model proposes a step, the checker accepts or rejects, and the gradient shifts toward proofs that survive formal scrutiny. Then there are execution sandboxes, which compile every generated function, run unit and property tests, capture traces, and feed the failures back to the generator until all tests pass. Lots of interesting work is also happening on retrieval-augmented decoding, which pulls vetted passages from a curated index before token sampling, then stores hash links that let readers click through to the canonical source - a trick that has already cut fabricated facts by double digits in toolkits such as Phi-3 and LlamaGuard. 
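<p>The execution-sandbox idea reduces to a simple loop. A hedged sketch, with stub functions standing in for a real sandbox and a real generator (the string matching here is purely illustrative):</p>

```python
# Sketch of an execution-sandbox loop: a draft is only accepted once it
# passes the test suite; failures are fed back into the next prompt.
# run_tests() and generate() are placeholders for a real sandbox and model.
def run_tests(code: str) -> list[str]:
    # Stand-in for compiling/executing in a sandbox and collecting failures.
    return [] if "sum(" in code else ["totals are wrong: expected SUM"]

def generate(prompt: str, attempt: int) -> str:
    # Stand-in generator that "fixes" its draft after seeing a failure.
    return "total = sum(values)" if attempt > 0 else "total = avg(values)"

def draft_until_green(prompt: str, max_attempts: int = 3) -> str:
    feedback = ""
    for attempt in range(max_attempts):
        code = generate(prompt + feedback, attempt)
        failures = run_tests(code)
        if not failures:
            return code  # accepted: all tests pass
        # Failures become context for the next draft.
        feedback = "\nTest failures: " + "; ".join(failures)
    raise RuntimeError("no passing draft within budget")

accepted = draft_until_green("compute cohort revenue")
```

<p>The extra compute is spent up front, but what leaves the loop has survived systematic checks rather than ad-hoc eyeballing.</p>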
Finally, uncertainty and critic layers quantify variance over multiple forward passes and <strong>flag the high-entropy spans that merit human review</strong>. Each method spends extra compute up front yet replaces ad-hoc eyeballing with systematic, architecture-level checks.</p><p>The last bit is what most enterprise use cases are leveraging in some form or another today (and FWIW, it is also a similar foundation I use across most of my builds) - i.e. leveraging critic models: AI systems trained explicitly to interrogate and validate other models' outputs, automating the preliminary layers of verification. Net net, one model drafts, a second adversary searches for contradictions, a third arbitrates and so on. This approach, however, risks circular reasoning. <strong>Without external grounding in independently verified data or execution contexts, critic models may amplify rather than mitigate hallucinations</strong>. The technique also yields diminishing returns if the agents share weights, so effective systems randomize initialization or training perspectives. Importantly, the arbiter must defer to external ground truth; otherwise the debate becomes a closed circuit.  </p><p>Despite all the issues and the ongoing research to solve them, a persistent human-in-the-loop framework remains the single most proven way to manage and reduce risks, particularly within high-stakes domains involving customer-level solutioning and reasoning; the risks are most acute in industries such as healthcare, finance, and law. For the most critical verification, human expertise is still paramount and essential to contextual grounding, <strong>bridging the gap between AI-generated plausibility and genuine trustworthiness.</strong> </p><p>The deepest leverage to solve this <em>verification-at-scale</em> problem belongs to the organizations that own the pre&#8209;training stack. 
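<p>The drafter/critic/arbiter pipeline described above, with the arbiter deferring to external ground truth, can be sketched in a few lines (all names and values are hypothetical, and the "ground truth" stands in for an independently verified reference stack):</p>

```python
# Hedged sketch of a draft -> critique -> arbitrate pipeline. The critic
# flags claims that disagree with externally verified facts, and the
# arbiter defers to those facts, so the debate never becomes closed-circuit.
GROUND_TRUTH = {"api_rate_limit_per_min": 600}  # independently verified

def drafter(question: str) -> dict:
    # Stand-in for a generator: fluent, plausible, and wrong.
    return {"api_rate_limit_per_min": 1000}

def critic(claim: dict) -> list[str]:
    # Flag every claim that contradicts the vetted reference stack.
    return [k for k, v in claim.items()
            if k in GROUND_TRUTH and GROUND_TRUTH[k] != v]

def arbiter(claim: dict, flagged: list[str]) -> dict:
    # Defer to external ground truth for every flagged span.
    return {k: (GROUND_TRUTH[k] if k in flagged else v)
            for k, v in claim.items()}

draft = drafter("what is the API rate limit?")
final = arbiter(draft, critic(draft))
```

<p>The structure makes the failure mode visible: if the critic's reference were just another copy of the drafter's weights, nothing would get flagged and the fabrication would pass through unchallenged.</p>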
Only they can modify the loss to reward verifiable output directly, embed symbolic reasoning layers during training, expose intermediate attention maps for calibration, or fine&#8209;tune on datasets where correctness is measured by formal acceptance rather than by human preference. External validators can audit and provide domain-specific verification (there are tons of repos on GitHub, and yes, quite a number of startups are building domain-specific stacks to validate outputs, providing a factual and symbolic reference stack for comparison, and are making money off it), but they cannot re&#8209;shape the inductive bias at the root. It is easier for an organisation that owns the existing distribution to add verification than for a verification-only platform, however strong, to scale, capture the inference stack, and grab distribution at the same time. In this scenario, economic incentives align as well - deployment stalls when the inference-owning enterprises realize that each hallucination carries legal or reputational liability. Providers that solve verification at the architectural level will unlock the next adoption wave. <strong>Trust, like latency, is a platform metric.</strong></p><p>The lure of instant fluency is real, yet intelligence is more than the absence of grammatical error - IMO, it is fidelity to reality and a willingness to expose uncertainty. Until machines can offer that fidelity natively, humans will keep a hand on the circuit breaker. And from where i see things in practice, the role shift is healthy: we move from clerical drafting to higher-order judgment, supervising an eager assistant that is brilliant, tireless, and occasionally delusional. It&#8217;ll sound cliche, but symbiosis does beat substitution - we prompt, the model drafts, our tests verify, the model learns the correction, and the loop tightens. Productivity rises again, this time on a foundation of trust, not wishful acceptance. 
<strong>In that future the smell test will still matter, but it becomes a final flourish instead of the last line of defense.</strong></p>]]></content:encoded></item><item><title><![CDATA[Why Taste will matter more than Code in the AI decade]]></title><description><![CDATA[..and some thoughts on how to build judgment in a world that builds itself]]></description><link>https://www.piyush.cc/p/why-taste-will-matter-more-than-code</link><guid isPermaLink="false">https://www.piyush.cc/p/why-taste-will-matter-more-than-code</guid><dc:creator><![CDATA[piyush sagar mishra]]></dc:creator><pubDate>Sun, 04 May 2025 07:15:18 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!Mix6!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fed9d03e8-e338-4d64-99fd-0f939e31b1ab_2880x1756.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>In just the 1st quarter of &#8216;25, the AI and software landscape has seen explosive, measurable growth: 150+ AI-focused startups secured initial funding in Q1 2025 alone, collectively raising more than <a href="https://techfundingnews.com/which-ai-startups-raised-millions-in-q1-2025-meet-the-10-game-changers-just-out-of-stealth/">$5 billion</a> across sectors like enterprise automation, healthcare, and creative industries. AI startups now capture over 33% of global VC funding, with $120 billion invested in 2024 and the momentum <a href="https://techfundingnews.com/which-ai-startups-raised-millions-in-q1-2025-meet-the-10-game-changers-just-out-of-stealth/">accelerating</a> into 2025.</p><p>This surge is reflected in the unicorn landscape as well: nearly half (48%) of all new unicorns in 2025 are AI companies, with <a href="https://finbold.com/48-of-all-2025-unicorns-work-in-ai-sector/">11 out of 23 startups</a> reaching $1B+ valuations in Q1 alone coming from the AI sector.  
</p><p>Low-code and no-code platforms are also reshaping the creation landscape - by end of 2025, <a href="https://kissflow.com/low-code/low-code-trends-statistics/">70% of all new business</a> applications are projected to use low-code or no-code technologies, and over 500 million apps and services will be developed using cloud-native approaches. This democratization of development is empowering not just engineers, but business users and entrepreneurs to launch products faster than ever.</p><p><strong>All this raises a simple question: if everyone can now build, what makes a product stand out?</strong></p><p>Michael Truell (<a href="https://x.com/mntruell">@mntruell</a>), co-founder and CEO of <a href="https://x.com/cursor_ai">Cursor</a>, said it plainly on <a href="https://www.youtube.com/watch?v=En5cSXgGvZM">Lenny&#8217;s podcast</a>: the playing field for creation is now flat. You no longer need to know how to code to launch software. What separates a project that gets ignored from one that gets used&#8212;and loved&#8212;isn&#8217;t execution anymore, but rather judgment, and <strong>judgment, in the deepest sense, is taste</strong> - the human ability to discern what feels right, what flows, and what deserves to <em>exist.</em></p><div class="captioned-image-container"><figure><a class="image-link" target="_blank" href="https://www.youtube.com/watch?v=En5cSXgGvZM"><img src="https://substackcdn.com/image/fetch/$s_!Mix6!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fed9d03e8-e338-4d64-99fd-0f939e31b1ab_2880x1756.png" width="1456" height="888" alt=""></a></figure></div><p>This is something I&#8217;ve been thinking and talking about a lot lately, especially with friends and family who have young kids. Many of them are curious, creative, and eager to build things&#8212;but they&#8217;re unsure what skills to focus on, or where to even begin. I&#8217;ve found myself coming back to the same core idea in those conversations, and I wanted to share it here as well.</p><h4>I. Taste is built by consumption, but refined by discrimination</h4><p>If you want to build taste, the first step is immersing yourself in good work - taste is built on recognition, and you can&#8217;t recognize quality if you haven&#8217;t seen it. So you read obsessively, you walk through museum exhibits you don&#8217;t yet understand, you download apps that aren&#8217;t for you just to observe how they behave. Study the UI of Notion, the flow of Figma, the precision and speed of Superhuman - not to imitate them, per se, but to understand what makes them feel so considered.</p><p>But exposure alone isn&#8217;t enough. <em>You also have to filter.</em></p><p>Taste begins to form when you stop consuming passively - for example, you look at a landing page and ask: why does this feel off? Or you watch a film and notice the <a href="https://www.youtube.com/watch?v=EKju6NvIKa0">rhythm of its cuts</a>. You observe that one product feels scammy while another feels confident, even if they offer similar features. 
You learn to listen to your gut - but you also train it.</p><div class="captioned-image-container"><figure><a class="image-link" target="_blank" href="https://www.youtube.com/watch?v=EKju6NvIKa0"><img src="https://substackcdn.com/image/fetch/$s_!uqxi!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe591b401-3803-4ed4-890e-5550024b6d02_1922x1050.png" width="1456" height="795" alt=""></a></figure></div><p>The skill is not just to notice beauty or coherence, but rather to name it and then eventually to recreate it. Here are two articles i absolutely love on this topic:</p><ul><li><p><a href="https://medium.com/the-year-of-the-looking-glass/on-taste-part-3-d7d9f069f0b2">On Taste</a> (Julie Zhuo, Medium)</p></li><li><p><a href="https://www.hongkiat.com/blog/developing-good-taste-in-design/">Developing Good Taste in Design</a> (Hongkiat) &#8212; some thoughts on why stepping away from screens and exposing yourself to physical environments deepens your design instincts</p></li></ul><h4>II. 
Make things &#8212;&gt; destroy them &#8212;&gt; make them again.</h4><p>It might sound obvious, but taste doesn&#8217;t emerge in theory, and one of the clearest ways I&#8217;ve found to explain where it begins is this: <strong>it&#8217;s forged in the frustrating gap between what you imagine and what you&#8217;re actually able to create.</strong></p><p>When you try to design a homepage and it ends up looking clumsy, that&#8217;s when you start to understand <em>spacing</em>. When your copy sounds flat, you notice how <em>rhythm</em> and <em>tone</em> work in the writing. When every startup name you come up with feels off (guilty!), you start developing an instinct for what evokes clarity and trust.</p><p>This gap&#8212;between what you hoped something would be and what it turned out to be&#8212;isn&#8217;t failure, but feedback, and is one of the major drivers behind how your taste gets sharpened.</p><p>And the loop is the same for everyone: make something, hate it, revise it, and try again. 
<strong>The people we describe as having &#8220;great taste&#8221; are often just people who&#8217;ve made and thrown away more drafts than you can imagine.</strong> </p><p>(<strong>please go watch this</strong>: <a href="https://stripe.com/en-sg/sessions/2024/craft-and-beauty-the-business-value-of-form-in-function">https://stripe.com/en-sg/sessions/2024/craft-and-beauty-the-business-value-of-form-in-function</a>)</p><div class="captioned-image-container"><figure><a class="image-link" target="_blank" href="https://stripe.com/en-sg/sessions/2024/craft-and-beauty-the-business-value-of-form-in-function"><img src="https://substackcdn.com/image/fetch/$s_!EAUA!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7c8529ea-a323-488e-8d62-f52f6b22260f_1706x1264.png" width="1456" height="1079" alt=""></a></figure></div><h4>III. Taste lives at the intersection of disciplines</h4><p>I know there&#8217;s tons of literature on how the most creative experiences come from being and thinking alone, but you absolutely don&#8217;t develop exceptional taste by staying inside your silo. 
In fact, some of the richest creative instincts come from outside your own medium; the restraint of Apple borrows from <strong>Bauhaus design</strong>, the warmth of Pixar comes from the <strong>rhythms of jazz</strong>, Stripe&#8217;s branding has echoes of <strong>classic editorial layouts from The New Yorker and Monocle</strong>, and Airbnb&#8217;s design system draws as much from hospitality as it does from tech.</p><p>If you want to design better apps, study how restaurants guide their guests. If you want to write better copy, read poetry and political speeches. The connective tissue between seemingly unrelated fields is where creative judgment becomes unusually sharp.</p><p>The most tasteful people I know are almost always <strong>generalists with obsessions.</strong> They care deeply about food, architecture, typography, furniture, photography, film. They follow their curiosity across disciplines&#8212;and then bring it back.</p><p>One of the first ideas I was exposed to early in my career&#8212;thanks to <a href="https://www.linkedin.com/in/dhiraj-rajaram-a1a4a3/">Dhiraj Rajaram</a>, founder and CEO of Mu Sigma, where I started out&#8212;was the power of interdisciplinary thinking. It&#8217;s a perspective that has stayed with me ever since. You can watch him share some of that point of view <a href="https://www.mu-sigma.com/founders-quirks/a-lesson-in-interdisciplinary-thinking/">here</a>.</p><p>This <a href="https://www.artsy.net/article/artsy-editorial-discover-taste-art-new-collector">piece</a> from Artsy is another lovely, quiet essay on learning to see better across disciplines.</p><h4>IV. Surround yourself with people who have better taste than you</h4><p>Taste is contagious - if the people around you are okay with shipping &#8220;meh&#8221; work, you&#8217;ll start to tolerate it too. 
But if you&#8217;re surrounded by people who push, who prune, who say &#8220;this isn&#8217;t good enough yet&#8221; without ego or apology&#8212;you&#8217;ll grow faster than you thought possible. </p><p>Some of the biggest creative leaps in my career came not from books or blogs, but from watching my manager or another colleague quietly improve my work and realizing what I had missed. Learning good taste at work or in social settings isn&#8217;t about hierarchy - it&#8217;s about environment. No matter how hard you try, you can&#8217;t edit your own blind spots. That&#8217;s what other people are for.</p><p>So ask: &#8220;What feels off here?&#8221; Most of the time people will struggle to tell you what exactly feels off - but listen even when someone can&#8217;t articulate it and only knows something&#8217;s not right; those vague signals are more valuable than they seem. That&#8217;s your taste learning to speak.</p><p>I like some of the chatter <a href="https://www.reddit.com/r/UXDesign/comments/zwi2eo/how_to_develop_a_personal_design_tastestyle/">here</a> and <a href="https://www.reddit.com/r/learnart/comments/5o2nfz/how_to_improve_artistic_taste_asethetic_not_the/">here</a> on personal design style and how it&#8217;s mostly inspired by anything but personal.</p><h4>V. Slow down and let things simmer.</h4><p>We&#8217;re trained to ship fast, iterate fast, fail fast, but <strong>taste doesn&#8217;t work that way.</strong> It works more like cooking - it rewards time, distance, and re-approach. The best creative decisions don&#8217;t always arrive in a sprint&#8212;they often come the next morning, or a week later, when you finally see what&#8217;s missing.</p><p>You notice a button that feels loud. A sentence that tries too hard. A layout that suffocates. And with that space, you fix it&#8212;not by adding more, but by removing what doesn&#8217;t belong. 
It&#8217;s less about perfectionism and more about patience; about letting a decision <em>settle</em> into its best shape.</p><p>Check out <a href="https://beomniscient.com/blog/how-do-you-develop-taste/">this</a> (especially 25-35 minutes into it).</p><div class="captioned-image-container"><figure><a class="image-link" target="_blank" href="https://beomniscient.com/blog/how-do-you-develop-taste/"><img src="https://substackcdn.com/image/fetch/$s_!j2Xk!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F97ff4d4d-6175-44ff-9c46-da180e3d660f_1922x1050.png" width="1456" height="795" alt=""></a></figure></div><h4>VI. Taste is a lifelong practice. All good things are.</h4><p>You never arrive at &#8220;having taste&#8221;, nor do you wake up one day and just &#8220;build&#8221; it. Needless to say, you&#8217;re just always refining it - your bar will keep rising and the work you once admired will start to feel naive (my favorite movie when I was 20 versus today, for example).</p><p>Taste doesn&#8217;t ossify but sort of breathes, and like any practice&#8212;writing, math, chess, cooking&#8212;it atrophies if neglected. The people with enduring taste are the ones who stay curious, who keep consuming, keep discarding, keep stretching toward something just out of reach. 
</p><p><strong>Until the point where high-quality taste becomes the only reason someone gets paid (or not), people with a good rhythm for building taste are not looking for perfection - only for what feels right&#8212;and better than last time.</strong></p><p>When i share some of these ideas in my circle, one of the most common misunderstandings I hear is that taste is a luxury, or a &#8220;vibe&#8221;, or a matter of fonts, filters, and gradients. It is not. </p><p><strong>Taste is knowing what deserves to be built (or not)&#8212;and how to build it well.</strong></p><p>In a world where AI can write your copy, design your layout, deploy your backend, and even sell for you&#8212;judgment is one of the few creative advantages that will take the longest to be automated.</p>]]></content:encoded></item><item><title><![CDATA[The great B2B SaaS replatforming]]></title><description><![CDATA[Things which will disappear by 2026]]></description><link>https://www.piyush.cc/p/the-great-b2b-saas-replatforming</link><guid isPermaLink="false">https://www.piyush.cc/p/the-great-b2b-saas-replatforming</guid><dc:creator><![CDATA[piyush sagar mishra]]></dc:creator><pubDate>Thu, 10 Apr 2025 10:47:43 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/f57fb803-3be6-4423-a67a-0aab941ab4ba_6144x6144.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p><em>Disclaimer: The views and opinions expressed in this article are entirely my own and do not represent the official position, strategy, or opinions of my employer. This is purely a personal perspective based on my observations of the industry.</em></p><p><strong>Reading through these bullet points will take less than 2 minutes. If you see your org in this list, go ahead and read the entire post. 
If not, you should get back to building.</strong></p><ul><li><p>Your org is still hiring outbound SDRs like it's 2018&#8212;with a focus on call volume, activity metrics, and "grit". SDR/Sales job descriptions still talk about "hustle," "cold calls per day," and "objection handling," instead of AI tooling, account strategy, prompt techniques, or agent coordination.</p></li><li><p>Sales productivity continues to be measured by the number of meetings booked and outreach rates, even when the first outreach takes 48+ hours.</p></li><li><p>Marketing performance is tied to form fills, ebook downloads, and lead magnet conversions&#8212;without anyone asking if those leads actually close. You also worry about declining form fill conversion rates and celebrate growing rates irrespective of what's happening with revenue.</p></li><li><p>Email nurture flows are built in first-gen MAPs (no names, but you know who I'm talking about), still based on rigid stage gates instead of adaptive paths.</p></li><li><p>Onboarding is a tooltip tour, maybe a checklist, and none of it changes based on who the user is. AI shows up as an "add-on" feature&#8212;not something that's actually embedded in the product flow.</p></li><li><p>The website is static. Everywhere. Everyone sees the same homepage, no matter who they are or where they came from. Gated PDFs and whitepapers still dominate your content strategy, even though no one reads them and most links go straight to bounce.</p></li><li><p>Support teams are scaling by headcount, not by training or tuning AI agents. CS teams are mostly manual. No ticket routing, no sentiment tracking, no proactive saves. Just check-ins and QBR decks.</p></li><li><p>Analytics teams are still being asked to build a new dashboard every time someone wants a metric, instead of designing clean semantic layers for agents to use. BI tools are everywhere, but usage is still 5% of the org. Most dashboards were made for QBRs and haven't been opened since. 
You are still hiring analysts who can make BI reports.</p></li><li><p>Pricing is still seat-based&#8212;even for products where usage and seat count have nothing to do with each other.</p></li><li><p>Your marketing teams have low to no fluency in LLMs, prompt chaining, or agent workflows, and your analytics team doesn&#8217;t even have Cursor/Windsurf installed on their laptops.</p></li><li><p>No one's testing agentic experiences or AI-native growth loops. There's no experimentation budget carved out for it either.</p></li></ul><h4><strong>Sounds like your org? Don't worry at all, and here's why.</strong></h4><p>The SaaS model most of us grew up with is starting to break down in ways that are easy to ignore unless you're looking closely. If you're building, investing, or even just working in software today, you've probably felt it already. A friend who used to run a big outbound team now tells you they're shutting down a big chunk of their SDR org. A startup founder in your network just posted a record MRR on LinkedIn with a team of three. You open a product and, instead of a tour or walkthrough, you're greeted by an AI that knows exactly what you're trying to do. You see <a href="https://x.com/tobi/status/1909231499448401946">Tobi Lutke posting</a> "Before asking for more Headcount and resources, teams must demonstrate why they cannot get what they want done using AI," and you come across <a href="https://openai.com/index/klarna/">Klarna's</a> article stating that their agent does the work of 700 full-time agents, and wonder why your team is struggling. 
All of this is happening at once, but most people still talk about it like a set of random tactics and not what it actually is: the end of a certain kind of SaaS company.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!QAlN!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fae449b5e-ebb3-45d7-8666-10645569a5cf_2286x1152.jpeg" data-component-name="Image2ToDOM"><img src="https://substackcdn.com/image/fetch/$s_!QAlN!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fae449b5e-ebb3-45d7-8666-10645569a5cf_2286x1152.jpeg" width="1456" height="734" class="sizing-normal" alt="Image" title="Image"></a></figure></div><p>The last decade was about getting better at a model we already understood. There was a pretty standard playbook, and most companies ran some version of it. You generated leads, captured them through forms, routed them to BDRs, and tried to move them through onboarding. You charged per seat, tracked pipeline in a CRM, and scaled headcount as a signal that things were working. You built dashboards for every team and layered on tools to make the engine run more predictably. Finance had their own thing going&#8212;usually a Google Sheet with 12 tabs, pulling from six places, that only one person knew how to update. And for a while, all of it more or less worked.</p><p>That's the part that's ending. And it's not because some new GTM guru wrote a different playbook&#8212;it's because the foundations have changed. Buyers have more context. AI has flattened the cost of building and shipping. Product tours are turning into real conversations. 
The go-to-market motion that used to require a 50-person team is now getting handled by a founder and a few agents. And the friction we used to accept&#8212;forms, seat-based pricing, linear onboarding, headcount as leverage&#8212;isn't just outdated. It's starting to get in the way.</p><p>This isn't a list of trends, and it's not just corporate hype anymore. It's a picture of what's already changing, drawn from what I'm seeing in my own work and in conversations with SaaS founders, growth leaders, sales operators, product managers, and investors across the US, Europe, India, and Southeast Asia. <strong>Unfortunately, a lot of very senior executives across organizations still think it is hype, or that they have a lot of time to catch up.</strong></p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!E3an!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0b405582-6344-4e1e-be6c-bd50f7abb37f_954x634.png" data-component-name="Image2ToDOM"><img src="https://substackcdn.com/image/fetch/$s_!E3an!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0b405582-6344-4e1e-be6c-bd50f7abb37f_954x634.png" width="954" height="634" class="sizing-normal" alt="" loading="lazy"></a></figure></div><p>I have a front-row seat to some of this shift as an &#8220;AI adopter&#8221;, and what&#8217;s happening is wild. Just last week, a colleague told me they replaced six workflows with one agentic workflow over a weekend. The week before, a good friend and founder cut their onboarding time in half with a simple prompt change. 
I keep a list of "mind-blowing AI examples" and have to update it almost daily because things move so fast.</p><p>Trying to keep up is exhausting, and in all truthfulness any prediction beyond 90 days will leave one very embarrassed, but I'll take the risk. Here are some of the trends I'm watching move fastest.</p><h3><strong>1. Forms will disappear from homepages</strong></h3><p>For the longest time, the form was non-negotiable. Demo request? Form. Pricing question? Form. Whitepaper download? Also a form. Name, email, job title, company size. Then wait. Sometimes a rep would get back to you in a few hours. Sometimes it took days. And in the meantime, you'd get dropped into a nurture sequence that usually assumed you knew nothing about the product.</p><p>But the people evaluating tools in 2025 don't look like that. <strong>They're showing up with more context, more urgency, and much higher expectations.</strong> They've already asked ChatGPT or Perplexity to compare their options. They've seen walkthroughs on YouTube. They've tried out a competitor's free plan. They don't want to wait two days for a BDR to email them back. They want to try the product. They want specific questions answered. And they want that to happen now.</p><p>Some companies are starting to use AI SDRs and agentic marketing tooling on their web front end to make the form experience feel a little less static. These tools do a decent job of catching high-intent visitors and routing them to the right place, or popping up a chat experience that feels more tailored. But even then, it's mostly optimizing around the form&#8212;not replacing it.</p><p>And then there are a few who are removing forms altogether. Not hiding them behind modals. 
Actually removing them. They're replacing the form with a conversational AI that can answer product-specific questions in real time, guide someone toward the right plan, and hand them off to a human only when it makes sense. It's not just a chatbot&#8212;it's a lightweight, contextual guide that can move someone through the first few steps of evaluation without asking for anything upfront.</p><p>One SaaS company I spoke with&#8212;a mid-market player in web automation&#8212;replaced their main demo form with a GPT-powered assistant trained on their onboarding data and help docs. Their lead volume dropped by 40%. But the qualified pipeline went up 70%. <strong>The people who didn't engage weren't serious. The ones who did converted faster, asked better questions, and didn't need as much hand-holding.</strong></p><p>A lot of other companies place these agentic engagements as email or voice conversations beneath the form: visitors fill the form and land in an email flow with an AI that helps them get started and answers questions about pricing, features, integrations, seats, and so on. Most organizations offering transactional products or services don't realise that <strong>this kind of AI email engagement is actually worse than a GPT assistant right on the web</strong>: customers just need to check pricing, features, and other tactical questions, which are better served in real time than via a back-and-forth email exchange.</p><p>The shift here isn't just about automation. It's about making sure your first touchpoint matches where the buyer already is. <strong>When someone arrives with that much intent and context, a static form feels like a delay tactic.</strong> And in a world where someone else offers them a smarter entry point, you might not get another shot.</p><p>That said, forms won&#8217;t be completely dead across all industries. 
Some sectors will maintain forms out of necessity rather than choice. Highly regulated fields like healthcare, where HIPAA compliance demands careful handling of information, will keep forms as verification checkpoints. Financial services, particularly for high-net-worth products, will still need qualifying barriers to filter out tire-kickers. Enterprise security software vendors might retain forms as part of their security theater, the logic being that if you can't handle a form, you're not serious about security. But even in these cases, the smartest players are building conversational layers on top to make the form-filling experience less painful. The forms might stay, but they'll become the step after engagement, not before it.</p><h3><strong>2. SDRs will shrink, and more importantly, their days will look very different</strong></h3><p>The idea that AI is going to "replace" SDRs is too simplistic. The role isn't disappearing&#8212;but it is changing fast. What used to be a high-volume, semi-scripted job is turning into something a lot more specialized.</p><p>In the old model, SDRs were responsible for volume. That was the entire point. You made your 100 calls, sent your 100 emails, chased the meeting. The best reps were the ones who could crank through the list with energy and bounce back from rejection. 
But most of that front-loaded work&#8212;finding leads, enriching data, tracking intent signals, running sequences, booking time&#8212;is now better handled by tools. A ton of teams are already doing this using Clay, Qualified, Apollo, Tactic, and a stack of GPT-based workflows.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!E7AD!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc90c72d5-3d1a-439a-834f-c73cb22185ae_954x998.png" data-component-name="Image2ToDOM"><img src="https://substackcdn.com/image/fetch/$s_!E7AD!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc90c72d5-3d1a-439a-834f-c73cb22185ae_954x998.png" width="954" height="998" class="sizing-normal" alt="" loading="lazy"></a></figure></div><p>So the question becomes: what's left for the human? Turns out, quite a bit&#8212;but it looks different.</p><p>Instead of focusing on volume, the SDR becomes more like a deal coordinator. Their job isn't to chase 200 strangers&#8212;it's to understand the 10 accounts that actually matter. They're figuring out how the buying committee is structured, what the internal politics look like, where the friction might show up, and what kind of narrative will actually land across functions. The meeting is often already booked. Their role is to make it stick.</p><p>The tooling is changing here too. Right now, most agentic workflows run 1:1&#8212;one agent talking to one contact. 
<strong>But the near future looks more like one agent per persona within an account.</strong> You've got one agent talking to the RevOps lead, another checking in with a security engineer, another nudging the product team. And the human SDR? They're managing the overlap. They're picking up on signals, resolving confusion, adding the nuance that the AI can't yet catch.</p><p>They'll also play a bigger role in expansion and retention, especially in land-and-expand models where the initial deal is small and the upside lives in usage growth. AI can alert you to usage spikes or login drops, but it still takes a person to follow up, see what's going on, and act with context.</p><p>So it's not that the SDR goes away. It's that the job moves up a level. Less brute force, more orchestration. Less activity tracking, more relationship management. The teams that figure this out early will be running leaner, more focused outbound programs&#8212;with better outcomes and fewer people.</p><h3><strong>3. Marketing websites will become hardcore interactive products</strong></h3><p>Most SaaS websites today still assume that the person visiting them knows nothing. They show everyone the same homepage: a headline about unlocking productivity, three vague benefits, some logos, and a CTA to book a demo or start a trial. If someone scrolls far enough, they might get a pricing table or a product walkthrough. And maybe, if they're lucky, a chatbot opens in the corner offering help.</p><p>But that experience doesn't reflect how people actually arrive anymore. The visitor might have seen your launch thread on X, or your integration on G2, or a teardown on Hacker News. They might be returning for the third time after sharing your pricing page internally. And they're still seeing the same generic homepage as someone discovering you for the first time.</p><p>This is going to feel increasingly dated. 
The shift that's coming&#8212;and in some cases already happening&#8212;is toward dynamic, personalized websites that behave more like real-time applications. Instead of routing people through static content and hoping something resonates, these new sites will adjust themselves based on who's visiting, where they came from, and what they're trying to do.</p><p>Lots of teams are already experimenting with this using tools like Qualified, which helps identify high-intent visitors and routes them into intelligent chat experiences. But what's coming next goes beyond that. <strong>It's not just changing the engagement layer&#8212;it's changing the experience layer.</strong></p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!Z8wr!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4875e7db-6401-4efb-b2ad-cadcb1e5724d_954x370.png" data-component-name="Image2ToDOM"><img src="https://substackcdn.com/image/fetch/$s_!Z8wr!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4875e7db-6401-4efb-b2ad-cadcb1e5724d_954x370.png" width="954" height="370" class="sizing-normal" alt="" loading="lazy"></a></figure></div><p>A small adtech startup from India is building a lightweight engine that rewrites the homepage in real time based on visitor context. If someone from a Fortune 500 IP visits the pricing page twice in one week, the homepage changes to show volume-based pricing, an offer for enterprise onboarding, and a preloaded calendar link. 
If someone arrives from Reddit on mobile, they see a stripped-down version of the site with a demo video and a three-step setup flow. Same product, totally different experience.</p><p>What will disappear here is the idea that the marketing site is just a glorified brochure. <strong>By 2026, it'll feel weird to show the same version of your homepage to a junior engineer at a seed-stage startup and a VP of Operations at an S&amp;P 500 company.</strong> Today, most personalization happens after the visit - specific ABM or growth nurtures triggered by who visited and what they did on your website. But it will all move upstream to the visit itself. <strong>We'll stop optimizing landing pages for conversion rates in isolation and start optimizing for journey continuity.</strong></p><h3><strong>4. Seat-based pricing will break completely</strong></h3><p>Per-seat pricing has stuck around mostly because it's easy to explain and model. You sell 50 seats, you charge $50 a seat, great&#8212;you've got a neat little line item for finance and a nice multiplier for your valuation.</p><p>But this whole setup is starting to collapse. The buyer doesn't care how many people are logging in. They care what the product is doing for them. And in more and more cases, the product is doing a lot&#8212;with fewer people.</p><p>Automation is a big reason for this. Small teams are running massive workflows with barely any active users. You might have five people in the tool, but it's triggering dozens of actions every day, touching ops, finance, product, and customer success. When you ask that team to pay per seat, it feels like you're penalizing them for being efficient.</p><p>Even worse, per-seat pricing creates internal friction. I've heard teams say things like, "Let's not invite marketing yet&#8212;we'll have to pay for more seats," or "Let's wait until next quarter to roll this out to support." That's a bad place to be as a SaaS company. 
The thing you're selling becomes less valuable as it spreads. You end up discouraging usage.</p><p>Some companies have already moved. Workflow tools are charging per automation run. Analytics tools are charging per insight delivered. In both cases, customers don't blink&#8212;because it feels fair. Even if they end up paying more, it makes sense.</p><p>By 2026, I think most AI-native products&#8212;and plenty of others&#8212;will move to some form of usage-based or outcome-based pricing. Or at least hybrid models that combine access with activity. The pitch won't be "add more users." It'll be "get more done." And seat count just won't be the thing anyone anchors on anymore.</p><h3><strong>5. Search will get replaced by recommendation</strong></h3><p>This one's already in motion, but most SaaS teams haven't felt it yet. We're still operating as if discovery starts with search. You write blog posts. You build landing pages. You try to rank for "best API platform for fintech." You invest in SEO and hope someone clicks.</p><p>But discovery now increasingly starts with prompts. People are asking ChatGPT or Perplexity, "What's the easiest way to do approval workflows for remote teams?" or "Which CRM integrates best with Notion?" And they're getting answers&#8212;not links.</p><p>That shift changes everything. If your product isn't showing up in those answers, it's invisible. It doesn't matter how good your blog is if the model didn't get trained on your content or didn't consider it authoritative. And unlike SEO, you can't pay your way to the top of the response.</p><p>This changes how companies need to write. It's less about keywords and more about structure. Clear documentation, consistent naming, well-defined use cases. AI doesn't care about your brand tone. It cares if your product makes sense.</p><p>I think we'll also see new layers of recommendation emerge. Not just prompts, but passive discovery. 
Agents embedded in someone's workflow that say, "Hey, you're doing a lot of work in Sheets&#8212;do you want this automated?" Or "You keep sharing Notion links with your team. Want to auto-tag tasks from Slack?" This won't be ads. It'll be embedded logic that suggests tools at exactly the right moment.</p><p>What disappears here is the whole idea of top-of-funnel as a "capture" problem. You won't be fighting to get listed on G2 or gaming keyword density. You'll be trying to make your product legible to models and agents&#8212;because that's where the recommendation will come from.</p><h3><strong>6. Onboarding Will Become a Conversation</strong></h3><p>Most onboarding today still looks like a checklist. You get a tooltip that tells you what button to click, maybe a guide that explains how to create your first report. It's designed for the average user, which means it's usually kind of generic and kind of long.</p><p>But with AI in the loop, onboarding is starting to look more like a real conversation. You open the product and it asks, "What are you trying to do?" If you say "automate our weekly team report," it walks you through that exact setup. If you say "get alerts when a customer downgrades," it shows you a template and offers to plug in your Stripe data.</p><p>I know a SaaS company that rebuilt their onboarding this way, using a GPT-powered assistant as the front door. They ask three questions, then dynamically rewrite the interface&#8212;highlighting different sections, loading the right templates, and showing different documentation. <strong>Activation rates jumped from 60% to 88%. That's not just a better flow&#8212;it's a completely different experience.</strong></p><p>What's going to disappear here is the idea that users will teach themselves. The default expectation will be: the product should understand what I want and help me get there. 
Anything that feels like work&#8212;or feels too generic&#8212;will quietly drive churn from day one.</p><h3><strong>7. Ghost Teams will become the norm: low headcount, more leverage</strong></h3><p>One of the things I keep noticing&#8212;often in the background of these other shifts&#8212;is that <strong>the teams doing the most interesting work right now are small.</strong> Not "scrappy startup" small. I mean structurally lean. Three to five people running products that feel like they should require 20. Founders who don't plan to hire a sales team. Engineers who are also running onboarding. Marketers who ship with Notion AI and a Loom walkthrough.</p><p>I talked to a founder recently who's running a $2M+ ARR SaaS business with zero employees. He uses GPT-4 for support, Jasper for content, Notion AI for docs, Zapier for ops, and a no-code builder to ship product updates. His burn is under $4K a month for his full stack of subscriptions. From the outside, the business looks like it has 15 people.</p><p>And this isn't a weird exception. It's becoming a pattern. <strong>People aren't scaling by hiring&#8212;they're scaling by stitching.</strong> Using AI to handle the parts of the business that used to require people. Not because they're cheap, but because they're fast. Because they don't need coordination meetings.</p><p>What disappears here is the assumption that growth equals headcount. Or that momentum needs to look like hiring. 
The teams that win in 2026 will look small from the outside but operate with a level of leverage that older companies just can't match.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!37NY!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1e1218b0-4241-4aeb-8120-1823af1fde78_1536x1536.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!37NY!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1e1218b0-4241-4aeb-8120-1823af1fde78_1536x1536.jpeg 424w, https://substackcdn.com/image/fetch/$s_!37NY!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1e1218b0-4241-4aeb-8120-1823af1fde78_1536x1536.jpeg 848w, https://substackcdn.com/image/fetch/$s_!37NY!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1e1218b0-4241-4aeb-8120-1823af1fde78_1536x1536.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!37NY!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1e1218b0-4241-4aeb-8120-1823af1fde78_1536x1536.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!37NY!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1e1218b0-4241-4aeb-8120-1823af1fde78_1536x1536.jpeg" width="1456" height="1456" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/1e1218b0-4241-4aeb-8120-1823af1fde78_1536x1536.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1456,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:225878,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/jpeg&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.piyush.cc/i/161003578?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1e1218b0-4241-4aeb-8120-1823af1fde78_1536x1536.jpeg&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!37NY!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1e1218b0-4241-4aeb-8120-1823af1fde78_1536x1536.jpeg 424w, https://substackcdn.com/image/fetch/$s_!37NY!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1e1218b0-4241-4aeb-8120-1823af1fde78_1536x1536.jpeg 848w, https://substackcdn.com/image/fetch/$s_!37NY!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1e1218b0-4241-4aeb-8120-1823af1fde78_1536x1536.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!37NY!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1e1218b0-4241-4aeb-8120-1823af1fde78_1536x1536.jpeg 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p></p><h3><strong>What we&#8217;ll see in companies that are falling behind</strong></h3><p>None of these changes are isolated. They compound. When your onboarding gets smarter, your support volume drops. When your homepage becomes a product, your form goes away. When your pricing matches usage, you expand faster. When your team is five people instead of fifty, you move faster by default.</p><p>So what disappears isn't just a set of tools or tactics. It's the entire layer of operational weight that companies used to treat as necessary. Static forms. Seat-based pricing. Manual support queues. Headcount as a proxy for progress. These things aren't being phased out slowly. They're being replaced&#8212;quietly and quickly&#8212;by teams who are already building in the new model.</p><p>If you're starting something now, you're not late. 
You're early to the next wave.</p><p>But only if you're willing to let go of what worked in the last one. If you&#8217;re hesitant, remember that there are always two kinds of signals of decay: the lagging ones that show up too late to do anything about&#8212;flat revenue, churn creeping up, win rates falling. And the leading ones, which are quieter but far more useful - less about outcomes, more about behavior. How often is AI used in your workflows? Does your team &#8220;prompt&#8221; to build content, or still operate through an agency? Does the VP at your firm post about ML and personalization while staying completely unaware of what&#8217;s happening in the real world? And worst of all&#8212;that VP is still waiting for "proof" that this shift is real, instead of looking around and seeing the teams that are already living in the new model.</p><p>The hard part here isn't the tech. It's letting go of the habits and heuristics that used to work. But the faster teams are already adapting. They're not just surviving&#8212;they're shipping faster, scaling smarter, and spending less time justifying their existence to the spreadsheet in finance.</p><p>And if you're reading this and seeing your own org in that list&#8212;it's not too late. 
But it's definitely time to move.</p>]]></content:encoded></item><item><title><![CDATA[software is eating work (not just the world)]]></title><description><![CDATA[The $4T shift from helping to doing]]></description><link>https://www.piyush.cc/p/software-is-eating-work-not-just</link><guid isPermaLink="false">https://www.piyush.cc/p/software-is-eating-work-not-just</guid><dc:creator><![CDATA[piyush sagar mishra]]></dc:creator><pubDate>Tue, 24 Dec 2024 19:31:07 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/3075a561-de35-4f74-91b9-2ae996f88971_5925x6000.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Something transformative is happening in software, and the numbers from Battery's <a href="https://www.battery.com/blog/opencloud-2024/">2024 OpenCloud report</a> tell an interesting story. Most people are focused on technological advances in AI, but I think most folks are missing a deeper shift in how software creates value.</p><p>Traditional SaaS companies built tools that made humans more efficient - but look at what's happening with cloud providers. Their CapEx growth has shot up 73% y/y in Q3 2024, far outpacing their revenue growth. Why spend so aggressively if you're just building better tools? Because they're not. They're building infrastructure for software that does the work itself.</p><p>This shows up clearly in the economics. Take call centers: there are 16 million agents globally with a $200B annual salary spend. Old software tried to make these agents more efficient. New software simply handles the calls. The addressable market just jumped from a fraction of the technology budget to the entire labor budget.</p><p>The same pattern appears in legal ($515B labor spend) and architecture/engineering ($270B labor spend). When your software does the work instead of helping with it, your market expands dramatically. That's why Battery estimates a $4T+ opportunity. 
It's not just eating software budgets anymore - it's eating labor budgets.</p><p>This shift changes everything about how software companies operate. Look at the revenue multiples as well &#8212; private AI companies are trading at a 3.2x premium to public SaaS companies. The market isn't just paying for growth anymore - it's paying for labor displacement (<a href="https://www.youtube.com/watch?v=JVGPTEg21EI">well..</a>) potential.</p><p>Cloud-native companies that have made this shift are growing at a 34% CAGR even at scale; that's not just impressive growth - it's growth that defies traditional software scaling limitations. When your software does the work, you're not constrained by how many people can use your tools &#8212; and this IMO is such a nuanced, yet landmark idea shaping the new narrative.</p><p>But here's what's really interesting: only 37% of AI use cases are in production today. Most companies are still experimenting. This gap between potential and current reality explains why private markets are paying such high premiums. They see the flip coming.</p><p>The cloud providers' behavior is particularly telling. Public cloud providers have grown their combined run-rate revenue to $221B, but they're spending even faster on infrastructure. They're building for a world where software does more of the work directly.</p><p>Look at their revenue structure: legacy infrastructure pieces are growing at 19-21%, while AI-related services are growing at 44%. The market is shifting from computing-as-a-service to work-as-a-service.</p><p>I think we're witnessing a fundamental transformation in what software companies do. The old model was building tools for human workers. The new model is building systems that do the work. This isn't just automation - it's replacement.</p><p>This creates weird dynamics. When you sell tools, you want to make humans more efficient. When your software does the work, making humans more efficient actually shrinks your market. 
The incentives have flipped.</p><p>The companies that win in this new era won't be the ones with the best tools. They'll be the ones that figure out which parts of work can be eaten by software, and build reliable systems to eat them. That's why the ServiceNow story is so telling - they've grown from basic IT tools to $10B in revenue by steadily eating more types of work <strong>(<a href="https://x.com/ttunguz/status/1750986619086315711">from 11 months ago..</a>)</strong>.</p><p>This feels like one of those shifts that will seem obvious in retrospect but is non-obvious while it's happening. Just as the move to cloud changed everything about how software companies operated, this shift from <em>tools to work</em> will change everything again.</p><p>The interesting question isn't whether this will happen - the numbers show it's already happening. The interesting question is what software companies will look like when they're primarily in the business of doing work rather than helping with it. That's the great flip happening in software right now.</p>]]></content:encoded></item></channel></rss>