How do we bridge the gap between enterprise demand and AI supply?
What I learned this year, and what enterprises actually need in 2026
Disclaimer: The views here are my own. They don’t represent my employer or anyone else. Just a personal take based on what I’ve been building, breaking, and fixing across teams and companies throughout 2025.
2025 went by too fast, and it was a big one. The global AI market hit $391B and forecasts now point toward $2T by 2030. Cursor went from niche to absurd growth, hitting $65M ARR with 6,400% YoY expansion. GPT crossed its three-year mark (I know, right!) with GPT-5.1 dropping a month ago. World Labs started generating full 3D worlds from text. The tools jumped several leagues in a single year.
And yet, enterprise progress stayed weirdly slow relative to that foundational growth. That is what this year taught me - we unlocked insane front-end and generation capabilities while the backend, the workflows, the operations, and the enterprise plumbing stayed stuck in 2024. We built great demos, but not systems as great as we could have. We focused tons on generation, not enough on integration. We hyped autonomy while enterprises desperately needed reliability. The model layer evolved faster than organizations’ actual ability to use it.
I spent this entire year building AI workflows across marketing, ops, engineering, and product. I watched agents break in every possible way, developers use AI daily without trusting it, and non-technical teams struggle to convert their knowledge into working instructions. Google’s DORA report showed 90% adoption but only 24% trust; Stack Overflow showed 66% frustration with agents that are almost right but not quite. That matches everything I saw firsthand 🥲
So instead of another hype post, here is the wishlist for next year - the infrastructure gaps, the workflow gaps, and a few cultural gaps (more on those in another post). The things that actually broke in 2025 across the five stages of the real lifecycle: Build, Deploy, Distribute, Engage, Scale.
Build: the context collapse and the “build for non-programmers” problem
The build phase went through a total reset in 2025 - tools like Cursor, Replit, and GitHub Copilot made “type a sentence, get code” boring. The start of the year saw strong adoption of Cursor’s 0.43 update with the Composer agent, Windsurf added voice interaction, and Claude Sonnet 4.5 hit 75%+ SWE-bench scores.
But as I have mentioned in the past as well, generation was never the hard part - review was, and still is. And as more low/no code tooling has evolved, allowing non-engineers to build applications, the problem has gotten much more severe. The cost to write code has dropped to near zero, but the cost to understand what got written went through the roof. I’ve seen folks on my team personally debug 600-line agent-generated orchestration logic where no one remembered the original assumption. That’s not development; that’s borderline archaeology.
The enterprise reality check from Google’s 2025 DORA report was sobering - 90% of developers now use AI (up 14% from last year), but only 24% trust it. Stack Overflow’s survey shows 52% of developers either don’t use agents or stick to simpler tools. We’re in this bizarre “trust paradox” - using tools we don’t trust because more than 80% of us report real productivity gains.
And even before you get to code, you hit the “translation gap” I kept running into all year. The people who know what the system should do - marketing managers, ops leads, customer success folks - still can’t turn that knowledge into prompts that produce working software. Gap Inc partnered with Google Cloud to embed AI across operations, but they’re the exception, not the rule.
For 2026, we need to solve this. Not with better generators, but with better reviewers. Here are some of the big gaps I think would matter a lot:
AI-native code review: Tools like Cursor’s Bugbot and CodeRabbit are pushing boundaries - Bugbot now generates PR summaries automatically and achieves 42% accuracy in detecting runtime bugs, while CodeRabbit hits 46%. Qodo (formerly CodiumAI) is my personal favorite for now, offering context-aware, test-aware, and standards-aware reviews that actually understand our codebase patterns very well. But I think what’s still missing is making sure regressions don’t leak into working features, and abstracting code into business context for reviews - who decided what, and why that choice mattered.
Guardrails that understand AI-written code: I love SonarQube - it remains a solid enterprise standard with AI-enhanced detection capabilities, while newer tools like DeepSource and Graphite are evolving toward modern PR workflows with AI-powered context management. The next step in my mind is continuous, contextual validation that goes beyond linting - tools need to understand architectural decisions and team-specific patterns.
Diffs that explain themselves: When multiple agents propose changes, I want to see not only what changed, but why, and what it affects downstream; crucial for iterative builds, especially as the codebase gets refactored by upstream low/no code tools.
Business logic abstraction: Still no good way for non-engineers to express “if X happens, do Y” at scale in plain language and have it generate maintainable logic without breaking something else. Specificity in scoping is hard. I found the new MCP projects the GitHub Copilot and VS Code teams sponsored in October very interesting - they’re building frameworks that let AI interact with tools, codebases, and browsers in revolutionary ways.
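To make that less abstract, here’s a minimal sketch (everything below is invented for illustration, not an existing tool) of the kind of artifact I’d want a business-logic layer to produce: a declarative rule that a non-engineer, or an LLM translating their plain language, could own, kept separate from the small engine that evaluates it.

```typescript
// Hypothetical sketch: a declarative "if X happens, do Y" rule that lives
// outside the codebase proper. The generated part is the rule; the engine
// below stays small, hand-written, and reviewable in one place.
type Condition = { field: string; op: "eq" | "gt" | "lt"; value: string | number };
type Action = { type: "notify" | "escalate" | "tag"; target: string };

interface BusinessRule {
  id: string;
  description: string; // the original plain-language intent, kept for review
  when: Condition[];   // all conditions must hold
  then: Action[];
}

// Roughly: "if a refund request is over $500, escalate to the finance team"
const refundRule: BusinessRule = {
  id: "refund-escalation",
  description: "Refund requests over $500 go to the finance team",
  when: [
    { field: "type", op: "eq", value: "refund_request" },
    { field: "amount", op: "gt", value: 500 },
  ],
  then: [{ type: "escalate", target: "finance" }],
};

// Tiny evaluator: checks whether an incoming event satisfies a rule.
function matches(event: Record<string, string | number>, rule: BusinessRule): boolean {
  return rule.when.every((c) => {
    const actual = event[c.field];
    if (c.op === "eq") return actual === c.value;
    if (c.op === "gt") return typeof actual === "number" && actual > Number(c.value);
    return typeof actual === "number" && actual < Number(c.value);
  });
}

// matches({ type: "refund_request", amount: 820 }, refundRule) -> true
```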
Deploy: the configuration explosion
Deployment used to be predictable - now every app is a mess of model endpoints, vector stores, auth layers, API keys, and observability hooks. Vercel makes it smooth for our standard Next.js setups, but the moment you need custom routing, compliance, or multi-region data handling, you’re in the weeds again.
Everyone’s building AI infrastructure like it’s the new arms race - which honestly, it kind of is, but getting “infrastructure as prompt” right is one of the big dreams for making upstream development work - describing what you want (“a multi-region setup with SOC2-compliant storage and automatic rollback”) and letting the system generate the boilerplate to manage integrations. Amazon’s new Nova Act IDE extension is interesting, Salesforce’s MuleSoft Agent Fabric for orchestrating AI agents across enterprise systems is also interesting, but there’s lots to be done. These layers in the stack would be very useful:
Infrastructure-as-prompt: Everyone’s building demos, but no one has cracked the version that’s secure, predictable, enterprise-ready, and fits into custom infra (a rough sketch of what the output could look like follows after this list)
Verification-as-a-service: Microsoft’s MMCTAgent, which can reason over hours of video and massive image collections, is interesting - imagine that level of verification for generated code. Tools like Qodo’s test generation and CodeScene’s behavioral analytics are getting closer, but we need this as a standalone API any generator could call.
Enterprise guardrails: Rollbacks, audit trails, compliance policies - every team rebuilds these from scratch, all custom-built. Such a waste.
Data residency automation: Still no simple way to describe regional rules and have infra handle routing, caching, and storage automatically from a knowledge base. Lots of custom dev is still needed for separate instantiations and deployments.
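For what it’s worth, here’s a rough sketch of what I mean by “infrastructure as prompt” done right - the prompt compiles into a typed, diffable spec that a policy layer checks before anything is provisioned, rather than going straight to cloud API calls. Every name below is made up for illustration; nothing maps to a real provider API.

```typescript
// Hypothetical: the output of "a multi-region setup with SOC2-compliant
// storage and automatic rollback" is a reviewable artifact, not a side effect.
interface InfraSpec {
  regions: string[];                      // where workloads may run
  storage: { encryption: "kms"; compliance: "soc2" | "hipaa" | "none" };
  rollback: { automatic: boolean; healthCheckPath: string };
  dataResidency: Record<string, string>;  // user region -> storage region
}

const proposed: InfraSpec = {
  regions: ["us-east-1", "eu-west-1"],
  storage: { encryption: "kms", compliance: "soc2" },
  rollback: { automatic: true, healthCheckPath: "/healthz" },
  dataResidency: { EU: "eu-west-1", US: "us-east-1" },
};

// The guardrail runs before apply: the enterprise policy layer, not the
// generator, decides what is actually allowed.
function violations(spec: InfraSpec, allowedRegions: string[]): string[] {
  return spec.regions
    .filter((r) => !allowedRegions.includes(r))
    .map((r) => `region ${r} is not approved for this workload`);
}

// violations(proposed, ["us-east-1"]) -> ["region eu-west-1 is not approved for this workload"]
```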
Distribute: the discovery layer inversion
This is the part most people underestimate. AI-native distribution doesn’t look like sales, SEO, or app store listings anymore. It’s about being discoverable by other AIs. When someone asks an AI assistant, “what’s the best way to automate approvals,” the model’s answer is your new channel. That means your documentation - not your ad copy - is the entry point.
The launch of GPT-5 this year, with its 400K context window and multi-modal capabilities, was a decent step up - being able to process text, images, audio, and video simultaneously means models are not just reading your API docs, but understanding your entire product ecosystem. Baidu’s ERNIE 5.0 is also neat, claiming to beat GPT-5 on visual understanding benchmarks, while Google added Deep Research to NotebookLM, turning it into an autonomous research assistant.
Most of the docs online today are human-friendly and LLM-hostile. They’re full of adjectives, testimonials, and vague promises, but LLMs want clean schemas, examples, and integration specs. Some of the new MCP projects are addressing this - fastapi_mcp was interesting but slowed down, and context7 pulls version-specific documentation straight from code into AI prompts. But there’s still a lot of missing standardization and interoperability across tools. Some thoughts for this phase:
Agent-readable documentation standards: We NEED a shared format for docs that AIs can consume - think JSON-like schemas, explicit input/output examples, and transparent auth steps (a rough sketch follows after this list).
Programmatic capability marketplaces: An app store built for agents, not people. Notion 3.0’s AI Agents with memory and connectors show what’s possible, but are too limiting (and I am not a big fan of the review system they have in place)
LLM visibility ops: Meta bringing Llama to federal agencies shows how critical government and enterprise visibility is becoming.
Procurement bridges: Enterprises will expect structured metadata for security, pricing, and compliance that their internal agents can parse when deciding which vendors to recommend.
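Since “agent-readable documentation” is still hand-wavy, here’s a hedged sketch of what one capability descriptor could look like: schemas, a worked example, auth, and procurement metadata in a single machine-parseable document. The field names and URLs are mine, not an existing standard.

```typescript
// Hypothetical agent-readable capability descriptor. An assistant deciding
// "what's the best way to automate approvals" could parse this directly
// instead of scraping marketing pages.
interface CapabilityDoc {
  name: string;
  summary: string;                 // one factual sentence, no adjectives
  auth: { type: "api_key" | "oauth2"; docsUrl: string };
  operations: Array<{
    id: string;
    input: Record<string, string>; // field -> type description
    output: Record<string, string>;
    example: { input: unknown; output: unknown };
  }>;
  procurement: { pricingUrl: string; compliance: string[]; dataResidency: string[] };
}

const approvalsApi: CapabilityDoc = {
  name: "approvals-automation",
  summary: "Creates and routes approval requests with full audit trails.",
  auth: { type: "api_key", docsUrl: "https://example.com/docs/auth" },
  operations: [
    {
      id: "create_approval",
      input: { requester: "string (email)", amount: "number (USD)", policyId: "string" },
      output: { approvalId: "string", status: "pending | approved | rejected" },
      example: {
        input: { requester: "ops@example.com", amount: 1200, policyId: "spend-default" },
        output: { approvalId: "apr_123", status: "pending" },
      },
    },
  ],
  procurement: {
    pricingUrl: "https://example.com/pricing",
    compliance: ["SOC2"],
    dataResidency: ["EU", "US"],
  },
};
```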
Engage: so much that could be personalized, but should it be?
Every AI-generated app can personalize itself infinitely - which sounds great until you try to measure engagement. Traditional metrics like MAU or session duration don’t mean much when each user’s product behaves differently. Legacy tools like Customer.io and Braze (I love them both though) assume uniformity, and don’t know what to do when each user gets their own app logic.
According to Stack Overflow’s 2025 survey, developers are most resistant to using AI for high-responsibility tasks like deployment and monitoring (76% don’t plan to) and project planning (69% don’t plan to). The biggest frustration? 66% cite “AI solutions that are almost right, but not quite.” We’re personalizing the wrong things while avoiding the areas where AI could actually help.
Very soon, we’re going to see engagement handled directly by AI agents - personalized outreach, onboarding, and lifecycle management - all grounded in live behavioral data. Perplexity’s new app connectors and enhanced memory, along with Enterprise Max features, show where this is heading. I’d love to see some of these:
Variant-aware orchestration: Engagement systems that understand product usage/engagement/variance per user and can adapt flows dynamically
Agent-owned communication: WhatsApp rolled out message translations earlier this year - imagine agents that can communicate across languages and contexts under brand guardrails.
Knowledge that stays alive: ElevenLabs’ Scribe v2 Realtime with sub-150ms transcription across 90+ languages shows what’s possible for real-time knowledge systems
Rethinking metrics: The shift from “how many users came back” to “did this user achieve their intended goal.”
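As a sketch of that last point: when every user’s flow is personalized differently, the stable unit to measure is the declared goal, not the screen sequence. A minimal version (names are illustrative, not a real analytics API) could look like this.

```typescript
// Hypothetical goal-completion metric: comparable across users even when
// the steps in between were personalized differently for each of them.
interface GoalEvent {
  userId: string;
  goal: string;          // e.g. "automate-approvals-setup"
  startedAt: number;     // epoch ms
  completedAt?: number;  // undefined = not (yet) achieved
}

function goalCompletionRate(events: GoalEvent[], goal: string): number {
  const relevant = events.filter((e) => e.goal === goal);
  if (relevant.length === 0) return 0;
  const done = relevant.filter((e) => e.completedAt !== undefined).length;
  return done / relevant.length;
}

function medianTimeToGoalMs(events: GoalEvent[], goal: string): number | undefined {
  const durations = events
    .filter((e) => e.goal === goal && e.completedAt !== undefined)
    .map((e) => (e.completedAt as number) - e.startedAt)
    .sort((a, b) => a - b);
  if (durations.length === 0) return undefined;
  return durations[Math.floor(durations.length / 2)];
}
```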
Scale: where all the generated code breaks
This is where everything that looked fine in dev blows up - code that worked for ten users collapses at a hundred. AI-generated code (so far) tends to optimize for “it runs now,” not “it scales later.” Current tools like SonarQube and CodeGuru help with static analysis, but they’re reactive, not predictive. The newer wave - Qodo with its enterprise-grade reviews, CodeScene with behavioral analytics - are getting better at flagging scalability risks before they happen.
The November LogRocket rankings show GLM-4.5 debuting at $0.35/$0.39 pricing with MIT license and self-hosting capabilities - 90.6% tool-use success rate beating Claude 4 Sonnet. The economics of AI are changing fast. When you can get frontier-level capabilities for pennies, the bottleneck isn’t compute anymore - it’s understanding what actually scales.
Then there’s multi-region deployment, consistency, compliance - all the enterprise stuff AI doesn’t understand yet. Oracle’s new AI platform for government agri risk forecasting and Suki’s nursing consortium for healthcare workflows show interesting sector-specific solutions emerging. But we need more general solutions:
Predictive scalability checks: Tools that simulate real usage before launch and surface bottlenecks proactively. Qualcomm’s Snapdragon X2 Elite Extreme shows the hardware is there - we need the software to catch up.
Data-policy compilers: Declarative frameworks that encode legal and regional constraints into routing and caching decisions automatically (a rough sketch follows after this list)
Continuous performance governance: Systems that enforce latency and cost budgets at PR time, not after users start complaining
Security and provenance by default: Traceable lineage for every artifact or decision an agent produces - not optional, not bolted on later.
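To ground the data-policy compiler idea, here’s a rough sketch: residency rules get declared once and resolved into routing decisions, instead of being re-implemented and re-audited in every service. The regions and rules below are invented for illustration, not legal guidance.

```typescript
// Hypothetical residency policy: user region -> regions where their data
// may be stored or cached.
interface ResidencyPolicy {
  allowedStorage: Record<string, string[]>;
  defaultRegion: string;
}

const policy: ResidencyPolicy = {
  allowedStorage: {
    EU: ["eu-west-1", "eu-central-1"],
    US: ["us-east-1", "us-west-2"],
  },
  defaultRegion: "us-east-1",
};

// The "compiled" artifact is just a resolver the serving layer calls.
// The point is that routing derives from policy, not from per-service code.
function resolveStorageRegion(userRegion: string, preferred: string, p: ResidencyPolicy): string {
  const allowed = p.allowedStorage[userRegion];
  if (!allowed || allowed.length === 0) return p.defaultRegion;
  return allowed.includes(preferred) ? preferred : allowed[0];
}

// An EU user whose nearest region is us-east-1 still lands in eu-west-1:
// resolveStorageRegion("EU", "us-east-1", policy) -> "eu-west-1"
```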
The obvious opportunities for 2026
A lot of the gaps I’ve called out above will get tackled in 2026, especially the ones where tools have already built a good foundation and can manage the distribution layer too -
Dev tools for AI-generated code will expand far beyond generation. The shift to “review-first” development is already happening. Cline offers open-source BYOK (bring your own key) flexibility, while tools like serena provide semantic retrieval and editing capabilities. Qodo’s enterprise-grade context awareness, CodeRabbit’s 46% bug detection rate, and Graphite’s modern PR workflows all show that review and verification are becoming more valuable than generation itself, and this space will see several BIG evolutions in the next 12 months.
Enterprise knowledge infrastructure will mature rapidly. Companies are already building AI agents for customer service, sales, and support. Google NotebookLM’s Deep Research feature that browses hundreds of sites and creates comprehensive reports shows the direction. We need model-agnostic middleware that lets non-programmers manage knowledge.
Context engineering platforms will expand AI capabilities accessible to non-technical users. Microsoft’s Copilot Studio Wave 2 for no-code agent building is just the beginning. Stability AI’s Image Services on Amazon Bedrock deliver professional-grade editing as APIs. I am pretty sure good things will ship here.
Operational excellence automation will abstract away complexity. GitHub turning Teams conversations into code with Copilot shows how seamless this can become. Amazon, Oracle, and Salesforce all launched enterprise AI platforms in the last few months alone.
Engagement infrastructure for adaptive applications needs a rethink. 75% of developers say they’d still ask a human “when I don’t trust AI’s answers” according to Stack Overflow. Building that trust layer is the opportunity.
What 2026 needs to deliver
Looking back at 2025, the pattern is clear: we built amazing generators but (almost) forgot about everything else. The AI value chain keeps getting more lopsided - the build stage gets easier while complexity cascades downstream into deployment, distribution, engagement, and scale.
The market will grow from $391 billion toward that projected $1.8 trillion-plus by 2030, but most of that value won’t come from better code generation. It’ll mostly come from solving the unglamorous problems - the integration nightmares, the trust gaps, the review bottlenecks, the scaling failures.
My prediction for 2026 is that the winners won’t be the companies with the best models or the fastest generation, but the ones who finally crack the “last mile” problems that 2025 exposed. The ones who build:
Trust infrastructure that makes that 24% trust number climb to 80%
Review systems that actually understand context and architectural decisions
Deployment platforms that work with real enterprise constraints, not just demo apps
Integration layers that let non-technical teams actually use these tools without breaking production
Scaling intelligence that predicts and prevents the failures before they happen
2025 taught us that anyone can generate code, but 2026 needs to teach us how to ship it. And since the stack is plateauing where it sits today, foundational companies will either evolve to solve this or go vertical-first (where they’ll surely make a lot of money, but maybe not solve the enterprise problems).
Amidst all that’s happening, one of the uncomfortable questions I keep coming back to is: if creation cost truly drops to zero and anyone can build software, should we even be building software the way we do today? If the constraint isn’t “can we build it” but “should we build it” - then what?
Maybe a big part of 2026 won’t be about better tools, but about better judgment on when to use them and who should truly use them.

