Essay · April 22, 2026 · 8 min read

Beyond the Chatbox

The next decade of AI in products won't look like a conversation.

By Pinar Patton

The dominant interface pattern for AI in 2026 is still, somehow, a rectangle you type into.

It's a defensible pattern. It's familiar: chat is the interaction paradigm most software people have spent twenty years optimizing. It's cheap to ship: a div, a text input, an API call. It's also measurable: messages in, messages out, tokens per second. For a particular kind of problem, where the user has a well-formed question and is willing to formulate it, it does the job.

But it doesn't do every job. If you've been building AI features inside products long enough, you've probably noticed something uncomfortable: the chatbox is often a transitional form. It is what you ship when you don't yet know what the right interface is.

This post is about what's on the other side.

What the chatbox gets wrong

Start with a concrete observation: most of the time, when users need help, they are not in a position to ask for it.

Think about the last time you were genuinely stuck in a piece of software. Odds are, you didn't formulate a clean question in your head and then fail to find the chatbox. More likely, you were in a muddle. You half-knew what you were trying to do. You'd clicked some things and they hadn't worked. You were on the fourth tab of a documentation site, skimming. The cost of translating that muddle into a one-sentence prompt, plus the cost of context-switching out of your actual task to do it, added up to something larger than the muddle itself.

So you didn't open the chatbox. You muddled through. Or you left.

The chatbox doesn't lose to another AI interface. It loses to abandonment. That's the part that isn't showing up in most funnels, because abandonment is invisible unless you look for it.

Two structural problems compound this:

The translation cost. Turning "I'm stuck" into "I'm stuck because X and I want Y" is itself cognitive work. It's often the hardest part of getting help. Any interface that requires the user to do that work before assistance can arrive is going to miss the users who need help most.

The attention cost. Even if the user has a clean question, the act of deciding "should I open the chatbox, or keep trying" is a context switch. Repeated across a session, it adds up to a tax that users pay on every ambiguous moment. They get tired. They stop asking.

What replaces it

A good heuristic for where a field is going: look at the interfaces that feel expensive right now, and imagine the version where the expensive part is absorbed by the software.

Three shifts stand out.

1. From prompted to ambient

The next generation of AI-in-product doesn't wait to be asked. It watches what's happening and acts when the evidence crosses a threshold. That threshold is the hard part. Act too often and you've built the most annoying product on the market. Act too rarely and you're a lifeguard staring at a drowning swimmer.

The interesting engineering problem is no longer "how do we answer the question well." It's "how do we decide whether acting is warranted, and at what amplitude." The quality of the restraint is the feature.
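One way to make "the quality of the restraint" concrete: accumulate friction evidence over time, let it decay so isolated blips fade, act only when it crosses a threshold, and enforce a cooldown so the system cannot nag. A minimal sketch in Python, assuming some upstream sensor emits a per-tick friction score; the class name, threshold, decay, and cooldown values are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class RestraintGate:
    """Decides *whether* acting is warranted; what to do is a separate question.

    All parameters are illustrative assumptions, not tuned values.
    """
    threshold: float = 1.0           # accumulated evidence required before acting
    decay: float = 0.8               # fraction of evidence retained each tick
    cooldown_ticks: int = 5          # minimum quiet ticks between interventions
    evidence: float = 0.0
    ticks_since_action: int = 10**9  # effectively "never acted yet"

    def observe(self, friction: float) -> bool:
        """Fold one tick of friction evidence in; return True if we should act now."""
        self.evidence = self.evidence * self.decay + friction
        self.ticks_since_action += 1
        cooled_down = self.ticks_since_action > self.cooldown_ticks
        if self.evidence >= self.threshold and cooled_down:
            # Acting spends the accumulated evidence and restarts the cooldown clock.
            self.evidence = 0.0
            self.ticks_since_action = 0
            return True
        return False
```

With these settings, sustained moderate friction eventually triggers an intervention, while a single spike that never recurs does not. The cooldown is the cheapest guard against becoming the most annoying product on the market.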

2. From conversational to situational

A conversation is a single modality with a single grammar: messages in a thread. But assistance in software is inherently multi-modal. Sometimes the right answer is a highlight on the exact field that needs attention. Sometimes it's navigating the user to the correct screen. Sometimes it's a confirmation dialog with an alternative. Sometimes it's saying nothing and quietly pre-loading what they're about to need.

Forcing all of those into "a message from the assistant" flattens them. The chat transcript becomes a lossy compression of what the system actually should have done. The replacement is a system that picks from a vocabulary of actions, each tuned to the situation, rather than a single channel of text.

3. From attention-demanding to attention-aware

The chatbox presumes it is always worth the user's attention. The next wave of products will treat the user's attention as the scarcest resource in the system, and design the assistance layer accordingly. That means measuring the amplitude of every intervention, deciding whether it's warranted, and often choosing the one the user will barely notice — or no intervention at all.

Put differently: the right question is not "what should the assistant say?" but "how much should the assistant even show up right now?"
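Taken together, the last two shifts amount to one decision: pick an action from a vocabulary, and prefer the one the user will barely notice. A minimal sketch, assuming upstream estimates of how much help the situation needs and how much each action would deliver; the action names echo the examples above (highlight, navigation, confirmation dialog, silent preloading), but their amplitude ordering and both estimates are hypothetical.

```python
from enum import Enum


class Action(Enum):
    """Hypothetical action vocabulary, ordered from least to most intrusive.

    The value is an assumed amplitude: how much user attention the action costs.
    """
    NOTHING = 0
    PRELOAD = 1    # silently prepare what the user is likely to need next
    CUE = 2        # a subtle visual hint
    HIGHLIGHT = 3  # draw attention to the exact field
    NAVIGATE = 4   # move the user to the correct screen
    INTERCEPT = 5  # confirmation dialog offering an alternative


def choose_action(needed_help: float, helpfulness: dict[Action, float]) -> Action:
    """Return the lowest-amplitude action whose estimated helpfulness covers the need.

    needed_help: assumed estimate of how much help the situation requires (0 = none).
    helpfulness: assumed per-action estimate of how much help each action delivers here.
    """
    if needed_help <= 0:
        return Action.NOTHING
    for action in sorted(Action, key=lambda a: a.value):
        if helpfulness.get(action, 0.0) >= needed_help:
            return action
    # No action is sufficient: stay quiet rather than spend attention for nothing.
    return Action.NOTHING
```

Returning `NOTHING` when no action suffices is a deliberate choice in this sketch: spending the user's attention on an intervention that won't help enough is treated as worse than staying quiet. A real policy might escalate instead.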

The shape of the new stack

If the interface is ambient and multi-modal, the architecture underneath has to change too. A few things become essential:

A sensing layer. Not logging. Not analytics. A real-time source of structured evidence about what the user is doing, how well it's going, and where the friction is. This layer has to respect privacy by construction: raw signals stay on the client; only high-level descriptions go on the wire.
A state model. Something that maintains, in near-real-time, an evolving picture of what the user is trying to accomplish, their progress, and their trajectory. Not a chat history, but a situation.
An intervention policy. The code that takes the state model and decides: nothing, a cue, a highlight, a navigation, an intercept. This is the taste layer. Most of the product quality lives here.
A narrow, visible action surface. When the system does act, it acts in ways that are legible to the user. No black-box rewrites. No silent edits. The user should always be able to see what the system did, and why.
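The four layers can be wired together in a few dozen lines. A minimal sketch, assuming a client-side event stream; every class, field, weight, and event kind here is hypothetical, and the intervention policy is a deliberately trivial placeholder for the taste layer. What it does show is the privacy boundary: `summarize` drops element targets and typed text, so only a screen name and two counts would go on the wire, and every action lands in a log with a reason.

```python
from dataclasses import dataclass, field


# Sensing layer: runs on the client; RawEvent objects never leave it.
@dataclass(frozen=True)
class RawEvent:
    kind: str       # hypothetical event kinds, e.g. "click", "error"
    target: str     # UI element identifier
    text: str = ""  # anything the user typed; sensitive


def summarize(events: list[RawEvent], screen: str) -> dict:
    """Reduce raw events to a high-level description that is safe to send.

    Only the screen name and two counts survive; targets and typed text are dropped.
    """
    errors = sum(1 for e in events if e.kind == "error")
    repeated = len(events) - len({(e.kind, e.target) for e in events})
    return {"screen": screen, "errors": errors, "repeated_actions": repeated}


# State model: an evolving situation, not a chat history.
@dataclass
class Situation:
    screen: str = ""
    friction: float = 0.0  # smoothed friction estimate in [0, 1]

    def update(self, summary: dict, alpha: float = 0.5) -> None:
        # Hypothetical weights: errors count more than repeated actions; capped at 1.
        raw = min(1.0, 0.3 * summary["errors"] + 0.1 * summary["repeated_actions"])
        self.screen = summary["screen"]
        self.friction = alpha * raw + (1 - alpha) * self.friction  # exponential smoothing


# Visible action surface: every action is recorded with a human-readable reason.
@dataclass
class ActionLog:
    entries: list[tuple[str, str]] = field(default_factory=list)

    def record(self, action: str, reason: str) -> None:
        self.entries.append((action, reason))


def step(events: list[RawEvent], screen: str, situation: Situation, log: ActionLog) -> str:
    summary = summarize(events, screen)  # the only thing that would go on the wire
    situation.update(summary)
    # Intervention policy: a trivial placeholder that either highlights or does nothing.
    action = "highlight" if situation.friction >= 0.3 else "nothing"
    if action != "nothing":
        log.record(action, f"friction {situation.friction:.2f} on {situation.screen}")
    return action
```

Because the state is smoothed, one bad moment is not enough to trigger the highlight; the same friction has to persist across steps before the policy acts.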

Notice what's not on that list. A big conversational transcript. A prompt template the product team tweaks weekly. A RAG system whose value is "we can answer questions about your own app." Those are chatbox-era infrastructure. Some of them survive, repurposed; a lot of them don't.

What this means for product teams

If you're building AI features right now, the useful question is not "is our chatbox good enough." It's:

Where in our product is the user most likely to be stuck in a way they can't articulate?
What would a lifeguard watching that screen do?
What is the lightest possible intervention that would help?
How often should we do nothing? (Hint: very, very often.)

That last question is the one most teams skip. They build an AI feature, measure it by how often it's invoked, and optimize for engagement. But for ambient intelligence, engagement is often the wrong metric. The right metric is something closer to task completion with the assistance layer present, compared with task completion without it. If the system makes users faster by mostly staying quiet and occasionally being exactly right, it's winning.
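Framed that way, evaluation becomes a controlled comparison rather than an engagement count. A minimal sketch, assuming sessions are randomly assigned either to an assisted group or to a holdout that never sees the assistance layer; the `Session` fields and metric names are hypothetical. Reporting interventions per session next to the uplift makes it visible when a system wins while mostly staying quiet. A production evaluation would also want confidence intervals, which this sketch omits.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Session:
    assisted: bool      # was the assistance layer enabled for this session?
    completed: bool     # did the user finish the task?
    interventions: int  # how many times the assistant visibly acted


def completion_rate(sessions: list[Session]) -> float:
    return sum(s.completed for s in sessions) / len(sessions) if sessions else 0.0


def evaluate(sessions: list[Session]) -> dict[str, float]:
    """Compare task completion with vs. without assistance (assumes random assignment)."""
    treated = [s for s in sessions if s.assisted]
    holdout = [s for s in sessions if not s.assisted]
    per_session = sum(s.interventions for s in treated) / len(treated) if treated else 0.0
    return {
        "completion_uplift": completion_rate(treated) - completion_rate(holdout),
        "interventions_per_session": per_session,
    }
```

A positive uplift alongside a low intervention rate is the "mostly quiet, occasionally exactly right" pattern; a high invocation count with no uplift is engagement without value.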

The bet

Here's the bet we're making with CogStream: that the category-defining products of the next few years are not going to be the ones with the best chatbox. They'll be the ones whose intelligence is woven into the product, watching the room, using restraint, and acting only when it's earned.

The chatbox isn't going away. For some tasks (open-ended research, composition, reasoning under uncertainty) the prompt is the right interface, and it'll stay the right interface. But as the interface for AI in products, it's a placeholder. It's where we are because we got here fast. It's not where we're going.

The next interface is quieter. It already has the context. It doesn't wait to be asked. When it acts, it acts precisely, and most of the time, it doesn't act at all.

That's the lifeguard. That's the room, read.

That's what needs to get built.

The next decade of AI in software won't be a conventional conversation. It will be a presence.
