In my previous post, I argued that the architecture of the next platform shift is being assembled. Protocol, agent layer, marketplace, multi-client adoption — the pieces are falling into place.

But architecture doesn’t guarantee adoption. Web 3 had architecture too.

The question for Web 4.0 isn’t whether the technology works. It’s whether anyone actually wants what it offers. I think the answer is yes — but not for the reasons most AI coverage suggests.

What Web 3 Got Wrong

I think Web 3 failed because it solved a problem consumers didn’t have.

The pitch was compelling if you were a technologist: decentralised ownership, self-sovereign identity, trustless transactions, no intermediaries. You’d own your data, your digital assets, your identity — free from platform control. I get the appeal. I really do.

But consumers heard something different: manage a wallet, pay gas fees, understand blockchain mechanics, lose everything if you misplace a seed phrase. More friction, more complexity, more cognitive load — for a benefit most people never asked for. Consumers don’t lie awake thinking about data sovereignty. They lie awake thinking about whether they’re getting ripped off on their car insurance.

Web 3 asked people to care about the plumbing. If I’m right about Web 4.0, it makes the plumbing invisible.

The Problem Everyone Already Has

Web 4.0 starts from the other end — a problem so pervasive people have stopped noticing it.

You want to book a trip. So you open Skyscanner for flights, Booking.com for hotels, Google Maps for restaurants, TripAdvisor for reviews. You cross-reference across tabs, apply filters, convert currencies, check cancellation policies, compare loyalty programmes. Two hours later you’ve made a booking — and you’re not confident it was the best option.

You need insurance. You visit comparison sites that have their own promoted placements, wade through fine print, try to understand the difference between policies that look identical but aren’t. You either spend hours becoming a temporary expert or you pick something and hope for the best.

You’re moving to a new city. You need to register your address, find an internet provider, set up utilities, locate a GP, understand the local public transport options, figure out waste collection rules. Each of these is a different website, a different account, a different set of forms — and none of them talk to each other.

You’re looking for a job. You know what you want, but every job board makes you translate it into its own rigid filters: a location dropdown, a seniority checkbox, a keyword search that doesn’t understand context. You don’t think in filters. You think “senior engineering role, remote-friendly, interesting product, not adtech.” Good luck expressing that in a search form.

You want to dispute a charge on your bank statement. You navigate a phone tree, wait on hold, explain the situation to someone who transfers you, explain it again, get told to fill out a form online, log in, can’t find the form, call back. An hour of your life, gone.

These aren’t edge cases. This is just… Tuesday. This is what digital life actually looks like for most people, most of the time. The web made everything available. It also made everything your job. And the expertise gap between a savvy digital native and everyone else is enormous — and it’s not closing.

Here’s where I think Web 4.0 comes in: describe what you want, and an agent does the comparison, synthesis, and decision-prep for you. You review the options and confirm. I think of this as the labour inversion — the self-service era offloaded work from paid intermediaries onto you, the unpaid user. Web 4.0 inverts it back. The interaction model swings to conversational and personalised again, like having a travel agent or insurance broker, but at software cost instead of human cost.

Voice, Vision, and Who Gets to Participate

Here’s what I think is the most underappreciated part of this shift. It’s not just text. It’s multimodal interaction — and it changes who technology is for.

“Plan me a trip to Portugal, under two thousand euros, first week of April” is natural to say out loud. It’s awkward to type into a search box. It’s close to impossible to express through Skyscanner’s filter dropdowns. Voice is the most natural interface humans have; we’ve been using it for a few hundred thousand years. Every other interface is a workaround for the fact that computers couldn’t understand us.

But voice alone isn’t the full picture. Think about this: you’re abroad, staring at a restaurant menu you can’t read. Today, you open Google Translate, point your camera, squint at the overlay, then open a separate app to check allergens, then maybe ask a waiter anyway. With a multimodal agent, you take a photo and say “what’s safe for me to eat here?” One interaction. It knows your dietary restrictions because you’ve told it before. That’s not science fiction — the individual capabilities exist today. What’s missing is the integration.

Or you get a letter from your insurance company full of jargon. Today, you either puzzle through it or call someone during business hours. With an agent, you snap a photo and ask “what does this actually mean for me?” The agent reads the letter, cross-references your policy, and explains it in plain language. I find myself imagining these scenarios constantly, and each time I think: this is clearly better. Maybe I’m wrong. But it’s hard to see how “navigate five apps and become a temporary expert” beats “have a conversation.”

What strikes me is how naturally the modalities shift with context. You’re cooking with your hands full — voice in, voice out. You’re commuting — you ask about weekend plans, and a curated list gets pushed to your phone for later. You’re at a desk comparing mortgage offers — the agent talks you through the trade-offs while showing a comparison table. The interface adapts to the situation instead of forcing you to adapt to it.

And this is where it gets bigger than convenience. If this plays out, it doesn’t just help existing users. It expands who gets to participate in the digital economy at all. Your grandmother who can’t navigate Booking.com. A visually impaired user who struggles with complex web interfaces. Someone whose first language isn’t the one the app was designed in. A first-generation immigrant trying to navigate a new country’s bureaucracy. People who are digitally literate enough to have a conversation but not to operate a comparison tool.

That’s not a UX improvement. That’s a fundamentally different answer to the question of who technology is for. The web democratised access to information. I think Web 4.0 could democratise the expertise needed to act on it. I’m deliberately leaving pricing and access tiers out of this discussion — the capability argument stands on its own, and the economics deserve their own treatment in a later post.

Why This Time the Experience Is Different

I covered the 2016 bot parallel in the previous post — same vision, different timing. But it’s worth dwelling on what the consumer experience actually felt like.

Ordering flowers through a Messenger bot in 2016 went something like this: you typed “I want to order flowers.” The bot replied with a menu. You picked an option. It asked for a delivery date. You typed a date. It didn’t understand the format. You tried again. It offered three bouquets with tiny images. You picked one. It asked for an address. The whole interaction took longer and felt worse than just using the website.

That experience shows where the threshold for adoption sits. It isn’t whether delegation is possible. It’s whether delegating is meaningfully better than doing it yourself, for enough use cases and often enough that the habit forms.

I think that threshold is starting to be crossed — not across the board, but for a specific category of tasks. The friction-heavy transactional ones: comparing, researching, coordinating across providers. For those, an agent that actually understands your intent and presents a curated shortlist is genuinely faster and less effortful than doing it yourself. It’s not perfect. It’s not there for everything. But when natural language understanding actually works, the experience flips from frustrating to convenient — and that flip, I think, is what changes habits.

The Trade-offs Are Real

I don’t want to oversell this.

The conversational format genuinely kills a category of dark patterns — the ones that are structural to the UI. Hidden cancellation flows, pre-checked boxes, buried opt-outs, confusing checkout sequences designed to prevent you from doing what you want. When an agent acts on your behalf, there’s no button to hide. “Cancel my subscription” just happens.

But not all manipulation is structural. Urgency messaging (“only 2 left!”), scarcity pressure, and information shaping don’t depend on UI layout. They depend on business incentives, and those survive the format change. An agent that says “I found a great rate but it expires in two hours” is the same trick in a more trusted voice. And new vectors appear on top of the old ones. When your agent recommends three flight options, are those the best three, or the three whose providers paid for placement? The agent feels like your assistant. That intimacy is what makes it useful, and it’s also what makes the new manipulation vectors harder to detect. A promoted result on Google is visually marked and feels like what it is: advertising. A promoted recommendation from your personal agent feels like advice.

The question isn’t whether Web 4.0 is perfect. It’s whether the net effect is better than the status quo — and I think it is. Today’s digital experience is already adversarial, already full of dark patterns. But the new vectors are subtler, and “your agent might not be fully on your side” is a tension that doesn’t go away. I’ll come back to this in a later post.

Where It Doesn’t Reach

Now, I could write a breathless piece about how this changes everything. But I don’t think it does, and I’d rather be honest about where I think Web 4.0 changes very little.

Content consumption stays native. You don’t delegate scrolling Instagram, watching YouTube, or browsing Spotify. The experience is the product. An agent can play something for you (“play something for cooking”), but that’s voice control, not disintermediation — Alexa already does this and Spotify isn’t commoditised.

Simple habitual tasks stay where they are. If you order the same coffee every morning through the same app, delegation adds nothing. The current interface is already one tap.

Privacy-sensitive users opt out. Some people will never be comfortable with an agent that sees their financial decisions, medical research, and personal deliberations. That’s rational, not technophobic. The concentration of personal data in a single platform is unprecedented, and not everyone will accept the trade-off.

Power users keep control. Some people want to compare 40 flights themselves. They enjoy the process. The interface is the value. Web 4.0 doesn’t serve them and doesn’t need to.

I think knowing where a thesis doesn’t apply is at least as important as knowing where it does.

The Demand That Doesn’t Know It Exists

Here’s the paradox that makes this hard to evaluate: the demand is latent.

Developers are excited — OpenClaw’s growth shows that. But consumers aren’t asking for Web 4.0. They’re not campaigning against app fatigue. There’s no protest movement against having to compare insurance policies yourself. People have normalised the friction because it’s all they’ve known since the self-service era began.

Now, “latent demand” is a convenient argument, because you can use it to justify almost anything. But the pattern is real. Nobody asked for a smartphone before the iPhone. Nobody asked for a web browser before Netscape. The demand became obvious after the product existed, not before. That doesn’t mean every “latent demand” claim is valid. It means you can’t treat the absence of expressed demand as proof that no latent demand exists.

The test isn’t whether people are asking for this today. It’s whether, once they experience it, they go back. If your grandmother uses an agent to book a flight and it works — does she ever open Skyscanner again?

I genuinely don’t know the answer. I’m speculating here — looking at the future through a crystal ball like everyone else. But the direction feels clear to me, even if the timing and shape are uncertain.

The web democratised access. Web 4.0 democratises expertise — if we build it right.

Previously: the architecture and the platform thesis. More soon.
