Bringing visual and interactive chats to any LLM
Lugman Hussain Khan
A few months ago Anthropic added a small but interesting capability to Claude. When Claude felt a diagram or interactive widget would help, it would generate one right there in the conversation. Instead of only returning paragraphs of text, it could embed diagrams, charts, and small animated widgets inline with the explanation.
The capability felt obvious in hindsight, but it raised a question. Why was this locked to one provider? The actual mechanism, generating HTML or SVG and rendering it inline, does not require anything proprietary. Most modern frontier models can produce decent visuals if you ask them to.
Meet Weave
Weave is a browser-based chat playground. You bring an OpenAI-compatible endpoint, drop in a base URL, an API key, and a model ID, and start chatting. The interface looks and feels like a normal chat app, but under the hood it is wired to render rich inline visuals whenever the model produces them.
Think of it as a generic version of the visual feature in Claude. Same idea, any model.
Everything runs in the browser. The API key sits in local storage, and requests go directly from the browser to the provider you chose.
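A minimal sketch of that flow, assuming the usual OpenAI-compatible chat completions shape. Names like ProviderConfig, loadConfig, and the "weave-config" storage key are illustrative, not Weave's actual code, and parsing of the event stream is elided.

// Illustrative sketch, not Weave's source. Assumes an
// OpenAI-compatible endpoint with SSE streaming.
interface ProviderConfig {
  baseUrl: string; // e.g. "https://api.example.com/v1" (hypothetical)
  apiKey: string;
  model: string;
}

// The key never leaves the browser except inside the request itself.
function loadConfig(): ProviderConfig {
  return JSON.parse(localStorage.getItem("weave-config") ?? "{}");
}

async function streamChat(messages: { role: string; content: string }[]) {
  const { baseUrl, apiKey, model } = loadConfig();
  const res = await fetch(`${baseUrl}/chat/completions`, {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      Authorization: `Bearer ${apiKey}`,
    },
    body: JSON.stringify({ model, messages, stream: true }),
  });
  return res.body!; // a ReadableStream of server-sent event chunks
}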
How the visuals get there
The core trick is small and a little old-fashioned, but it works.
When a chat starts, Weave sends a system prompt that does two things. It tells the model how to format normal answers, and it tells the model that whenever a visual would make the explanation clearer, it can embed HTML directly inside its response by wrapping it in special delimiters.
The delimiters look like this:
|||HTML_START|||
<your html, css, js here>
|||HTML_END|||
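A trimmed-down paraphrase of the kind of instruction involved (not Weave's exact prompt) might read:

You answer in markdown. When a diagram, chart, or interactive widget
would make an explanation clearer, embed one by writing a
self-contained HTML document between |||HTML_START||| and
|||HTML_END|||. Place visuals where they belong in the explanation
rather than collecting them at the end.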
The model is encouraged to interleave. Explain a concept in markdown, drop a visual in the middle, continue the explanation, maybe drop another one. The output reads like a well-written article, with illustrations placed exactly where they help.
On the client side, Weave streams the model's response token by token and watches the buffer for those delimiters. When it sees an opening delimiter, it knows the next chunk is HTML and starts piping characters into a sandboxed iframe block in the chat. When it sees a closing delimiter, it finalizes that block and goes back to streaming markdown text.
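Here is roughly what that scanner looks like, sketched in TypeScript. The sinks emitText and emitHtml are hypothetical stand-ins for the markdown renderer and the iframe writer, and the held-back tail handles a delimiter that straddles two chunks.

declare function emitText(s: string): void; // append to the markdown view
declare function emitHtml(s: string): void; // pipe into the active iframe

const OPEN = "|||HTML_START|||";
const CLOSE = "|||HTML_END|||";

let buffer = "";
let inHtml = false;

function onToken(chunk: string) {
  buffer += chunk;
  while (true) {
    const marker = inHtml ? CLOSE : OPEN;
    const idx = buffer.indexOf(marker);
    if (idx === -1) {
      // Flush everything except a tail that could be a partial marker.
      const safe = buffer.length - (marker.length - 1);
      if (safe > 0) {
        const out = buffer.slice(0, safe);
        if (inHtml) emitHtml(out); else emitText(out);
        buffer = buffer.slice(safe);
      }
      return;
    }
    // Emit whatever precedes the marker, then flip modes.
    const out = buffer.slice(0, idx);
    if (inHtml) emitHtml(out); else emitText(out);
    buffer = buffer.slice(idx + marker.length);
    inHtml = !inHtml; // a real client would also finalize the iframe here
  }
}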
The result is that visuals render progressively as they arrive. You watch the diagram appear alongside the explanation, not after a long wait at the end.
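One simple way to get that progressive rendering, again as a sketch rather than Weave's actual implementation, is to keep one sandboxed iframe per visual block and reassign its srcdoc as HTML accumulates. Reassigning srcdoc re-parses the document on every update, which is crude but visibly progressive; the #chat container is a hypothetical mount point.

let htmlSoFar = "";
let frame: HTMLIFrameElement | null = null;

function emitHtml(chunk: string) {
  if (!frame) {
    frame = document.createElement("iframe");
    // allow-scripts without allow-same-origin: the visual can run JS
    // but cannot reach the parent page or its localStorage.
    frame.setAttribute("sandbox", "allow-scripts");
    document.querySelector("#chat")!.appendChild(frame);
  }
  htmlSoFar += chunk;
  frame.srcdoc = htmlSoFar;
}

function finalizeHtml() {
  frame = null; // the next visual block gets a fresh iframe
  htmlSoFar = "";
}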
The case against tool calls
The obvious alternative is to define a render_visual tool and let the model call it. That is cleaner in theory. The arguments would carry the HTML, the tool would render it, done.
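For concreteness, in the common function-calling format the tool definition would look something like this. The schema is illustrative; only the name comes from the discussion above.

const tools = [
  {
    type: "function",
    function: {
      name: "render_visual",
      description:
        "Render a self-contained HTML/CSS/JS visual inline in the chat.",
      parameters: {
        type: "object",
        properties: {
          html: { type: "string", description: "A complete HTML document" },
        },
        required: ["html"],
      },
    },
  },
];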
In practice, this falls apart when the goal is to support any provider. Tool argument streaming is inconsistent across the ecosystem. Some providers stream argument deltas as they generate. Others buffer the entire tool call and emit it as one chunk at the end. When that happens, the user stares at a spinner until a five-thousand-character HTML payload finishes generating, then sees it pop in all at once.
That ruins the feel. Half the magic of inline visuals is watching them stream in, the same way you watch text stream in. Delimiters in the main content stream do not have this problem, because content streaming is well supported almost everywhere.
Lessons from real use
We have used Weave with a wide range of models at this point, and a few patterns are clear.
Recent flagship models are surprisingly good. Kimi K-2.6, Gemini 3.1 Pro, and GPT 5.4 all produce solid inline visuals when prompted well. They understand when a diagram helps, they know how to lay one out, and they get the streaming order right so the visual takes shape sensibly as it arrives.
Claude Opus 4.6 is the one that stood out. For SVG specifically, the gap is wider than we expected. The model produces visuals with thoughtful composition, careful spacing, and small details that other models tend to skip, like consistent stroke weights or labeled axes that do not collide with the content. This is not a benchmark, just an observation from running the same prompts side by side. If you care about visual quality more than anything else, Opus 4.6 is currently the strongest option we have tested.
Older or smaller models tend to struggle. Some default to text even when a visual is the obvious answer. Others produce visuals that are technically valid HTML but visually crude. Visual generation is one of those capabilities that scales hard with model quality.
Costs and caveats
Weave is fun, but it is not free of tradeoffs.
The biggest one is output tokens. A long interactive visual can easily run into thousands of tokens of HTML, CSS, and JavaScript. When you pay per output token, this adds up fast. Whether the visual is worth the cost depends on the question. For a quick factual lookup, probably not. For a concept explanation you actually want to understand, often yes.
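For a rough sense of scale, using a hypothetical price rather than any real provider's: at $10 per million output tokens, a 5,000-token visual adds about $0.05 to a response, so a hundred visual-heavy answers cost roughly $5 more than their text-only equivalents.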
The other tradeoff is variance. Not every response produces a great visual. Some models miss the mark, some queries do not call for one in the first place, and some visuals come out looking rough. We see this less as a flaw and more as the point. Weave is a playground for comparing how models handle this, and the variance is part of what is interesting.
Try it yourself
Weave is open source. The whole thing is a small Vite app, a system prompt, and a streaming parser. If you want to play with it, the hosted version lives at weave.madhi.ai, and the code is on GitHub.
Bring your favorite model. See what it can draw.