TTY-changelog #044

Four new world models dropped (SANA-WM, Odyssey's pair and HY World 2.0) alongside Runway's bet on the category; a 275x-faster DNA model drew biotech backlash; and Gemini 3.5 Flash and Antigravity CLI shipped.

May 23, 2026

👉 Article originally posted on TTY

Autonomous Agents

🧪 Vercel’s Zero, a language for agents – The experimental language put AI agents first, aiming for a small, regular surface a model can learn while working, a standard-library-first design, and tooling that exposes diagnostics and repair plans as structured data. It is pre-1 and not meant for production.

🔌 Codex plugin for AI methods – A new extension by Louis Choquel from Pipelex lets the Codex desktop app author multi-step AI workflows in a dedicated language, then validate and repair its own output through hooks and CLIs, with build artifacts rendered live as it works.

Biotech, Health, and Chemistry

🧬 Carbon, an open DNA model – It claimed to run 275x faster than the prior best, processing a full human genome on a single GPU in under two days, using a tokenizer that splits sequences into six-base chunks while keeping single-base resolution. An interactive demo covered genes, mutations and protein folding, but the release drew sharp pushback over its reconstructed tree of life and how little biology it seemed to grasp.

Community take: Félix Raimundo dismissed the model as AI for bio with zero domain knowledge, noting that it groups flies with plants where plain PCA does not, and that it fails to separate introns from exons, a task naive baselines clear at 95 percent. Jeremie Kalfon pushed back that the authors were LLM people new to biology and that getting this far without much guidance was already impressive, while Félix held that being good at the task is the bare minimum, not grounds for applause.
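
For readers unfamiliar with the six-base chunking mentioned in the Carbon item, here is a minimal sketch of non-overlapping 6-mer tokenization. It is not Carbon's tokenizer: the vocabulary layout and the `encode`/`decode` names are hypothetical, and how the real model keeps single-base resolution is not described here. The sketch only shows that chunking shortens the sequence sixfold while every token still decodes back to exactly six bases.

```python
from itertools import product

BASES = "ACGT"
K = 6  # chunk size in bases, as described in the item above

# Hypothetical vocabulary: every possible 6-mer gets an integer id (4**6 entries).
VOCAB = {"".join(p): i for i, p in enumerate(product(BASES, repeat=K))}
INVERSE = {i: kmer for kmer, i in VOCAB.items()}


def encode(seq: str) -> list[int]:
    """Split a DNA string into non-overlapping 6-base chunks and map them to ids."""
    if len(seq) % K != 0:
        # Real tokenizers need a policy for tails and ambiguous bases; omitted here.
        raise ValueError("sequence length must be a multiple of 6 in this sketch")
    return [VOCAB[seq[i:i + K]] for i in range(0, len(seq), K)]


def decode(ids: list[int]) -> str:
    """Recover the exact base sequence from the token ids."""
    return "".join(INVERSE[i] for i in ids)
```

A 12-base sequence becomes two tokens, and `decode(encode(seq))` returns the original string, so no base-level information is lost by the chunking itself.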

🧫 Open tool for variant effects – A genetics lab open-sourced an MCP server and a Claude skill on top of its model for predicting which genetic variants cause disease, letting anyone query predictions or analyze their own sequencing data. The team stressed it is a predictor, not a diagnostic tool.

Community take w/ Jeremie Kalfon: “From what I understand, it really only works for coding variants, which are the super easy ones.”

📄 Nature papers on AI scientists – Three papers on AI scientists for biology landed in Nature on the same day, one from Future House and two from DeepMind, pointing to a fast-growing push toward agents that can form and test biological hypotheses.

Image, Video & 3D

✂️ Runway Aleph 2.0 and Edit Studio – Runway upgraded its flagship video editing model and wrapped it in a new product, handling up to 30 seconds of 1080p video, localized edits that touch only the targeted region, image-level control where an edited frame sets the look, and changes applied across multiple shots at once.

Infrastructure

⏱ The 62.5-minute rule for caching – A deep dive on Claude’s prompt caching landed on a clean rule: if you expect to reuse a cached prefix within 62.5 minutes, keep it warm with a cheap read; otherwise let it expire and rewrite it later. The threshold holds across models and prefix sizes even as the dollars change.
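
One way a figure like 62.5 minutes can fall out of the pricing is sketched below. The inputs are assumptions, not taken from the item above: a cache write billed at 1.25x the base input price, a cache read at 0.1x, and a 5-minute cache lifetime that each read resets. Under those assumptions the break-even depends only on the ratio of the two multipliers, which would explain why it does not move with model price or prefix size. It is a rough, continuous approximation, not the deep dive's own derivation.

```python
# Assumed multipliers relative to the base input-token price (not from the article).
WRITE_MULT = 1.25   # cost of writing a prefix into the cache
READ_MULT = 0.10    # cost of reading (and thereby refreshing) a cached prefix
TTL_MINUTES = 5.0   # assumed cache lifetime, reset by every read


def break_even_minutes(write_mult=WRITE_MULT, read_mult=READ_MULT, ttl=TTL_MINUTES):
    """Idle time after which keep-warm reads cost as much as one fresh write."""
    reads_per_write = write_mult / read_mult  # how many reads one write pays for
    return reads_per_write * ttl


def cheaper_strategy(expected_gap_minutes):
    """Pick the cheaper option for the expected gap before the next real reuse."""
    if expected_gap_minutes < break_even_minutes():
        return "keep_warm"
    return "let_expire"
```

With these assumed numbers, one write pays for 12.5 reads, and 12.5 reads spaced 5 minutes apart cover 62.5 minutes.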

🧱 Local-first second brain for agents – An open-source tool from Stan Girard keeps notes as plain markdown behind a single index, exposed through a CLI and an MCP server so agents like Claude Code or Cursor can read and write to a local personal knowledge graph. Released under MIT.

Language Models

⚡ Gemini 3.5 Flash at Google I/O – Google shipped it to everyone, claiming gains on almost every benchmark over 3.1 Pro with a big jump in coding, quality near the best models, and roughly 4x faster token output. The keynote also covered a 24/7 personal agent, a new terminal and an Omni media model.

Community take w/ Kemal Toprak Uçar: “I rarely praise proprietary models, but Flash 3.5 has been genuinely fascinating for coding and brainstorming, with both the speed and the performance standing out.”

🟢 Cohere Command A+ open weights – Cohere released its most capable model yet under Apache 2.0, with 218B total and 25B active parameters, tuned to run on as little as two H100s. It claimed over 2x higher output speed and 30 percent lower latency than the prior Command A, plus native support for 48 languages.

🔀 Nemotron blends diffusion and autoregression – NVIDIA released a language model family that runs both autoregressive and diffusion-style parallel decoding by switching one model’s attention pattern. That enables self-speculation, where diffusion drafts tokens and autoregression verifies them with a shared cache, with claimed multi-x speedups at equal accuracy.

Community take w/ Amine Saboni: “The self-speculation setup, using autoregressive verification on the diffusion drafts, is the really interesting part.”
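
The draft-then-verify idea behind self-speculation can be illustrated with a toy greedy loop. This is not NVIDIA's implementation: `toy_drafter` and `toy_verifier` are hypothetical stand-ins for the diffusion and autoregressive modes, and a real system would verify the whole draft in one parallel pass rather than token by token.

```python
from typing import Callable, List


def speculative_step(
    prefix: List[int],
    draft: Callable[[List[int], int], List[int]],
    verify_next: Callable[[List[int]], int],
    k: int,
) -> List[int]:
    """Draft k tokens at once, keep the prefix the verifier agrees with,
    and replace the first disagreement with the verifier's own token."""
    accepted: List[int] = []
    for tok in draft(prefix, k):
        expected = verify_next(prefix + accepted)
        if tok == expected:
            accepted.append(tok)
        else:
            accepted.append(expected)  # correction from the verifier
            break
    return accepted


def toy_verifier(seq: List[int]) -> int:
    # Hypothetical "autoregressive" rule: next token is last token + 1, modulo 10.
    return (seq[-1] + 1) % 10


def toy_drafter(seq: List[int], k: int) -> List[int]:
    # Hypothetical "diffusion" draft: k tokens at once, deliberately wrong
    # whenever the correct token would be 5.
    out, last = [], seq[-1]
    for _ in range(k):
        last = (last + 1) % 10
        out.append(0 if last == 5 else last)
    return out


def generate(prefix: List[int], n_new: int, k: int = 4) -> List[int]:
    seq = list(prefix)
    while len(seq) < len(prefix) + n_new:
        seq.extend(speculative_step(seq, toy_drafter, toy_verifier, k))
    return seq[: len(prefix) + n_new]
```

Because every kept token is one the verifier would have produced itself, the output matches plain greedy decoding with the verifier; the speedup comes from accepting several drafted tokens per step when the draft is right.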

🧠 A 1B hierarchical reasoning model – Trained from scratch on structured public data, it drew attention as very competitive for its size. Two recurrent Transformer modules, one slow and one fast, loop over the same embeddings to give deep effective compute at a fixed parameter count. It ships as a pre-alignment checkpoint, not a chat model.

🐉 Qwen3.7 launches as agent frontier – Qwen published the release with a long write-up framing the family around agentic capability across reasoning, coding and tool use. On the Arena leaderboards a preview then ranked 13th overall in text and 16th in vision, lifting the lab to 6th in text and 5th in vision, with stronger showings in math, expert prompts and coding.

Programming

🖥 Cursor Composer 2.5 ships – An upgrade over Composer 2 that sustains long tasks better, follows complex instructions more reliably and feels nicer to work with. It is built on the same open Kimi K2.5 checkpoint with extra reinforcement learning, while a much larger from-scratch model trains on SpaceXAI compute.

🚀 Gemini CLI becomes Antigravity CLI – Google folded its terminal tool into a new Go-based CLI built for multi-agent work that orchestrates agents asynchronously in the background and shares the harness of the Antigravity desktop app. The old Gemini CLI and Code Assist IDE extensions stop serving consumer requests on June 18, 2026.

Community take w/ Quentin Dubois: “It is fast with a nice feel, but the knowledge cutoff trips it up on newer APIs, the harness feels weaker than Codex or Claude Code, and it does not import your existing settings.”

🔧 Swapping Claude Code for local stacks? – Members are experimenting with setups for replacing Claude Code or Codex with lighter or local options, from pairing Qwen or Nemotron with QwenCode for faster prompt processing, to running GLM through a custom harness, to home-grown Python CLIs. Most agreed local is increasingly viable, yet they still leaned on premium tokens.

Community take w/ Gabriel Olympie: “Opencode gets good feedback but takes forever to configure with too many plugins, it feels like modding Skyrim.”

Robotics, World AI

🌍 SANA-WM, minute-scale world model – An open 2.6B model turned a single image and a camera path into a full minute of controllable 720p video on one GPU, using hybrid linear attention to hold a coherent scene without the memory blowup that crashes standard attention.

Figure (two panels). Left: stage-wise 60s single-GPU latency ablation across VAE/DiT variants, with bars scaled for readability. Right: H100 latency and peak GPU memory versus video duration; recurrent variants grow compactly while the all-softmax variant runs out of memory at 60s.

🕹 Agora-1, a multi-agent world model – Odyssey introduced a world model where several participants, human or AI, share and act inside the same real-time simulation, shown off as a playable multiplayer deathmatch, and pitched shared-world simulation as a path toward gaming, education and robotics.

🔊 Starchild-1, a world you hear – Odyssey also showed a model billed as the first real-time multimodal world whose generated simulations you can hear as well as see, with joint audio and video, positioned as an early step toward a general-purpose world simulator.

Community take w/ Pierre Chapuis: “It looks close to Veo for joint audio and video, but the key difference is that it keeps responding to your input while it generates.”

🌐 HY World 2.0 goes open source – Tencent released full inference code and all weights for building interactive 3D worlds, pitching it as generating engine-ready environments you can actually use rather than just video clips. Observers were surprised the models were so small.

🎥 Runway bets on world models – The New York-based AI video startup is betting that video-based world models (not language) will drive next-gen AI, expanding from filmmaker tools into physics-aware models for robotics, science, and more while competing with giants like Google and OpenAI.

🚗 Kesai Labs, open self-driving stack – Kyutai and the ELLIS Institute launched a Franco-German non-profit building a fully open and efficient self-driving stack, with plans to extend into robotics, manufacturing and healthcare. Weights, data and code are all meant to be open.

🎞 Fine-tuning Cosmos for robot video – A guide walked through parameter-efficient fine-tuning of NVIDIA’s Cosmos Predict 2.5 world model with LoRA and DoRA adapters, so teams can generate synthetic robot manipulation videos without the cost of full fine-tuning, then feed those trajectories into downstream robot learning.
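
To make "parameter-efficient" concrete, here is a small back-of-the-envelope sketch of how many weights a LoRA adapter trains for one linear layer. The layer size and rank are hypothetical and not taken from the guide or from Cosmos Predict 2.5; DoRA adds a small learned magnitude vector on top, which is omitted here.

```python
def full_params(d_in: int, d_out: int) -> int:
    """Weights updated when fine-tuning the whole d_out x d_in matrix."""
    return d_in * d_out


def lora_params(d_in: int, d_out: int, rank: int) -> int:
    """Weights trained by a LoRA adapter: A is (rank x d_in), B is (d_out x rank)."""
    return rank * (d_in + d_out)


def trainable_fraction(d_in: int, d_out: int, rank: int) -> float:
    """Share of the layer's weights that the adapter actually trains."""
    return lora_params(d_in, d_out, rank) / full_params(d_in, d_out)
```

For a hypothetical 4096 x 4096 projection at rank 16, the adapter trains 131,072 weights against 16,777,216 in the full matrix, under 1 percent of the layer.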

TTY Lunch

Each week, TTY Lunch brings together exceptional builders around the table. Today’s lineup included Alexandre Pereira (2501.ai), Alvaro Lamarche Toloza (Mago), Charles Bochet (Twenty), Gregory Scafarto, Hugo Venturini (SkipLabs), Justin Halsall (rrweb), Sacha Morard (Edgee), and Samy Ouyahia (Mistral AI).

Some topics of the day were:

Evaluating and trusting coding agents, and what drives engineers to switch between them
Diversity and recruiting women in tech, the severe pipeline difficulty (even for TTY, obviously), and why it is critical to prioritize it early on
The real operational impact of a “second brain” and automated agents on company workflows
The inevitable death of the “per-seat” SaaS pricing model in the era of autonomous agents

New members

🇫🇷 Khalil Ouardini – Senior Research Scientist @ ex-Owkin · Owkin builds AI for drug discovery and pharma R&D, from target discovery to automating research. Spent four years in AI for biology, first researching at UC Berkeley then building products at Owkin, and recently left to explore startups automating complex pharma workflows with agentic systems, now looking for co-founders. Into boxing, jiu-jitsu, music production and reading. Special power: dissociative daydreaming. 📍 Paris, France

Contributors This Week

Félix Raimundo, Julien Seveno-Piltant, Quentin Dubois, Pierre Chapuis, Nancy Wang, Gabriel Olympie, Benjamin Trom, Robert Hommes, Louis Choquel, Raymond Rutjes, Maziyar Panahi, Enrico Piovano, Nikolay Tchakarov, Ivan Yamshchikov, Jeremie Kalfon, Kemal Toprak Uçar, Khalil Ouardini, Stan Girard, Arnaud Thiercelin, Hugo Serrat, Amine Saboni, Bryen Param, Charles Sonigo, Charly Poly, Ihab Bendidi, Lior Oren, Sacha Morard, Stéphane Collot, Tejas Chopra
