TTY-changelog #037
Claude Code source leaks spark forks and deep teardowns, bio AI hits the noise floor on cell predictions, Axios is hit by a supply chain attack, and three open models drop in three days.
👉 Article originally posted on TTY
Community Updates
⭐️ Headroom crosses 1k stars with v0.5.12 — The open-source context compression layer for LLM applications shipped faster inference and support for LangChain, LangGraph, Windows, and OpenClaw. Used by Netflix, Meta, and Vercel.
📈 Edgee shows Claude Code session analytics — A new shareable session report format for the Edgee AI Gateway showed 63% tool compression across 213 requests and 13.9M tokens! Works with Claude Code and Codex, with OpenCode and Cursor support coming soon.
📱 LFM2.5-350M runs on Android via Xybrid — Liquid AI’s 350M-parameter model ran at 40+ tokens per second on Android through the Xybrid runtime, crossing the threshold where on-device AI feels native.
Events
🇳🇱 Reliable Agents Club, April 15th in Amsterdam — For people who run AI agents in production and want to make them more reliable and debuggable. The event focuses on discussing agent reliability challenges, networking with peers, and introducing Moyai’s platform for monitoring and diagnosing failures in agentic workflows.
Audio
🔊 On-device TTS enters private beta — Gradium announced Phonon, a text-to-speech model that runs locally on smartphone CPUs with natural voices, multilingual support, and voice cloning, requiring no server, no network latency, and no per-call cost.
The private beta is now open for game developers and app builders. The model runs entirely on-device, eliminating cloud dependency and enabling offline voice interactions at scale for free-tier products.
A community discussion highlighted the gap between available high-quality TTS models and polished consumer apps: “Why wouldn’t Mistral, Gradium, or others just build a competing Speechify? It feels like the remaining work is only 10 percent once you already have the model and the API?”
For Laurent Mazare, “audio models are advancing very quickly right now. Several companies, including us at Gradium, focus on training these models but lack the cycles or expertise to build applications, so we sell them to startups that handle the end products.”
For Glenn Sonna, the “remaining 10%” framing undersells the problem. “Getting a TTS model to run well on someone’s actual phone is a whole different beast: quantization, memory, battery, latency. Cloud API plus per-token cost kills UX for something like a reading app.”
Autonomous Agents
👔 Pocket CTO delegates to cloud agents — The new product from Anicet Nougaret is live. Pocket CTO accepts voice memos on Telegram or instructions from the desktop, then orchestrates multiple cloud coding agents working in parallel on their own machines. It maintains a central always-on agent aware of your tools and codebase, surfaces demos, PDF reports, hosted builds, and executive summaries, and supports cron jobs, webhooks, secrets, and git hooks.
🧰 A massive agent harness for coding tools — A performance optimization system originating from an Anthropic hackathon winner, shipping 30+ agents, 150+ skills, autonomous loop patterns, continuous learning, and AgentShield security scans across 10+ language ecosystems. Built over 10+ months of real-world use with 30+ contributors, it supports Claude Code, Codex, Cursor, and OpenCode via a single install pipeline.
Biotech, Health, and Chemistry
🧬 Bio foundation models hit the noise floor — A journal entry analyzed a GenBio AI benchmark testing whether AI can predict how cells respond to interventions, drawing on TTY community discussions to propose four investment criteria for techbio: solve a specific biological problem deeply, iterate fast on data that actually predicts what happens in patients, pick mechanisms where physics constrains the outcome, and generate rich data only where the underlying biology is well-understood enough to learn from.
The best AI models approached but could not exceed the natural variability of the experiments themselves, meaning the ceiling is set by biology, not by compute or model size.
Models that encoded existing biological knowledge (which genes interact with which) consistently outperformed those trained on raw data alone, suggesting that in biology, domain expertise still beats brute-force scaling.
Newer models not in the benchmark are closing the gap: State (Arc Institute) improved prediction accuracy by up to 200% over baselines by training on 167M cells, TxPert (Valence Labs/Recursion) matched experimental reproducibility using biological knowledge graphs, and Xaira’s X-Cell claims to go further still at 4.9B parameters.
Community take w/ Ihab Bendidi: “GenBio’s setup does not actually perform perturbation prediction in the standard sense. It tests whether gene embeddings from unperturbed cells correlate with perturbation effects, which is an interesting but distinct question.”
Community take w/ Félix Raimundo: “Even with perfect predictions, all you achieve is predicting the result of experiments that already fail to translate to humans 91% of the time. Being able to predict a broken proxy faster does not fix the proxy.”
A complementary essay argued that AI will not accelerate drug development on its own, because the real bottlenecks are patient recruitment, years-long endpoint observation, and regulatory friction, not a lack of intelligence.
🔬 Five new cancer research teams launched — Global Cancer Grand Challenges funds five international teams (ATLAS, CAUSE, ILLUMINE, InteroCANCEption, REWIRE-CAN) up to $25M each to tackle cancer avoidance, mutational signatures, dark proteome, nervous system–cancer links, and therapeutic rewiring.
Image, Video & 3D
🎨 Unified image model generates and edits — Alibaba released Wan 2.7 Image, a single model handling generation, editing, and understanding in one architecture. It offers granular face control (bone structure, eyes, contour across ethnicities and ages), color palette extraction with up to 8 HEX codes, 3K-token text rendering in 12 languages including dense tables and formulas, interactive region-based editing via natural language, and multi-image generation.
Cyber
☠️ Axios npm package compromised — The popular JavaScript HTTP client with 100M+ weekly downloads was briefly hijacked through a compromised maintainer account, delivering a cross-platform remote access trojan via a hidden dependency during a two-hour window on March 31.
🎩 Black-hat LLMs automate real attacks — Nicholas Carlini, Research Scientist at Anthropic, argued at [un]prompted 2026 that LLMs have crossed a threshold for autonomous vulnerability discovery, with capabilities growing on an exponential curve. Placing a model in a VM with broad permissions and a short prompt was enough to reliably discover severe bugs in heavily audited software.
Demonstrated examples included a blind SQL injection in Ghost CMS that the model fully weaponized into an exploit exfiltrating admin credentials, and multiple remotely exploitable heap buffer overflows in the Linux NFSv4 daemon including a bug dating back to 2003.
Benchmark evidence showed that models from only a few months ago rarely succeeded on such tasks, while current generations repeatedly found kernel-level bugs, with task complexity and exploit value growing roughly exponentially.
The talk warned that automated bug discovery will soon far outstrip human triage and patching capacity, and that weak safety filters only hobble honest defenders while bad actors jailbreak around them.
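The blind SQL injection class demonstrated in the talk comes down to untrusted input being concatenated into a query string. A minimal, self-contained illustration (using Python’s stdlib sqlite3 and a toy table, not the actual Ghost CMS bug) shows why parameterized queries close the hole:

```python
import sqlite3

# Toy database standing in for any web app's user store.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, is_admin INTEGER)")
conn.execute("INSERT INTO users VALUES ('alice', 1), ('bob', 0)")

attacker_input = "' OR '1'='1"

# Vulnerable: input concatenated into SQL. The payload turns the
# WHERE clause into a tautology and leaks every row.
vulnerable = f"SELECT name FROM users WHERE name = '{attacker_input}'"
leaked = conn.execute(vulnerable).fetchall()
print(leaked)  # [('alice',), ('bob',)]

# Safe: a parameterized query treats the payload as a literal string.
safe = conn.execute(
    "SELECT name FROM users WHERE name = ?", (attacker_input,)
).fetchall()
print(safe)  # []
```

The asymmetry the talk describes is that an LLM can enumerate and weaponize variants of this pattern far faster than humans can triage them.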
Language Models
📈 Time series models meet trading — Google’s TimesFM 2.5, a 200M-parameter pretrained forecasting model with 16K context length and continuous quantile predictions, sparked a community discussion with insights from Enrico Piovano, Jeremie Bordier, and Hubert Ancelot about applying foundation time-series models to financial markets.
Early evaluations showed decent zero-shot performance on demand forecasting, though it fell short of fine-tuned SARIMAX models, with inference time a dealbreaker for high-frequency use. Chronos 2 from AWS, a universal forecaster handling univariate, multivariate, and covariate-informed tasks via in-context learning, was also tested for stock backtesting with limited results.
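The continuous quantile predictions these forecasters emit are typically scored with pinball (quantile) loss rather than MSE. A minimal sketch with made-up numbers, just to show the asymmetric penalty:

```python
def pinball_loss(y_true, y_pred, q):
    """Pinball (quantile) loss: minimized when y_pred is the
    q-th quantile of the target distribution, not its mean."""
    losses = []
    for yt, yp in zip(y_true, y_pred):
        diff = yt - yp
        # Under-forecasting costs q per unit, over-forecasting (1 - q).
        losses.append(q * diff if diff >= 0 else (q - 1) * diff)
    return sum(losses) / len(losses)

actuals = [100.0, 120.0, 90.0]
p90 = [130.0, 125.0, 110.0]  # hypothetical 90th-percentile forecasts

# At q=0.9 under-forecasting is penalized 9x more than over-forecasting,
# so a well-calibrated p90 forecast sits above most actuals.
print(round(pinball_loss(actuals, p90, 0.9), 2))  # 1.83
```

Averaging this loss across many quantile levels gives the CRPS-style metrics usually reported for probabilistic forecasters.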
Community members with quantitative finance experience noted that systematic trading firms still rely on legacy logistic regressions with smart feature selection, and the shift from high-frequency to mid-frequency strategies makes portfolio hedging more important than raw latency.
Robert Hommes drew an analogy with Nvidia PhysicsNeMo (transformer and neural operator architectures for physics simulation), suggesting similar approaches could model portfolio dynamics and forward-simulate hedging effects to maintain stable return-to-risk ratios.
Value in time series models for trading likely lies in assessing stationarity and cointegrated pairs in mid-frequency strategies rather than directional stock predictions.
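The stationarity angle can be made concrete. A crude pairs-trading check (a toy sketch with invented prices, not a substitute for a proper ADF or Engle-Granger test) fits a hedge ratio, builds the spread, and looks at whether the spread mean-reverts:

```python
def ols_slope(x, y):
    # Closed-form slope for y ≈ beta * x (no intercept, for brevity).
    return sum(a * b for a, b in zip(x, y)) / sum(a * a for a in x)

def lag1_autocorr(s):
    m = sum(s) / len(s)
    num = sum((s[i] - m) * (s[i - 1] - m) for i in range(1, len(s)))
    den = sum((v - m) ** 2 for v in s)
    return num / den

# Two hypothetical cointegrated series: B tracks 2x A plus noise.
a = [100, 101, 99, 102, 100, 103, 101, 104, 102, 105]
b = [2 * x + e for x, e in zip(a, [1, -1, 2, -2, 1, -1, 2, -2, 1, -1])]

beta = ols_slope(a, b)                      # estimated hedge ratio
spread = [y - beta * x for x, y in zip(a, b)]

# A strongly negative lag-1 autocorrelation of the spread suggests
# mean reversion -- the statistical edge a pairs trade exploits.
print(round(lag1_autocorr(spread), 2))
```

In practice one would use statsmodels’ ADF and cointegration tests on real mid-frequency data; the point is that the tradable signal lives in the spread’s dynamics, not in directional price forecasts.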
💎 Google releases Gemma 4 under Apache 2 — Google launched its most capable open model family in four sizes (E2B, E4B, 26B MoE, 31B Dense), purpose-built for advanced reasoning and agentic workflows.
The 31B model ranked third among open models on the Arena AI leaderboard, outcompeting models 20x its size.
Day-one support spans llama.cpp, MLX, vLLM, Ollama, LM Studio, Unsloth, transformers.js for browser inference, and fine-tuning via TRL including multimodal tool-use workflows.
👁️ Vision models answer without seeing — Multimodal models can often “fake” seeing by leveraging language patterns and benchmark quirks, so their impressive scores and detailed visual explanations can be an illusion. To trust them (especially in fields like medicine) we need evaluations and system designs that explicitly check whether their answers truly depend on the images, rather than just on text and clever guessing.
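One standard check for this is a blind baseline: re-run the benchmark with the images removed and see how much accuracy actually drops. A toy sketch with an invented dataset and a stub “model” that answers from question text alone (all names here are hypothetical):

```python
# A text-prior "model": ignores the image and guesses from the question.
def model(question, image):
    return "yes" if "is there" in question else "two"

# Tiny hypothetical VQA set: (question, image id, gold answer).
dataset = [
    ("is there a dog?", "img_dog", "yes"),
    ("is there a cat?", "img_cat", "yes"),
    ("how many wheels?", "img_bike", "two"),
]

def accuracy(blank_image):
    hits = 0
    for q, img, gold in dataset:
        pred = model(q, None if blank_image else img)
        hits += pred == gold
    return hits / len(dataset)

# If accuracy barely drops when the image is ablated, the score is
# driven by language priors and benchmark quirks, not by vision.
print(accuracy(blank_image=False), accuracy(blank_image=True))  # 1.0 1.0
```

A large gap between the two numbers is what genuine visual grounding should look like; here the gap is zero, the red flag the article describes.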
🐉 Qwen 3.6 Plus targets agentic workflows — Alibaba released its new flagship proprietary model combining hybrid linear attention with sparse MoE routing, a 1M-token context window, and up to 65K output tokens. Optimized for coding agents with support for code repair, complex terminal operations, and automated multi-step tasks, it also added enhanced multimodal reasoning for document understanding and visual coding workflows.
⚗️ Tiny 350M model punches above its weight — Liquid AI released LFM2.5-350M with 28T tokens of pre-training and large-scale reinforcement learning, outperforming models over twice its size on tool use, data extraction, and instruction following. The model runs on devices from Raspberry Pi to smartphones.
🌿 Eight billion params fit in 1.28 GB — Bonsai-8B compressed a Qwen3-8B model to 1-bit quantization, shrinking it to 1.28 GB while scoring 70.5 average across multiple benchmarks. On an M4 Pro it achieved 131 tokens per second for generation, and 44 tok/s on iPhone 17 Pro Max with 2-6x better energy efficiency than 4-bit baselines.
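The size math checks out: 8B weights at 1 bit each is 1 GB, and per-row scales plus higher-precision embeddings account for the rest of the 1.28 GB. The core trick can be sketched in a few lines (a simplified BitNet-style scheme, not Bonsai-8B’s actual recipe):

```python
def quantize_1bit(row):
    """1-bit quantization of one weight row: keep only the sign of
    each weight plus a single per-row scale (mean absolute value)."""
    scale = sum(abs(w) for w in row) / len(row)
    signs = [1 if w >= 0 else -1 for w in row]
    return signs, scale   # signs pack to 1 bit/weight; scale stays fp16

def dequantize(signs, scale):
    return [s * scale for s in signs]

row = [0.40, -0.10, 0.25, -0.35]
signs, scale = quantize_1bit(row)
print(signs, round(scale, 3))   # [1, -1, 1, -1] 0.275
print(dequantize(signs, scale))
```

Matrix multiplies against sign vectors reduce to additions and subtractions, which is where the reported 2-6x energy advantage over 4-bit baselines comes from.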
Programming
🔓 Claude Code source code leaked — The Claude Code source was exposed through a source map file accidentally left in Anthropic’s npm registry. Within days, independent researchers published detailed analyses of the codebase, an open-source fork added support for 200+ models, and an interactive architecture explorer mapped all 519K+ lines of code.
A reverse-engineering analysis shows that Anthropic’s internal build has stronger safeguards (post-edit verification, multi-agent orchestration, chunked reads, stricter prompts) that reduce hallucinations and context loss, but these are gated to employees. The author shares a “CLAUDE.md” fix to replicate them (available at the end of the tweet).
OpenClaude forked the CLI to support OpenAI, Gemini, DeepSeek, Ollama, and 200+ models via OpenAI-compatible APIs while preserving Claude Code workflows (bash, file tools, MCP, slash commands). It added agent routing that maps different agent types to specific models within one session for cost and performance optimization.
CC Unpacked mapped 1,900 files into an interactive explorer covering 50+ built-in tools, every slash command, and hidden unreleased features including Kairos (persistent mode with background actions), UltraPlan (long Opus planning runs), Coordinator Mode (multi-agent task decomposition in git worktrees), Bridge (remote control from phone/browser), and Daemon Mode (background sessions via tmux).
🐛 Claude Code cache bugs exposed — A detailed analysis revealed that the original v2.1.89 standalone binary had severe cache bugs dropping cache reads to 4-17%, causing nearly every token to be billed at full price.
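The billing impact is easy to quantify. A back-of-the-envelope cost model (the per-million-token prices below are hypothetical round numbers, chosen only because cached reads are typically priced around 10x cheaper than fresh input):

```python
# Hypothetical prices in $/M tokens, for illustration only.
INPUT_PRICE = 3.00
CACHE_READ_PRICE = 0.30   # cached input, assumed ~10x cheaper

def session_cost(total_input_m, cache_hit_rate):
    """Cost of a session given input volume (in millions of tokens)
    and the fraction of input tokens served from cache."""
    cached = total_input_m * cache_hit_rate
    fresh = total_input_m - cached
    return fresh * INPUT_PRICE + cached * CACHE_READ_PRICE

# 100M input tokens: healthy ~90% cache hits vs the buggy 4% floor.
print(round(session_cost(100, 0.90), 2))  # 57.0
print(round(session_cost(100, 0.04), 2))  # 289.2
```

Under these assumptions, dropping from 90% to 4% cache reads roughly quintuples the bill, which is why a silent cache regression is effectively a pricing bug.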
👁️ GLM-5V-Turbo reads design drafts — Z.ai released GLM-5V-Turbo, a model using a next-generation CogViT architecture with deeply fused text-vision training. It natively ingests images, videos, design drafts, and document layouts to generate complete runnable code, targeting UI-to-code workflows and GUI agents.
🖱️ Cursor 3 goes fully agentic — Cursor announced version 3 with a whole new interface window, outside of VS Code, that lets developers run many agents concurrently across local projects, worktrees, SSH remotes, and cloud machines.
💻 Mac app wraps Headroom locally — A community-built macOS menu bar app that wraps the Headroom CLI, providing persistent savings tracking and one-click setup without pip install. All processing runs locally with no data leaving the machine.
🐳 Docker sandbox guide for agents — A hands-on quickstart for running Claude autonomously inside Docker Sandboxes using the sbx CLI, built around a full-stack demo app (Next.js, FastAPI, Postgres). Covers direct mode (edits your working tree) and branch mode (isolated git worktrees for parallel agents), with secure secrets management via OS keychain, configurable network policies, and multi-workspace mounting for cross-repo scenarios.
TTY Lunch
Each week, TTY Lunch brings together exceptional builders around the table. Today’s lineup included Charles Borderie (Lettria), Julien Goupy (Applikai), Antony Marion (Noways), Frederic Legrand (H Company), Louis Choquel (Pipelex), Julien Millet, Thomas Payet (Meilisearch), Alexandre Pereira (2501.ai), Francois Massot and Paul Masurel (Quickwit & Datadog).
Token maximalism vs. refined intent
Vibe fatigue and specification collapse
Deterministic spatial design
Moats: from code to responsibility
New members
🇩🇪 Nancy Wang — CTO @ FluenTea · Language exchange with real humans and a touch of AI help. PhD combining computer vision and neuroscience, previously at IBM Research and Amazon. Born in China, raised in Canada, studied in the US, now in Berlin. Enjoys video games (currently Slay the Spire 2, all-time favorite XCOM). Special power: has never suffered a hangover. Berlin, Germany
🇫🇷 Hani Chalouati — CTO & Cofounder @ Guepard Inc. · Git-like serverless databases for AI agents. Software engineer born and studied in Tunisia, moved to France in 2014. Enjoys drone FPVs (especially DIY) and aviation. One exit from a previous company in 2024. Special power: makes very good homemade Neapolitan pizza. Île-de-France, France
Contributors This Week
Tejas Chopra (Netflix), Glenn Sonna (Xybrid), Robert Hommes (Moyai.ai), Ihab Bendidi (Recursion), Félix Raimundo (Tychobio), Koutheir Cherni KC (Guepard), Jocelyn Fournier (Softizy), Nancy Wang (FluenTea), Anicet Nougaret (Ariana.dev), Enrico Piovano (Goji), Sacha Morard (Edgee), Charly Poly (Browserbase), Quentin Dubois (OSS Ventures), Willy Braun (Galion.exe), Amine Saboni (Pruna.ai), Gabriel Olympie (2501.ai), Jérémie Bordier (XHR), Laurent Mazare (Kyutai), Hubert Ancelot (Verne), Jeremie Kalfon (Pasteur), Maziyar Panahi (OpenMed), Pierre Chapuis (Finegrain), Victoire Cachoux (Iktos), Hani Chalouati (Guepard), Karim Matrah (Contrast), Louis Choquel (Pipelex), Louis Manhes (Genario), Shuai Zhang (Jinko)




