TTY-changelog #033

GPT-5.4 unifies reasoning & agentic workflows; Yuan open-sources a 1T-param MoE; an AI bot autonomously attacked open-source CI/CD pipelines; Doctolib launches an AI research lab in France.

Mar 07, 2026

👉 Try Shipfox! GitHub Runners → 2x faster → 50% cheaper. Whether you’re using GitHub-hosted or self-hosted runners, one-line switch. ⚡️ Same promise: faster builds, lower cost.

Community Asks

💡 Call for practitioners running remote MCPs in production – Sebastien from Bump.sh is looking for practitioners with hands-on experience deploying remote MCPs in production, especially those who have navigated auth flows, multi-tenant access control, and observability tooling at real scale. (👉 Sebastien Charrier)

🇺🇸 ACM Conference seeks peer reviewers, San Jose (May 26-29) – ACM CAIS 2026, co-located with the AI Engineer’s Fair, will cover compound AI architectures, optimization, and agentic deployment. Tejas is recruiting community peer reviewers. (👉 Tejas Chopra)

Autonomous Agents

🔌 Notte launched Anything API to turn any website into a production endpoint – Anything API lets users describe a browser task and receive a deployable production API endpoint backed by agentic automation. The tool targets the majority of websites that lack public APIs.

🦞 OpenClaw use cases catalogued as community surfaces real-world limits – Community members explored practical OpenClaw applications and how to go beyond its current limitations. Shuai shared awesome-openclaw-usecases, a curated GitHub repo highlighting what actually works across real workflows.

Biotech, Health, and Chemistry

🧬 Former Recursion CEO joins PacBio board, bets on sequencing data – Chris Gibson argued that in the AI era, training data quality is the decisive competitive moat regardless of model sophistication, and joined PacBio’s board to help the company capitalize on its long-read sequencing advantage.

PacBio’s long-read technology produces higher-fidelity genomic reads than short-read alternatives, making it particularly valuable for AI training.
On the same topic, a Leash Bio analysis drew a structural parallel between current molecular AI and pre-ImageNet computer vision, arguing dataset constraints, not model architecture, are the primary bottleneck holding back small molecule drug prediction.

🏥 Doctolib opens an AI research lab in France – The team combines data scientists, ML researchers, doctoral candidates, and physicians from academia and industry, signaling ambitions beyond scheduling software into clinical AI at one of Europe’s largest health tech platforms.

The multidisciplinary composition, combining clinical expertise alongside AI research, reflects a bet on domain-specific models over general-purpose adaptation.
Community member Fajwel Fogel announced his participation in the founding team.

💊 Madrigal acquires Ribo’s siRNA liver pipeline for $4.4B – The deal extends Madrigal’s MASH franchise beyond Rezdiffra into RNA therapeutics, with community noting that preclinical siRNA assets are commanding some of the most aggressive valuations in current biotech deal flow.

⚖️ Vinay Prasad exits FDA amid ongoing controversy – The head of the FDA division responsible for vaccines and biologic drugs, and a close ally of Commissioner Marty Makary, is leaving after a turbulent period marked by clashes with biotech companies over gene therapies, rare disease drugs, and how strong the evidence for approval should be.

Image, Video and 3D

✨ Gemini 3.1 Flash-Lite targets cost-efficient inference at scale – Google DeepMind’s newest addition to the Gemini 3 series prioritizes price-per-token over raw capability. Community questioned why benchmarks compared it to Gemini 2.5 rather than other Gemini 3 variants.

🔥 Pruna.ai ships P-Video: 10s for 5s of 720p at $0.02/s – Applying the same compression techniques that power their model efficiency work, Pruna.ai claims the fastest and cheapest video generation API available: 720p at $0.02/s, 1080p at $0.04/s.
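
To put the quoted prices in per-clip terms, here is a small back-of-the-envelope sketch. It uses only the figures from the item above; the function name and the rounding to cents are illustrative assumptions, not part of Pruna.ai's API.

```python
# Prices per second of generated video, as quoted in the item above (USD).
PRICE_PER_SECOND = {"720p": 0.02, "1080p": 0.04}


def clip_cost(resolution: str, seconds: float) -> float:
    """Return the quoted cost in USD for a clip of the given length.

    Hypothetical helper for illustration; rounds to whole cents.
    """
    return round(PRICE_PER_SECOND[resolution] * seconds, 2)


# Headline figure: 10 s of wall-clock time to produce 5 s of 720p video,
# i.e. generation runs at twice the clip's playback duration.
generation_time_ratio = 10 / 5

cost_720p = clip_cost("720p", 5)
cost_1080p = clip_cost("1080p", 5)

print(f"5 s at 720p:  ${cost_720p:.2f}")
print(f"5 s at 1080p: ${cost_1080p:.2f}")
print(f"generation time / clip length: {generation_time_ratio:.1f}x")
```

At the quoted rates a 5-second clip costs about ten cents at 720p and twenty cents at 1080p.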

Cyber

🦠 AI bot ran autonomous supply chain attacks on open-source repos – An autonomous bot, hackerbot‑claw, used Claude Opus 4.5 plus a vulnerability‑pattern index to run a week‑long campaign against GitHub Actions workflows on major open‑source repos, achieving RCE in at least 4 of 7 targets and stealing a write‑scoped token. In follow‑up analysis, Claude Code running claude‑sonnet‑4‑6 refused the same prompt‑injection attempts when reviewing an ambient‑code/platform PR where the attacker had replaced CLAUDE.md with malicious, persistent project‑level instructions.

🔐 MCP servers are unaudited software, not config files – Matt Suiche warned that unlike passive context files, MCP servers execute code and expand attack surface in ways developers consistently underestimate, drawing parallels to early mobile app distribution before security practices matured. The financial sector has already encountered real-world MCP-related security incidents, providing early evidence of the emerging threat category.

Infrastructure

💥 A GitHub Pages migration ended with a dropped prod DB – While migrating to AWS, a developer relied on Claude Code, which ran a destructive Terraform command against the shared production account and wiped the course platform’s infrastructure and database. The hurried recovery added AWS Business Support and a new safety architecture, locking in a permanent cost increase of about 10%.

Language Models

🔥 GPT-5.4 unifies reasoning and agentic workflows in one model – OpenAI released GPT‑5.4 and GPT‑5.4 Pro, unifying its strongest reasoning, coding, and computer-use capabilities into a single model for ChatGPT, Codex, and the API. It adds native computer control, 1M-token experimental context in Codex, and more efficient tool search to run larger tool ecosystems with fewer tokens.

🚀 Alibaba releases Qwen 3.5 small series, 0.8B to 9B – Four new models built on the Qwen 3.5 foundation with native multimodal support and scaled RL training, spanning the full edge deployment spectrum. Community raised concerns about team stability at Alibaba, citing rumors that roughly half the Qwen team has left, which could make this one of their last major releases.

⚡ Yuan 3.0 Ultra: 1T-parameter open-source multimodal MoE – Yuan AI Lab open-sourced a flagship model with 1,010B total parameters but only 68.8B activated, making inference cost comparable to a dense 70B model while theoretically competing at frontier capability levels.
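
A rough sketch of the arithmetic behind the "comparable to a dense 70B" claim. The parameter counts come from the item above; the 2-FLOPs-per-active-parameter rule of thumb and the 2-bytes-per-parameter (bf16) storage figure are assumptions added for illustration, not numbers published by Yuan AI Lab.

```python
# Parameter counts quoted in the item above, in billions.
TOTAL_PARAMS_B = 1010.0
ACTIVE_PARAMS_B = 68.8

# Share of the model that participates in any single token's forward pass.
active_fraction = ACTIVE_PARAMS_B / TOTAL_PARAMS_B

# Assumption: forward-pass compute is roughly 2 FLOPs per active parameter
# per token, so per-token compute tracks the activated parameters only.
flops_per_token = 2 * ACTIVE_PARAMS_B * 1e9

# Assumption: weights stored at 2 bytes per parameter (bf16). Every expert
# still has to be held in memory, so storage tracks the total count.
weights_terabytes = TOTAL_PARAMS_B * 1e9 * 2 / 1e12

print(f"active share of parameters: {active_fraction:.1%}")
print(f"approx. forward FLOPs per token: {flops_per_token:.3g}")
print(f"approx. weight storage at bf16: {weights_terabytes:.2f} TB")
```

The point of the sketch: compute per token scales with the roughly 7% of parameters that are active, while memory footprint still scales with the full 1,010B, so "comparable to a dense 70B" applies to compute rather than to hardware requirements.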

📊 GenAI enterprise adoption gap diagram goes viral – Mehdi Medjaoui shared a widely circulated chart showing actual enterprise GenAI deployment lagging far behind theoretical adoption curves, sparking discussion about what’s really blocking production rollout.

MLOps

⚙️ Handling spiky serverless GPU workloads – Raidium asked for real-world experience handling zero-to-peak inference demand where on-demand H100s proved too expensive and too slow to cold-start, with 10-minute boot times making autoscaling impractical. A community member noted that Modal provisioning is not instantaneous under load: queue waits are real, and keeping a warm instance is still recommended for latency-sensitive paths. The same member mainly uses Modal for ephemeral batch jobs rather than deployed inference, and open-sourced multimodal to manage multi-job output. Ed Bonlieu from Koyeb joined the thread directly to offer a follow-up call.

🏎️ Speculative Speculative Decoding claims 2x over top engines – SSD, co-developed with Tri Dao and Avner May, stacks speculative decoding at multiple levels of the inference stack simultaneously, compounding gains that single-level approaches cannot achieve. Co-authorship with Tri Dao, the FlashAttention author, lends the work significant credibility in the inference optimization community.
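
For readers new to the underlying idea, here is a toy sketch of plain, single-level speculative decoding with greedy sampling. It is not the SSD method from the item above, and both "models" are made-up deterministic functions standing in for a small draft model and a large target model. The sketch only shows the draft-then-verify loop and why the output matches what the target model would have produced on its own.

```python
from typing import Callable, List

Token = int
NextToken = Callable[[List[Token]], Token]


def target_next(prefix: List[Token]) -> Token:
    """Toy stand-in for the large model's greedy next-token choice."""
    return (sum(prefix) * 31 + len(prefix)) % 50


def draft_next(prefix: List[Token]) -> Token:
    """Toy stand-in for the small model: wrong whenever len(prefix) % 4 == 0."""
    tok = target_next(prefix)
    return tok if len(prefix) % 4 else (tok + 1) % 50


def greedy_decode(next_token: NextToken, prompt: List[Token], n_new: int) -> List[Token]:
    """Baseline: one model call per generated token."""
    out = list(prompt)
    for _ in range(n_new):
        out.append(next_token(out))
    return out


def speculative_decode(target: NextToken, draft: NextToken,
                       prompt: List[Token], n_new: int, k: int = 4) -> List[Token]:
    """Draft proposes up to k tokens; target keeps the longest correct prefix."""
    out = list(prompt)
    goal = len(prompt) + n_new
    while len(out) < goal:
        # 1. The cheap draft model proposes a short continuation.
        proposal: List[Token] = []
        for _ in range(min(k, goal - len(out))):
            proposal.append(draft(out + proposal))
        # 2. The target checks every proposed position. A real engine does
        #    this in one batched forward pass; here it is a plain loop.
        for i, tok in enumerate(proposal):
            expected = target(out + proposal[:i])
            if tok != expected:
                # First mismatch: keep the agreed prefix plus the target's token.
                out.extend(proposal[:i])
                out.append(expected)
                break
        else:
            out.extend(proposal)  # every proposed token was accepted
    return out


prompt = [1, 2, 3]
baseline = greedy_decode(target_next, prompt, 12)
speculative = speculative_decode(target_next, draft_next, prompt, 12)
print(baseline == speculative)
```

The speed-up in practice comes from step 2 being a single parallel pass over the proposed tokens, so each accepted draft token saves one sequential call to the large model.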

Programming

☠️ Code review is dying as AI-generated code becomes the norm – A Latent.space essay argued human-written code effectively ended in 2025, and that traditional peer review adds diminishing value when the author is a model that self-corrects faster than any human reviewer. Tejas Chopra is building Intent, a TypeScript-like language for coding agents where agents both write and review code, with optional human oversight.

📦 Google open-sources a unified Workspace CLI with AI skills – A single command-line tool that provides a unified interface to Google Workspace APIs (Drive, Gmail, Calendar, Sheets, Docs, Chat, Admin, and more), dynamically building its command surface at runtime from the Google Discovery Service so it stays in sync with API changes, and shipping with built-in Agent Skills for LLM-driven automation and MCP-based agents.
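
A minimal sketch of what "building the command surface from a discovery document" can look like. The sample document below is a hand-written, heavily trimmed stand-in shaped like a Google API Discovery document (nested `resources` containing `methods`), and the resulting command names are illustrative assumptions, not the actual commands exposed by Google's CLI.

```python
from typing import Dict, Iterator, List, Tuple

# Hand-written, trimmed sample in the shape of a Discovery document.
# A real tool would fetch this at runtime instead of hardcoding it.
SAMPLE_DISCOVERY_DOC: Dict = {
    "name": "drive",
    "resources": {
        "files": {
            "methods": {
                "list": {"httpMethod": "GET", "path": "files"},
                "get": {"httpMethod": "GET", "path": "files/{fileId}"},
            }
        },
        "permissions": {
            "methods": {
                "list": {"httpMethod": "GET", "path": "files/{fileId}/permissions"},
            }
        },
    },
}


def iter_commands(doc: Dict) -> Iterator[Tuple[str, str, str]]:
    """Yield (command, http_method, path) for every method in the document."""

    def walk(resources: Dict, prefix: List[str]) -> Iterator[Tuple[str, str, str]]:
        for res_name, res in sorted(resources.items()):
            for method_name, method in sorted(res.get("methods", {}).items()):
                command = " ".join(prefix + [res_name, method_name])
                yield command, method["httpMethod"], method["path"]
            # Resources can nest, so recurse with the extended prefix.
            yield from walk(res.get("resources", {}), prefix + [res_name])

    yield from walk(doc.get("resources", {}), [doc["name"]])


commands = [command for command, _, _ in iter_commands(SAMPLE_DISCOVERY_DOC)]
for command, http_method, path in iter_commands(SAMPLE_DISCOVERY_DOC):
    print(f"{command:<24} {http_method} {path}")
```

Because the command list is derived from the document rather than written by hand, a new API method shows up as a new command without a CLI release, which is the "stays in sync" property described above.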

🔧 Essay makes the case for CLI over MCP in agent tooling – EJ Holmes argued that LLMs can invoke CLIs natively, making a dedicated protocol layer architecturally unnecessary and adding attack surface without commensurate benefit.
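
As a rough illustration of the "CLI as the tool interface" idea, the sketch below wraps a subprocess call so an agent can treat any command as a tool and read back its exit code and output. The `run_cli` helper and its return shape are hypothetical, and the example command simply invokes the local Python interpreter so the snippet runs anywhere.

```python
import subprocess
import sys
from typing import Dict, List, Union


def run_cli(argv: List[str], timeout_s: float = 10.0) -> Dict[str, Union[int, str]]:
    """Run a command without a shell and return what an agent would observe.

    Hypothetical helper: passing argv as a list avoids shell interpolation
    of model-generated strings.
    """
    completed = subprocess.run(
        argv,
        capture_output=True,
        text=True,
        timeout=timeout_s,
        check=False,  # report failures to the agent instead of raising
    )
    return {
        "exit_code": completed.returncode,
        "stdout": completed.stdout,
        "stderr": completed.stderr,
    }


result = run_cli([sys.executable, "-c", "print('hello from a CLI')"])
print(result["exit_code"], result["stdout"].strip())
```

Whether this replaces a protocol layer is exactly what the essay debates; the sketch only shows how small the calling side can be.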

🧠 Headroom learns from Claude Code failures across sessions – Tejas Chopra analyzed 59,470 tool calls from 1,960 real sessions and found 6,112 failures and ~1M wasted tokens. The new learn command extracts the delta between each failure and its successful correction, writing the results to CLAUDE.md and MEMORY.md. Top failure categories: wrong Python binary (627), permission denied retried without a fix (525), and file not found at guessed paths (329), all preventable with persisted context.
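
The ratios implied by those counts are easy to derive. The sketch below uses only the numbers quoted in the item above; the variable names are illustrative, and the percentages are computed here rather than taken from the original analysis.

```python
# Counts quoted in the item above.
TOOL_CALLS = 59_470
SESSIONS = 1_960
FAILURES = 6_112
TOP_FAILURES = {
    "wrong Python binary": 627,
    "permission denied retried without fix": 525,
    "file not found at guessed paths": 329,
}

failure_rate = FAILURES / TOOL_CALLS          # share of tool calls that failed
calls_per_session = TOOL_CALLS / SESSIONS     # average tool calls per session
top3_total = sum(TOP_FAILURES.values())       # failures in the three named categories
top3_share = top3_total / FAILURES            # their share of all failures

print(f"failure rate: {failure_rate:.1%}")
print(f"tool calls per session: {calls_per_session:.1f}")
print(f"top three categories: {top3_total} failures ({top3_share:.1%} of all failures)")
```

Roughly one tool call in ten failed, and the three named categories alone cover about a quarter of those failures.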

🧠 Two new MCPs bring shared and persistent memory to agents – Christophe Lesur from Cloud Temple released live-memory for real-time working memory across collaborative agents, and graph-memory for long-term knowledge graph storage, together addressing the stateless nature of current agentic frameworks.

🗜️ Edgee’s Claude Code Compressor strips tokens before they hit the API – An edge model intercepts Claude Code requests and removes redundant tokens upstream, reducing cost and context pressure without workflow changes. 👉 Free during beta.

🔍 Amplifying.ai maps how Claude Code makes tool decisions – A systematic study examined reasoning patterns, consistency, and potential biases in Claude Code’s agentic choices, from product recommendations to developer tool selection, using real developer interaction data.

Robotics, World AI

🤖 OpenClaw gains spatial reasoning and runs on Unitree G1 – The open-source agent framework added physical space and temporal understanding via lidar, stereo, and RGB camera integration, with a demo showing it deployed on a humanoid robot alongside drone and quadruped support.

The Lunch

This week’s discussions explored how the rise of coding agents could reshape developer tooling, workflows, and infrastructure. Key themes included the shift from UX to Agent Experience (AX), CLI vs MCP interfaces, languages suited for agent-driven development, and the growing infrastructure moat around frontier models.


With Christian Wallenwein (Mistral AI), Benjamin Trom, Ellie Huxtable (Atuin), Max Corbani (>commit), Anastasia Stasenko (Pleias), Koutheir Cherni (Guepard), Jules Pondard, Sohrab Hosseini (orq.ai), Fred Bardolle (Scaleway), and Shuai Zhang (Jinko)

New Member

🇫🇷 Denis Brulé – CEO @ Finegrain · On-device photo editing platform that removes cloud dependency from mobile image processing, built on the premise that a €1k+ device shouldn’t need cloud permission to edit a photo. Founded Moodstocks in 2008 to let smartphones search the world with their cameras; the company was acquired by Google and the technology launched as Google Lens in 2018. Special power: playing the only piano piece he knows on repeat without ever tiring of it, much to the despair of his relatives and neighbors. 📍 Paris, France

Contributors This Week

Felix Raimundo, Christophe Lesur, Kemal Toprak Uçar, Denis Brulé, Gabriel Olympie, Tejas Chopra, Gilles Chehade, Pierre Manceron, Pierre Chapuis, Sacha Morard, Bertrand Charpentier, Mehdi Medjaoui, Robert Hommes, Ihab Bendidi, Charly Poly, Jeremie Kalfon, Dario Di Carlo, Sebastien Charrier, Matt Suiche, Vlad de Turckheim, Fajwel Fogel, Andrea Pinto, Lucas DiCiocci, Gabriel Duciel, Victoire Cachoux, Quentin Dubois, Noah Hollman, Daniel Huynh, George Peter Hantzaras, Chris Wallenwein, Julien Duquesne, Shuai Zhang, Ed Bonlieu, Amine Saboni, Benoit Kohler, Constant Razel, Manfred Touron

👉 Originally posted on TTY
