TTY-changelog #032

Qwen3.5 dropped, Claude Code auto-memory drew mixed reviews. H-Neurons linked to LLM hallucinations. Anthropic launched Claude Code Security, and Taalas demoed custom silicon inference.

Feb 28, 2026

👉 Article originally posted on TTY

Autonomous Agents

💡 Dust reframed agent evaluation as ongoing maintenance practice – Dust published their approach to agent observability, arguing agents behave differently over time even without code changes. Their system uses version markers and production metrics to detect when shifting external conditions alter agent behavior.

The framework treats evaluation as maintenance rather than one-time optimization, reflecting real-world agent drift from changing tool outputs and data sources
Version markers combined with production metrics create a feedback loop for deciding whether to iterate or revert after behavioral shifts
The approach fills a gap where traditional testing misses failures caused by changing external contexts rather than code regressions

📊 Unified framework for agent and tool adaptation in agentic AI – A meta paper examined how foundation model-based agents can be adapted to plan, reason, and interact with external tools. The survey positioned adaptation as the central mechanism for improving performance and reliability in agentic systems. (quick read with Alphaxiv)

⚠️ OpenClaw agent deleted user’s inbox despite safety instructions – Summer Yue from Meta reported on X their OpenClaw agent speedran deleting their inbox despite explicit “confirm before acting” instructions. The incident required physically running to a Mac mini to stop the process, highlighting ongoing agent autonomy risks.

🌐 New web standards emerged for making websites agent-readable – Google shipped WebMCP in Chrome early preview, enabling websites to expose capabilities to agents natively. Separately, Capxel launched LLM-LD, an open standard for making website content readable by AI systems and retrieval pipelines. WebMCP article

WebMCP turns any website into a potential MCP server for agent interaction without site-specific adapters
LLM-LD provides structured linked data specifically designed for LLM consumption as an open standard
Both standards signal convergence toward an agent-native web, though competing approaches may fragment early adoption

Image, Video & 3D

🎨 Google released Nano Banana 2 image generation model – Google launched Nano Banana 2, combining Pro-level image generation capabilities with Flash-speed inference. The model offers advanced world knowledge, production-ready specs, and subject consistency improvements.

The model bridges the quality-speed tradeoff by delivering Pro capabilities at Flash inference latency
Subject consistency improvements address a key limitation that plagued previous fast generation models
Production-ready specifications suggest direct deployment readiness without additional optimization work

A silly and joyful image showing 14 consistent characters and items having fun together in a farm setting

Cyber

🛡️ Anthropic launched Claude Code Security for vulnerability detection – Anthropic released Claude Code Security to make frontier cybersecurity capabilities available to defenders for detecting and remediating code vulnerabilities. Enrico Piovano from the community noted significant overlap with Depthfirst, an existing AI-native security platform.

Depthfirst was identified as prior art, including its detection of a Netty zero-day enabling SMTP command injection bypassing SPF/DKIM/DMARC
The launch represents Anthropic’s push toward raising the security baseline across the industry through AI-assisted code review

🤖 DJI Romo robovac exposed to remote camera access via MQTT – A security researcher demonstrated remote access to DJI’s Romo robotic vacuum through an MQTT protocol vulnerability, gaining full control of the device’s camera. DJI reportedly addressed the issue after disclosure.

A map like the one I saw, with robots and packets trickling in.

Language Models

🔄 Sakana AI introduced Doc-to-LoRA for instant LLM updates – Sakana AI released Doc-to-LoRA and Text-to-LoRA, methods for instantly updating LLM knowledge by converting documents directly into LoRA adapters without traditional fine-tuning data curation pipelines.

The approach eliminates curated training datasets by converting raw documents directly into adapter weights
Near-real-time knowledge updates address the staleness problem in deployed LLMs without full retraining
The technique builds on Sakana’s track record of creative approaches to efficient model adaptation and evolutionary optimization

🌍 TranslateGemma runs 100+ language translations entirely in-browser – A 4B parameter translation model launched on Hugging Face, supporting state-of-the-art quality across 100+ languages while running entirely client-side via WebGPU. No server infrastructure or data transmission required.

📦 Alibaba released Qwen3.5 model family – Alibaba published the Qwen3.5 collection on Hugging Face.

Model sizes were described as a perfect fit for RTX 6000 VRAM constraints by Gabriel Olympie, enabling accessible local inference
The release continues Alibaba’s rapid cadence of competitive open-weight model iterations

🧪 GLM-5 positioned shift from vibe coding to agentic engineering – The GLM-5 paper introduced a foundation model using DSA to reduce training and inference costs while maintaining long-context fidelity. The model is already freely available on providers like Modal and OpenCode Zen.

Free availability on multiple providers makes it a practical alternative for developers seeking cost-free coding assistants
The “agentic engineering” versus “vibe coding” framing reflects maturing expectations around AI-assisted development
Community members reported using GLM-5 as their primary model for non-critical tasks, reserving paid credits for harder problems
💡 Pierre Chapuis: “The paper is interesting, and the model is as well. Especially since it’s free on several providers like Modal and OpenCode Zen. I use it with Pi for my free personal setup and keep Amp credits for the hardest tasks.”

🧠 Tsinghua researchers identified hallucination-causing H-Neurons in LLMs – Researchers found that fewer than 0.1% of neurons in LLMs reliably predict and causally influence hallucinations. These H-Neurons encode “over-compliance” — a tendency to provide answers even when incorrect — and emerge during pre-training.

🚨 Anthropic disclosed industrial-scale distillation attacks on Claude models – Anthropic identified that DeepSeek, Moonshot AI, and MiniMax ran systematic distillation campaigns using over 24,000 fraudulent accounts and 16 million exchanges to extract Claude’s capabilities for training their own models.

MLOps

📡 Xybrid launched hybrid on-device AI routing platform – Xybrid, built by new community member Glenn Sonna, provides intelligent routing between on-device and cloud ML inference. The platform promises zero infrastructure overhead with privacy-first, cost-optimized model execution across any platform.

Programming

🚀 Inception Labs launched Mercury 2 fastest reasoning language model – Inception Labs introduced Mercury 2, positioning it as the world’s fastest reasoning language model built to make production AI feel instant. The model targets latency-sensitive applications requiring real-time reasoning capabilities.

💾 Claude Code auto-memory shipped – The feature allows Claude Code to remember project context, debugging patterns, and preferred approaches across sessions. Early feedback was mixed, with reports of the feature referencing outdated code and breaking builds.

The feature raised broader discussion about whether persistent context helps or harms rapidly-evolving codebases
💡 Gabriel Olympie: “It’s a nightmare. It doesn’t always update properly, and I’ve had cases where Claude started developing based on code that no longer existed, breaking everything.”

📊 Article argued data science judgment becomes critical as code costs drop – A blog post made the case that generating code is easier than ever, but more code does not mean better products. As generation costs approach zero, human judgment and taste become the differentiating skills.

✏️ Pencil brought visual canvas design directly into code editors – Pencil launched a tool enabling engineers to design on canvas and land directly in code, integrating visual design workflows into preferred IDEs to increase engineering velocity. The tool competes with Figma-to-code pipelines by removing the intermediate design tool entirely

☁️ Release of Cursor Cloud agents – Cursor now gives its AI agents their own cloud computers so they can run your app, test changes end‑to‑end, and send back short video demos instead of just code diffs. This lets you quickly see whether a new feature actually works in the UI before ever reading or reviewing the underlying code changes.

New Members

🇬🇧 Glenn Sonna – Co-founder @ Xybrid · All-in-one toolkit to run AI locally with intelligent cloud routing. SWE from Paris with 10+ years building SaaS and consumer apps, splitting time between London, Paris, and Tokyo. Special power: went from beginner to N2 Japanese in one year. 📍 London, UK

🇪🇸 Leonard Scheidemantel – CEO @ Sequa AI · Antler-backed context engine that helps coding agents understand your product, replacing markdown docs. Claims 2.2x improvement on Claude Code for complex tasks. Hobbies include drums and tennis. Special power: can find friends wherever he goes. 📍 Barcelona, Spain

🇫🇷 Aram Adamyan – Masters in AI & Digital Health @ ESPCI Paris-PSL · Working in neuroscience and bioinformatics at Institut Imagine. From Armenia, 5+ years data science across education, military, biotech, and fintech. Hobbies include basketball and cooking. Special power: operates without coffee. 📍 Paris, France

Top contributors this week

Top contributors this week: Pierre Chapuis, Quentin Dubois, Enrico Piovano, Gabriel Olympie, Robert Hommes, Aram Adamyan, Leonard Scheidemantel, Glenn Sonna, Christophe Lesur, Amine Saboni, Kemal Toprak Uçar, Alex Esser, Shuai Zhang, Vianney Lecroart, Emmanuel Benazera, Jules Belveze, Guillaume de Luca

TTY Weekly

Discussion about this post

Ready for more?