TTY-changelog #038

Claude Mythos Preview is too dangerous to release publicly, GLM-5.1 makes a leap for open models, Meta's Muse Spark benchmarks ignite a vibes debate, and Stanford dissects the compute race.

Apr 11, 2026

Community Asks

📰 Newsletter management tools – Glenn Sonna asked the community what tools they use to manage the flood of AI newsletters and filter for signal. Suggestions ranged from OpenClaw-based cron jobs to Tasklet, a YC-backed managed agent platform for automating business processes with plain English.

🔬 Amazon Research Awards feedback – Hani Chalouati is seeking feedback on the Amazon Research Awards Spring 2026 call for proposals.

Community Updates

🚀 Ascii.dev adds voice and in-game overlay – The pocket CTO agent now supports fully hands-free operation through walkie-talkie-style voice commands and an in-game overlay, letting developers orchestrate coding agents while doing other things.

Autonomous Agents

☠️ Claude subscriptions drop third-party coverage – Anthropic announced that Claude subscriptions will no longer cover usage on third-party tools like OpenClaw. Users can still authenticate via extra usage bundles (now discounted) or API keys.

🦞 Aquarium CE, self-hosted agent management – Shuai Zhang (Jinko) released an open-source platform for deploying and managing AI agent instances locally with a single command. It supports 27+ AI providers, 14 messaging channels (WhatsApp, Telegram, Discord, Slack), MCP tools, and encrypted credential storage, all running on local SQLite with zero telemetry.

🎛️ MCP vs Skills, a practitioner’s take – A detailed post argued that MCP remains the superior architectural pattern for giving LLMs access to services. Remote MCP servers offer zero-install usage, seamless updates, saner OAuth-based auth, and true portability across devices, while Skills work best as pure knowledge layers.

🔍 Governed agent autonomy patterns – After analyzing the Claude Code source leak, Nnenna Ndukwe (Qodo) extracted a docs-first reference for building agentic coding systems with governance baked in. The framework covers five gates: plan, permission, tool trust, verification, and runtime accountability.

Biotech, Health, and Chemistry

🧬 AI-native biotech meets radiopharma – TTY published an interview with Ashley van Heteren of Radiogenesis, a startup using generative models and physics-informed AI to design targeted radioactive drugs for metastatic cancer. The company uses foundation models to design peptide candidates in silico before lab validation, compressing early discovery from years to months.

Image, Video & 3D

🩻 Raidium confirms the VLM mirage effect – Following last week’s mention of the Stanford “mirage” paper, Pierre Manceron (Raidium) shared that his team had independently observed the same phenomenon and published their own study. Their paper explored how prior medical knowledge and statistical reasoning allow models to bypass visual input entirely: “In practice many medical diagnoses can be inferred just from the question and good medical knowledge, statistically speaking. For example you know that a patient is a smoker, and ask ‘is there a lesion on this CT scan’, even if you don’t see the image itself, the probable answer is still lung cancer.”

Cyber

🏴‍☠️ Claude Mythos Preview stays behind closed doors – Anthropic released a 244-page system card for Claude Mythos Preview, its most capable model to date, alongside Project Glasswing, a program restricting access to 12 major tech companies for defensive cybersecurity only. The model achieved 93.9% on SWE-bench Verified and 100% on Cybench, and autonomously discovered thousands of zero-day vulnerabilities in every major OS and browser.

The system card documented rare but concerning behaviors: sandbox escapes, posting exploits to public websites, editing git history to hide unauthorized changes, and attempting prompt injection on evaluation graders.
Anthropic described the model as simultaneously its best-aligned and highest-risk release, noting that greater capability lets it reach more dangerous situations even when it is well-intentioned.

🔒 Astral shares CI/CD security practices – The team behind Ruff and uv published a detailed breakdown of how they secure their open-source tools against supply chain attacks. The post covers banning dangerous GitHub Actions triggers across the organization, requiring commit-pinned actions with impostor-commit detection, using Trusted Publishing for registry releases, and employing Sigstore attestations for binaries.
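
To make the commit-pinning idea concrete, here is a minimal sketch of a check that flags workflow steps not pinned to a full commit SHA. It is an illustration only, not Astral's tooling: the function name unpinned_actions and the line-based regex scan are assumptions, and it does not attempt impostor-commit detection, which requires confirming that the SHA actually belongs to the referenced repository.

```python
import re

# Matches a `uses:` step reference such as `uses: actions/checkout@v4`.
USES_RE = re.compile(r"^\s*-?\s*uses:\s*([^\s#]+)")
# A full 40-character hexadecimal commit SHA.
SHA_RE = re.compile(r"[0-9a-f]{40}")


def unpinned_actions(workflow_text):
    """Return `uses:` references that are not pinned to a full commit SHA.

    Hypothetical helper for illustration: it scans the workflow line by line
    instead of parsing YAML, so unusual formatting may be missed.
    """
    flagged = []
    for line in workflow_text.splitlines():
        match = USES_RE.match(line)
        if not match:
            continue
        ref = match.group(1).strip("'\"")
        # Local actions (./path) and docker:// images carry no git ref to pin.
        if ref.startswith("./") or ref.startswith("docker://"):
            continue
        # Everything after "@" is the ref; a tag or branch name is mutable.
        _, _, version = ref.partition("@")
        if not SHA_RE.fullmatch(version):
            flagged.append(ref)
    return flagged
```

Run against the text of a workflow file, the helper returns the references that still use tags, branches, or no ref at all, which is the kind of list a CI policy job could fail on.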

Language Models

🔥 GLM-5.1 tops SWE-Bench Pro – Zhipu AI released GLM-5.1, a 754B-parameter open-weight model under MIT license that claimed the top spot on SWE-Bench Pro (58.4) and led on MCP-Atlas (71.8). The model sustained autonomous execution for up to 8 hours across hundreds of iterations and thousands of tool calls, unlike predecessors that plateau early.

🧠 Meta’s benchmark layout draws scrutiny – Community members debated Meta’s new model, Muse Spark, and its potentially misleading benchmark presentation. The broader discussion questioned whether benchmarks still matter when real-world impressions consistently diverge from reported scores.

🔩 Karpathy’s LLM knowledge base in practice – Jeremie Kalfon shared extensive experience using Karpathy’s LLM KB concept to replace OpenClaw’s simple markdown memory with a rich Notion/Obsidian database: “The powerful idea is to replace OpenClaw’s simple memory markdown as a large Notion/Obsidian database with many enriched markdown files and an index. I can always search and interact with them on my Notion and this is nicer than through my terminal/IDE.” The approach worked well for personal knowledge bases built from messaging apps and social media exports, but model laziness remained the biggest unsolved friction: “They tend to be lazy colleagues/interns, they do not push the task to their limits, nor do they think about the ramifications of what has to happen next when the task is finished.” Running mostly on Sonnet 4.6, token costs reached ~€100/day.

Programming

🥷 Claude-sock wraps Claude Code for subscription use – A CLI tool that wraps the Claude Code TUI to let tools like OpenClaw use Claude models with a subscription only, without API keys or extra tokens. It spawns a fresh TUI process per call (with ~5s startup overhead) and translates output to the claude -p protocol format.

🐳 Docker Sandboxes quickstart guide – A hands-on guide covering the Docker Sandboxes (sbx) experimental launch, including parallel agent setup, branch mode, networking, Docker Compose inside sandboxes, port forwarding, and network policy management. The guide uses DevBoard, a full-stack Next.js + FastAPI issue tracker with intentional bugs as exercises.

Community take w/ Amine Saboni: “I’m fostering the intuition that Docker is a really appropriate abstraction for developers to work with AI. Their not-so-recent focus on agent isolation is really interesting to track.”

🐆 Ardent AI vs Guepard on database cloning – When YC-backed Ardent AI launched with a pitch for instant Postgres clones for coding agents, the community compared it with Guepard, an open-source alternative already available. Key differences: Guepard is self-hostable, supports all databases (not just Postgres), and branches entire data environments rather than using production copies.

👨‍💻 Pi’s creator joins Earendil – Mario Zechner, creator of pi (the agent powering OpenClaw), announced he sold the project to Earendil, co-founded by Armin Ronacher and Colin Daymond Hanna. The pi core stays MIT-licensed forever, with commercial additions planned under Fair Source (converting to open-source via delayed publication). Zechner cited alignment on OSS values and wanting to avoid the solo-founder VC path after his experience with RoboVM’s closed-source fate at Microsoft.

TTY Lunch

Each week, TTY Lunch brings together exceptional builders around the table. Today’s lineup included Bertrand Guiheneuf (Fairjungle), Denis Brule (Finegrain), Francis Bouvier (Lightpanda), Karim Matrah (Contrast), Olivier Desclaux (Harmattan), Pierina Camarena (42 AI), Quentin Soulet (Dune Health), and Sacha Morard (Edgee).

Fast code alone doesn’t build product value
Non-engineers shipping code without system awareness
Can AI learn taste
Facing the new GAFAM
Classical models still power critical systems

👉 The full discussion is here

New members

🇪🇬 Aly Moursy – Founder & CEO @ The Artificial Intelligence Company of Cairo · Applied AI lab building AI products for the MENA region. Born in Cairo, raised in Dubai/Canada, studied engineering in Canada. 2x startup founder. Special power: making funny tech memes that go viral. 📍Cairo, Egypt

🇳🇱 Joel Milligan – Founding AI Engineer @ neno · Based in Amsterdam for 4 years, 10 years of software engineering experience, ex-founder (TechStars Berlin). Currently building Viche, an agent-to-agent communication layer and discovery protocol. Special power: navigation, has never ever been lost. 📍Amsterdam, Netherlands

Contributors This Week

Aly Moursy (AIC of Cairo), Glenn Sonna (Xybrid), Koutheir Cherni KC (Guepard), Enrico Piovano (Goji), Hani Chalouati (Guepard Inc.), Nnenna Ndukwe (Qodo AI), Pierre Manceron (Raidium), Robert Hommes (Moyai.ai), Anicet Nougaret (Ariana.dev), Pierre Chapuis (Finegrain), Amine Saboni (Pruna.ai), Félix Raimundo (Tychobio), Hugo Venturini (SkipLabs), Jeremie Kalfon (Pasteur), Quentin Dubois (OSS Ventures), Shuai Zhang (Jinko), Youssef Tharwat (Noodlbox), Daniel Huynh (Conception), Gabriel Olympie (2501.ai), Ihab Bendidi (Recursion), Joel Milligan, Laurent Mazare (Gradium), Nancy Wang (FluenTea), Stéphane Collot (Sequense), Tejas Chopra (Netflix)

TTY Weekly
