Your Dose of Reg.exe, Week 17

Autonomous agents evolve with subagent architectures and security integration, while open-source alternatives challenge proprietary dominance.

Kevin Kuipers

Nov 01, 2025

We <3 Sota Blog

🌅 Becoming Standard: WebP’s Late Victory and AVIF’s Slow Takeover - How technically superior formats struggle with adoption despite solving real problems. A story of ecosystem politics trumping technical merit.
♟️ Deployment Metabolism: Mapping Constraints to Strategy - How constraints determine viable deployment strategies.

Autonomous Agents

🤖 Anthropic’s Subagent - Daniel Huynh shared “direct tool use in ChatGPT or Claude is doomed. subagents are the future. (…) With proper subagents, it should become possible for my ChatGPT or Claude (both generalists) to provide just the right context to a subagent (e.g., an invoice manager) to perform the required tasks.”

A statement to which Nikolay Rodionov (Alpic) added nuance: “I think this will be a mix of both. Having a custom context for each task isn’t very portable.” (👉 Daniel Huynh @ Conception.dev, Nikolay Rodionov @ Alpic)

🤖 Toolathlon benchmark launched - New benchmark to evaluate LLM agents on long-horizon tasks across diverse, realistic scenarios. Claude 4.5 Sonnet achieved highest pass@1 score of 38.6%, ahead of GPT-5, though success rates remain relatively low even for top performers. (🙏 Kevin Kuipers, Pierre Chapuis)
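
For readers unfamiliar with the metric, pass@1 is the fraction of tasks solved on the first attempt. Below is a minimal Python sketch of the widely used unbiased pass@k estimator. The function name and the sample counts are illustrative assumptions, and nothing here reflects how Toolathlon itself computes its scores.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Estimate pass@k for one task from n sampled attempts, c of which succeeded."""
    if n - c < k:
        # Fewer than k failures exist, so any draw of k attempts contains a success.
        return 1.0
    # Probability that a random draw of k attempts contains at least one success.
    return 1.0 - comb(n - c, k) / comb(n, k)


# Hypothetical task: 10 attempts, 3 of them successful.
score = pass_at_k(n=10, c=3, k=1)
```

With k = 1 the estimator reduces to c / n, so a benchmark-level pass@1 is simply the mean of that ratio across tasks.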

⚖️ Agent and evals - Two weeks ago, we shared Andrew Ng’s post criticizing the lack of proper agentic testing. Tejas Chopra and Robert Hommes are actively exploring that direction:
- Tejas has built Agent Reliability Scorer, which analyzes multi-agent development systems on 17 parameters using OpenTelemetry logs. The framework-agnostic system provides scores and recommendations for improvements like retries and sub-agent breakdown. He is seeking collaborators for testing: reach out if interested.
- Robert (Moyai.ai) is also implementing a bunch of demo agents in the most popular frameworks to get a feeling of what works and what doesn’t.
🎛️ OpenAI Apps SDK-compatible MCP - Alpic released a minimalist SDK-compatible MCP ChatGPT app template for rapid prototyping, featuring a Vite dev server with HMR, widget-to-endpoint mapping via file conventions, one-click deploy on Alpic, and built on the official Model Context Protocol SDK. No platform lock-in. (👉 Nikolay Rodionov @ Alpic)

Also:

🎛️ Claude Skills in Codex CLI: Make Claude Skills work in other agents like Codex by adding the missing piece: a small enumerator script.

Biotech, Health, and Chemistry

🧬 OpenFold3 release (GitHub) - OpenFold Consortium released preview of OpenFold3, an open-source foundation model for structure prediction of proteins, nucleic acids, and drugs. Fully accessible alternative to proprietary AlphaFold3 implementation.
🩻 Foundation models in pathology critique - Analysis revealed why foundation models fail in clinical pathology deployment, identifying category errors in optimization focus. Models optimized for parameters and benchmark accuracy instead of clinical utility and real-world robustness. Recommendation to abandon universal foundation model fantasy and integrate them intelligently into domain-specific systems.
🧪 Moderna’s challenges - Detailed analysis of Moderna’s recent struggles, highlighting poor management with 50 simultaneous programs and transition from COVID-era success to current operational difficulties. Discussion noted similarities to biotech’s “disillusion step” in hype cycle. (👉 Felix Raimundo @ Tychobio.ai, Jeremie Kalfon @ ENS/Pasteur)

📚 Healthcare AI adoption report - Menlo Ventures released comprehensive State of AI in Healthcare report showing healthcare setting the pace for enterprise AI adoption despite historically being a digital laggard. (🙏 Willy Braun)

Computer Vision

🔺 Triangle Splatting+ breakthrough - New paper demonstrated combining photorealism of gaussian splatting with game engine compatibility of triangles. Samuel tested locally with good results, though noted issues with transparent surfaces. Potential for transforming 3D rendering pipelines. (👉 Samuel McFadden @ Vid2Scene)

📑 Text-to-CAD ADAM release - YC’25 company ADAM open-sourced their text-to-CAD stack CADAM, enabling natural language to CAD model generation. Community sought experiences with this and other LLM-to-CAD alternatives. (👉 Nikolay Tchakarov @ Asteria)

Also:

👁️ FARMER generative framework - A generative model that combines normalizing flows and autoregressive transformers to efficiently synthesize images from raw pixels while allowing for exact likelihood estimation. It introduces latent dimension reduction and a fast distillation technique, making it competitive with other pixel-based image generators in quality and scalability.
🫟 3D Gaussian Splatting advancement - Gaussian splatting technology is making progress. Remi Kaito (Arcade AI), along with Jessy Tang and Carlos D. Mejia, won the Supercell Hackathon using 3DGS on Veo3-generated environments converted with Vid2Scene. (🎞️ Demo available here)

Cyber

🛡️ OpenAI Aardvark security agent - OpenAI introduced Aardvark, an agentic security researcher in private beta that thinks like a security researcher and scales to meet modern software demands. Represents convergence of autonomous agents and cybersecurity applications.
💉 Perplexity prompt injection mitigation - Under pressure over the risks posed by the wave of AI browsers, Perplexity published a detailed overview of its approach to mitigating prompt injection in the Comet browser agent, addressing key security concerns around AI assistants with browser access.

Infrastructure

⚡️ Poolside’s 2GW AI campus project - Poolside announced Project Horizon, scaling from coding agent to full-stack compute infrastructure including a 2GW AI campus. Represents massive infrastructure investment for AI model training and deployment.

Also:

🚨 AWS DynamoDB outage analysis - Amazon published detailed post-mortem of DynamoDB service disruption in Northern Virginia region, providing valuable insights into large-scale infrastructure failure analysis and recovery procedures. (👉 Antoine Millet @ Scaleway)
🙏 OHTTP production usage inquiry - Adam seeks practical experience with OHTTP deployment in production environments, highlighting interest in advanced privacy-preserving HTTP protocols for real-world applications. If you have experience with OHTTP, reach out to him. (👉 Adam Surak, Julien Mangeard)
🤗 Scaleway Hugging Face partnership - Scaleway is now a Hugging Face Inference Provider, offering European developers flexible, secure, and sovereign cloud infrastructure for serverless AI inference at scale. (👉 Antoine Millet @ Scaleway)

Language Models

📈 Open-weight models lag analysis - Epoch.ai research showed open-weight models lag state-of-the-art by around 3 months on average, providing quantitative analysis of the gap between proprietary and open model performance.

🤗 Hugging Face training playbook - HuggingFace released comprehensive “Smol Training Playbook” providing secrets to building world-class LLMs. Pierre humorously “fixed” their flowchart suggesting more hands-on training approach over using existing models for learning purposes. (👉 Fabien Niel, Pierre Chapuis)

🧠 LLaDA2.0 diffusion language model - InclusionAI released 100B parameter (6.1B active) diffusion language model achieving decent benchmark results, representing continued research into alternative architectures beyond traditional transformers. (👉 Gabriel Olympie)

MLOps

🔄 Recursive Language Models innovation - New inference strategy allowing language models to decompose and recursively interact with unbounded context through REPL environments. Small RLM (GPT-5-mini) beat larger models on difficult long-context tasks at lower cost, addressing context rot issue.
- RLMs let an LM call itself or other models recursively to handle long or complex input.
- It can process inputs much larger than a normal model’s token window.
- It works inside an environment (like a REPL) and can peek, search, partition, or summarize pieces of the input.
- It solves ‘context rot’ (regular models forget or degrade with longer context).
- Recursive calls and chunking strategies are chosen dynamically at inference, not fixed in advance by the user.
- RLMs are not classic agents or just summarizers: the model itself decides how to decompose context.
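
To make the recursion idea concrete, here is a deliberately simplified Python sketch. It is not the RLM implementation: `toy_lm` is a hypothetical stand-in for a real model call, and the split-in-half rule is hard-coded for illustration, whereas in an actual RLM the model chooses how to peek, partition, and recurse at inference time.

```python
from typing import Callable, List

# Hypothetical model interface: (query, context) -> answer text.
LM = Callable[[str, str], str]


def toy_lm(query: str, context: str) -> str:
    """Stand-in for a model call: keep only the lines that mention the query."""
    hits = [ln for ln in context.splitlines() if query.lower() in ln.lower()]
    return "\n".join(hits)


def recursive_answer(query: str, lines: List[str], lm: LM, window: int = 4) -> str:
    """Answer over `lines`, recursing whenever they exceed the model's window."""
    if len(lines) <= window:
        # Base case: the input fits in the (tiny, illustrative) window.
        return lm(query, "\n".join(lines))
    mid = len(lines) // 2
    # Fixed halving is an assumption of this sketch, not the RLM strategy.
    left = recursive_answer(query, lines[:mid], lm, window)
    right = recursive_answer(query, lines[mid:], lm, window)
    merged = "\n".join(part for part in (left, right) if part)
    # One final call over the much shorter merged partial answers.
    return lm(query, merged)


# 100-line input, far larger than the 4-line window, with one relevant line.
haystack = [f"filler line {i}" for i in range(100)]
haystack[73] = "the secret code is 4242"
result = recursive_answer("secret", haystack, toy_lm)
```

Each individual call only ever sees a handful of lines, which is the property that lets this style of inference sidestep degradation on very long inputs.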

🧠 DeepMind reinforcement learning algorithm discovery - Nature paper revealed AI autonomously discovering state-of-the-art reinforcement learning algorithms, representing breakthrough in meta-learning and algorithm optimization. A direct follow-on from earlier DeepMind work. (👉 Anselme Trochu, Fabien Niel)
😴 vLLM sleep mode feature - vLLM introduced zero-reload model switching with sleep mode, solving the multi-model serving problem without requiring 2x GPU memory or slow reload times. Enables efficient model switching for applications requiring multiple LLMs. (👉 Robert Hommes)

Also:

🧪 On-policy distillation methodology - Thinking Machines published approach combining reliable on-policy training with cost-efficient dense reward signals. Student model trains on its own outputs with teacher providing token-by-token feedback using reverse KL divergence.
🤔 DGX Spark performance concerns - John Carmack reported DGX Spark maxing out at only 100 watts power draw (less than half the rated 240W) and delivering approximately half the quoted performance, raising questions about hardware efficiency claims.
👀 Kimi Linear hybrid attention architecture - Moonshot AI released Kimi Linear 48B model with hybrid linear attention achieving superior performance and hardware efficiency for long-context tasks. Reduces KV cache needs by 75% and boosts decoding throughput by 6x for 1M token contexts. Interesting contrast with MiniMax returning to full attention.
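
On the on-policy distillation item above: the reverse KL divergence measures how far the student's next-token distribution is from the teacher's, weighted by what the student itself would sample. A minimal Python sketch for a single token position follows. The three-token vocabulary and the probabilities are made-up illustrative values, and this is the textbook definition rather than Thinking Machines' training code.

```python
import math
from typing import Sequence


def reverse_kl(student: Sequence[float], teacher: Sequence[float]) -> float:
    """KL(student || teacher) for one token position, in nats.

    Assumes both inputs are probability distributions over the same vocabulary
    and that the teacher assigns non-zero mass wherever the student does.
    """
    return sum(s * math.log(s / t) for s, t in zip(student, teacher) if s > 0)


# Hypothetical next-token distributions over a 3-token vocabulary.
student_probs = [0.7, 0.2, 0.1]
teacher_probs = [0.5, 0.3, 0.2]

penalty = reverse_kl(student_probs, teacher_probs)    # KL(student || teacher)
forward = reverse_kl(teacher_probs, student_probs)    # KL(teacher || student), for contrast
```

The divergence is zero only when the two distributions match, and it is asymmetric: the reverse direction penalizes the student for putting probability on tokens the teacher considers unlikely.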

Programming

📉 Junior SWE displacement analysis - Multiple graphs were shared showing a dramatic decline in junior developer hiring coinciding with the release of ChatGPT 3.5. The trend within the community appears to favor coding agents over junior hires for simpler projects. (👉 Anselme Trochu @ UN, Gawen Arab)

🪶 Cursor 2.0 and Composer launch - Cursor released major update with new interface and first coding model Composer, specifically designed for working with agents. Represents evolution of AI-powered coding environments toward multi-agent workflows. Members tried it but could not get it to run properly, likely because of server overload. (👉 Arnaud Porterie @ Vibe, Anselme Trochu @ UN)
🧑‍💻 MiniMax M2 open-source release - MiniMax released M2 model claiming a new all-time-high Intelligence Index score for open-weight models with impressive efficiency (10B active parameters, 200B total). Focused on agentic capabilities and coding, available globally free for limited time. Claimed strong performance at roughly 2x the speed of Claude Sonnet and 8% of the price. (👉 Kevin Kuipers, Julien Kilo)

🐧 Omarchy Linux distribution - Hugo Serrat experimented with DHH’s opinionated Linux distribution based on Arch Linux leveraging Hyprland tiling window manager. Provides polished alternative to manual configuration avoiding bugs of other solutions. (👉 Hugo Serrat)

Also:

🤖 GitHub Agent HQ introduction - GitHub Universe 2025 introduced Agent HQ, a unified workflow for orchestrating any agent anywhere. Continues proliferation of coding agents in development ecosystem.

Robotics

🤖 1X NEO home robot pricing - Announcement of the NEO home robot, powered by Redwood AI (1X’s generalist AI model) for learning and repeating tasks. It will be available through a $499/month subscription or $20,000 ownership option. The news sparked strong skepticism about the timeline, with U.S. deliveries expected to start in 2026. (👉 Kevin Kuipers, Mathieu Breton, Pierre Chapuis)

Community Topics

🎵 Potential generative music tool from OpenAI - Following TechCrunch’s report on OpenAI’s interest in generative music, members discussed the surge of AIM (AI Music) tracks flooding platforms like Spotify and Deezer, disrupting recommendation algorithms. (👉 Gawen Arabs @ Airbuds, Gabriel Olympie, Pierre Chapuis @ Finegrain, Shuai Zhang @ Jinko)
- Gawen Arabs, co-founder of Airbuds (music app), shared insights on the growing cancel and backlash movement among Gen Z in the US toward AIM, with influencers making “canceling AI” trendy: “In music, there’s still an uncanny valley (vocals still need some work) but it’s closing fast. Professional artists appear to be using tools like Suno widely, though quietly, to protect their reputation. Labels are pushing back, while regulators are still figuring out how to make GenAI music pay its dues. That said, there are early signs that the conflict between labels and GenAI may already be settling.”
- A recent study (🔬 Echoes of Humanity: Exploring the Perceived Humanness of AI Music, 2025) showed that detecting AIM is very challenging: “When pairs are random [two tracks with different styles of music] listeners cannot differentiate AIM from human-made songs, i.e., they are no better than random guessing. Nevertheless, when pairs are highly similar, listeners are able to make this distinction.” Funnily enough, it shows that while being a professional musician improves detection, “surprisingly, a Formal Education of 5 to 10 years appears on the model with a negative effect.” In other words, listeners with 5 to 10 years of formal musical education were worse detectors than those with none at all.
📈 Anthropic overtaking OpenAI in Enterprise - Claude is reportedly gaining LLM API market share rapidly, with “Enterprise Search” seeing especially fast adoption among non-coders. “Honestly, Claude has been killing it, and Enterprise Search, which shipped recently, is getting adopted super fast by non-coders in our org,” said Arnaud Porterie @ Vibe.

Events

🇪🇺 Future of Cloud Ops in Paris, November 6 - Event on the future of cloud ops held at ZML offices, 14 Rue Le Peletier 75009 from 19:00-21:30.
🛠️ dotAI x Pruna Workshop in Paris, November 5 - Hands-on session in Paris at La Maison (18:30–21:00) for AI devs, ML engineers, and researchers about making models faster, cheaper, and greener.
👷‍♀️ Anyshift Product Demo in Paris, November 5 - Live demo at Anyshift Paris offices (Opera). Showcased real incident alert resolution.
🎭 Generation AI Conference in Paris, December 9-11 - Where builders, wizards, and prophets of GenAI and Agent Experience meet. 🎤 Speaker applications
👩‍💻 Women in Tech Initiative in Paris - Initiative launching at offices with X-IA community (AI community of Polytechnique) in small committee: Contact Margaux.
🍕 Our last TechLunch with Amine Saboni (Pruna AI), Antoine Durieux, Axel Nguyen and Hugo Le Belzic (H Company), Fabien Niel, Jonas Levy (Relief), Shuai Zhang (Jinko), Xavier Ayme, and Ziv Ilan (NVIDIA), featuring discussions on OpenAI’s recent commitment to open-weight models, MCP, and the Skills alternative.


Job Board

🇬🇧 CTO Co-founder (London/Western Europe) - Early-stage company building intelligent layer for Excel seeking technical co-founder. Equal equity split, pre-seed/MVP stage with early user interest. Looking for deeply technical founder-minded individual ready for full-time commitment. Contact: Shalaby.sar@gmail.com (🙏 Fabien Niel)
🇫🇷 CTO & Co-founder (Bordeaux/Remote) - Charles, former CFO of Lydia and Joko, seeking CTO partnership for Folkyn, a SaaS platform for managing freelancers. MVP live since September with beta testers and customers. Ideally 8-15 years experience in startup environment. Contact: charles@folkyn.com (🙏 Mathieu Breton)
🇫🇷 IT Support Apprentice - Seeking apprenticeship position in IT Support, Maintenance, or Networking for hard-working candidate in first job opportunity. BTS SIO student with attached CV. (🙏 Alex Charbonnier)

New Members

🇫🇷 Felix Raimundo (TychoBio) - CEO at TychoBio solving RNA therapeutics design. Computer scientist turned computational biologist who moved from embedded systems to ML before applying computational tricks to biology at DeepMind, ErvImmune, and UMass. Great warhammer painter and amazing at being “too early” - used BTC when they were worth $10. Better at bio than most CS people, and better at CS than most bio people.
🇺🇸 Samuel McFadden (Vid2Scene) - Founder at vid2scene.com turning videos into explorable 3D environments using Gaussian Splatting. Previously Technical Director at Pixar and SWE at Microsoft building graphics and game engine tools. Passionate about the convergence of real-time 3D and AI, envisioning open-world RPGs where every NPC is truly intelligent. Loves converting regular food recipes into healthier low-calorie versions.
🇨🇭 Anselme Trochu (UN) - Information Systems Officer at OHCHR – UN Human Rights building global digital platforms that help countries track, implement, and report on human-rights commitments. Full-stack/product-minded engineer with 20 years of tech passion, leading UN platforms like the National Recommendations Tracking Database and Universal Human Rights Index. Owns an alarming number of domain names “for future unicorns” and keeps vacation spreadsheets with KPIs. Can translate a 60-page UN policy PDF into a working prototype before the espresso cools.
🇫🇷 Christian Wallenwein (Mistral AI) - Research Engineer at Mistral.ai working on European foundation models. Recently moved to Paris and just converted from intern to FTE. Started coding at 10, built and sold his first company after high school. Focused on ML, especially text and audio foundation models. Brews his own kombucha, learning French, and has been doing daily powernaps for 8+ years giving him daily superpowers.

👉 Article originally posted on WeLoveSota.com

