Your Dose of Reg.exe, Week 14

Reg.exe is a global, closed community of 260+ engineers, founders, and researchers in AI—from San Francisco to Tokyo. Each week we share discussion highlights in a short newsletter.

Kevin Kuipers

Oct 11, 2025

Events

🇩🇪 Engineering Night #8 in Berlin (October 15) - Technical gathering bringing together engineering folks in Berlin for networking and discussions.
- 👉 Register here
🇬🇧 Inference & vLLM meetup (October 20) - After successful meetups in Paris, Exxa is expanding to London with talks on inference optimization, speculative decoding, and AI hardware.
- 👉 Register here

🇬🇧 Engineering Night #7 in London (past) - Despite some unforeseen difficulties, Jules Belveze (Dust) had a blast meeting Vincent Moens (Periodic Labs).

Job Board

🇪🇺 Senior Product Manager at Wire - Wire is seeking a senior product manager for their calling/conferencing product line. The role focuses on building a European, sovereign, secure alternative to big American players. Remote from Europe or in Berlin/Paris offices.
- 👉 Job description

Autonomous Agents

OpenAI’s AgentKit, obviously - OpenAI has unveiled AgentKit, a new toolset for building, deploying, and optimizing AI agents, aimed at developers and enterprises. (🙏 Kevin Kuipers, Robert Hommes)
- Key modules include Agent Builder (visual workflow design), Connector Registry (data/tool integration), and ChatKit (embedding chat-based agent UIs).
- Agent Builder lets you visually compose, version, and configure workflows, speeding up agent creation and collaboration.
- Connector Registry centralizes management of data sources, enabling seamless connections across ChatGPT, APIs, and third-party tools.
- ChatKit simplifies deploying chat interfaces for agents, with quick setup and easy customization for apps and websites.
- Expanded evaluation features: automated prompt optimization, trace grading, dataset management, and third-party model support to boost agent quality.
- New reinforcement fine-tuning options help agents reason better, including support for custom tool calls and custom grading criteria.
- Some members felt the approach was a step sideways rather than forward, since it relies on deterministic flowchart logic similar to older tools.

Interface view of a customer service automation flow in a visual builder tool. The canvas shows connected nodes labeled Start, Jailbreak guardrail, Classification agent, If/else, Return agent, Retention agent, Information agent, Hallucination guardrail, and End. A sidebar on the left lists available node types such as Agent, Note, File search, Guardrails, MCP, and User approval. Top controls include options for Evaluate, Code, Preview, and Publish.

Cyber

🤯 Weaponizing image scaling against production AI systems - Trail of Bits revealed a sophisticated attack that exploits image scaling on Gemini CLI, Vertex AI Studio, and other production AI systems. The vulnerability allows data exfiltration and prompt injection. (🙏 Pierre Chapuis @ Finegrain)
“This attack works because AI systems often scale down large images before sending them to the model: when scaled, these images can reveal prompt injections that are not visible at full resolution”
- Attackers can weaponize image scaling to hide prompt injections that only appear after an image is downscaled for AI processing, compromising systems like Google Gemini CLI.
- These attacks exploit differences between user-visible images and the model’s processed (scaled down) input, enabling data exfiltration without user awareness.
- Vulnerabilities were demonstrated on multiple production AI platforms, including Vertex AI Studio, Gemini web/API, Google Assistant, and Genspark.
- The attack leverages downscaling algorithms (nearest neighbor, bilinear, bicubic), requiring tailored approaches depending on the library and implementation.
- Trail of Bits released Anamorpher, an open-source tool for exploring and generating images crafted for such attacks.
- Robust defense requires previewing model input, restricting image transformations, and implementing secure design patterns to prevent prompt injection via images.
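
To make the mechanism above concrete, here is a minimal sketch in plain Python. It is a toy, not the Trail of Bits technique or the Anamorpher tool: `downscale_nearest` and `embed_payload` are hypothetical helpers, and the sampling rule (keep the top-left pixel of each block) is an assumption. Real libraries sample differently, and bilinear or bicubic scaling needs a more careful construction.

```python
def downscale_nearest(img, factor):
    """Toy nearest-neighbour downscale: keep the top-left pixel of each block.

    Assumption: real libraries choose sample positions differently.
    """
    return [row[::factor] for row in img[::factor]]


def embed_payload(cover, payload, factor):
    """Overwrite only the pixels that downscale_nearest will sample."""
    out = [row[:] for row in cover]
    for y, payload_row in enumerate(payload):
        for x, value in enumerate(payload_row):
            out[y * factor][x * factor] = value
    return out


FACTOR = 4
cover = [[255] * 8 for _ in range(8)]  # 8x8 all-white "harmless" image
payload = [[0, 255], [255, 0]]         # 2x2 pattern standing in for hidden text

crafted = embed_payload(cover, payload, FACTOR)

# Count how many full-resolution pixels the attacker had to change.
changed = sum(
    new != old
    for crafted_row, cover_row in zip(crafted, cover)
    for new, old in zip(crafted_row, cover_row)
)

# What the model would receive after downscaling.
model_input = downscale_nearest(crafted, FACTOR)

print(f"pixels changed at full resolution: {changed} of 64")
print(f"downscaled image equals payload: {model_input == payload}")
```

Only the sampled pixels matter to the downscaled result, which is why the full-resolution image can look unchanged. Showing the user `model_input` rather than the original is the "preview the model input" defense in its simplest form.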

Image showing a side-by-side comparison of an image that is harmless at the original resolution but contains a prompt injection when scaled down

Improvements in AI for cyber defenders - Recent work from Anthropic and Google DeepMind shows major advances in AI-driven vulnerability detection and remediation. Claude now demonstrates stronger capabilities in analyzing and securing code, while DeepMind’s CodeMender introduces an autonomous agent focused on proactive code security. 📰 Anthropic article 📰 DeepMind article (🙏 Kevin Kuipers)

Infrastructure

🇰🇷 South Korea’s 858 TB government data loss – A massive data loss in South Korea’s cloud infrastructure exposed the risks of relying solely on internal redundancy. The incident sparked a wider debate on backup strategies for petabyte-scale systems, questioning whether large cloud storage platforms like S3 should maintain traditional backups in addition to erasure coding and soft-delete mechanisms. (🙏 Julien Mangeard @ Plakar, Pierre Chapuis @ Finegrain)

Language Models

🤯 Tiny Recursive Model (TRM) achieves impressive results - Samsung’s 7M parameter model reportedly beat DeepSeek-R1, Gemini 2.5 Pro, and o3-mini on ARC-AGI reasoning tasks. François Chollet, ARC co-creator, called it “impressive work.” (🙏 Thierry Abalea)
“Contrary to the Hierarchical Reasoning Model (HRM), TRM requires no fixed-point theorem, no complex biological justifications, and no hierarchy. It significantly reduces the number of parameters by halving the number of layers and replacing the two networks with a single tiny network. It also simplifies the halting process, removing the need for the extra forward pass. Overall, TRM is much simpler than HRM, while achieving better generalization”
- TRM uses a tiny 2-layer network (7M params!) to recursively improve answers.
- Outperforms much larger models (LLMs, HRM) on Sudoku, Maze, ARC-AGI benchmarks.
- Works with very little training data (~1K examples).
- Recursion replaces deeper architectures; avoids biological or fixed-point tricks.
- Achieves top generalization: 87% on Sudoku-Extreme, beats Gemini 2.5 Pro and DeepSeek-R1.
- Efficient halting and stable training via binary classification and EMA.
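
The recursion described above can be sketched roughly as follows. This is a reading of the idea, not the authors' code: a single small two-layer network is reused to refine a latent state `z` several times, then to refine the answer `y`. The dimensions, the random untrained weights, the elementwise-sum input combination, and the names `tiny_net` and `recursive_refine` are all assumptions for illustration. The halting head and EMA are left out.

```python
import math
import random


def make_layer(rng, dim):
    """Random dim x dim weight matrix (untrained, for illustration only)."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(dim)] for _ in range(dim)]


def tiny_net(layers, vec):
    """Apply each layer in turn with a tanh non-linearity."""
    for weights in layers:
        vec = [math.tanh(sum(w * v for w, v in zip(row, vec))) for row in weights]
    return vec


def add(*vecs):
    """Elementwise sum of equally sized vectors."""
    return [sum(vals) for vals in zip(*vecs)]


def recursive_refine(x, y, z, layers, n_latent=6, n_cycles=3):
    """Reuse one tiny network to improve the latent z, then the answer y."""
    for _ in range(n_cycles):
        for _ in range(n_latent):
            z = tiny_net(layers, add(x, y, z))  # latent update sees question, answer, latent
        y = tiny_net(layers, add(y, z))         # answer update sees answer and latent
    return y, z


DIM = 4
rng = random.Random(0)
layers = [make_layer(rng, DIM), make_layer(rng, DIM)]  # the "2-layer network"
x = [rng.uniform(-1, 1) for _ in range(DIM)]           # embedded question (made up)
y, z = recursive_refine(x, [0.0] * DIM, [0.0] * DIM, layers)
print(y)
```

The point of the sketch is that effective depth comes from the loop, not from stacking more layers: the same weights are applied `n_cycles * (n_latent + 1)` times.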

🛠️ Context Engineering using DSPy GEPA - A guide on how the DSPy framework, enhanced with GEPA evolutionary optimization, can significantly improve AI coding agents through context engineering. By refining prompts and agent workflows, DSPy enables the creation of modular, reliable AI agents for a wide range of technical tasks. The outcome is more structured and effective prompts, leading to measurable gains in agent performance and output quality. (🙏 Robert Hommes, Charly Poly @ Inngest)
- DSPy simplifies complex prompt engineering, model selection, and agent workflow design for AI applications.
- It enables modular construction of agents, so you can easily swap out model providers, input types, and feedback loops.
- DSPy supports advanced optimization routines - like GEPA - for improving agent reliability and output quality.
- The framework can be applied to various AI agent tasks, including data analytics, automation, and coding assistance.
- “Interesting. Here’s some context engineering using GEPA in DsPy. The resulting prompts are impressively well-structured. I was genuinely surprised.” shared Robert Hommes (Moyai)
📐 InferenceMAX launched - Platform for monitoring and analyzing inference performance with support from Nvidia, AMD, OpenAI, Microsoft, PyTorch, SGLang, vLLM, and major cloud providers.
- They benchmark LLM inference performance nightly on major hardware platforms using the latest software, currently on three models: DeepSeek R1, Llama 3.3 70B Instruct, and gpt-oss 120B.
- Each model/hardware combo is tested across tensor parallel sizes and concurrent requests, showing throughput vs. latency using clear visual graphs.
- Recent benchmarks focus on models like Llama 3.3 70B Instruct at FP8 precision, with metrics like token throughput per GPU and latency.
- Hardware platforms include H100, H200 (TRT), B200, MI300X, MI325X, MI355X, and more.
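
The throughput-versus-latency trade-off these charts plot can be illustrated with a little arithmetic. The sketch below is not InferenceMAX code and the numbers are invented. The `Run` class and its field names are hypothetical, and "tokens per second per user" is used here as a simple stand-in for the latency axis.

```python
from dataclasses import dataclass


@dataclass
class Run:
    """One hypothetical benchmark run (all values are made up)."""
    tensor_parallel: int   # GPUs serving one model replica
    concurrency: int       # simultaneous requests
    output_tokens: int     # tokens generated during the run
    duration_s: float      # wall-clock seconds

    @property
    def tokens_per_s_per_gpu(self) -> float:
        return self.output_tokens / self.duration_s / self.tensor_parallel

    @property
    def tokens_per_s_per_user(self) -> float:
        return self.output_tokens / self.duration_s / self.concurrency


low_load = Run(tensor_parallel=8, concurrency=4, output_tokens=48_000, duration_s=60.0)
high_load = Run(tensor_parallel=8, concurrency=64, output_tokens=384_000, duration_s=60.0)

for name, run in [("low load", low_load), ("high load", high_load)]:
    print(
        f"{name}: {run.tokens_per_s_per_gpu:.0f} tok/s/GPU, "
        f"{run.tokens_per_s_per_user:.0f} tok/s/user"
    )
```

In this made-up example, raising concurrency increases per-GPU throughput while each user sees fewer tokens per second, which is the shape of curve such benchmarks sweep out per model and hardware combination.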

🔬 Evolving Algorithms Efficiently with ShinkaEvolve - ShinkaEvolve is an open-source framework designed to discover new algorithms with LLMs at unprecedented sample efficiency, inspired by evolution in nature as a masterful search algorithm.
“Evolution in nature is a masterful search algorithm, creating sophisticated solutions over millennia. In our work ... our consistent theme is to bring this incredible search algorithm to AI-driven discovery.”
- It is sample-efficient: solutions are found in as few as 30 to 150 generations, outperforming prior approaches like AlphaEvolve.
- Works across diverse domains: mathematical optimization, agentic system design, competitive programming, and LLM training design.
- Key innovations include balancing exploration/exploitation, novelty-based rejection sampling, and task-dependent LLM prioritization.
- ShinkaEvolve aims to be a co-pilot tool for scientists and engineers, accelerating AI and algorithmic research substantially.
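
As a rough illustration of novelty-based rejection sampling, the sketch below drops candidate programs that are too similar to anything already kept, so the evaluation budget is not spent on near-duplicates. This is not ShinkaEvolve's implementation: the summary above does not say how the framework measures novelty, so the `difflib` text similarity, the `0.9` threshold, and the function names are all stand-in assumptions.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Text similarity in [0, 1]; a stand-in for whatever metric a real system uses."""
    return SequenceMatcher(None, a, b).ratio()


def is_novel(candidate: str, seen: list[str], threshold: float = 0.9) -> bool:
    """A candidate is novel if it is not too close to any program already seen."""
    return all(similarity(candidate, prev) < threshold for prev in seen)


def filter_novel(candidates: list[str], archive: list[str], threshold: float = 0.9) -> list[str]:
    """Keep only candidates that differ enough from the archive and from each other."""
    accepted: list[str] = []
    for cand in candidates:
        if is_novel(cand, archive + accepted, threshold):
            accepted.append(cand)
    return accepted


archive = ["def f(x):\n    return x * 2\n"]
duplicate = "def f(x):\n    return x * 2\n"
near_duplicate = "def f(x):\n    return x * 3\n"
different = (
    "def f(x):\n"
    "    total = 0\n"
    "    for i in range(x):\n"
    "        total += i\n"
    "    return total\n"
)

kept = filter_novel([duplicate, near_duplicate, different], archive)
print(f"kept {len(kept)} of 3 candidates")
```

Only candidates that pass the filter would go on to the expensive evaluation step, which is one plausible way a system saves samples.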

Also:

🔥 Qwen3 Omni models released - Alibaba released Qwen3 Omni and Qwen3 Omni Realtime, two natively end-to-end “omni”-modal models processing text, images, audio, and video in a single unified architecture. Artificial Analysis benchmarking showed competitive Speech-to-Speech performance.
📑 ModernVBERT visual document retriever - New 250M parameter vision-language model achieved state-of-the-art performance for its size on the ViDoRe benchmark for visual document retrieval.
📄 Best input data format for LLMs benchmarked - Study compared 11 data formats (markdown tables, JSON, CSV, YAML, XML, etc.) to determine which ones LLMs understand best. XML emerged as the preferred format, though even it reached only a 60% success rate.
🔬 LoRA without regret - New research from Thinking Machines Lab demonstrated that LoRA can match full training performance more broadly than expected, challenging assumptions about parameter-efficient fine-tuning. 📰 Analysis
🇮🇹 Italian LLM startup Domyn eyes €1bn fundraise - Milan-based company, previously known as iGenius, is targeting LLMs for defense and AI gigafactory builds, seeking significant funding over the next six months.
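
For the data-format benchmark in the list above, this short sketch shows what "the same data in different formats" looks like in practice. It uses only the Python standard library. The records and the helper names are invented for illustration and do not come from the study.

```python
import csv
import io
import json

# Made-up records; a benchmark would ask an LLM questions about data like this.
records = [
    {"id": 1, "name": "Ada", "team": "infra"},
    {"id": 2, "name": "Linus", "team": "kernel"},
]


def to_json(rows):
    return json.dumps(rows, indent=2)


def to_csv(rows):
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=list(rows[0]), lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


def to_markdown_table(rows):
    headers = list(rows[0])
    lines = [
        "| " + " | ".join(headers) + " |",
        "| " + " | ".join("---" for _ in headers) + " |",
    ]
    for row in rows:
        lines.append("| " + " | ".join(str(row[h]) for h in headers) + " |")
    return "\n".join(lines)


for label, render in [("JSON", to_json), ("CSV", to_csv), ("Markdown", to_markdown_table)]:
    print(f"--- {label} ---")
    print(render(records))
```

The formats differ in how much structure is repeated per record (JSON repeats every key, CSV states them once), which affects both token count and how easily a model can locate a value.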

MLOps

GGUF metadata editing on Hugging Face - Long-awaited feature launched allowing users to edit GGUF metadata directly from Hugging Face without downloading models locally. Announcement (🙏 Xuan-Son Nguyen @ Hugging Face)

Programming

⚖️ BigCodeArena for judging code generations - Hugging Face launched a platform for side-by-side comparison of code generation models with actual execution and testing. Users submit coding tasks, watch two models generate solutions, execute both programs, and vote on the results, which feed a community leaderboard. (🙏 Kevin Kuipers)

Robotics

🤖 Figure 03 humanoid robot introduced - Figure unveiled their latest humanoid robot generation. (🙏 Fabien Niel)

🙌 New Members

🇫🇷 Nikolay Tchakarov (Asteria) - CTO at Asteria, a B2B SaaS biomimicry platform. Previously spent 12 years in London using ML to predict financial markets before pivoting to data and AI engineering. Can write and draw with both hands 🙌.
🇫🇷 Clement Nguyen (Lemrock) - Co-founder & CTO at Lemrock. Full-stack SWE with strong AI background. YC S24 alumnus. Nearly pursued a PhD in audio processing and can cook Vietnamese pho 🍜.
🇫🇷 Sasha Collin (Lemrock) - Co-founder & CPO at Lemrock, building AI agents for media & ecommerce. Data Scientist with research background in AI applied to healthcare. YC S24 alumnus. Can make his co-founder cook Vietnamese pho anytime.

Article originally posted on WeLoveSota.com
