Your Dose of Reg.exe, Week {14}
Reg.exe is a global, closed community of 260+ engineers, founders, and researchers in AI—from San Francisco to Tokyo. Each week we share discussion highlights in a short newsletter.
Events
🇩🇪 Engineering Night #8 in Berlin (October 15) - Technical gathering bringing together engineering folks in Berlin for networking and discussions.
🇬🇧 Inference & vLLM meetup (October 20) - After successful meetups in Paris, Exxa is expanding to London with talks on inference optimization, speculative decoding, and AI hardware.
🇬🇧 Engineering Night #7 in London (past) - Despite some unforeseen difficulties, Jules Belveze (Dust) had a blast meeting Vincent Moens (Periodic Labs)
Job Board
🇪🇺 Senior Product Manager at Wire - Wire is seeking a senior product manager for their calling/conferencing product line. The role focuses on building a European, sovereign, secure alternative to big American players. Remote from Europe or in Berlin/Paris offices.
Autonomous Agents
OpenAI’s AgentKit, obviously - OpenAI has unveiled AgentKit, a new toolset for building, deploying, and optimizing AI agents, aimed at developers and enterprises. (🙏 Kevin Kuipers, Robert Hommes)
Key modules include Agent Builder (visual workflow design), Connector Registry (data/tool integration), and ChatKit (embedding chat-based agent UIs).
Agent Builder lets you visually compose, version, and configure workflows, speeding up agent creation and collaboration.
Connector Registry centralizes management of data sources, enabling seamless connections across ChatGPT, APIs, and third-party tools.
ChatKit simplifies deploying chat interfaces for agents, with quick setup and easy customization for apps and websites.
Expanded evaluation features: automated prompt optimization, trace grading, dataset management, and third-party model support to boost agent quality.
New reinforcement fine-tuning options help agents reason better, including support for custom tool calls and custom grading criteria.
Some felt the approaches were steps sideways rather than forward, relying on deterministic flowchart logic similar to older tools.
Cyber
🤯 Weaponizing image scaling against production AI systems - Trail of Bits revealed sophisticated attack exploiting image scaling on Gemini CLI, Vertex AI Studio, and other production AI systems. The vulnerability allows data exfiltration and prompt injection. (🙏 Pierre Chapuis @ Finegrain)
“This attack works because AI systems often scale down large images before sending them to the model: when scaled, these images can reveal prompt injections that are not visible at full resolution”
Attackers can weaponize image scaling to hide prompt injections that only appear after an image is downscaled for AI processing, compromising systems like Google Gemini CLI.
These attacks exploit differences between user-visible images and the model’s processed (scaled down) input, enabling data exfiltration without user awareness.
Vulnerabilities were demonstrated on multiple production AI platforms, including Vertex AI Studio, Gemini web/API, Google Assistant, and Genspark.
The attack leverages downscaling algorithms (nearest neighbor, bilinear, bicubic), requiring tailored approaches depending on the library and implementation.
Trail of Bits released Anamorpher, an open-source tool for exploring and generating images crafted for such attacks.
Robust defense requires previewing model input, restricting image transformations, and implementing secure design patterns to prevent prompt injection via images.
AI for cyber defenders improvements - Recent work from Anthropic and Google DeepMind shows major advances in AI-driven vulnerability detection and remediation. Claude now demonstrates stronger capabilities in analyzing and securing code, while DeepMind’s CodeMender introduces an autonomous agent focused on proactive code security. 📰 Anthropic article 📰 DeepMind article (🙏 Kevin Kuipers)
Infrastructure
🇰🇷 South Korea’s 858 TB government data loss – A massive data loss in South Korea’s cloud infrastructure exposed the risks of relying solely on internal redundancy. The incident sparked a wider debate on backup strategies for petabyte-scale systems, questioning whether large cloud storage platforms like S3 should maintain traditional backups in addition to erasure coding and soft-delete mechanisms. (🙏 Julien Mangeard @ Plakar, Pierre Chapuis @ Finegrain)
Language Models
🤯 Tiny Recursive Model (TRM) achieves impressive results - Samsung’s 7M parameter model reportedly beat DeepSeek-R1, Gemini 2.5 Pro, and o3-mini on ARC-AGI reasoning tasks. François Chollet, ARC co-creator, called it “impressive work.” (🙏 Thierry Abalea)
“Contrary to the Hierarchical Reasoning Model (HRM), TRM requires no fixed-point theorem, no complex biological justifications, and no hierarchy. It significantly reduces the number of parameters by halving the number of layers and replacing the two networks with a single tiny network. It also simplifies the halting process, removing the need for the extra forward pass. Overall, TRM is much simpler than HRM, while achieving better generalization”
TRM uses a tiny 2-layer network (7M params!) to recursively improve answers.
Outperforms much larger models (LLMs, HRM) on Sudoku, Maze, ARC-AGI benchmarks.
Works with very little training data (~1K examples).
Recursion replaces deeper architectures; avoids biological or fixed-point tricks.
Achieves top generalization: 87% on Sudoku-Extreme, beats Gemini Pro and Deepseek R1.
Efficient halting and stable training via binary classification and EMA.
🛠️ Context Engineering using DSPy GEPA - A guide on how the DSPy framework, enhanced with GEPA evolutionary optimization, can significantly improve AI coding agents through context engineering. By refining prompts and agent workflows, DSPy enables the creation of modular, reliable AI agents for a wide range of technical tasks. The outcome is more structured and effective prompts, leading to measurable gains in agent performance and output quality. (🙏 Robert Hommes, Charly Poly @ Inngest)
DSPy simplifies complex prompt engineering, model selection, and agent workflow design for AI applications.
It enables modular construction of agents, so you can easily swap out model providers, input types, and feedback loops.
DSPy supports advanced optimization routines - like GEPA - for improving agent reliability and output quality.
The framework can be applied to various AI agent tasks, including data analytics, automation, and coding assistance.
“Interesting. Here’s some context engineering using GEPA in DsPy. The resulting prompts are impressively well-structured. I was genuinely surprised.”
shared Robert Hommes (Moyai)
📐 InferenceMAX launched - Platform for monitoring and analyzing inference performance with support from Nvidia, AMD, OpenAI, Microsoft, PyTorch, SGLang, vLLM, and major cloud providers.
They benchmark LLM inference performance nightly on major hardware platforms using the latest software currently on three models DeepSeek R1, Llama 70B Instruct, gpt-oss 120B.
Each model/hardware combo is tested across tensor parallel sizes and concurrent requests, showing throughput vs. latency using clear visual graphs.
Recent benchmarks focus on models like Llama 3.3 70B Instruct at FP8 precision, with metrics like token throughput per GPU and latency.
Hardware platforms include H100, H200 (TRT), B200, MI300X, MI325X, MI355X, and more.
🔬 Evolving Algorithms Efficiently with ShinkaEvolve: ShinkaEvolve is an open-source framework designed to discover new algorithms with LLMs at unprecedented sample efficiency, inspired by evolution in nature as a masterful search algorithm.
“Evolution in nature is a masterful search algorithm, creating sophisticated solutions over millennia. In our work ... our consistent theme is to bring this incredible search algorithm to AI-driven discovery.”
It is sample-efficient: solutions are found in as few as 30 to 150 generations, outperforming prior approaches like AlphaEvolve.
Works across diverse domains: mathematical optimization, agentic system design, competitive programming, and LLM training design.
Key innovations include balancing exploration/exploitation, novelty-based rejection sampling, and task-dependent LLM prioritization.
ShinkaEvolve aims to be a co-pilot tool for scientists and engineers, accelerating AI and algorithmic research substantially.
Also:
🔥 Qwen3 Omni models released - Alibaba released Qwen3 Omni and Qwen3 Omni Realtime, two natively end-to-end “omni”-modal models processing text, images, audio, and video in a single unified architecture. Artificial Analysis benchmarking showed competitive Speech-to-Speech performance.
📑 ModernVBERT visual document retriever - New 250M parameter vision-language model achieved state-of-the-art performance for its size on the ViDoRe benchmark for visual document retrieval.
📄 Best input data format for LLMs benchmarked - Study compared 11 data formats (markdown tables, JSON, CSV, YAML, XML, etc.) to determine which LLMs understand best. XML emerged as the preferred format, though with only 60% success rate for the best one.
🔬 LoRA without regret - New research from Thinking Machines Lab demonstrated that LoRA can match full training performance more broadly than expected, challenging assumptions about parameter-efficient fine-tuning. 📰 Analysis
🇮🇹 Italian LLM startup Domyn eyes €1bn fundraise - Milan-based company, previously known as iGenius, is targeting LLMs for defense and AI gigafactory builds, seeking significant funding over the next six months.
MLOps
GGUF metadata editing on Hugging Face - Long-awaited feature launched allowing users to edit GGUF metadata directly from Hugging Face without downloading models locally. Announcement (🙏 Xuan-Son Nguyen @ Hugging Face)
Programming
⚖️ BigCodeArena for judging code generations - Hugging Face launched platform allowing side-by-side comparison of code generation models with actual execution and testing. Users submit coding tasks, watch models generate solutions, execute both programs, and vote on results. Organized into community leaderboard. (🙏 Kevin Kuipers)
Robotic
🤖 Figure 03 humanoid robot introduced - Figure unveiled their latest humanoid robot generation. (🙏 Fabien Niel)
🙌 New Members
🇫🇷 Nikolay Tchakarov (Asteria) - CTO at Asteria, a B2B SaaS biomimicry platform. Previously spent 12 years in London using ML to predict financial markets before pivoting to data and AI engineering. Can write and draw with both hands 🙌.
🇫🇷 Clement Nguyen (Lemrock) - Co-founder & CTO at Lemrock. Full-stack SWE with strong AI background. YC S24 alumnus. Nearly pursued a PhD in audio processing and can cook Vietnamese pho 🍜.
🇫🇷 Sasha Collin (Lemrock) - Co-founder & CPO at Lemrock, building AI agents for media & ecommerce. Data Scientist with research background in AI applied to healthcare. YC S24 alumnus. Can make his co-founder cook Vietnamese pho anytime.







