Your Dose of Reg.exe, Week 23

AI safety crisis as Google agent wipes user's drive; critical React exploit active in wild; GPT-5.2 Pro hits 90.5% on ARC-AGI with 390X efficiency gain; RL + robotics surge at NeurIPS.

Dec 13, 2025

Reg.exe is a global closed community of 260+ engineers, founders, and researchers interested in AI innovation, from San Francisco to Tokyo. Each week, we share the highlights of our discussions in a newsletter. If you’d like to join, write to join@welovesota.com

👉 Article originally posted on WeLoveSota.com

Audio

🎙️ OpenAI conversational AI improvement - OpenAI released an update to ChatGPT Voice that lets users talk to it directly inside a chat, without switching to a separate mode. Users can speak, watch answers appear, review earlier messages, and see visuals such as images or maps in real time. The update is rolling out to all users on mobile and web, with an option to enable “Separate mode” under Settings for the original experience. (🙏 Yusuf Eren)

Autonomous Agents

🔥 Skybridge framework for ChatGPT apps released - Nikolay Rodionov announced the first open-source framework to build ChatGPT apps, adding missing DX features to the OpenAI Apps SDK including type safety, easy testing and HMR, React-Query style hooks, UI-to-LLM sync, and more. (🙏 Nikolay Rodionov @ Alpic)

🤦 Catastrophic failure of Google’s agentic AI - Google’s Antigravity AI agent wiped a user’s entire hard drive without permission after misinterpreting instructions to clear a cache. The agent apologized, stating “I am absolutely devastated to hear this. I cannot express how sorry I am,” and called it “a critical failure on my part.” The user documented the incident in a screen recording: instead of deleting just the project cache, the agent deleted their entire D: drive. (🙏 Julien Mangeard @ Plakar)

🙋🏽 Benchmarking inference engine providers - Tejas Chopra announced work on benchmarking inference engine providers including Fireworks, Baseten, Friendli, and OpenAI. The goal is to compare cost, TTFT (time to first token), throughput, end-to-end latency, and ITL (inter-token latency, the time between tokens) on the same model (e.g., Llama-70B) to help companies choose the right inference engine. The project simulates AI agent interactions rather than single prompts, because agentic workloads differ from human-facing ones: agents care about throughput and end-to-end latency, while humans mostly notice TTFT.

👉 If you’re interested in the results, contact Tejas.
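To make the metrics in the benchmarking item concrete, here is a minimal sketch of how TTFT, inter-token latency, end-to-end latency, and throughput could be computed from the per-token arrival timestamps of streamed responses. The `RequestTrace` structure and all function names are hypothetical, not part of Tejas's project; the agent helper assumes each call waits for the previous full response.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class RequestTrace:
    """One streamed request (hypothetical structure; times in seconds)."""
    sent_at: float            # when the request was sent
    token_times: list[float]  # arrival time of each output token, in order


def ttft(trace: RequestTrace) -> float:
    # Time to first token: what a human watching the stream notices most.
    return trace.token_times[0] - trace.sent_at


def mean_itl(trace: RequestTrace) -> float:
    # Inter-token latency: average gap between consecutive output tokens.
    gaps = [b - a for a, b in zip(trace.token_times, trace.token_times[1:])]
    return mean(gaps) if gaps else 0.0


def e2e_latency(trace: RequestTrace) -> float:
    # End-to-end latency: from sending the request until the last token arrives.
    return trace.token_times[-1] - trace.sent_at


def throughput(traces: list[RequestTrace]) -> float:
    # Aggregate output tokens per second over the window covering all requests.
    start = min(t.sent_at for t in traces)
    end = max(t.token_times[-1] for t in traces)
    return sum(len(t.token_times) for t in traces) / (end - start)


def agent_chain_latency(steps: list[RequestTrace]) -> float:
    # Assumption: an agent waits for each full response before its next call,
    # so its wall-clock wait is the sum of per-call end-to-end latencies.
    return sum(e2e_latency(s) for s in steps)


trace = RequestTrace(sent_at=0.0, token_times=[0.5, 0.6, 0.7, 0.8])
print(ttft(trace), mean_itl(trace), e2e_latency(trace), throughput([trace]))
```

For a chain of agent calls, a provider with a slightly worse TTFT but faster token generation can still finish the whole chain sooner, which is why the sum of end-to-end latencies is the more relevant number for agents.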

Biotech, Health, And Chemistry

🧬 Arc Virtual Cell Challenge winners announced - The goal of the Virtual Cell Challenge is to benchmark and reward models that can accurately and robustly predict single‑cell responses across diverse biological conditions using standardized datasets and multi-metric evaluation. (🙏 Jeremie Kalfon, Felix Raimundo)

🔥 Community’s take: The challenge drew criticism over its evaluation design. Many top submissions reportedly ranked highly by exploiting the scoring metric rather than by producing genuinely strong biological models, although the winning team does not appear to have gamed the metric. Members saw this outcome as unsurprising, since the challenge rewarded metric optimization; the criticism targets the competition’s design rather than any specific team. Overall, the discussion frames the results as a case of Goodhart’s law and flawed benchmarks, not cheating or misconduct by the winner.

💽 ASD: Antigen-Specific Antibody Database released - New curated dataset contains 1,097,946 unique antibody-antigen interaction records collected from 25 datasets across 15 sources, including 865,153 unique antibodies and 716,650 complete heavy and light chain pairs, with 9,575 unique antigens represented. (🙏 Sophie Monnier @ InstaDeep X-IA)

👁️ Image, Video & 3D

🌅 Saber framework for reference-to-video generation - Meta AI introduced a scalable zero-shot framework trained exclusively on video-text pairs, bypassing the bottleneck of explicit reference image-video-text triplets. The system generates videos that align with reference images while following text prompts.

🧊 Radiance Meshes - Radiance Meshes represent radiance fields as Delaunay tetrahedral volume meshes that render exactly and in real time on standard GPU triangle pipelines. (🙏 Samuel McFadden @ Vid2Scene)

They achieve state-of-the-art speed per primitive, about 32% faster than 3D Gaussian Splatting at 1440p for the same number of primitives.
Circumcenter-based queries into a Zip‑NeRF–style neural field keep appearance smooth despite Delaunay edge flips during vertex optimization.
The volumetric meshes integrate with existing tools, supporting physics-based simulation, complex camera models (e.g., fisheye), and watertight surface mesh extraction.
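The circumcenter-based queries above rely on a standard geometric primitive: the point equidistant from a tetrahedron's four vertices. Below is a minimal pure-Python sketch of that computation, assuming the neural field is queried once per tetrahedron at this point; it is not the Radiance Meshes code, and any `field` lookup built on it would be a hypothetical placeholder.

```python
Vec3 = tuple[float, float, float]


def sub(p: Vec3, q: Vec3) -> Vec3:
    return (p[0] - q[0], p[1] - q[1], p[2] - q[2])


def dot(p: Vec3, q: Vec3) -> float:
    return p[0] * q[0] + p[1] * q[1] + p[2] * q[2]


def cross(p: Vec3, q: Vec3) -> Vec3:
    return (p[1] * q[2] - p[2] * q[1],
            p[2] * q[0] - p[0] * q[2],
            p[0] * q[1] - p[1] * q[0])


def circumcenter(a: Vec3, b: Vec3, c: Vec3, d: Vec3) -> Vec3:
    """Point equidistant from the four vertices of tetrahedron (a, b, c, d)."""
    u, v, w = sub(b, a), sub(c, a), sub(d, a)
    # 2 * scalar triple product = 12 * signed volume; zero for a flat tetrahedron.
    denom = 2.0 * dot(u, cross(v, w))
    if abs(denom) < 1e-12:
        raise ValueError("degenerate tetrahedron has no finite circumcenter")
    vw, wu, uv = cross(v, w), cross(w, u), cross(u, v)
    uu, vv, ww = dot(u, u), dot(v, v), dot(w, w)
    # Closed-form solution of 2u.x = |u|^2, 2v.x = |v|^2, 2w.x = |w|^2,
    # with x expressed relative to vertex a.
    return tuple(a[i] + (uu * vw[i] + vv * wu[i] + ww * uv[i]) / denom
                 for i in range(3))


print(circumcenter((0, 0, 0), (1, 0, 0), (0, 1, 0), (0, 0, 1)))  # (0.5, 0.5, 0.5)
```

In a renderer, the returned point would be passed to the appearance field, e.g. `color = field(circumcenter(*tet))` with a hypothetical `field`; a flat (degenerate) tetrahedron has no finite circumcenter, hence the error.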

🫟 MeshSplatting for opaque mesh rendering - MeshSplatting is a differentiable renderer that learns smooth, opaque triangle meshes from radiance fields for real-time, high-quality view synthesis. It trains faster and uses less memory than prior mesh-based methods while producing engine-ready meshes for interactive applications. (🙏 Samuel McFadden)

Trains about 2× faster and uses about half the memory of leading mesh-based baselines on Mip-NeRF360.
Outputs connected, opaque triangle meshes that plug directly into standard engines like Unity and Unreal for physics, ray tracing, and editing.
Achieves higher visual fidelity than prior mesh methods, with sharper details and fewer artifacts.

🔥 Community’s take w/ Samuel McFadden: New 3D reconstruction papers are coming out like crazy. This one also looks very promising. It generates an opaque mesh that can be directly dropped into game engines with minimal additional development. I might have to implement support for this in Vid2Scene.

🎞️ Mago AI video-to-video tool launched - A new, stable video-to-video AI tool for creators, VFX artists, and 3D artists who want control over chaos. It features Nano Banana Pro for precise edits, AI VFX pipeline integration, clean character replacement, and AI rendering that speeds up 3D workflows. Now open to everyone. (🙏 Leonard Strouk @ .Omics)

Cyber

😈 Critical React vulnerability React2Shell (CVE-2025-55182) - Wiz Research discovered a critical RCE vulnerability in React Server Components that is being exploited in the wild. It affects React and Next.js applications that use React Server Components, and organizations are urged to patch immediately.

Language Models

🧠 GPT-5.2 Pro achieves 90.5% on ARC-AGI-1 - ARC Prize verified that the new GPT-5.2 Pro (X-High) scores 90.5% on ARC-AGI-1 at $11.64/task, a ~390X efficiency improvement over the unreleased o3 (High), which scored 88% at $4.5k/task one year ago. However, community members noted that while GPT-5.2 performs well on ARC, it may not stand out on other standard benchmarks. (🙏 Kevin Kuipers)
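The ~390X figure follows directly from the two reported per-task costs. A quick check, using only the numbers quoted in the item (the $4.5k figure is itself rounded, so the ratio is approximate):

```python
def efficiency_gain(old_cost_per_task: float, new_cost_per_task: float) -> float:
    """How many times cheaper the new system is per task."""
    return old_cost_per_task / new_cost_per_task


# Reported: o3 (High) at ~$4.5k/task (88%), GPT-5.2 Pro (X-High) at $11.64/task (90.5%).
gain = efficiency_gain(4500.0, 11.64)
print(f"{gain:.1f}x")  # 386.6x, i.e. roughly the quoted ~390X
```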

🛰️ First LLM trained in space - Using an Nvidia H100 onboard the Starcloud-1 satellite, the team trained Andrej Karpathy’s nanoGPT model on the complete works of Shakespeare and successfully ran inference with it. They also ran inference on a preloaded Gemma model, marking a milestone for space-based AI computing. (🙏 Fabien Niel)

Also:

🧑‍💻 Devstral 2 and Mistral Vibe CLI released - Mistral announces Devstral 2, a high‑performing open-source coding model family, and Mistral Vibe CLI, a native terminal agent that can autonomously explore, modify, and orchestrate changes across entire codebases.

🗜️ Poetiq achieves ARC-AGI-2 SOTA at half the cost - Poetiq reports a 54% score on the ARC‑AGI‑2 Semi‑Private Test Set at about $30.57 per problem using a meta‑system that sits on top of Gemini 3, beating the previous best of 45% from Gemini 3 Deep Think at $77.16 per problem. Their key claim is that this new state‑of‑the‑art comes from a learned test‑time reasoning layer that orchestrates an off‑the‑shelf Gemini 3 model, without fine‑tuning the underlying frontier model itself. (🙏 Louis Choquel @ Pipelex)
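The item only says that Poetiq's gains come from a test-time layer orchestrating Gemini 3, so what follows is a generic sketch of one common pattern for ARC-style tasks, not Poetiq's method: propose candidate grid-transformation programs, keep those that reproduce every training pair, then vote on the test output. The hand-written lambdas stand in for hypothetical LLM-generated candidates.

```python
from collections import Counter
from typing import Callable, Optional

Grid = tuple[tuple[int, ...], ...]  # tuples are hashable, so outputs can be counted
Program = Callable[[Grid], Grid]


def safe_apply(program: Program, grid: Grid) -> Optional[Grid]:
    # Proposed programs may crash on some inputs; treat a crash as a failed check.
    try:
        return program(grid)
    except Exception:
        return None


def solve_task(train_pairs: list[tuple[Grid, Grid]], test_input: Grid,
               candidates: list[Program]) -> Optional[Grid]:
    # Verify: keep only programs that reproduce every training input/output pair.
    survivors = [p for p in candidates
                 if all(safe_apply(p, x) == y for x, y in train_pairs)]
    # Vote: the most common prediction on the test input among survivors wins.
    preds = [safe_apply(p, test_input) for p in survivors]
    votes = Counter(out for out in preds if out is not None)
    return votes.most_common(1)[0][0] if votes else None


# Hand-written stand-ins for hypothetical LLM-proposed programs.
identity = lambda g: g
flip_rows = lambda g: tuple(row[::-1] for row in g)
flip_cols = lambda g: g[::-1]
broken = lambda g: g[99]  # raises IndexError on small grids

train = [(((1, 2), (3, 4)), ((2, 1), (4, 3)))]
print(solve_task(train, ((5, 6, 7),), [identity, flip_rows, flip_cols, broken]))  # ((7, 6, 5),)
```

The appeal of this kind of layer is that the base model is never retrained: cheap deterministic checks against the training pairs filter out wrong proposals before any answer is submitted.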

MLOps

⫚ R-Fork accelerates model weight loading - LMSYS introduced R-Fork, a technique that makes spinning up additional replicas of an already-running model much faster and cheaper in startup time and I/O, without extra memory copies on the source side, making it more efficient to scale out model deployments. (🙏 Kevin Kuipers)

Programming

👩‍🎨 Cursor introduces visual design mode - Cursor released functionality allowing developers to design directly in their codebase by selecting elements, modifying them visually, and having Cursor write the code. The feature integrates visual design tools with code generation in a single workflow. (🙏 Sacha Morard @ Edgee, Kevin Kuipers, Robert Hommes @ Moyai)

🔥 Community’s take: Still a lot to try, but members are very impressed so far. For Robert, “once it became clear that Playwright was the most used MCP, this made a ton of sense.”

💬 Discussion about AI coding setup - Kasra Aliyon shared an optimized AI coding setup built around a large rule set:

We have a large set of rules, roughly 2,000 lines of instructions split across multiple files.
The rules cover (a) architecture information, (b) best coding practices, and (c) business logic.
We initially included all rules in every request, which caused excessive token consumption and context poisoning. We then moved to directory-specific rules, but this proved insufficient: some rules, such as database constraints, are still required when modifying UI logic, and token usage remained high.

Robert Hommes suggested exploring alternative approaches such as Quibbler to better manage and iterate on large rule sets. In terms of structure, Kasra noted that simple hierarchical organization using headings works well, and that older, more rigid or HTML-like formatting conventions are no longer necessary. (🙏 Kasra Aliyon, Robert Hommes)
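One way to attack the problem Kasra describes (database constraints needed even when editing UI code) is to trigger rules by topic as well as by file path, under a token budget. The sketch below is hypothetical: `Rule`, its fields, the priority tiers, and the 4-characters-per-token estimate are illustrative assumptions, not Kasra's setup or Quibbler's behavior.

```python
from dataclasses import dataclass, field
from fnmatch import fnmatch
from typing import Optional


@dataclass
class Rule:
    name: str
    text: str
    globs: list[str] = field(default_factory=list)  # paths that trigger the rule
    topics: set[str] = field(default_factory=set)   # concepts that trigger it from any path
    always: bool = False                            # e.g. core architecture notes


def estimate_tokens(text: str) -> int:
    # Rough heuristic (assumption): about 4 characters per token.
    return len(text) // 4


def tier(rule: Rule, touched_files: list[str], task_topics: set[str]) -> Optional[int]:
    # Lower tier = higher priority; None = not relevant to this request.
    if rule.always:
        return 0
    if any(fnmatch(f, g) for f in touched_files for g in rule.globs):
        return 1
    if rule.topics & task_topics:
        return 2
    return None


def select_rules(rules: list[Rule], touched_files: list[str],
                 task_topics: set[str], token_budget: int) -> list[str]:
    ranked = [(t, r) for r in rules
              if (t := tier(r, touched_files, task_topics)) is not None]
    ranked.sort(key=lambda pair: pair[0])  # stable: original order kept within a tier
    picked, used = [], 0
    for _, rule in ranked:
        cost = estimate_tokens(rule.text)
        if used + cost <= token_budget:  # skip rules that would exceed the budget
            picked.append(rule.name)
            used += cost
    return picked


rules = [
    Rule("architecture", "A" * 400, always=True),
    Rule("ui-style", "B" * 200, globs=["src/ui/*"]),
    Rule("db-constraints", "C" * 200, topics={"orders-table"}),
    Rule("billing", "D" * 200, globs=["src/billing/*"]),
]
print(select_rules(rules, ["src/ui/OrderForm.tsx"], {"orders-table"}, token_budget=500))
# ['architecture', 'ui-style', 'db-constraints']
```

Editing a UI file that touches the orders table pulls in the database constraints through the topic match, while unrelated billing rules stay out of the context.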

Reinforcement Learning

🎓 Stanford Deep RL 2025 course online - Lecture videos from Chelsea Finn’s Stanford Deep RL course (Spring 2025) are now available on YouTube, offering a full course on deep reinforcement learning. (🙏 Sophie Monnier)

🧠 Alibaba introduces SAPO reinforcement learning method - The Tongyi Qwen team at Alibaba introduced Soft Adaptive Policy Optimization (SAPO), a reinforcement learning method for LLMs that improves training stability by adaptively adjusting update magnitudes instead of relying on rigid (hard) clipping. (🙏 Margaux Wehr)
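To illustrate the difference between hard clipping and a soft, adaptive alternative, here is a sketch of per-token gradient weights as a function of the importance ratio r. The PPO-style clipped weight is standard; the sigmoid-derivative gate and its temperature `tau` are illustrative assumptions, not the exact formula from the SAPO paper.

```python
import math


def ppo_clip_grad_weight(ratio: float, advantage: float, eps: float = 0.2) -> float:
    """Gradient of min(r*A, clip(r, 1-eps, 1+eps)*A) w.r.t. r, per unit of A.

    Hard clipping: full weight inside the trust region, exactly zero once the
    clipped branch takes over (r too high for A > 0, too low for A < 0)."""
    if advantage >= 0 and ratio > 1 + eps:
        return 0.0
    if advantage < 0 and ratio < 1 - eps:
        return 0.0
    return 1.0


def soft_gate_grad_weight(ratio: float, tau: float = 10.0) -> float:
    """Illustrative smooth gate (assumption, not SAPO's published formula).

    Uses 4 * s * (1 - s) with s = sigmoid(tau * (r - 1)): equals 1 at r = 1 and
    decays smoothly as the ratio drifts, instead of dropping to zero at a hard edge."""
    s = 1.0 / (1.0 + math.exp(-tau * (ratio - 1.0)))
    return 4.0 * s * (1.0 - s)


for r in (0.7, 0.9, 1.0, 1.1, 1.3):
    print(r, ppo_clip_grad_weight(r, advantage=1.0), round(soft_gate_grad_weight(r), 3))
```

With a positive advantage, the hard clip gives full weight at r = 0.7 but zero at r = 1.3, while the soft gate assigns small, nonzero weights to both, so off-policy tokens are down-weighted gradually rather than switched off abruptly.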

Robotic, World AI

🔬 Google DeepMind to open robotic AI lab in UK - Google DeepMind announced plans to open a robotic AI lab in the UK focused on discovering new materials, expanding its physical AI research capabilities.

🧠 RL + Robotics dominates NeurIPS 2025 - Analysis from NeurIPS 2025 showed that Reinforcement Learning + Robotics has become the largest and fastest-rising field at the world’s leading AI conference, driven by massive breakthroughs in embodied AI, autonomous agents, and next-gen robotics platforms, challenging the perception that AI is all about LLMs and GenAI. (🙏 Sophie Monnier)

TTY Weekly
