Showing posts with label Entropulse. Show all posts
Showing posts with label Entropulse. Show all posts

22.8.25

ComputerRL scales online RL for “desktop agents,” unifying APIs and GUIs

 The next wave of computer-use agents won’t just click around UIs—they’ll mix API calls and GUI interaction in one policy. That’s the bet behind ComputerRL, a new framework that treats desktop work as an end-to-end reinforcement learning problem and introduces an API-GUI paradigm so agents can call services and operate human-oriented interfaces within the same loop. 

The missing infrastructure for scale

Training desktop agents with online RL has been hamstrung by slow, brittle environments. ComputerRL ships a distributed RL stack that orchestrates thousands of parallel virtual desktops, making long-horizon, on-policy training runs practical for general computer use. 

Stabilizing long runs: Entropulse

Pure RL on complex desktops tends to collapse exploration entropy over time. The authors propose Entropulse, a simple but effective schedule that alternates RL with supervised fine-tuning, restoring healthy entropy while retaining the gains from policy improvement. 

Results & models

Using open backbones (GLM-4-9B-0414 and Qwen2.5-14B), the team evaluates on OSWorld and reports 48.1% accuracy with AutoGLM-OS-9B, a new state of the art for general desktop automation in their setup. The framework underpins the group’s AutoGLM system.

Why it matters

  • Bridging the modality gap: Real workflows mix API calls with UI operations; ComputerRL trains a single policy to do both. 

  • Throughput for RL: Parallelized desktops unlock the scale online RL has needed for computer agents. 

  • Simple stability trick: Entropulse offers a practical recipe any lab can try to keep long runs from collapsing. 

If your roadmap includes agents that file expenses, reconcile sheets, or run web apps end-to-end, ComputerRL reads like a blueprint for turning brittle demos into trainable, scalable systems.

Paper link: arXiv 2508.14040 (PDF)

 Multi-agent frameworks can crush complex tasks—but they’re brittle, hand-engineered, and expensive to run. OPPO’s AI Agent team proposes a ...