Showing posts with label Qwen2.5-14B. Show all posts

22.8.25

ComputerRL scales online RL for “desktop agents,” unifying APIs and GUIs

 The next wave of computer-use agents won’t just click around UIs—they’ll mix API calls and GUI interaction in one policy. That’s the bet behind ComputerRL, a new framework that treats desktop work as an end-to-end reinforcement learning problem and introduces an API-GUI paradigm so agents can call services and operate human-oriented interfaces within the same loop. 
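To make the API-GUI idea concrete, here is a minimal sketch of what a unified action space might look like. This is an illustrative assumption, not the paper's actual interface: the class names (`ApiCall`, `GuiAction`) and the `execute` dispatcher are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Union

# Hypothetical unified action space: under an API-GUI paradigm, one policy
# can emit either a service call or a GUI primitive at each step.

@dataclass
class ApiCall:
    endpoint: str                      # e.g. "calendar.create_event"
    arguments: dict = field(default_factory=dict)

@dataclass
class GuiAction:
    kind: str                          # "click", "type", "scroll", ...
    x: int = 0
    y: int = 0
    text: str = ""

Action = Union[ApiCall, GuiAction]

def execute(action: Action) -> str:
    """Dispatch a single policy action to the appropriate backend."""
    if isinstance(action, ApiCall):
        return f"API {action.endpoint}({action.arguments})"
    return f"GUI {action.kind} at ({action.x}, {action.y})"
```

The point of the design is that the policy never has to commit to one modality: booking a meeting can be a single API call, while an app with no API falls back to clicks and keystrokes in the same episode.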

The missing infrastructure for scale

Training desktop agents with online RL has been hamstrung by slow, brittle environments. ComputerRL ships a distributed RL stack that orchestrates thousands of parallel virtual desktops, making long-horizon, on-policy training runs practical for general computer use. 
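The scaling trick is throughput: many environments stepping concurrently so the learner never starves. A toy sketch of that pattern, with stub "desktops" standing in for real virtual machines (the function names and reward logic here are invented for illustration):

```python
from concurrent.futures import ThreadPoolExecutor
import random

# Illustrative only: the real stack orchestrates thousands of virtual
# desktops. Here each "desktop" is a stub that returns a fake episode reward.

def rollout(env_id: int, seed: int = 0) -> tuple[int, float]:
    """Run one stub episode in environment `env_id` and return its reward."""
    rng = random.Random(seed + env_id)
    reward = sum(rng.random() for _ in range(10))  # stand-in for an episode
    return env_id, reward

def collect_batch(num_envs: int, seed: int = 0) -> list[tuple[int, float]]:
    """Collect one rollout per environment, running them in parallel."""
    with ThreadPoolExecutor(max_workers=32) as pool:
        return list(pool.map(lambda i: rollout(i, seed), range(num_envs)))
```

In a real system the threads would be replaced by remote workers driving OS-level VMs, but the shape is the same: fan out rollouts, gather trajectories, update the policy, repeat.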

Stabilizing long runs: Entropulse

Pure RL on complex desktops tends to collapse exploration entropy over time. The authors propose Entropulse, a simple but effective schedule that alternates RL with supervised fine-tuning, restoring healthy entropy while retaining the gains from policy improvement. 
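The alternation can be sketched as a toy training trace. This is an assumed shape of an Entropulse-style schedule, not the authors' code: entropy decays during RL phases and is partially restored by each SFT phase (the decay and restore constants are made up).

```python
def run_training(phases: int, rl_steps: int, sft_steps: int,
                 entropy0: float = 1.0) -> list[float]:
    """Toy trace of policy entropy under an alternating RL/SFT schedule."""
    entropy = entropy0
    trace = []
    for _ in range(phases):
        for _ in range(rl_steps):           # RL phase: entropy collapses
            entropy *= 0.999
            trace.append(entropy)
        entropy = min(entropy0, entropy + 0.3)  # SFT phase restores entropy
        for _ in range(sft_steps):
            trace.append(entropy)
    return trace
```

The SFT data in the actual recipe comes from successful trajectories gathered during the RL phases, so the restoration step does not throw away what the policy has learned.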

Results & models

Using open backbones (GLM-4-9B-0414 and Qwen2.5-14B), the team evaluates on OSWorld and reports a 48.1% success rate with AutoGLM-OS-9B, a new state of the art for general desktop automation in their setup. The framework underpins the group's AutoGLM system.

Why it matters

  • Bridging the modality gap: Real workflows mix API calls with UI operations; ComputerRL trains a single policy to do both. 

  • Throughput for RL: Parallelized desktops unlock the scale online RL has needed for computer agents. 

  • Simple stability trick: Entropulse offers a practical recipe any lab can try to keep long runs from collapsing. 

If your roadmap includes agents that file expenses, reconcile sheets, or run web apps end-to-end, ComputerRL reads like a blueprint for turning brittle demos into trainable, scalable systems.

Paper link: arXiv 2508.14040 (PDF)
