Wandering Nomad

8.7.25

DeepMesh makes artist-quality 3D meshes a one-click affair

Triangle-mesh modelling is the CAD world’s equivalent of hand-drawn in-betweens: essential, mind-numbing and painfully slow. A new paper out of Tsinghua University, NTU and ShengShu AI says it can hand that job to an LLM-sized transformer without melting your GPU.

The team’s framework, DeepMesh: Auto-Regressive Artist-mesh Creation with Reinforcement Learning, marries a clever compression trick with a dose of RLHF to crank out clean, editable topology directly from point clouds or images.

Why previous mesh LLMs hit the wall

Most auto-regressive mesh generators treat every vertex coordinate as a token. Feed them a high-poly model and the sequence balloons into tens of thousands of steps, torpedoing training stability and inference speed. Worse, their loss functions optimise geometry alone, so outputs pass numeric checks yet still look like Swiss cheese to artists.

Two upgrades, one big leap

Pillar	What they did	Why it matters
72 % shorter sequences	A hierarchical patch-based tokenization merges duplicate offsets and encodes connectivity inline, shrinking vertex strings by nearly three-quarters without dropping detail.	Cuts pre-training FLOPs and lets the model scale to 30 k-face meshes on a single A100.
Human-aligned RL	Collected 5 000 preference pairs scored with a hybrid of human rating and 3D metrics, then ran Direct Preference Optimization (DPO) on the base model.	Removes holes and stray faces while nudging topology toward “artist-grade” layouts.

The researchers also trimmed an 800 k-mesh corpus to a cleaner 500 k set, tamping down the loss spikes that plague raw WebGL scrapes.

Results: fewer faces, better faces

Up to 1 B parameters: two Hourglass-style transformer variants (500 M & 1 B) both converge in 100 k steps thanks to shorter sequences.
Topology wins: DeepMesh’s large model eliminates 90 % of non-manifold edges that slip through MeshGPT and Nautilus, according to the authors’ “topology-valid” metric.
Visual quality: crowd-sourced raters picked DeepMesh over MeshGPT by 68 % on identical point-cloud prompts (exact numbers in paper’s Sec. 4.3).
Speed: a full 30 k-face generation takes ≈10 min, versus 20–25 min for LoRA-fine-tuned diffusion baselines reported in prior work.

A public demo gallery already shows clean Watertight dragons, furniture and stylised characters rendered straight from sparse point clouds.

Why this is bigger than 3D fan art

Game studios, AR platforms and online-creator tools alike are sitting on troves of unoptimised 3D scans. A transformer that understands connectivity as well as shape could batch-convert those scans into lightweight, animation-ready assets—no retopology pass required. And because DeepMesh’s DPO loop is “just” another RLHF recipe, the same pipeline could teach a mesh LLM brand-specific style or IP-safe anatomy without touching the base weights.

The authors hint at scaling past one billion parameters and adding text-conditioned generation. Given how fast 3D GenAI is snowballing, don’t bet against DeepMesh—or its tokenization trick—showing up in the next wave of text-to-world engines.

Paper link: arXiv 2503.15265 (PDF)

7.7.25

ARAG puts a multi-agent brain inside your RAG stack — and Walmart’s numbers look eye-popping

Retrieval-augmented generation (RAG) has become the go-to recipe for giving large language models real-world context, but most deployments still treat retrieval as a dumb, one-shot lookup. Researchers at Walmart Global Tech think that leaves serious money on the table — especially in e-commerce, where user intent shifts by the minute. Their new framework, ARAG (Agentic Retrieval-Augmented Generation), adds a four-agent reasoning layer on top of vanilla RAG and reports double-digit gains across every metric that matters.

Four specialists, one conversation

User-Understanding Agent distills long-term history and the current session into a natural-language profile.
NLI Agent performs sentence-level entailment to see whether each candidate item actually supports that intent.
Context-Summary Agent compresses only the NLI-approved evidence into a focused prompt.
Item-Ranker Agent fuses all signals and produces the final ranked list.

Each agent writes to — and reads from — a shared blackboard-style memory, so later agents can reason over earlier rationales rather than raw text alone.

How much better? Try 42 %

On three Amazon Review subsets (Clothing, Electronics, Home), ARAG beats both a recency heuristic and a strong cosine-similarity RAG baseline:

Dataset	NDCG@5 ↑	Hit@5 ↑
Clothing	+42.1 %	+35.5 %
Electronics	+37.9 %	+30.9 %
Home & Kitchen	+25.6 %	+22.7 %

An ablation test shows that yanking either the NLI or context-summary modules knocks as much as 14 points off NDCG, underlining how critical cross-agent reasoning is to the win.

Why it matters

Personalization that actually reasons. By turning retrieval and ranking into cooperative LLM agents, ARAG captures the nuance of why an item fits, not just whether embeddings are close.
No model surgery required. The team wraps any existing RAG stack; there’s no need to fine-tune the base LLM, making the upgrade cloud-budget friendly.
Explainability for free. Each agent logs its own JSON-structured evidence, giving product managers a breadcrumb trail for every recommendation.

The bigger picture

Agentic pipelines have taken off in code generation and web browsing; ARAG shows the same trick pays dividends in recommender systems, a multi-billion-dollar battleground where percent-level lifts translate into real revenue. Expect retailers and streaming platforms to test-drive multi-agent RAG as they chase post-cookie personalization.

Paper link: arXiv 2506.21931 (PDF)

6.7.25

LangGraph Rollout: how VeRL leveled-up multi-turn Agent RL

Why this matters

If you’ve ever tried to train an LLM-powered agent with many tool calls spread across a genuine back-and-forth conversation, you’ve probably discovered that “multi-turn” means different things to different frameworks. Yanbin Jiang’s latest post shows how the VeRL team punched through that ceiling by grafting LangGraph directly onto VeRL’s reinforcement-learning rollout engine. The result is a training loop that speaks the same language as production code.

1. Where they started

Native VeRL multi-turn – great for quick experiments. You enable multi_turn: True, write a YAML schema for each tool, implement an async Python class, and you’re off; their GSM8K benchmark ran in two days.
Pain points
1. Double bookkeeping: every tool had to be declared twice (YAML + Python).
2. Drift: schema and code fell out of sync, and prod tools (written for LangChain/LangGraph) diverged from the “training” clones.

2. A quick stop-gap: automatic tool wrapping

Yanbin added BaseTool.from_callable(), which introspects any plain Python function with transformers.utils.get_json_schema, then fabricates a VeRL-compatible wrapper on the fly. One list of callables (tool_list = [multiply, add, …]) now powers both training and prod.

My dev take: this is the same pattern I use in LangChain when I decorate business logic with @tool. Nice to see VeRL admit “if you can’t beat reflection, join it.”

3. The real blocker: orchestration power

Research quickly outgrew VeRL’s built-in rollout:

Need	Why VeRL fell short
Dynamic branches & backtracking	Native graph was too rigid.
True multi-turn dialogue (user follow-ups)	Any assistant message without tool calls ended the convo.
Per-node sampling / chat-template tweaks	Global settings only.

Enter LangGraph: a lightweight DAG engine already shipping in production.

4. Architectural insight: separation of concerns

“Let VeRL manage actor weights & hardware; let LangGraph drive the conversation.”

So they built a LangChain-compatible chat-model client for VeRL’s SGLang server. Training now works like this:

VeRL hands the initial messages + model handle to the user’s LangGraph.
The graph does its thing—branching, retrying, invoking tools—using the exact actor weights being optimized.
When the graph stops, VeRL collects the message history and rewards.

The PR shows a seven-line YAML snippet that swaps the old rollout for:

yaml
multi_turn:
  chat_template_kwargs: {enable_thinking: false}
  langgraph:
    path: /path/to/graph.py
    graph_config: {recursion_limit: 100}

…and a 60-line example graph that binds tools, counts turns, and lets you vary temperature node-by-node.

5. Why I’m excited

One graph to rule them all – deployment and training share code; no more “but it worked in prod!”
Easier ablations – want to test a new branch strategy? Edit the graph script; RL pipeline stays untouched.
Framework-agnostic future – the same bridge pattern could plug VeRL into OpenAI Function Calling, Microsoft’s AutoGen, or whatever framework wins next year.

My takeaway

VeRL just became a lot more attractive for serious agent RL work. By leaning on LangGraph instead of extending an in-house orchestration DSL, the team keeps VeRL laser-focused on fast rollouts, leaves graph logic to a dedicated library, and—crucially—lets devs iterate on one codebase. If you’re juggling duplicate tool definitions or fighting mismatch between training and production, clone Yanbin’s PR and breathe easier.

Explore it more here: https://jybsuper.github.io/posts/langgraph_rollout/