Showing posts with label Tool Orchestration. Show all posts
Showing posts with label Tool Orchestration. Show all posts

8.7.25

Context Engineering in AI: Designing the Right Inputs for Smarter, Safer Large-Language Models

 

What Is Context Engineering?

In classic software, developers write deterministic code; in today’s AI systems, we compose contexts. Context engineering is the systematic craft of designing, organizing and manipulating every token fed into a large-language model (LLM) at inference time—instructions, examples, retrieved documents, API results, user profiles, safety policies, even intermediate chain-of-thought. Well-engineered context turns a general model into a domain expert; poor context produces hallucinations, leakage or policy violations. 


Core Techniques

TechniqueGoalTypical Tools / Patterns
Prompt Design & TemplatesGive the model clear role, task, format and constraintsSystem + user role prompts; XML / JSON schemas; function-calling specs
Retrieval-Augmented Generation (RAG)Supply fresh, external knowledge just-in-timeVector search, hybrid BM25+embedding, GraphRAG
Context CompressionFit more signal into limited tokensSummarisation, saliency ranking, LLM-powered “short-former” rewriters
Chunking & WindowingPreserve locality in extra-long inputsHierarchical windows, sliding attention, FlashMask / Ring Attention
Scratchpads & CoT ScaffoldsExpose model reasoning for better accuracy and debuggabilitySelf-consistency, tree-of-thought, DST (Directed Self-Testing)
Memory & ProfilesPersonalise without retrainingVector memories, episodic caches, preference embeddings
Tool / API ContextLet models call and interpret external systemsModel Context Protocol (MCP), JSON-schema function calls, structured tool output
Policy & GuardrailsEnforce safety and brand styleContent filters, regex validators, policy adapters, YAML instruction blocks

Why It Matters

  1. Accuracy & Trust – Fact-filled, well-structured context slashes hallucination rates and citation errors.

  2. Privacy & Governance – Explicit control over what leaves the organisation or reaches the model helps meet GDPR, HIPAA and the EU AI Act.

  3. Cost Efficiency – Compressing or caching context can cut token bills by 50-80 %.

  4. Scalability – Multi-step agent systems live or die by fast, machine-readable context routing; good design tames complexity.


High-Impact Use Cases

SectorHow Context Engineering Delivers Value
Customer SupportRAG surfaces the exact policy paragraph and recent ticket history, enabling a single prompt to draft compliant replies.
Coding AgentsFunction-calling + repository retrieval feed IDE paths, diffs and test logs, letting models patch bugs autonomously.
Healthcare Q&AContext filters strip PHI before retrieval; clinically-approved guidelines injected to guide safe advice.
Legal AnalysisLong-context models read entire case bundles; chunk ranking highlights precedent sections for argument drafting.
Manufacturing IoTStreaming sensor data is summarised every minute and appended to a rolling window for predictive-maintenance agents.

Designing a Context Pipeline: Four Practical Steps

  1. Map the Task Surface
    • What knowledge is static vs. dynamic?
    • Which external tools or databases are authoritative?

  2. Define Context Layers
    Base prompt: role, format, policy
    Ephemeral layer: user query, tool results
    Memory layer: user or session history
    Safety layer: filters, refusal templates

  3. Choose Retrieval & Compression Strategies
    • Exact text (BM25) for short policies; dense vectors for semantic match
    • Summaries or selective quoting for large PDFs

  4. Instrument & Iterate
    • Log token mixes, latency, cost
    • A/B test different ordering, chunking, or reasoning scaffolds
    • Use self-reflection or eval suites (e.g., TruthfulQA-Context) to measure gains


Emerging Tools & Standards

  • MCP (Model Context Protocol) – open JSON schema for passing tool output and trace metadata to any LLM, adopted by Claude Code, Gemini CLI and IBM MCP Gateway.

  • Context-Aware Runtimes – vLLM, Flash-Infer and Infinity Lite stream 128 K-1 M tokens with optimized KV caches.

  • Context Observability Dashboards – Startups like ContextHub show token-level diff, attribution and cost per layer.


The Road Ahead

As context windows expand to a million tokens and multi-agent systems proliferate, context engineering will sit alongside model training and fine-tuning as a first-class AI discipline. Teams that master it will ship assistants that feel domain-expert-smart, honest and cost-efficient—while everyone else will chase unpredictable black boxes.

Whether you’re building a retrieval chatbot, a self-healing codebase or an autonomous research agent, remember: the model is only as good as the context you feed it.

6.7.25

LangGraph Rollout: how VeRL leveled-up multi-turn Agent RL

 

Why this matters

If you’ve ever tried to train an LLM-powered agent with many tool calls spread across a genuine back-and-forth conversation, you’ve probably discovered that “multi-turn” means different things to different frameworks. Yanbin Jiang’s latest post shows how the VeRL team punched through that ceiling by grafting LangGraph directly onto VeRL’s reinforcement-learning rollout engine. The result is a training loop that speaks the same language as production code. 


1. Where they started

  • Native VeRL multi-turn – great for quick experiments. You enable multi_turn: True, write a YAML schema for each tool, implement an async Python class, and you’re off; their GSM8K benchmark ran in two days. 

  • Pain points

    1. Double bookkeeping: every tool had to be declared twice (YAML + Python).

    2. Drift: schema and code fell out of sync, and prod tools (written for LangChain/LangGraph) diverged from the “training” clones. 


2. A quick stop-gap: automatic tool wrapping

Yanbin added BaseTool.from_callable(), which introspects any plain Python function with transformers.utils.get_json_schema, then fabricates a VeRL-compatible wrapper on the fly. One list of callables (tool_list = [multiply, add, …]) now powers both training and prod. 

My dev take: this is the same pattern I use in LangChain when I decorate business logic with @tool. Nice to see VeRL admit “if you can’t beat reflection, join it.”


3. The real blocker: orchestration power

Research quickly outgrew VeRL’s built-in rollout:

NeedWhy VeRL fell short
Dynamic branches & backtrackingNative graph was too rigid.
True multi-turn dialogue (user follow-ups)Any assistant message without tool calls ended the convo.
Per-node sampling / chat-template tweaksGlobal settings only.

Enter LangGraph: a lightweight DAG engine already shipping in production.

4. Architectural insight: separation of concerns

“Let VeRL manage actor weights & hardware; let LangGraph drive the conversation.” 

So they built a LangChain-compatible chat-model client for VeRL’s SGLang server. Training now works like this:

  1. VeRL hands the initial messages + model handle to the user’s LangGraph.

  2. The graph does its thing—branching, retrying, invoking tools—using the exact actor weights being optimized.

  3. When the graph stops, VeRL collects the message history and rewards. 

The PR shows a seven-line YAML snippet that swaps the old rollout for:

yaml
multi_turn:
chat_template_kwargs: {enable_thinking: false} langgraph: path: /path/to/graph.py graph_config: {recursion_limit: 100}

…and a 60-line example graph that binds tools, counts turns, and lets you vary temperature node-by-node. 


5. Why I’m excited

  • One graph to rule them all – deployment and training share code; no more “but it worked in prod!”

  • Easier ablations – want to test a new branch strategy? Edit the graph script; RL pipeline stays untouched.

  • Framework-agnostic future – the same bridge pattern could plug VeRL into OpenAI Function Calling, Microsoft’s AutoGen, or whatever framework wins next year.


My takeaway

VeRL just became a lot more attractive for serious agent RL work. By leaning on LangGraph instead of extending an in-house orchestration DSL, the team keeps VeRL laser-focused on fast rollouts, leaves graph logic to a dedicated library, and—crucially—lets devs iterate on one codebase. If you’re juggling duplicate tool definitions or fighting mismatch between training and production, clone Yanbin’s PR and breathe easier.

Explore it more here: https://jybsuper.github.io/posts/langgraph_rollout/ 

 If you’ve wondered whether general-purpose LLMs can truly reason across medical text and images, a new study out of Emory University says ...