Showing posts with label survey.

1.9.25

Self-evolving AI agents: from static LLMs to systems that learn on the job

 Agent frameworks are great at demo day, brittle in the wild. A sweeping new survey argues the fix isn’t a bigger model but a new self-evolving paradigm: agents that keep improving after deployment using the data and feedback their work naturally produces. The paper pulls scattered ideas under one roof and offers a playbook for researchers and startups building agents that won’t ossify after v1.0. 

The big idea: turn agents into closed-loop learners

The authors formalize a feedback loop with four moving parts—System Inputs, the Agent System, the Environment, and Optimisers—and show how different research threads plug into each stage. Think: collecting richer traces from real use (inputs), upgrading skills or tools (agent system), instrumenting the app surface (environment), and choosing the learning rule (optimisers). 
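
To make the loop concrete, here is a minimal Python sketch of how the four components could be wired together. The class names and the promote-successful-traces-to-skills rule are illustrative assumptions, not the paper's API.

```python
# Minimal sketch of the survey's four-part loop: System Inputs -> Agent System
# -> Environment -> Optimiser. Names and the learning rule are illustrative.
from dataclasses import dataclass
from typing import List

@dataclass
class Trace:
    task: str
    actions: List[str]
    reward: float                     # success/failure signal from the environment

class AgentSystem:
    def __init__(self):
        self.skills = {}              # evolvable skill library: task -> callable

    def act(self, task: str) -> List[str]:
        # placeholder policy: apply every known skill, else make a bare attempt
        return [skill(task) for skill in self.skills.values()] or [f"attempt:{task}"]

class Environment:
    def evaluate(self, actions: List[str]) -> float:
        # stand-in for a verifiable check (exact match, executable tests, ...)
        return 1.0 if actions else 0.0

class Optimiser:
    def update(self, agent: AgentSystem, traces: List[Trace]) -> None:
        # learning rule: promote behaviour from successful traces into the skill library
        for t in traces:
            if t.reward > 0.5:
                agent.skills[t.task] = lambda x, t=t: f"reuse({t.task}) -> {x}"

def closed_loop(tasks, agent, env, opt) -> List[Trace]:
    traces = []
    for task in tasks:                      # System Inputs
        actions = agent.act(task)           # Agent System
        reward = env.evaluate(actions)      # Environment
        traces.append(Trace(task, actions, reward))
    opt.update(agent, traces)               # Optimiser closes the loop
    return traces

agent = AgentSystem()
closed_loop(["summarise report", "fix failing test"], agent, Environment(), Optimiser())
print(agent.skills.keys())                  # skills grow from the agent's own usage data
```

In practice the Optimiser might fine-tune weights, rewrite prompts, or edit a tool registry; the point of the framing is that every pass through the loop leaves the agent slightly different.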

A working taxonomy you can implement

Within that loop, the survey maps techniques you can mix and match:

  • Single-agent evolution: self-reflection, memory growth, tool discovery, skill libraries, meta-learning and planner refinements driven by interaction data (see the sketch after this list).

  • Multi-agent evolution: division-of-labour curricula, role negotiation, and team-level learning signals so collectives improve—not just individuals.

  • Domain-specific evolution: recipes tailored to biomedicine, programming, and finance, where optimization targets and constraints differ by field.
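
As a flavour of the single-agent branch, here is a hedged sketch of a reflection-plus-memory loop: fail, critique, remember, retry. `call_llm` and the prompt wording are hypothetical stand-ins, and the verifiable `check` is whatever signal your task provides.

```python
# Illustrative single-agent evolution loop: reflect on a failure, store the lesson
# in memory, and retry with the lesson in context.
from typing import Callable, List

def call_llm(prompt: str) -> str:
    # hypothetical stand-in for your model client
    raise NotImplementedError("plug in a real model call here")

def solve_with_reflection(task: str, check: Callable[[str], bool],
                          memory: List[str], max_tries: int = 3) -> str:
    answer = ""
    for _ in range(max_tries):
        lessons = "\n".join(memory[-5:])        # recent lessons as extra context
        answer = call_llm(f"Lessons so far:\n{lessons}\n\nTask: {task}")
        if check(answer):                       # verifiable signal, e.g. a unit test
            return answer
        # self-reflection: ask the model what to change, and grow memory with the critique
        critique = call_llm(f"The answer '{answer}' failed on task '{task}'. "
                            "State in one sentence what to do differently next time.")
        memory.append(critique)
    return answer
```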

Evaluation and safety don’t lag behind

The paper argues for verifiable benchmarks (exact-match tasks, executable tests, grounded web tasks) so improvements aren’t just prompt luck. It also centers safety and ethics: guarding against reward hacking, data poisoning, distribution shift, and privacy leaks that can arise when models learn from their own usage. 
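
For the "executable tests" style of verifiable benchmark, a minimal sketch might look like the following: running generated code against assertions in a subprocess turns "did it improve?" into an exit code. The file layout, test format, and timeout are assumptions for illustration.

```python
# Executable-test check: run candidate code plus its tests in a fresh process;
# exit code 0 means every assertion passed.
import os, subprocess, sys, tempfile, textwrap

def passes_tests(candidate_code: str, test_code: str, timeout: int = 10) -> bool:
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "solution.py")
        with open(path, "w") as f:
            f.write(candidate_code + "\n\n" + textwrap.dedent(test_code) + "\n")
        try:
            result = subprocess.run([sys.executable, path],
                                    capture_output=True, timeout=timeout)
        except subprocess.TimeoutExpired:
            return False
        return result.returncode == 0

# Example: a generated `add` is scored by assertions, not by prompt luck
candidate = "def add(a, b):\n    return a + b"
tests = "assert add(2, 3) == 5\nassert add(-1, 1) == 0"
print(passes_tests(candidate, tests))   # True only if every assertion holds
```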

Why this matters now

  • Static fine-tunes stagnate. Post-training once, shipping, and hoping for the best leaves quality on the table as tasks drift.

  • Logs are learning fuel. Structured traces, success/failure signals, and user edits are free gradients if you design the loop (see the logging sketch after this list).

  • From demos to durable systems. The framework gives teams a shared language to plan what to learn, when, and how to verify it—before flipping the “autonomous improvement” switch. 
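
A hedged sketch of the logging side, since none of this works without structured traces to learn from. The JSONL schema below (task, output, success, user_edit) is an illustrative choice, not a standard.

```python
# Append-only JSONL trace log so later optimisation steps have something to learn from.
import json, time
from typing import List, Optional

def log_trace(path: str, task: str, output: str, success: bool,
              user_edit: Optional[str] = None) -> None:
    record = {
        "ts": time.time(),
        "task": task,
        "output": output,
        "success": success,       # explicit success/failure signal
        "user_edit": user_edit,   # a user's correction doubles as supervision
    }
    with open(path, "a") as f:
        f.write(json.dumps(record) + "\n")    # cheap to replay later

def training_candidates(path: str) -> List[dict]:
    # keep rows that carry a usable signal: a success to reinforce or an edit to imitate
    with open(path) as f:
        rows = [json.loads(line) for line in f]
    return [r for r in rows if r["success"] or r["user_edit"]]
```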

If you’re building an assistant, coder, or web agent you expect to live for months, this survey is a pragmatic roadmap to keep it getting better—safely—long after launch.

Paper link: arXiv 2508.07407 (PDF)

16.8.25

“Speed Always Wins” is the field guide to building faster, cheaper LLMs

 Transformers scaled LLMs to jaw-dropping capabilities—but quadratic attention and ballooning KV caches are throttling real-world deployment. A new survey from Shanghai AI Lab, HKUST(GZ) and collaborators takes stock of what’s next, categorizing the ecosystem of efficient LLM architectures and where each shines. Think of it as a build sheet for teams trying to cut latency and cost without giving up quality. 

The efficiency playbook, in seven parts

  • Linear sequence modeling: from linearized attention to linear RNNs and state-space models that drop the KV cache and push complexity toward O(N) (see the sketch after this list).

  • Sparse sequence modeling: static, dynamic, and training-free sparsity to compute only the most useful token-token interactions. 

  • Efficient full attention: keep softmax attention but make it practical with IO-aware, grouped, mixture, and quantized attention variants. 

  • Sparse Mixture-of-Experts: routing, expert designs and MoE conversion to grow capacity without proportional FLOPs.

  • Hybrid architectures: inter-layer and intra-layer mixes that blend linear blocks with full attention for a better speed/quality trade-off. 

  • Diffusion LLMs: non-autoregressive generation, bridges back to AR, and early steps to extend diffusion approaches to multimodality. 

  • Beyond text: how these efficiency ideas transfer to vision, audio, and multimodal stacks. 
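
To ground the first bullet, here is a small numpy sketch of linearized attention: with a positive feature map, causal attention collapses into a running d×d state, so memory stays constant no matter how long the sequence grows. The elu+1 feature map is a common choice in the linear-attention literature; the code is an illustration, not any specific paper's method.

```python
# Linearized attention with a running state instead of a growing KV cache:
# O(N) time, O(1) memory per step.
import numpy as np

def phi(x):
    return np.where(x > 0, x + 1.0, np.exp(x))   # elu(x) + 1, keeps features positive

def linear_attention(Q, K, V):
    """Q, K: (N, d_k), V: (N, d_v); causal, processed one token at a time."""
    d_k, d_v = Q.shape[1], V.shape[1]
    S = np.zeros((d_k, d_v))          # running sum of phi(k) v^T -- replaces the KV cache
    z = np.zeros(d_k)                 # running sum of phi(k), for normalization
    out = np.zeros((Q.shape[0], d_v))
    for t in range(Q.shape[0]):
        q, k, v = phi(Q[t]), phi(K[t]), V[t]
        S += np.outer(k, v)
        z += k
        out[t] = (q @ S) / (q @ z + 1e-6)
    return out

# Toy usage: 1,000 tokens, 64-dim heads, constant state regardless of sequence length
N, d = 1000, 64
rng = np.random.default_rng(0)
y = linear_attention(rng.normal(size=(N, d)), rng.normal(size=(N, d)), rng.normal(size=(N, d)))
print(y.shape)   # (1000, 64)
```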

Why this matters now

Long-context patterns—RAG, agentic tool use, deliberate reasoning, and multimodal inputs—are pushing sequence lengths and memory pressure through the roof. The survey frames these usage patterns and argues that architectural efficiency, not just better prompts or hardware, is the lever that scales the next wave of applications. 

A roadmap, not just a reading list

Beyond taxonomy, the paper stitches trends into a blueprint: pick linear/sparse methods to kill KV bloat, use efficient-full-attention where fidelity matters, layer in MoE for capacity, and consider hybrids or diffusion LLMs where generation style allows. There’s also a companion GitHub “Awesome-Efficient-Arch” list to track the space as it moves. 
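
A toy example of what that blueprint could look like as a layer recipe: mostly linear mixers, periodic full attention, MoE feed-forwards for capacity. The 1-in-4 attention ratio, expert count, and field names are made-up defaults, not the survey's recommendation.

```python
# Illustrative inter-layer hybrid recipe: linear/SSM blocks for cheap long-context
# mixing, periodic full attention for fidelity, MoE FFNs for capacity.
from dataclasses import dataclass
from typing import List

@dataclass
class LayerSpec:
    mixer: str        # "linear" (SSM / linear attention) or "full_attention"
    ffn: str          # "dense" or "moe"
    num_experts: int = 0

def hybrid_stack(n_layers: int = 24, full_attn_every: int = 4, moe_every: int = 2) -> List[LayerSpec]:
    layers = []
    for i in range(n_layers):
        mixer = "full_attention" if (i + 1) % full_attn_every == 0 else "linear"
        ffn = "moe" if (i + 1) % moe_every == 0 else "dense"
        layers.append(LayerSpec(mixer, ffn, num_experts=8 if ffn == "moe" else 0))
    return layers

for i, spec in enumerate(hybrid_stack()[:6]):
    print(i, spec)     # inspect the first few layers of the recipe
```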

If you’re building agents that browse, reason and call tools all day—or multimodal systems juggling video and audio—this survey is a timely map of the fastest lanes through today’s LLM bottlenecks.

Paper link: arXiv 2508.09834 (PDF)

21.7.25

The rise of Context Engineering: why LLM performance now lives and dies on what you feed it

 Prompt tricks and vector databases used to feel like nice-to-have extras for chatbots. A sprawling new study argues they have matured into a discipline of their own. Titled “A Survey of Context Engineering for Large Language Models,” the 165-page report from the Chinese Academy of Sciences, UC Merced and seven other universities positions context selection, shaping and storage as the primary lever for squeezing more capability out of ever-larger models. The team sifted through 1,400-plus research papers to build the first comprehensive roadmap of the space.

From prompt hacks to a three-pillar stack

The authors split Context Engineering into three foundational components:

  1. Context retrieval & generation – everything from classic prompt templates to dynamic external-knowledge acquisition.

  2. Context processing – long-sequence handling, self-refinement loops and multimodal or structured context fusion.

  3. Context management – memory hierarchies, compression schemes and token-budget optimisation (see the packing sketch below).

These pillars support four dominant system archetypes: Retrieval-Augmented Generation (RAG), long-lived memory agents, tool-integrated reasoning (function calling, code execution) and fully fledged multi-agent frameworks.
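
As a concrete taste of pillar 3, here is a minimal token-budget packer: score context items, admit the highest-priority ones until the budget runs out, and truncate the last partial fit. The priority field and the 4-characters-per-token estimate are simplifying assumptions; a real system would use its tokenizer and compression scheme.

```python
# Token-budget context packing: highest-priority items first, truncate the overflow.
from dataclasses import dataclass
from typing import List

@dataclass
class ContextItem:
    text: str
    priority: float                      # e.g. recency * relevance score

def estimate_tokens(text: str) -> int:
    return max(1, len(text) // 4)        # rough heuristic; swap in a real tokenizer

def pack_context(items: List[ContextItem], budget_tokens: int) -> str:
    packed, used = [], 0
    for item in sorted(items, key=lambda i: i.priority, reverse=True):
        cost = estimate_tokens(item.text)
        if used + cost <= budget_tokens:
            packed.append(item.text)
            used += cost
        elif budget_tokens - used > 32:  # partial fit: keep a truncated head
            packed.append(item.text[: (budget_tokens - used) * 4] + "...")
            used = budget_tokens
    return "\n\n".join(packed)

ctx = pack_context([ContextItem("long retrieved doc " * 50, 0.9),
                    ContextItem("user profile note", 0.7),
                    ContextItem("stale chat history " * 100, 0.1)], budget_tokens=200)
print(estimate_tokens(ctx))              # stays at or under the budget
```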

Why the stakes keep rising

  • Bigger models, harsher limits. Even GPT-class contexts choke on enterprise-scale corpora; smarter pruning and compression decide whether answers stay on-topic or derail.

  • Agents need persistence. As LLM agents stretch across hours or days, hierarchical memory and context-refresh policies become as critical as the policy network itself.

  • Tool use explodes token demand. Function calls and code snippets are powerful but verbose; context engineering keeps them from crowding out the original question.

A looming research gap

Despite dramatic gains in understanding long and complex contexts, models remain weak at generating equally long, logically coherent outputs—a mismatch the survey brands the field’s “defining priority for future research.”

Practical takeaways for builders

  • Treat context like a first-class system resource—budget, cache and monitor it the way you would GPU memory.

  • Mix retrieval styles. Hybrid pipelines (keyword, dense, graph) outperform single-method RAG on complex queries (see the rank-fusion sketch after this list).

  • Plan for multi-layer memory. Short-term windows, episodic buffers and long-term stores each have distinct TTLs and compression trade-offs.
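
For the hybrid-retrieval point, reciprocal rank fusion is one standard way to merge keyword, dense, and graph result lists without calibrating their score scales; the retriever outputs below are invented for illustration.

```python
# Reciprocal rank fusion (RRF): merge ranked lists from several retrievers.
from collections import defaultdict
from typing import Dict, List

def reciprocal_rank_fusion(result_lists: List[List[str]], k: int = 60) -> List[str]:
    scores: Dict[str, float] = defaultdict(float)
    for results in result_lists:
        for rank, doc_id in enumerate(results):
            scores[doc_id] += 1.0 / (k + rank + 1)   # standard RRF weighting
    return sorted(scores, key=scores.get, reverse=True)

keyword = ["doc3", "doc1", "doc7"]
dense   = ["doc1", "doc4", "doc3"]
graph   = ["doc4", "doc1", "doc9"]
print(reciprocal_rank_fusion([keyword, dense, graph]))   # doc1 wins: ranked highly everywhere
```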

Published July 17 2025 with an accompanying GitHub “awesome list,” the survey is already circulating among infra and agent teams looking to squeeze more mileage out of existing checkpoints before the next trillion-parameter beast lands.

Paper link: arXiv 2507.13334 (PDF)

 Most “agent” papers either hard-code reflection workflows or pay the bill to fine-tune the base model. Memento offers a third path: keep t...