Showing posts with label knowledge graphs. Show all posts

1.9.25

MIRAGE: parallel GraphRAG turns test-time scaling into a team sport

 Most test-time scaling schemes still walk a single, linear chain of thought—great until an early mistake snowballs. MIRAGE (Multi-chain Inference with Retrieval-Augmented Graph Exploration) swaps that for many chains in parallel, each grounded in a medical knowledge graph and then cross-checked before answering. Think of it as ToT’s breadth, Search-o1’s retrieval, and GraphRAG’s structure—rolled into one pipeline. 

How it works (and why it’s different)

  • Entity-grounded decomposition. The system splits a clinical question into sub-questions tied to concrete entities (symptoms, diseases, treatments). Each sub-question spawns its own reasoning chain.

  • Graph-based retrieval, two modes.

    • Anchor mode: query the KG around a single entity (local neighborhood).

    • Bridge mode: search paths between entity pairs to surface multi-hop relations. 

  • Adaptive evidence streaming. Chains iteratively expand neighbors/multi-hop trails, keeping only deduplicated, directionally relevant facts. 

  • Cross-chain verification. An answer synthesizer reconciles sub-answers, prefers explanations backed by broader, independent chains, and normalizes clinical terms—cutting contradictions and hallucinations. Outputs are serialized with full provenance traces for audit. 
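The two retrieval modes can be sketched on a toy graph. This is a minimal illustration, not the paper's implementation: the triples, the function names (`anchor_retrieve`, `bridge_retrieve`, `dedupe`), and the `max_hops` parameter are all assumptions made for the example. The bridge search here also ignores edge direction, whereas the real system filters for directionally relevant facts.

```python
from collections import deque

# Toy knowledge graph as (head, relation, tail) triples. Illustrative only,
# not taken from the paper's medical KGs.
TRIPLES = [
    ("fever", "symptom_of", "influenza"),
    ("cough", "symptom_of", "influenza"),
    ("influenza", "treated_by", "oseltamivir"),
    ("cough", "symptom_of", "bronchitis"),
    ("oseltamivir", "may_cause", "nausea"),
]

def build_adjacency(triples):
    """Index triples by entity in both directions so paths can be walked either way."""
    adj = {}
    for head, rel, tail in triples:
        adj.setdefault(head, []).append((head, rel, tail))
        adj.setdefault(tail, []).append((head, rel, tail))
    return adj

def anchor_retrieve(adj, entity):
    """Anchor mode: return the one-hop neighborhood of a single entity."""
    return list(adj.get(entity, []))

def bridge_retrieve(adj, source, target, max_hops=3):
    """Bridge mode: breadth-first search for a shortest chain of triples linking two entities."""
    queue = deque([(source, [])])
    visited = {source}
    while queue:
        node, path = queue.popleft()
        if node == target:
            return path
        if len(path) >= max_hops:
            continue  # do not expand beyond the hop budget
        for triple in adj.get(node, []):
            head, _, tail = triple
            nxt = tail if head == node else head
            if nxt not in visited:
                visited.add(nxt)
                queue.append((nxt, path + [triple]))
    return None  # no path within max_hops

def dedupe(facts):
    """Keep the first occurrence of each triple, preserving order."""
    seen = set()
    unique = []
    for fact in facts:
        if fact not in seen:
            seen.add(fact)
            unique.append(fact)
    return unique

adj = build_adjacency(TRIPLES)
local_facts = anchor_retrieve(adj, "influenza")   # anchor mode
path = bridge_retrieve(adj, "fever", "nausea")    # bridge mode
evidence = dedupe(local_facts + (path or []))     # merged, deduplicated evidence
print(evidence)
```

Breadth-first search is used here because it returns a shortest connecting path first, which keeps the multi-hop evidence compact. A production system would more likely push this query down to the graph database.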

Benchmarks: consistent wins over strong baselines

Evaluated on GenMedGPT-5k, CMCQA, and ExplainCPE (with paired medical KGs), MIRAGE tops GPT-4o, GPT-4o+ToT, QwQ-32B, MindMap (GraphRAG), and Search-o1 on GPT-4o ranking, accuracy, or both. Highlights:

  • GenMedGPT-5k: best GPT-4o rank 1.8 (lower is better). 

  • CMCQA: rank 2.8, edging ToT, MindMap, and Search-o1. 

  • ExplainCPE: 84.8% accuracy vs GPT-4o 77.8%, Search-o1 80.7%, MindMap 84.6%.
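The post does not spell out how the rank metric is computed. Assuming it is the mean position a judge model assigns each system across questions, the arithmetic looks like the sketch below. The system names and rankings are made-up placeholders, not results from the paper.

```python
from statistics import mean

# Hypothetical judge output: for each question, systems ordered best to worst.
# These rankings are placeholders for illustration only.
rankings = [
    ["system_a", "system_b", "system_c"],
    ["system_b", "system_a", "system_c"],
    ["system_a", "system_c", "system_b"],
    ["system_a", "system_b", "system_c"],
]

def average_rank(rankings):
    """Mean 1-based position of each system across all questions (lower is better)."""
    positions = {}
    for order in rankings:
        for position, system in enumerate(order, start=1):
            positions.setdefault(system, []).append(position)
    return {system: mean(pos) for system, pos in positions.items()}

avg = average_rank(rankings)
print(avg)
```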

Swapping the backbone to DeepSeek-R1-32B preserves the lift (ExplainCPE 84.4%), suggesting MIRAGE is model-agnostic. In a human study on GenMedGPT-5k, evaluators preferred MIRAGE over all baselines, mirroring GPT-4o’s ranking.

What moved the needle

  • Structured retrieval beats flat text. Graph-aware exploration is more stable than BM25/dense retrieval and less noisy than web-first Search-o1 on medical tasks. 

  • Right-sizing the knobs. Increasing the decomposition threshold (Nq) and retrieval depth (Nr) improves rank/accuracy up to a point—useful guidance for real deployments. 

  • Ablations matter. Removing the Question Decomposer or Answer Synthesizer drops win rates in GPT-4o pairwise tests, confirming both stages carry weight. 

Why it matters

Linear chains waste compute on dead ends. MIRAGE parallelizes exploration, grounds every claim in KG paths, and verifies across chains before answering, which are exactly the traits clinicians and auditors want. The approach is plug-and-play with modern LRMs (QwQ-32B, DeepSeek-R1) and slots cleanly into safety-critical, knowledge-heavy domains beyond medicine.

Paper link: arXiv 2508.18260 (PDF)

21.7.25

Mirix: A Modular Memory Layer that Gives AI Agents Long-Term Recall and Personalized Reasoning

 

1 | Why “Memory” Is the Next AI Bottleneck

Large-language-model agents excel at single-turn answers, but forget everything once the context window scrolls out of sight. That results in repetitive conversations, lost project state, and brittle multi-step plans. Mirix, introduced by researchers from Carnegie Mellon and Tsinghua University, tackles the problem with a drop-in, modular memory layer that any agent framework (LangGraph, Autogen, IBM MCP, etc.) can call.


2 | How Mirix Works under the Hood

| Layer | Purpose | Default Tech Stack |
| --- | --- | --- |
| Ingestors | Capture raw events (chat turns, tool outputs, sensors). | Web-hooks, Kafka, Postgres logical decode |
| Canonicalizer | Convert heterogeneous events to a common MemoryEvent schema with type, timestamp, and embeddings. | Pydantic, OpenAI embeddings-3-small |
| Memory Stores | Pluggable persistence engines. Ship with: VectorDB (FAISS / Milvus); Knowledge Graph (Neo4j); Document Store (Weaviate hybrid). | Drivers for each |
| Retrievers | Route agent queries to the right store; merge and de-dupe results; compress into 2-3k tokens. | Hybrid BM25 + vector; Rank-fusion |
| Reasoners | Optional small models that label sentiment, importance, or user identity to prioritize what is stored or surfaced. | DistilRoBERTa sentiment, MiniLM ranker |

Key insight: memory need not live in a single DB; Mirix treats it as an orchestrated ensemble of stores, each optimised for a particular signal (facts vs. tasks vs. social cues).
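To make the Canonicalizer and Retriever layers more concrete, here is a minimal sketch. It is not Mirix's actual code: the `MemoryEvent` fields beyond type, timestamp, and embedding are assumptions, a standard-library dataclass stands in for Pydantic, and reciprocal rank fusion (RRF) is used as one common form of rank fusion, since the post does not say which variant Mirix applies.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass
class MemoryEvent:
    """Hypothetical stand-in for the common schema: type, timestamp, embedding."""
    event_id: str
    type: str   # e.g. "chat_turn" or "tool_output"
    text: str
    timestamp: datetime = field(default_factory=lambda: datetime.now(timezone.utc))
    embedding: list = field(default_factory=list)

def reciprocal_rank_fusion(result_lists, k=60, top_n=5):
    """Merge ranked lists of event ids; ids ranked highly in several lists rise to the top."""
    scores = {}
    for results in result_lists:
        for rank, event_id in enumerate(results, start=1):
            scores[event_id] = scores.get(event_id, 0.0) + 1.0 / (k + rank)
    # Keying scores by id also de-duplicates hits returned by more than one store.
    ranked = sorted(scores, key=lambda eid: (-scores[eid], eid))
    return ranked[:top_n]

# Canonicalize one raw chat turn into the shared schema.
event = MemoryEvent(event_id="e1", type="chat_turn", text="Migration script is in tools/")

# Ranked ids from two hypothetical retrievers over the same store of events.
bm25_hits = ["e3", "e1", "e7"]
vector_hits = ["e1", "e9", "e3"]
fused = reciprocal_rank_fusion([bm25_hits, vector_hits])
print(fused)
```

RRF only needs each retriever's rank order, not comparable scores, which is why it suits a mix of keyword and vector results.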

3 | What It Enables

| Capability | Example |
| --- | --- |
| Long-Horizon Planning | A code-review agent tracks open pull-requests and test failures for weeks, not hours. |
| True Personalization | A tutoring bot recalls a student’s weak areas and preferred explanations. |
| Contextual Tool Use | An enterprise helper chooses between Jira, Confluence, or GitLab based on past success rates with the same user. |

Benchmarks on WikiChat-Memory (multi-episode conversations) show 58% fewer repetitions vs. vanilla RAG and 3.4× higher success on 15-step task chains.

4 | Plugging Mirix into an Existing Agent


from mirix.memory import MemoryClient
from agentic import Agent

# Register the three memory stores by connection URI.
mem = MemoryClient(
    stores=[
        "faiss://embeddings",
        "neo4j://graph",
        "weaviate://docs",
    ]
)

# Attach the memory layer to the agent.
agent = Agent(llm="mistral-small-3.2", memory=mem)

response = agent.chat("Where did we leave the migration script last week?")
print(response)

The memory layer runs asynchronously, so ingest and retrieval add less than 50 ms of latency, even with three stores queried in parallel.
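Why parallel queries keep latency low can be shown with a small asyncio sketch. The store names mirror the example above, but `query_store`, `retrieve_all`, and the simulated delays are hypothetical and say nothing about Mirix's real timings.

```python
import asyncio
import time

async def query_store(name, delay_s, query):
    """Hypothetical store lookup; asyncio.sleep stands in for network I/O."""
    await asyncio.sleep(delay_s)
    return f"{name}: results for {query!r}"

async def retrieve_all(query):
    """Fan the query out to all stores at once and wait for every reply."""
    stores = [("faiss", 0.03), ("neo4j", 0.04), ("weaviate", 0.02)]  # made-up delays
    tasks = [query_store(name, delay, query) for name, delay in stores]
    return await asyncio.gather(*tasks)  # results come back in the order of `stores`

start = time.perf_counter()
results = asyncio.run(retrieve_all("migration script"))
elapsed = time.perf_counter() - start

print(results)
# Elapsed time tracks the slowest store rather than the sum of all three.
print(f"elapsed: {elapsed:.3f}s")
```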


5 | Governance & Cost Controls

  • Policy Filters: PII redaction rules determine what is persisted.

  • TTL & Eviction: Events expire after a configurable horizon (default 90 days) or when the embedding budget is reached.

  • Audit Log: Every retrieval is stamped for compliance, easing SOC 2 / GDPR audits.
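The first two controls can be sketched in a few lines. This is an assumption-laden illustration rather than Mirix's policy engine: `redact_pii` handles only email addresses, events are plain dicts, and the embedding budget is approximated by a hypothetical `max_events` cap.

```python
import re
from datetime import datetime, timedelta, timezone

# Simple email pattern for the demo; real PII rules would cover far more.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")

def redact_pii(text):
    """Policy filter: mask email addresses before the text is persisted."""
    return EMAIL_RE.sub("[REDACTED_EMAIL]", text)

def evict(events, now, ttl_days=90, max_events=1000):
    """Drop events older than the TTL, then keep only the newest max_events."""
    cutoff = now - timedelta(days=ttl_days)
    fresh = [e for e in events if e["timestamp"] >= cutoff]
    fresh.sort(key=lambda e: e["timestamp"], reverse=True)
    return fresh[:max_events]

now = datetime(2025, 7, 21, tzinfo=timezone.utc)
events = [
    {"text": redact_pii("Ping alice@example.com about the deploy"),
     "timestamp": now - timedelta(days=1)},
    {"text": "Old planning note", "timestamp": now - timedelta(days=120)},
    {"text": "Sprint retro summary", "timestamp": now - timedelta(days=30)},
]
kept = evict(events, now, max_events=2)
print([e["text"] for e in kept])
```

Redacting before persistence, rather than at retrieval time, means sensitive values never reach any of the stores.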


6 | Limitations & Roadmap

  • Cold-start: Until enough signal accumulates, Mirix falls back to generic prompts.

  • Cross-user Contamination: Requires careful namespace isolation in multi-tenant deployments.

  • Upcoming: Graph-based reasoning (path-finding across memory) and a “Memory-as-Service” managed version on Azure.


Final Takeaway

Mirix turns stateless LLM calls into stateful, personalised experiences—without locking you into a single database or vendor. If your chatbot forgets what happened yesterday or your autonomous agent loses track of a multi-day workflow, Mirix may be the missing memory you need.

 Most “agent” papers either hard-code reflection workflows or pay the bill to fine-tune the base model. Memento offers a third path: keep t...