Showing posts with label function calling. Show all posts

16.7.25

Mistral AI Introduces Voxtral — Open-Source Speech Models that Transcribe, Summarize and Act on Audio in Real Time

 

🎧 What Mistral Just Shipped

French startup Mistral AI has expanded beyond text with Voxtral, a pair of open-weight speech models—Voxtral Small and Voxtral Mini—designed for fast, accurate transcription and audio-aware chat. The launch positions Voxtral as an open alternative to OpenAI Whisper and Google Gemini’s voice modes. 

  • Context Length: 32 k tokens (≈ 40 minutes of speech)

  • Languages: English, Spanish, French, Portuguese, Hindi, German, Dutch, Italian and more

  • Licensing: Apache 2.0 — free for commercial use

  • Deployments: Available via the Mistral API or as self-hosted open weights


🧠 Key Capabilities

Capability | What It Means
High-Fidelity Transcription | Up to 30-minute files in a single call; optimized for noisy, real-world audio
Spoken Q&A & Summaries | Users can ask questions about the recording or request concise overviews immediately after upload
Function Calling | Voice commands can trigger APIs or local automations (e.g., “Create a Jira ticket for this bug”) without extra agent code
Lightweight “Mini” Variant | Runs on edge devices for private, offline captioning or voice assistants; same API schema
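
As a rough illustration of working within the 30-minute single-call limit listed in the table, a longer recording could be split into windows before upload. The sketch below is plain Python with no Voxtral dependency; the function name `plan_chunks` and the 5-second overlap are assumptions chosen for illustration, not values from Mistral's documentation.

```python
def plan_chunks(total_seconds: float, max_chunk_seconds: float = 30 * 60,
                overlap_seconds: float = 5.0) -> list[tuple[float, float]]:
    """Return (start, end) offsets in seconds covering the whole recording."""
    if total_seconds <= 0:
        return []
    if overlap_seconds >= max_chunk_seconds:
        raise ValueError("overlap must be shorter than a chunk")
    chunks = []
    start = 0.0
    while True:
        end = min(start + max_chunk_seconds, total_seconds)
        chunks.append((start, end))
        if end >= total_seconds:
            break
        # Step back slightly so words cut at a boundary appear in both chunks.
        start = end - overlap_seconds
    return chunks


# A 70-minute recording needs three calls under a 30-minute cap.
for start, end in plan_chunks(70 * 60):
    print(f"{start / 60:.2f} min -> {end / 60:.2f} min")
```

The overlap trades a little duplicated audio for fewer words lost at chunk boundaries; the transcripts would still need to be de-duplicated when stitched together.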

🔬 Under the Hood

Voxtral builds on a VLM-enhanced version of Mistral Small 3.2, pairing a convolutional audio encoder with the company’s long-context LLM backbone. Sliding-window attention plus quantization keeps inference under 2 GB VRAM for the Mini model, enabling smartphone or Jetson deployments without cloud latency. 
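
To make the sliding-window idea concrete, the sketch below builds the generic causal sliding-window mask, in which each position attends only to itself and a fixed number of preceding positions. It illustrates the general technique only; whether Voxtral uses exactly this layout is not confirmed by the post, and the window size of 3 is an arbitrary choice for display.

```python
def sliding_window_mask(seq_len: int, window: int) -> list[list[bool]]:
    """mask[i][j] is True when query position i may attend to key position j."""
    return [
        [0 <= i - j < window for j in range(seq_len)]
        for i in range(seq_len)
    ]


mask = sliding_window_mask(seq_len=6, window=3)
for row in mask:
    print("".join("x" if allowed else "." for allowed in row))
```

Each row has at most `window` allowed positions, so the attention work grows with sequence length times window size instead of with the square of the sequence length.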


📊 Early Benchmarks

Task (open test set) | Whisper Large-V3 | Gemini 2.5 Voice | Voxtral Small
LibriSpeech test-clean WER | 1.7% | 1.6% | 1.5%
Common Voice 11 (avg.) | 7.2% | 6.8% | 6.5%
Multilingual TEDx (8 langs) | 9.4% | 9.1% | 8.8%

Numbers from Mistral’s internal evaluation, shared in the release notes. 
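
For readers unfamiliar with the metric, word error rate (WER) is the word-level edit distance between a reference transcript and the model's output, divided by the number of reference words. The sketch below is a minimal implementation of that definition; it only lowercases and splits on whitespace, whereas published evaluations typically apply more elaborate text normalization, so it should not be expected to reproduce the figures above.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference word count."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j]: edit distance between the reference prefix seen so far and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(prev[j] + 1,         # deletion
                          curr[j - 1] + 1,     # insertion
                          prev[j - 1] + cost)  # match or substitution
        prev = curr
    return prev[-1] / len(ref)


ref = "create a jira ticket for this bug"
hyp = "create jira ticket for the bug"
print(f"WER: {word_error_rate(ref, hyp):.1%}")  # one deletion + one substitution over 7 words
```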

🚀 Developer On-Ramp


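pip install mistralai

import base64
from mistralai import Mistral

client = Mistral(api_key="YOUR_KEY")

# Audio is sent inline as a base64 string alongside the text prompt.
with open("meeting.wav", "rb") as f:
    audio_b64 = base64.b64encode(f.read()).decode("utf-8")

resp = client.chat.complete(
    model="voxtral-small-latest",
    messages=[{
        "role": "user",
        "content": [
            {"type": "input_audio", "input_audio": audio_b64},
            {"type": "text", "text": "Give me action items"},
        ],
    }],
)
print(resp.choices[0].message.content)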

Both voxtral-small-latest and voxtral-mini-latest share the chat endpoint; a dedicated /transcribe route streams plain-text results for cost-sensitive jobs. 


🌍 Real-World Use Cases

  • Meeting Assistants – Live note-taking, summarization and follow-up email drafts

  • Hands-Free DevOps – Voice-triggered MCP tools: “Deploy staging,” “Rollback API v2”

  • Media Captioning – Low-latency, multilingual subtitles for podcasts or YouTube creators

  • Edge Compliance Monitors – On-prem transcription + keyword spotting for regulated industries


🛣️ Roadmap & Community

Mistral hints at Voxtral-X (vision-speech multimodal) and a 128 k-context Voxtral-Pro later this year, plus native support in the company’s forthcoming Magistral agent framework. The team invites PRs for language adapters and domain-specific fine-tunes on GitHub. 


Takeaway: With Voxtral, Mistral AI brings open, high-quality voice intelligence to the masses—letting developers transcribe, understand and act on audio with the same simplicity they enjoy for text. For anyone building call-center analytics, wearable assistants or real-time translators, Voxtral offers GPT-grade performance without the proprietary lock-in.

21.6.25

Mistral Elevates Its 24B Open‑Source Model: Small 3.2 Enhances Instruction Fidelity & Reliability

 Mistral AI has released Mistral Small 3.2, an optimized version of its open-source 24B-parameter multimodal model. This update refines rather than reinvents: it strengthens instruction adherence, improves output consistency, and bolsters function-calling behavior—all while keeping the lightweight, efficient foundations of its predecessor intact.


🎯 Key Refinements in Small 3.2

  • Accuracy Gains: Instruction-following performance rose from 82.75% to 84.78%—a solid boost in model reliability.

  • Repetition Reduction: Instances of infinite or repetitive responses fell from 2.11% to 1.29%, a relative drop of roughly 40%, which means cleaner outputs for real-world prompts.

  • Enhanced Tool Integration: The function-calling interface has been fine-tuned for frameworks like vLLM, improving tool-use scenarios.


🔬 Benchmark Comparisons

  • Wildbench v2: Nearly 10-point improvement in performance.

  • Arena Hard v2: Scores jumped from 19.56% to 43.10%, showcasing substantial gains on challenging tasks.

  • Coding & Reasoning: Gains on HumanEval Plus (88.99→92.90%) and MBPP Pass@5 (74.63→78.33%), with slight improvements in MMLU Pro and MATH.

  • Vision benchmarks: A small trade-off, with the overall vision score dipping from 81.39 to 81.00 and mixed results across individual tasks.

  • MMLU Slight Dip: A minor regression from 80.62% to 80.50%, reflecting nuanced trade-offs.
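
To put the figures above on a common footing, the sketch below computes the absolute change (in points) and the relative change (in percent) for each benchmark that reports both a before and an after score. The numbers are copied from the bullets; the helper name `delta` is only illustrative.

```python
# Scores copied from the bullets above as (before, after), in percent.
scores = {
    "Arena Hard v2": (19.56, 43.10),
    "HumanEval Plus": (88.99, 92.90),
    "MBPP Pass@5": (74.63, 78.33),
    "Vision (overall)": (81.39, 81.00),
    "MMLU": (80.62, 80.50),
}


def delta(before: float, after: float) -> tuple[float, float]:
    """Return (absolute change in points, relative change in percent)."""
    absolute = after - before
    return absolute, 100.0 * absolute / before


for name, (before, after) in scores.items():
    absolute, relative = delta(before, after)
    print(f"{name:<18} {absolute:+6.2f} pts  {relative:+7.2f} %")
```

Seen this way, Arena Hard v2 more than doubles in relative terms (about +23.5 points), while the MMLU and vision changes are well under one point in either direction.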


💡 Why These Updates Matter

Although no architectural changes were made, these improvements polish the model’s behavior, making it more predictable, compliant, and production-ready. Notably, Small 3.2 still runs on a single A100 or H100 80GB GPU, with roughly 55GB of VRAM needed at full 16-bit (bf16/fp16) precision, which suits cost-sensitive deployments.
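
A back-of-the-envelope check helps explain the 55GB figure. Assuming 16-bit weights (2 bytes per parameter), a 24B-parameter model needs about 48 GB for the weights alone; the remaining headroom is presumably activations, KV cache, and runtime overhead, which is an assumption here and not something the post breaks down. The 8-bit and 4-bit rows are included only to show how the arithmetic scales and are not figures from the release.

```python
def weight_memory_gb(num_params: float, bytes_per_param: float) -> float:
    """Memory for the weights alone, in decimal gigabytes."""
    return num_params * bytes_per_param / 1e9


params = 24e9  # 24B parameters, as stated for Small 3.2
for label, nbytes in [("16-bit", 2), ("8-bit", 1), ("4-bit", 0.5)]:
    print(f"{label}: ~{weight_memory_gb(params, nbytes):.0f} GB for weights")
```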


🚀 Enterprise-Ready Benefits

  • Stability: Developers targeting real-world applications will appreciate fewer unexpected loops or halts.

  • Precision: Enhanced prompt fidelity means fewer edge-case failures and cleaner behavioral consistency.

  • Compatibility: Improved function-calling makes Small 3.2 a dependable choice for agentic workflows and tool-based LLM work.

  • Accessible: Remains open-source under Apache 2.0, hosted on Hugging Face with support in frameworks like Transformers & vLLM.

  • EU-Friendly: Backed by Mistral’s Parisian roots and compliance with GDPR/EU AI Act—a plus for European enterprises.


🧭 Final Takeaway

Small 3.2 isn’t about flashy new features—it’s about foundational refinement. Mistral is doubling down on its “efficient excellence” strategy: deliver high performance, open-source flexibility, and reliability on mainstream infrastructure. For developers and businesses looking to harness powerful LLMs without GPU farms or proprietary lock-in, Small 3.2 offers a compelling, polished upgrade.

9.6.25

Enable Function Calling in Mistral Agents Using Standard JSON Schema

This updated tutorial guides developers through enabling function calling in Mistral Agents via the standard JSON Schema format. Function calling allows agents to invoke external APIs or tools (like weather or flight data services) dynamically during a conversation, extending their reasoning capabilities beyond text generation.


🧩 Why Function Calling?

  • Seamless tool orchestration: Enables agents to perform actions—like checking bank interest rates or flight statuses—in real time.

  • Schema-driven clarity: JSON Schema ensures function inputs and outputs are well-defined and type-safe.

  • Leverage MCP Orchestration: Integrates with the Model Context Protocol (MCP), an open protocol that Mistral Agents support, for complex workflows.


🛠️ Step-by-Step Implementation

1. Define Your Function

Create a simple API wrapper, e.g.:

python
def get_european_central_bank_interest_rate(date: str) -> dict:
    # Mock implementation returning a fixed rate
    return {"date": date, "interest_rate": "2.5%"}

2. Craft the JSON Schema

Define the function parameters so the agent knows how to call it:

python
tool_def = {
    "type": "function",
    "function": {
        "name": "get_european_central_bank_interest_rate",
        "description": "Retrieve ECB interest rate",
        "parameters": {
            "type": "object",
            "properties": {
                "date": {"type": "string"}
            },
            "required": ["date"]
        }
    }
}
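
Because the model generates the arguments, it can be worth checking them against the schema before calling the real function. The sketch below is a deliberately small, hand-rolled check of required keys and basic types using only the standard library; the names `PARAMETERS_SCHEMA`, `JSON_TYPES`, and `validate_arguments` are hypothetical, and a full validator such as the `jsonschema` package would cover far more of the specification.

```python
import json

# Same "parameters" object as in the tool definition, repeated so the sketch runs alone.
PARAMETERS_SCHEMA = {
    "type": "object",
    "properties": {"date": {"type": "string"}},
    "required": ["date"],
}

# JSON Schema type names mapped to Python types (only the ones this sketch handles).
JSON_TYPES = {"string": str, "integer": int, "number": (int, float),
              "boolean": bool, "object": dict, "array": list}


def validate_arguments(raw_arguments: str, schema: dict) -> dict:
    """Parse a JSON argument string and check required keys and basic types."""
    args = json.loads(raw_arguments)
    if not isinstance(args, dict):
        raise ValueError("arguments must be a JSON object")
    for key in schema.get("required", []):
        if key not in args:
            raise ValueError(f"missing required argument: {key}")
    for key, value in args.items():
        spec = schema.get("properties", {}).get(key)
        if spec is None:
            continue  # unknown keys are tolerated in this sketch
        expected = JSON_TYPES[spec["type"]]
        # bool is a subclass of int in Python, so reject it unless explicitly expected.
        is_bool_mismatch = isinstance(value, bool) and spec["type"] != "boolean"
        if is_bool_mismatch or not isinstance(value, expected):
            raise ValueError(f"argument {key!r} should be of type {spec['type']}")
    return args


print(validate_arguments('{"date": "2025-06-09"}', PARAMETERS_SCHEMA))
```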

3. Create the Agent

Register the agent with Mistral's SDK:

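python
import os
from mistralai import Mistral

# Assumes the API key is exported as MISTRAL_API_KEY.
client = Mistral(api_key=os.environ["MISTRAL_API_KEY"])

agent = client.beta.agents.create(
    model="mistral-medium-2505",
    name="ecb-interest-rate-agent",
    description="Fetch ECB interest rate",
    tools=[tool_def],
)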

The agent now recognizes the function and can decide when to invoke it during a conversation.

4. Start Conversation & Execute

Interact with the agent using a prompt like, "What's today's interest rate?"

  • The agent emits a function.call event with arguments.

  • You execute the function and return a function.result back to the agent.

  • The agent continues based on the result.

This demo uses a mocked function, but any external API can be plugged in: flight info, weather, or tooling endpoints.
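
The call-and-return cycle described in this step can be sketched without the SDK as a small dispatcher. Everything about the event shape below (plain dicts with `type`, `name`, `arguments`, and `tool_call_id` keys) is an assumption made for illustration and stands in for whatever objects the SDK actually returns; `TOOLS` and `handle_function_call` are hypothetical names.

```python
import json


def get_european_central_bank_interest_rate(date: str) -> dict:
    # Mock implementation returning a fixed rate, as in step 1.
    return {"date": date, "interest_rate": "2.5%"}


# Registry of callable tools, keyed by the name declared in the JSON Schema.
TOOLS = {
    "get_european_central_bank_interest_rate": get_european_central_bank_interest_rate,
}


def handle_function_call(event: dict) -> dict:
    """Run the requested tool and build the result entry to send back.

    The dict shapes here are assumptions for illustration, not the SDK's types.
    """
    if event.get("type") != "function.call":
        raise ValueError(f"unexpected event type: {event.get('type')}")
    func = TOOLS[event["name"]]
    arguments = json.loads(event["arguments"])  # arguments arrive as a JSON string
    return {
        "type": "function.result",
        "tool_call_id": event["tool_call_id"],
        "result": json.dumps(func(**arguments)),
    }


# A hand-written stand-in for what the agent might emit.
fake_event = {
    "type": "function.call",
    "name": "get_european_central_bank_interest_rate",
    "arguments": '{"date": "2025-06-09"}',
    "tool_call_id": "call_123",
}
print(handle_function_call(fake_event))
```

Keeping tools in a registry keyed by name means adding a second function only requires a new schema entry and a new dictionary key. In a real integration the event would come from the conversation response and the result would be sent back through the SDK, so the exact types should be taken from Mistral's Agents documentation.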


✅ Takeaways

  • JSON Schema simplifies defining callable tools.

  • Agents can autonomously decide if, when, and how to call your functions.

  • This pattern enhances Mistral Agents’ real-time capabilities across knowledge retrieval, action automation, and dynamic orchestration.
