22.5.25

Google Unveils MedGemma: Advanced Open-Source AI Models for Medical Text and Image Comprehension

 At Google I/O 2025, Google announced the release of MedGemma, a collection of open-source AI models tailored for medical text and image comprehension. Built upon the Gemma 3 architecture, MedGemma aims to assist developers in creating advanced healthcare applications by providing robust tools for analyzing medical data. 

MedGemma Model Variants

MedGemma is available in two distinct versions, each catering to specific needs in medical AI development:

  • MedGemma 4B (Multimodal Model): This 4-billion parameter model integrates both text and image processing capabilities. It employs a SigLIP image encoder pre-trained on diverse de-identified medical images, including chest X-rays, dermatology, ophthalmology, and histopathology slides. This variant is suitable for tasks like medical image classification and interpretation. 

  • MedGemma 27B (Text-Only Model): A larger, 27-billion parameter model focused exclusively on medical text comprehension. It's optimized for tasks requiring deep clinical reasoning and analysis of complex medical literature. 

Key Features and Use Cases

MedGemma offers several features that make it a valuable asset for medical AI development:

  • Medical Image Classification: The 4B model can be adapted for classifying various medical images, aiding in diagnostics and research. 

  • Text-Based Medical Question Answering: Both models can be utilized to develop systems that answer medical questions based on extensive medical literature and data.

  • Integration with Development Tools: MedGemma models are accessible through platforms like Google Cloud Model Garden and Hugging Face, and are supported by resources such as GitHub repositories and Colab notebooks for ease of use and customization. 

Access and Licensing

Developers interested in leveraging MedGemma can access the models and related resources through the following platforms:

The use of MedGemma is governed by the Health AI Developer Foundations terms of use, ensuring responsible deployment in healthcare settings.

Google's Stitch: Transforming App Development with AI-Powered UI Design

 Google has introduced Stitch, an experimental AI tool from Google Labs designed to bridge the gap between conceptual app ideas and functional user interfaces. Powered by the multimodal Gemini 2.5 Pro model, Stitch enables users to generate UI designs and corresponding frontend code using natural language prompts or visual inputs like sketches and wireframes. 

Key Features of Stitch

  • Natural Language UI Generation: Users can describe their app concepts in plain English, specifying elements like color schemes or user experience goals, and Stitch will generate a corresponding UI design. 

  • Image-Based Design Input: By uploading images such as whiteboard sketches or screenshots, Stitch can interpret and transform them into digital UI designs, facilitating a smoother transition from concept to prototype. Google Developers Blog

  • Design Variations: Stitch allows for the generation of multiple design variants from a single prompt, enabling users to explore different layouts and styles quickly. 

  • Integration with Development Tools: Users can export designs directly to Figma for further refinement or obtain the frontend code (HTML/CSS) to integrate into their development workflow. 

Getting Started with Stitch

  1. Access Stitch: Visit stitch.withgoogle.com and sign in with your Google account.

  2. Choose Your Platform: Select whether you're designing for mobile or web applications.

  3. Input Your Prompt: Describe your app idea or upload a relevant image to guide the design process.

  4. Review and Iterate: Examine the generated UI designs, explore different variants, and make adjustments as needed.

  5. Export Your Design: Once satisfied, export the design to Figma or download the frontend code to integrate into your project.

Stitch is currently available for free as part of Google Labs, offering developers and designers a powerful tool to accelerate the UI design process and bring app ideas to life more efficiently.

Google Unveils Next-Gen AI Innovations: Veo 3, Gemini 2.5, and AI Mode

 At its annual I/O developer conference, Google announced a suite of advanced AI tools and models, signaling a major leap in artificial intelligence capabilities. Key highlights include the introduction of Veo 3, an AI-powered video generator; Gemini 2.5, featuring enhanced reasoning abilities; and the expansion of AI Mode in Search to all U.S. users. 

Veo 3: Advanced AI Video Generation

Developed by Google DeepMind, Veo 3 is the latest iteration of Google's AI video generation model. It enables users to create high-quality videos from text or image prompts, incorporating realistic motion, lip-syncing, ambient sounds, and dialogue. Veo 3 is accessible through the Gemini app for subscribers of the $249.99/month AI Ultra plan and is integrated with Google's Vortex AI platform for enterprise users. 

Gemini 2.5: Enhanced Reasoning with Deep Think

The Gemini 2.5 model introduces "Deep Think," an advanced reasoning mode that allows the AI to consider multiple possibilities simultaneously, enhancing its performance on complex tasks. This capability has led to impressive scores on benchmarks like USAMO 2025 and LiveCodeBench. Deep Think is initially available in the Pro version of Gemini 2.5, with broader availability planned. 

AI Mode in Search: Personalized and Agentic Features

Google's AI Mode in Search has been rolled out to all U.S. users, offering a more advanced search experience with features like Deep Search for comprehensive research reports, Live capabilities for real-time visual assistance, and personalization options that incorporate data from users' Google accounts. These enhancements aim to deliver more relevant and context-aware search results.

Karpathy doesn't use a fancy app to manage his research. He uses a folder, Obsidian, and an AI — and I want to copy it. He posted about ...