Showing posts with label healthcare AI. Show all posts
Showing posts with label healthcare AI. Show all posts

8.8.25

GPT-5 Arrives: A Quantum Leap or an Incremental Step Toward Everyday AGI?

 OpenAI CEO Sam Altman opened the launch keynote with a statistic that still jolts me: 700 million weekly ChatGPT users. If accurate, that is the fastest adoption curve of any software platform in history. Altman framed GPT-5 as the model that finally feels like “talking to a PhD-level expert in anything,” capable of planning a birthday party, writing a full software stack, or parsing biopsy results in seconds. As someone who has lived through GPT-3’s flashes of brilliance and GPT-4o’s solid utility, I’m impressed by the live demos—particularly the on-the-fly 3-D castle game and the finance dashboard spun up in minutes. Yet part of me wonders how often real-world edge-cases will still trip the model, PhD metaphors aside.

Reasoning + Speed = Default
One genuine breakthrough is that GPT-5 merges OpenAI’s slow “reasoning models” and fast “standard models” into a single pipeline. The system decides—dynamically—how much chain-of-thought to spend on each request. As a developer, I love the promise of no more model-picker gymnastics. But the skeptic in me notes that latency remains physics-bound; the keynote glossed over how much extra compute the “perfect amount of thinking” really burns.

Safer, but Still a Work in Progress
Safety lead Saachi emphasized safe completions: instead of the binary comply/refuse we’ve grown used to, GPT-5 offers partial, contextual answers plus policy pointers. I applaud the nuance (the potassium perchlorate fireworks example was spot-on), and early physician-audited benchmarks suggest lower hallucination rates. Still, bi-modal safety often fails at scale. Until we see longitudinal data from millions of prompts, I reserve judgment on whether “significantly less deceptive” translates into materially fewer bad outcomes.

Coding Superpowers—and Benchmarks That May Be Peaking
On SWEBench, GPT-5 posts 74.9 %—state-of-the-art by a wide margin—and Cursor’s integration shows real autonomy: the model searches code, patches errors after compiling, and writes explanatory READMEs. That’s developer candy. Yet I can’t ignore Michael Truell’s aside that models are saturating classic evals. When a leaderboard hits 99 %, the next delta in usefulness won’t come from marginal accuracy boosts; it will come from deeper tool integration, live debugging, and sustained multi-day agent runs—areas GPT-5 only begins to address.

Health and Personalization
The on-stage story of Carolina using GPT-5 to weigh radiation options was moving and highlights the model’s strength as a patient advocate. Free-tier voice chat, Gmail/calendar integration, and memory all point toward a more personal assistant future. My worry is data consent and provenance: when GPT-5 merges personal email with medical queries, the privacy surface expands dramatically. OpenAI’s policies will need the same iterative care the model architecture received.

What I’m Excited About—and Watching Carefully
I love the 400 K context window, the new “minimal reasoning” knob for latency-sensitive tasks, and regular-expression-constrained outputs. Those are practical, developer-driven wins. I’m less convinced by the AGI framing; Altman downplayed compute bottlenecks and energy costs, and benchmark fatigue is real. GPT-5 feels like the best general-purpose model we’ve seen—but whether it inaugurates a “team of experts in your pocket” or reveals the limits of current scaling will depend on how it behaves over the next billion prompts.

Overall, GPT-5 is a thrilling upgrade—smarter, faster, and more context-aware. Just remember: even PhD-level experts can be confidently wrong, and the same will be true for the most intuitive model yet.

Anthropic launched Fast Mode for Claude Opus 4.6 in research preview. The feature delivers up to 2.5x higher output tokens per second from t...