Wandering Nomad

22.7.25

Building Startups at the Speed of AI: Key Takeaways from Andrew Ng’s Startup School Talk

1 Speed Is the Leading Indicator of Success

At AI Fund, Andrew Ng’s venture studio, teams launch roughly one startup a month. After hundreds of “in-the-weeds” reps, Ng sees a clear pattern: the faster a founding team can execute and iterate, the higher its survival odds. Speed compounds—small delays in shipping, learning, or pivoting quickly snowball into lost market share.

2 The Biggest Opportunities Live in the Application Layer

Much of the media hype sits with semiconductors, hyperscalers, or foundation-model vendors. Yet the lion’s share of value has to accumulate at the application layer—products that create revenue and, in turn, pay the upstream providers. For AI enthusiasts, building real workflows that users love is still the clearest path to outsized impact.

3 Agentic AI Unlocks Quality (at the Cost of Raw Latency)

Traditional prompting forces a language model to produce output linearly, “from the first word to the last without backspace.” Agentic AI flips that paradigm: outline → research → draft → critique → revise. The loop is slower but consistently yields far more reliable results—crucial for domains such as compliance review, medical triage, or legal reasoning. Ng sees an entire orchestration layer emerging to manage these multi-step agents.

4 Concrete Ideas Trump Grand Generalities

“Use AI to optimize healthcare assets” sounds visionary but is impossible to execute. “Let hospitals book MRI slots online to maximize scanner utilization” is concrete—an engineer can sprint on it this afternoon, gather user feedback, and prove or disprove the hypothesis fast. Vague ideas feel safe because they’re rarely wrong; concrete ideas create momentum because they’re immediately testable.

5 AI Coding Assistants Turn One-Way Doors into Two-Way Doors

With tools like Claude-Code, Cursor, and GitHub Copilot, rapid prototyping is 10× faster and radically cheaper. Entire codebases can be rebuilt in days—a shift that converts many architecture decisions from irreversible “one-way doors” into reversible “two-way doors.” The result: startups can afford to explore 20 proof-of-concepts, discard 18, and double-down on the two that resonate.

6 Product Management Becomes the New Bottleneck

When engineering accelerates, the slowest link becomes deciding what to build. Ng’s teams now experiment with PM-to-engineer ratios as high as 2 PMs per 1 engineer. Tactics for faster feedback range from gut checks and coffee-shop usability tests to 100-user beta cohorts and AB tests—each slower but richer in insight than the last. Crucially, teams should use every data point not just to pick a variant but to sharpen their intuition for the next cycle.

7 Everyone Should Learn to Code—Yes, Everyone

Far from replacing programmers, AI lowers the barrier to software creation. Ng’s CFO, recruiters, and even front-desk staff all write code; each role levels up by automating its own drudgery. The deeper you can “tell a computer exactly what you want,” the more leverage you unlock—regardless of your title.

8 Stay Current or Chase Dead Ends

AI is moving so quickly that a half-generation lag in tools can cost months. Knowing when to fine-tune versus prompt, when to swap models, or how to mix rag, guardrails, and evals often spells the difference between a weekend fix and a three-month rabbit hole. Continuous learning—through courses, experimentation, and open-source engagement—remains a decisive speed advantage.

Bottom line: In the age of agentic AI, competitive moats are built around execution velocity, not proprietary algorithms alone. Concrete ideas, lightning-fast prototypes, disciplined feedback loops, and a culture where everyone codes form the core playbook Andrew Ng uses to spin up successful AI startups today.

Qwen3-235B-A22B-Instruct-2507: Alibaba’s New Open-Weight Flagship Redefines Efficient Megamodels

When the Qwen team hit “post” on X announcing Qwen3-235B-A22B-Instruct-2507—plus a lightweight FP8 variant—the tweet felt less like routine release notes and more like a thunderclap across AI Twitter. The thread promised “better across the board” performance and immediate open-weights access, positioning Qwen as the most aggressive big-model vendor in the open ecosystem.

Inside the Model

Under the hood, the new model keeps the mixture-of-experts (MoE) recipe that made earlier Qwen3 builds special: 128 experts, but only 8 fire on each forward pass, so just 22 B parameters are active even though the full network tops out at 235 B. That efficiency allows 256 K tokens of native context and enables consumer-grade deployments that once demanded datacenter GPUs.

Benchmark Shockwaves

Numbers published with the release show why the community’s jaw dropped. On the notoriously tricky ARC-AGI benchmark, Qwen3-235B-A22B-Instruct-2507 scores 41.8 %, eclipsing Moonshot’s freshly minted Kimi K2 by nearly 29 points and edging ahead of Claude Opus 4 in non-thinking mode. Coding (LiveCodeBench v6) jumps to 51.8 %, and reasoning tasks like AIME25 leap to 70.3 %. In most rows of the evaluation table, the new Qwen flags sit comfortably ahead of DeepSeek-V3, o3-mini, and OpenAI’s o1 reference.

Why an FP8 Build Matters

Alongside the bf16 release, Alibaba published a fully FP8-quantised version. Dropping to eight-bit floats slashes VRAM by roughly 40 % while preserving accuracy, paving the way for single-GPU inference or even multi-GPU laptop rigs. Apache-2.0 licensing means startups can bake the FP8 weights directly into commercial products without costly negotiations.

Community Reception: K2 Who?

Reddit’s r/singularity lit up within minutes: “Kimi K2 is already irrelevant,” read the top-voted post, linking to the Qwen tweet and highlighting the model’s 4.2× smaller total size yet broader win-rate. Analysts on Interconnects echoed the sentiment, framing the drop as part of a summer in which Chinese labs “continue to dominate” the open-weight leaderboard and openly court Western builders.

Beyond Benchmarks: Agentic DNA

Qwen3’s team stresses that the instruct model is tuned for tool-calling and agent workflows. The official model card shows code snippets for integrating with Qwen-Agent and MCP config files, underscoring Alibaba’s push toward practical automation at 262 K-token scale—think mega-docs, legal contracts or multi-day chat histories without windowing hacks.

Why It Matters

Qwen3-235B-A22B-Instruct-2507 sets a new bar for “open yet frontier-grade.” By decoupling “thinking” and “non-thinking” modes into separate models, Alibaba embraced community feedback while sidestepping latency complaints. The result is a release that:

outperforms larger proprietary models on knowledge, reasoning, and multilingual tests;
ships under a permissive license;
arrives in both bf16 and FP8 flavors for hobbyists and enterprises alike;
proves that giant MoEs can be resource-friendly—and, crucially, available today.

For AI enthusiasts and builders, the message is clear: grab the weights, spin up your agent stack, and see how far 22 B active parameters can take you. The open-source race just found a new pacesetter.

Gemini “Deep Think” Hits Gold-Medal Performance at the International Mathematical Olympiad

From Silver to Gold in Twelve Months

Last year, DeepMind’s AlphaGeometry and AlphaProof systems collectively solved four of six IMO problems, earning a silver-medal equivalent. In July 2025 the research team leap-frogged that result: an advanced version of Gemini running in “Deep Think” mode solved five of six tasks for 35 points—crossing the 2025 gold-medal threshold and setting a new AI milestone.

International coordinators graded Gemini’s written solutions using the same rubric applied to student competitors. According to IMO President Gregor Dolinar, the proofs were “clear, precise, and, in several cases, easy to follow”.

What Makes Deep Think Different?

Technique	Purpose	Impact on Performance
Parallel Thinking	Explores multiple proof avenues simultaneously, then merges the strongest ideas.	Avoids dead-end, single-thread chains of thought.
Reinforcement-Learning Fine-Tune	Trains on curated theorem-proving and problem-solving data with reward signals for conciseness and rigor.	Raises success rate on multi-step reasoning challenges.
High-Quality Solution Corpus	Ingests expertly written IMO proofs plus heuristic “tips & tricks.”	Gives the model stylistic and structural templates for clearer presentation.

These upgrades let Gemini run longer “scratch-pads” internally while staying within a feasible compute budget—no multi-day cluster runs were required, unlike earlier systems.

Benchmark Significance

35 / 42 points → comparable to a top-25-percent human gold medalist.
Perfect scores on five problems; only one combinatorics task eluded the model.
Order-of-magnitude speed-up vs. AlphaGeometry 2 + AlphaProof, which needed days of inference in 2024.

While specialized theorem solvers have mastered narrow domains, Gemini Deep Think is a general LLM—capable of chat, code, and multimodal tasks—now showing elite mathematical reasoning.

Broader Implications

Curriculum Design for AI
Gemini’s success underscores the value of domain-targeted reinforcement learning on top of large-scale pre-training.
Parallel Thinking as a New Primitive
Instead of a single “chain of thought,” future models may default to branch-and-merge reasoning, akin to how human teams brainstorm proofs.
Human–AI Collaboration
DeepMind notes the technique could become a “proof assistant” for mathematicians—surfacing lemmas or counter-examples at gold-medal quality within minutes.
Educational Outreach
Publishing the solutions provides a free study resource for aspiring IMO contestants and teachers, potentially leveling the global playing field.

Limitations & Next Steps

Interpretability: Despite clearer written proofs, the internal decision tree remains opaque—researchers are now probing why certain branches survive the merge.
Generalization: Performance on under-represented areas (e.g., functional equations) still lags; future training will widen topic coverage.
Trust & Verification: Formal proof checkers like Lean are being integrated to machine-verify each Gemini output before publication.

DeepMind plans to open selected Deep Think capabilities via its Gemini API later this year, with safeguards to prevent misuse in academic competitions.

Key Takeaway

Gemini Deep Think’s gold-medal performance doesn’t just raise the bar for AI mathematics—it redefines what general-purpose language models can achieve when armed with structured parallel reasoning and tailored RL training. The achievement brings researchers a step closer to AI systems that can tackle longstanding open problems and act as partner mathematicians rather than mere calculators.