This Is The Closest We’ve Ever Been To AGI
Benchmark-shattering intelligence, multimodal reasoning, the most advanced tool use we've seen yet... If this isn't AGI, what is?
Good morning AI entrepreneurs & enthusiasts,
OpenAI has officially released o3 and o4-mini, two cutting-edge reasoning models that OpenAI President Greg Brockman hailed as the "closest thing to AGI that humanity has ever seen."
With o3 setting new benchmarks and showing signs of being able to generate original scientific thought, many are wondering: is this the inflection point that finally signals AGI is within reach?
IN TODAY’S AI NEWS:
OpenAI’s o3 & o4-mini + new open-source coding agent
Copilot now directly operates computers
Claude’s new autonomous research mode
Today's Top Tools & Quick News
OpenAI launches o3, o4-mini, and a powerful coding agent
The News: OpenAI just dropped its most powerful reasoning models yet, o3 and o4-mini, marking the biggest leap since GPT-3.5. Both models can now use the complete ChatGPT toolset and, for the first time, are capable of advanced multimodal reasoning. OpenAI also introduced Codex CLI, an open-source coding agent that brings this power to developers' terminals.
The Details:
o3 now holds state-of-the-art (SOTA) results, scoring over 99% on AIME math benchmarks and achieving more than a 25% success rate on Humanity's Last Exam.
o3 redefines benchmarks with SWE-bench Verified jumping from 48.9% to 71.7%, a Codeforces Elo of 2727, and 12x gains in advanced math — thanks to a new architecture leveraging both training-time and test-time compute.
Both models now operate as fully agentic systems: o3 uses tools, executes workflows, and integrates multimodal reasoning. In a live demo, it executed over 600 tool calls in a single response.
o3 is built for precision and deep workflows at $10/$40 per million input/output tokens, while o4-mini offers scalable multimodal reasoning at $1.10/$4.40. Each serves a different purpose in your AI stack.
These are the first models to genuinely reason with images, integrating visual inputs into their cognitive workflows.
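To make the per-million-token prices above concrete, here is a minimal Python sketch that estimates what a single workload would cost on each model. It assumes the first figure in each price pair is the input price and the second is the output price. The helper name estimate_cost and the 200,000-in / 50,000-out token counts are hypothetical, chosen only for illustration.

```python
# Prices in USD per one million tokens, taken from the figures quoted above.
# Assumption: each pair is (input price, output price).
PRICES_PER_MILLION = {
    "o3": (10.00, 40.00),
    "o4-mini": (1.10, 4.40),
}


def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the estimated USD cost of one workload (hypothetical helper)."""
    input_price, output_price = PRICES_PER_MILLION[model]
    total = input_tokens * input_price + output_tokens * output_price
    return total / 1_000_000


if __name__ == "__main__":
    # Illustrative workload: 200,000 input tokens and 50,000 output tokens.
    for model in PRICES_PER_MILLION:
        cost = estimate_cost(model, 200_000, 50_000)
        print(f"{model}: ${cost:.2f}")
```

At these list prices the same workload comes to roughly $4.00 on o3 versus $0.44 on o4-mini, about a 9x gap. Actual bills may differ, since this sketch ignores any caching discounts or tool-call charges.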
Why it matters: This isn’t just a product release — it’s a generational inflection point. o3 brings OpenAI dangerously close to AGI territory, merging top-tier accuracy with fully agentic reasoning and real-world tool use. For the first time, we’re seeing a model that doesn’t just think — it acts. This drop redefines what’s possible with AI in 2025 and beyond.
What do you think? Does o3 cross the ‘AGI threshold’?
Copilot learns to control your desktop
The News: Microsoft's Copilot Studio just gained a major new feature, "computer use," which lets agents interact with desktop software and websites, effectively turning AI into a full UI automation agent.
The Details:
Copilot agents can now click buttons, select menus, type into fields, and navigate software interfaces — even in the absence of APIs.
The system uses real-time reasoning to adapt to changing UIs, minimizing manual script updates.
All automation runs on Microsoft's cloud infrastructure, and enterprise data is not used for model training, which supports compliance and privacy requirements.
Why it matters: This marks a leap beyond traditional RPA. With Copilot Studio now automating any GUI-based task and adapting in real time, Microsoft enters the agentic AI arena, directly competing with OpenAI and Anthropic. Businesses can now automate complex workflows, including legacy desktop apps and modern web tools, without requiring technical expertise or connectors.
Claude now conducts autonomous research
The News: Anthropic’s Claude now features autonomous research capabilities and deep Google Workspace integration, enabling it to intelligently retrieve information from both the web and internal documents.
The Details:
Claude can now conduct multi-step research, combining web and document searches to deliver comprehensive, cited responses.
Through Google Workspace, Claude can access Gmail, Docs, and Calendar, synthesizing relevant data without manual uploads.
Enterprise users benefit from advanced RAG-style document retrieval, helping surface key insights across entire file ecosystems.
The features are in beta for paid users across the U.S., Brazil, and Japan.
Why it matters: With this rollout, Claude is now a fully agentic research assistant — streamlining workflows by automating information retrieval and maintaining enterprise-grade privacy. Anthropic has effectively narrowed the gap with competitors like OpenAI, Microsoft, and Google, reinforcing Claude’s role in the AI productivity race.
Today's Top Tools
KLING 2.0 Master – Enhanced prompt-to-video generation
Grok Studio – Collaborative doc editing via AI canvas
Embed 4 – Cohere's new enterprise-grade multimodal embedding model for search and retrieval
Quick News
OpenAI is in talks to acquire Windsurf (formerly Codeium) in a deal valued at up to $3B.
Tencent launches FireEdit, an image editing AI with region-aware VLMs.
Claude’s upcoming voice mode will offer three distinct AI voices: Airy, Mellow, and Buttery.
METR's latest o3 analysis suggests OpenAI may have shortened safety testing timelines.
Economist Tyler Cowen claims o3 meets the threshold for AGI, suggesting April 16 could mark the dawn of a new era.