OpenAI o3 achieved human-level results
GPT-5 hits a wall
Fed’s Daly and AI-driven productivity
Savvy Investors Know Where to Get Their News—Do You?
Here’s the truth: Most financial media focuses on clicks, not clarity. The Daily Upside, created by Wall Street insiders, delivers valuable insights—free. Join 1M+ readers and subscribe today!
***
OpenAI o3 achieved human-level results
Sam Altman says OpenAI’s new o3 ‘reasoning’ models begin the ‘next phase’ of AI. OpenAI's o3 scored a remarkable 85% on the ARC-AGI benchmark, well above the previous AI best of 55% and on par with the average human score. The model also delivered human-level results on a challenging mathematics test, a strong showing on measures of "general intelligence." The ARC-AGI test evaluates an AI system's "sample efficiency" in adapting to new situations, essentially how quickly it can learn from limited examples. This ability to generalize from minimal data is considered a crucial aspect of intelligence.
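To make "sample efficiency" concrete, here is a toy sketch of the shape of an ARC-style task: a solver must infer a transformation rule from just a couple of input/output grid pairs and apply it to a new grid. The grids, the rule, and both helper functions below are invented for illustration; real ARC-AGI puzzles are far harder.

```python
# Toy sketch of an ARC-style task: infer a rule from two examples,
# then apply it to an unseen grid. Real ARC-AGI puzzles are far richer.

train_pairs = [
    ([[1, 0], [0, 1]], [[2, 0], [0, 2]]),  # rule: every 1 becomes a 2
    ([[0, 1], [1, 1]], [[0, 2], [2, 2]]),
]

def infer_color_map(pairs):
    """Learn a per-cell color substitution from a handful of examples."""
    mapping = {}
    for grid_in, grid_out in pairs:
        for row_in, row_out in zip(grid_in, grid_out):
            for a, b in zip(row_in, row_out):
                mapping[a] = b
    return mapping

def apply_rule(grid, mapping):
    return [[mapping.get(cell, cell) for cell in row] for row in grid]

rule = infer_color_map(train_pairs)
print(apply_rule([[1, 1], [0, 1]], rule))  # -> [[2, 2], [0, 2]]
```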
This new system is the successor to o1, which was launched earlier this year. According to OpenAI, o3 outperformed o1 by over 20 percent in a series of common programming tasks and even surpassed the company's chief scientist, Jakub Pachocki, in a competitive programming test. OpenAI plans to make this technology available to individuals and businesses early next year.
This development is part of a broader initiative to create AI systems capable of reasoning through intricate tasks. Earlier this week, Google introduced a similar technology, Gemini 2.0 Flash Thinking Experimental, to a select group of testers.
OpenAI developed o3 using "reinforcement learning," a process where the system learns through extensive trial and error. By solving numerous math problems, the system identifies which techniques yield correct answers and which do not, allowing it to recognize patterns through repeated practice…
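As a rough illustration of that trial-and-error loop, the toy sketch below treats candidate solution techniques as a bandit problem: techniques that keep yielding correct answers get reinforced and are chosen more often. The technique names, success rates, and update rule are all invented, and this is nothing like OpenAI's actual training pipeline.

```python
import random

# Toy trial-and-error loop: techniques that keep producing correct
# answers get reinforced and are sampled more often. Names, success
# rates, and the update rule are all invented for illustration.

SUCCESS_RATE = {"expand": 0.9, "guess": 0.2, "factor": 0.7}  # hidden from the learner
weights = {name: 1.0 for name in SUCCESS_RATE}               # learned preferences

def pick_technique():
    r = random.uniform(0, sum(weights.values()))
    for name, w in weights.items():
        r -= w
        if r <= 0:
            return name
    return name  # floating-point fallback

for _ in range(5000):                       # 5,000 practice problems
    choice = pick_technique()
    solved = random.random() < SUCCESS_RATE[choice]
    delta = 1.0 if solved else -0.1         # reward success, penalize failure
    weights[choice] = max(weights[choice] + delta, 0.1)

print(max(weights, key=weights.get))        # "expand" wins with high probability
```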
***
Learn AI in 5 minutes a day
What’s the secret to staying ahead of the curve in the world of AI? Information. Luckily, you can join 800,000+ early adopters reading The Rundown AI — the free newsletter that makes you smarter on AI with just a 5-minute read per day.
***
GPT-5 hits a wall
In mid-2023, OpenAI launched a training run, code-named Arrakis, to test a new design for Orion. The run proved slow, signaling that a full-scale training run would be extremely time-consuming and expensive, and that developing GPT-5 would not be as straightforward as anticipated.
To enhance Orion, OpenAI researchers made technical adjustments and concluded that more diverse, high-quality data was necessary, and that the public internet did not hold enough of it. OpenAI had previously trained on data scraped from the web, including news articles, social media posts, and scientific papers. Making Orion smarter means scaling it up, which demands still more data, and data of that caliber is scarce. "It gets really expensive and it becomes hard to find more equivalently high-quality data," said Ari Morcos, CEO of DatologyAI, a startup focused on improving data selection. Morcos advocates using less, but higher-quality, data to make AI systems more capable.
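Here is a minimal sketch of that "less but higher-quality data" idea: score each document with a cheap heuristic and train only on the top slice. The heuristic below is invented for illustration; real curation pipelines such as DatologyAI's rely on far more sophisticated, learned quality signals.

```python
# Toy data-curation pass: keep only the highest-quality documents.
def quality_score(doc: str) -> float:
    words = doc.split()
    if not words:
        return 0.0
    unique_ratio = len(set(words)) / len(words)  # penalize boilerplate repetition
    length_bonus = min(len(words) / 50, 1.0)     # prefer substantive documents
    return unique_ratio * length_bonus

corpus = [
    "buy now buy now buy now limited offer",
    "The proof proceeds by induction on n: the base case n = 1 is "
    "immediate, and the inductive step follows from the lemma above.",
]

# Train on the top half of the corpus by score instead of on everything.
ranked = sorted(corpus, key=quality_score, reverse=True)
training_set = ranked[: len(ranked) // 2]
print(training_set)  # the induction proof survives; the spam does not
```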
OpenAI's solution was to generate data from scratch. The company is hiring people, including software engineers and mathematicians, to write new software code and solve math problems for Orion to learn from, and to explain their work as they go. Many researchers believe that code, the language of software, can help large language models (LLMs) tackle problems they haven't encountered before. OpenAI's scientists also think they can sidestep the data shortage by training on data generated by another AI model, o1.
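The sketch below gives one hedged reading of that pipeline: ask a model for a step-by-step solution, verify it against a known answer, and keep only generations that check out. `ask_model` is a hypothetical stand-in for a real model call (for example, to o1); the problem and the verification rule are invented.

```python
# Hypothetical synthetic-data step: generate a worked solution, verify
# it, and keep it as a training example only if it checks out.
def ask_model(prompt: str) -> str:
    # Stand-in for a real model API call; returns a canned answer here.
    return "2 * 4 means four plus four. Final answer: 8"

def make_training_example(problem: str, known_answer: str):
    solution = ask_model(f"Solve step by step: {problem}")
    # Verification gate: bad generations never enter the training set.
    if solution.rstrip(".").endswith(known_answer):
        return {"prompt": problem, "completion": solution}
    return None

example = make_training_example("What is 2 * 4?", "8")
print(example)  # kept, because the generated solution ends with "8"
```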
The challenges with Orion led OpenAI researchers to a new approach: reasoning. They believe that spending more time "thinking" could enable LLMs to solve difficult problems they haven't been trained on. OpenAI's o1 model generates multiple responses to each question and analyzes them to find the best one. It can handle complex tasks, such as writing a business plan or creating a crossword puzzle, while explaining its reasoning, which helps the model learn from each answer.…
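A minimal best-of-n sketch of that idea: sample several candidate answers and keep whichever one a scorer rates highest. The sampler and scorer below are random stand-ins; in a real system they would be a language model and a learned verifier.

```python
import random

# Toy best-of-n: sample several candidates, keep the highest-scoring one.
# The sampler and scorer are random stand-ins for a model and a verifier.

def sample_answer(question: str) -> str:
    return random.choice(["draft A", "draft B", "draft C"])

def score(question: str, answer: str) -> float:
    return random.random()  # a real system would use a learned verifier

def best_of_n(question: str, n: int = 8) -> str:
    candidates = [sample_answer(question) for _ in range(n)]
    return max(candidates, key=lambda a: score(question, a))

print(best_of_n("Write a one-line business plan."))
```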
***
Fed’s Daly and AI-driven productivity
In the late 1990s, Mary Daly, then a young economist at the Federal Reserve Bank of San Francisco, assisted Chair Alan Greenspan in identifying a significant surge in US productivity. Now, Daly, who is the president of the San Francisco Fed, believes a similar surge is occurring, this time driven by artificial intelligence (AI).
"We're seeing it everywhere," Daly remarked, referring to the widespread adoption of AI by companies. "It's really about machine learning, robotic processing, automation—just people and businesses doing things." She noted that while we might not see immediate gains in measured productivity, the momentum for change is evident.
Recent years have seen a jump in US productivity, though economists remain divided over what is driving it and whether it will last. Productivity, a gauge of how much labor, or labor plus capital, it takes to produce a good or service, is crucial to raising living standards but notoriously difficult to measure.
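For a concrete sense of the arithmetic: labor productivity is output per hour worked, so its growth is roughly output growth minus growth in hours. The figures below are hypothetical, purely to show the calculation.

```python
# Hypothetical figures: productivity growth is output growth net of
# growth in hours worked.
output_growth = 0.031  # output up 3.1% (invented)
hours_growth = 0.008   # hours worked up 0.8% (invented)

productivity_growth = (1 + output_growth) / (1 + hours_growth) - 1
print(f"labor productivity grew about {productivity_growth:.1%}")  # ~2.3%
```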
Daly is revisiting the type of research Greenspan pursued a generation ago. "We're spending a lot of time with researchers, but we're also spending a lot of time with CEOs and CIOs, asking them what they are doing," she said. "It is astounding how many companies in the United States, and probably globally, are using these technologies." She emphasized that businesses are leveraging technology to enhance their teams, reduce tedious work, and improve efficiency. "I think that's potentially a huge benefit. It could take a decade, but it is happening," Daly concluded.…