OpenAI's New "Strawberry" AI Is Still Making Idiotic Mistakes

OpenAI Unveils Long-Awaited o1-Preview Model, Formerly Known as “Strawberry”

OpenAI has released its long-awaited AI model, previously codenamed “Strawberry.”

The Sam Altman-led company made ambitious claims in its announcement, stating that the new “o1-preview” model “performs similarly to PhD students on challenging benchmark tasks in physics, chemistry, and biology.”

Reasoning Capabilities and Real-World Limitations

Thanks to its new “human-like” ability to “reason,” the AI is designed to tackle more “complex tasks” and “harder problems,” according to OpenAI. Early testers, however, have quickly discovered that the model remains far from replacing human scientists or coders.

Social-media reports indicate that o1-preview still struggles with fundamental tasks. INSA Rennes researcher Mathieu Acher found that it repeatedly suggests illegal chess moves when solving certain puzzles. Simple word puzzles also remain challenging: Meta AI scientist Colin Fraser highlighted an example in which the model tackled a classic river-crossing puzzle about a farmer transporting sheep, only to abandon the correct answer for illogical output at the end.

Even when users entered the exact logic puzzle OpenAI demonstrated, one involving a strawberry, the model produced inconsistent results. One tester reported that “o1-preview gives the wrong answer to this prompt 75 percent of the time.”

Persistent Challenges with Simple Language Tasks

Some users also note that the model continues to stumble over one of the most notorious word problems for AI systems: counting the letter “R” in the word “strawberry.”
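For comparison, the letter count itself is trivial to verify with ordinary code. The following is a minimal Python sketch, not anything taken from OpenAI or the testers; the function names `count_letter` and `is_model_answer_correct` are hypothetical and exist only to illustrate how a model's claimed count could be checked against a deterministic one.

```python
def count_letter(word: str, letter: str) -> int:
    """Count case-insensitive occurrences of a single letter in a word."""
    if len(letter) != 1:
        raise ValueError("letter must be a single character")
    return word.lower().count(letter.lower())


def is_model_answer_correct(word: str, letter: str, model_answer: int) -> bool:
    """Compare a model's claimed count (hypothetical input) with the real count."""
    return model_answer == count_letter(word, letter)


if __name__ == "__main__":
    print(count_letter("strawberry", "R"))                # prints 3
    print(is_model_answer_correct("strawberry", "R", 2))  # prints False
```

A check like this could be used to score repeated model responses to the same prompt, though how any individual tester actually measured accuracy is not described in the reports.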

OpenAI itself acknowledged from the outset that the model is still a work in progress. “As an early model, it doesn’t yet have many of the features that make ChatGPT useful, like browsing the web for information and uploading files and images,” the company stated. “For many common cases GPT-4o will be more capable in the near term.”

How the “Chain-of-Thought” Approach Works

The o1-preview model differs markedly from predecessors such as GPT-4o thanks to a new “chain of thought” process. Instead of generating the first plausible answer, it builds iterative reasoning steps before concluding. This approach can significantly extend response times—one user measured 92 seconds for a simple word riddle, after which the model still delivered an incorrect answer.

OpenAI research scientist Noam Brown, who contributed to the model, suggested that longer reasoning could yield major breakthroughs: “OpenAI’s o1 thinks for seconds, but we aim for future versions to think for hours, days, even weeks. Inference costs will be higher, but what cost would you pay for a new cancer drug? For breakthrough batteries? For a proof of the Riemann Hypothesis?”

Expert Reactions and Industry Debate

Noted AI critic Gary Marcus pushed back on Brown's tweet, arguing that extended processing times would not eliminate the need for laboratory validation or clinical trials. “This is not realistic,” he wrote. “As you acknowledge o1 is still unreliable even at tic-tac-toe, and in some cases no better than earlier models. Longer processing times are unlikely to reach transcendent reasoning.”

(Brown himself conceded that the model still makes errors on basic tasks such as tic-tac-toe.)

Marcus’s comments reflect broader skepticism amid intense AI-industry hype. As companies pursue massive funding rounds (OpenAI is reportedly seeking $6.5 billion that would value it at $150 billion), concerns about return on investment and environmental impact continue to grow.

OpenAI has framed the o1-preview release as a fresh start, symbolically resetting the model counter to “1.” Given its early stumbles, the name may prove fitting.
