Apple's AI Reality Check
Apple has built its reputation on redefining the technology landscape. With innovations like the iPod, MacBook, iPhone, iPad, and Siri, it set benchmarks and spawned imitators across the globe. But here's the puzzler: why isn't Apple charging ahead in the AI race? Unlike other tech titans, Apple hasn't fully bought into the AI hype dominating the industry. You'd think it would want to lead this transformative wave. Now, we might have a clue why.
Recently, Apple’s AI scientists published research indicating that even the most advanced Large Language Models (LLMs) lack fundamental reasoning abilities, making them less capable than their developers would have you believe. So, how did Apple’s team reach this conclusion, and what does it mean for the AI landscape?
To probe LLMs’ capabilities, Apple’s researchers tested models from Meta and OpenAI, including OpenAI's cutting-edge “o1” model that automates a prompting method known as the “chain of thought” to supposedly improve reasoning. But instead of tackling complex equations, the researchers posed simple, grade-school-level math problems with a twist: they added bits of irrelevant information.
One example involved asking the LLMs about kiwi counts: "Oliver picks 44 kiwis on Friday, 58 on Saturday, and twice as many on Sunday as Friday. Five were smaller than average. How many kiwis in total?" Despite their supposed reasoning skills, models like Meta's Llama-3-8B and OpenAI's o1 got tripped up, incorrectly subtracting five from the total. Essentially, they misunderstood the question: the size of five kiwis has no bearing on the count, revealing that their "reasoning" is anything but dependable.
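The question itself reduces to trivial arithmetic. A quick sketch makes the trap explicit: the correct answer ignores the smaller kiwis, while the models' answer treats that detail as a subtraction.

```python
# The kiwi problem as plain arithmetic. The note that five kiwis were
# smaller than average is a distractor; size does not change the count.
friday = 44
saturday = 58
sunday = 2 * friday  # "twice as many on Sunday as Friday"

correct_total = friday + saturday + sunday
print(correct_total)  # 190

# The models' mistake: turning the irrelevant detail into an operation.
wrong_total = correct_total - 5
print(wrong_total)  # 185
```

Any grade-schooler who understands the question gets 190; the models produced 185 because they pattern-matched "five were smaller" to a subtraction step.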
Adding even slight modifications, such as irrelevant information, drastically reduced the models' accuracy, by as much as 65%. As the researchers noted, there was no evidence of genuine reasoning at work; rather, LLMs appear to engage in advanced pattern-matching. Even changing the names in a question altered results, suggesting just how fragile their so-called intelligence is.
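The methodology behind this finding is easy to picture: start from a template question, then generate variants by swapping names or appending an irrelevant clause. The sketch below is illustrative only (not Apple's actual code; the template and function names are made up), but it captures the two kinds of perturbation described above.

```python
# Hypothetical sketch of template-based perturbation, in the spirit of
# the study: same underlying math, different surface details.
TEMPLATE = (
    "{name} picks {a} kiwis on Friday, {b} on Saturday, "
    "and twice as many on Sunday as Friday. How many kiwis in total?"
)

def make_variant(name, a, b, distractor=None):
    """Build one question variant; `distractor` is an irrelevant sentence."""
    question = TEMPLATE.format(name=name, a=a, b=b)
    if distractor:
        # Slip the irrelevant clause in just before the final question.
        question = question.replace("How many", distractor + " How many")
    return question

base = make_variant("Oliver", 44, 58)
perturbed = make_variant("Sophie", 31, 29,
                         distractor="Five were smaller than average.")
```

A model that truly reasons should score identically on `base` and `perturbed`; the study found accuracy collapsed on exactly these kinds of cosmetic changes.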
This runs contrary to the bold claims of AI companies, whose flashy ads, like those for Google's Gemini, promote LLMs as smart business assistants. Given this study's findings, that notion seems shaky at best.
But won’t AI improve over time? With tech giants like Google, Meta, Microsoft, and OpenAI pumping billions into AI, surely this reasoning issue will soon be solved, right?
Not so fast. Evidence suggests that simply expanding AI models by training them on more data doesn’t inherently improve their reasoning skills. AI is fundamentally built on statistics, but true human reasoning requires more than number-crunching. Furthermore, scaling AI is hitting diminishing returns; maintaining growth demands exponentially more data, resources, and cash—a daunting challenge when even OpenAI faces limits on funding and data.
OpenAI's "o1" model attempted to break through this barrier by automating logical thought processes, but Apple's study suggests genuine reasoning remains out of reach.
So, what does this mean for Apple's approach to AI? Historically, Apple has let others rush into new tech and stumble through mistakes, choosing to either innovate further or steer clear. It refined smartphones after BlackBerry, reimagined tablets after the JooJoo, and launched Vision Pro after Google Glass. When it comes to AI, it seems Apple may be opting for caution over frenzy.
The question remains: are we at the brink of true AI intelligence, or are LLMs just sophisticated pattern-matchers masquerading as intelligent systems, with more limits than they let on?