A recent study by Apple's AI research team has revealed critical weaknesses in the logical reasoning capabilities of large language models (LLMs), including those from OpenAI and Meta.
Published on arXiv, the study found that slight changes in the wording of mathematical questions can cause significant variations in the models' answers, undermining their reliability in tasks requiring logical consistency.
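To make this failure mode concrete, here is a minimal sketch of how such a perturbation test could be run. The problem template, the names, and the query_model stub are illustrative assumptions rather than the study's actual benchmark; the point is simply to rephrase the same underlying problem and check whether the answers stay consistent.

# Hypothetical stand-in for an actual LLM call; replace with a real API client.
def query_model(prompt: str) -> str:
    raise NotImplementedError("plug in your model call here")

# One grade-school problem, templated so only surface details change.
TEMPLATE = (
    "{name} picks {a} apples on Monday and {b} apples on Tuesday. "
    "How many apples does {name} have in total?"
)

def make_variants(names, a, b):
    """Generate rewordings that leave the underlying arithmetic unchanged."""
    return [TEMPLATE.format(name=n, a=a, b=b) for n in names]

def consistency_check(names=("Sophie", "Omar", "Mei"), a=17, b=26):
    expected = a + b  # the ground truth is identical for every variant
    answers = {prompt: query_model(prompt) for prompt in make_variants(names, a, b)}
    # A model that genuinely reasons should return the same correct answer each time;
    # the study reports that surface changes like swapped names shift accuracy measurably.
    return expected, answers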
The study showed that LLMs tend to rely on pattern matching rather than genuine reasoning. In one test, appending a sentence of irrelevant information to a math problem led the models to fold it into their calculations and produce wrong answers, highlighting how brittle this behavior is.
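The kind of distractor involved can be pictured with a toy example; the wording below is invented for illustration and is not a problem from the study itself. The extra clause changes nothing mathematically, yet the reported failure mode is that models subtract the mentioned number anyway.

BASE = (
    "Liam picks 44 kiwis on Friday and 58 kiwis on Saturday. "
    "How many kiwis does Liam have?"
)
# An irrelevant clause: it mentions a number but should not affect the total.
NOOP = " Five of the kiwis were a bit smaller than average."

correct_answer = 44 + 58        # 102, with or without the extra sentence
perturbed_prompt = BASE + NOOP  # a model that reasons should still answer 102
# The failure mode described in the study: models subtract the 5 and answer 97.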
The researchers summed up the finding bluntly: "We found no evidence of formal reasoning in language models. Their behavior is better explained by sophisticated pattern matching—so fragile, in fact, that changing names can alter results by ~10%."
Apple suggests that combining neural networks with traditional symbol-based reasoning, known as neurosymbolic AI, may be necessary to improve AI's problem-solving abilities.
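One minimal way to picture the neurosymbolic idea, assuming a hypothetical llm_propose_expression step, is to let the language model only translate the word problem into a candidate arithmetic expression and hand the actual computation to a deterministic symbolic engine such as SymPy, so the final number no longer depends on the model's pattern matching alone.

import sympy

# Hypothetical stand-in: a language model maps the word problem to an expression string.
def llm_propose_expression(problem: str) -> str:
    raise NotImplementedError("plug in your model call here")

def solve_with_symbolic_check(problem: str):
    """Neural step proposes, symbolic step computes: the arithmetic is delegated to
    SymPy, so an irrelevant sentence in the prompt cannot change the final value."""
    expression = llm_propose_expression(problem)  # e.g. "44 + 58"
    return sympy.sympify(expression)              # exact, deterministic evaluation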