Researchers from the University of California, Los Angeles, and Amazon recently conducted an in-depth analysis of the reasoning capabilities of large language models (LLMs), systematically distinguishing between inductive and deductive reasoning for the first time and examining the challenges each poses for AI systems.
Inductive reasoning involves deriving general rules from specific observations, whereas deductive reasoning applies general rules to specific cases. The study set out to determine which of these abilities poses the greater challenge for large language models. To this end, the research team developed a new method called "SolverLearner," in which the model infers a function mapping inputs to outputs from just a few examples. That function is then executed by external programs rather than by the model itself, so the measurement of inductive reasoning is not contaminated by the model's deductive abilities.
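To illustrate the idea, here is a minimal sketch of such a two-step pipeline. It is not the authors' implementation: the `query_llm` stub and the base-8 addition task are hypothetical placeholders standing in for a real model call.

```python
# Sketch of a SolverLearner-style pipeline: the model proposes a rule from
# few-shot examples, and external Python code applies that rule to new inputs.

def query_llm(prompt: str) -> str:
    # Hypothetical placeholder: a real implementation would send `prompt` to
    # a model such as GPT-4 and return the Python function it proposes.
    # A plausible answer for base-8 addition is hard-coded to keep the
    # sketch runnable.
    return (
        "def solve(a, b):\n"
        "    return int(oct(int(str(a), 8) + int(str(b), 8))[2:])\n"
    )

def learn_function(examples) -> str:
    # Step 1 (inductive): ask the model to infer a rule from input-output pairs.
    shots = "\n".join(f"f({a}, {b}) = {y}" for (a, b), y in examples)
    prompt = (
        "Infer the underlying rule from these examples and return a Python "
        f"function named solve(a, b) that implements it:\n{shots}"
    )
    return query_llm(prompt)

def apply_function(code: str, a: int, b: int) -> int:
    # Step 2 (execution): an external interpreter applies the learned rule,
    # so the model never has to perform the deductive application itself.
    namespace = {}
    exec(code, namespace)
    return namespace["solve"](a, b)

if __name__ == "__main__":
    # Few-shot examples of base-8 addition (a "counterfactual" arithmetic task).
    examples = [((5, 6), 13), ((7, 1), 10), ((12, 5), 17)]
    learned = learn_function(examples)
    print(apply_function(learned, 6, 7))  # expected: 15 in base 8
```

Because the learned rule is executed outside the model, the measured accuracy reflects only whether the rule was induced correctly from the examples.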
The results show that language models like GPT-4 perform exceptionally well at inductive reasoning, achieving nearly 100% accuracy with the "SolverLearner" method. Deductive reasoning, however, especially on "counterfactual" tasks, proved much harder. For instance, while the models handle ordinary base-10 arithmetic well, they run into difficulties when asked to calculate in other number bases. They are similarly less flexible when analyzing sentences with unusual word orders or altered spatial orientations.
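As a concrete illustration of such a counterfactual setting (a hypothetical example, not taken from the paper), the same pair of digit strings yields different answers depending on the base the model is asked to assume:

```python
# Why counterfactual bases stress deductive reasoning: identical surface
# expressions have different correct answers under different base rules.

def add_in_base(a: str, b: str, base: int) -> str:
    # Interpret the digit strings in `base`, add them, and render the
    # result back in that base.
    total = int(a, base) + int(b, base)
    digits = ""
    while total:
        digits = str(total % base) + digits
        total //= base
    return digits or "0"

print(add_in_base("27", "15", 10))  # "42"  (familiar decimal case)
print(add_in_base("27", "15", 9))   # "43"  (same digits, base-9 rules)
```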
The researchers concluded that deductive reasoning remains a significant challenge for current LLMs: whether a learned rule is applied correctly often depends on how frequently similar tasks appeared during training. Although prompting techniques such as chain-of-thought can slightly improve deductive performance, the results are still unsatisfactory. It is worth noting that OpenAI's newly released o1 model was not included in these tests.
Another study, conducted by researchers from Ohio State University and Carnegie Mellon University, examined the logical reasoning capabilities of Transformer models. They investigated whether the models could "grok" implicit inference, that is, acquire the ability only after extended training well past the point of fitting the training data, focusing on composition and comparison tasks.
The results showed that the models did acquire the ability to make implicit inferences after prolonged training, but only on comparison tasks did they generalize to unseen examples. The researchers attributed this difference to the internal structure of the circuits the models learn, and proposed adjustments to the Transformer architecture that yielded improvements in preliminary experiments.
Key Points:
🌟 LLMs excel in inductive reasoning, with accuracy nearing 100%.
🧩 Deductive reasoning remains a challenge, particularly in handling counterfactual tasks.
🔍 Another study indicates that Transformer models can acquire implicit inference abilities through prolonged training, but generalize to unseen examples only on comparison tasks, not composition tasks.