Approximating Reasoning with Transformer Language Models

University essay from Göteborgs universitet/Institutionen för data- och informationsteknik

Abstract: We conduct experiments with BART, a generative language-model architecture, to investigate its capabilities for approximating reasoning by learning from data. For this we use the SimpleLogic dataset, a dataset of satisfiability problems in propositional logic originally created by Zhang et al. (2022). Their previous work on SimpleLogic highlighted the pitfalls of trying to solve such inference problems with a classifier: it tends to learn spurious statistical correlations in the dataset rather than reasoning. Building on research by Wei et al. (2022), which shows that prompting transformer models to produce intermediate inference steps improves performance on mathematical problems, we augment the SimpleLogic dataset with inference steps using two different approaches. In the first approach, inference steps are generated top-down by BART in a single input-output run per inference problem. In the second, inference steps are generated bottom-up by alternating between a generative module and a symbolic module. The symbolic module updates the input to the generative module (BART) using the output of the previous generation step; in this approach, BART thus attempts to solve a small part of the problem in each iteration. We demonstrate that an iterative bottom-up approach with a generative model fine-tuned on proofs better approximates logical reasoning, including on out-of-distribution data. The second approach achieves near-perfect accuracy on all test sets, and its proofs are fully consistent in more than 99% of cases, even on out-of-distribution data. Previous research on chain-of-thought prompting indicates that teaching generative models inference steps can improve performance on reasoning problems, which is consistent with our findings. A limitation of the study is that the bottom-up and top-down approaches are not fully comparable. Further investigation is necessary to determine whether the improved performance of the second approach can be attributed to its neuro-symbolic architecture, its bottom-up training data, or a combination of both factors. Future research could address this question more directly by training both fully neural and neuro-symbolic models to employ a bottom-up approach.
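
The following is a minimal Python sketch of the bottom-up loop described in the abstract, assuming a Hugging Face BART checkpoint fine-tuned to emit one derivable fact per call. The input encoding ("facts | rules | query"), the stopping conditions, and the fact-update logic are illustrative assumptions, not the essay's exact implementation; the checkpoint path is hypothetical.

    # Sketch: alternate between a generative module (BART) and a symbolic module
    # that folds each newly derived fact back into the model's input.
    from transformers import BartForConditionalGeneration, BartTokenizer

    tokenizer = BartTokenizer.from_pretrained("facebook/bart-base")
    model = BartForConditionalGeneration.from_pretrained("path/to/finetuned-bart")  # hypothetical checkpoint

    def generate_step(facts, rules, query):
        """One generative call: ask BART for a single new derivable fact."""
        prompt = f"facts: {', '.join(sorted(facts))} | rules: {rules} | query: {query}"
        inputs = tokenizer(prompt, return_tensors="pt", truncation=True)
        output = model.generate(**inputs, max_new_tokens=16)
        return tokenizer.decode(output[0], skip_special_tokens=True).strip()

    def solve(facts, rules, query, max_iters=50):
        """Iterate until the query is derived, no new fact appears, or a step limit is hit."""
        facts = set(facts)
        for _ in range(max_iters):
            step = generate_step(facts, rules, query)
            if step == query:                 # query derived: problem is satisfiable
                return True
            if not step or step in facts:     # no new fact produced: stop
                return False
            facts.add(step)                   # symbolic update: extend the fact set
        return False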
