Publications

You can also find my articles on my Google Scholar profile.

Weak-to-Strong In-Context Optimization of Language Model Reasoning

Published in NeurIPS, 2024

Large language models (LLMs) have demonstrated remarkable in-context learning capabilities, leveraging demonstrations to adeptly perform a task. Recent works have shown that such models can perform optimization over a response scoring function, evaluating the quality of suboptimal generations and applying them as exemplars to produce a better response. In this work, we seek to further explore this phenomenon and determine whether strong LLMs can optimize their reasoning paths by leveraging differentiated copies of a weak model. Central to our approach is the use of filler tokens interleaved after each step in the reasoning chain. We then define reasoning optimality, our implicit objective function, in terms of “efficiency” as measured by the number of steps. At inference time, three copies of the weak model, fine-tuned on synthetic data with varying degrees of efficiency, are used to generate responses for in-context optimization with the strong model. We evaluate this method on the MMLU benchmark with Gemma-2 2B-it weak learners and Llama-3.1-405B-Instruct as the strong model, and demonstrate that our approach improves performance at low cost.
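
As a rough illustration of the setup in this abstract (not the paper's actual implementation), the sketch below ranks three weak-model responses by step count and packs them into a prompt for the strong model to optimize over. The filler-token string, function names, and example responses are assumptions made for the sketch.

```python
# Hypothetical sketch of the in-context optimization prompt described above.
# "Efficiency" is scored as the number of reasoning steps, as in the abstract;
# the filler token and example responses are illustrative stand-ins.

FILLER = "<pause>"  # assumed name for the filler token interleaved after each step

def count_steps(response: str) -> int:
    """Score a reasoning chain by its number of steps (fewer = more efficient)."""
    return response.count(FILLER)

def build_optimization_prompt(question: str, weak_responses: list[str]) -> str:
    """Order weak-model responses from least to most efficient so the strong
    model sees an improving trajectory it can extrapolate past."""
    ranked = sorted(weak_responses, key=count_steps, reverse=True)
    exemplars = "\n\n".join(
        f"Attempt ({count_steps(r)} steps):\n{r}" for r in ranked
    )
    return (
        f"Question: {question}\n\n"
        f"Previous attempts, ordered from least to most efficient:\n{exemplars}\n\n"
        "Produce a correct answer using fewer reasoning steps than the attempts above."
    )

if __name__ == "__main__":
    weak_responses = [  # stand-ins for outputs of the three fine-tuned weak-model copies
        f"Step 1 {FILLER} Step 2 {FILLER} Step 3 {FILLER} Answer: B",
        f"Step 1 {FILLER} Step 2 {FILLER} Answer: B",
        f"Step 1 {FILLER} Answer: B",
    ]
    print(build_optimization_prompt("Which gas is most abundant in Earth's atmosphere?", weak_responses))
```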

Recommended citation: https://vedantgaur.com/publications

Investigating Language Model Dynamics using Meta-Tokens

Published in NeurIPS, 2024

Transformers have achieved remarkable success across various domains, but much remains unknown about their internal reasoning and training dynamics. This paper presents a novel approach using meta-tokens, special tokens injected into the input sequence, and a dedicated meta-attention mechanism to improve model performance and interpretability. We hypothesize that meta-tokens store and retrieve global contextual information by interacting through meta-attention. We test this by pretraining a modified GPT-2 architecture equipped with meta-attention, in addition to causal multi-headed attention, and demonstrate its efficacy through empirical gains on the MMLU benchmark. Furthermore, we explore the distribution of attention scores and residual stream alterations by visualizing model internals. By applying the language model head at key points in the residual stream, we find that meta-tokens accelerate layer-wise logit convergence to the correct output token. These results suggest that meta-tokens effectively capture global dependencies, providing enhanced performance on long-context tasks while offering new insights into the flow of attention scores and, in turn, training behavior in transformers.
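
The abstract leaves the exact architecture to the paper, so the following is only an assumption-laden PyTorch sketch of the general idea: a small bank of learned meta-token embeddings that ordinary token states read from through a separate (non-causal) attention path. The class name, dimensions, and placement of the block are illustrative, not the paper's design.

```python
# Minimal sketch of a meta-attention block, under assumptions not fixed by the abstract.
import torch
import torch.nn as nn

class MetaAttentionBlock(nn.Module):
    def __init__(self, d_model=256, n_meta=4, n_heads=4):
        super().__init__()
        self.meta = nn.Parameter(torch.randn(n_meta, d_model) * 0.02)  # meta-token embeddings
        self.meta_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.norm = nn.LayerNorm(d_model)

    def forward(self, x):                       # x: (batch, seq, d_model) token states
        b = x.size(0)
        meta = self.meta.unsqueeze(0).expand(b, -1, -1)
        # ordinary tokens query the meta-tokens, which act as a global "scratchpad"
        ctx, _ = self.meta_attn(query=x, key=meta, value=meta)
        return self.norm(x + ctx)               # residual update, as in a standard block

x = torch.randn(2, 16, 256)                     # toy batch of hidden states
print(MetaAttentionBlock()(x).shape)            # torch.Size([2, 16, 256])
```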

Recommended citation: https://vedantgaur.com/publications

Reasoning in Large Language Models Through Symbolic Math Word Problems

Published in Findings of ACL, 2023

Large language models (LLMs) have revolutionized NLP by solving downstream tasks with little to no labeled data. Despite their versatile abilities, the larger question of their ability to reason remains ill-understood. This paper addresses reasoning in math word problems (MWPs) by studying symbolic versions of the numeric problems, since a symbolic expression is a “concise explanation” of the numeric answer. We create and use a symbolic version of the SVAMP dataset and find that GPT-3’s davinci-002 model also has good zero-shot accuracy on symbolic MWPs. To evaluate the faithfulness of the model’s reasoning, we go beyond accuracy and additionally evaluate the alignment between the final answer and the outputted reasoning, which correspond to the numeric and symbolic answers, respectively, for MWPs. We explore a self-prompting approach to encourage the symbolic reasoning to align with the numeric answer, thus equipping the LLM with the ability to provide concise and verifiable reasoning and making it more interpretable. Surprisingly, self-prompting also raises symbolic accuracy above both the original numeric and symbolic accuracies, thus providing an ensembling effect. The SVAMP-Sym dataset will be released for future research on symbolic math problems.
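
As a concrete but purely illustrative reading of the alignment evaluation described above, the SymPy sketch below treats the model's symbolic output as aligned with its numeric answer when substituting the problem's numbers into the expression reproduces that answer. The function name, variable names, and example problem are assumptions, not artifacts of the paper.

```python
# Sketch of an alignment check between a symbolic answer and a numeric answer.
import sympy as sp

def aligned(symbolic_answer: str, values: dict, numeric_answer: float) -> bool:
    expr = sp.sympify(symbolic_answer)                 # parse the model's symbolic output
    substituted = expr.subs({sp.Symbol(k): v for k, v in values.items()})
    return sp.simplify(substituted - numeric_answer) == 0

# "Jack had w apples and bought x more; how many does he have?" with w=5, x=3
print(aligned("w + x", {"w": 5, "x": 3}, 8))   # True  -> reasoning aligns with the answer
print(aligned("w - x", {"w": 5, "x": 3}, 8))   # False -> misaligned reasoning
```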

Recommended citation: Vedant Gaur and Nikunj Saunshi. 2023. Reasoning in Large Language Models Through Symbolic Math Word Problems. In Findings of the Association for Computational Linguistics: ACL 2023, pages 5889–5903, Toronto, Canada. Association for Computational Linguistics. https://aclanthology.org/2023.findings-acl.364.pdf

Symbolic Math Reasoning with Language Models

Published in IEEE URTC, 2022

The emergence of large language models (LLMs) such as OpenAI’s GPT-3, Google’s LaMDA, and Meta’s OPT has revolutionized the field of natural language processing (NLP). These models, with upwards of hundreds of billions of parameters, are trained on large unlabeled text corpora and can subsequently solve downstream tasks with little to no labeled data. While these models are increasingly versatile in their abilities, e.g., solving math word problems, the larger question of their ability to reason remains open. Using and modifying the SVAMP dataset, we find that GPT-3’s davinci-002 model, in addition to having good performance on numerical math word problems, also performs well on the potentially harder symbolic version of the same problems. Furthermore, adopting a two-step approach (solve symbolically and then substitute numerical values) leads to better accuracy on the numerical test set in the zero-shot regime. Additionally, we find that the use of specific prompting techniques pushes the model, in many cases, to actively describe its thought process and aid in the final answer output when faced with a complex, multi-step problem, aligning with recent observations.
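
For illustration only, the snippet below shows one simple way a symbolic version of a numeric problem can be constructed: numbers in the problem text are swapped for variables, and the mapping is kept so values can be substituted back after the model answers symbolically. The regex, variable names, and example are assumptions of this sketch, not the paper's exact data-construction code.

```python
# Sketch of turning a numeric word problem into a symbolic one, in the spirit of
# the modified SVAMP setup described above.
import re

def symbolize(problem: str, variables=("w", "x", "y", "z")):
    """Replace each number in the problem with a fresh variable; return the new
    text plus the mapping needed to substitute the values back later."""
    mapping, idx = {}, 0

    def repl(match):
        nonlocal idx
        var = variables[idx]
        mapping[var] = float(match.group())
        idx += 1
        return var

    return re.sub(r"\d+(?:\.\d+)?", repl, problem), mapping

text, values = symbolize("Jack had 5 apples and bought 3 more. How many does he have?")
print(text)    # "Jack had w apples and bought x more. How many does he have?"
print(values)  # {'w': 5.0, 'x': 3.0}
```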

Recommended citation: V. Gaur and N. Saunshi, "Symbolic Math Reasoning with Language Models," 2022 IEEE MIT Undergraduate Research Technology Conference (URTC), Cambridge, MA, USA, 2022, pp. 1-5, doi: 10.1109/URTC56832.2022.10002218. https://ieeexplore.ieee.org/document/10002218

Lucas-Kanade Optical Flow Machine Learning Implementations

Published in JSR, 2022

Optical flow is an effective measurement for gauging motion in a scene, allowing pixel-by-pixel motion to be computed for a frame pair. This paper addresses the ambiguity in determining how to obtain good optical flow results for a given sequence. Because sequences vary in their characteristics, such as where they are set and how fast objects move, a different blur radius, i.e., the extent to which the image is blurred, may have to be applied to obtain realistic flow maps. Furthermore, this paper touches on the many variables that can affect the efficacy of the flow produced by an optical flow algorithm. We therefore aim to determine whether combining results obtained with different blur values yields flow outputs closer to the ground truth.
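
As a hedged illustration of the blur-radius sweep described above (not the paper's actual pipeline), the sketch below pre-blurs a synthetic frame pair at several kernel sizes and runs OpenCV's pyramidal Lucas-Kanade tracker on each, so the resulting flow can be compared across blur settings. The blur values and synthetic frames are arbitrary choices for the example.

```python
# Sweep the blur applied to a frame pair and compare the Lucas-Kanade flow each setting gives.
import cv2
import numpy as np

def make_frame(offset):
    frame = np.zeros((128, 128), dtype=np.uint8)
    frame[40 + offset:60 + offset, 40 + offset:60 + offset] = 255  # moving bright square
    return frame

prev_raw, next_raw = make_frame(0), make_frame(3)   # 3-pixel diagonal motion

for k in (1, 3, 7, 15):                             # Gaussian kernel sizes (blur-radius proxy)
    prev = cv2.GaussianBlur(prev_raw, (k, k), 0) if k > 1 else prev_raw
    nxt = cv2.GaussianBlur(next_raw, (k, k), 0) if k > 1 else next_raw
    pts = cv2.goodFeaturesToTrack(prev, maxCorners=50, qualityLevel=0.01, minDistance=5)
    new_pts, status, _ = cv2.calcOpticalFlowPyrLK(prev, nxt, pts, None)
    flow = (new_pts - pts)[status.flatten() == 1]
    print(f"blur {k:2d}: mean flow = {flow.reshape(-1, 2).mean(axis=0)}")
```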

Recommended citation: Gaur, V. (2022). Lucas-Kanade Optical Flow Machine Learning Implementations. Journal of Student Research, 11(3). https://doi.org/10.47611/jsrhs.v11i3.2957 https://www.jsr.org/hs/index.php/path/article/view/2957