New England NLP Meeting Series

Keynote Speakers

Yilun Du

Harvard University

Constructing Generalizable Multimodal Models through Compositional Generation

Abstract

To construct intelligent embodied agents, it is important that models can ingest and output signals in the form of images, text, actions, and various other sensor modalities. However, gathering data that covers all such combinations of modalities is often challenging, and existing multimodal models are often brittle and sensitive to distribution shift. In this talk, I'll illustrate how we can circumvent some of these data challenges by instead constructing multimodal models compositionally -- combining models that focus on different modalities such as text, videos, and actions. I'll illustrate how such compositional systems enable zero-shot multimodal models that can accomplish a variety of tasks, such as reasoning, vision-language captioning, and robotic planning.
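One simple way to picture this compositional idea is as a product of experts: each modality-specific model scores a shared candidate (say, a plan for a robot), and the composed system multiplies those probabilities, which amounts to summing log-scores. The sketch below is only an illustration of that principle under made-up scoring functions standing in for trained text and video models; it is not the systems described in the talk.

```python
import math

# Toy per-modality "experts" scoring a shared set of candidate robot plans.
# In practice these log-scores would come from separately trained text and
# video models; the numbers here are made up for illustration.
def text_score(plan):
    return {"pick up the cup": -0.2, "open the drawer": -1.5, "wave": -2.0}[plan]

def video_score(plan):
    return {"pick up the cup": -1.0, "open the drawer": -0.3, "wave": -2.5}[plan]

def compose(plans, scorers):
    # Product of experts: multiply the experts' probabilities,
    # which is the same as summing their log-scores, then renormalize.
    logps = {p: sum(score(p) for score in scorers) for p in plans}
    log_z = math.log(sum(math.exp(v) for v in logps.values()))
    return {p: math.exp(v - log_z) for p, v in logps.items()}

plans = ["pick up the cup", "open the drawer", "wave"]
print(compose(plans, [text_score, video_score]))
```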

Bio

Yilun Du is an incoming assistant professor at Harvard University, jointly in the Kempner Institute and Computer Science. He is currently a senior research scientist at Google DeepMind and previously received his PhD from MIT.

He He

New York University

How Transformers Search in Reasoning Tasks

Abstract

This talk will study two modes of reasoning in transformer language models: latent reasoning through the forward pass and explicit reasoning through chain-of-thought. For latent reasoning, we show that transformers can learn efficient parallel search algorithms. For explicit reasoning, we show that models know when their intermediate answers are correct during search, and this can be used to improve reasoning efficiency.

Bio

He He is an assistant professor in computer science and data science at New York University. She is interested in how large language models work and potential risks of this technology.

Yoon Kim

MIT

On the Future of Transformers

Abstract

Transformers are still the dominant architecture for language modeling (and generative AI more broadly). This talk will speculate on how Transformers (in particular the attention mechanism) will change in the near future.

Bio

Yoon Kim is an assistant professor at MIT. He obtained his PhD from Harvard University.

Alex Lew

Yale University

Language Model Probabilistic Programming

Abstract

Even after fine-tuning and reinforcement learning, language models can be difficult to control reliably with prompts alone. This talk proposes a new approach to specifying and executing tasks using language models, based on probabilistic programming. Language model probabilistic programs precisely and compositionally specify target distributions from which to generate samples. These distributions generally arise by applying hard and soft constraints to the default output distributions of one or more LMs. Samples can then be drawn approximately from these target distributions using sequential Monte Carlo, and their quality can be improved by scaling the number of particles the algorithm uses at test time. This talk will demonstrate that language model probabilistic programs can improve downstream performance on a broad variety of tasks, from Python code generation to molecule synthesis to trip planning. It will also show how language models can themselves write language model probabilistic programs, enabling small LMs to outperform frontier models on a number of challenging tasks.
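The core sequential Monte Carlo loop is easy to sketch: maintain a set of particles (partial generations), extend each with a token sampled from the LM, reweight by how well the extension satisfies the constraint, and resample. The toy program below is not the interface from the talk, just a minimal illustration in which a hypothetical `toy_lm_step` stands in for a real LM's next-token distribution and the soft constraint simply rewards sequences containing the token "b".

```python
import random

VOCAB = ["a", "b", "c", "<eos>"]

def toy_lm_step(prefix):
    """Hypothetical next-token distribution; a real program would query an LM."""
    return {"a": 0.4, "b": 0.1, "c": 0.3, "<eos>": 0.2}

def potential(prefix):
    """Soft constraint: upweight sequences that contain the token 'b'."""
    return 5.0 if "b" in prefix else 1.0

def smc_generate(num_particles=20, max_len=6):
    particles = [[] for _ in range(num_particles)]
    weights = [1.0] * num_particles
    for _ in range(max_len):
        for i, prefix in enumerate(particles):
            if prefix and prefix[-1] == "<eos>":
                continue  # this particle has finished generating
            probs = toy_lm_step(prefix)
            tokens, ps = zip(*probs.items())
            tok = random.choices(tokens, weights=ps)[0]
            new_prefix = prefix + [tok]
            # Incremental importance weight: ratio of potentials before/after,
            # since the proposal is the LM itself and the target is LM x potential.
            weights[i] *= potential(new_prefix) / potential(prefix)
            particles[i] = new_prefix
        # Resample particles in proportion to their weights, then reset weights.
        total = sum(weights)
        norm = [w / total for w in weights]
        idx = random.choices(range(num_particles), weights=norm, k=num_particles)
        particles = [list(particles[j]) for j in idx]
        weights = [1.0] * num_particles
    return particles

if __name__ == "__main__":
    for p in smc_generate()[:5]:
        print(" ".join(p))
```

Increasing `num_particles` is the test-time scaling knob mentioned in the abstract: more particles give a better approximation to the constrained target distribution at higher compute cost.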

Bio

Alex Lew is an incoming Assistant Professor of Computer Science at Yale. His research focuses on the intersection of programming languages and probabilistic machine learning, aiming to develop systems and theory that facilitate the invention, application, and understanding of scalable algorithms for probabilistic modeling and inference. Alex attended graduate school under the supervision of Vikash Mansinghka and Joshua Tenenbaum. Prior to his PhD, Alex taught computer science to high school students at the Commonwealth School in Boston, MA. Alex's work has been recognized with a 2019 Facebook Probability and Programming Award, a 2020 NSF Graduate Research Fellowship, and 2023 ACM SIGPLAN and ACM SIGLOG Distinguished Paper awards.

Byron Wallace

Northeastern University

LLMs for healthcare: Risks and interpretability methods to (possibly) mitigate them

Abstract

Abstract coming soon.

Bio

Bio coming soon.

Jason Weston

Meta / New York University

Self-Improvement of LLMs

Abstract

We describe recent methods that enable large language models (LLMs) to self-improve, increasing their performance on tasks relevant to human users. In particular we describe the methods of Self-Rewarding LLMs (https://arxiv.org/abs/2401.10020), Iterative Reasoning Preference Optimization (https://arxiv.org/abs/2404.19733), Thinking LLMs (https://arxiv.org/abs/2410.10630), Meta-Rewarding LLMs (https://arxiv.org/abs/2407.19594), and more!
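The common structure behind these methods is an iterative loop in which the model generates its own training signal. The skeleton below is a schematic sketch of the self-rewarding loop described in the Self-Rewarding LLMs paper (arXiv:2401.10020); the helpers `sample_responses`, `judge_with_model`, and `dpo_update` are hypothetical placeholders for decoding, LLM-as-a-Judge scoring, and preference optimization, not the authors' actual code.

```python
from dataclasses import dataclass

@dataclass
class Model:
    name: str

def sample_responses(model, prompt, n=4):
    # Placeholder: a real system would decode n candidate responses from the LLM.
    return [f"{model.name} response {i} to: {prompt}" for i in range(n)]

def judge_with_model(model, prompt, response):
    # Placeholder: the same LLM scores its own response via an LLM-as-a-Judge prompt.
    return float(len(response) % 5)

def dpo_update(model, preference_pairs):
    # Placeholder: a real system would run Direct Preference Optimization here.
    return Model(name=model.name + "+")

def self_rewarding_iteration(model, prompts):
    pairs = []
    for prompt in prompts:
        candidates = sample_responses(model, prompt)
        scored = sorted(candidates, key=lambda r: judge_with_model(model, prompt, r))
        # Build a preference pair from the highest- and lowest-scored responses.
        pairs.append((prompt, scored[-1], scored[0]))
    return dpo_update(model, pairs)

model = Model("M0")
for _ in range(3):  # each round trains on preferences the model itself produced
    model = self_rewarding_iteration(model, ["prompt A", "prompt B"])
print(model.name)
```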

Bio

Jason Weston is a research scientist at Meta AI, USA, and a Visiting Research Professor at NYU. He earned his PhD in machine learning at Royal Holloway, University of London and at AT&T Research in Red Bank, NJ (advisors: Alex Gammerman, Volodya Vovk and Vladimir Vapnik) in 2000. From 2000 to 2001, he was a researcher at Biowulf Technologies. From 2002 to 2003, he was a research scientist at the Max Planck Institute for Biological Cybernetics, Tuebingen, Germany. From 2003 to 2009, he was a research staff member at NEC Labs America, Princeton. From 2009 to 2014, he was a research scientist at Google, NY. Jason's publications include best paper awards at ICML and ECML, and a Test of Time Award for his work "A Unified Architecture for Natural Language Processing: Deep Neural Networks with Multitask Learning", ICML 2008 (with Ronan Collobert). He was part of the YouTube team that won a National Academy of Television Arts & Sciences Emmy Award for Technology and Engineering for Personalized Recommendation Engines for Video Discovery. Some of his notable work influencing the field of NLP includes the "NLP from scratch" work starting in 2008, which introduced pretraining and fine-tuning of language models; Memory Networks in 2014-2015, which introduced multi-layer attention pre-Transformers; DrQA in 2017, which introduced RAG-like methods; BlenderBot 1-3 and other LLM dialogue research pre-ChatGPT in 2018-2022; and more recently work like Self-Rewarding LLMs for self-improvement.