Oral (3:30-4:20 PM)
- AI Alignment at Your Discretion
(Best Paper Award! 🏆🎉🎉)
- CaKE: Circuit-aware Editing Enables Generalizable Knowledge Learners
- Training Language Models on Synthetic Edit Sequences Improves Code Synthesis
(Outstanding Paper Award! 🏆)
- Language Models use Lookbacks to Track Beliefs
- How Post-Training Reshapes LLMs: A Mechanistic View on Knowledge, Truthfulness, Refusal, and Confidence
(Outstanding Paper Award! 🏆)
Poster (12:50-2:10 PM)
Listed in random order.
- Bayesian Teaching Enables Probabilistic Reasoning in Large Language Models
- What's Producible May Not Be Reachable: Measuring the Steerability of Generative Models
- Who Evaluates the Evaluations? Objectively Scoring Text-to-Image Prompt Coherence Metrics with T2IScoreScore (TS2)
- TOMATO: Assessing Visual Temporal Reasoning Capabilities in Multimodal Foundation Models
- An LSTM language model learns Hindi-Urdu case-agreement interactions, and has a linear encoding of case
- A Probabilistic Inference Approach to LLM Inference-Time Scaling
- A Taxonomy of Transcendence
- Are Foundation Models Foundational? Synthetic Tasks Reveal World Models
- Between Circuits and Chomsky: Pre-pretraining on Formal Languages Imparts Linguistic Biases
- Boosting Large Language Models with Mask Fine-Tuning
- Building A Unified AI-centric Language System: analysis, framework and future work
- Can model interpretations predict behavior on unseen data?
- Capturing Human Cognitive Styles with Language: Towards an Experimental Evaluation Paradigm
- ChatGPT Doesn't Trust Chargers Fans: Guardrail Sensitivity in Context
- Chunk-Distilled Language Modeling
- Classical Computation in Connectionist Models
- CodeSteer: Symbolic-Augmented Language Models via Code/Text Guidance
- Communication Makes Perfect: Persuasion Dataset Construction via Multi-LLM Communication
- Contextual morphologically-guided tokenization for pretrained Latin BERT models
- Continued Pre-training LLMs to Learn Simulated Knowledge Updates
- Do Automatic Factuality Metrics Measure Factuality?
- Escaping Collapse: The Strength of Weak Data for Large Language Model Training
- Explaining GPT-4's Schema of Depression Using Machine Behavior Analysis
- Faithful, Unfaithful or Ambiguous? Multi-Agent Debate with Initial Stance for Summary Evaluation
- Fast Controlled Generation from Language Models with Adaptive Weighted Rejection Sampling
- Focus Directions Make Your Language Models Pay More Attention to Relevant Contexts
- Generating Text from Uniform Meaning Representation
- HYBRIDMIND: Meta Selection of Natural Language and Symbolic Language for Enhanced LLM Reasoning
- In Search of Lost Language Model Training Dynamics
- In-Context Learning of Representations
- Inductive Linguistic Reasoning with Large Language Models
- Is analogy enough to draw novel adjective-noun inferences?
- JumpStarter: A Multi-Agent System for Getting Started on Personal Goals via Adaptive Personal Context Curation
- K-Paths: Reasoning over Graph Paths for Drug Repurposing and Drug Interaction Prediction
- KARL: Knowledge-Aware Retrieval and Representations aid Retention and Learning in Students
- LLMs can Perform Multi-Dimensional Analytic Writing Assessments: A Case Study of L2 Graduate-Level Academic English Writing
- Loss in the Crowd: Hidden Breakthroughs in LM Training
- Mind the Gap: Assessing Crowd-Sourced Linguistic Knowledge on Morphological Gaps of Two Related Languages
- Name of Thrones: Evaluating How LLMs Rank Student Names, Race, and Gender in Status Hierarchies
- NüshuRescue: Reviving the Endangered Nüshu Language with AI
- Performing Scientific Research with Artificial Intelligence Researcher: A Comprehensive Study with Expert-Involved Evaluation
- Planetarium🪐: A Rigorous Benchmark for Translating Text to Structured Planning Languages
- Potemkin Understanding in Large Language Models: Formalizing and Benchmarking Conceptual Comprehension
- Privacy Ripple Effects from Adding or Removing Personal Information in Language Model Training
- Probing the Capacity of Language Model Agents to Operationalize Disparate Experiential Context Despite Distraction
- Projecting Assumptions: The Duality Between Sparse Autoencoders and Concept Geometry
- PUPA: Private User Prompt Annotations Benchmark
- Re-Evaluating Evaluation for Multilingual Summarization
- Scaling Makes It Possible: How Large Models Master Impossible Languages
- Scaling Sparse and Dense Retrieval in Decoder-Only LLMs
- Scaling the Wall: Scaling Vanilla RNNs by Stealing Transformer Geometry
- Self-Steering Language Models
- Sociolinguistic Simulacra: Interactions Between Language and Attitudes in Finetuned Language Models
- Steering Fine Tuning with Targeted Concept Ablation
- Superpower🦸⚡️ of the Contrastive Decoding📈 comes from its Imagination🧠💡!
- Syntactic and Semantic Control of Large Language Models via Sequential Monte Carlo
- Temporal Working Memory: Query-Guided Segment Refinement for Enhanced Multimodal Understanding
- TextArena: Beyond Traditional Benchmarks - Evaluating Social Intelligence in Language Models
- The Same but Different: Structural Similarities and Differences in Multilingual Language Modeling
- The Semantic Hub Hypothesis: Language Models Share Semantic Representations Across Languages and Modalities
- Toward Reliable Ad-hoc Scientific Information Extraction: A Case Study on Two Materials Datasets
- Working Memory Identifies Reasoning Limits in Language Models
- Auto-encoding Scientific Conclusions for Hypothesis Generation
- (How) Do Language Models Track State?
- A Systematic Evaluation of Transformer-LM Representations for Capturing Author States and Traits
- Accelerating robust in-context language learning
- Controlling Factual Associations and Visual Perception in Vision-Language Models
- Discovering Forbidden Topics in Language Models
- Do LLMs synthesize technical information like humans?
- Early Detection of Mild Cognitive Impairment Through Voice Assistant Interactions: An LLM-Driven Approach
- Evolutionary Dynamics of Syntax and Semantics in BERT: A Hyperbolic Geometry Perspective
- Exploring the Emergence of Shared Multilingual Concept Representations in LLMs
- First things first: Universal path dependence of learning
- ImpScore: A Learnable Metric For Quantifying The Implicitness Level of Sentences
- Investigating the Knowledge-Perception Trade-off in Vision-Language Models through Visual Counterfactuals
- Modeling Noisy-Channel Language Processing with Reanalysis of Possible Errors: A Probabilistic Inference Approach
- Paths Not Taken: Optimize Multilingual Factual Recall Pathways via Simple Zero-Rank Interventions
- Reasoning-based Regression: Teaching Language Models to Score Natural Language Features
- Supporting Biomedical Discovery with Human Agent Collaboration for Literature Grounded Search and Reasoning
- The Dual-Route Model of Induction
- The Role of PropBank Sense Numbers in AMR-to-text Generation and Text-to-AMR Parsing
- ThoughtCoder: Structured and adaptive problem solving via language model programming
- Reviving Endangered and Extinct Languages with Large Language Models
- United We Stand: Multi-LLM Collaboration for Advancing Scientific Research
- What's Hidden in Flemish Stories: How Can LLMs Unveil the Affective Nuances in Daily Narratives?