CS@CU At ICML 2024

Papers from CS researchers were accepted to the 41st International Conference on Machine Learning (ICML 2024). They join the machine learning research community in Vienna, Austria, on July 21 – 27, 2024. ICML brings together the brightest minds in the field to share their latest findings, foster collaborations, and inspire new directions in machine learning.

The links to the papers and the abstracts are below:

SelfIE: Self-Interpretation of Large Language Model Embeddings
Haozhe Chen Columbia University, Carl Vondrick Columbia University, Chengzhi Mao Columbia University

Abstract:
How do large language models (LLMs) obtain their answers? The ability to explain and control an LLM’s reasoning process is key for reliability, transparency, and future model developments. We propose SelfIE (Self-Interpretation of Embeddings), a framework that enables LLMs to interpret their own embeddings in natural language by leveraging their ability to respond to inquiries about a given passage. Capable of interpreting open-world concepts in the hidden embeddings, SelfIE reveals LLM internal reasoning in cases such as making ethical decisions, internalizing prompt injection, and recalling harmful knowledge. SelfIE’s text descriptions on hidden embeddings open avenues to control LLM reasoning. We propose Supervised Control, which allows editing open-ended concepts while only requiring gradient computation of individual layer. We extend RLHF to hidden embeddings and propose Reinforcement Control that erases harmful knowledge in LLM without supervision targets.

Counterfactual Image Editing
Yushu Pan Columbia University, Elias Bareinboim Columbia University

Abstract:
Counterfactual image editing is a challenging task within generative AI. The current literature on the topic focuses primarily on changing individual features while being silent about the causal relationships between features, which are present in the real world. In this paper, we first formalize this task through causal language, modeling the causal relationships between latent generative factors and images through a special type of causal model called augmented structural causal models (ASCMs). Second, we show two fundamental impossibility results: (1) counterfactual editing is impossible from i.i.d. image samples and their corresponding labels alone; (2) also, even when the causal relationships between latent generative factors and images are available, no guarantees regarding the output of the generative model can be provided. Third, we propose a relaxation over this hard problem aiming to approximate the non-identifiable target counterfactual distributions while still preserving features the users care about and that are causally consistent with the true generative model, which we call ctf-consistent estimators. Finally, we develop an efficient algorithm to generate counterfactual image samples leveraging neural causal models.

Exploiting Code Symmetries for Learning Program Semantics
Kexin Pei Columbia University, Weichen Li Columbia University, Qirui Jin University of Michigan, Shuyang Liu Huazhong University of Science and Technology, Scott Geng University of Washington, Lorenzo Cavallaro University College London, Junfeng Yang Columbia University, Suman Jana Columbia University

Abstract:
This paper tackles the challenge of teaching code semantics to Large Language Models (LLMs) for program analysis by incorporating code symmetries into the model architecture. We introduce a group-theoretic framework that defines code symmetries as semantics-preserving transformations, where forming a code symmetry group enables precise and efficient reasoning of code semantics. Our solution, SymC, develops a novel variant of self-attention that is provably equivariant to code symmetries from the permutation group defined over the program dependence graph. SymC obtains superior performance on five program analysis tasks, outperforming state-of-the-art code models, including GPT-4, without any pre-training. Our results suggest that code LLMs that encode the code structural prior via the code symmetry group generalize better and faster.
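As a rough illustration of what "equivariant" means in the abstract above, the sketch below checks that plain self-attention without positional encodings commutes with a reordering of its input tokens. This is not SymC's attention variant: SymC restricts attention to symmetries of the program dependence graph, while this toy uses identity projections and an arbitrary permutation. All names here (`self_attention`, `permute`, `tokens`) are hypothetical and chosen only for the example.

```python
import math


def softmax(row):
    """Numerically stable softmax over one row of scores."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(x):
    """Single-head attention with identity Q/K/V projections (an assumption)
    and no positional encoding."""
    d = len(x[0])
    scores = [
        [sum(a * b for a, b in zip(xi, xj)) / math.sqrt(d) for xj in x]
        for xi in x
    ]
    weights = [softmax(r) for r in scores]
    return [
        [sum(w * xj[k] for w, xj in zip(wr, x)) for k in range(d)]
        for wr in weights
    ]


def permute(rows, perm):
    """Reorder rows so that position i holds rows[perm[i]]."""
    return [rows[i] for i in perm]


def close(a, b, tol=1e-9):
    """Element-wise comparison of two equally shaped nested lists."""
    return all(abs(u - v) <= tol for ra, rb in zip(a, b) for u, v in zip(ra, rb))


tokens = [[1.0, 0.0], [0.0, 2.0], [1.0, 1.0]]  # toy token embeddings
perm = [2, 0, 1]

# Equivariance: attending then permuting equals permuting then attending.
attend_then_permute = permute(self_attention(tokens), perm)
permute_then_attend = self_attention(permute(tokens, perm))
is_equivariant = close(attend_then_permute, permute_then_attend)
```

Here `is_equivariant` holds because each output row depends only on its own token and the unordered set of all tokens, so reordering the input reorders the output in the same way.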

MGit: A Model Versioning and Management System
Wei Hao Columbia University, Daniel Mendoza Stanford University, Rafael Mendes Microsoft Research, Deepak Narayanan NVIDIA, Amar Phanishayee Columbia University, Asaf Cidon Columbia University, Junfeng Yang Columbia University

Abstract:
New ML models are often derived from existing ones (e.g., through fine-tuning, quantization or distillation), forming an ecosystem where models are *related* to each other and can share structure or even parameter values. Managing such a large and evolving ecosystem of model derivatives is challenging. For instance, the overhead of storing all such models is high, and models may inherit bugs from related models, complicating error attribution and debugging. In this paper, we propose a model versioning and management system called MGit that makes it easier to store, test, update, and collaborate on related models. MGit introduces a lineage graph that records the relationships between models, optimizations to efficiently store model parameters, and abstractions over this lineage graph that facilitate model testing, updating and collaboration. We find that MGit works well in practice: MGit is able to reduce model storage footprint by up to 7x. Additionally, in a user study with 20 ML practitioners, users complete a model updating task 3x faster on average with MGit.
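To make the idea of a lineage graph with shared parameter storage more concrete, here is a minimal sketch. It is not MGit's API or its storage optimizations, which the abstract does not detail: the class `LineageGraph`, its methods, and the content-hash deduplication are assumptions chosen for illustration.

```python
import hashlib


class LineageGraph:
    """Hypothetical toy: records parent links between models and stores
    identical parameter values only once, keyed by content hash."""

    def __init__(self):
        self.blobs = {}   # content hash -> parameter values, stored once
        self.models = {}  # model name -> {"parent": name or None, "layers": {layer: hash}}

    def add_model(self, name, layers, parent=None):
        if parent is not None and parent not in self.models:
            raise KeyError(f"unknown parent model: {parent}")
        refs = {}
        for layer, values in layers.items():
            digest = hashlib.sha256(repr(tuple(values)).encode()).hexdigest()
            self.blobs.setdefault(digest, tuple(values))  # skip if already stored
            refs[layer] = digest
        self.models[name] = {"parent": parent, "layers": refs}

    def ancestors(self, name):
        """Walk parent links, nearest ancestor first."""
        chain = []
        parent = self.models[name]["parent"]
        while parent is not None:
            chain.append(parent)
            parent = self.models[parent]["parent"]
        return chain

    def load(self, name):
        """Rebuild a model's parameters from the shared blob store."""
        layers = self.models[name]["layers"]
        return {layer: list(self.blobs[h]) for layer, h in layers.items()}


graph = LineageGraph()
graph.add_model("base", {"encoder": [0.1, 0.2], "head": [0.5]})
# A fine-tuned derivative that keeps the encoder and changes only the head.
graph.add_model("finetuned", {"encoder": [0.1, 0.2], "head": [0.7]}, parent="base")

stored_blobs = len(graph.blobs)            # shared encoder is stored once
lineage = graph.ancestors("finetuned")
restored = graph.load("finetuned")
```

In this toy, the two models reference four layers but only three distinct blobs are stored, and the parent link is what would let a test or a bug report on "base" be traced to "finetuned".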

Position: TrustLLM: Trustworthiness in Large Language Models
Yue Huang Lehigh University, Lichao Sun Lehigh University, Haoran Wang Illinois Institute of Technology, Siyuan Wu CISPA, Qihui Zhang CISPA, Yuan Li University of Cambridge, Chujie Gao CISPA, Yixin Huang Institut Polytechnique de Paris, Wenhan Lyu William & Mary, Yixuan Zhang William & Mary, Xiner Li Texas A&M University, Hanchi Sun Lehigh University, Zhengliang Liu University of Georgia, Yixin Liu Lehigh University, Yijue Wang Samsung Research America, Zhikun Zhang Stanford University, Bertie Vidgen MLCommons, Bhavya Kailkhura Lawrence Livermore National Laboratory, Caiming Xiong Salesforce Research, Chaowei Xiao University of Wisconsin, Madison, Chunyuan Li Microsoft Research, Eric Xing Carnegie Mellon University, Furong Huang University of Maryland, Hao Liu University of California, Berkeley, Heng Ji University of Illinois Urbana-Champaign, Hongyi Wang Rutgers University, Huan Zhang University of Illinois Urbana-Champaign, Huaxiu Yao UNC Chapel Hill, Manolis Kellis Massachusetts Institute of Technology, Marinka Zitnik Harvard University, Meng Jiang University of Notre Dame, Mohit Bansal UNC Chapel Hill, James Zou Stanford University, Jian Pei Duke University, Jian Liu University of Tennessee, Knoxville, Jianfeng Gao Microsoft Research, Jiawei Han University of Illinois Urbana-Champaign, Jieyu Zhao University of Southern California, Jiliang Tang Michigan State University, Jindong Wang Microsoft Research Asia, Joaquin Vanschoren Eindhoven University of Technology, John Mitchell Drexel University, Kai Shu Illinois Institute of Technology, Kaidi Xu Drexel University, Kai-Wei Chang University of California, Los Angeles, Lifang He Lehigh University, Lifu Huang Virginia Tech, Michael Backes CISPA, Neil Gong Duke University, Philip Yu University of Illinois Chicago, Pin-Yu Chen IBM Research, Quanquan Gu University of California, Los Angeles, Ran Xu Salesforce Research, Rey Ying Yale University, Shuiwang Ji Texas A&M University, Suman Jana Columbia University, Tianlong Chen UNC Chapel Hill, Tianming Liu University of Georgia, Tianyi Zhou University of Maryland, William Wang University of California, Santa Barbara, Xiang Li Massachusetts General Hospital, Xiangliang Zhang University of Notre Dame, Xiao Wang Northwestern University, Xing Xie Microsoft Research Asia, Xun Chen Samsung Research America, Xuyu Wang Florida International University, Yan Liu University of Southern California, Yanfang Ye University of Notre Dame, Yinzhi Cao Johns Hopkins University, Yong Chen University of Pennsylvania, Yue Zhao University of Southern California

Abstract:
Large language models (LLMs) have gained considerable attention for their excellent natural language processing capabilities. Nonetheless, these LLMs present many challenges, particularly in the realm of trustworthiness. This paper introduces TrustLLM, a comprehensive study of trustworthiness in LLMs, including principles for different dimensions of trustworthiness, established benchmark, evaluation, and analysis of trustworthiness for mainstream LLMs, and discussion of open challenges and future directions. Specifically, we first propose a set of principles for trustworthy LLMs that span eight different dimensions. Based on these principles, we further establish a benchmark across six dimensions including truthfulness, safety, fairness, robustness, privacy, and machine ethics. We then present a study evaluating 16 mainstream LLMs in TrustLLM, consisting of over 30 datasets. Our findings firstly show that in general trustworthiness and capability (i.e., functional effectiveness) are positively related. Secondly, our observations reveal that proprietary LLMs generally outperform most open-source counterparts in terms of trustworthiness, raising concerns about the potential risks of widely accessible open-source LLMs. However, a few open-source LLMs come very close to proprietary ones, suggesting that open-source models can achieve high levels of trustworthiness without additional mechanisms like moderator, offering valuable insights for developers in this field. Thirdly, it is important to note that some LLMs may be overly calibrated towards exhibiting trustworthiness, to the extent that they compromise their utility by mistakenly treating benign prompts as harmful and consequently not responding. Besides these observations, we’ve uncovered key insights into the multifaceted trustworthiness in LLMs. We emphasize the importance of ensuring transparency not only in the models themselves but also in the technologies that underpin trustworthiness. We advocate that the establishment of an AI alliance between industry, academia, the open-source community to foster collaboration is imperative to advance the trustworthiness of LLMs.