Research Fair Fall 2024
The Fall 2024 Research Fair will be held on Thursday, September 5th, and Friday, September 6th, from 12:00 to 14:00 in the CS Lounge (CSB 452). This is an opportunity to meet faculty and Ph.D. students working in areas of interest to you and possibly to work on their projects.
Please read their requirements carefully! There will be a couple of Zoom sessions and recordings available – see below for all details.
In Person
Faculty/Lab: Prof. Corey Toler-Franklin
Brief Research Project Description: The Graphics Imaging & Light Measurement Lab (GILMLab) has research opportunities in the following areas:
About the lab: https://coreytolerfranklin.com/gilmlab/
About the PI: Corey Toler-Franklin https://coreytolerfranklin.com/
Click to apply: https://forms.gle/jvNDT7UmdVdXachN9
AI for Cancer Detection
Identifying Cancer Cells and Their Biomarker Expressions
Cell quantitation techniques are used in biomedical research to diagnose and treat cancer. Current quantitation methods are subjective and based mostly on visual impressions of stained tissue samples. This time-consuming process causes delays in therapy that reduce the effectiveness of treatments and add to patient distress. Our lab is developing computational algorithms that use deep learning to model changes in protein structure from multispectral observations of tissue. Once computed, the model can be applied to any tissue observation to detect a variety of protein markers without further spectral analysis. The deep learning model will be quantitatively evaluated on a learning dataset of cancer tumors.
AI for Neuroscience
Deep Learning for Diagnosing and Treating Neurological Disorders
Advances in biomedical research rest on two foundations: preclinical studies using animal models and clinical trials with human subjects. However, translation from basic animal research to the treatment of human conditions is not straightforward. Preclinical studies in animals may not replicate across labs, and a multitude of preclinical leads have failed in human clinical trials. Inspired by recent generative models for semi-supervised action recognition and probabilistic 3D human motion prediction, we are developing a system that learns animal behavior from unstructured video frames without labels or annotations. Our approach extends a generative model to incorporate diffusion models, adversarial inference, and transformer-based self-attention modules.
AI for Quantum Physics & Appearance Modeling
Quantum Level Optical Interactions in Complex Materials
The wavelength dependence of fluorescence is used in the physical sciences for material analysis and identification. However, fluorescent measurement techniques like mass spectrometry are expensive and often destructive. Empirical measurement systems effectively simulate material appearance but are time-consuming, requiring densely sampled measurements. Leveraging GPU processing and shared supercomputing resources, we develop deep learning models that incorporate principles from quantum mechanics to solve large-scale many-body problems in physics for non-invasive identification of complex proteinaceous materials.
AI for Multimodal Data & Document Analysis
This project develops neural rendering methods and multimodal transformer networks to decipher findings from the Tulsa Race Massacre Death Investigation. The Tulsa Race Massacre (1921) destroyed a flourishing Black community and left up to 300 people dead. More than 1000 homes were burned and destroyed. Efforts are underway to locate the bodies of victims and reconstruct lost historical information for their families. Collaborating with the Tulsa forensics team, we are developing spectral imaging methods (on-site) for deciphering information on eroded materials (stone engravings, rusted metal, and deteriorated wood markings), and a novel multimodal transformer network to associate recovered information on gravestones with death certificates and geographical information from public records.
Required/preferred prerequisites and qualifications: Python and/or C/C++, Computer Graphics and/or Machine Learning experience
Faculty/Lab: Prof. Roxana Geambasu
Brief Research Project Description: With the impending removal of third-party cookies from major browsers and the introduction of new privacy-preserving advertising APIs, our group of privacy experts has a unique, timely opportunity to assist industry in qualitatively improving the Web’s privacy. We are working with major companies and with a W3C working group to improve the privacy stance of emerging advertising APIs. We analyze designs from Google, Apple, Meta, and Mozilla, and augment them with a more rigorous and efficient differential privacy (DP) budgeting component. Our approach, called Alistair, enforces well-defined DP guarantees and enables advertisers to conduct more private measurement queries accurately. By framing the privacy guarantee in terms of an individual form of DP, we can make DP budgeting more efficient than in current systems that use a traditional DP definition. We incorporate Alistair into Chrome and evaluate it on microbenchmarks and advertising datasets. Across all workloads, Alistair significantly outperforms baselines in enabling more advertising measurements under comparable DP protection. The project is in collaboration with Meta and Mozilla and part of our engagement with a W3C community group that works toward developing a standard for new privacy-preserving advertising for the Web. Come join our team in this effort to finally qualitatively improve the Web’s privacy! (More info in our arxiv draft: https://arxiv.org/abs/2405.16719.)
Required/preferred prerequisites and qualifications: Strong theoretical/mathematical background and systems coding abilities. The project involves both design and implementation, as well as evaluation of ad-measurement systems that have mathematically provable privacy properties. Both implementation and theoretical skills are therefore important.
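For readers unfamiliar with per-user DP budgeting, here is a loose, toy illustration of the general idea of tracking a separate privacy budget per device; the class name, numbers, and policy below are purely hypothetical and are not Alistair's actual design (see the arXiv draft for that).

```python
# Toy per-device privacy budget filter, loosely illustrating the idea of
# individual DP budgeting for ad measurement. All names and numbers are
# hypothetical; Alistair's actual design is described in the arXiv draft.

class BudgetFilter:
    """Tracks a separate epsilon budget per device (individual DP)."""

    def __init__(self, epsilon_capacity):
        self.capacity = epsilon_capacity
        self.spent = {}  # device_id -> epsilon consumed so far

    def try_consume(self, device_id, epsilon):
        """Deduct epsilon for one query; refuse if it would exceed capacity."""
        used = self.spent.get(device_id, 0.0)
        if used + epsilon > self.capacity:
            return False  # this device's report is dropped from the query
        self.spent[device_id] = used + epsilon
        return True


f = BudgetFilter(epsilon_capacity=1.0)
ok1 = f.try_consume("device-a", 0.4)   # True: 0.4 <= 1.0
ok2 = f.try_consume("device-a", 0.4)   # True: 0.8 <= 1.0
ok3 = f.try_consume("device-a", 0.4)   # False: 1.2 > 1.0
```

The point of an individual (per-device) formulation is that one exhausted device stops contributing without blocking the whole query, which is what makes budgeting more efficient than a single global epsilon account.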
Faculty/Lab: Prof. Brian Plancher
Brief Research Project Description: The work of the A²R Lab at Barnard College, Columbia University, focuses on performance engineering for computational robotics at the edge. In particular, we’ve done a lot of work on the design of robotic algorithms and implementations for deployment on non-standard computational architecture (e.g., GPUs, MCUs). Thus, our work has been at the intersection of computer systems / architecture, numerical optimization, and machine learning. Historically we’ve done most of our work on GPU acceleration of numerical optimal control algorithms in the pursuit of whole-body nonlinear model predictive control for locomotion. This semester, we will be continuing with that line of work and generalizing our solvers to be more useful for other researchers, with a survey and implementation of various constraint methods for parallel differential dynamic programming (building on our past Parallel DDP work – https://a2r-lab.org/publication/parallelddp/). We’ll also be exploring smaller-scale solvers on microcontrollers for tiny robots, building on our recent award-winning TinyMPC work (https://a2r-lab.org/publication/tinympc/), and integrating into that framework aspects of machine learning and computer vision with a focus on embedded systems engineering. We will also hopefully be launching additional collaborations with other labs (at CU and beyond) and industry partners (e.g., Intel Labs) to further advance edge robotic performance and deploy novel computer hardware and both learning- and optimization-based algorithms onto the edge.
Required/preferred prerequisites and qualifications: Prerequisites and qualifications depend on the particular project and role. None are required, but candidates with relevant experience will be given preference. Such experience may include: C(++) programming, embedded programming, CUDA/parallel programming, machine learning, numerical optimization / optimal control, electronics, robot hardware, and CAD design and prototyping.
For more details and to apply, please fill out the form on our lab join page: https://a2r-lab.org/join
Faculty/Lab: Prof. Venkat Venkatasubramanian & his students Naz Pinar Taskiran, Arijit Chakraborty, and Collin Szczepanski
Brief Research Project Description:
We are looking for undergraduate and graduate students to join our efforts for five projects:
1. Text-to-causal modeling for generating mechanistic explanations (3-4 students)
The knowledge about a physicochemical process present in textbooks, articles, handbooks, manuals, etc., is a trove of relevant information that, upon extraction, can be utilized to generate mechanistic explanations. In this project, we would like to explore the potential of utilizing the information extracted from these textual sources to generate explanations for physicochemical processes by virtue of the symbolic model(s) obtained from a machine learning (ML) algorithm. We would like to explore embeddings that enable us to discover causal relations between variables and utilize them to provide insights about the processes being modeled.
Prerequisites: Proficiency in Python; curiosity; and eagerness to learn new concepts (e.g., causal modeling, digraphs, etc).
2. Financial modeling using symbolic model discovery engine (3-4 students)
Using a symbolic model discovery engine that we have developed internally in our group (AI-DARWIN), we would like to explore the possibility of stock-price prediction on low-latency data (time-series modeling). Preliminary results seem promising when fundamentals are used to train a model for quarterly stock-price prediction. We would like to expand on this and attempt to predict intraday variation using time-series modeling and the inclusion of domain knowledge.
Prerequisites: Proficiency in Python; curiosity; and eagerness to learn new concepts. Finance background is advantageous, but not mandatory.
3. Accelerating drug discovery through automatic, knowledge-graph-based ontology population (3-4 students)
In pharmaceutical discovery and manufacturing, engineers must process thousands of unstructured documents to obtain the necessary information. These documents are rich in technical information and domain-specific terms that large language models like ChatGPT, despite their success in other fields, have difficulty tackling. We developed SUSIE, an ontology-based pharmaceutical information extraction tool built to extract semantic triples and present them to the user as knowledge graphs (KGs). The ontology that the student will be interacting with is the Columbia Ontology of Pharmaceutical Engineering (COPE). The student will work with PhD students on exploring different methods to interface the generated knowledge graphs with the ontology. Students will also work on building a reasoner to infer logical consequences, classify entities, and check consistency for the populated ontology to accelerate the drug discovery pipeline.
Prerequisites: Python programming, machine learning, knowledge about data structures and inference techniques (recommended), and curiosity to learn about a new field.
4. Inner workings of LLMs for science and engineering applications (3-4 students)
Data-driven AI models learn from the vast amounts of data they are able to process, yet we do not know how that knowledge is organized within their vast networks, though it may be a form of geometrical organization in their hyperspace. However, despite their success in other fields, LLMs like ChatGPT have difficulty with technical and scientific content, which indicates that this geometric organization is not enough to solve problems where domain knowledge is critical. We aim to explore ways to infuse first-principles knowledge into LLMs. This would give LLMs the ability to reason and explain, similarly to how a human expert behaves when faced with an issue. Students will explore methods of building reasoner tools into this geometric organization of existing LLMs to solve problems pertinent to chemical engineering, in particular drug discovery, where new drug applications span years of work and thousands of technical documents that experts need to parse through, and where explainability of the conclusions reached is critical.
Prerequisites: Python programming, machine learning, knowledge about data structures and inference techniques (recommended), and curiosity to learn about a new field.
5. Emergent Behavior in Deep Learning
Statistical Teleodynamics is a mathematical framework grounded in statistical mechanics and game theory that predicts emergent behaviors of complex networks using information about the behavior of individual agents. Recently, we have applied statistical teleodynamics in the context of deep neural networks (DNNs), accurately predicting the distribution of weights and biases in a variety of large, well-trained DNNs. We are now interested in extending our model to understand emergent behaviors during the training process of a DNN. We are particularly interested in leveraging knowledge of newfound emergent properties to train large-scale DNNs more efficiently and robustly.
Prerequisites: Proficiency in Python; interest in theoretical machine learning, game theory, and/or physics. Experience with probability/statistics, optimization, and multivariate calculus.
Required/preferred prerequisites and qualifications: The prerequisites for each project can be found above.
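For project 3, "semantic triples presented as knowledge graphs" can be made concrete with a tiny sketch. The triples, entities, and query helper below are invented for illustration; SUSIE's actual extraction pipeline and the COPE ontology are not public here.

```python
# Toy knowledge-graph construction from (subject, predicate, object)
# triples, sketching the output side of an extraction tool like SUSIE
# (project 3). The example triples and helper are illustrative only.

from collections import defaultdict

triples = [
    ("aspirin", "inhibits", "COX-1"),
    ("aspirin", "has_form", "tablet"),
    ("COX-1", "is_a", "enzyme"),
]

# Adjacency representation: subject -> list of (predicate, object)
graph = defaultdict(list)
for subj, pred, obj in triples:
    graph[subj].append((pred, obj))

def objects_of(subject, predicate):
    """Return all objects linked to `subject` by `predicate`."""
    return [o for p, o in graph[subject] if p == predicate]

print(objects_of("aspirin", "inhibits"))  # ['COX-1']
```

A reasoner of the kind the project describes would then walk such a graph (e.g., following `is_a` edges) to infer consequences and flag inconsistencies against the ontology.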
Faculty/Lab: Prof. Marko Jovanovic & Benjamin Bokor
Brief Research Project Description: Developing an analytical tool to discover protein interaction dynamics throughout time-courses.
Required/preferred prerequisites and qualifications: Experience with python and its application to large datasets.
Faculty/Lab: Prof. Steve Feiner
Brief Research Project Description: The Computer Graphics and User Interfaces Lab (Prof. Feiner, PI) does research in the design of 3D and 2D user interfaces, including augmented reality (AR) and virtual reality (VR), and mobile and wearable systems, for people interacting individually and together, indoors and outdoors. We use a range of displays and devices: head-worn, hand-held, and table-top, including Varjo XR-3, HoloLens 2, and Magic Leap 2, in addition to consumer headsets such as Meta Quest 3. Multidisciplinary projects potentially involve working with faculty and students in other schools and departments, from medicine and dentistry to earth and environmental sciences and social work.
Required/preferred prerequisites and qualifications: We’re looking for students who have done excellent work in one or more of the following courses or their equivalents elsewhere: COMS W4160 (Computer graphics), COMS W4170 (User interface design), COMS W4172 (3D user interfaces and augmented reality), and COMS E6998 (Topics in VR & AR), and who have software design and development expertise. For those projects involving 3D user interfaces, we’re especially interested in students with Unity experience.
Faculty/Lab: Prof. Xia Zhou, Prof. Salvatore Stolfo, & Xiaofeng Yan
Brief Research Project Description:
Authentication is crucial in protecting our data, finances, and personal information. Unlike traditional methods such as passwords or physical keys, biometric authentication is gaining popularity due to its convenience and robustness. We interact with biometric systems daily, like Face ID and fingerprint scanners. Among these, palm vein recognition is emerging as a highly secure modality. Palm veins, located under the skin and normally invisible, have a high degree of biological entropy and are even unique between identical twins. Additionally, palm vein patterns remain stable over time, making them an excellent candidate for long-term authentication.
Despite these advancements, there are still many open challenges in palm vein research. For example, how can we verify palm veins reliably under varying environmental conditions (e.g., wet or dirty palms) or different gestures and poses? How can we effectively extract features after image enhancement, such as distinguishing surface prints from vascular patterns? Furthermore, individual differences in vein visibility in NIR images pose another significant challenge.
Our project aims to build the first publicly available dataset that includes both standard and challenging palm vein images. The data will be collected in video format, resulting in millions of frames that capture variations. This dataset will benefit machine learning model training and AI applications in security systems. Additionally, we will explore algorithms and systems to improve the robustness and performance of palm vein authentication technologies.
Required/preferred prerequisites and qualifications: Proficiency in Python programming. Completed coursework in computer vision and machine learning.
Faculty/Lab: Prof. Kathy McKeown & Zachary Horvitz
Brief Research Project Description: Professor McKeown’s lab is looking for undergraduate and master’s researchers to assist on several NLP projects, including:
– Summarizing perspectives across sociodemographic groups and dialect
– Multilingual (non-specific) detection of mental health, nuanced emotions (e.g. grief, anxiety), and psychological attitudes
– Harnessing LLMs for humor generation and understanding
– Text authorship attribution and textual style transfer
– Understanding conversation dynamics through LLMs and text generation
Required/preferred prerequisites and qualifications: Familiarity with NLP methods, previous experience with machine learning.
Faculty/Lab: Prof. Itsik Pe’er & Philippe Chlenski
Brief Research Project Description: Modern AI is based on representing objects as points in a vector space, typically a Euclidean one. Yet, for datapoints that come about through a tree-like generative process, as in biological development or evolution, non-Euclidean (hyperbolic, spherical, or mixed) geometry is more appropriate. Several project slots are available for developing and evaluating ML methods in non-Euclidean spaces on biological high throughput data. See details here https://bit.ly/474rlbk
Required/preferred prerequisites and qualifications: Different slots require different levels of background in computation, with an emphasis on AI/probability & statistics, and/or in biology (including no biology at all).
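As generic background on the hyperbolic option mentioned above (not the lab's specific method), the standard Poincaré-ball distance shows why such spaces suit tree-like data: distances blow up near the boundary, leaving room for exponentially many leaves.

```python
# Poincaré-ball distance, the standard metric for hyperbolic embeddings
# of tree-like data. Generic textbook formula -- the lab's actual
# methods are described at the linked project page.

import math

def poincare_distance(u, v):
    """Hyperbolic distance between points u, v inside the unit ball."""
    diff_sq = sum((a - b) ** 2 for a, b in zip(u, v))
    norm_u_sq = sum(a * a for a in u)
    norm_v_sq = sum(b * b for b in v)
    x = 1 + 2 * diff_sq / ((1 - norm_u_sq) * (1 - norm_v_sq))
    return math.acosh(x)

# Points near the boundary are far apart even when their Euclidean
# distance is modest -- handy for embedding trees with many leaves.
print(poincare_distance((0.0, 0.0), (0.5, 0.0)))   # ~1.0986 (= ln 3)
print(poincare_distance((0.9, 0.0), (0.0, 0.9)))   # ~5.2, despite
                                                   # Euclidean distance ~1.27
```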
Faculty/Lab: Vishal Misra
Brief Research Project Description: Help develop an AI-based course assistant that will be used in all classes at Columbia.
Required/preferred prerequisites and qualifications: Expertise in Python is a must. Knowledge of LLMs and retrieval-augmented generation (RAG) techniques is a plus.
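At its core, RAG is retrieve-then-prompt. As a rough sketch of the retrieval step only, here is a bag-of-words version with made-up course documents; a real assistant would use learned embeddings and an LLM call, neither of which is shown.

```python
# Minimal retrieval step of a RAG pipeline: score course documents
# against a question with bag-of-words cosine similarity, then build a
# prompt from the best match. Purely illustrative -- a real course
# assistant would use learned embeddings and an LLM here.

import math
from collections import Counter

docs = {
    "syllabus": "homework is due friday each week no late days",
    "lecture1": "asymptotic notation big o and recurrences",
    "logistics": "office hours are tuesday in the cs lounge",
}

def vectorize(text):
    return Counter(text.split())

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(question):
    """Return the name of the document most similar to the question."""
    q = vectorize(question.lower())
    return max(docs, key=lambda name: cosine(q, vectorize(docs[name])))

best = retrieve("when is homework due")
prompt = f"Context: {docs[best]}\nQuestion: when is homework due"
print(best)  # syllabus
```

The retrieved context would then be prepended to the student's question before it is sent to the language model.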
Faculty/Lab: Prof. Matthew Connelly
Brief Research Project Description:
Student researchers will apply machine learning and NLP techniques to large (>3 million) corpora of declassified documents. We will be working on a RAG system for LLMs using the documents. We are also looking for a student to assist with enhancing and maintaining a Columbia Sites website (Drupal-based CMS). The role requires a keen interest in History, and previous website development experience is preferred.
Required/preferred prerequisites and qualifications: Some knowledge of Python, SQL; interest in NLP, image processing and its application to research in international history and politics.
Faculty/Lab: Prof. Ken Ross & Junyoung Kim
Brief Research Project Description: There will be two projects. One project involves the implementation of matrix-multiplication and matrix-multiplication-like primitives within the DuckDB database management system.
The second project involves the automated analysis, using computer-vision techniques, of images generated by aggregation queries.
Required/preferred prerequisites and qualifications: Both projects require successful completion of 4111 or an equivalent DB course. Completion of an advanced DB course is a plus. For the second project, successful completion of a computer vision course is a plus.
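The first project targets native primitives inside DuckDB; as a generic illustration of why matrix multiplication maps naturally onto a relational engine (a join on the shared index plus a SUM aggregate), here is the idea shown with Python's stdlib sqlite3 instead of DuckDB. The schema and data are made up.

```python
# Matrix multiplication expressed as a relational join + aggregation,
# the kind of primitive the DuckDB project would implement natively.
# Shown with Python's stdlib sqlite3; the schema and data are made up.

import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE a(i INT, k INT, val REAL);  -- sparse matrix A, COO form
    CREATE TABLE b(k INT, j INT, val REAL);  -- sparse matrix B, COO form
""")
# A = [[1, 2], [0, 3]], B = [[4, 0], [5, 6]] (zero entries omitted)
con.executemany("INSERT INTO a VALUES (?,?,?)",
                [(0, 0, 1), (0, 1, 2), (1, 1, 3)])
con.executemany("INSERT INTO b VALUES (?,?,?)",
                [(0, 0, 4), (1, 0, 5), (1, 1, 6)])

# C[i][j] = sum_k A[i][k] * B[k][j]  ==  join on k, then SUM per (i, j)
rows = con.execute("""
    SELECT a.i, b.j, SUM(a.val * b.val)
    FROM a JOIN b ON a.k = b.k
    GROUP BY a.i, b.j
    ORDER BY a.i, b.j
""").fetchall()
print(rows)  # [(0, 0, 14.0), (0, 1, 12.0), (1, 0, 15.0), (1, 1, 18.0)]
```

A native primitive would avoid materializing the join, which is presumably where the performance work in the project lies.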
Faculty/Lab: Prof. Junfeng Yang & Raphael Sofaer
Brief Research Project Description: Bridging the Proof Assistant – SMT Solver Gap, Formal Verification with inferred models of external APIs.
Required/preferred prerequisites and qualifications: Required: Interest in formal verification and programming language theory. Relevant: Rust, C, Coq or other proof assistant, languages with formal guarantees such as Dafny, Verus, and SPARK Ada.
Faculty/Lab: Prof. Baishakhi Ray, Prof. Junfeng Yang & their student Jinjun Peng
Brief Research Project Description: With the wide application of code LLMs to coding assistance, security issues in AI-generated code arise and pose threats to computer systems. We plan to explore how to build code LLMs with better security awareness while preserving their functionality to fulfill user requests.
We will learn and explore the latest advances in AI safety, AI security, coding LLMs, etc. Participating students will experience as many parts as possible of a complete research project, including paper reading, idea brainstorming, algorithm implementation, experiment analysis, paper writing, etc., which is helpful for applying to PhD programs in the future.
To apply, see the instructions (https://co1in.me/apply_ra) and email jinjun.peng@columbia.edu.
Required/preferred prerequisites and qualifications: Proficient Python programming; Understanding of C++/C and some popular languages; The ability to train a neural network (in PyTorch) and understand its mechanism; The ability to read research papers is preferred; Understanding the mechanism of LLMs is preferred; Hands-on experience of using open-source LLMs in Python is preferred.
Faculty/Lab: Albert Boulanger
Brief Research Project Description: Data Engineering and AI for Electronic Data Capture in Medical Research Studies | Supervisor: Albert Boulanger
This work involves data engineering, interface design, AI, and system integration to develop systems to intelligently manage data for research studies.
1) The Joint Cohort Explorer is centered on data engineering to build a seamless interface to explore and combine multiple cohort datasets.
2) A Global ID system to identify study subjects who enroll in research studies using the REDCap platform, some of whom enroll in several studies. Study subjects are assigned study IDs, and with the Global ID overlay they also get an anonymous ID across studies.
Integrated Telehealth After Stroke Care | Supervisors: Dr. Syeda Imama Ali Naqvi & Albert Boulanger
This project, Integrated Telehealth After Stroke Care, is a follow-up to the study at https://www.thieme-connect.com/products/ejournals/pdf/10.1055/s-0043-1772679.pdf and is geared to apply informatics-based approaches to deliver equitable care and improve wellbeing among minoritized stroke populations with hypertension. The platform currently consists of a web-based blood pressure telemonitoring database using Django and wireless devices that push data after every measurement. Data is to be processed through R Shiny to create Clinical Decision Support tools for providers and participants, using visually tailored infographics created through iterative community-based participation with a human-centered design process.
Required/preferred prerequisites and qualifications: For Data Engineering and AI for Electronic Data Capture in Medical Research Studies: Relational and graph databases and database management concepts like database replication; Python (Django, Django CMS, Plotly Dash, & data science/ML/NLP libraries); JavaScript (Node, WebAssembly, Bootstrap, React, Next.js, etc.); PHP for REDCap development.
For Project in Computer Vision and MR Images: Prior experience in deep learning for computer vision and image processing, especially of the brain, including segmentation, clustering, classification, and regression, is highly desired.
For Integrated Telehealth After Stroke Care: The development involves the use of Python, Django, R Shiny, and an eye for good infographics design. System integration skills and AWS ecosystem experience are desired.
Faculty/Lab: Prof. Julia Hirschberg & PhD student Run Chen
Brief Research Project Description: We are looking for NLP engineers to build models for understanding and discovering empathy. https://www.cs.columbia.edu/speech/projects.cgi#empathy
Required/preferred prerequisites and qualifications: Completion of NLP course or relevant NLP modeling experience. Experience with spoken language processing is a plus.
Zoom
Please check back for possible Zoom session details
Faculty/Lab: Prof. Steve Feiner
Brief Research Project Description: The Computer Graphics and User Interfaces Lab (Prof. Feiner, PI) does research in the design of 3D and 2D user interfaces, including augmented reality (AR) and virtual reality (VR), and mobile and wearable systems, for people interacting individually and together, indoors and outdoors. We use a range of displays and devices: head-worn, hand-held, and table-top, including Varjo XR-3, HoloLens 2, and Magic Leap 2, in addition to consumer headsets such as Meta Quest 3. Multidisciplinary projects potentially involve working with faculty and students in other schools and departments, from medicine and dentistry to earth and environmental sciences and social work.
Required/preferred prerequisites and qualifications: We’re looking for students who have done excellent work in one or more of the following courses or their equivalents elsewhere: COMS W4160 (Computer graphics), COMS W4170 (User interface design), COMS W4172 (3D user interfaces and augmented reality), and COMS E6998 (Topics in VR & AR), and who have software design and development expertise. For those projects involving 3D user interfaces, we’re especially interested in students with Unity experience.
Zoom: Thursday, September 5th 4pm – 5:30pm Zoom Link
Faculty/Lab: DVMM Lab (Hammad Ayyubi (https://hammad001.github.io/))
Brief Research Project Description: The project deals with video understanding. Specifically, we will work on reasoning about actions. Some of the questions we will look to answer are: Can the action be predicted from the before state and after state? Is an action possible given a start state and an end state?
Required/preferred prerequisites and qualifications: The student should be familiar with the fundamentals of Deep Learning and Computer Vision: CNNs/RNNs, training and evaluating models using PyTorch/Tensorflow. Students with knowledge about Transformer architecture will be preferred.
Zoom: September 4th, 3pm https://columbiauniversity.zoom.us/j/93410905272?pwd=aEuvLXQByX4hazbaRxTFbtUToiaFAT.1
Faculty/Lab: Prof. Simha Sethumadhavan & Evgeny Manzhosov
Brief Research Project Description: A variety of projects on system security and reliability (servers, AI, etc.)
Required/preferred prerequisites and qualifications: Computer Architecture, ML training
Zoom: September 6th, 2pm https://columbiauniversity.zoom.us/my/evgeny
Faculty/Lab: Prof. Tal Korem (tal.korem@columbia.edu), with Andrey Zaznaev and George Austin
Brief Research Project Description:
Please see:
https://docs.google.com/document/d/117y4o93qJjBdebW8MNs8wazEjC5iolQ7-QXKQMqjBjM/edit?usp=sharing
https://docs.google.com/document/d/18GyWgTU_4Vonjigev0kvtgJTfqdkcSTmkrbaNdZW9EI/edit?usp=sharing
https://docs.google.com/document/d/1DUcsijI-pYDYSiwSCeH-cdXLtXaRMstwWbBhWhQk4g8/edit?usp=sharing
Required/preferred prerequisites and qualifications: Please see the project descriptions above.
Zoom: September 4th, 4pm
https://columbiacuimc.zoom.us/j/91996755998?pwd=7pvjyO24Ayyp4n2oqpvbvpUnVEKTsf.1
Faculty/Lab: Albert Boulanger
Brief Research Project Description: Data Engineering and AI for Electronic Data Capture in Medical Research Studies | Supervisor: Albert Boulanger
This work involves data engineering, interface design, AI, and system integration to develop systems that intelligently manage data for research studies.
1) The Joint Cohort Explorer is centered on data engineering to build a seamless interface for exploring and combining multiple cohort datasets.
2) A Global ID system to identify study subjects who enroll in research studies on the REDCap platform, some of whom are enrolled in several studies. Study subjects are assigned per-study IDs; with the Global ID overlay, they also receive an anonymous ID that spans studies.
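The Global ID overlay described above could work roughly like the following sketch, where one anonymous Global ID is minted per subject and linked to each of their per-study IDs. All class and method names here are hypothetical for illustration; the actual REDCap integration and de-identification procedure are not shown.

```python
import uuid

class GlobalIDRegistry:
    """Illustrative sketch: link per-study subject IDs to one anonymous Global ID."""

    def __init__(self):
        self._by_subject = {}  # internal subject key -> Global ID
        self._links = {}       # (study, study_id) -> Global ID

    def link(self, subject_key, study, study_id):
        """Register a subject's enrollment in a study; reuse their Global ID if one exists."""
        gid = self._by_subject.get(subject_key)
        if gid is None:
            # The Global ID is random, not derived from identity, so it stays anonymous.
            gid = uuid.uuid4().hex
            self._by_subject[subject_key] = gid
        self._links[(study, study_id)] = gid
        return gid

    def lookup(self, study, study_id):
        """Resolve a per-study ID to the cross-study Global ID."""
        return self._links[(study, study_id)]
```

A subject enrolled in two studies would then resolve to the same Global ID from either study's ID, while remaining anonymous across the combined datasets.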
Integrated Telehealth After Stroke Care | Supervisors: Dr. Syeda Imama Ali Naqvi & Albert Boulanger
This project, Integrated Telehealth After Stroke Care, is a follow-up study to https://www.thieme-connect.com/products/ejournals/pdf/10.1055/s-0043-1772679.pdf and is geared toward applying informatics-based approaches to deliver equitable care and improve wellbeing among minoritized stroke populations with hypertension. The platform currently consists of a web-based blood pressure telemonitoring database built with Django and wireless devices that push data after every measurement. Data will be processed through R Shiny to create Clinical Decision Support tools for providers and participants, using visually tailored infographics created through iterative community-based participation with a human-centered design process.
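As a rough illustration of the kind of Clinical Decision Support rule such a telemonitoring pipeline might apply to each pushed reading, here is a minimal sketch using the standard ACC/AHA blood pressure categories. This is an assumption for illustration only; the study's actual decision-support logic, thresholds, and R Shiny implementation are not specified here.

```python
def classify_bp(systolic: int, diastolic: int) -> str:
    """Classify one blood pressure reading (mmHg) into standard ACC/AHA categories."""
    if systolic >= 180 or diastolic >= 120:
        return "hypertensive crisis"
    if systolic >= 140 or diastolic >= 90:
        return "stage 2 hypertension"
    if systolic >= 130 or diastolic >= 80:
        return "stage 1 hypertension"
    if systolic >= 120:
        return "elevated"
    return "normal"
```

In a real deployment, a rule like this would run server-side on each incoming measurement and feed the provider- and participant-facing infographics described above.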
Required/preferred prerequisites and qualifications: For Data Engineering and AI for Electronic Data Capture in Medical Research Studies: relational and graph databases and database management concepts such as database replication; Python (Django, Django CMS, Plotly Dash, and data science/ML/NLP libraries); JavaScript (Node, WebAssembly, Bootstrap, React, Next.js, etc.); and PHP for REDCap development.
For the Project in Computer Vision and MR Images: prior experience in deep learning for computer vision and image processing, especially of the brain (including segmentation, clustering, classification, and regression), is highly desired.
For Integrated Telehealth After Stroke Care: The development involves the use of Python, Django, R Shiny, and an eye for good infographics design. System integration skills and AWS ecosystem experience are desired.
Zoom: Monday Sept 9 at noon – https://columbiacuimc.zoom.us/j/98530188967?pwd=RFJHUlhlSCs4NlVJcGZpdkRqd1d3Zz09
Faculty/Lab: Prof. Junfeng Yang
Brief Research Project Description: We study trustworthy large language models (LLMs). We are interested in creating innovative techniques for understanding and jailbreaking LLMs. This work includes identifying LLM pre-training data, understanding training principles, detecting vulnerabilities of such systems, and exploring the various applications of such techniques. Applications include: evaluating the security and privacy of user data in an LLM system; measuring the fairness and potential bias of LLMs toward vulnerable demographic groups (racial, gender, etc.); and understanding the security and alignment of fine-tuned LLMs in particular. We aim to publish results in top ML/CV/NLP conferences. Come and conduct research with us this fall!
Required/preferred prerequisites and qualifications:
– Required: Python
– Preferred: knowledge of NLP/ML concepts and a deep learning framework such as PyTorch
– Stand-out: experience fine-tuning LLMs (e.g., GPT, Llama)
– Potential research assistants should also demonstrate autonomy and self-motivation (e.g., not being afraid to push back on their advisor, proposing their own ideas, etc.).
Google Meet Info: Membership Inference Project Info Session, Monday, September 9, 12:00 – 1:00pm (America/New_York). Video call link: https://meet.google.com/xjx-azem-jxg. Or dial: (US) +1 443-671-8676, PIN: 702 164 851#. More phone numbers: https://tel.meet/xjx-azem-jxg?pin=1727179410674