Local Event Detection
Not every event that interests us is considered newsworthy by the traditional media. Events that fall outside the news category include local annual gatherings, conventions, smaller-scale parades, film festivals, rallies, and more. This project focuses on developing strategies for discovering events of any size, type, and duration. To do so, we leverage the wide spectrum of information sources available on the Web beyond traditional news articles and Web pages, including blogs, social networking sites, and photo and video sharing sites.
Dynamic ranking techniques using expert advice
Real-world data streams are noisy and often exhibit concept drift, which makes the learning task challenging. This project explores online learning approaches to ranking with concept drift, using weighted majority techniques. By continuously modeling different snapshots of the data and tuning a measure of belief in these models over time, it is possible to capture changes in the underlying concept and adapt the algorithm's predictions accordingly.
This research was inspired by the problem of generating real-time rankings of the components of an electrical system, according to their susceptibility to impending failure.
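The weighted majority idea can be illustrated with a minimal sketch. It is not the project's implementation. It assumes each "expert" is a fixed ranking produced by a model trained on one data snapshot, combines the experts with a weighted Borda-style score, and shrinks the weight of any expert that ranked an observed failure low. The class name, the `beta` penalty parameter, and the loss definition are all hypothetical choices made for illustration.

```python
from typing import Dict, List

Ranking = List[str]  # item ids, most at-risk first


class WeightedMajorityRanker:
    """Illustrative weighted-majority combiner over ranking experts."""

    def __init__(self, experts: Dict[str, Ranking], beta: float = 0.5):
        if not experts:
            raise ValueError("at least one expert is required")
        if not 0.0 < beta < 1.0:
            raise ValueError("beta must lie strictly between 0 and 1")
        self.experts = dict(experts)
        self.beta = beta  # hypothetical penalty factor
        # Start with equal belief in every expert.
        self.weights = {name: 1.0 / len(experts) for name in experts}

    def rank(self) -> Ranking:
        # Weighted Borda-style score: an item at position p in an expert's
        # ranking of n items earns weight * (n - p) points.
        scores: Dict[str, float] = {}
        for name, ranking in self.experts.items():
            n = len(ranking)
            for pos, item in enumerate(ranking):
                scores[item] = scores.get(item, 0.0) + self.weights[name] * (n - pos)
        # Highest score first; ties broken by item id for determinism.
        return sorted(scores, key=lambda item: (-scores[item], item))

    def update(self, failed_item: str) -> None:
        # Loss in [0, 1]: 0 if the expert ranked the failed item first,
        # 1 if it ranked it last (or did not rank it at all).
        for name, ranking in self.experts.items():
            n = len(ranking)
            pos = ranking.index(failed_item) if failed_item in ranking else n - 1
            loss = pos / max(n - 1, 1)
            self.weights[name] *= self.beta ** loss
        # Renormalize so the weights remain a belief distribution.
        total = sum(self.weights.values())
        for name in self.weights:
            self.weights[name] /= total
```

In a drifting stream, experts trained on recent snapshots would be added over time and poorly performing ones dropped. The sketch omits that bookkeeping and shows only the belief update.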
Crunch: Automatic context-based content extraction
Crunch is a framework that employs an easily extensible set of techniques for enabling and integrating heuristics for content extraction from HTML web pages. In particular, I worked on reducing human involvement in setting thresholds for the heuristics by automatically detecting and using the content genre (context) of a given website. This was accomplished by developing a method for clustering a large corpus of websites according to their genre.