Machine Learning for the Power Grid

We proposed machine learning models and algorithms to rank the underground primary feeders and transformers of New York City's electrical grid according to their susceptibility to outages. Funding for this work is provided by a contract between Columbia University and the Consolidated Edison Company of New York.
Rule Learning

We explored mining frequent patterns in databases. This expensive step is common to several data mining tasks. We studied, in particular, the mining of association and characterization rules, both relying on mining frequent patterns.
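To illustrate why both rule types rest on frequent patterns, here is a minimal sketch for association rules. It assumes a plain Apriori-style level-wise search over toy transactions, which is a textbook technique and not the authors' method; the function names and data are hypothetical. Once the support of every frequent itemset is known, a rule X → Y needs no further database scan: its confidence is supp(X ∪ Y) / supp(X).

```python
from itertools import combinations


def support(itemset, transactions):
    """Fraction of transactions containing every item of `itemset`."""
    return sum(1 for t in transactions if itemset <= t) / len(transactions)


def frequent_itemsets(transactions, min_support):
    """Level-wise (Apriori-style) search; returns {itemset: support}."""
    level = [frozenset([i]) for i in sorted({i for t in transactions for i in t})]
    frequent = {}
    while level:
        current = {c: support(c, transactions) for c in level}
        current = {c: s for c, s in current.items() if s >= min_support}
        frequent.update(current)
        # Join pairs of frequent k-itemsets into (k+1)-itemset candidates.
        level = list({a | b for a, b in combinations(current, 2)
                      if len(a | b) == len(a) + 1})
    return frequent


def association_rules(frequent, min_confidence):
    """Split each frequent itemset Z into X -> Z - X and keep the rules whose
    confidence supp(Z) / supp(X) reaches `min_confidence`."""
    rules = []
    for itemset, s in frequent.items():
        for r in range(1, len(itemset)):
            for lhs in map(frozenset, combinations(itemset, r)):
                conf = s / frequent[lhs]  # lhs is frequent by downward closure
                if conf >= min_confidence:
                    rules.append((lhs, itemset - lhs, conf))
    return rules


transactions = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "c"}, {"a", "b", "c"}]
freq = frequent_itemsets(transactions, min_support=0.5)
rules = association_rules(freq, min_confidence=0.7)
```

With `min_support=0.5`, the three single items (support 0.8) and the three pairs (support 0.6) are frequent, while {a, b, c} (support 0.4) is not; each pair then yields two rules of confidence 0.75.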
On the one hand, we studied mining frequent patterns in so-called 'transaction databases'. A transaction database is a finite multiset of transactions, each composed of a set of items called an itemset. We propose a Boolean-based approach for mining frequent itemsets. The idea is to represent a transaction database by a function that takes Boolean variables as input and returns integer values. The study shows that the approach is effective and efficient for representing and loading dense transaction databases in memory, and that this condensed format is also useful for mining maximal frequent itemsets.
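The idea of a Boolean function with integer values can be made concrete with a small sketch. It assumes one Boolean variable per item and stores f(x), the number of transactions whose encoding is x, in a plain hash map (`Counter`); this is only a hypothetical stand-in to show the mapping, not the condensed format the study evaluates. The item universe `ITEMS`, the helper names, and the data are illustrative assumptions, and the maximal-itemset search is brute force, suitable only for a handful of items.

```python
from collections import Counter
from itertools import combinations

ITEMS = ("a", "b", "c", "d")  # hypothetical item universe: one Boolean variable per item


def encode(itemset):
    """Boolean assignment (x_a, x_b, x_c, x_d) for an itemset."""
    return tuple(item in itemset for item in ITEMS)


def as_boolean_function(transactions):
    """f: {0,1}^n -> N, where f(x) is the number of transactions encoded as x.
    Assignments absent from the Counter implicitly map to 0."""
    return Counter(encode(t) for t in transactions)


def support_count(f, itemset):
    """Sum f(x) over every assignment x that sets all variables of `itemset` to 1."""
    mask = encode(itemset)
    return sum(n for x, n in f.items()
               if all(xi for xi, needed in zip(x, mask) if needed))


def maximal_frequent(f, min_count):
    """Brute force over all itemsets (tiny ITEMS only): keep the frequent
    itemsets that have no frequent proper superset."""
    frequent = [frozenset(c) for r in range(1, len(ITEMS) + 1)
                for c in combinations(ITEMS, r) if support_count(f, c) >= min_count]
    return {s for s in frequent if not any(s < t for t in frequent)}


db = [{"a", "b"}, {"a", "b"}, {"a", "b", "c"}, {"c", "d"}, {"a", "c"}]
f = as_boolean_function(db)
print(len(db), "transactions ->", len(f), "distinct assignments")
print(maximal_frequent(f, min_count=2))
```

Duplicate transactions collapse into a single assignment with a count, which is where the multiset view pays off on dense data; with a minimum count of 2, the maximal frequent itemsets here are {a, b} and {a, c}.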
On the other hand, mining frequent patterns in databases that store objects and their relationships, such as relational and geographic databases, is not straightforward because of the large and complex search space. We propose an original framework for mining characteristic rules. A frequent pattern in this case is a rule that concisely characterizes a target set of objects according to the objects linked to them directly or indirectly. This framework relies on new concepts such as the quantified path, which is defined over the objects related to the target set and the properties that describe these objects.
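The text does not define quantified paths formally, so the following is only one hypothetical reading, sketched to make the idea of "objects linked directly or indirectly" tangible. A path here is a sequence of relations followed from each target object, plus a quantifier stating what share of the reached objects must carry a given property (a share of 1.0 behaves like "for all"). The class `QuantifiedPath`, the relation names, and the toy deposit/fault data are all illustrative assumptions, not the framework's actual definitions.

```python
from dataclasses import dataclass

# Hypothetical toy data: properties per object, and links keyed by (object, relation).
PROPS = {"f1": {"fault"}, "f2": {"granite"}, "f3": {"fault"},
         "f4": {"fault"}, "f5": {"fault"}, "g1": {"granite"}}
LINKS = {("d1", "near"): ["f1", "f2"], ("d2", "near"): ["f3"],
         ("d3", "near"): ["f4", "f5"], ("f3", "cuts"): ["g1"]}
TARGETS = ["d1", "d2", "d3"]  # e.g. mineral deposits


@dataclass(frozen=True)
class QuantifiedPath:
    """Hypothetical encoding: follow `relations` from a target object, then
    require that at least `min_fraction` of the reached objects have `prop`."""
    relations: tuple
    min_fraction: float
    prop: str


def reached(obj, relations):
    """Objects linked to `obj` directly (one relation) or indirectly (several)."""
    frontier = {obj}
    for rel in relations:
        frontier = {nxt for o in frontier for nxt in LINKS.get((o, rel), [])}
    return frontier


def holds(path, obj):
    """True if the quantifier is satisfied on the objects reached from `obj`."""
    ends = reached(obj, path.relations)
    if not ends:
        return False
    share = sum(path.prop in PROPS.get(e, set()) for e in ends) / len(ends)
    return share >= path.min_fraction


def rule_support(path):
    """Fraction of target objects characterized by the quantified path."""
    return sum(holds(path, t) for t in TARGETS) / len(TARGETS)


mostly_faults = QuantifiedPath(("near",), 0.5, "fault")
only_faults = QuantifiedPath(("near",), 1.0, "fault")
two_hops = QuantifiedPath(("near", "cuts"), 1.0, "granite")
print(rule_support(mostly_faults), rule_support(only_faults), rule_support(two_hops))
```

Changing only the quantifier changes how many targets a rule covers (all three deposits for "at least half", two for "all"), which hints at why such rules can characterize a target set concisely yet make the search space large.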
Knowledge Discovery in Geographic Data

We propose approaches for knowledge discovery in geographic information systems from the field of mineral exploration, developed by the BRGM (French Geological Survey). We addressed two tasks, mining association rules and mining characterization rules, and tested our approaches on real databases.
Crowd Labeling

Crowd labeling emerged from the need to label large-scale, complex data, a tedious, expensive and time-consuming task. Each object is generally annotated by multiple crowd labelers, and the collected labels are combined to infer a single estimated label. One open problem is assessing the quality of the different labels and integrating them, especially when the participating labelers are of unknown expertise. To address this challenge, we propose a new framework that automatically combines and boosts bulk crowd labels using a limited number of ground-truth labels from experts. We show through extensive experiments that, unlike other state-of-the-art approaches, our method robustly estimates the true labels even when a large proportion of the crowd consists of low-quality labelers.
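The text does not detail how the framework combines and boosts the labels, so the sketch below is only a point of comparison using a simpler, swapped-in technique: an accuracy-weighted vote in which each labeler's weight is the log-odds of their accuracy on the few expert-labelled items. It assumes binary labels; the function names `labeler_weights` and `aggregate`, the smoothing parameter, and the data are hypothetical.

```python
import math
from collections import defaultdict


def labeler_weights(crowd, gold, smoothing=1.0):
    """Estimate each labeler's accuracy on the expert-labelled items and turn it
    into a log-odds vote weight; Laplace smoothing keeps weights finite and
    gives labelers with no gold overlap a weight of 0."""
    hits, seen = defaultdict(float), defaultdict(float)
    for (item, worker), label in crowd.items():
        if item in gold:
            seen[worker] += 1
            hits[worker] += (label == gold[item])
    weights = {}
    for worker in {w for _, w in crowd}:
        acc = (hits[worker] + smoothing) / (seen[worker] + 2 * smoothing)
        weights[worker] = math.log(acc / (1 - acc))
    return weights


def aggregate(crowd, gold):
    """Weighted vote per item (binary labels assumed); expert labels are kept as-is."""
    weights = labeler_weights(crowd, gold)
    scores = defaultdict(lambda: defaultdict(float))
    for (item, worker), label in crowd.items():
        scores[item][label] += weights[worker]
    estimate = {item: max(votes, key=votes.get) for item, votes in scores.items()}
    estimate.update(gold)
    return estimate


# Hypothetical example: w1 is reliable, w2 and w3 are not; i1 and i2 have expert labels.
gold = {"i1": 1, "i2": 0}
crowd = {}
for item, truth in [("i1", 1), ("i2", 0), ("i3", 1), ("i4", 0)]:
    crowd[(item, "w1")] = truth
    crowd[(item, "w2")] = 1 - truth
    crowd[(item, "w3")] = 1 - truth
print(aggregate(crowd, gold))
```

A plain majority vote would follow the two unreliable labelers on i3 and i4; the two expert labels are enough to give them negative weights, so the reliable labeler wins. Because binary labels are assumed, a negative weight effectively flips a consistently wrong labeler's vote; that shortcut does not carry over to multi-class labels.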
