Alpa Jain

Ph.D. Alumna

Database Research Group
Computer Science Department
Columbia University


Contact Information
450 Computer Science
500 W 120 St
New York, NY 10027
Telephone: 212-939-7117
Fax: 212-666-0140


  • Research interests: Search user assistance and behavior mining, information extraction, text mining.
  • DBLP

Publications

Papers in Refereed Journals

  1. A Quality-Aware Optimizer for Information Extraction,
    Alpa Jain and Panagiotis Ipeirotis, ACM Transactions on Database Systems (TODS), March 2009.

Papers in Refereed Conferences

  1. Synthesizing High Utility Suggestions for Rare Web Search Queries,
    Alpa Jain, Umut Ozertem, and Emre Velipasaoglu, SIGIR, 2011.
  2. Dynamic Relationship and Event Discovery,
    Anish Das Sarma, Alpa Jain and Cong Yu, WSDM, 2011.
  3. Domain-Independent Entity Extraction from Web Search Query Logs,
    Alpa Jain and Marco Pennacchiotti, WWW 2011 (short paper).
  4. Understanding the Functions of Business Accounts on Twitter,
    Ana-Maria Popescu and Alpa Jain, WWW 2011 (short paper).
  5. Organizing Query Completions for Web Search,
    Alpa Jain and Gilad Mishne, CIKM, 2010.
  6. PROBER: Ad-Hoc Debugging of Extraction and Integration Pipelines,
    Anish Das Sarma, Alpa Jain and Phil Bohannon, Technical report, 2010.
  7. I4E: Interactive Investigation of Iterative Information Extraction,
    Anish Das Sarma, Alpa Jain and Divesh Srivastava, SIGMOD, 2010.
  8. Open Entity Extraction from Web Search Query Logs,
    Alpa Jain and Marco Pennacchiotti, COLING, 2010.
  9. FactRank: Random Walks on a Web of Facts,
    Alpa Jain and Patrick Pantel, COLING, 2010.
  10. Identifying Comparable Entities on the Web,
    Alpa Jain and Patrick Pantel, CIKM, 2009 (short paper).
  11. Exploring a Few Good Tuples From a Text Database,
    Alpa Jain and Divesh Srivastava, ICDE, 2009.
  12. Join Optimization of Information Extraction Output: Quality Matters!,
    Alpa Jain, Panagiotis Ipeirotis, AnHai Doan, and Luis Gravano, ICDE, 2009.
  13. Optimizing SQL Queries over Text Databases,
    Alpa Jain, AnHai Doan, and Luis Gravano, ICDE, 2008.
  14. Acronym-Expansion Recognition and Ranking on the Web,
    Alpa Jain, Silviu Cucerzan, and Saliha Azzam, IEEE-IRI, 2007.
  15. SQL Queries Over Unstructured Text Databases,
    Alpa Jain, AnHai Doan, and Luis Gravano, ICDE, 2007 (short poster paper).
  16. Names and Similarities on the Web: Fact Extraction in the Fast Lane,
    Marius Pasca, Dekang Lin, Jeffrey Bigham, Andrei Lifchits, and Alpa Jain, COLING-ACL, 2006.
  17. Organizing the World Wide Web of Facts - Step One: the One-Million Fact Extraction Challenge,
    Marius Pasca, Dekang Lin, Jeffrey Bigham, Andrei Lifchits, and Alpa Jain, AAAI, 2006.
  18. Decentralized Information Spaces for Composition and Unification of Services,
    Alpa Jain and Gail Kaiser, Position paper in the Object-Oriented Web Services (OOWS) Workshop, OOPSLA Conference, 2002.

Papers in Poster and Demonstration Sessions

  1. Relational Query Processing Over Text Documents,
    Alpa Jain and Luis Gravano, New York DB/IR Day (April 2005: Best Technical Presentation Award; November 2005: Honorable Mention Award).

Invited Papers

  1. Building Query Optimizers for Information Extraction: The SQoUT Project,
    Alpa Jain, Panagiotis Ipeirotis, and Luis Gravano, SIGMOD Record, Special Issue on "Managing Information Extraction," vol. 37, no. 4, December 2008.

Projects

  • SQoUT: Structured Query Processing over Text Documents: Developing efficient strategies for "structured" relational query processing over plain text documents by relying on information extraction and information retrieval techniques.

  • Past Projects
    • DISCUS: Decentralised Information Spaces for Composition and Unification of Services: A prototype framework that enables secure, ad-hoc communication between heterogeneous software components that may span organisational boundaries, so that they can rapidly deal with a unique and temporary problem.
    • WGC: Workgroup Cache: A system that enables collaboration within and among workgroups by providing a shared repository for information, thereby reducing distribution latency and costs.

Teaching

  • Database Management Systems (Fall 2004). Teaching Assistant for Prof. Gail Kaiser (Extraordinary Teaching Assistant Award).
  • Advanced Web Applications (Spring 2000). Teaching Assistant for Dr. Alfred Spector, IBM.
  • Internet Communication Programming (Spring 2000). Teaching Assistant for Dr. Doree Seligmann, Lucent Technologies.

Tutorials