450 Computer Science Building
Columbia University 1214 Amsterdam Avenue New York, NY 10027 |
Office: (212)939-7054
Home: (212)663-7211 junr@cs.columbia.edu http://www.cs.columbia.edu/~junr |
Education
Ph.D. | Computer Science, Columbia University (expected
in May 2000)
Advisor: Prof. Kenneth A. Ross |
1995-2000 |
M.Sc. | Computer Science, Columbia University
GPA: 4.094/4.0 |
1996 |
B.E. | Tsinghua University, Beijing, China
major in Computer Science, minor in Applied Mathematics GPA: 90/100 |
1994 |
Honor
Intern Excellence Award, Sybase IQ | Summer 1997 |
I was the only one among more than 60 interns in the northeastern part of Sybase who was given such an award. |
Dissertation: Advanced Query Processing
Complex queries such as correlated queries are common in decision support systems. Traditional nested iteration evaluation method is time-consuming and query rewriting technique does not always apply. I have designed and implemented a new ``invariant'' technique that can evaluate arbitrary correlated queries efficiently. The technique recognizes the part of the subquery that is uncorrelated and tries to cache and reuse the invariant result. The method can be integrated smoothly into a conventional cost-based optimizer. |
As random access memory gets cheaper, it becomes possible to keep database tables memory resident. An important factor in main memory query processing is the cache behavior. Cache miss penalty is high, given the current speed gap between cache access and main memory access. I propose two cache conscious main memory indexing structures: Cache Sensitive Search (CSS) Trees and Cache Sensitive B+(CSB+) Trees. By better utilizing each cache line, both indexes are faster in searching than existing indexing structures such as B+-Trees and T-Trees. While CSS-Trees are designed for OLAP environment where updates are batched and infrequent, CSB+-Trees support incremental updates in a fashion similar to B+-Trees and can be used for a more general purpose. |
Patents
Research Assistant, Columbia University | 1995-present | |||||||
|
||||||||
Summer Intern, IBM Almaden Research Center, San Jose, CA | Summer 1999 | |||||||
Mentors: Dr. Guy Lohman and Dr. Bruce Lindsay. | ||||||||
I worked on outerjoin and antijoin optimization. Outerjoin and antijoin reordering differs from inner join reordering in the fact that not all the join orders give the correct result. I designed an algorithm for reordering outerjoins and antijoins that can be smoothly incorporated into commercial systems such as DB2. The design distinguishes two approaches: with compensation and without compensation. The former only considers valid join orders and the latter allows invalid join orders and then compensates the incorrect result. I also did some work on cardinality and cost estimation for outerjoins and antijoins. The cardinality estimation avoids bias and guarantees that different join orders of the same subplan will have the same join cardinality. The work has been prototyped in DB2. | ||||||||
Summer Intern, Microsoft Research Lab, Redmond, WA | Summer 1998 | |||||||
Mentor: Dr. Surajit Chaudhuri. | ||||||||
I worked on sampling on queries. As data set gets larger in OLAP systems, query results tend to be large also. Users may not be exactly sure what they want initially. It then makes sense to first provide users with just a random sample of the result. A sample can be obtained much faster if we can avoid materializing the complete result set. The challenge is to reduce the evaluation time and guarantee the randomness of the sample. I designed several techniques for join queries and had them prototyped in Microsoft's SQL Server system. The result shows that our techniques can give a reasonably good random sample using just a small fraction of the full evaluation time. | ||||||||
Summer Intern, Sybase IQ, Burlington, MA | Summer 1997 | |||||||
Mentor: Dr. Roger MacNicol. | ||||||||
I worked on the optimization of correlated queries. The idea is to find the invariant part of the subquery in an execution plan and to avoid the unnecessary reexecution of the invariant part by caching and reusing the invariant result. I found a method of incorporating the invariant technique seamlessly into an existing join optimizer. I finished most of the implementation in a commercial version of Sybase IQ within three months. In follow-up work, I also investigated the tradeoffs between the invariant technique and the query rewriting technique. |
Teaching Experience
Instructor, Columbia University | 1997 |
Taught ``Data Structures in C.'' This is a 3-credit course for non-cs undergraduate students. The class had about 40 students. | |
Teaching Assistant, Columbia University | 1997 |
Course: Data Structures and Algorithms | |
Teaching Assistant, Columbia University | 1996 |
Course: Database Systems |
Work Experience
Software Engineer, Hua Yi Software Company, Beijing, China | 1994-1995 |
I participated in the implementation of a MIS system for an Import/Export company using Sybase and PowerBuilder over a Novell network. I also designed and implemented an automatic report generating system. Additionally, I was the database administrator of our system. |
Professional Activities
Referee for the ACM SIGMOD and VLDB conferences, the ACM TODS journal, the VLDB Journal and the International Journal on Cooperative Information Systems (IJCIS). |
Departmental Service
Ph.D. Admission Committee Member | 1997, 1998 |
Implemented a web interface to facilitate the process of applications to the Ph.D. program of the Computer Science Department. |