COMS E6261: Advanced Cryptography

Spring 2010: Data Privacy

Main Page Project Page Additional Papers

Lecture Topics

Lecture 1 (1/20): Introduction to class, motivation for data privacy, brief overview of secure multi-party computation approach ("how") and contrast with the approach in this class ("what"), an even briefer overview of other approaches we're not taking in this class.
Lectre 2 (1/27): Differential privacy: Impossibility result in the presence of auxiliary information (statistical/anonymized database/any summarythat has utility, must leak some information, even if the individual is not in the database.). Setting and definition of epsilon-differential privacy (interactive vs non-interactive model, discussion of the definition and alternatives, etc). Overview of what's coming up next -- output perturbation based approach to achieving differential privacy.
Reading for Lecture 2:
Differential Privacy. C. Dwork. ICALP 06.
Students were encouraged to attend the following talk: Network Trace Anonymization.
Lecture 3 (2/3): Perturbation based approach for achieving differential privacy for sum functions (generalized to low sensitivity functions).
Required readings:
Revealing Information while preserving privacy. I. Dinur, K. Nissim. PODS 03.. Presented by: Krzysztof Choromanski.
Practical Privacy: The SuLQ Framework. A. Blum, C. Dwork, F. McSherry, K. Nissim. PODS 05.
Optional readings:
Calibrating Noise Sensitivity in Private Data Analysis. C. Dwork, F. McSherry, K. Nissim, A. Smith. TCC 06.
Privacy Preserving Datamining on Vertically Partitioned Databases. C. Dwork. K. Nissim. Crypto 04.
What we actually did: Krzystof described the lower bounds from the Dinur/Nissim paper listed above. He then started to discussing his extension to the Dinur/Nissim result, based on the Chaidee/Tuntapthai paper listed below.
Lecture 4 (2/17): Geetha will discuss positive results--private output perturbation algorithms for functions such as sums, and its generalization. She will discuss (as time permits) results from the line of papers Dinur-Nissim, SuLQ, sensitivity paper (see reading list). Whatever she does not get to will be covered in the next lecture.
Required readings:
Practical Privacy: The SuLQ Framework. A. Blum, C. Dwork, F. McSherry, K. Nissim. PODS 05.
Calibrating Noise Sensitivity in Private Data Analysis. C. Dwork, F. McSherry, K. Nissim, A. Smith. TCC 06.
Lecture 5 (2/24): Krzysztof will describe the Berry Essen bound for non-iid random variables (link below), and then show his extension of the Dinur/Nissim negative result that uses this theorem (he will focus on presentation of the theorems, and not on proof details -- maybe just a quick sketch of how the proof goes relative to the proof of the original DN theorem that we saw last week). He will also describe negative results for a wider class of functions (not just sums).
Optional readings:
Berry-Esseen Bounds for Random Sums of Non-i.i.d. Random Variables. N. Chaidee and M. Tuntapthai. Int. Math. Forum 09.
What we actually did:
Krzysztof started by proving that the Dinur-Nissim algorithm (for the lower bound) can be applied even to the case where we allow the adversary to recover all but nf(n) bits where f(n) is o(1) but 1/f(n) is O(n^{1/3}) (*). He showed that as long as the perturbation is of the order (sqrt(nf(n))), the Dinur-Nissim algorithm still allows the adversary to recover all but nf(n) of the bits. The remaining open question is whether (*) is necessary. The proof used by Krzysztof uses a classic version of the Berry-Esseen inequality, which must assume (*). There may be another way to prove the result, without using the Central Limit Theorem.
Next, Krzysztof established a family of functions for which there is a modification of the Dinur-Nissim algorithm (again, for the lower bound) such that as long as the perturbation is of the order (sqrt(n)), the adversary can recover all but epsilon*n of the bits. He calls such a family a "quasi-linear" family of functions. Briefly, functions belonging to this family must be sensitive on small changes and should not blow up or increase rapidly.
Lecture 9 (3/24): Sharadh and Tal presented the paper:
Distributed private data analysis: Simultaneously solving how and what. A. Beimel, K. Nissim, and E. Omri. CRYPTO 08
Lecture 12 (4/14): Dana presented the paper:
When and How Can Data be Efficiently Released with Privacy? C. Dwork, M. Naor, O. Reingold, G. Rothblum, S. Vadhan. STOC 09
The following is a quote from the abstract:
We show that when |C| and |X| are both polynomial in k, it is possible to efficiently (in k) privately construct synthetic data sets maintaining approximately correct counts, even when the original data set is very small (roughly, O(2^{\sqrt{\log |C|}} \log |X|)).
Lecture 13 (4/21): Seung Geol and Shannon presented the paper:
Computational Differential Privacy. I. Mironov, O. Pandey, O. Reingold, S. Vadhan. CRYPTO 09