COMS E6261: Advanced
Cryptography
Spring 2010: Data Privacy
Lecture Topics
- Lecture 1 (1/20): Introduction to class, motivation for data privacy, brief overview of secure
multi-party computation approach ("how") and contrast with the approach in this class
("what"), an even briefer overview of other approaches we're not taking in this class.
- Lectre 2 (1/27): Differential privacy: Impossibility result in the presence of auxiliary
information (statistical/anonymized database/any summarythat has utility, must leak some
information, even if the individual is not in the database.). Setting and definition of
epsilon-differential privacy (interactive vs non-interactive model, discussion of the
definition and alternatives, etc). Overview of what's coming up next -- output
perturbation based approach to achieving differential privacy.
Reading for Lecture 2:
Differential Privacy.
C. Dwork. ICALP 06.
Students were encouraged to attend the following talk:
Network Trace Anonymization.
- Lecture 3 (2/3): Perturbation based approach for achieving differential privacy for sum functions
(generalized to low sensitivity functions).
Required readings:
Revealing
Information while preserving privacy. I. Dinur, K. Nissim. PODS
03.. Presented by: Krzysztof Choromanski.
Practical
Privacy: The SuLQ Framework. A. Blum, C. Dwork,
F. McSherry, K. Nissim. PODS 05.
Optional readings:
Calibrating Noise Sensitivity in Private Data Analysis.
C. Dwork, F. McSherry, K. Nissim, A. Smith. TCC 06.
Privacy Preserving Datamining on Vertically Partitioned Databases.
C. Dwork. K. Nissim. Crypto 04.
What we actually did:
Krzystof described the lower bounds from the Dinur/Nissim paper listed above.
He then started to discussing his extension to the Dinur/Nissim result,
based on the Chaidee/Tuntapthai paper listed below.
- Lecture 4 (2/17):
Geetha will discuss positive results--private
output perturbation algorithms for functions such as sums, and its
generalization. She will discuss (as time permits) results from the
line of papers Dinur-Nissim, SuLQ, sensitivity paper (see reading list).
Whatever she does not get to will be covered in the next lecture.
Required readings:
Practical
Privacy: The SuLQ Framework. A. Blum, C. Dwork,
F. McSherry, K. Nissim. PODS 05.
Calibrating Noise Sensitivity in Private Data Analysis.
C. Dwork, F. McSherry, K. Nissim, A. Smith. TCC 06.
- Lecture 5 (2/24):
Krzysztof will describe the Berry Essen bound for non-iid random variables
(link below), and then show his extension of the Dinur/Nissim
negative result that uses this theorem (he will focus on presentation of the theorems,
and not on proof details -- maybe just a quick sketch of how the proof goes relative to
the proof of the original DN theorem that we saw last week). He will also describe
negative results for a wider class of functions (not just sums).
Optional readings:
Berry-Esseen Bounds for Random Sums of Non-i.i.d. Random Variables.
N. Chaidee and M. Tuntapthai. Int. Math. Forum 09.
What we actually did:
Krzysztof started by proving that the Dinur-Nissim algorithm (for the lower bound) can be applied even to the case
where we allow the adversary to recover all but nf(n) bits where f(n) is o(1) but 1/f(n) is O(n^{1/3}) (*).
He showed that as long as the perturbation is of the order (sqrt(nf(n))), the Dinur-Nissim algorithm still allows
the adversary to recover all but nf(n) of the bits. The remaining open question is whether (*) is necessary.
The proof used by Krzysztof uses a classic version of the Berry-Esseen inequality, which must assume (*).
There may be another way to prove the result, without using the Central Limit Theorem.
Next, Krzysztof established a family of functions for which there is a modification of the Dinur-Nissim algorithm
(again, for the lower bound) such that as long as the perturbation is of the order (sqrt(n)),
the adversary can recover all but epsilon*n of the bits. He calls such a family a "quasi-linear" family of
functions. Briefly, functions belonging to this family must be sensitive on small changes and
should not blow up or increase rapidly.
- Lecture 9 (3/24): Sharadh and Tal presented the paper:
Distributed private data analysis: Simultaneously solving how and what. A.
Beimel, K. Nissim, and E. Omri. CRYPTO 08
- Lecture 12 (4/14): Dana presented the paper:
When and How Can Data be Efficiently Released with Privacy? C. Dwork,
M. Naor, O. Reingold, G. Rothblum, S. Vadhan. STOC 09
The following is a quote from the abstract:
We show that when
|C|
and |X|
are both polynomial in k, it is possible to efficiently (in k) privately
construct synthetic data sets maintaining approximately correct counts, even
when the original data set is very small (roughly, O(2^{\sqrt{\log
|C|
}}
\log |X|
)).
- Lecture 13 (4/21): Seung Geol and Shannon presented the paper:
Computational Differential Privacy. I. Mironov, O. Pandey, O. Reingold,
S. Vadhan. CRYPTO 09