We consider the following basic inference problem: there is an unknown high-dimensional vector w in R^n, and an algorithm is given access to labeled pairs (x, y) where x is an n-dimensional measurement and y = w · x + noise. What is the complexity of deciding whether the target vector w is (approximately) k-sparse? The recovery analogue of this problem (given the promise that w is sparse, find or approximate the vector w) is the famous sparse recovery problem, with a rich body of work in signal processing, statistics, and computer science.
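To make the sampling model concrete, here is a minimal simulation sketch. The specific choices (a Rademacher product distribution for the measurements and Gaussian noise) are illustrative assumptions, not the paper's setting:

```python
import numpy as np

rng = np.random.default_rng(0)
n, k, num_samples = 1000, 5, 200

# A k-sparse target: all mass on k of the n coordinates.
w = np.zeros(n)
w[rng.choice(n, size=k, replace=False)] = rng.standard_normal(k)

# Each x is drawn from an i.i.d. product distribution D over R^n;
# here D is uniform on {-1, +1} (Rademacher), chosen only for illustration.
x = rng.choice([-1.0, 1.0], size=(num_samples, n))

# Labels y = w . x + noise, with the noise independent of x.
noise = 0.1 * rng.standard_normal(num_samples)
y = x @ w + noise

# The tester sees only the pairs (x, y) and must decide whether the
# hidden w is (approximately) k-sparse or far from every k-sparse vector.
```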
We study the decision version of this problem (i.e. deciding whether the unknown w is k-sparse) from the vantage point of property testing. Our focus is on answering the following high-level question: when is it possible to efficiently test whether the unknown target vector w is sparse versus far-from-sparse using a number of samples which is completely independent of the dimension n?
We consider the natural setting in which x is drawn from an i.i.d. product distribution D over R^n and the noise process is independent of the input x. As our main result, we give a general algorithm which solves the above-described testing problem using a number of samples which is completely independent of the ambient dimension n, as long as D is not a Gaussian. In fact, our algorithm is fully noise tolerant, in the sense that for an arbitrary w, it approximately computes the distance of w to the closest k-sparse vector.
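One standard way to formalize the quantity being approximated (the paper's exact distance measure and normalization may differ) is the Euclidean distance from w to the set of k-sparse vectors, which is attained by zeroing out all but the k largest-magnitude coordinates:

```latex
\mathrm{dist}_k(w) \;=\; \min_{\substack{v \in \mathbb{R}^n \\ \|v\|_0 \le k}} \|w - v\|_2
\;=\; \Bigl( \textstyle\sum_{i > k} w_{(i)}^2 \Bigr)^{1/2},
```

where w_{(1)}, ..., w_{(n)} denote the coordinates of w sorted in decreasing order of magnitude.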
To complement this algorithmic result, we show that weakening any of our conditions makes it information-theoretically impossible for any algorithm to solve the testing problem with fewer than essentially log n samples. Thus our conditions essentially characterize when it is possible to test noisy linear functions for sparsity with constant sample complexity.
Our algorithmic approach is based on relating the cumulants of the output distribution (i.e. of y) to the power sum symmetric polynomials in w, and using the latter to measure the sparsity of w. This approach crucially relies on a theorem of Marcinkiewicz from probability theory. In fact, to obtain effective sample complexity bounds with our approach, we prove a new finitary version of Marcinkiewicz's theorem. This involves extending the complex-analytic arguments used in the original proof with results about the distribution of zeros of entire functions.
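The identity behind this relation, stated here informally as background rather than as the paper's exact formulation: since the coordinates of x are i.i.d. and the noise is independent of x, the m-th cumulant is additive over independent summands and homogeneous of degree m, so for every m ≥ 1,

```latex
\kappa_m(y)
\;=\; \kappa_m\Bigl(\sum_{i=1}^n w_i x_i + \mathrm{noise}\Bigr)
\;=\; \kappa_m(x_1) \sum_{i=1}^n w_i^m + \kappa_m(\mathrm{noise})
\;=\; \kappa_m(x_1)\, p_m(w) + \kappa_m(\mathrm{noise}),
```

where p_m is the m-th power sum symmetric polynomial. Marcinkiewicz's theorem says (roughly) that any distribution whose cumulants all vanish beyond some fixed order must be Gaussian; hence whenever D is not Gaussian, cumulants κ_m(x_1) of arbitrarily high order are nonzero, and the power sums p_m(w), which reflect the sparsity of w, can be read off from the cumulants of y.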