We consider the problem of learning {\em mixtures of product distributions over discrete domains} in the distribution learning framework introduced by Kearns et al.~\cite{KMR+:94}. We give a $\poly(n/\eps)$ time algorithm for learning a mixture of $k$ arbitrary product distributions over the $n$-dimensional Boolean cube $\{0,1\}^n$ to accuracy $\eps$, for any constant $k$. Previous polynomial time algorithms could only achieve this for $k = 2$ product distributions; our result answers an open question stated independently in~\cite{Cryan:99} and~\cite{FreundMansour:99}. We further give evidence that no polynomial time algorithm can succeed when $k$ is superconstant, by reduction from a notorious open problem in PAC learning. Finally, we generalize our $\poly(n/\eps)$ time algorithm to learn any mixture of $k = O(1)$ product distributions over $\{0,1, \dots, b\}^n$, for any $b = O(1)$.