We give an algorithm that learns any monotone Boolean function $f$ to any constant accuracy, under the uniform distribution, in time polynomial in $n$ and in the decision tree size of $f.$ This is the first algorithm that can learn arbitrary monotone Boolean functions to high accuracy, using random examples only, in time polynomial in a reasonable measure of the complexity of $f.$ A key ingredient of the result is a new bound showing that the average sensitivity of any monotone function computed by a decision tree of size $s$ must be at most $\sqrt{\log s}$.
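To make the new bound concrete, here is one standard way to state it (the notation $\mathrm{AS}(f)$ for average sensitivity, $x^{\oplus i}$ for $x$ with its $i$th coordinate flipped, and the domain convention $\{0,1\}^n$ are illustrative choices, not fixed by the text above): for every monotone $f\colon\{0,1\}^n\to\{0,1\}$ computed by a decision tree of size $s$,
\[
\mathrm{AS}(f) \;=\; \sum_{i=1}^{n} \Pr_{x \in \{0,1\}^n}\bigl[f(x) \neq f(x^{\oplus i})\bigr] \;\le\; \sqrt{\log s}.
\]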
We generalize the basic inequality and learning result described above in several ways: to partition size (a complexity measure stronger than decision tree size), to $p$-biased measures over the Boolean cube (rather than only the uniform distribution), and to real-valued (rather than only Boolean-valued) functions.