The Russian Tail Bounds
We record a simple proof of Markov's inequality via indicator functions, then discuss how to tighten it using various transformations.
16 March 2025
Markov's inequality is fundamentally a statement about indicator functions of superlevel sets. Given a non-negative function $f$ and a number $a > 0$, consider the superlevel set $S_a$ defined by

$$S_a = \{x : f(x) \ge a\}. \tag{1}$$

It is clear, by construction, that the minimum $f$ can take on $S_a$ is $a$. In other words, on $S_a$,

$$f(x) \ge a. \tag{2}$$

If we partition the domain into $S_a$ and its complement, then we can convert (2) into an inequality on the whole domain by

$$f(x) \ge a \, \mathbf{1}_{S_a}(x), \tag{3}$$

where $\mathbf{1}_{S}$ denotes the indicator function of the set $S$. Let $\mu$ denote the measure on the domain. Integrating (3) yields

$$\int f \, d\mu \ge a \, \mu(S_a). \tag{4}$$

Dividing through by $a$ yields Markov's inequality. When $\mu$ is a probability measure, we can interpret this as a weak tail bound. For a non-negative random variable $X$ with finite mean $\mathbb{E}[X]$,

$$\mathbb{P}(X \ge a) \le \frac{\mathbb{E}[X]}{a}. \tag{5}$$
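To get a feel for how loose (5) can be, here is a minimal numerical sketch. The setup is an illustrative assumption, not part of the derivation: we take $X \sim \mathrm{Exp}(1)$, so $\mathbb{E}[X] = 1$ and the exact tail is $\mathbb{P}(X \ge a) = e^{-a}$.

```python
# Monte Carlo sanity check of Markov's inequality (5).
# Assumption for illustration: X ~ Exp(1), so E[X] = 1 and P(X >= a) = exp(-a).
import numpy as np

rng = np.random.default_rng(0)
samples = rng.exponential(scale=1.0, size=1_000_000)

for a in [1.0, 2.0, 5.0, 10.0]:
    empirical = np.mean(samples >= a)   # Monte Carlo estimate of P(X >= a)
    markov = samples.mean() / a         # E[X] / a, the Markov bound
    print(f"a={a:5.1f}  P(X>=a) ~ {empirical:.5f}  Markov bound = {markov:.5f}")
```

The exact tail decays exponentially while the bound only decays like $1/a$, which is the gap the rest of this post chips away at.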
Let's begin improving this. Using the second moment yields an even tighter bound, one with quadratic decay at infinity. Therefore, we additionally assume that $X$ has finite variance. The key insight is that

$$|X - \mathbb{E}[X]| \ge a \iff (X - \mathbb{E}[X])^2 \ge a^2, \tag{6}$$

since squaring is monotonically increasing on the non-negative reals. Applying $\mathbb{P}$ to both sides of (6) and using Markov's inequality yields Chebyshev's inequality

$$\mathbb{P}\big(|X - \mathbb{E}[X]| \ge a\big) = \mathbb{P}\big((X - \mathbb{E}[X])^2 \ge a^2\big) \le \frac{\operatorname{Var}(X)}{a^2}. \tag{7}$$
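Extending the sketch above (same assumed $X \sim \mathrm{Exp}(1)$, for which $\operatorname{Var}(X) = 1$), we can watch the central tail against the Chebyshev bound:

```python
# Monte Carlo sanity check of Chebyshev's inequality (7).
# Same illustrative assumption: X ~ Exp(1), so Var(X) = 1.
import numpy as np

rng = np.random.default_rng(0)
samples = rng.exponential(scale=1.0, size=1_000_000)
mean, var = samples.mean(), samples.var()

for a in [1.0, 2.0, 5.0, 10.0]:
    empirical = np.mean(np.abs(samples - mean) >= a)  # P(|X - E[X]| >= a)
    chebyshev = var / a**2                            # Var(X) / a^2
    print(f"a={a:5.1f}  P(|X-EX|>=a) ~ {empirical:.5f}  Chebyshev = {chebyshev:.5f}")
```

The decay improves from $1/a$ to $1/a^2$, but it is still polynomial, while the true tail is exponential.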
Our Pavlovian reaction to witnessing Chebyshev's inequality is to try higher and higher central moments in order to improve the polynomial decay. Chernoff does us one better by obtaining an exponential bound using all the central moments at once via the exponential function $x \mapsto e^{\lambda x}$. We additionally assume the mgf of $X$ is defined in a neighborhood of $0$, and let $\lambda > 0$ be a small enough free parameter. Then,

$$\begin{aligned}
\mathbb{P}\big(X - \mathbb{E}[X] \ge a\big) &= \mathbb{P}\big(e^{\lambda (X - \mathbb{E}[X])} \ge e^{\lambda a}\big) \\
&\le e^{-\lambda a} \, \mathbb{E}\big[e^{\lambda (X - \mathbb{E}[X])}\big] \\
&= e^{-\lambda a} \, \mathbb{E}\big[e^{\lambda X} e^{-\lambda \mathbb{E}[X]}\big] \\
&= e^{-\lambda a} e^{-\lambda \mathbb{E}[X]} \, \mathbb{E}\big[e^{\lambda X}\big]. \tag{8}
\end{aligned}$$

Note that in the fourth line, all the randomness is concentrated in $e^{\lambda X}$, allowing us to pull out the constant $e^{-\lambda \mathbb{E}[X]}$. Minimizing over $\lambda$ in the domain where the mgf of $X$ is defined yields Chernoff's inequality

$$\mathbb{P}\big(X - \mathbb{E}[X] \ge a\big) \le \inf_{\lambda > 0} e^{-\lambda (a + \mathbb{E}[X])} \, \mathbb{E}\big[e^{\lambda X}\big]. \tag{9}$$

It's important to note that we've dropped the absolute value in the discussion of Chernoff, so this is really a one-sided bound, i.e. we implicitly assumed that $X$ lies above its mean.
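To close the loop, here is a minimal sketch of the minimization in (9), under the same illustrative assumption $X \sim \mathrm{Exp}(1)$: its mgf is $\mathbb{E}[e^{\lambda X}] = 1/(1 - \lambda)$ for $\lambda < 1$, and the exact upper tail is $\mathbb{P}(X - 1 \ge a) = e^{-(a+1)}$. A simple grid search over $\lambda$ stands in for a proper optimizer.

```python
# Numerically minimizing the Chernoff bound (9) over the free parameter lambda.
# Assumption for illustration: X ~ Exp(1), with mgf 1/(1 - lam) for lam in (0, 1).
import numpy as np

def chernoff_bound(a, lams):
    """Evaluate exp(-lam*(a + E[X])) * mgf(lam) on a grid and take the minimum."""
    mgf = 1.0 / (1.0 - lams)                        # mgf of Exp(1); here E[X] = 1
    return np.min(np.exp(-lams * (a + 1.0)) * mgf)

lams = np.linspace(1e-3, 1.0 - 1e-3, 10_000)        # grid over the mgf's domain
for a in [1.0, 2.0, 5.0, 10.0]:
    exact = np.exp(-(a + 1.0))                      # P(X - E[X] >= a) exactly
    print(f"a={a:5.1f}  exact = {exact:.2e}  Chernoff = {chernoff_bound(a, lams):.2e}")
```

For this example the infimum is achieved at $\lambda = a/(a+1)$ and the bound evaluates to $(a+1)e^{-a}$: exponential decay at last, off from the true tail only by a polynomial factor.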