For deriving upper bounds, both in the stochastic online learning setting and in multi-armed bandit problems, we will use two classical tools: Hoeffding's inequality in combination with the union bound.

In this post we will prove a corollary that allows us to control the deviation of the empirical mean from the true mean simultaneously for all $$t \leq n$$ and all $$k \leq K$$ sources.

That is, for each $$k \leq K$$ we will look at i.i.d. random variables $$X_{k,1}, \ldots, X_{k,n}$$ with mean $$\mu_k$$ and bound the probability of the following event

$$\bigcap\limits_{k \leq K} \bigcap\limits_{t \leq n} \bigg\{ \Big|\frac{1}{t} \sum\limits_{i\leq t} X_{k,i} - \mu_k \Big| \leq c \bigg\} \tag{1}\label{intro}$$

for some constant $$c$$, which we will specify at the end of this article.

## Theorem 1 (Hoeffding's inequality)

Let $$X_1, \ldots, X_n$$ be i.i.d. random variables such that their distribution has support in $$[0,1]$$, and mean $$\mu$$. It holds that

$$\mathbb{ P }\bigg(\Big|\frac{1}{n} \sum\limits_{i=1}^n X_i - \mu\Big| > \epsilon \bigg) \leq 2\exp\left(-2n\epsilon^2\right)$$

Equivalently, it holds with probability at least $$1-\delta$$ that

$$\Big|\frac{1}{n} \sum\limits_{i=1}^n X_i - \mu\Big| \leq \sqrt{\frac{\log(2/\delta)}{2n}}.$$
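To see the inequality at work, here is a small Monte Carlo sketch (a sanity check, not a proof; NumPy is assumed, and the Bernoulli mean $$\mu = 0.5$$ and all parameters are illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(0)

n = 100           # sample size
eps = 0.1         # deviation threshold
trials = 100_000  # Monte Carlo repetitions
mu = 0.5          # mean of a Bernoulli distribution, support in [0, 1]

# Empirical mean of n i.i.d. Bernoulli(mu) samples, repeated `trials` times.
samples = rng.binomial(1, mu, size=(trials, n))
emp_means = samples.mean(axis=1)

# Fraction of runs where the empirical mean deviates by more than eps.
emp_prob = np.mean(np.abs(emp_means - mu) > eps)

# Hoeffding's upper bound on that probability.
hoeffding_bound = 2 * np.exp(-2 * n * eps**2)

print(f"empirical: {emp_prob:.4f} <= Hoeffding bound: {hoeffding_bound:.4f}")
```

For these parameters the bound evaluates to $$2e^{-2} \approx 0.27$$, comfortably above the simulated deviation frequency; Hoeffding is a worst-case bound over all $$[0,1]$$-supported distributions, so some slack is expected.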

## Theorem 2 (Union bound)

Let $$\xi_1, \ldots, \xi_n$$ be events of probability $$1 - \delta_1, \ldots, 1 - \delta_n$$. It holds that

$$\mathbb{ P }\left(\bigcap\limits_{i}\xi_i \right) \geq 1 - \sum\limits_i \delta_i.$$
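A quick numerical illustration (again a sketch with illustrative numbers): the union bound requires no independence, so we can build two dependent events from a single uniform draw and check the guarantee.

```python
import numpy as np

rng = np.random.default_rng(1)
trials = 200_000

# Two dependent events built from one uniform draw U:
#   xi_1 = {U > 0.10}, probability 1 - delta_1 with delta_1 = 0.10
#   xi_2 = {U < 0.95}, probability 1 - delta_2 with delta_2 = 0.05
u = rng.uniform(size=trials)
xi1 = u > 0.10
xi2 = u < 0.95

p_intersection = np.mean(xi1 & xi2)  # true value: P(0.10 < U < 0.95) = 0.85
lower_bound = 1 - (0.10 + 0.05)      # union bound: 1 - sum of delta_i = 0.85

print(f"P(xi1 and xi2) ~ {p_intersection:.4f}, bound: {lower_bound:.2f}")
```

In this construction the bound is in fact tight, since the two failure events are disjoint; in general the union bound can be loose when failure events overlap.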

A proof of Hoeffding's inequality can be found e.g. on Wikipedia. The union bound can be proven by passing to the complement of the event and using the sub-additivity of the probability measure.

Hoeffding's inequality gives us an upper bound on the probability that the empirical mean deviates from the expected value by more than a certain amount. Note that this holds for an arbitrary but fixed $$n$$. The following corollary provides an upper bound that holds uniformly over all $$t \leq n$$ and $$k \leq K$$.

## Corollary (Hoeffding and union bound)

For each $$k \leq K$$, let $$X_{k,1}, \ldots, X_{k,n}$$ be i.i.d. random variables whose distribution $$\nu_k$$ has support in $$[0,1]$$ and mean $$\mu_k$$. It holds that

$$\mathbb{ P }(\exists k \leq K, \exists t\leq n, |\frac{1}{t} \sum\limits_{i\leq t} X_{k,i} - \mu_k | > \frac{\epsilon}{\sqrt{t}}) \leq 2nK \exp\left(-2\epsilon^2\right)$$

### Proof

By Hoeffding's inequality, applied with sample size $$t$$ and deviation $$\epsilon/\sqrt{t}$$ (note that $$2\exp(-2t(\epsilon/\sqrt{t})^2) = 2\exp(-2\epsilon^2)$$), we know that for any fixed $$k \leq K$$ and $$t \leq n$$

\begin{equation} \mathbb{ P }\bigg(\Big|\frac{1}{t} \sum\limits_{i\leq t} X_{k,i} - \mu_k \Big| > \frac{\epsilon}{\sqrt{t}} \bigg) \leq 2 \exp\left(-2\epsilon^2\right) \tag{2}\label{bound} \end{equation}

With the sub-additivity of the probability measure we get

\begin{align*} &\mathbb{ P }\bigg(\exists k \leq K, \exists t\leq n, |\frac{1}{t} \sum\limits_{i\leq t} X_{k,i} - \mu_k | > \frac{\epsilon}{\sqrt{t}} \bigg) \\ &= \mathbb{ P }\bigg(\bigcup\limits_{k \leq K} \bigcup\limits_{t \leq n} \bigg\{ |\frac{1}{t} \sum\limits_{i\leq t} X_{k,i} - \mu_k | > \frac{\epsilon}{\sqrt{t}} \bigg\} \bigg) \\ &\leq \sum\limits_{k\leq K}\sum\limits_{t\leq n} \mathbb{ P }\left( |\frac{1}{t} \sum\limits_{i\leq t} X_{k,i} - \mu_k | > \frac{\epsilon}{\sqrt{t}} \right) \\ &\overset{\eqref{bound}}{\leq} \sum\limits_{k\leq K}\sum\limits_{t\leq n} 2 \exp\left(-2 \epsilon^2 \right)\\ &= 2nK \exp\left(-2 \epsilon^2 \right) \tag*{$\blacksquare$} \end{align*}

## Implications

Let $$\epsilon = \sqrt{\frac{\log(2nK/\delta)}{2}}$$, so that $$2nK\exp(-2\epsilon^2) = 2nK \cdot \frac{\delta}{2nK} = \delta$$. It follows that

$$\mathbb{ P }\bigg(\exists k \leq K, \exists t\leq n, |\frac{1}{t} \sum\limits_{i\leq t} X_{k,i} - \mu_k | > \sqrt{\frac{\log(2nK/\delta)}{2t}} \bigg) \leq \delta.$$

So for the complement we get

$$\mathbb{ P }\bigg(\color{royalblue}{\forall k \leq K}, \color{royalblue}{\forall t\leq n}, |\frac{1}{t} \sum\limits_{i\leq t} X_{k,i} - \mu_k | \color{royalblue}{\leq} \sqrt{\frac{\log(2nK/\delta)}{2t}} \bigg) \color{royalblue}{\geq 1- \delta} .$$

Using the set notation once more, we get

$$\mathbb{ P }\bigg(\color{royalblue}{\bigcap\limits_{k \leq K}} \color{royalblue}{\bigcap\limits_{t \leq n}} \bigg\{ |\frac{1}{t} \sum\limits_{i\leq t} X_{k,i} - \mu_k | \leq \frac{\epsilon}{\sqrt{t}} \bigg\} \bigg) \geq 1- \delta$$

and we finally have the bound on the event $$\eqref{intro}$$ from the beginning of this post.
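The uniform guarantee above can be checked empirically. The sketch below (NumPy; the arm means, $$K$$, $$n$$, and $$\delta$$ are all illustrative choices) simulates $$K$$ Bernoulli sources, tracks all running means, and counts how often any pair $$(k, t)$$ ever leaves the confidence band $$\sqrt{\log(2nK/\delta)/(2t)}$$:

```python
import numpy as np

rng = np.random.default_rng(2)

K, n = 3, 50
delta = 0.05
trials = 5_000
mus = np.array([0.2, 0.5, 0.8])  # illustrative arm means in [0, 1]

# Anytime confidence radius from the corollary: sqrt(log(2nK/delta) / (2t)).
t = np.arange(1, n + 1)
radius = np.sqrt(np.log(2 * n * K / delta) / (2 * t))

violations = 0
for _ in range(trials):
    # One run: n Bernoulli samples per source, shape (K, n).
    x = rng.binomial(1, mus[:, None], size=(K, n)).astype(float)
    running_means = np.cumsum(x, axis=1) / t       # empirical means for all t
    dev = np.abs(running_means - mus[:, None])     # deviations for all (k, t)
    if np.any(dev > radius):                       # any (k, t) outside the band?
        violations += 1

failure_rate = violations / trials
print(f"failure rate: {failure_rate:.4f} <= delta = {delta}")
```

Because the corollary pays a union-bound factor of $$2nK$$, the observed failure rate typically lands far below $$\delta$$; the guarantee is conservative rather than tight.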

Over the next few posts we will use the above technique to prove upper bounds for the Follow the Leader and UCB algorithms.
