I was wondering what the formal justification is behind the following reasoning:
Lemma. On average, a fair coin needs to be tossed two times until a "heads" is seen.
Proof. Let $E$ be the expected number of tosses until a heads is seen, counting the toss that shows heads. We certainly need at least one toss, and if it's heads, we're done. If it's tails (which occurs with probability $1/2$), we need to repeat. Hence $E=1+\frac{1}{2} E$, which works out to $E=2$.
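As a numerical sanity check on the value $E=2$ (not a justification of the argument), here is a minimal simulation sketch. The function names, the trial count, and the seed are my own choices, and it assumes Python's `random` module is an adequate model of a fair coin.

```python
import random


def tosses_until_heads(rng):
    """Count fair-coin tosses up to and including the first heads."""
    count = 1
    # rng.random() is uniform on [0, 1); values >= 0.5 are treated as tails.
    while rng.random() >= 0.5:
        count += 1
    return count


def estimate_expected_tosses(trials=100_000, seed=0):
    """Average the toss count over many independent runs."""
    rng = random.Random(seed)  # seeded so the run is reproducible
    total = sum(tosses_until_heads(rng) for _ in range(trials))
    return total / trials


if __name__ == "__main__":
    print(estimate_expected_tosses())
```

The estimate should land close to 2, which agrees with the lemma but says nothing about why the recursion $E=1+\frac{1}{2}E$ is legitimate.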
I tried to justify this formally using the law of total expectation, $\mathbb{E}[X]=\mathbb{E}[\mathbb{E}[X \mid Y]]$, where $X$ is the number of coin flips until we see heads and $Y$ indicates whether the first flip is heads, but this only gave me the unhelpful identity $\mathbb{E}[X]=p + \mathbb{E}[X] - p$. So how can we formally justify the reasoning presented here?
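Spelling out the conditioning step, as a sketch of where I think the gap is (the symbol $X'$, the number of tosses made after the first one, is my own notation and an assumption about how to set this up):

$$
\mathbb{E}[X] = \tfrac{1}{2}\,\mathbb{E}[X \mid Y=\text{heads}] + \tfrac{1}{2}\,\mathbb{E}[X \mid Y=\text{tails}] = \tfrac{1}{2}\cdot 1 + \tfrac{1}{2}\bigl(1+\mathbb{E}[X' \mid Y=\text{tails}]\bigr).
$$

This would give $E=1+\frac{1}{2}E$ provided that $\mathbb{E}[X' \mid Y=\text{tails}]=\mathbb{E}[X]$ and that $\mathbb{E}[X]$ is finite. That identity seems to be the step the informal proof takes for granted.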