Nassim Nicholas Taleb’s Archer Example for Nonexistence of the Expected Value with a little Python

Example from Nassim Taleb for \(E(g(x)) \neq g(E(x))\)

The Problem:

An archer stands one meter away from a wall and shoots to his right at an angle drawn uniformly at random between zero and \( \pi / 2 \). Mark the spot on the wall directly in front of the archer. What is the average distance between the arrow’s mark and that spot?

 

Intuitively it seems plausible that this average distance does not exist. This can be examined analytically and visualized with Monte Carlo Simulations.

The angle is uniformly distributed between 0 and \( \pi / 2 \):

Angle \(\alpha \sim U[0, \frac{\pi}{2}]\)

Thus the distance becomes a random variable depending on the angle: \(D = 1 \cdot \tan(\alpha)\)

The average distance corresponds to the expected value of \(D\), that is \(E(D)\).

The angle is continuously distributed with the probability density function \(f(\alpha)=\frac{1}{\frac{ \pi}{2}}\) on \([0, \frac{\pi}{2}]\), so we could try to calculate \(E(D)\) as:

\( E(D) = \int_0^{\frac{\pi}{2} } \tan(\alpha) \cdot f(\alpha) d \alpha = \int_0^{\frac{\pi}{2} } \tan(\alpha) \cdot \frac{1}{\frac{ \pi}{2}} d \alpha = \frac{2}{\pi} \cdot \int_0^{\frac{\pi}{2} } \tan(\alpha) d \alpha \)

Clearly the upper bound \(\frac{\pi}{2}\) lies outside the domain of the tangent function, so the integral cannot be evaluated directly with the antiderivative. It may still exist as an improper integral, so the idea is to “sneak” up to the upper bound from within the domain of \(\tan\) and examine the limit of the integral. Let’s try:

We set the upper bound to \(\frac{\pi}{2} - \epsilon \) and then find out what happens to the value of the integral for \(\epsilon \rightarrow 0 \).

\(\frac{2}{\pi} \cdot \int_0^{\frac{\pi}{2} - \epsilon} \tan(\alpha) d\alpha = \frac{2}{\pi} \cdot \left[-\ln(|\cos(\alpha)|)\right]_0^{\frac{\pi}{2} - \epsilon} = \frac{2}{\pi} \cdot \left( -\ln(|\cos(\frac{\pi}{2} - \epsilon)|) + \ln(|\cos(0)|) \right) = -\frac{2}{\pi} \cdot \ln(|\cos(\frac{\pi}{2} - \epsilon)|)\)

The last step uses \(\ln(|\cos(0)|) = \ln(1) = 0\). For \(\epsilon \rightarrow 0 \) we have \(\cos(\frac{\pi}{2} - \epsilon) \rightarrow 0\) and thus

\(\lim\limits_{\epsilon \rightarrow 0}{\left( -\frac{2}{\pi} \cdot \ln(|\cos(\frac{\pi}{2} - \epsilon)|) \right) } = \infty \)

or, to be precise, this limit does not exist as a finite value, so the improper integral diverges and the expected value does not exist. The analytical examination thus shows the nonexistence of the expected value \(E(D)\) for the problem at hand.
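The divergence can also be checked numerically. The following is a minimal sketch, not part of the original derivation: the helper name `truncated_mean_distance` is made up for illustration, and it simply evaluates the closed form \(-\frac{2}{\pi} \ln(\cos(\frac{\pi}{2} - \epsilon))\) for shrinking values of \(\epsilon\).

```python
import numpy as np


def truncated_mean_distance(eps):
    """Closed form of (2/pi) * integral of tan(a) from 0 to pi/2 - eps.

    Hypothetical helper for illustration; it evaluates
    -(2/pi) * ln(cos(pi/2 - eps)).
    """
    return -(2.0 / np.pi) * np.log(np.cos(np.pi / 2.0 - eps))


# Shrink eps and watch the truncated expected value keep growing.
for eps in [1e-1, 1e-2, 1e-4, 1e-8]:
    value = truncated_mean_distance(eps)
    print(f"eps = {eps:.0e}  ->  truncated E(D) = {value:.3f}")
```

Because \(\cos(\frac{\pi}{2} - \epsilon) = \sin(\epsilon) \approx \epsilon\) for small \(\epsilon\), each factor of ten in \(\epsilon\) adds roughly \(\frac{2}{\pi} \ln(10) \approx 1.47\) to the value. The growth is slow, but it never levels off.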

How does this nonexistence show itself in (simulated) data? Let’s find out using some Monte Carlo and Python!

The simulation example shows the sample means of \(\alpha\) and \(D\) as approximations for \(E(\alpha)\) and \(E(D)\).

%matplotlib inline
# The line above is a Jupyter/IPython magic; remove it when running as a plain script.

import numpy as np
import matplotlib.pyplot as plt

n_sim = 1000
sample_sizes = np.arange(1, n_sim + 1)
angle_means = np.zeros(n_sim)
distance_means = np.zeros(n_sim)

# For each sample size n, draw a fresh sample of n shots and record both means.
for i, n in enumerate(sample_sizes):
    angle = np.random.uniform(low=0, high=np.pi / 2.0, size=n)
    distance = np.tan(angle)
    angle_means[i] = np.mean(angle)
    distance_means[i] = np.mean(distance)

# Mean of the angles, plotted against the sample size n
plt.plot(sample_sizes, angle_means)
plt.title("Mean of Angles as a Function of n")
plt.xlabel("n")
plt.ylabel("Mean of Angle")
plt.grid()
plt.savefig("angle_means.png", bbox_inches='tight')
plt.show()

# Mean of the distances, on a log scale because of the occasional huge values
plt.plot(sample_sizes, distance_means)
plt.title("Mean of Distances as a Function of n")
plt.xlabel("n")
plt.ylabel("Mean of Distances")
plt.yscale('log')
plt.grid()
plt.savefig("distance_means.png", bbox_inches='tight')
plt.show()

It can be observed that the mean of the “unproblematic” variable \(\alpha\) stabilizes and converges to the expected value \(\frac{\pi}{4} \approx 0.785\). No convergence is visible for the mean of the “problematic” variable, the distance. This is what data look like when the first moment \(E(D)\) does not exist. Over the first 100 “shots” the mean appears to rise, because the asymmetric distribution allows very large positive distances but nothing comparable in the opposite direction.
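This also connects back to \(E(g(x)) \neq g(E(x))\) from the heading. The distance at the mean angle is \(\tan(\frac{\pi}{4}) = 1\), while the mean of the distances has no value to settle on. The snippet below is a small sketch of that comparison; the seed, the sample size and the variable names are arbitrary choices made for this illustration.

```python
import numpy as np

# Seed and sample size are arbitrary choices for a reproducible illustration.
rng = np.random.default_rng(seed=0)
n = 100_000

angle = rng.uniform(low=0.0, high=np.pi / 2.0, size=n)
distance = np.tan(angle)

# g(mean of x): distance at the average angle, close to tan(pi/4) = 1
distance_at_mean_angle = np.tan(np.mean(angle))

# mean of g(x): average distance, dominated by a few near-vertical shots
mean_distance = np.mean(distance)

print("tan(mean(angle)) =", distance_at_mean_angle)
print("mean(tan(angle)) =", mean_distance)
print("largest single distance =", np.max(distance))
```

Rerunning with other seeds or larger `n` should leave the first number near 1, while the second keeps jumping around, typically driven by the largest single distance in the sample.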

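Perhaps you wonder why the code allows \(\frac{\pi}{2} \) as an input value for the tan function. There are two reasons. First, `np.random.uniform` samples from the half-open interval [low, high), so the upper bound itself is in general not drawn. Second, the tangent is evaluated numerically: the floating-point value `np.pi / 2` is only an approximation of \(\frac{\pi}{2}\), so `np.tan` returns a very large but finite number (on the order of \(10^{16}\)) rather than failing.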
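A quick way to see this numerical behaviour is to evaluate the tangent at the floating-point upper bound directly. This is only a minimal check, and the variable names are chosen for illustration:

```python
import numpy as np

# np.pi / 2 is the nearest double to pi/2, not pi/2 itself.
upper_bound = np.pi / 2.0
tan_at_upper_bound = np.tan(upper_bound)

print("tan(np.pi / 2) =", tan_at_upper_bound)          # large, but finite
print("finite?", bool(np.isfinite(tan_at_upper_bound)))
```

So even an angle equal to the floating-point upper bound would produce a huge finite distance rather than an error or an infinity.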