II. Combining Uncertain Numbers: Diversification
Another Risky Business
Imagine that before committing to the start up business discussed in section
I, another business opportunity presents itself. In this second potential
venture, the uncertainty in revenue can be modeled as if God
spins two spinners, takes the average of the two
resulting numbers, then multiplies that by $1Million. Again you will face
personal bankruptcy if your revenues are less than $200,000.
Tutorial 2a - Simulating Uncertain Revenue of the Second Business
A Monte Carlo simulation of the average of two spins of the spinner will
provide insight into the second business opportunity.
PUZZLE 2: Before going on to look at the histogram of the average of
two spins, estimate this shape on your own using histogram.xls.
- On a blank
worksheet place =(RAND()+RAND())/2 in a cell as shown below. Make sure to
get the parentheses correct. Press the F9 key a few times to get a feel
for the uncertainty.

- Invoke the
Simulate, Simulation command.
- Specify cell
A2 as the output cell and cell A1 as the output name. Specify the
number of trials, then click OK.
- As with the
first business example, the average is about 0.5, indicating expected
revenue of $500,000.

- Now invoke the Simulate,
Graphs command and draw a histogram with 10 bins and 2 decimal places. You
should get a graph like the one below.

This time, unlike the single spin, the histogram
goes up in the middle. Because the bars must total 100%, this implies that it
goes down at the ends. Therefore the chance of bankruptcy, along with the
chance of making a full $Million must be lower with the second business than
with the first.
Before we examine why it went up in the middle take
a look at the cumulative distribution.
- Click on the
Cumulative button (if you closed the graph dialog box re-open it using
Simulate, Graphs). Instead of a straight line, the cumulative distribution
for the second business is a curve. By looking at 0.2 on the x axis, we
see that this business has only around an 8% chance of sending you to the
poor house, less than half the chance of disaster of the first venture.

Diversification and The Central Limit Theorem
To understand why the histogram of the average of two spins went up in the
middle, consider the numbers that can arise when you roll a single die versus a
pair of dice. With a single die there are six things that can happen with equal
likelihood. With a pair of dice, there are eleven things that can happen. But
they are not equally likely because some numbers can be formed by more
combinations than others as shown below.
| It's no different with the spinner. There are many
combinations of two spins that average around .5, (.1 and .9, .2 and .8, .3 and
.7, and so on.) but very few, that average around 0 (you need to spin two 0’s
in a row). Diversify across two dice or two spinners, and the
distribution goes up in the middle and down on the ends. This is the primary
effect of diversification, as described in the old adage about not putting
"all your eggs in one basket." |
|
Diversification is intimately related to the important central
limit theorem, which is illustrated by the graph on the right. This
is the result of simulating a single spin, the average of 2 spins, 3 spins
and so on up to the averages of 10 spins, each in a separate output cell.
This graph was formed using the Simulation, Common Graphs command. All
outputs were selected, then a smoothed 2D Line histogram was then created. As
you can see, the histogram becomes progressively more centralized, until, by
10 spins, it is indistinguishable from the famous normal
distribution.
|
|
|
The incredible thing about the central limit theorem is
that it states that you can start with almost any distribution,
not just the flat histogram of the spinner. The illustration on the right
shows the same progression of 1 sample through the average of 8 independent
samples, but this time the original distribution is bi-modal. Note how
quickly it approaches the normal bell shaped curve as more and more sample
from this distribution are averaged together.
|
|
|
This animation was made with the Interactive Histogram option of the Common
Graphs command. The file Bimodal.xls contains the model from which this was
generated.
NOTE: Since the average is just the sum of the trials divided by the number
of trials, sums of uncertain numbers also tend to be normal. Strictly speaking, for the
central limit theorem to hold, the samples must be independently drawn from the
same distribution. We will discuss independence further in section IV.
Some consequences of the central limit theorem
The central limit theorem plays a fundamental role in virtually all branches
of science. Some simplified examples follow:
Physics
The velocity of a given air molecule is determined by adding up the results of
all the collisions it has had with other air molecules. Hence air molecule
velocities are normally distributed.
Biology
The size of a given animal involves averages of the sizes of all its ancestors.
Hence the size of animals of a given species (along with almost everything else
you can measure in biology) is normally distributed.
Finance
Consider an initial investment I, that over time gets repeatedly multiplied
by random growth factors. Suppose, for example, that in the first month it grew
by a factor of 1.1, then shrank by a factor of .97 in the second month, and
grew by a factor of 1.2 in the third month. The value at the end of three
months would be I*1.1*.97*1.2. If you have studied logarithms you may remember
that they convert multiplications to additions. So the logarithm of the value
after three months is Log(I)+Log(1.1)+Log(.97)+Log(1.2). And therefore the log
of the value of the investment after a large number of periods is the sum of a
bunch of random numbers, and is hence normally distributed. An uncertain number
whose logarithm is normally distributed is known as log normal.
This is why investment prices are often modeled as log normal random variables.
The Normal Distribution and the Standard Deviation
By now you should be convinced of the importance of the normal distribution.
However, it must not be overdone. For example, between a third and a half of
the graduates of statistics courses that I test on the histogram of a single
spinner, erroneously think it has a bell shaped distribution (see Reference [3]).
Nonetheless it is important to know something about the normal distribution.
The standard deviation (sigma) and variance (the square of the standard
deviation, sigma^2) are measures of the width of the distribution. 68%,
95% and 99% of the normal distribution lies within 1, 2 and 3 standard
deviations of the mean, mu. In XLSim to generate a normal random
variable use gen_Normal(mu,sigma). |
When the spinner was introduced in section I. as a model of uncertain
revenue, you may have considered it far-fetched. But by combining a few
spinners, you can model all sorts of realistic phenomena. In fact in the early
days of computing, a common way to generate normal random numbers was to add
together the results of 12 simulated spinners and subtract the number 6. The
number 12 was chosen because each spinner has a variance of 1/12th, so the
resulting sum had a variance of 1. Since the average of the sum of 12 spinners
is 6, this number was subtracted off to get the mean of the random number back
to 0. The result was a Standard Normal random variable, that is with mean =0
and standard deviation =1. To this day, simulations equivalent to the spinner
are at the heart of most Monte Carlo simulation techniques.
Resampling for Matching any Distribution
The normal distribution is an idealization that serves as a useful
approximation in may cases. Other such idealized distributions include the log
normal, as discussed earlier, the binomial, Poisson, and many others with
meanings only to statisticians. Although theoretically powerful and elegant,
they are not intuitive to the average manager.
With the advent of computers, a new approach to dealing with statistics emerged
that was computational in nature instead of theoretical. The basic idea is to
draw random samples from data. This is known as resampling, and is the basis
for a modern statistical technique known as bootstrapping. This is accomplished
in XLSim with gen_Resample(Data). Imagine that each of the numbers in the range
Data are painted on ping pong balls, which are then tossed in a lottery basket.
Every time gen_Resample calculates, it spins the basket, pulls out a ball,
reads the number into the cell, and returns the ball to the basket. This
results in a random number with the same distribution as the original data.
There is a further discussion of gen_Resample in section IV. For more
information see References ([1], [4], and [6]).
|
Summary
 |
When more and more independent uncertain numbers are averaged or added together, the
histogram of the resulting new uncertain number goes up in the middle and eventually
becomes bell-shaped. |
 | Because
of this, normal distributions are used to model a wide range of uncertainties. |
 | When
sufficient past data is available it may be re-sampled directly regardless of
distribution. |
|