Introduction Section I
Uncertain
Numbers:
Random
Variables
Section II
Combining
Uncertain
Numbers:
Diversification
Section III
Plans Based on
Uncertain Inputs:
Functions of
Random Variables
Section IV
Uncertain Numbers
that Depend on
Other Uncertain
Numbers: Statistical
Dependence
Command
Reference
References Acknowledge-
ments

II. Combining Uncertain Numbers: Diversification

Another Risky Business

Imagine that before committing to the start up business discussed in section I, another business opportunity presents itself. In this second potential venture, the uncertainty in revenue can be modeled as if God spins two spinners, takes the average of the two resulting numbers, then multiplies that by $1Million. Again you will face personal bankruptcy if your revenues are less than $200,000.

Tutorial 2a - Simulating Uncertain Revenue of the Second Business

A Monte Carlo simulation of the average of two spins of the spinner will provide insight into the second business opportunity.

PUZZLE 2: Before going on to look at the histogram of the average of two spins, estimate this shape on your own using histogram.xls.

  1. On a blank worksheet place =(RAND()+RAND())/2 in a cell as shown below. Make sure to get the parentheses correct. Press the F9 key a few times to get a feel for the uncertainty.

  1. Invoke the Simulate, Simulation command.
  2. Specify cell A2 as the output cell and cell A1 as the output name. Specify the number of trials, then click OK.
  3. As with the first business example, the average is about 0.5, indicating expected revenue of $500,000.

  1. Now invoke the Simulate, Graphs command and draw a histogram with 10 bins and 2 decimal places. You should get a graph like the one below.

This time, unlike the single spin, the histogram goes up in the middle. Because the bars must total 100%, this implies that it goes down at the ends. Therefore the chance of bankruptcy, along with the chance of making a full $Million must be lower with the second business than with the first.

Before we examine why it went up in the middle take a look at the cumulative distribution.

  1. Click on the Cumulative button (if you closed the graph dialog box re-open it using Simulate, Graphs). Instead of a straight line, the cumulative distribution for the second business is a curve. By looking at 0.2 on the x axis, we see that this business has only around an 8% chance of sending you to the poor house, less than half the chance of disaster of the first venture.

Diversification and The Central Limit Theorem

To understand why the histogram of the average of two spins went up in the middle, consider the numbers that can arise when you roll a single die versus a pair of dice. With a single die there are six things that can happen with equal likelihood. With a pair of dice, there are eleven things that can happen. But they are not equally likely because some numbers can be formed by more combinations than others as shown below.
It's no different with the spinner. There are many combinations of two spins that average around .5, (.1 and .9, .2 and .8, .3 and .7, and so on.) but very few, that average around 0 (you need to spin two 0’s in a row). Diversify across two dice or two spinners, and the distribution goes up in the middle and down on the ends. This is the primary effect of diversification, as described in the old adage about not putting "all your eggs in one basket."

Diversification is intimately related to the important central limit theorem, which is illustrated by the graph on the right. This is the result of simulating a single spin, the average of 2 spins, 3 spins and so on up to the averages of 10 spins, each in a separate output cell. This graph was formed using the Simulation, Common Graphs command. All outputs were selected, then a smoothed 2D Line histogram was then created. As you can see, the histogram becomes progressively more centralized, until, by 10 spins, it is indistinguishable from the famous normal distribution.

 

The incredible thing about the central limit theorem is that it states that you can start with almost any distribution, not just the flat histogram of the spinner. The illustration on the right shows the same progression of 1 sample through the average of 8 independent samples, but this time the original distribution is bi-modal. Note how quickly it approaches the normal bell shaped curve as more and more sample from this distribution are averaged together.

This animation was made with the Interactive Histogram option of the Common Graphs command. The file Bimodal.xls contains the model from which this was generated.

NOTE: Since the average is just the sum of the trials divided by the number of trials, sums of uncertain numbers also tend to be normal. Strictly speaking, for the central limit theorem to hold, the samples must be independently drawn from the same distribution. We will discuss independence further in section IV.

Some consequences of the central limit theorem

The central limit theorem plays a fundamental role in virtually all branches of science. Some simplified examples follow:

Physics
The velocity of a given air molecule is determined by adding up the results of all the collisions it has had with other air molecules. Hence air molecule velocities are normally distributed.

Biology
The size of a given animal involves averages of the sizes of all its ancestors. Hence the size of animals of a given species (along with almost everything else you can measure in biology) is normally distributed.

Finance
Consider an initial investment I, that over time gets repeatedly multiplied by random growth factors. Suppose, for example, that in the first month it grew by a factor of 1.1, then shrank by a factor of .97 in the second month, and grew by a factor of 1.2 in the third month. The value at the end of three months would be I*1.1*.97*1.2. If you have studied logarithms you may remember that they convert multiplications to additions. So the logarithm of the value after three months is Log(I)+Log(1.1)+Log(.97)+Log(1.2). And therefore the log of the value of the investment after a large number of periods is the sum of a bunch of random numbers, and is hence normally distributed. An uncertain number whose logarithm is normally distributed is known as log normal. This is why investment prices are often modeled as log normal random variables.

The Normal Distribution and the Standard Deviation

By now you should be convinced of the importance of the normal distribution. However, it must not be overdone. For example, between a third and a half of the graduates of statistics courses that I test on the histogram of a single spinner, erroneously think it has a bell shaped distribution (see Reference [3]). Nonetheless it is important to know something about the normal distribution. The standard deviation (sigma) and variance (the square of the standard deviation, sigma^2) are measures of the width of the distribution. 68%, 95% and 99% of the normal distribution lies within 1, 2 and 3 standard deviations of the mean, mu. In XLSim to generate a normal random variable use gen_Normal(mu,sigma).

When the spinner was introduced in section I. as a model of uncertain revenue, you may have considered it far-fetched. But by combining a few spinners, you can model all sorts of realistic phenomena. In fact in the early days of computing, a common way to generate normal random numbers was to add together the results of 12 simulated spinners and subtract the number 6. The number 12 was chosen because each spinner has a variance of 1/12th, so the resulting sum had a variance of 1. Since the average of the sum of 12 spinners is 6, this number was subtracted off to get the mean of the random number back to 0. The result was a Standard Normal random variable, that is with mean =0 and standard deviation =1. To this day, simulations equivalent to the spinner are at the heart of most Monte Carlo simulation techniques.

Resampling for Matching any Distribution

The normal distribution is an idealization that serves as a useful approximation in may cases. Other such idealized distributions include the log normal, as discussed earlier, the binomial, Poisson, and many others with meanings only to statisticians. Although theoretically powerful and elegant, they are not intuitive to the average manager.

With the advent of computers, a new approach to dealing with statistics emerged that was computational in nature instead of theoretical. The basic idea is to draw random samples from data. This is known as resampling, and is the basis for a modern statistical technique known as bootstrapping. This is accomplished in XLSim with gen_Resample(Data). Imagine that each of the numbers in the range Data are painted on ping pong balls, which are then tossed in a lottery basket. Every time gen_Resample calculates, it spins the basket, pulls out a ball, reads the number into the cell, and returns the ball to the basket. This results in a random number with the same distribution as the original data. There is a further discussion of gen_Resample in section IV. For more information see References ([1], [4], and [6]).

Summary

When more and more independent uncertain numbers are averaged or added together, the histogram of the resulting new uncertain number goes up in the middle and eventually becomes bell-shaped.
Because of this, normal distributions are used to model a wide range of uncertainties.
When sufficient past data is available it may be re-sampled directly regardless of distribution.

Introduction Section I
Uncertain
Numbers:
Random
Variables
Section II
Combining
Uncertain
Numbers:
Diversification
Section III
Plans Based on
Uncertain Inputs:
Functions of
Random Variables
Section IV
Uncertain Numbers
that Depend on
Other Uncertain
Numbers: Statistical
Dependence
Command
Reference
References Acknowledge-
ments