[Continued from Grasping the normal distribution (Part 2)]
Computing the odds
It's time for a little math. Let N be the number of dice for a given experiment. The smallest value we can get from a throw comes when all the dice are showing 1's, so the smallest score must be N. Likewise, the largest score, with all the dice showing 6's, must be 6N.
Recall that, in Figures 3 through 6, the height of each bar is the number of ways a throw can generate a given result. Let the height of bar n be w_n. Then, borrowing the form of Equation 6, we can write the probability p_n of throwing a total of n:
$$p_n = \frac{w_n}{W} \qquad (10)$$
Where, don't forget, W is the total number of ways we can arrange N dice. It's the sum of all the w's.
$$W = \sum_{n=N}^{6N} w_n \qquad (11)$$
I should say a word about the range. We get the smallest result with a non-zero count when all the dice are showing 1's. Likewise, we get the largest such result when they're all showing 6's. You can always make the range wider if you like, even out to ±∞, since the w's are all zero outside the range shown. From now on, I won't show the range explicitly.
Since W is also the total number of ways we can arrange N dice, we know that it must be:
$$W = 6^N \qquad (12)$$
The first few values are:
$$W = 6,\ 36,\ 216,\ 1296,\ 7776 \quad \text{for } N = 1, 2, 3, 4, 5 \qquad (13)$$
You can verify the smaller numbers for yourself—just count the number of 1-unit "bricks" in each column. I have to admit that I let the computer verify the larger numbers.
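If you'd rather let the computer do the brick counting, here is a minimal Python sketch. It is not the method used to build the figures (those came from a spreadsheet); it simply builds up the counts one die at a time. The function name `ways` is my own invention for illustration.

```python
def ways(num_dice):
    """Return {total: number of ways} for a throw of num_dice six-sided dice."""
    counts = {0: 1}  # with no dice thrown there is exactly one way to score 0
    for _ in range(num_dice):
        nxt = {}
        # Adding one more die shifts every existing total by 1 through 6.
        for total, w in counts.items():
            for face in range(1, 7):
                nxt[total + face] = nxt.get(total + face, 0) + w
        counts = nxt
    return counts


for n_dice in range(1, 6):
    w = ways(n_dice)
    W = sum(w.values())  # the sum of all the w's
    # Columns: N, smallest total, largest total, W, and whether W equals 6**N
    print(n_dice, min(w), max(w), W, W == 6 ** n_dice)
```

For each N the smallest total should come out as N, the largest as 6N, and the sum of the w's as 6 raised to the power N.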
We need a graph
Armed with Equation 10, it's easy enough to convert Figures 3 through 6 to probability charts. I won't bore you by showing them here; they'll look just like the histograms, except for the vertical scale. I'll only note that the sum of all the probability bars must be:
$$\sum_n \frac{w_n}{W} = 1 \qquad (14)$$
as must be the case for any decent probability function (did this surprise you?).
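To see the conversion from counts to probabilities in action, here is a small Python sketch. It counts the totals by brute force over every possible throw, which is fine for a handful of dice but not how you'd want to do it for many. The function name `probabilities` is hypothetical, and exact fractions are used so the check on the sum isn't blurred by rounding.

```python
from collections import Counter
from fractions import Fraction
from itertools import product


def probabilities(num_dice):
    """Return {total: probability} for num_dice dice, by enumerating every throw."""
    # Count how many of the 6**num_dice possible throws give each total.
    totals = Counter(sum(throw) for throw in product(range(1, 7), repeat=num_dice))
    W = 6 ** num_dice
    # Each probability is the bar height divided by the total number of ways.
    return {n: Fraction(w_n, W) for n, w_n in sorted(totals.items())}


for n_dice in range(1, 6):
    p = probabilities(n_dice)
    print(n_dice, sum(p.values()))  # the probability bars for each N add up to 1
```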
Instead of repeating the bar graphs, I'd like to corral all the probability distribution functions into a single graph. To do that I have to switch from Excel's bar-chart format to the x-y (scatter) plot format. Figure 7 shows the result.
Well, we did manage to get all the curves on the same graph, and they look very nice, don't they? Let me remind you, though, that the curves don't really exist at all—they're still bar graphs in disguise. When you look at the figure, your brain naturally sees continuous curves. Remember, though, that we're still talking discrete values here. The only data items occur at the marker points on the graph. The lines are there only to show which marker points are in which group.
I suppose I could have emphasized the integer-only nature of the data by leaving off the connecting lines, but trust me: that graph looks even more confusing.
Normalizing the x-axis
Looking at the "curves" in Figure 7, the transition from straight lines to swoopy curves is hard to miss. But the transition would look even more convincing if we could get the curves on the same horizontal scale. We can certainly do that, by scaling the x-values to a specified range. But the scaling is a little tricky, mainly because there are no mathematical rules as to how to do it. I just made some arbitrary decisions, which were:
1. The peak should be centered at x = 0, as any good error curve should be
2. The horizontal scale should range from -1 to +1
3. Each curve should start with one and only one zero (not the four, say, of the five-dice case)
Let's see how the scaling works out. Table 1 shows the essential x-axis range information for each "curve." The Min and Max columns include the single zero values.
From the table, we can write some equations. If N is the number of dice, then:
(15)
So to scale to the range -1..1, we need the scale factor:
(16)
After scaling, we can translate the x-values left by subtracting 1. The scaled results are shown in Figure 8.
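Here is one way the scaling might be sketched in Python. Be warned that it rests on my reading of the three rules, not on the table or equations themselves: I'm assuming the single zero points sit at N - 1 and 6N + 1, so the range is 5N + 2 and the scale factor is 2/(5N + 2). I'm also assuming the x-values are measured from that lower zero point before scaling, so that subtracting 1 lands the ends exactly on -1 and +1. The function name `scaled_x` is hypothetical.

```python
from fractions import Fraction


def scaled_x(total, num_dice):
    """Map a dice total onto the range -1..+1 (assumed scaling, see lead-in)."""
    x_min = num_dice - 1        # assumed: one zero-height point just below N
    x_max = 6 * num_dice + 1    # assumed: one zero-height point just above 6N
    scale = Fraction(2, x_max - x_min)  # 2 / (5N + 2)
    # Measure from the lower zero point, scale to 0..2, then shift left by 1.
    return (total - x_min) * scale - 1


for n_dice in range(1, 6):
    lo, hi = n_dice - 1, 6 * n_dice + 1
    print(n_dice, scaled_x(lo, n_dice), scaled_x(hi, n_dice))
```

With these assumptions the midpoint of each curve, 7N/2, maps to x = 0. For two dice the peak at 7 lands exactly on zero; for three dice the twin peaks at 10 and 11 straddle it symmetrically.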
Did we get it right?
What's that you say? Something looks wrong? You were expecting curves enclosing equal areas?
Ah, there's the rub. You're still looking at the graph and seeing continuous curves—a mistake that's even easier to justify, considering that the new x-values are no longer integers. But, as before, Figure 8 is really still a bar chart in disguise. The data still only exist at the marker points.
You're expecting curves enclosing equal areas because you're jumping ahead of me. You know that all the probabilities for a given "curve" should add up to 1, and you're thinking "integral under the curve." But, of course, there is no integral under the curve (yet), because there's still no curve—only the lone data points where the markers are. Or, equivalently, the heights of the bars in the bar charts.
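A quick numerical check makes the distinction concrete. The Python sketch below compares two things for each N: the sum of the marker heights, and the area you'd get if you (wrongly) treated the connecting lines as a real curve. It assumes the marker spacing on the scaled axis is 2/(5N + 2), which is my own reading of the scaling rules, and the function names `pmf` and `sum_and_area` are hypothetical.

```python
from fractions import Fraction


def pmf(num_dice):
    """Return a list of (total, probability) pairs for num_dice dice."""
    counts = [1]  # counts[k] = ways to score k; with no dice, one way to score 0
    for _ in range(num_dice):
        nxt = [0] * (len(counts) + 6)
        for k, w in enumerate(counts):
            for face in range(1, 7):
                nxt[k + face] += w
        counts = nxt
    total_ways = 6 ** num_dice
    return [(k, Fraction(w, total_ways)) for k, w in enumerate(counts) if w]


def sum_and_area(num_dice):
    """Return (sum of marker heights, area under the connecting lines)."""
    heights = [p for _, p in pmf(num_dice)]
    spacing = Fraction(2, 5 * num_dice + 2)  # assumed gap between scaled markers
    # With zero-height end points and even spacing, the trapezoid-rule area
    # under the connecting lines is just spacing times the sum of the heights.
    return sum(heights), spacing * sum(heights)


for n_dice in range(1, 6):
    print(n_dice, *sum_and_area(n_dice))
```

The sum is 1 for every N, as it must be, while the area under the connecting lines shrinks as N grows. That is why the "curves" don't enclose equal areas.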
[To be continued at Grasping the normal distribution (Part 4)]
User 1406868 2013-9-14 00:23