probability: how to use it, and what Chinese-language resources are there? | 面包板社区 (Mianbaoban Community)

Related blog posts

Understanding the normal distribution (Part 4)

Popularity 27

User 3678849

2013-5-21 13:38

3377 reads

0 comments

What if I want areas?
As we've discussed, the probabilities for all possible results must add up to unity, and in Figure 8 they do: if, for a given "curve," you add up all the y-axis values at the marker points, you will get exactly 1.0. But what if we would rather show the "curves" so that they enclose equal areas? In that case, we have one more transformation to perform.
Figure 6 gives us a hint. In that graph, the width of each bar (call it Δx) is exactly 1, so the area of each bar is the same as its height. Mathematically:
$$A_n = w_n\,\Delta x = w_n \tag{17}$$
In this special case, adding up all the occurrences is the same thing as computing the total area, which had better come out to be $6^5 = 7{,}776$. The situation is the same in Figure 7. Now the graph shows probabilities, but the width of each (not very apparent) bar is still unity, so the area of a bar is:
$$A_n = P_n\,\Delta x \tag{18}$$
Adding them all up, we should get:
$$\sum_n P_n\,\Delta x = \Delta x \sum_n P_n = \Delta x = 1 \tag{19}$$
Why include this new parameter Δx at all, if its value is unity anyhow? For two reasons. First, the variable x may have units, like meters, volts, or pomegranates. The parameter Δx might have the value 1, but it still carries the same units as x, and mixing parameters with and without units is not allowed.
More importantly, I just got through scaling the curves to force them onto the same horizontal range. In doing so, I multiplied by the scale factor given in Equation 16, and that scale factor is in fact the very same thing as Δx. In Figure 8 you can see that the marker points get closer together as we add dice. As Equation 19 shows, the total area is then no longer unity, but Δx. To force the curves to have areas of unity, I have to divide the y-values by Δx. Since these values are no longer probabilities, I'll just call them y. Figure 9 shows the results.
Now you see it ...
Now here's a graph we can learn to love.
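The divide-by-Δx step can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the column's own spreadsheet: the helper names `dice_ways` and `unit_area_curve` are hypothetical, and because Table 1 and Equation 16 are not reproduced here, the sketch assumes that the totals from N-1 to 6N+1 (one zero at each end) are spread evenly over [-1, 1], which gives Δx = 2/(5N+2). Exact fractions make the unit-area check exact rather than approximate.

```python
from fractions import Fraction

def dice_ways(n_dice):
    """Return {total: number of ordered ways} for the sum of n_dice fair six-sided dice."""
    ways = {0: 1}
    for _ in range(n_dice):
        nxt = {}
        for total, count in ways.items():
            for face in range(1, 7):
                nxt[total + face] = nxt.get(total + face, 0) + count
        ways = nxt
    return ways

def unit_area_curve(n_dice):
    """Return (dx, [(x_n, y_n), ...]) with y_n = P_n / dx.

    ASSUMED scaling (not the column's Table 1): totals N-1 .. 6N+1 are spread evenly
    over [-1, 1], so dx = 2 / (5N + 2). Only the non-zero points are returned.
    """
    ways = dice_ways(n_dice)
    total_ways = 6 ** n_dice                      # W, the number of arrangements
    lo = n_dice - 1                               # leading zero-probability total
    dx = Fraction(2, 5 * n_dice + 2)
    curve = []
    for n, count in sorted(ways.items()):
        x = (n - lo) * dx - 1                     # scale, then shift left by 1
        y = Fraction(count, total_ways) / dx      # divide the probability by dx
        curve.append((x, y))
    return dx, curve

for n_dice in range(2, 6):
    dx, curve = unit_area_curve(n_dice)
    area = sum(y * dx for _, y in curve)          # discrete "area": sum of y_n * dx
    peak = max(y for _, y in curve)
    print(n_dice, dx, area, round(float(peak), 3))
```

Under this assumed scaling, every area comes out exactly 1, while the peak height rises from 1 for two dice to about 1.35 for five: the "peak gets higher, sides pinch in" behaviour described for Figure 9.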
Now that we have equal areas under each curve, we can see more clearly how they morph toward continuous curves. Not only do the (apparent) curves get smoother as we add dice, but the peak also gets higher, while the sides pinch in to keep the areas equal.
But hang on ... is that a fifth curve I spy? According to the legend, the dotted black line is something called "Normal." Unlike the other "curves," it's a truly continuous curve. That, my friends, is the normal distribution function. It's taken us a while to get to it, but the evidence of Figure 9 is overwhelming. If, after seeing Figure 9, you still aren't convinced that the sum of separate random processes tends toward the bell curve of the normal distribution, there's no hope for you.
Sum vs. integral
Before we go forward, I want to call your attention to a very important aspect of Figure 9. As you know, the two-dice through five-dice "curves" are not really curves at all, but discrete functions, with y-values that exist only at the marker points. The curve labelled "Normal," on the other hand, is very much a continuous curve. It's not often that you get to see discrete and continuous functions on the same graph. How did we do this?
The answer becomes clear when you compare the areas under the curves. When I scaled the y-axis values to force the areas of the discrete curves to be unity, I required:
$$\sum_n y_n\,\Delta x = 1 \tag{20}$$
For the continuous curve, I require:
$$\int_{-\infty}^{\infty} p(x)\,dx = 1 \tag{21}$$
See how the two formulas complement each other? For the discrete version, we're summing the areas of bars whose heights are $y_n$ and whose widths are Δx. Similarly, for the continuous function p(x), we get the area by integrating it over all real numbers.
So what is this new function p(x)? It's tied to a probability, all right, but it's not the probability that a measurement is exactly equal to the x-axis value. Since x can range over all real numbers, the probability that the result is exactly equal to any particular x is zero.
Instead, p(x) dx is the probability that the measurement falls into an infinitesimally narrow range, between x and x + dx.
The math of it all
Now that you've seen the curve, I still must show you the math behind it. Here again, I have the opportunity to derive the math from first principles, but I'm going to duck it again. As I mentioned earlier, the classical derivation is pretty horrible. If you'd like to see it done the easy way, see the exquisite paper:
"The Normal Distribution: A derivation from basic principles," Dan Teague, North Carolina School of Science and Mathematics, http://courses.ncssm.edu/math/Talks/PDFS/normal.pdf
To learn all there is to know about the normal distribution (including its origin, inspired by a gambler), see the exhaustive study by Saul Stahl:
"Evolution of the Normal Distribution," Saul Stahl, Mathematics Magazine, Vol. 79, No. 2, April 2006, pp. 96-113, http://mathdl.maa.org/images/upload_library/22/Allendoerfer/stahl96.pdf
As for my "derivation," I'm going to follow the example set by J. Willard Gibbs, the father of statistical mechanics, circa 1900. He said (and I paraphrase), "We use this form because it's the simplest one we can think of that works." Now, that's my kind of physicist!
Take another look at the shapes in Figure 9. There are a lot of things we can say about them without knowing anything about the underlying mathematical formula. Indeed, if we'd been clever enough, we could have said these things from the outset:
- The most probable value of x (the peak of the distribution) should be zero.
- The distribution should decrease monotonically as x moves away from zero.
- The function should be symmetric around zero.
- It should tail off to zero at the extremes (which are ±∞).
As soon as you hear the words "tail off to zero," you should be thinking of an exponential function. One function that does this is:
$$y = e^{-x} \tag{22}$$
But that one's no good, because it's not symmetric.
In fact, it grows to infinity as x goes more and more negative. So what's the next simplest function we can think of? Why, it's the one that doesn't care whether x is positive or negative:
$$y = e^{-x^2} \tag{23}$$
This is the function Sir Willard used, and if it's good enough for him, it's good enough for me. Figure 10 shows the function in all its glory. That's definitely the shape we want. We still have to add some bric-a-brac to make it functional, but the shape is perfect.
The area
By now we should be very comfortable with the fact that any probability distribution curve must enclose an area equal to unity. Does this one? Let's find out. The area under the curve of Figure 10 is:
$$A = \int_{-\infty}^{\infty} e^{-x^2}\,dx \tag{24}$$
Did you notice that I had to integrate from -∞ to +∞, the full range of real numbers? The function in Figure 10 sure looks as though there's little or no area out past x = ±4, but since the function never quite gets to zero, we still have to include those tiny slivers of area out in the suburbs.
Now, what's the value of the integral? We can find it in a number of ways. If you're feeling adventurous and like to do things from first principles (as I usually do), you can derive the integral yourself. It's fairly easy, but not at all obvious. See how here: http://www.youtube.com/watch?v=fWOGfzC3IeY
If you still have your book called Tables of Integrals, you can simply look up the answer. Your book is probably not the same as mine; mine was Pierce, printed in 1939. Or you can do as I did: ask Mathcad, which says:
$$A = \int_{-\infty}^{\infty} e^{-x^2}\,dx = \sqrt{\pi} \tag{25}$$
Noting very astutely that $\sqrt{\pi}$ is not the same thing as 1, I see that I must modify Equation 23 to read:
$$p(x) = \frac{1}{\sqrt{\pi}}\,e^{-x^2} \tag{26}$$
In this form, the function has an integral of 1, so it has earned the right to be called a probability distribution function (hence the name change).
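If you'd rather let a few lines of code play the role of Mathcad, here is a sketch that evaluates Equation 24 with composite Simpson's rule, the numerical method this column mentions later. The `simpson` helper is written out by hand for illustration (it is not a library call), and the infinite limits are truncated to ±10, where $e^{-x^2}$ is about $4 \times 10^{-44}$, far below double precision.

```python
import math

def simpson(f, a, b, n=2000):
    """Composite Simpson's rule for the integral of f over [a, b]; n must be even."""
    if n % 2:
        raise ValueError("n must be even")
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)   # weights 4, 2, 4, 2, ..., 4
    return total * h / 3

# Equation 24 with the infinite limits truncated to +/-10.
area = simpson(lambda x: math.exp(-x * x), -10.0, 10.0)
print(area, math.sqrt(math.pi))
```

The two printed values agree to roughly machine precision, which is the numerical counterpart of Equation 25.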
The central peak of p(x) occurs at x = 0, where its height is clearly:
$$p(0) = \frac{1}{\sqrt{\pi}} \approx 0.564 \tag{27}$$
On the home stretch
As you'll recall, in building Figure 9 I had to shrink and stretch the N-dice "curves" to force them onto the same x-axis interval (±1) and keep their areas equal. We need to be able to do something similar for p(x). The new multiplying constant takes care of the area constraint, but we still need to be able to scale the width along the x-axis. I think it's safe to say that we won't always want the width of the central peak to be about ±2 or so. Even if we did, we would still need a scale factor on x, because x can (and often does) have units. I'm pretty sure I don't know how to raise e to the power 1.618 pomegranates².
To take care of this, let's make a change of variables, writing the old variable x in terms of a new variable u:
$$x^2 = \frac{u^2}{2\sigma^2}, \qquad x = \frac{u}{\sqrt{2}\,\sigma} \tag{28}$$
I'm sure you must be wondering where that factor of 2 came from. It seems like an unnecessary complexity, added for no good reason. Actually, there is a good reason (even a very good reason), but it won't be apparent until later. For now, just trust me, OK?
Note carefully that it's not enough just to substitute for x in Equation 26. If we only stretch or shrink the horizontal scale, the function keeps the same height, so the area changes. We really need to go back to Equation 24 and evaluate the integral again. Differentiating the last of Equation 28 gives:
$$dx = \frac{du}{\sqrt{2}\,\sigma} \tag{29}$$
Substituting for both x and dx in Equation 24 gives the new integral:
$$\int_{-\infty}^{\infty} e^{-x^2}\,dx = \frac{1}{\sqrt{2}\,\sigma}\int_{-\infty}^{\infty} e^{-u^2/(2\sigma^2)}\,du \tag{30}$$
Since we're integrating over the range ±∞, the change of variable doesn't affect the limits: $\sqrt{2}\,\sigma$ times infinity is still infinity. So the integral still evaluates to $\sqrt{\pi}$, which makes the new area:
$$\int_{-\infty}^{\infty} e^{-u^2/(2\sigma^2)}\,du = \sqrt{2}\,\sigma\sqrt{\pi} = \sigma\sqrt{2\pi} \tag{31}$$
Renaming u back to x, our function now takes the form:
$$p(x) = \frac{1}{\sigma\sqrt{2\pi}}\,e^{-x^2/(2\sigma^2)} \tag{32}$$
I mean...
There is one last little tweak to p(x). Sometimes, people need to translate the x-axis so that the central peak no longer occurs at x = 0. This isn't much of a problem for us, because when you're dealing with noise, its most likely value will always be zero.
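Here is a quick numerical check of Equation 31, again as a sketch with a hand-written Simpson's rule (the helper names `simpson` and `scaled_area` are hypothetical). For a few illustrative values of σ, the area under $e^{-u^2/(2\sigma^2)}$, with the infinite limits truncated to ±20σ, should match $\sigma\sqrt{2\pi}$.

```python
import math

def simpson(f, a, b, n=2000):
    """Composite Simpson's rule for the integral of f over [a, b]; n must be even."""
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3

def scaled_area(sigma):
    """Area under exp(-u^2 / (2 sigma^2)), with +/-infinity truncated to +/-20 sigma."""
    return simpson(lambda u: math.exp(-u * u / (2 * sigma * sigma)), -20 * sigma, 20 * sigma)

for sigma in (0.5, 1.0, 3.0):   # illustrative values only
    print(sigma, scaled_area(sigma), sigma * math.sqrt(2 * math.pi))
```

Dividing by that area is exactly what turns the stretched exponential into the unit-area p(x) of Equation 32.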
But for the sake of completeness, here is the normal distribution function in its most general form:
$$p(x) = \frac{1}{\sigma\sqrt{2\pi}}\,e^{-(x-\mu)^2/(2\sigma^2)} \tag{33}$$
As you can see, we now have two parameters we can adjust to match the situation. The constant μ is an additive term that shifts the peak left and right, while σ scales x (and can remove its units). These two parameters have names, and those names, which come from the science of statistics, should be familiar to you: μ is the mean, and σ is the standard deviation. As my last trick for this column, I'll prove to you that these names fit the statistical definitions of these parameters.
Because we had to scale x, we now have a factor of σ in the multiplicative constant. This means, of course, that the height of the central peak changes as we vary σ.
The expectation value
Let's look back for a moment at the things we were doing with dice. For any number of dice, I showed you the histograms, which can easily be turned into probability distributions using Equation 10. Until now, we've concerned ourselves only with the probabilities of getting a certain result, like 2, 12, or 7. But what if the thing we're interested in is not the result itself, but something that depends on it?
To stick with the dice-game theme, what if you get, say, $10 every time you roll two dice and get a 5, but only $2 if you roll a 9 (which, you may recall, has the same probability: 1/9)? In that case, it's not enough just to know the probability of getting a certain result from a dice roll; you also need to know what happens when you get that roll. In other words, you need the rules of the game.
To take another example, suppose I buy a $1 lottery ticket for a pot that's currently worth $300,000,000. What can I expect to get out of the deal? Well, one thing's for sure: it's not the 300 mill, because my likelihood of winning is very, very low. There's a mathematical term for this concept, and it's the same one the gamblers use.
The only difference is that the gamblers were using it several thousand years earlier. The term is expectation value. Mathematically, if P is the probability of winning, and v the payout value, then the expectation value of my lottery ticket is:
$$\langle v \rangle = E(v) = P\,v \tag{34}$$
Here I've shown two popular notations for the expectation value. I tend to prefer the angle-bracket notation ⟨..⟩, because it's completely unambiguous, but the E(..) notation seems more popular lately.
The same principle works for games like the dice game, only then we need to compute the probability-weighted average over all possible outcomes. If there are n possible outcomes from a given dice roll, with probabilities $P_i$ and values $v_i$, then the expectation value becomes:
$$\langle v \rangle = \sum_{i=1}^{n} P_i\,v_i \tag{35}$$
Now that we see the concept, it's easy enough to extend it to continuous functions. If f(x) represents some function of x (the rules of the game, if you will), then its expectation value is:
$$\langle f(x) \rangle = \int_{-\infty}^{\infty} f(x)\,p(x)\,dx \tag{36}$$
This important integral embodies the central idea of how to deal with random processes. For everything we'll be doing from now on, we'll be using the normal distribution, so we might as well insert it into Equation 36 explicitly, to get:
$$\langle f(x) \rangle = \frac{1}{\sigma\sqrt{2\pi}}\int_{-\infty}^{\infty} f(x)\,e^{-(x-\mu)^2/(2\sigma^2)}\,dx \tag{37}$$
Just to emphasize: this definition works for any function f(x), at least any "well-behaved" one, meaning that it doesn't have any internal infinities. Of course, there's no guarantee that we'll be able to get a closed-form solution; we might have to resort to a numerical method such as Simpson's rule.
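Both flavours of expectation value can be sketched in Python. The discrete half applies Equation 35 to a hypothetical two-dice game paying $10 on a total of 5 and $2 on a total of 9 (two totals that do share probability 4/36 = 1/9). The continuous half evaluates Equation 37 with Simpson's rule, truncating ±∞ to μ ± 12σ. The names `PAYOUT`, `expected_payout`, and `expectation`, and the parameters μ = 1.5, σ = 0.7, are illustrative assumptions. Checking that ⟨x⟩ comes out near μ and ⟨(x-μ)²⟩ near σ² previews numerically what the column sets out to prove.

```python
import math
from fractions import Fraction
from itertools import product

# Discrete case (Equation 35): hypothetical payouts keyed by the two-dice total.
PAYOUT = {5: 10, 9: 2}

def expected_payout(payout):
    """Sum of P(roll) * value(roll) over all 36 equally likely ordered rolls."""
    rolls = list(product(range(1, 7), repeat=2))
    p = Fraction(1, len(rolls))
    return sum(p * payout.get(a + b, 0) for a, b in rolls)

# Continuous case (Equation 37), evaluated numerically.
def normal_pdf(x, mu, sigma):
    """Equation 33."""
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

def simpson(f, a, b, n=4000):
    """Composite Simpson's rule on [a, b] with an even number n of subintervals."""
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3

def expectation(f, mu, sigma, width=12.0):
    """<f(x)> under the normal distribution, with +/-infinity truncated to mu +/- width*sigma."""
    return simpson(lambda x: f(x) * normal_pdf(x, mu, sigma), mu - width * sigma, mu + width * sigma)

mu, sigma = 1.5, 0.7   # arbitrary illustrative parameters
print(expected_payout(PAYOUT))                           # 4/3, about $1.33 per roll
print(expectation(lambda x: x, mu, sigma))               # close to mu
print(expectation(lambda x: (x - mu) ** 2, mu, sigma))   # close to sigma**2
```

Exact fractions keep the dice answer exact. The Gaussian integrals converge so quickly under Simpson's rule that the truncation and discretization errors are far below anything printed.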
Grasping the normal distribution (Part 3)

Popularity 20

User 3678849

2013-5-16 14:58

3773 reads

1 comment

Computing the odds
It's time for a little math. Let N be the number of dice for a given experiment. The smallest value we can get from the throw comes when all the dice show 1s, so the score must be N. Likewise, the largest value must be 6N. Recall that, in Figures 3 through 6, the height of each bar is the number of ways a throw can generate a given result. Let the height of bar n be $w_n$. Then, borrowing the form of Equation 6, we can write:
$$P_n = \frac{w_n}{W} \tag{10}$$
where, don't forget, W is the total number of ways we can arrange N dice. It's the sum of all the w's:
$$W = \sum_{n=N}^{6N} w_n \tag{11}$$
I should say a word about the range. We get the smallest non-zero result when all the dice show 1s, and the largest when they all show 6s. You can always make the range wider if you like, even ±∞, since the w's are all zero outside the range shown. From now on, I won't show the range explicitly.
Since W is the total number of ways we can arrange N dice, each with six faces, we know that it must be:
$$W = 6^N \tag{12}$$
The first few values are:
$$W = 6,\ 36,\ 216,\ 1{,}296,\ 7{,}776 \qquad (N = 1, \ldots, 5) \tag{13}$$
You can verify the smaller numbers for yourself: just count the number of 1-unit "bricks" in each column. I have to admit that I let the computer verify the larger numbers.
We need a graph
Armed with Equation 10, it's easy enough to convert Figures 3 through 6 into probability charts. I won't bore you by showing them here; they look just like the histograms, except for the vertical scale. I'll only note that the sum of all the probability bars must be:
$$\sum_n P_n = \frac{1}{W}\sum_n w_n = 1 \tag{14}$$
as must be the case for any decent probability function (did this surprise you?).
Instead of repeating the bar graphs, I'd like to corral all the probability distribution functions into a single graph. To do that, I have to switch from Excel's bar-chart format to the x-y (scatter) plot format. Figure 7 shows the result.
Well, we did manage to get all the curves on the same graph, and they look very nice, don't they?
Let me remind you, though, that the curves don't really exist at all; they're still bar graphs in disguise. When you look at the figure, your brain naturally sees continuous curves. Remember, though, that we're still talking about discrete values here. The only data items occur at the marker points on the graph. The lines are there only to show which marker points belong to which group. I suppose I could have emphasized the integer-only nature of the data by leaving off the connecting lines, but trust me: that graph looks even more confusing.
Normalizing the x-axis
Looking at the "curves" in Figure 7, the transition from straight lines to swoopy curves is hard to miss. But the transition would look even more convincing if we could get the curves on the same horizontal scale. We can certainly do that by scaling the x-values to a specified range. The scaling is a little tricky, though, mainly because there are no mathematical rules for how to do it. I just made some arbitrary decisions, which were:
1. The peak should be centred at x = 0, as any good error curve should be.
2. The horizontal scale should range from -1 to +1.
3. Each curve should start with one and only one zero (not, say, the four zeros of the five-dice case).
Let's see how the scaling works out. Table 1 shows the essential x-axis range information for each "curve." The Min and Max columns include the single zero values. From the table, we can write some equations. If N is the number of dice, then: (15)
So to scale to the range -1..1, we need the scale factor: (16)
After scaling, we can translate the x-values left by subtracting 1. The scaled results are shown in Figure 8.
Did we get it right?
What's that you say? Something looks wrong? You were expecting curves enclosing equal areas? Ah, there's the rub. You're still looking at the graph and seeing continuous curves, a mistake that's even easier to justify now that the new x-values are no longer integers.
But, as before, Figure 8 is really still a bar chart in disguise. The data still only exist at the marker points. You're expecting curves enclosing equal areas because you're jumping ahead of me. You know that all the probabilities for a given "curve" should add up to 1, and you're thinking "integral under the curve." But, of course, there is no integral under the curve (yet), because there's still no curve—only the lone data points where the markers are. Or, equivalently, the heights of the bars in the bar charts.
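Letting the computer do the brick-counting is easy. The sketch below builds $w_n$ by repeatedly convolving one die with itself, then checks Equations 12 and 14 exactly. It also applies one possible reading of the scaling rules: the range runs from N-1 to 6N+1 (a single zero at each end, per rule 3), so the span is 5N+2 and the scale factor is 2/(5N+2). That reading is an assumption, since Table 1 and Equations 15 and 16 are not reproduced here, and the helper names `dice_ways` and `scaled_points` are hypothetical.

```python
from fractions import Fraction

def dice_ways(n_dice):
    """w_n for n_dice fair dice: {total n: number of ordered arrangements giving n}."""
    ways = {0: 1}
    for _ in range(n_dice):
        nxt = {}
        for total, count in ways.items():
            for face in range(1, 7):
                nxt[total + face] = nxt.get(total + face, 0) + count
        ways = nxt
    return ways

def scaled_points(n_dice):
    """(x, P_n) pairs with x mapped onto [-1, 1].

    ASSUMPTION: the range runs from N-1 to 6N+1 (one zero-probability point at each
    end), so the span is 5N+2 and the scale factor is 2/(5N+2).
    """
    ways = dice_ways(n_dice)
    total_ways = sum(ways.values())                  # Equation 11
    lo, hi = n_dice - 1, 6 * n_dice + 1
    scale = Fraction(2, hi - lo)
    points = []
    for n in range(lo, hi + 1):
        p = Fraction(ways.get(n, 0), total_ways)     # Equation 10; zero at both ends
        points.append(((n - lo) * scale - 1, p))     # scale, then shift left by 1
    return points

for N in range(1, 6):
    w = dice_ways(N)
    W = sum(w.values())
    assert W == 6 ** N                                    # Equation 12
    assert sum(Fraction(c, W) for c in w.values()) == 1   # Equation 14
    pts = scaled_points(N)
    print(N, W, pts[0][0], pts[-1][0])                    # W, then the end points of x
```

Because the mapping is linear and symmetric about the midpoint 3.5N, the peak lands at x = 0 whenever it falls on an integer total (even N). For odd N, the two central totals sit symmetrically on either side of zero.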
Understanding the normal distribution (Part 3)

Popularity 24

User 3678849

2013-5-16 14:50

3093 reads

0 comments

Understanding the normal distribution (Part 2)

Popularity 20

User 3678849

2013-5-14 18:29

3261 reads

0 comments

What are the odds?
A good definition of the word "probability" is hard to find. The ones I've found use synonyms like likelihood or chance, so the definition is circular. Fortunately, most of us have an innate understanding of the concept. The probability that the Sun will come up tomorrow is pretty high. The probability that I'll hit the Powerball jackpot, be asked for a date by Kate Upton, and be hit on the head by a meteorite, all on the same day, is vanishingly small.
If you really want to learn about probability, you don't need to go to Yale or Harvard. You only need to study under a professional gambler, like those fictitious, Runyonesque characters in Guys and Dolls. Nobody can compute probabilities in his head like a gambler. Only he calls them "the odds." Ask one of these guys what the odds are for a tossed coin coming up heads, and he'll immediately say "50/50." That's his way of saying that there is no preference for one result over the other. Tossed many times, the coin will come up heads, on average, half the time. We'd say the probability is 1/2; to the gambler, that's 50%.
Faced with the same question, a mathematician might define a probability function:
$$P(\text{heads}) = P(\text{tails}) = \frac{1}{2} \tag{2}$$
Since the two probabilities are equal, we say that the distribution is uniform. The gambler would say that the coin is fair.
The roll of a single die is also fair. Except for the number of dots, all the faces are made alike, so there's no reason to suppose that one of them will come up more often than another. The gambler would say that the odds of a tossed die showing, say, six, are 1 in 6. The mathematician would write:
$$P(1) = P(2) = \cdots = P(6) = \frac{1}{6} \tag{3}$$
The sets of probabilities in Equations 2 and 3 are called probability distribution functions. For these two cases, the functions are discrete, having values only at the integer mesh points. Try as you might, you're never going to roll some dice and get a value of 3.185295. As with all non-integer results, the probability of that result is 0.
The probabilities of my sun vs. Kate examples are:
$$P(\text{sunrise}) \approx 1, \qquad P(\text{jackpot, date, and meteorite on the same day}) \approx 0 \tag{4}$$
From these few (and sometimes silly) examples, we can get an idea of what a probability really is. It must be a single scalar number that represents the likelihood that some event will happen. What's more, its value must be constrained to the range
$$0 \le P \le 1 \tag{5}$$
because no event can happen less frequently than never, or more frequently than always.
Note carefully that the probabilities in Equations 2 and 3 add up to 1. When you flip a coin, you must get some result, and the result can only be heads or tails. Landing on its edge is not allowed. Similarly, when you roll a single die, getting a value from 1 through 6 is certain. On the basis of this somewhat arm-waving argument, we can now give a rigorous definition of a probability. It's: (6)
On a roll
Let's perform a little thought experiment. I'm going to roll a single die six times and count how many times each face shows up. The result is shown in Figure 1, in the form of a histogram.
What's that you say? You were expecting to see each face appear once and only once? Well, that's what you'd get if the results were predictable, but they're not. It's a random process, remember? The chance of getting exactly one occurrence of each face is only about:
$$P = \frac{6!}{6^6} = \frac{720}{46{,}656} \approx 0.015 \tag{7}$$
If we roll the die many more times, we should get a histogram more like what we expect. Figure 2 shows the result of 6,000 rolls. Even with so many trials, we still see slight differences between the ideal and actual results. But at this point it probably won't take much to convince you that, on average, the numbers of occurrences are equal. The die is indeed fair, and the probability of rolling any given value is 1/6.
More dice, please
So far, the graphs I've shown are rather dull. Six numbers, all equal, are not exactly likely to get your blood pumping. But things get a lot more interesting if you add more dice. Figure 3 shows the histogram for two dice. Now we're getting somewhere!
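The single-die experiments are easy to reproduce in a sketch, assuming Python's standard `random` module stands in for the die (the seed 2013 and the helper name `one_of_each_rate` are arbitrary, illustrative choices). The code computes Equation 7 exactly, estimates it by simulation, and then draws a 6,000-roll histogram in the spirit of Figure 2.

```python
import math
import random
from collections import Counter

# Equation 7: 6! favourable orderings out of 6**6 equally likely six-roll sequences.
exact = math.factorial(6) / 6 ** 6

def one_of_each_rate(trials, rng):
    """Fraction of six-roll experiments in which every face shows up exactly once."""
    hits = 0
    for _ in range(trials):
        faces = {rng.randint(1, 6) for _ in range(6)}
        hits += len(faces) == 6
    return hits / trials

rng = random.Random(2013)   # fixed seed so the run is repeatable
print(f"exact {exact:.4f}, simulated {one_of_each_rate(100_000, rng):.4f}")

# A 6,000-roll histogram: each count lands near 1,000, but rarely exactly on it.
counts = Counter(rng.randint(1, 6) for _ in range(6000))
print(sorted(counts.items()))
```

The simulated rate hovers around 1.5%, and the histogram counts scatter by a few dozen around 1,000, just as the slight differences in Figure 2 suggest.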
At last, the histogram has some character. Why are there more occurrences of a 7 than of a 2 or 12? The answer goes right to the heart of probability theory. The unstated rule for a roll with two dice is that we add the values of the two dice. When we add them, there is only one way to get a result of 2: each die has to show a value of 1. Ditto for a sum of 12. But there are six possible ways to get a sum of 7. You can have:
$$(1,6),\ (2,5),\ (3,4),\ (4,3),\ (5,2),\ (6,1) \tag{8}$$
All six ways must be counted, and the order matters: (1,6) and (6,1) count as two different ways, not just one. As we can see from the histogram, a result of 7 is six times more likely than a result of 2 or 12.
If you add the heights of all the bars, you'll get a total of 36. That makes sense: you can arrange the first die in six possible ways, and for each of those ways, you can arrange the second in six ways. The total number of ways must be:
$$W = 6 \times 6 = 36 \tag{9}$$
Our gambler friends would say the odds of a 7 are 6 out of 36, or 1 out of 6.
Well, if two dice make for a more interesting histogram, maybe we should try three or more. Figures 4 through 6 show the results for three, four, and five dice, respectively. What are we looking at here? Can we say "bell curve"?
Let's review the bidding. We started this thought experiment with the simple statistics for a single die, statistics that happen to describe a uniform distribution, in which all outcomes are equally likely. From that simplest of beginnings, we added more dice, always following the rule that the result is the numerical sum of the faces shown on each die. We watched the shapes of the histograms morph from the uniform distribution through the triangular shape of Figure 3 into the familiar bell-curve shape. It's really quite remarkable not only that we got a histogram of this shape from such primitive beginnings, but that the bell-curve shape begins to appear with so few (three to five) dice.
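Counting the two-dice "ways" by brute force makes the ordering rule concrete. This sketch uses only the standard library's `itertools.product` and `collections.Counter`. Each ordered pair is one way, so (1, 6) and (6, 1) are counted separately.

```python
from collections import Counter
from itertools import product

# Each ordered pair (first die, second die) is one "way"; (1, 6) and (6, 1) are distinct.
rolls = list(product(range(1, 7), repeat=2))
ways = Counter(a + b for a, b in rolls)

print(len(rolls))                          # 36, as in Equation 9
print(ways[2], ways[7], ways[12])          # 1 6 1
print([r for r in rolls if sum(r) == 7])   # the six pairs of Equation 8
print({total: ways[total] / len(rolls) for total in sorted(ways)})   # probabilities
```

The heights 1, 2, 3, 4, 5, 6, 5, 4, 3, 2, 1 that the counter produces are exactly the triangular histogram of Figure 3.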
But if you think that's remarkable, wait till you hear this: we would have gotten the same shape for practically any starting distribution! All we need is some device that can produce at least two different numbers at random, and the rule that we get the score by adding the individual results. This truly remarkable result follows from the central limit theorem.
No doubt you've already figured out that the shape these histograms seem to be trending toward is the shape of the normal distribution. Now you can see why the normal distribution is so ubiquitous in nature. It's because you almost never see (except in board games) a single source of noise. Usually the noise is generated by many random processes, all running independently of each other. As long as the outputs of the many sources are added together (as they would be in, say, an electronic circuit), the normal distribution is the inevitable result.
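To watch the central limit theorem work on something other than a fair die, here is a sketch using a deliberately lopsided, hypothetical loaded die that shows 1 half the time. It computes the exact distribution of the sum of k rolls by repeated convolution, then reports the largest gap between that distribution and a normal curve with the same mean and variance. This illustrates one starting distribution; it is not a proof of the theorem, and the names `LOADED_DIE`, `sum_distribution`, and `max_gap_to_normal` are our own.

```python
import math

# A deliberately lopsided "device" (hypothetical): a loaded die that shows 1 half the time.
LOADED_DIE = {1: 0.5, 2: 0.1, 3: 0.1, 4: 0.1, 5: 0.1, 6: 0.1}

def sum_distribution(single, k):
    """Exact pmf of the sum of k independent draws from the pmf `single` (repeated convolution)."""
    dist = {0: 1.0}
    for _ in range(k):
        nxt = {}
        for total, p in dist.items():
            for value, q in single.items():
                nxt[total + value] = nxt.get(total + value, 0.0) + p * q
        dist = nxt
    return dist

def max_gap_to_normal(single, k):
    """Largest |P(sum = n) - p(n)|, where p is the normal density with the same mean and variance."""
    dist = sum_distribution(single, k)
    mean = sum(n * p for n, p in dist.items())
    var = sum((n - mean) ** 2 * p for n, p in dist.items())
    sigma = math.sqrt(var)

    def normal(x):
        return math.exp(-((x - mean) ** 2) / (2 * var)) / (sigma * math.sqrt(2 * math.pi))

    return max(abs(p - normal(n)) for n, p in dist.items())

for k in (1, 2, 5, 30):
    print(k, round(max_gap_to_normal(LOADED_DIE, k), 4))
```

Comparing the probability at each integer total with the normal density works here because the totals are spaced exactly one unit apart. For this particular die, the gap shrinks from about 0.34 for a single roll to well under 0.005 for 30 rolls.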
Grasping the normal distribution (Part 2)

Popularity 25

User 3678849

2013-5-10 18:38

3740 reads

1 comment



Tags: probability