### Normal Curves

In this part of the statistics curriculum we will discuss the concept of normal curves and some of their uses. The first question is: what exactly is a normal curve? To put it simply, it is a representation of data that, in many cases, fits how data are actually distributed in the real world. As an example, we all know that the heights of people center around a certain value, with fewer people at the extremes. Think about it: how many adults do you know of who are 4 ft. 6 in. tall, or 7 ft. 5 in. tall? Probably not too many; in both cases, they would be considered unusual. On the other hand, a person who is 5 ft. 7 in. would be fairly common. To take this further, we know a lot of adults who are between 5 feet and 6 feet tall, with gradually fewer people toward the extremes. This is a very typical normal curve. You can visualize it with the curve below.

You will notice several key features in this graph. The first is that it is completely symmetric, i.e. it is exactly the same shape on both sides of the central axis, labeled with a lowercase y. This central axis is also important in that it represents the average (mean) of all the data. Next, you should notice that the tails (left and right) do not actually reach a value of zero but are asymptotic to the x axis. Recall that asymptotic means that the curve gets closer and closer to a value without ever reaching it.

Let us look at a sample data set. In this case, we will use the number of heads that occur in each of 1000 trials of 100 coin flips (graph shown below). You will notice that the highest point of the graph occurs in the 48 to 52 range. As we approach the tails, these outcomes become less and less likely. Common sense tells us that it is not very likely to get 98 heads out of 100 flips or, conversely, only 5 tails in 100 flips. You will also notice that the curve is very similar in shape to the normal curve.

The first thing you will notice is that this curve is not smooth like the curve above. This is partially due to the small number of trials that were used (1000, to be exact). If one were to run this simulation several million times, we would find a curve very similar in shape to the normal curve, except that with coin flips the values remain discrete, because each coin has only two outcomes, heads or tails, so the number of heads is always a whole number. Another feature that you will probably notice is that 50 is not the peak value. Again, this can be explained by the lack of millions of trials. The shape, in essence, is very close to what a normal curve would be. Keep in mind that I have only shown values from 26 to 68. The other values were considered, but the odds of these numbers occurring are quite small. As an example, the odds of getting 10 heads or 90 heads out of 100 flips are vanishingly small. Needless to say, we would not expect them to occur very often. How to arrive at such a figure will be discussed in the unit on Binomial Curves.
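
If you would like to try the coin-flip experiment yourself, here is a minimal sketch in Python. The language, the function name `simulate_heads`, and the fixed seed are assumptions made for illustration; they are not part of the original lesson, and your counts will differ from the graph described above.

```python
import random
from collections import Counter

def simulate_heads(trials=1000, flips=100, seed=0):
    """Count how often each number of heads occurs over many trials."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    counts = Counter()
    for _ in range(trials):
        # One trial: flip a fair coin `flips` times and count the heads.
        heads = sum(rng.random() < 0.5 for _ in range(flips))
        counts[heads] += 1
    return counts

counts = simulate_heads()

# Crude text histogram: one '#' per trial that produced that many heads.
for heads in sorted(counts):
    print(f"{heads:3d} {'#' * counts[heads]}")
```

With only 1000 trials the histogram is jagged, and the tallest bar need not sit exactly at 50. Raising `trials` should make the shape look steadily more like the smooth curve.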

One of the reasons we do statistics is to see if an occurrence is common, fairly uncommon, or extremely rare. To do this, we need to know how to compute the mean and standard deviation. To refresh your memory, I will go through the process once again, step by step.

Computing the mean is a very simple process. We add all of the values in our set of data and divide this total by the number of items in our list. Let us now look at a simple example: the mean score on a test in a math class. Our set of data is: { 69, 78, 83, 96, 83, 91, 99, 100, 53, 75}. Adding these ten scores gives 827, and dividing by 10 gives a mean of 82.7.
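
The same arithmetic can be checked with a few lines of Python. This is only an illustrative sketch; the variable names are my own.

```python
# Test scores from the example data set.
scores = [69, 78, 83, 96, 83, 91, 99, 100, 53, 75]

total = sum(scores)         # add all of the values
mean = total / len(scores)  # divide by the number of items

print(f"total = {total}, mean = {mean:.1f}")
```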

If you recall from the last section, we found that the standard deviation is a measure of the "spread" of the data. The process is shown below:

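| Score | Score - mean | (Score - mean) squared |
|-------|--------------|------------------------|
| 69 | 69 - 82.7 | 187.69 |
| 78 | 78 - 82.7 | 22.09 |
| 83 | 83 - 82.7 | 0.09 |
| 96 | 96 - 82.7 | 176.89 |
| 83 | 83 - 82.7 | 0.09 |
| 91 | 91 - 82.7 | 68.89 |
| 99 | 99 - 82.7 | 265.69 |
| 100 | 100 - 82.7 | 299.29 |
| 53 | 53 - 82.7 | 882.09 |
| 75 | 75 - 82.7 | 59.29 |

The squared deviations add up to 1962.10. Dividing by the 10 scores gives 196.21, and taking the square root gives a standard deviation of about 14.01.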
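
As a cross-check, the steps can be sketched in Python. One assumption to flag: this sketch divides by the number of scores (the "population" form), because that choice reproduces the Z-scores used in this unit. Dividing by one less than the number of scores would give a somewhat larger value.

```python
import math
from statistics import pstdev

scores = [69, 78, 83, 96, 83, 91, 99, 100, 53, 75]
mean = sum(scores) / len(scores)

# Step 1: squared distance of each score from the mean.
squared_devs = [(x - mean) ** 2 for x in scores]

# Step 2: average the squared distances (dividing by n, an assumption here).
variance = sum(squared_devs) / len(scores)

# Step 3: the standard deviation is the square root of that average.
sd = math.sqrt(variance)

print(f"sum of squares = {sum(squared_devs):.2f}")
print(f"standard deviation = {sd:.2f}")

# The standard library's population formula should agree with the manual steps.
print(f"pstdev check = {pstdev(scores):.2f}")
```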

Recall from the previous unit that the standard deviation tells us how spread out the data are, as well as what could be considered a normal range of data, i.e. about 95% of our data should be within 2 standard deviations of the mean. Thus, in this case, with a mean of 82.7 and a standard deviation of about 14.01, our normal data range would be 82.7 ± 2(14.01), which runs from about 54.7 to about 110.7.

Below is a representation of the normal curve for this data.

Now that we have a graph of our data, we can determine quite a bit of information about it. We know from previous discussions that about 68% of our data should be located within 1 standard deviation of the mean. Within 2 standard deviations of the mean, approximately 95.4% of our data is located. Lastly, within 3 standard deviations of the mean, we know that 99.7% of our data is located. Using all of the above information, as well as the graph, we see the normal range of scores to be roughly 54 to 110 points, out of whatever the total points are on this test.
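
The ranges and percentages above can be reproduced with a short sketch. It assumes Python 3.8 or later for `statistics.NormalDist`; the helper names `normal_range` and `share_within` are made up for this illustration.

```python
from statistics import NormalDist, pstdev

scores = [69, 78, 83, 96, 83, 91, 99, 100, 53, 75]
mean = sum(scores) / len(scores)
sd = pstdev(scores)  # population standard deviation (an assumption)
standard_normal = NormalDist()  # mean 0, standard deviation 1

def normal_range(k):
    """Scores lying within k standard deviations of the mean."""
    return mean - k * sd, mean + k * sd

def share_within(k):
    """Fraction of a normal curve within k standard deviations of its mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)

for k in (1, 2, 3):
    low, high = normal_range(k)
    print(f"within {k} sd: {low:.1f} to {high:.1f} "
          f"({share_within(k):.1%} of a normal curve)")
```

For 2 standard deviations this gives a range of about 54.7 to 110.7, in line with the rough 54 to 110 quoted above.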

In our previous unit, we found that we could use Z-scores to compare two different populations, as well as to see which value differs more from the mean of a set of data. This Z-score has another use: it can be used to determine what percent of a population is within a certain range of values. As an example, let us find what percentage of people scored less than 75 points. You may say, "That's easy. I only have 10 people, so I count the people with scores less than 75 points and divide by 10." For this set that is true, but let's say you have a sample of 10000 people. Would you want to count them all? I don't think so. Anyway, let us continue. The first thing we need to do is to calculate the Z-value for our score of 75 points: Z = (75 - 82.7) / 14.01, which is approximately -0.55.

Now we look up this value on our Normal Curve table.

We find that the proportion of data up to this point is .2912, i.e. 29.12% of all the data is below 75 points. If we actually count the scores in our data that are at or below 75 (53, 69, and 75), we find 3 of 10, or 30%. Pretty impressive, I would say!
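
The table lookup can be imitated in Python. This is a sketch under two assumptions: `statistics.NormalDist` stands in for the printed Normal Curve table, and the Z-value is rounded to two decimals as one would when reading such a table.

```python
from statistics import NormalDist, pstdev

scores = [69, 78, 83, 96, 83, 91, 99, 100, 53, 75]
mean = sum(scores) / len(scores)
sd = pstdev(scores)  # population standard deviation (an assumption)

# Z-value for a score of 75, rounded the way a printed table is indexed.
z = round((75 - mean) / sd, 2)

# Area under the standard normal curve to the left of z.
below = NormalDist().cdf(z)

# Direct count for comparison: scores at or below 75.
counted = sum(x <= 75 for x in scores) / len(scores)

print(f"z = {z}, curve says {below:.2%}, counting says {counted:.0%}")
```
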
Let us now check to see what percent of people scored above 90 points on our test. The Z-value is Z = (90 - 82.7) / 14.01, which is approximately 0.52.

Now for the Z-value of .52, we find that .6985 or 69.85% of the data scored below 90 points, but we are looking for greater than 90 points! The solution is quite easy: recalling that the area under the curve is always 1, we simply subtract .6985 from 1 to get a value of .3015 or 30.15%. Checking our data, we find that 4 of 10 scores are greater than 90 points. The difference is quite small if you consider that 91 is quite close to 90 points and that we have a very small set of data; thus we could very logically assume an accuracy of plus or minus 10%.
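
The "subtract from 1" step looks like this in a short Python sketch, with `statistics.NormalDist` again assumed as a stand-in for the printed table.

```python
from statistics import NormalDist, pstdev

scores = [69, 78, 83, 96, 83, 91, 99, 100, 53, 75]
mean = sum(scores) / len(scores)
sd = pstdev(scores)  # population standard deviation (an assumption)

z = round((90 - mean) / sd, 2)  # Z-value for a score of 90

below = NormalDist().cdf(z)  # area to the left of z
above = 1 - below            # total area is 1, so the rest lies to the right

# Direct count for comparison: scores strictly greater than 90.
counted = sum(x > 90 for x in scores) / len(scores)

print(f"z = {z}, below = {below:.4f}, above = {above:.4f}, counted = {counted:.0%}")
```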

Our last example using Z-scores involves finding the percent of data between two values. Let us say we want to know what percent of data is between 50 and 80 points. We first determine the Z-score for both 50 and 80 points: Z = (50 - 82.7) / 14.01, which is approximately -2.33, and Z = (80 - 82.7) / 14.01, which is approximately -0.19.

We now look up these values on our normal curve table and find that the proportion of data included up to 50 points is 0.0099 or 0.99%, and the proportion of data included up to 80 points is 0.4247 or 42.47%. Now the slightly tricky part. We want to determine the area between these two Z-scores. See the graph below. To do this, simply subtract the smaller value from the larger, i.e. .4247 - .0099. Your answer will be .4148 or 41.48% of your data. Counting our data set, we find that 4 of 10 values (53, 69, 75, and 78) fall in this range, or 40%. Again, this is reasonably close given the small number of data points that we have.
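
Here is the same subtraction as a Python sketch. The helper name `area_below` is hypothetical, and `statistics.NormalDist` is assumed in place of the printed table.

```python
from statistics import NormalDist, pstdev

scores = [69, 78, 83, 96, 83, 91, 99, 100, 53, 75]
mean = sum(scores) / len(scores)
sd = pstdev(scores)  # population standard deviation (an assumption)

def area_below(score):
    """Area under the normal curve to the left of a score, via a 2-decimal Z."""
    z = round((score - mean) / sd, 2)
    return NormalDist().cdf(z)

# Area between 50 and 80 = (area up to 80) - (area up to 50).
between = area_below(80) - area_below(50)

# Direct count for comparison: scores from 50 to 80 inclusive.
counted = sum(50 <= x <= 80 for x in scores) / len(scores)

print(f"curve says {between:.2%}, counting says {counted:.0%}")
```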

Now it is your turn. Let us try a few problems. Click the links below to access your sample problems and assignment.