Standard deviation formula. Average linear deviation

Expectation and variance

Suppose we measure a random variable N times; for example, we measure the wind speed ten times and want to find its average value. How is this average value related to the distribution function?

Suppose we roll a die a large number of times. The number of points that comes up on each throw is a random variable that can take any integer value from 1 to 6. The arithmetic mean of the points over all throws is also a random variable, but for large N it tends to a definite number, the mathematical expectation $M_x$. In this case $M_x = 3.5$.

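How is this value obtained? Suppose that in N trials one point came up $n_1$ times, two points $n_2$ times, and so on. Then the mean number of points is
$$\bar{x} = \frac{1 \cdot n_1 + 2 \cdot n_2 + \dots + 6 \cdot n_6}{N}.$$
As $N \to \infty$, the fraction of outcomes in which one point came up tends to its probability: $n_1/N \to 1/6$. Similarly, $n_2/N \to 1/6$, ..., $n_6/N \to 1/6$. Hence
$$M_x = \frac{1 + 2 + 3 + 4 + 5 + 6}{6} = 3.5.$$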
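The limit argument can be checked with a quick simulation. This is a minimal Python sketch, assuming a fair die modeled with the standard `random` module; the function name `average_points` and the fixed seed are illustrative choices, not part of the original text. As N grows, the mean score should approach 3.5 and each relative frequency $n_i/N$ should approach 1/6.

```python
import random

def average_points(n_throws: int, seed: int = 1) -> tuple[float, list[float]]:
    """Throw a fair six-sided die n_throws times; return the mean score
    and the relative frequency n_i / N of each face i = 1..6."""
    rng = random.Random(seed)          # seeded so the run is reproducible
    counts = [0] * 6                   # counts[i - 1] = number of throws showing i
    for _ in range(n_throws):
        counts[rng.randint(1, 6) - 1] += 1
    mean = sum(face * n for face, n in zip(range(1, 7), counts)) / n_throws
    freqs = [n / n_throws for n in counts]
    return mean, freqs

for n in (10, 1_000, 100_000):
    mean, freqs = average_points(n)
    print(f"N = {n:>7}: mean = {mean:.3f}, frequencies = {[round(f, 3) for f in freqs]}")
```

With only 10 throws the mean can be far from 3.5; by 100 000 throws it is typically within a few hundredths.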

Model 4.5. Dice

Let us now assume that we know the distribution law of the random variable x, that is, we know that x can take the values $x_1, x_2, \dots, x_k$ with probabilities $p_1, p_2, \dots, p_k$.

The expected value $M_x$ of the random variable x is
$$M_x = \sum_{i=1}^{k} x_i p_i = x_1 p_1 + x_2 p_2 + \dots + x_k p_k.$$

Answer. 2.8.

The mathematical expectation is not always a reasonable characteristic of a random variable. For example, to estimate a typical salary it is more reasonable to use the median, that is, a value such that the number of people earning less than the median equals the number earning more.

The median of a random variable is a number $x_{1/2}$ such that $p(x < x_{1/2}) = 1/2$.

In other words, the probability $p_1$ that the random variable x is smaller than $x_{1/2}$ and the probability $p_2$ that it is greater than $x_{1/2}$ are equal, each being 1/2. For some distributions the median is not determined uniquely.

Let's return to the random variable x, which can take the values $x_1, x_2, \dots, x_k$ with probabilities $p_1, p_2, \dots, p_k$.

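The variance $D_x$ of the random variable x is the mean value of the squared deviation of the random variable from its mathematical expectation:
$$D_x = M\left[(x - M_x)^2\right] = \sum_{i=1}^{k} (x_i - M_x)^2 p_i.$$
The standard deviation is the square root of the variance: $\sigma_x = \sqrt{D_x}$.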
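The definitions of $M_x$, $D_x$ and $\sigma_x$ translate directly into code. Below is a small Python sketch, assuming the distribution is given as two parallel lists of values and probabilities that sum to 1; the helper names and the example distribution are hypothetical. Exact `Fraction` probabilities keep the arithmetic free of rounding.

```python
from fractions import Fraction
from math import sqrt

def expectation(values, probs):
    """M_x = sum of x_i * p_i (probs are assumed to sum to 1)."""
    return sum(x * p for x, p in zip(values, probs))

def variance(values, probs):
    """D_x = sum of (x_i - M_x)^2 * p_i, the mean squared deviation from M_x."""
    m = expectation(values, probs)
    return sum((x - m) ** 2 * p for x, p in zip(values, probs))

def std_dev(values, probs):
    """sigma_x = sqrt(D_x), returned as a float."""
    return sqrt(variance(values, probs))

# A hypothetical distribution: x takes -1, 0, 2 with probabilities 1/4, 1/4, 1/2.
values = [-1, 0, 2]
probs = [Fraction(1, 4), Fraction(1, 4), Fraction(1, 2)]
print(expectation(values, probs), variance(values, probs), std_dev(values, probs))
```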

Example 2

Under the conditions of the previous example, calculate the variance and standard deviation of the random variable x.

Answer. 0.16 and 0.4.

Model 4.6. Shooting at a target

Example 3

Find the probability distribution of the number of points that come up on a die in a single throw, and its median, mathematical expectation, variance and standard deviation.

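Each face is equally likely to come up, so the distribution is:

x_i | 1 | 2 | 3 | 4 | 5 | 6
p_i | 1/6 | 1/6 | 1/6 | 1/6 | 1/6 | 1/6

The median is not unique here: any number $x_{1/2}$ with $3 < x_{1/2} \le 4$ satisfies $p(x < x_{1/2}) = 1/2$. The mathematical expectation is $M_x = (1 + 2 + \dots + 6)/6 = 3.5$, the variance is
$$D_x = \frac{(1 - 3.5)^2 + (2 - 3.5)^2 + \dots + (6 - 3.5)^2}{6} = \frac{17.5}{6} = \frac{35}{12} \approx 2.92,$$
and the standard deviation is $\sigma_x = \sqrt{D_x} \approx 1.71$. It can be seen that the deviation of the value from its mean is very large.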
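The die calculations can be reproduced exactly with Python's `fractions` module. In this sketch (variable and function names are illustrative) the median definition $p(x < x_{1/2}) = 1/2$ is also tested on a few candidate points, which shows that it does not single out one number: 3.5 and 4 both qualify, while 3 and 4.5 do not.

```python
from fractions import Fraction
from math import sqrt

faces = range(1, 7)
p = Fraction(1, 6)                          # every face is equally likely

m = sum(x * p for x in faces)               # M_x
d = sum((x - m) ** 2 * p for x in faces)    # D_x
sigma = sqrt(d)                             # sigma_x = sqrt(D_x), as a float

def prob_less(t):
    """p(x < t) for a single throw of a fair die."""
    return sum((p for x in faces if x < t), Fraction(0))

candidates = [Fraction(3), Fraction(7, 2), Fraction(4), Fraction(9, 2)]
medians = [t for t in candidates if prob_less(t) == Fraction(1, 2)]

print(m, d, round(sigma, 3))   # 7/2 35/12 1.708
print(medians)                 # only 7/2 and 4 satisfy the definition
```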

Properties of mathematical expectation:

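  • The mathematical expectation of a sum of random variables is equal to the sum of their mathematical expectations: $M_{x+y} = M_x + M_y$.
  • The mathematical expectation of the product of independent random variables is equal to the product of their mathematical expectations: $M_{xy} = M_x M_y$.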

Example 4

Find the mathematical expectation of the sum and product of points rolled on two dice.

In Example 3 we found that for one die $M_x = 3.5$. So for two dice, whose throws are independent, $M_{x+y} = 3.5 + 3.5 = 7$ and $M_{xy} = 3.5 \cdot 3.5 = 12.25$.
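As a cross-check that does not rely on the properties themselves, one can enumerate all 36 equally likely outcomes of two independent fair dice and average the sum and the product directly. A minimal Python sketch:

```python
from fractions import Fraction
from itertools import product

faces = range(1, 7)
pairs = list(product(faces, repeat=2))      # all 36 outcomes (a, b) of two dice
p_pair = Fraction(1, len(pairs))            # independent fair dice: each pair has p = 1/36

mean_sum = sum((a + b) * p_pair for a, b in pairs)
mean_product = sum(a * b * p_pair for a, b in pairs)
print(mean_sum, mean_product)               # 7 49/4
```

The value 49/4 is exactly 12.25, matching $M_x M_y$ for independent dice.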

Properties of variance:

  • The variance of the sum of independent random variables is equal to the sum of the variances:

$D_{x+y} = D_x + D_y$.

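Let y be the total number of points rolled in N throws of a die. The throws are independent, so $D_y = N D_x$ and $\sigma_y = \sqrt{N}\,\sigma_x$. Since multiplying a random variable by a constant c multiplies its variance by $c^2$, the average number of points per throw $y/N$ has variance $D_x/N$ and standard deviation
$$\sigma_{y/N} = \frac{\sigma_x}{\sqrt{N}}.$$

This result holds not only for dice. In many cases it determines the accuracy with which the mathematical expectation can be measured empirically: as the number of measurements N increases, the spread of values around the mean, that is, the standard deviation of the mean, decreases in proportion to $1/\sqrt{N}$.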
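The $1/\sqrt{N}$ law can be verified exactly for small N by building the distribution of the total score through repeated convolution. This Python sketch is illustrative (the function names are hypothetical); it checks, with exact fractions, that the total score has variance $N D_x$ and the average score has variance $D_x / N$.

```python
from fractions import Fraction

D_X = Fraction(35, 12)  # variance of a single throw of a fair die

def sum_distribution(n_throws):
    """Exact distribution {total: probability} of the total score of n_throws
    fair dice, built by convolving the single-die distribution n_throws times."""
    dist = {0: Fraction(1)}
    for _ in range(n_throws):
        new = {}
        for total, prob in dist.items():
            for face in range(1, 7):
                new[total + face] = new.get(total + face, 0) + prob * Fraction(1, 6)
        dist = new
    return dist

def variance_of(dist):
    """Variance of a discrete distribution given as {value: probability}."""
    m = sum(v * p for v, p in dist.items())
    return sum((v - m) ** 2 * p for v, p in dist.items())

for n in (1, 2, 5, 10):
    dist = sum_distribution(n)
    d_total = variance_of(dist)                                       # total score y
    d_mean = variance_of({Fraction(v, n): p for v, p in dist.items()})  # average score y / N
    print(n, d_total == n * D_X, d_mean == D_X / n)
```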

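The variance of a random variable is related to the mathematical expectation of the square of this random variable by the following relation:
$$D_x = M_{x^2} - (M_x)^2.$$

To prove it, expand the squared deviation: $(x - M_x)^2 = x^2 - 2 x M_x + (M_x)^2$. Let's find the mathematical expectations of both sides of this equality. By definition, $M\left[(x - M_x)^2\right] = D_x$.

The mathematical expectation of the right side of the equality, according to the properties of mathematical expectation, is equal to
$$M_{x^2} - 2 M_x \cdot M_x + (M_x)^2 = M_{x^2} - (M_x)^2.$$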

Standard deviation

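The standard deviation is equal to the square root of the variance: $\sigma_x = \sqrt{D_x}$.
When determining the standard deviation for a sufficiently large population under study (n > 30), the following formulas are used:
$$\sigma = \sqrt{\frac{\sum (x_i - \bar{x})^2}{n}} \quad \text{(ungrouped data)}, \qquad \sigma = \sqrt{\frac{\sum (x_i - \bar{x})^2 f_i}{\sum f_i}} \quad \text{(grouped data)}.$$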



Variance. Standard deviation

The variance is the arithmetic mean of the squared deviations of each attribute value from the overall mean. Depending on the source data, the variance can be unweighted (simple) or weighted.

The variance is calculated using the following formulas:

· for ungrouped data
$$\sigma^2 = \frac{\sum (x - \bar{x})^2}{n}$$

· for grouped data
$$\sigma^2 = \frac{\sum (x - \bar{x})^2 f}{\sum f}$$

The procedure for calculating the weighted variance:

1. determine the weighted arithmetic mean

2. determine the deviation of each value from the mean

3. square the deviation of each value from the mean

4. multiply the squared deviations by the weights (frequencies)

5. sum the resulting products

6. divide the resulting sum by the sum of the weights
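The six steps map one-to-one onto a short function. A Python sketch, assuming the values (or interval midpoints) and their weights are passed as parallel lists; the name `weighted_variance` and the sample data are hypothetical:

```python
def weighted_variance(values, weights):
    """Weighted (grouped-data) variance computed step by step."""
    total_weight = sum(weights)
    mean = sum(x * f for x, f in zip(values, weights)) / total_weight  # 1) weighted mean
    deviations = [x - mean for x in values]                            # 2) deviations
    squared = [d ** 2 for d in deviations]                             # 3) squared deviations
    weighted = [s * f for s, f in zip(squared, weights)]               # 4) times the weights
    total = sum(weighted)                                              # 5) sum of products
    return total / total_weight                                        # 6) divide by sum of weights

print(weighted_variance([2, 4, 6], [1, 2, 1]))   # illustrative data
```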

The formula for determining the variance can be transformed as follows:

- simple variance:
$$\sigma^2 = \frac{\sum x^2}{n} - \bar{x}^2$$

The procedure for calculating the simple variance:

1. determine the arithmetic mean

2. square the arithmetic mean

3. square each value in the series

4. find the sum of the squared values

5. divide the sum of squares by the number of values, i.e. determine the mean square

6. find the difference between the mean square of the attribute and the square of the mean
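A minimal Python sketch of this shortcut procedure, with illustrative data, compared against the direct definition (the mean squared deviation from the mean); the function name is hypothetical:

```python
def simple_variance(values):
    """Simple variance as 'mean of squares minus square of the mean'."""
    n = len(values)
    mean = sum(values) / n                 # 1) arithmetic mean
    mean_squared = mean ** 2               # 2) square of the mean
    squares = [x ** 2 for x in values]     # 3) square each value
    sum_squares = sum(squares)             # 4) sum of squares
    mean_square = sum_squares / n          # 5) mean square
    return mean_square - mean_squared      # 6) difference

data = [2, 4, 4, 4, 5, 5, 7, 9]            # illustrative numbers
mean = sum(data) / len(data)
direct = sum((x - mean) ** 2 for x in data) / len(data)
print(simple_variance(data), direct)       # 4.0 4.0
```

For hand calculation the shortcut avoids rounding individual deviations; in floating-point code with large values and a small spread, the direct formula tends to be the numerically safer choice.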

The formula for the weighted variance can likewise be transformed into the following form:

$$\sigma^2 = \frac{\sum x^2 f}{\sum f} - \bar{x}^2$$

i.e. the variance is equal to the difference between the mean of the squared values of the attribute and the square of the arithmetic mean. With the transformed formula there is no need to calculate the deviations of individual values of the attribute from $\bar{x}$, and the calculation error caused by rounding those deviations is eliminated.

Dispersion has a number of properties, some of which make it easier to calculate:

1) the variance of a constant value is zero;

2) if all values of the attribute are reduced by the same number, the variance does not change;

3) if all values of the attribute are reduced by the same factor k, the variance decreases by a factor of $k^2$.
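These properties are easy to check numerically. The sketch below uses `statistics.pvariance` from the Python standard library (population variance, dividing by n) with exact `Fraction` inputs; the data, the shift A and the factor k are arbitrary illustrative values.

```python
from fractions import Fraction
from statistics import pvariance   # population variance: divides by n

data = [Fraction(x) for x in (3, 7, 8, 10, 12)]   # illustrative values
A, k = 5, 2                                       # arbitrary shift and scale factor

print(pvariance([4, 4, 4]))                                # 1) a constant has zero variance
print(pvariance(data), pvariance([x - A for x in data]))   # 2) shifting by A changes nothing
print(pvariance(data) / pvariance([x / k for x in data]))  # 3) dividing by k divides it by k**2
```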

The standard deviation S is the square root of the variance:

· for ungrouped data:
$$S = \sqrt{\frac{\sum (x - \bar{x})^2}{n}};$$

· for the variation series:
$$S = \sqrt{\frac{\sum (x - \bar{x})^2 f}{\sum f}}$$

The range of variation, the average linear deviation and the standard deviation are named (dimensional) quantities: they have the same units of measurement as the individual values of the attribute.

Variance and standard deviation are the most widely used measures of variation. This is explained by the fact that they are included in most theorems of probability theory, which serves as the foundation of mathematical statistics. In addition, the variance can be decomposed into its component elements, which make it possible to evaluate the influence of various factors that determine the variation of a trait.

The calculation of variation indicators for banks grouped by profit amount is shown in the table.

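Profit, million rubles | Number of banks, f | Interval midpoint, x | xf | Deviation x - x̄ | Absolute deviation × f | Squared deviation × f
3.7-4.6 | 2 | 4.15 | 8.30 | -1.935 | 3.870 | 7.489
4.6-5.5 | 4 | 5.05 | 20.20 | -1.035 | 4.140 | 4.285
5.5-6.4 | 6 | 5.95 | 35.70 | -0.135 | 0.810 | 0.109
6.4-7.3 | 5 | 6.85 | 34.25 | +0.765 | 3.825 | 2.926
7.3-8.2 | 3 | 7.75 | 23.25 | +1.665 | 4.995 | 8.317
Total | 20 | - | 121.70 | - | 17.640 | 23.126

Here $\bar{x} = 121.70 / 20 = 6.085$ million rubles, $d = 17.640 / 20 = 0.882$ and $S = \sqrt{23.126 / 20} \approx 1.075$.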

The average linear deviation and the standard deviation show how much, on average, the values of the attribute deviate from the mean across the units of the population under study. In this case the average fluctuation in profit is 0.882 million rubles according to the average linear deviation and 1.075 million rubles according to the standard deviation. The standard deviation is never less than the average linear deviation. If the distribution of the attribute is close to normal, then S ≈ 1.25d, or d ≈ 0.8S. The standard deviation shows how the bulk of the population units is located relative to the arithmetic mean. Regardless of the shape of the distribution, at least 75% of the attribute values fall into the interval $\bar{x} \pm 2S$, and at least 8/9 (about 89%) of all values fall into the interval $\bar{x} \pm 3S$ (P. L. Chebyshev's theorem).
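The figures in the bank example can be recomputed from the grouped data. This Python sketch assumes the interval midpoints as the attribute values and the bank counts f = 2, 4, 6, 5, 3, which follow from dividing the xf column by x; it computes the mean, the average linear deviation d, the standard deviation S and their ratio.

```python
from math import sqrt

# Grouped profit data: interval midpoints x (million rubles) and number of banks f.
# The f values are recovered as (x * f) / x from the table.
x = [4.15, 5.05, 5.95, 6.85, 7.75]
f = [2, 4, 6, 5, 3]

n = sum(f)
mean = sum(xi * fi for xi, fi in zip(x, f)) / n
linear_dev = sum(abs(xi - mean) * fi for xi, fi in zip(x, f)) / n      # d
std = sqrt(sum((xi - mean) ** 2 * fi for xi, fi in zip(x, f)) / n)     # S
print(round(mean, 3), round(linear_dev, 3), round(std, 3), round(std / linear_dev, 2))
```

The ratio S/d comes out at about 1.22 here, close to the value 1.25 expected for a near-normal distribution.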

Excel is highly valued by both professionals and amateurs, because users of any skill level can work with it. For example, anyone with minimal Excel skills can draw a simple chart, make a decent table, and so on.

At the same time, the program also lets you perform various kinds of calculations, but this requires a somewhat higher level of training. If you have just begun to get to know this program closely and are interested in everything that will help you become a more advanced user, this article is for you. Today I will tell you what the standard deviation formula in Excel is, why it is needed at all and, strictly speaking, when it is used. Let's go!

What it is

Let's start with the theory. The standard deviation is the square root of the arithmetic mean of the squared differences between the values and their arithmetic mean. It is usually denoted by the Greek letter sigma (σ). In Excel the standard deviation is calculated with the STDEV function, so the program does the calculation for the user.

The point of this measure is to determine the degree of variability of an instrument (for example, a financial instrument); in its own way it is one of the indicators of descriptive statistics. It shows how the volatility of the instrument changes over a given period. The STDEV functions can be used to estimate the standard deviation from a sample; logical values and text in the referenced cells are ignored.

Formula

Excel has a built-in function for calculating the standard deviation. To use it, open the Formulas section, find the statistical function STDEV (newer versions also offer STDEV.S for a sample and STDEV.P for an entire population) and select it. That's all there is to it.

A dialog box will then appear in which you enter the data for the calculation: numbers or cell ranges go into the Number1, Number2, ... fields, after which the program itself calculates the standard deviation for the sample.
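For readers who also work outside Excel, Python's standard `statistics` module offers comparable functions: `statistics.stdev` divides by n − 1, like STDEV and STDEV.S, while `statistics.pstdev` divides by n, like STDEV.P. A short sketch with illustrative numbers:

```python
import statistics

values = [2, 4, 4, 4, 5, 5, 7, 9]   # illustrative numbers

sample_sd = statistics.stdev(values)       # divides by n - 1 (sample), like STDEV / STDEV.S
population_sd = statistics.pstdev(values)  # divides by n (population), like STDEV.P
print(round(sample_sd, 4), population_sd)
```

The sample version is always somewhat larger; the difference shrinks as the number of values grows.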

Undoubtedly, mathematical formulas and calculations are a rather complex issue, and not all users can cope with it straight away. However, if you dig a little deeper and look at the issue in a little more detail, it turns out that not everything is so sad. I hope you are convinced of this using the example of calculating the standard deviation.


According to a sample survey, depositors of the city's Sberbank were grouped by the size of their deposits:

Define:

1) range of variation;

2) average deposit size;

3) average linear deviation;

4) dispersion;

5) standard deviation;

6) coefficient of variation of deposits.

Solution:

This distribution series contains open intervals. In such series, the width of the first interval is conventionally taken to be equal to the width of the next one, and the width of the last interval equal to the width of the previous one.

The width of the second interval is 200, so the width of the first one is also 200. The width of the penultimate interval is 200, which means that the last interval also has a width of 200.

1) Let us define the range of variation as the difference between the largest and smallest values of the attribute:
$$R = x_{max} - x_{min} = 1200 - 200 = 1000.$$

The range of variation in the deposit size is 1000 rubles.

2) The average deposit size is determined using the weighted arithmetic mean formula.

Let us first determine a discrete value of the attribute in each interval. To do this, we find the midpoints of the intervals using the simple arithmetic mean formula.

The midpoint of the first interval is
$$\frac{200 + 400}{2} = 300,$$

the second 500, and so on.

Let's enter the calculation results in the table:

Deposit amount, rub. | Number of depositors, f | Interval midpoint, x | xf
200-400 | 32 | 300 | 9600
400-600 | 56 | 500 | 28000
600-800 | 120 | 700 | 84000
800-1000 | 104 | 900 | 93600
1000-1200 | 88 | 1100 | 96800
Total | 400 | - | 312000

The average deposit in the city's Sberbank is 780 rubles:
$$\bar{x} = \frac{\sum x f}{\sum f} = \frac{312000}{400} = 780.$$

3) The average linear deviation is the arithmetic mean of the absolute deviations of individual values of the attribute from the overall mean:
$$\bar{d} = \frac{\sum |x - \bar{x}| f}{\sum f}$$

The procedure for calculating the average linear deviation in an interval distribution series is as follows:

1. Calculate the weighted arithmetic mean, as shown in item 2).

2. Determine the absolute deviations from the mean: $|x - \bar{x}|$

3. Multiply the resulting deviations by the frequencies: $|x - \bar{x}| f$

4. Find the sum of the weighted deviations, ignoring the sign: $\sum |x - \bar{x}| f$

5. Divide the sum of the weighted deviations by the sum of the frequencies: $\bar{d} = \sum |x - \bar{x}| f / \sum f$

It is convenient to put the calculations in a table:

Deposit amount, rub. | Number of depositors, f | Interval midpoint, x | Deviation x - x̄ | Absolute deviation | Absolute deviation × f
200-400 | 32 | 300 | -480 | 480 | 15360
400-600 | 56 | 500 | -280 | 280 | 15680
600-800 | 120 | 700 | -80 | 80 | 9600
800-1000 | 104 | 900 | 120 | 120 | 12480
1000-1200 | 88 | 1100 | 320 | 320 | 28160
Total | 400 | - | - | - | 81280

The average linear deviation of the deposit size of Sberbank clients is $\bar{d} = 81280 / 400 = 203.2$ rubles.

4) The variance is the arithmetic mean of the squared deviations of each attribute value from the arithmetic mean.

The variance in an interval distribution series is calculated using the formula:
$$\sigma^2 = \frac{\sum (x - \bar{x})^2 f}{\sum f}$$

The procedure for calculating the variance in this case is as follows:

1. Determine the weighted arithmetic mean, as shown in item 2).

2. Find the deviations from the mean: $x - \bar{x}$

3. Square the deviation of each value from the mean: $(x - \bar{x})^2$

4. Multiply the squared deviations by the weights (frequencies): $(x - \bar{x})^2 f$

5. Sum the resulting products: $\sum (x - \bar{x})^2 f$

6. Divide the resulting sum by the sum of the weights (frequencies): $\sigma^2 = \sum (x - \bar{x})^2 f / \sum f$

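Let's put the calculations in a table:

Deposit amount, rub. | Number of depositors, f | Interval midpoint, x | Deviation x - x̄ | Squared deviation | Squared deviation × f
200-400 | 32 | 300 | -480 | 230400 | 7372800
400-600 | 56 | 500 | -280 | 78400 | 4390400
600-800 | 120 | 700 | -80 | 6400 | 768000
800-1000 | 104 | 900 | 120 | 14400 | 1497600
1000-1200 | 88 | 1100 | 320 | 102400 | 9011200
Total | 400 | - | - | - | 23040000

The variance is $\sigma^2 = 23040000 / 400 = 57600$.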
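The whole worked example can be recomputed in a few lines. This Python sketch assumes the closed-interval midpoints and frequencies from the tables above, and also evaluates the two items of the task not worked through in the text: the standard deviation and the coefficient of variation ($C_v = \sigma / \bar{x} \cdot 100\%$). Variable names are illustrative.

```python
from math import sqrt

# Deposit example: interval midpoints x (rubles) and numbers of depositors f.
lower, upper = 200, 1200        # outer bounds after closing the open intervals
x = [300, 500, 700, 900, 1100]
f = [32, 56, 120, 104, 88]

n = sum(f)                                                    # 400 depositors
range_ = upper - lower                                        # 1) range of variation
mean = sum(xi * fi for xi, fi in zip(x, f)) / n               # 2) weighted mean
mad = sum(abs(xi - mean) * fi for xi, fi in zip(x, f)) / n    # 3) average linear deviation
var = sum((xi - mean) ** 2 * fi for xi, fi in zip(x, f)) / n  # 4) variance
sd = sqrt(var)                                                # 5) standard deviation
cv = sd / mean * 100                                          # 6) coefficient of variation, %
print(range_, mean, mad, var, sd, round(cv, 1))               # 1000 780.0 203.2 57600.0 240.0 30.8
```

With these data σ = 240 rubles and $C_v$ ≈ 30.8%.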

A rough way to assess the variability of a variation series is to determine its limits (the smallest and largest values) and its amplitude (the range), but this does not take into account the values inside the series. The main generally accepted measure of the variability of a quantitative attribute within a variation series is the standard deviation (σ, sigma). The larger the standard deviation, the greater the degree of fluctuation in the series.

The method for calculating the standard deviation includes the following steps:

1. Find the arithmetic mean (M).

2. Determine the deviations of the individual values from the arithmetic mean (d = V - M). In medical statistics, deviations from the mean are denoted by d (deviate). The sum of all deviations is zero.

3. Square each deviation: d².

4. Multiply the squared deviations by the corresponding frequencies: d²·p.

5. Find the sum of the products: Σ(d²·p).

6. Calculate the standard deviation using the formula

$$\sigma = \sqrt{\frac{\sum d^2 p}{n}}$$ when n is greater than 30, or $$\sigma = \sqrt{\frac{\sum d^2 p}{n - 1}}$$ when n is less than or equal to 30, where n is the total number of values.
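Steps 1 to 6, including the switch between n and n − 1, can be sketched in Python as follows. The function name `sigma` and the parallel-list input (values V with frequencies p) are assumptions for illustration; n is taken as the total number of observations, the sum of the frequencies.

```python
from math import sqrt

def sigma(values, freqs):
    """Standard deviation following steps 1-6 (divide by n if n > 30, else by n - 1)."""
    n = sum(freqs)                                        # total number of observations
    m = sum(v * p for v, p in zip(values, freqs)) / n     # 1) arithmetic mean M
    d = [v - m for v in values]                           # 2) deviations d = V - M
    total = sum(di ** 2 * p for di, p in zip(d, freqs))   # 3)-5) sum of d^2 * p
    divisor = n if n > 30 else n - 1                      # small samples use n - 1
    return sqrt(total / divisor)                          # 6) standard deviation

print(sigma([1, 2, 3], [1, 1, 1]))    # n = 3 <= 30, divides by n - 1
print(sigma([1, 3], [20, 20]))        # n = 40 > 30, divides by n
```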

Significance of the standard deviation:

1. The standard deviation characterizes the spread of the values around the mean (i.e., the variability of the variation series). The larger the sigma, the greater the diversity of the series.

2. The standard deviation is used for a comparative assessment of how well the arithmetic mean represents the variation series for which it was calculated.

Variation in many mass phenomena follows the normal distribution law. The curve representing this distribution is a smooth, bell-shaped, symmetric curve (the Gaussian curve). According to probability theory, for phenomena that follow the normal distribution there is a strict mathematical relationship between the arithmetic mean and the standard deviation. The theoretical distribution of values in a homogeneous variation series obeys the three-sigma rule.

If the values of a quantitative attribute (variants) are plotted on the abscissa of a rectangular coordinate system and the frequency with which each value occurs in the variation series on the ordinate, then values larger and smaller than the arithmetic mean are distributed symmetrically on either side of it.



For a normal distribution of the attribute:

68.3% of the values lie within M ± 1σ

95.4% of the values lie within M ± 2σ

99.7% of the values lie within M ± 3σ
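These percentages follow from the normal distribution function: the share of values within M ± kσ equals erf(k/√2), whatever M and σ are. A minimal Python check using `math.erf`; the helper name `share_within` is hypothetical:

```python
from math import erf, sqrt

def share_within(k):
    """Probability that a normally distributed value lies within M ± k*sigma."""
    return erf(k / sqrt(2))

for k in (1, 2, 3):
    print(k, round(100 * share_within(k), 2))
```

The exact values are about 0.6827, 0.9545 and 0.9973.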

3. The standard deviation makes it possible to establish normal values of clinical and biological parameters. In medicine, the interval M ± 1σ is usually taken as the normal range for the phenomenon under study. A deviation of the observed value from the arithmetic mean by more than 1σ indicates that the parameter under study deviates from the norm.

4. In medicine, the three-sigma rule is used in pediatrics for individual assessment of children's physical development (the sigma deviation method) and for developing standards for children's clothing.

5. The standard deviation is necessary to characterize the degree of diversity of the characteristic being studied and to calculate the error of the arithmetic mean.

The standard deviation is usually used to compare the variability of series of the same kind. If two series with different characteristics are compared (height and weight, average length of hospital stay and hospital mortality, etc.), a direct comparison of the sigmas is impossible, because the standard deviation is a named value expressed in absolute units. In these cases the coefficient of variation (Cv) is used; it is a relative value: the standard deviation expressed as a percentage of the arithmetic mean.

The coefficient of variation is calculated using the formula:
$$C_v = \frac{\sigma}{M} \cdot 100\%$$

The higher the coefficient of variation, the greater the variability of the series. A coefficient of variation above 30% is generally considered to indicate that the population is qualitatively heterogeneous.