Mean values and the root mean square. Natural and decimal fractions

The mean square of two non-negative numbers a, b is the non-negative number whose square is the arithmetic mean of the squares of a and b, i.e. the number $\sqrt{\frac{a^2+b^2}{2}}$.

Problem 351. The definition deals with the arithmetic mean. What happens if you replace it with the geometric mean?

Problem 352. Prove that the mean square of two non-negative numbers is greater than or equal to their arithmetic mean:

$$\sqrt{\frac{a^2+b^2}{2}} \ge \frac{a+b}{2}.$$

(For example, the mean square of the numbers 0 and a is equal to $a/\sqrt{2}$, and the arithmetic mean is equal to $a/2$.)

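Solution. Let's compare the squares and prove that

$$\frac{a^2+b^2}{2} \ge \left(\frac{a+b}{2}\right)^2.$$

Multiplying by 4 and expanding the brackets, we get

$$2a^2 + 2b^2 \ge a^2 + 2ab + b^2, \quad\text{or}\quad a^2 - 2ab + b^2 \ge 0.$$

Again the left-hand side is a square, $(a-b)^2$, and is therefore non-negative.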

Problem 353. For what a and b is the mean square equal to the arithmetic mean?

Problem 354. Prove that the geometric mean does not exceed the mean square.

A geometric illustration is shown in Fig. 31. Let's draw the graph of $y = x^2$ and connect the points $(a, a^2)$ and $(b, b^2)$ lying on it with a segment. The midpoint of this segment has coordinates that are the arithmetic means of the coordinates of its endpoints, i.e. $\left(\frac{a+b}{2}, \frac{a^2+b^2}{2}\right)$.

Below it on the graph is the point $\left(\frac{a+b}{2}, \left(\frac{a+b}{2}\right)^2\right)$.

Thus, the inequality between the arithmetic mean and the mean square means that the graph of $y = x^2$ is convex downwards (the curve lies below the chord).

Problem 355. By swapping the x and y axes, from the graph of $y = x^2$ we obtain the graph of the function $y = \sqrt{x}$, which lies above any of its chords (see Fig. 32). What inequality does this correspond to?

We now know that for any non-negative a and b

$$\sqrt{ab} \le \frac{a+b}{2} \le \sqrt{\frac{a^2+b^2}{2}}.$$
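The ordering of the three means (geometric, then arithmetic, then mean square) can be checked numerically. The following is a minimal sketch; the function names and the sample pairs are illustrative assumptions, and a numerical check is of course not a proof.

```python
import math

def geometric_mean(a, b):
    # Square root of the product of two non-negative numbers.
    return math.sqrt(a * b)

def arithmetic_mean(a, b):
    return (a + b) / 2

def mean_square(a, b):
    # Non-negative number whose square is the arithmetic mean of a^2 and b^2.
    return math.sqrt((a * a + b * b) / 2)

# Hypothetical sample pairs, including the equal case a == b.
pairs = [(0, 4), (1, 9), (3, 3), (2.5, 7.5)]
for a, b in pairs:
    g, m, q = geometric_mean(a, b), arithmetic_mean(a, b), mean_square(a, b)
    print(f"a={a}, b={b}: geometric={g:.4f}, arithmetic={m:.4f}, mean square={q:.4f}")
```

For the pair (3, 3) all three values coincide, which matches the case of equality.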

For each of these three types of average, we will draw points (a, b), for which the average does not exceed 1 (see Fig. 33 a-c).

Combining them in one figure (Fig. 34), we see that the larger the average, the smaller the corresponding area.

Problem 356. Prove the inequality between the arithmetic mean and the mean square for three numbers:

$$\frac{a+b+c}{3} \le \sqrt{\frac{a^2+b^2+c^2}{3}}.$$

Problem 357. (a) The sum of two positive numbers is 2. What is the minimum value of the sum of their squares?

(b) The same question for the sum of squares of three positive numbers whose sum is 3.

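The standard deviation is defined as a summary characteristic of the size of variation of a characteristic in the population. It is equal to the square root of the mean squared deviation of the individual values of the characteristic from the arithmetic mean, i.e. the square root of the variance, and can be found as follows:

1. For the primary (ungrouped) series:

$$\sigma = \sqrt{\frac{\sum (x_i - \bar{x})^2}{n}}$$

2. For the variation series:

$$\sigma = \sqrt{\frac{\sum (x_i - \bar{x})^2 f_i}{\sum f_i}}$$

Transforming the standard deviation formula brings it to a form more convenient for practical calculations:

$$\sigma = \sqrt{\overline{x^2} - \bar{x}^2}$$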

The standard deviation shows how much, on average, individual values deviate from their mean. It is an absolute measure of the variability of a characteristic and is expressed in the same units as the values themselves, so it is easy to interpret.
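As an illustration, the standard deviation can be computed for an ungrouped series, for a variation series with frequencies, and through the shortcut form "square root of the mean of the squares minus the square of the mean". This is a minimal sketch; the function names and the data are hypothetical, not taken from the text.

```python
import math

def std_ungrouped(values):
    # Square root of the mean squared deviation from the arithmetic mean.
    n = len(values)
    mean = sum(values) / n
    return math.sqrt(sum((x - mean) ** 2 for x in values) / n)

def std_grouped(values, freqs):
    # Same quantity for a variation series: deviations are weighted by frequencies.
    total = sum(freqs)
    mean = sum(x * f for x, f in zip(values, freqs)) / total
    return math.sqrt(sum((x - mean) ** 2 * f for x, f in zip(values, freqs)) / total)

def std_shortcut(values):
    # Shortcut form: sqrt(mean of squares - square of mean).
    n = len(values)
    mean = sum(values) / n
    mean_of_squares = sum(x * x for x in values) / n
    return math.sqrt(mean_of_squares - mean ** 2)

# Hypothetical data: the same eight observations, raw and grouped.
raw = [2, 4, 4, 4, 5, 5, 7, 9]
distinct = [2, 4, 5, 7, 9]
frequencies = [1, 3, 2, 1, 1]

print(std_ungrouped(raw), std_grouped(distinct, frequencies), std_shortcut(raw))
```

All three calls describe the same data, so they should agree up to rounding.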


For alternative characteristics, the standard deviation formula looks like this:

$$\sigma = \sqrt{pq},$$

where p is the proportion of units in the population that have a certain characteristic;

q is the proportion of units that do not have this characteristic.
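For an alternative (yes/no) characteristic, the usual form of the standard deviation is the square root of the product p·q. A minimal sketch under that assumption, with a hypothetical function name:

```python
import math

def std_alternative(p):
    # p: proportion of units that have the characteristic; q = 1 - p lack it.
    if not 0 <= p <= 1:
        raise ValueError("p must be a proportion between 0 and 1")
    q = 1 - p
    return math.sqrt(p * q)

# The spread is largest when the population is split evenly (p = 0.5).
print(std_alternative(0.5), std_alternative(0.2))
```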

The concept of average linear deviation

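The average linear deviation is defined as the arithmetic mean of the absolute values of the deviations of the individual values from their arithmetic mean $\bar{x}$.

1. For the primary (ungrouped) series:

$$\bar{d} = \frac{\sum |x_i - \bar{x}|}{n}$$

2. For the variation series:

$$\bar{d} = \frac{\sum |x_i - \bar{x}|\, f_i}{\sum f_i}$$

where $\sum f_i = n$ is the sum of the frequencies of the variation series.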
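The average linear deviation can be sketched in code as the mean of the absolute deviations from the arithmetic mean, with or without frequencies. The function names and data below are hypothetical.

```python
def linear_deviation_ungrouped(values):
    # Mean of absolute deviations from the arithmetic mean.
    n = len(values)
    mean = sum(values) / n
    return sum(abs(x - mean) for x in values) / n

def linear_deviation_grouped(values, freqs):
    # Same quantity for a variation series, weighted by frequencies.
    total = sum(freqs)
    mean = sum(x * f for x, f in zip(values, freqs)) / total
    return sum(abs(x - mean) * f for x, f in zip(values, freqs)) / total

# Hypothetical data: eight observations, raw and grouped.
raw = [2, 4, 4, 4, 5, 5, 7, 9]
distinct = [2, 4, 5, 7, 9]
frequencies = [1, 3, 2, 1, 1]

print(linear_deviation_ungrouped(raw), linear_deviation_grouped(distinct, frequencies))
```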


The advantage of the average linear deviation over the range of variation as a measure of dispersion is obvious, since it takes all deviations into account. But this indicator has significant drawbacks. Arbitrarily discarding the algebraic signs of the deviations means that its mathematical properties are far from elementary, which makes the average linear deviation very difficult to use in problems involving probabilistic calculations.

Therefore, the average linear deviation is rarely used in statistical practice as a measure of variation, namely only when summing indicators without regard to sign makes economic sense. With its help, for example, foreign trade turnover, the composition of the workforce, the rhythm of production, etc. are analyzed.

Mean square

The mean square is used, for example, to calculate the average side length of n square plots, the average diameter of trunks, pipes, etc. It comes in two types.

Simple mean square. If, when replacing the individual values of a characteristic with an average value, the sum of the squares of the original values must remain unchanged, then the average is the mean square.

It is the square root of the sum of the squares of the individual values of the characteristic divided by their number:

$$\bar{x}_{q} = \sqrt{\frac{\sum x_i^2}{n}}$$

The weighted mean square is calculated using the formula:

$$\bar{x}_{q} = \sqrt{\frac{\sum x_i^2 f_i}{\sum f_i}}$$

where f is the weight.
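A minimal sketch of the simple and weighted mean square follows. It also checks the defining property mentioned above: replacing every value by the mean square leaves the sum of squares unchanged. The function names and the side lengths are hypothetical.

```python
import math

def mean_square_simple(values):
    # Square root of the sum of squares divided by the number of values.
    return math.sqrt(sum(x * x for x in values) / len(values))

def mean_square_weighted(values, weights):
    # Squares are weighted by f before averaging.
    return math.sqrt(sum(x * x * f for x, f in zip(values, weights)) / sum(weights))

# Hypothetical side lengths of square plots.
sides = [1, 7]
q = mean_square_simple(sides)

# Replacing each side by q keeps the total area (sum of squares) the same.
total_area_original = sum(s * s for s in sides)
total_area_replaced = len(sides) * q * q
print(q, total_area_original, total_area_replaced)
```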

Cubic mean

The cubic mean is used, for example, to determine the average side length of cubes. It comes in two types.
Simple cubic mean:

$$\bar{x}_{c} = \sqrt[3]{\frac{\sum x_i^3}{n}}$$

When calculating means and variance in interval distribution series, the true values of the characteristic are replaced by the midpoints of the intervals, which differ from the arithmetic mean of the values falling within each interval. This leads to a systematic error in the variance. W. F. Sheppard determined that the error in the variance caused by using grouped data is 1/12 of the square of the interval width, and it can either overstate or understate the variance.

Sheppard's correction should be applied if the distribution is close to normal, the characteristic varies continuously, and the distribution is based on a large amount of initial data (n > 500). However, since in some cases both errors act in opposite directions and compensate each other, the correction can sometimes be omitted.
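In its commonly stated form, Sheppard's correction subtracts one twelfth of the squared interval width from the variance computed on grouped data. The sketch below assumes that form and equal-width intervals; the function name and the numbers are hypothetical.

```python
def sheppard_corrected_variance(grouped_variance, interval_width):
    # Assumes equal-width intervals; subtracts h^2 / 12 from the grouped variance.
    return grouped_variance - interval_width ** 2 / 12

# Hypothetical example: grouped variance 25.0, interval width 6.
print(sheppard_corrected_variance(25.0, 6))
```

Whether the correction is worth applying depends on the conditions listed above (near-normal shape, continuous characteristic, large n).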

The smaller the variance and standard deviation, the more homogeneous the population and the more typical the average will be.
In the practice of statistics, there is often a need to compare variations of various characteristics. For example, it is of great interest to compare variations in the age of workers and their qualifications, length of service and wages, costs and profits, length of service and labor productivity, etc. For such comparisons, indicators of absolute variability of characteristics are unsuitable: it is impossible to compare the variability of work experience, expressed in years, with the variation of wages, expressed in rubles.

To carry out such comparisons, as well as comparisons of the variability of the same characteristic in several populations with different arithmetic means, a relative indicator of variation is used: the coefficient of variation.
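The coefficient of variation is commonly computed as the standard deviation divided by the arithmetic mean, expressed as a percentage. The sketch below assumes that definition; the two data sets (years of service and wages) are hypothetical.

```python
import statistics

def coefficient_of_variation(values):
    # Population standard deviation relative to the mean, in percent.
    return statistics.pstdev(values) / statistics.fmean(values) * 100

# Hypothetical data in different units: years of service and wages in rubles.
experience_years = [2, 4, 4, 4, 5, 5, 7, 9]
wages_rubles = [20000, 22000, 24000]

cv_experience = coefficient_of_variation(experience_years)
cv_wages = coefficient_of_variation(wages_rubles)
print(f"experience: {cv_experience:.1f}%, wages: {cv_wages:.1f}%")
```

Because the result is unitless, the two characteristics can be compared directly even though years and rubles cannot.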

Structural averages

To characterize the central tendency in statistical distributions, it is often rational to use, together with the arithmetic mean, a certain value of the characteristic X, which, due to certain features of its location in the distribution series, can characterize its level.

This is especially important when in a distribution series the extreme values ​​of a characteristic have unclear boundaries. In this regard, an accurate determination of the arithmetic mean is usually impossible or very difficult. In such cases, the average level can be determined by taking, for example, the value of the feature that is located in the middle of the frequency series or that occurs most often in the current series.

Such values depend only on the nature of the frequencies, i.e. on the structure of the distribution. They are typical in their location within the series of frequencies, so they are regarded as characteristics of the center of the distribution and are therefore called structural averages. They are used to study the internal structure of the distribution series of the values of a characteristic. Such indicators include the mode and the median.

The most complete characteristic of variation is the mean square deviation, which is called the standard deviation. The standard deviation ($\sigma$) is equal to the square root of the mean squared deviation of the individual values of the characteristic from the arithmetic mean.

Simple standard deviation:

$$\sigma = \sqrt{\frac{\sum (x_i - \bar{x})^2}{n}}$$

The weighted standard deviation is applied to grouped data:

$$\sigma = \sqrt{\frac{\sum (x_i - \bar{x})^2 f_i}{\sum f_i}}$$

Under normal distribution conditions, the following relation holds between the standard deviation and the average linear deviation: $\sigma \approx 1.25\,\bar{d}$.
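The ratio of about 1.25 can be illustrated numerically. For a normal distribution the exact ratio of the standard deviation to the average linear deviation is the square root of π/2, and a simulated sample should come close to it. This is a sketch only; the seed and the sample size are arbitrary assumptions.

```python
import math
import random

# Exact ratio sigma / d for a normal distribution.
theoretical_ratio = math.sqrt(math.pi / 2)

# Simulated check on a pseudo-random normal sample (fixed seed for repeatability).
rng = random.Random(0)
sample = [rng.gauss(0.0, 1.0) for _ in range(50_000)]
mean = sum(sample) / len(sample)
sigma = math.sqrt(sum((x - mean) ** 2 for x in sample) / len(sample))
linear_deviation = sum(abs(x - mean) for x in sample) / len(sample)
simulated_ratio = sigma / linear_deviation

print(f"theoretical: {theoretical_ratio:.4f}, simulated: {simulated_ratio:.4f}")
```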

The standard deviation, being the main absolute measure of variation, is used in determining the ordinate values ​​of a normal distribution curve, in calculations related to the organization of sample observation and establishing the accuracy of sample characteristics, as well as in assessing the limits of variation of a characteristic in a homogeneous population.

Variance, its types, and standard deviation.

The variance of a random variable is a measure of the spread of that random variable, i.e. of its deviation from the mathematical expectation. In statistics, the notation $\sigma_X^2$ or $\sigma^2$ is often used. The square root of the variance is called the root-mean-square deviation, standard deviation, or standard spread.

The total variance ($\sigma^2$) measures the variation of a characteristic over the whole population under the influence of all the factors that caused this variation. At the same time, thanks to the grouping method, it is possible to isolate and measure the variation due to the grouping characteristic and the variation arising under the influence of unaccounted factors.

The between-group variance ($\sigma^2_{\text{between}}$) characterizes systematic variation, i.e. differences in the value of the studied characteristic that arise under the influence of the factor characteristic on which the grouping is based.

Standard deviation (synonyms: root-mean-square deviation, mean square deviation, quadratic deviation; related term: standard spread) is, in probability theory and statistics, the most common indicator of the dispersion of the values of a random variable relative to its mathematical expectation. For finite samples of values, the arithmetic mean of the sample is used instead of the mathematical expectation.

The standard deviation is measured in the units of the random variable itself and is used when calculating the standard error of the arithmetic mean, constructing confidence intervals, statistically testing hypotheses, and measuring the linear relationship between random variables. It is defined as the square root of the variance of a random variable.


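Standard deviation:

$$\sigma = \sqrt{\frac{1}{n}\sum_{i=1}^{n} (x_i - \bar{x})^2}$$

Standard deviation (an estimate of the standard deviation of a random variable x relative to its mathematical expectation, based on an unbiased estimate of its variance):

$$s = \sqrt{\frac{n}{n-1}\,\sigma^2} = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})^2}$$

where $\sigma^2$ is the variance; $x_i$ is the i-th element of the sample; $n$ is the sample size; $\bar{x}$ is the arithmetic mean of the sample:

$$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$$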

It should be noted that both estimates are biased. In the general case, it is impossible to construct an unbiased estimate. However, the estimate based on the unbiased variance estimate is consistent.
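The two estimates differ only in the divisor: n for the first, n − 1 for the one based on the unbiased variance estimate. Python's standard `statistics` module provides both as `pstdev` and `stdev`. The sample below is hypothetical.

```python
import math
import statistics

# Hypothetical sample.
sample = [2, 4, 4, 4, 5, 5, 7, 9]
n = len(sample)

# Divisor n: deviation of the sample values around their own mean.
sigma = statistics.pstdev(sample)

# Divisor n - 1: estimate based on the unbiased variance estimate.
s = statistics.stdev(sample)

# The two are linked by the factor sqrt(n / (n - 1)).
print(sigma, s, sigma * math.sqrt(n / (n - 1)))
```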

Essence, scope and procedure for determining mode and median.

In addition to power means, statistics uses structural averages, mainly the mode and the median, to characterize the relative value of a varying characteristic and the internal structure of distribution series.

The mode is the most frequently occurring value in the series. The mode is used, for example, to determine the sizes of clothes and shoes that are most in demand among customers. For a discrete series, the mode is the value with the highest frequency. To calculate the mode for an interval variation series, first determine the modal interval (the one with the maximum frequency), and then the modal value of the characteristic using the formula:

$$M_o = x_0 + h \cdot \frac{f_m - f_{m-1}}{(f_m - f_{m-1}) + (f_m - f_{m+1})}$$

- $M_o$ is the mode

- $x_0$ is the lower limit of the modal interval

- $h$ is the interval width

- $f_m$ is the frequency of the modal interval

- $f_{m-1}$ is the frequency of the interval preceding the modal one

- $f_{m+1}$ is the frequency of the interval following the modal one
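The mode of an interval series can be sketched as follows, using the usual interpolation inside the modal interval: lower limit plus interval width times the ratio of the frequency excess over the preceding interval to the sum of the excesses over both neighbours. The function name and the age distribution are hypothetical, and equal interval widths are assumed.

```python
def interval_mode(intervals):
    """intervals: list of (lower, upper, frequency) tuples with equal widths."""
    freqs = [f for _, _, f in intervals]
    m = freqs.index(max(freqs))          # index of the modal interval
    lower, upper, f_m = intervals[m]
    # A missing neighbour (first or last interval) is treated as frequency 0.
    f_prev = freqs[m - 1] if m > 0 else 0
    f_next = freqs[m + 1] if m < len(freqs) - 1 else 0
    width = upper - lower
    return lower + width * (f_m - f_prev) / ((f_m - f_prev) + (f_m - f_next))

# Hypothetical age distribution: (lower, upper, frequency).
ages = [(20, 25, 30), (25, 30, 50), (30, 35, 30)]
print(interval_mode(ages))
```

With symmetric neighbouring frequencies the mode falls at the midpoint of the modal interval.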

The median is the value of the characteristic that lies in the middle of the ranked series and divides it into two equal parts.

To determine the median in a discrete series with frequencies, first calculate the half-sum of the frequencies and then determine which value it falls on. (If the sorted series contains an odd number of values, the position of the median is calculated using the formula

$N_{Me} = (n + 1)/2$, where n is the total number of values;

if the number of values is even, the median is equal to the average of the two values in the middle of the series.)

When calculating the median for an interval variation series, first determine the median interval, within which the median lies, and then the value of the median using the formula:

$$M_e = x_0 + h \cdot \frac{\frac{1}{2}\sum f_i - S_{m-1}}{f_m}$$

- $M_e$ is the required median

- $x_0$ is the lower limit of the interval that contains the median

- $h$ is the interval width

- $\sum f_i$ is the sum of the frequencies, i.e. the number of terms in the series

- $S_{m-1}$ is the sum of the accumulated frequencies of the intervals preceding the median interval

- $f_m$ is the frequency of the median interval
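The median of an interval series can be sketched in the same way: find the first interval whose accumulated frequency reaches half of the total, then interpolate inside it. The function name and the data are hypothetical.

```python
def interval_median(intervals):
    """intervals: list of (lower, upper, frequency) tuples in ascending order."""
    total = sum(f for _, _, f in intervals)
    half = total / 2
    cumulative = 0                       # accumulated frequency before the current interval
    for lower, upper, f in intervals:
        if cumulative + f >= half:
            # Median interval found: interpolate within it.
            return lower + (upper - lower) * (half - cumulative) / f
        cumulative += f
    raise ValueError("the series has no observations")

# Hypothetical age distribution: (lower, upper, frequency).
ages = [(20, 25, 10), (25, 30, 20), (30, 35, 10)]
print(interval_median(ages))
```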

Example. Find the mode and median.

Solution:
In this example, the modal interval is within the age group of 25-30 years, since this interval has the highest frequency (1054).

Let's calculate the magnitude of the mode:

This means that the modal age of students is 27 years.

Let's calculate the median. The median interval is the age group of 25-30 years, since within this interval lies the value that divides the population into two equal parts ($\sum f_i / 2 = 3462/2 = 1731$). Next, we substitute the necessary numerical data into the formula and get the median value:

This means that one half of the students are under 27.4 years old, and the other half are over 27.4 years old.

In addition to the mode and median, indicators such as quartiles, which divide the ranked series into 4 equal parts, deciles (10 parts), and percentiles (100 parts) can be used.

The concept of sample observation and its scope.

Sample observation is used when complete enumeration is physically impossible because of the large amount of data, or is not economically feasible. Physical impossibility occurs, for example, when studying passenger flows, market prices, and family budgets. Economic infeasibility occurs when assessing the quality of goods involves destroying them, for example tasting, or testing bricks for strength.

The statistical units selected for observation constitute the sample population, or sample, and the entire set of units constitutes the general population (GP). The number of units in the sample is denoted by n, and in the entire GP by N. The ratio n/N is called the relative size, or proportion, of the sample.

The quality of the results of sample observation depends on the representativeness of the sample, that is, on how well it represents the GP. To ensure representativeness, the principle of random selection of units must be observed, which assumes that the inclusion of a GP unit in the sample cannot be influenced by any factor other than chance.

There are 4 ways of random selection into a sample:

  1. Proper random selection, or the “lotto method”: the statistical units are assigned serial numbers, which are written on objects (for example, lotto kegs) that are then mixed in a container (for example, a bag) and drawn at random. In practice, this method is carried out using a random number generator or mathematical tables of random numbers.
  2. Mechanical selection, in which every (N/n)-th unit of the general population is selected. For example, if it contains 100,000 units and 1,000 must be selected, then every 100,000 / 1,000 = 100th unit is included in the sample. If the units are not ranked, the first one is chosen at random from the first hundred, and the numbers of the others are each one hundred higher. For example, if the first unit was No. 19, then the next one is No. 119, then No. 219, then No. 319, etc. If the population units are ranked, then No. 50 is selected first, then No. 150, then No. 250, and so on.
  3. Stratified selection, used for a heterogeneous data set: the population is first divided into homogeneous groups, to which random or mechanical selection is then applied.
  4. Serial selection, a special sampling method in which not individual units but whole series of them (runs of consecutive numbers) are selected randomly or mechanically, and complete enumeration is carried out within each series.
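The four ways of selection listed above can be sketched with Python's standard `random` module. The function names, the population of numbered units, the strata, and the seed are all hypothetical; the sketch draws without replacement.

```python
import random

def simple_random_sample(units, n, rng):
    # Proper random selection: n units drawn at random without replacement.
    return rng.sample(units, n)

def mechanical_sample(units, n, rng):
    # Every (N/n)-th unit, starting from a random position in the first interval.
    step = len(units) // n
    start = rng.randrange(step)
    return units[start::step][:n]

def stratified_sample(strata, fraction, rng):
    # Random selection applied separately inside each homogeneous group.
    sample = []
    for group in strata.values():
        k = max(1, round(len(group) * fraction))
        sample.extend(rng.sample(group, k))
    return sample

def serial_sample(series, r, rng):
    # Whole series are drawn at random; every unit inside them is observed.
    chosen = rng.sample(series, r)
    return [unit for s in chosen for unit in s]

rng = random.Random(42)                      # fixed seed, for repeatability only
population = list(range(1, 1001))            # hypothetical units numbered 1..1000
strata = {"A": list(range(100)), "B": list(range(100, 150))}
series = [population[i:i + 100] for i in range(0, 1000, 100)]

print(simple_random_sample(population, 10, rng))
print(mechanical_sample(population, 10, rng))
print(len(stratified_sample(strata, 0.1, rng)), len(serial_sample(series, 2, rng)))
```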

The quality of sample observation also depends on the type of sampling: repeated (with replacement) or non-repeated (without replacement).

In repeated sampling, the statistical units or series included in the sample are returned to the general population after use and have a chance of being included in a new sample. All units in the population therefore have the same probability of inclusion in the sample.

Non-repeated sampling means that the statistical units or series included in the sample are not returned to the general population after use, so for the remaining units the probability of being included in the next sample increases.

Non-repeated sampling gives more accurate results, so it is used more often. But there are situations when it cannot be applied (studying passenger flows, consumer demand, etc.), and then repeated sampling is carried out.

Marginal sampling error, average sampling error, and the procedure for calculating them.

Let us consider in detail the methods of forming a sample population listed above and the errors of representativeness that arise in doing so.
Proper random sampling is based on selecting units from the general population at random, without any systematic elements. Technically, proper random selection is carried out by drawing lots (for example, lotteries) or using a table of random numbers.

Proper random selection “in its pure form” is rarely used in the practice of sample observation, but it is the starting point for the other types of selection and implements the basic principles of sample observation. Let's consider some questions of the theory of the sampling method and the error formula for a simple random sample.

Sampling error is the difference between the value of a parameter in the general population and its value calculated from the results of sample observation. For the mean of a quantitative characteristic, the sampling error is determined by

$$\Delta_{\bar{x}} = |\bar{X} - \bar{x}|,$$

where $\bar{X}$ is the general population mean and $\bar{x}$ is the sample mean. The indicator $\Delta$ is called the marginal sampling error.
The sample mean is a random variable that can take on different values depending on which units are included in the sample. Therefore, sampling errors are also random variables and can take on different values. For this reason the average of the possible errors is determined, the average sampling error, which depends on:

the sample size: the larger the sample, the smaller the average error;

the degree of variation of the characteristic being studied: the smaller the variation of the characteristic and, consequently, the variance, the smaller the average sampling error.

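For random repeated sampling, the average error is calculated as
$$\mu = \sqrt{\frac{\sigma^2}{n}}.$$
In practice, the general variance $\sigma^2$ is not precisely known, but in probability theory it has been proven that
$$\sigma^2 = s^2 \cdot \frac{n}{n-1},$$
where $s^2$ is the sample variance.
Since the value $\frac{n}{n-1}$ is close to 1 for sufficiently large n, we can assume that $\sigma^2 \approx s^2$. Then the average sampling error can be calculated as
$$\mu = \sqrt{\frac{s^2}{n}}.$$
But for a small sample (n < 30) the coefficient $\frac{n}{n-1}$ must be taken into account, and the average error of a small sample is calculated by the formula
$$\mu = \sqrt{\frac{s^2}{n-1}}.$$

For random non-repeated sampling, the given formulas are adjusted by the factor $1 - \frac{n}{N}$. Then the average error of non-repeated sampling is
$$\mu = \sqrt{\frac{\sigma^2}{n}\left(1 - \frac{n}{N}\right)} \quad\text{and}\quad \mu = \sqrt{\frac{s^2}{n}\left(1 - \frac{n}{N}\right)}.$$
Because n is always less than N, the factor $\left(1 - \frac{n}{N}\right)$ is always less than 1. This means that the average error for non-repeated sampling is always less than for repeated sampling.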
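The average sampling error can be sketched in code under the usual textbook forms: the square root of variance over n for repeated sampling, the same multiplied by (1 − n/N) under the root for non-repeated sampling, and the sample variance over (n − 1) for a small sample. These forms are assumptions of the sketch, as are the function names and numbers.

```python
import math

def mean_error_repeated(variance, n):
    # Repeated (with replacement) random sampling.
    return math.sqrt(variance / n)

def mean_error_nonrepeated(variance, n, N):
    # Non-repeated sampling: adjusted by the factor (1 - n/N).
    return math.sqrt(variance / n * (1 - n / N))

def mean_error_small_sample(sample_variance, n):
    # Small sample (n < 30): the factor n / (n - 1) is kept.
    return math.sqrt(sample_variance / (n - 1))

# Hypothetical numbers: variance 25, sample of 100 units from a population of 400.
print(mean_error_repeated(25, 100))
print(mean_error_nonrepeated(25, 100, 400))
print(mean_error_small_sample(16, 17))
```

The non-repeated error comes out smaller than the repeated one for the same variance and sample size, as stated above.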
Mechanical sampling is used when the general population is ordered in some way (for example, alphabetical voter lists, telephone numbers, house numbers, apartment numbers). Units are selected at a fixed interval equal to the reciprocal of the sampling fraction. So, with a 2% sample, every 50th unit (1/0.02 = 50) is selected; with a 5% sample, every 20th unit (1/0.05 = 20) of the general population.

The reference point is selected in different ways: randomly, from the middle of the interval, with a change in the reference point. The main thing is to avoid systematic error. For example, with a 5% sample, if the first unit is the 13th, then the next ones are 33, 53, 73, etc.

In terms of accuracy, mechanical selection is close to actual random sampling. Therefore, to determine the average error of mechanical sampling, proper random selection formulas are used.

In typical (stratified) sampling, the population being surveyed is first divided into homogeneous groups of the same type. For example, when surveying enterprises these can be industries and sub-sectors; when studying the population these can be regions or social or age groups. Then an independent selection is made from each group, mechanically or purely at random.

Typical sampling produces more accurate results than other methods. Dividing the general population into types ensures that each typological group is represented in the sample, which makes it possible to eliminate the influence of the between-group variance on the average sampling error. Consequently, when finding the error of a typical sample according to the rule of addition of variances ($\sigma^2 = \overline{\sigma_i^2} + \delta^2$, where $\delta^2$ is the between-group variance), only the average of the group variances needs to be taken into account. Then the average sampling error is:
for repeated sampling
$$\mu = \sqrt{\frac{\overline{\sigma_i^2}}{n}},$$
for non-repeated sampling
$$\mu = \sqrt{\frac{\overline{\sigma_i^2}}{n}\left(1 - \frac{n}{N}\right)},$$
where $\overline{\sigma_i^2}$ is the average of the within-group variances in the sample.
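A sketch of the typical-sample error follows. It assumes that the average within-group variance is weighted by the group sample sizes and that the error is the square root of that average divided by n, with the factor (1 − n/N) added for non-repeated sampling. The function names and numbers are hypothetical.

```python
import math

def average_within_variance(variances, sizes):
    # Within-group variances averaged with the group sample sizes as weights.
    return sum(v * n for v, n in zip(variances, sizes)) / sum(sizes)

def typical_error(variances, sizes, N=None):
    # N=None means repeated sampling; otherwise the (1 - n/N) factor is applied.
    n = sum(sizes)
    base = average_within_variance(variances, sizes) / n
    if N is not None:
        base *= 1 - n / N
    return math.sqrt(base)

# Hypothetical strata: within-group variances 4 and 9, 50 sampled units each.
print(typical_error([4, 9], [50, 50]))
print(typical_error([4, 9], [50, 50], N=1000))
```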

Serial (or cluster) sampling is used when the population is divided into series or groups before the start of the sample survey. These series can be packages of finished products, student groups, or teams. The series to be examined are selected mechanically or purely at random, and within each series all units are examined. Therefore, the average sampling error depends only on the between-group (between-series) variance, which is calculated by the formula:

$$\delta^2 = \frac{\sum_{i=1}^{r} (\bar{x}_i - \bar{x})^2}{r},$$

where r is the number of selected series;
$\bar{x}_i$ is the average of the i-th series;
$\bar{x}$ is the overall average of the sample.

The average error of serial sampling is calculated:

for repeated sampling:
$$\mu = \sqrt{\frac{\delta^2}{r}},$$
for non-repeated sampling:
$$\mu = \sqrt{\frac{\delta^2}{r}\left(1 - \frac{r}{R}\right)},$$
where R is the total number of series.
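The serial-sample error can be sketched as follows, assuming series of equal size (so the overall mean is the mean of the series means), a between-series variance equal to the mean squared deviation of the series means, and an error equal to the square root of that variance over r, with the factor (1 − r/R) for non-repeated sampling. Names and numbers are hypothetical.

```python
import math

def between_series_variance(series_means):
    # Mean squared deviation of the series means from their overall mean.
    r = len(series_means)
    overall = sum(series_means) / r
    return sum((m - overall) ** 2 for m in series_means) / r

def serial_error(series_means, R=None):
    # R=None means repeated sampling; otherwise the (1 - r/R) factor is applied.
    r = len(series_means)
    base = between_series_variance(series_means) / r
    if R is not None:
        base *= 1 - r / R
    return math.sqrt(base)

# Hypothetical means of 3 selected series out of 30 in the population.
means = [10, 12, 14]
print(between_series_variance(means), serial_error(means), serial_error(means, R=30))
```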

Combined selection is a combination of the considered selection methods.

The average sampling error for any sampling method depends mainly on the absolute size of the sample and, to a lesser extent, on the sampling percentage. Let us assume that 225 observations are made in the first case from a population of 4,500 units and in the second from a population of 225,000 units. The variance in both cases is 25. Then in the first case, with 5% selection, the sampling error will be:

$$\mu = \sqrt{\frac{25}{225}\,(1 - 0.05)} \approx 0.32$$

In the second case, with 0.1% selection, it will be equal to:

$$\mu = \sqrt{\frac{25}{225}\,(1 - 0.001)} \approx 0.33$$

Thus, when the sampling percentage decreased by a factor of 50, the sampling error increased only slightly, since the sample size did not change.
Let's assume that the sample size is increased to 625 observations. In this case, the sampling error is:

$$\mu = \sqrt{\frac{25}{625}\,\left(1 - \frac{625}{225{,}000}\right)} \approx 0.2$$

Increasing the sample by a factor of 2.8 with the same population size reduces the sampling error by a factor of more than 1.6.
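The figures in this example can be recomputed with a short script. It assumes the non-repeated sampling form of the error (variance over n, multiplied by 1 − n/N, under the square root) and, for the enlarged sample of 625, the population of 225,000 units; both are assumptions of this sketch.

```python
import math

def nonrepeated_error(variance, n, N):
    # Average error of non-repeated random sampling.
    return math.sqrt(variance / n * (1 - n / N))

error_5_percent = nonrepeated_error(25, 225, 4_500)       # 5% selection
error_01_percent = nonrepeated_error(25, 225, 225_000)    # 0.1% selection
error_larger_sample = nonrepeated_error(25, 625, 225_000) # sample raised to 625

print(round(error_5_percent, 3), round(error_01_percent, 3), round(error_larger_sample, 3))
print(round(error_01_percent / error_larger_sample, 2))   # reduction factor
```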

Methods and techniques for forming a sample population.

In statistics, various methods of forming sample populations are used, which is determined by the objectives of the study and depends on the specifics of the object of study.

The main condition for conducting a sample survey is to prevent the occurrence of systematic errors arising from violation of the principle of equal opportunity for each unit of the general population to be included in the sample. Prevention of systematic errors is achieved through the use of scientifically based methods for forming a sample population.

There are the following methods for selecting units from the population:

1) individual selection - individual units are selected for the sample;

2) group selection - the sample includes qualitatively homogeneous groups or series of units being studied;

3) combined selection is a combination of individual and group selection.
Selection methods are determined by the rules for forming a sample population.

The sample may be:

  • proper random: the sample population is formed by random (unintentional) selection of individual units from the general population. The number of units selected is usually determined from the accepted sample proportion. The sample proportion is the ratio of the number of units in the sample population n to the number of units in the general population N, i.e. n/N;
  • mechanical: the units are selected from a general population divided into equal intervals (groups). The size of the interval is equal to the reciprocal of the sample proportion. So, with a 2% sample every 50th unit is selected (1:0.02), with a 5% sample every 20th unit (1:0.05), etc. Thus, in accordance with the accepted sample proportion, the general population is, as it were, mechanically divided into groups of equal size, and only one unit is selected from each group;
  • typical: the general population is first divided into homogeneous typical groups. Then units are selected individually from each typical group by purely random or mechanical sampling. An important feature of a typical sample is that it gives more accurate results than other methods of selecting units;
  • serial: the general population is divided into groups of equal size, called series. Whole series are selected into the sample population, and all units within the selected series are observed;
  • combined: sampling can be two-stage. In this case, the population is first divided into groups. Then groups are selected, and within them individual units are selected.

In statistics, the following methods of selecting units into a sample population are distinguished:

  • single-stage sampling: each selected unit is immediately studied according to the given criterion (proper random and serial sampling);
  • multi-stage sampling: groups are first selected from the general population, and individual units are then selected from the groups (typical sampling with mechanical selection of units into the sample population).

In addition, there are:

  • repeated selection, following the scheme of the returned ball. In this case, each unit or series included in the sample is returned to the general population and therefore has a chance of being included in the sample again;
  • non-repeated selection, following the scheme of the unreturned ball. It gives more accurate results for the same sample size.

Determining the required sample size (using a Student's t-table).

One of the scientific principles in sampling theory is to ensure that a sufficient number of units are selected. Theoretically, the need to comply with this principle is presented in the proofs of limit theorems in probability theory, which make it possible to establish what volume of units should be selected from the population so that it is sufficient and ensures the representativeness of the sample.

A decrease in the standard sampling error, and therefore an increase in the accuracy of the estimate, is always associated with an increase in the sample size. Therefore, already at the stage of organizing the sample observation, it is necessary to decide what the size of the sample population should be in order to ensure the required accuracy of the results. The required sample size is calculated using formulas derived from the formulas for the marginal sampling error ($\Delta$) corresponding to the particular type and method of selection. So, for random repeated sampling, the sample size (n) is:

$$n = \frac{t^2 \sigma^2}{\Delta^2}$$

The essence of this formula is that, for random repeated sampling, the required sample size is directly proportional to the square of the confidence coefficient ($t^2$) and to the variance of the varying characteristic ($\sigma^2$), and inversely proportional to the square of the marginal sampling error ($\Delta^2$). In particular, if the marginal error is doubled, the required sample size can be reduced by a factor of four. Of the three parameters, two (t and $\Delta$) are set by the researcher.
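The proportionality described here can be sketched in code, assuming the form n = t² · variance / (marginal error)² for random repeated sampling and rounding up to a whole unit. The function name and the numbers are hypothetical; in practice t would be read from a table for the chosen confidence level.

```python
import math

def required_sample_size(t, variance, max_error):
    # n = t^2 * variance / max_error^2, rounded up to a whole number of units.
    return math.ceil(t ** 2 * variance / max_error ** 2)

# Hypothetical inputs: t = 2, variance 25, marginal error 0.5.
n_base = required_sample_size(2, 25, 0.5)

# Doubling the marginal error cuts the required size by a factor of four.
n_doubled_error = required_sample_size(2, 25, 1.0)

print(n_base, n_doubled_error)
```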

At the same time, the researcher must decide, based on the purpose and objectives of the sample survey, in what quantitative combination it is best to include these parameters to obtain the optimal design. In one case the reliability of the results (t) may matter more than the measure of accuracy ($\Delta$), in another the reverse. It is more difficult to decide on the value of the marginal sampling error, since the researcher does not have this indicator at the stage of designing the sample observation; in practice it is customary to set the marginal sampling error, usually within 10% of the expected average level of the characteristic. The expected average can be established in different ways: using data from similar previous surveys, or using data from the sampling frame and conducting a small pilot sample.

The most difficult thing to establish when designing a sample observation is the third parameter in formula (5.2), the variance of the sample population. Here it is necessary to use all the information at the researcher's disposal, obtained in previously conducted similar and pilot surveys.

The question of determining the required sample size becomes more complicated if the sample survey involves studying several characteristics of the sampling units. In this case the average levels of the characteristics and their variation are, as a rule, different, so deciding which characteristic's variance to give preference to is possible only by taking into account the purpose and objectives of the survey.

When designing a sample observation, a predetermined value of the permissible sampling error is assumed in accordance with the objectives of a particular study and the probability of conclusions based on the observation results.

In general, the formula for the maximum error of the sample average allows us to determine:

The magnitude of possible deviations of the general population indicators from the sample population indicators;

The required sample size, ensuring the required accuracy, at which the limits of possible error will not exceed a certain specified value;

The probability that the error in a sample will have a specified limit.

In probability theory, the Student distribution is a one-parameter family of absolutely continuous distributions.

Time series (interval and moment); closing of time series.

A time series (dynamics series) is a set of values of statistical indicators presented in chronological sequence.

Each time series contains two components:

1) indicators of time periods (years, quarters, months, days or dates);

2) indicators characterizing the object under study for time periods or on corresponding dates, which are called series levels.

The levels of the series are expressed as absolute, average, or relative values. Depending on the nature of the indicators, time series of absolute, relative, and average values are constructed. Time series of relative and average values are constructed on the basis of derived series of absolute values. A distinction is made between interval and moment time series.

An interval time series contains indicator values for certain periods of time. In an interval series, levels can be summed to obtain the volume of the phenomenon over a longer period, the so-called accumulated totals.

A moment time series reflects the values of indicators at a certain point in time (date). In moment series, the researcher may only be interested in the difference that reflects the change in the level of the series between certain dates, since the sum of the levels here has no real meaning. Cumulative totals are not calculated here.

The most important condition for the correct construction of time series is the comparability of the levels of the series belonging to different periods. The levels must be presented in homogeneous quantities, and there must be equal completeness of coverage of different parts of the phenomenon.

To avoid distorting the real dynamics, preliminary calculations (closing the dynamic series) are carried out before the statistical analysis of the time series. Closing dynamic series means combining into one series two or more series whose levels were calculated using different methodologies, do not correspond to the same territorial boundaries, etc. Closing may also mean bringing the absolute levels of the series to a common basis, which removes the incomparability of the levels.

The concept of comparability of dynamics series, coefficients, growth and growth rates.

Dynamic series are series of statistical indicators characterizing the development of natural and social phenomena over time. Statistical collections published by the State Statistics Committee of Russia contain a large number of dynamic series in tabular form. Dynamic series make it possible to identify patterns in the development of the phenomena being studied.

Dynamic series contain two types of indicators: time indicators, i.e., periods (years, quarters, months, etc.) or points in time (the beginning of the year, the beginning of each month, etc.), and indicators of the levels of the series. The levels can be expressed in absolute values (output in tons or rubles), relative values (share of the urban population in %) and average values (average wages of industry workers by year, etc.). In tabular form, a dynamic series contains two columns or two rows.

Correct construction of time series requires the fulfillment of a number of requirements:

  1. all indicators of a dynamic series must be scientifically sound and reliable;
  2. the indicators must be comparable over time, i.e., calculated for the same periods of time or on the same dates;
  3. the indicators must be comparable across the territory;
  4. the indicators must be comparable in content, i.e., calculated according to a single methodology;
  5. the indicators must be comparable in the range of units covered (for example, the set of farms taken into account).

All indicators of a dynamic series must be given in the same units of measurement.

Statistical indicators can characterize either the results of the process being studied over a period of time, or the state of the phenomenon being studied at a certain point in time, i.e. indicators can be interval (periodic) and momentary. Accordingly, initially the dynamics series can be either interval or moment. Moment dynamics series, in turn, can be with equal or unequal time intervals.

The original dynamics series can be transformed into a series of average values ​​and a series of relative values ​​(chain and basic). Such time series are called derived time series.

The methodology for calculating the average level in the dynamics series is different, depending on the type of the dynamics series. Using examples, we will consider the types of dynamics series and formulas for calculating the average level.

Absolute increases (Δy) show by how many units a level of the series has changed compared with the previous level (column 3, chain absolute increases) or compared with the initial level (column 4, basic absolute increases). The calculation formulas can be written as follows:

$\Delta y_{chain} = y_i - y_{i-1}$, $\Delta y_{basic} = y_i - y_1$,

where $y_i$ is the current level, $y_{i-1}$ is the previous level and $y_1$ is the initial level of the series.

When the levels of the series decrease, the increase is negative and is referred to as a "decrease" or "reduction".

The absolute increases show, for example, that in 1998 the production of product “A” increased by 4 thousand tons compared with 1997 and by 34 thousand tons compared with 1994; for the other years, see Table 11.5, columns 3 and 4.

The growth coefficient shows how many times the level of the series has changed compared with the previous level (column 5, chain coefficients of growth or decline) or compared with the initial level (column 6, basic coefficients of growth or decline). The calculation formulas can be written as follows:

$K_{chain} = \frac{y_i}{y_{i-1}}$, $K_{basic} = \frac{y_i}{y_1}$

The growth rate shows what percentage a level of the series is of the previous level (column 7, chain growth rates) or of the initial level (column 8, basic growth rates). The calculation formulas can be written as follows:

$T_{r,chain} = \frac{y_i}{y_{i-1}} \cdot 100\%$, $T_{r,basic} = \frac{y_i}{y_1} \cdot 100\%$

So, for example, in 1997 the production volume of product “A” was 105.5% of its 1996 level.

The rate of increase shows by what percentage the level of the reporting period increased compared with the previous level (column 9, chain rates of increase) or compared with the initial level (column 10, basic rates of increase). The calculation formulas can be written as follows:

$T_{pr} = T_r - 100\%$ or $T_{pr} = \frac{\text{absolute increase}}{\text{level of the previous period}} \cdot 100\%$

So, for example, in 1996 the output of product “A” was 3.8% higher than in 1995 (103.8% - 100%, or (8 : 210) × 100%), and 9% higher than in 1994 (109% - 100%).
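The chain and basic indicators described above can be sketched in a few lines of Python. The five annual levels below are not taken from the source table (which is not reproduced here); they are reconstructed from the figures quoted in the text (increments of 8 and 4 thousand tons, the ratio 8 : 210, the 109% basic growth rate, and so on) and should be treated as an assumption. All variable names are illustrative.

```python
# Levels (thousand tons) reconstructed from the figures quoted in the text;
# the source table is not available, so treat them as an assumption.
levels = {1994: 200, 1995: 210, 1996: 218, 1997: 230, 1998: 234}

years = sorted(levels)
base = levels[years[0]]  # initial (base) level of the series

rows = []
for prev_year, year in zip(years, years[1:]):
    y_prev, y = levels[prev_year], levels[year]
    rows.append({
        "year": year,
        "chain_increase": y - y_prev,                       # vs previous level
        "basic_increase": y - base,                         # vs initial level
        "chain_growth_rate": round(y / y_prev * 100, 1),    # percent
        "basic_growth_rate": round(y / base * 100, 1),      # percent
        "chain_rate_of_increase": round((y / y_prev - 1) * 100, 1),
        "basic_rate_of_increase": round((y / base - 1) * 100, 1),
    })

for row in rows:
    print(row)
```

With these levels the 1996 row gives a chain increase of 8 and a chain growth rate of 103.8%, and the 1998 row gives increases of 4 (chain) and 34 (basic), matching the figures quoted in the text.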

If the absolute levels in the series decrease, the growth rate is less than 100% and the rate of increase is negative, i.e., it becomes a rate of decline.

The absolute value of a 1% increase (column 11) shows how many units must be produced in a given period for the level of the previous period to increase by 1%. In our example, this was 2.0 thousand tons in 1995 and 2.3 thousand tons in 1998, i.e., noticeably more.

The absolute value of a 1% increase can be determined in two ways:

the level of the previous period is divided by 100;

the chain absolute increase is divided by the corresponding chain rate of increase (in percent).

Absolute value of a 1% increase = $\frac{y_{i-1}}{100} = \frac{\Delta y_{chain}}{T_{pr,chain}}$
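As a quick check that the two ways agree, here is a minimal sketch. The levels are again the series reconstructed from the figures quoted in the text (an assumption, not the source table), and the variable names are illustrative.

```python
import math

# 1994-1998 levels, thousand tons (reconstructed from the text, an assumption)
levels = [200, 210, 218, 230, 234]

one_percent_values = []
for y_prev, y in zip(levels, levels[1:]):
    by_level = y_prev / 100                      # way 1: previous level / 100
    increase = y - y_prev                        # chain absolute increase
    rate_of_increase = increase / y_prev * 100   # chain rate of increase, %
    by_ratio = increase / rate_of_increase       # way 2: increase / rate
    assert math.isclose(by_level, by_ratio)      # both ways must agree
    one_percent_values.append(by_level)

print(one_percent_values)
```

The second way is undefined when a level does not change (the rate of increase is zero), so the first way is the safer default in code.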

When analyzing dynamics, especially over a long period, it is important to consider the rates of increase together with the absolute value of each percent of increase or decrease.

Note that this methodology for analyzing time series applies both to series whose levels are expressed in absolute values (tons, thousand rubles, number of employees, etc.) and to series whose levels are expressed in relative indicators (% of defects, % ash content of coal, etc.) or average values (average yield in centners per hectare, average wage, etc.).

Along with the considered analytical indicators, calculated for each year in comparison with the previous or initial level, when analyzing dynamics series, it is necessary to calculate the average analytical indicators for the period: the average level of the series, the average annual absolute increase (decrease) and the average annual growth rate and growth rate.

Methods for calculating the average level of a dynamic series were discussed above. In the interval series considered here, the average level is calculated using the simple arithmetic mean:

$\bar{y} = \frac{\sum y_i}{n}$

The average annual production volume of the product for 1994-1998 amounted to 218.4 thousand tons.

The average annual absolute increase is also calculated using the simple arithmetic mean:

$\overline{\Delta y} = \frac{\sum \Delta y_{chain}}{n - 1}$

Annual absolute increases varied over the years from 4 to 12 thousand tons (see column 3), and the average annual increase in production for 1995-1998 amounted to 8.5 thousand tons.
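The two averages quoted above (218.4 and 8.5 thousand tons) can be reproduced with a short sketch. The levels are reconstructed from the figures given in the text and are an assumption.

```python
# 1994-1998 levels, thousand tons (reconstructed from the text, an assumption)
levels = [200, 210, 218, 230, 234]

# Average level of an interval series: simple arithmetic mean of the levels
mean_level = sum(levels) / len(levels)

# Average annual absolute increase: mean of the chain increases
chain_increases = [b - a for a, b in zip(levels, levels[1:])]
mean_increase = sum(chain_increases) / len(chain_increases)

# Equivalent shortcut: last basic increase divided by the number of increases
mean_increase_shortcut = (levels[-1] - levels[0]) / (len(levels) - 1)

print(mean_level, chain_increases, mean_increase)
```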

Methods for calculating the average growth rate and average growth rate require more detailed consideration. Let us consider them using the example of the annual series level indicators given in the table.

Average level of the dynamics series.

A dynamic series (or time series) is a sequence of numerical values of a statistical indicator at successive moments or periods of time (i.e., arranged in chronological order).

The numerical values of the statistical indicator that make up the series are called the levels of the series and are usually denoted by the letter y. The first term of the series, $y_1$, is called the initial or base level, and the last, $y_n$, the final level. The moments or periods of time to which the levels relate are denoted by t.

Dynamic series are usually presented as a table or a graph, with the time scale t along the abscissa and the scale of the series levels y along the ordinate.

Average indicators of the dynamics series

Each dynamic series can be regarded as a set of n time-varying indicators that can be summarized by averages. Such generalized (average) indicators are especially necessary when comparing changes in an indicator over different periods, in different countries, etc.

The first generalizing characteristic of a dynamic series is the average level of the series. The method for calculating the average level depends on whether the series is a moment or an interval (periodic) series.

For an interval series, the average level is determined by the simple arithmetic mean of the levels of the series, i.e.

$\bar{y} = \frac{\sum_{i=1}^{n} y_i}{n}$

If there is a moment series containing n levels ($y_1, y_2, \dots, y_n$) with equal intervals between dates, it can easily be converted into a series of average values. The level at the beginning of each period is at the same time the level at the end of the previous period. The average value for each period (the interval between dates) can then be calculated as half the sum of the values of y at the beginning and end of the period, i.e., as $\frac{y_i + y_{i+1}}{2}$. The number of such averages is $n - 1$. As stated earlier, for series of average values the average level is calculated as the arithmetic mean.

Therefore, we can write:
$\bar{y} = \frac{\frac{y_1 + y_2}{2} + \frac{y_2 + y_3}{2} + \dots + \frac{y_{n-1} + y_n}{2}}{n - 1}$.
After transforming the numerator we get:
$\bar{y} = \frac{\frac{y_1}{2} + y_2 + \dots + y_{n-1} + \frac{y_n}{2}}{n - 1}$,

where $y_1$ and $y_n$ are the first and last levels of the series and $y_i$ are the intermediate levels.

This average is known in statistics as the chronological average for moment series. It takes its name from the Greek word "chronos" (time), since it is calculated from indicators that change over time.

With unequal intervals between dates, the chronological average for a moment series can be calculated as the arithmetic mean of the average levels for each pair of adjacent moments, weighted by the time intervals between the dates, i.e.
$\bar{y} = \frac{\sum_{i=1}^{n-1} \frac{y_i + y_{i+1}}{2} \, t_i}{\sum_{i=1}^{n-1} t_i}$,
where $t_i$ is the length of the interval between dates i and i+1.
Here it is assumed that between dates the levels took on different values, and from the two known values ($y_i$ and $y_{i+1}$) we determine the averages, from which the overall average for the entire analyzed period is then calculated.
If it is assumed that each value $y_i$ remains unchanged until the next, (i+1)-th, moment, i.e., if the exact dates of the changes in level are known, the calculation can be carried out using the weighted arithmetic mean:
$\bar{y} = \frac{\sum y_i t_i}{\sum t_i}$,

where $t_i$ is the time during which the level remained unchanged.
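The three averages for moment series described above can be sketched as follows. The function names and the sample figures (stock levels on given dates) are hypothetical and serve only as an illustration.

```python
def chronological_mean(levels):
    """Moment series, equal intervals: (y1/2 + y2 + ... + y_{n-1} + yn/2) / (n - 1)."""
    inner = sum(levels[1:-1])
    return (levels[0] / 2 + inner + levels[-1] / 2) / (len(levels) - 1)


def chronological_mean_unequal(levels, gaps):
    """Unequal intervals: means of adjacent pairs weighted by gaps[i],
    the time between level i and level i + 1."""
    pair_means = [(a + b) / 2 for a, b in zip(levels, levels[1:])]
    return sum(m * t for m, t in zip(pair_means, gaps)) / sum(gaps)


def step_weighted_mean(levels, durations):
    """Each level stays unchanged for durations[i]: weighted arithmetic mean."""
    return sum(y * t for y, t in zip(levels, durations)) / sum(durations)


# Hypothetical stock levels on the first day of four consecutive months
print(chronological_mean([100, 110, 120, 130]))               # 115.0
# Hypothetical levels observed 1 and then 3 months apart
print(chronological_mean_unequal([100, 120, 110], [1, 3]))    # 113.75
# Hypothetical level of 100 held for 10 days, then 120 held for 20 days
print(step_weighted_mean([100, 120], [10, 20]))
```

With equal gaps the weighted version reduces to the simple chronological mean, which is a convenient sanity check.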

In addition to the average level in the dynamics series, other average indicators are calculated - the average change in the levels of the series (basic and chain methods), the average rate of change.

The basic mean absolute change is the last basic absolute change divided by the number of changes, that is, $\overline{\Delta y} = \frac{y_n - y_1}{n - 1}$.

The chain mean absolute change of the levels of the series is the sum of all chain absolute changes divided by the number of changes, that is, $\overline{\Delta y} = \frac{\sum_{i=2}^{n} (y_i - y_{i-1})}{n - 1}$.

The sign of average absolute changes is also used to judge the nature of the change in a phenomenon on average: growth, decline or stability.

From the rule for controlling basic and chain absolute changes it follows that the basic and chain average changes must be equal.

Along with the average absolute change, the relative average is also calculated using the basic and chain methods.

The basic average relative change is determined by the formula: $\bar{i} = \sqrt[n-1]{\frac{y_n}{y_1}}$

The chain average relative change is determined by the formula: $\bar{i} = \sqrt[n-1]{\prod_{i=2}^{n} \frac{y_i}{y_{i-1}}}$

Naturally, the basic and chain average relative changes must be the same, and by comparing them with the criterion value 1, a conclusion is drawn about the nature of the change in the phenomenon on average: growth, decline or stability.
Subtracting 1 from the basic or chain average relative change gives the corresponding average rate of change, whose sign also indicates the nature of the change in the phenomenon reflected by the series.
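A minimal sketch of the average relative change, computed as a geometric mean in both the basic and the chain way. The sample series is illustrative (it is the 1994-1998 series reconstructed from figures quoted earlier in the text, an assumption).

```python
import math

levels = [200, 210, 218, 230, 234]  # illustrative series (an assumption)
n_changes = len(levels) - 1

# Basic way: (n-1)-th root of the ratio of the last level to the first
basic_mean_ratio = (levels[-1] / levels[0]) ** (1 / n_changes)

# Chain way: (n-1)-th root of the product of all chain ratios
chain_ratios = [b / a for a, b in zip(levels, levels[1:])]
chain_mean_ratio = math.prod(chain_ratios) ** (1 / n_changes)

# Average rate of change: positive means growth, negative means decline
mean_rate_of_change = basic_mean_ratio - 1

print(basic_mean_ratio, chain_mean_ratio, mean_rate_of_change)
```

The two results coincide because the chain ratios telescope: their product equals the ratio of the last level to the first.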

Seasonal fluctuations and seasonality indices.

Seasonal fluctuations are stable intra-annual fluctuations.

The basic principle of management for obtaining the maximum effect is to maximize income and minimize costs. Studying seasonal fluctuations helps to solve this optimization problem for each period within the year.

When studying seasonal fluctuations, two interrelated problems are solved:

1. Identification of the specifics of the development of the phenomenon in intra-annual dynamics;

2. Measuring seasonal fluctuations and building a model of the seasonal wave.

To measure seasonal variation, seasonality indices are usually calculated. In general, they are determined as the ratio of the initial (actual) levels of the dynamic series to the theoretical (calculated) levels, which serve as the basis for comparison.

Since random deviations are superimposed on seasonal fluctuations, seasonality indices are averaged to eliminate them.

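In this case, for each period of the annual cycle, generalized indicators are determined in the form of average seasonality indices:

$\bar{I}_i = \frac{I_{i1} + I_{i2} + \dots + I_{in}}{n}$,

where $I_{ij}$ is the seasonality index of period i in year j and n is the number of years.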

Average seasonal fluctuation indices are free from the influence of random deviations of the main development trend.

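Depending on the nature of the trend, the formula for the average seasonality index can take the following forms:

1. For series of intra-annual dynamics with a clearly expressed main development trend:

$\bar{I}_i = \frac{1}{n} \sum_{j=1}^{n} \frac{y_{ij}}{\hat{y}_{ij}} \cdot 100\%$,

where $y_{ij}$ is the actual level of period i in year j, $\hat{y}_{ij}$ is the corresponding theoretical (trend) level and n is the number of years.

2. For series of intra-annual dynamics in which an increasing or decreasing trend is absent or insignificant:

$\bar{I}_i = \frac{\bar{y}_i}{\bar{y}} \cdot 100\%$,

where $\bar{y}_i$ is the average level of period i and $\bar{y}$ is the overall average.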
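For the second case (no pronounced trend), a seasonality index is commonly computed as the average level of a period divided by the overall average, in percent. The sketch below assumes that form; the quarterly figures are hypothetical.

```python
# Hypothetical quarterly sales for three years (rows = years, columns = quarters)
data = [
    [80, 120, 140, 60],
    [90, 110, 150, 50],
    [100, 130, 130, 40],
]

n_years = len(data)
n_periods = len(data[0])

# Average level of each quarter across the years
period_means = [sum(year[q] for year in data) / n_years for q in range(n_periods)]

# Overall average level
overall_mean = sum(period_means) / n_periods

# Seasonality index of each quarter, in percent of the overall average
seasonality_indices = [m / overall_mean * 100 for m in period_means]

print(seasonality_indices)  # [90.0, 120.0, 140.0, 50.0]
```

An index above 100% marks a seasonal peak and an index below 100% a seasonal trough; the indices average to 100% by construction.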

Methods for analyzing the main trend.

The development of phenomena over time is influenced by factors of different nature and strength of influence. Some of them are random in nature, others have an almost constant impact and form a certain development trend in the dynamics.

An important task of statistics is to identify the trend in dynamic series, free from the influence of random factors. For this purpose, time series are processed by the methods of interval enlargement, moving averages, analytical alignment, etc.

The interval enlargement method is based on enlarging the time periods to which the levels of the series relate, i.e., it replaces data for small time periods with data for larger periods. It is especially effective when the initial levels relate to short periods of time. For example, series of daily indicators are replaced by weekly or monthly series, etc. This shows the "axis of development of the phenomenon" more clearly. The average calculated over enlarged intervals makes it possible to identify the direction and nature (acceleration or slowdown of growth) of the main development trend.

The moving average method is similar to the previous one, but here the actual levels are replaced by average levels calculated for successively shifting (sliding) enlarged intervals covering m levels of the series.

For example, if we accept m=3, then first the average of the first three levels of the series is calculated, then - from the same number of levels, but starting from the second, then - starting from the third, etc. Thus, the average “slides” along the dynamics series, moving by one term. Calculated from m members, moving averages refer to the middle (center) of each interval.

This method only eliminates random fluctuations. If the series has a seasonal wave, then it will persist even after smoothing using the moving average method.
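Both smoothing methods can be sketched in a few lines. The function names and the sample series are hypothetical; the moving average uses a window of m levels that shifts by one term, and the enlargement replaces each run of consecutive levels by its total.

```python
def moving_average(levels, m=3):
    """Averages of m consecutive levels, shifting the window by one term.
    For odd m, each average refers to the middle of its window."""
    return [sum(levels[i:i + m]) / m for i in range(len(levels) - m + 1)]


def enlarge_intervals(levels, size):
    """Replace each full run of `size` consecutive levels by its total."""
    return [sum(levels[i:i + size]) for i in range(0, len(levels) - size + 1, size)]


series = [10, 14, 12, 16, 18, 14, 20, 22, 18]  # hypothetical monthly levels

smoothed = moving_average(series, m=3)
quarterly = enlarge_intervals(series, size=3)

print(smoothed)
print(quarterly)  # [36, 48, 60]
```

The smoothed series has m - 1 fewer terms than the original, which is the usual cost of a moving average at the ends of the series.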

Analytical alignment. To eliminate random fluctuations and identify the trend, the levels of the series are aligned using analytical formulas (analytical alignment). Its essence is to replace the empirical (actual) levels with theoretical ones calculated from an equation adopted as the mathematical model of the trend, in which the theoretical levels are a function of time: $\hat{y}_t = f(t)$. Each actual level is then regarded as the sum of two components: $y_t = f(t) + \varepsilon_t$, where $f(t)$ is the systematic component expressed by the equation and $\varepsilon_t$ is a random variable that causes fluctuations around the trend.

The task of analytical alignment comes down to the following:

1. Determination, based on actual data, of the type of hypothetical function that can most adequately reflect the development trend of the indicator under study.

2. Finding the parameters of the specified function (equation) from empirical data

3. Calculation using the found equation of theoretical (aligned) levels.

The choice of a particular function is carried out, as a rule, on the basis of a graphical representation of empirical data.

The models are regression equations, the parameters of which are calculated using the least squares method

Below are the most commonly used regression equations for aligning time series, indicating which specific development trends they are most suitable for reflecting.

To find the parameters of the above equations, there are special algorithms and computer programs. In particular, the parameters of the straight-line equation $\hat{y}_t = b_0 + b_1 t$ can be found from the system of normal equations:

$n b_0 + b_1 \sum t = \sum y$, $b_0 \sum t + b_1 \sum t^2 = \sum t y$

If the periods or moments of time are numbered so that $\sum t = 0$, the above formulas simplify considerably and become

$b_0 = \frac{\sum y}{n}$, $b_1 = \frac{\sum t y}{\sum t^2}$

Aligned levels on the chart will be located on one straight line, passing at the closest distance from the actual levels of this dynamic series. The sum of squared deviations is a reflection of the influence of random factors.

Using it, we calculate the average (standard) error of the equation:

$S = \sqrt{\frac{\sum (y - \hat{y})^2}{n - m}}$

Here n is the number of observations, and m is the number of parameters in the equation (we have two of them, $b_1$ and $b_0$).
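A minimal sketch of fitting a straight-line trend by least squares with the time points numbered so that their sum is zero, followed by the standard error of the equation. The series is illustrative (an assumption), and the simplified formulas used are the standard ones for a centered time variable: the intercept is the mean level and the slope is the sum of t·y divided by the sum of t².

```python
import math

levels = [200, 210, 218, 230, 234]  # illustrative series (an assumption)
n = len(levels)

# Number the periods symmetrically around zero so that sum(t) == 0
t = [i - (n - 1) / 2 for i in range(n)]  # -2, -1, 0, 1, 2 for n = 5

# With sum(t) == 0 the least-squares parameters simplify to:
b0 = sum(levels) / n
b1 = sum(ti * yi for ti, yi in zip(t, levels)) / sum(ti ** 2 for ti in t)

# Aligned (theoretical) levels and the residual sum of squares
fitted = [b0 + b1 * ti for ti in t]
residual_ss = sum((y - f) ** 2 for y, f in zip(levels, fitted))

m = 2  # number of parameters in the equation (b0 and b1)
std_error = math.sqrt(residual_ss / (n - m))

print(b0, b1, std_error)
```

For an even number of levels this numbering gives half-integer values of t; the common textbook convention of using odd integers instead changes the scale of the slope but not the fitted levels.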

The main tendency (trend) shows how systematic factors influence the levels of a dynamic series, and the fluctuation of the levels around the trend serves as a measure of the influence of residual factors.

Fisher's F test is also used to assess the quality of the time series model. It is the ratio of two variances: the variance explained by the regression, i.e., by the factor being studied, to the variance caused by random factors, i.e., the residual variance:

$F = \frac{\sigma^2_{regr}}{\sigma^2_{resid}}$

In expanded form, the formula for this criterion can be presented as follows:

$F = \frac{\sum (\hat{y} - \bar{y})^2 / (m - 1)}{\sum (y - \hat{y})^2 / (n - m)}$,

where n is the number of observations, i.e., the number of levels of the series;

m is the number of parameters in the equation; y is the actual level of the series;

$\hat{y}$ is the aligned (fitted) level; $\bar{y}$ is the mean level of the series.

The model that performs better than the others is not necessarily satisfactory. It can be recognized as such only when its F value exceeds the critical value, which is found from tables of the F-distribution.
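The F criterion for a straight-line trend can be sketched as the explained variance divided by the residual variance, each sum of squares taken per degree of freedom. The series below is illustrative (an assumption); the critical value 10.13 is the tabulated 5% point of the F-distribution for 1 and 3 degrees of freedom.

```python
levels = [200, 210, 218, 230, 234]  # illustrative series (an assumption)
n, m = len(levels), 2               # observations, parameters of the line

# Straight-line trend with centered time (sum of t equals zero)
t = [i - (n - 1) / 2 for i in range(n)]
mean_level = sum(levels) / n
slope = sum(ti * yi for ti, yi in zip(t, levels)) / sum(ti ** 2 for ti in t)
fitted = [mean_level + slope * ti for ti in t]

# Sum of squares explained by the trend and residual sum of squares
explained_ss = sum((f - mean_level) ** 2 for f in fitted)
residual_ss = sum((y - f) ** 2 for y, f in zip(levels, fitted))

f_stat = (explained_ss / (m - 1)) / (residual_ss / (n - m))

F_CRITICAL = 10.13  # table value at the 5% level for (1, 3) degrees of freedom
model_is_adequate = f_stat > F_CRITICAL

print(round(f_stat, 1), model_is_adequate)
```

For other series lengths or significance levels the critical value has to be looked up again, since it depends on both degrees of freedom.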

Essence and classification of indices.

In statistics, an index is understood as a relative indicator that characterizes the change in the magnitude of a phenomenon in time, space, or in comparison with any standard.

The main element of the index relation is the indexed value. An indexed value is understood as the value of a characteristic of a statistical population, the change of which is the object of study.

Using indexes, three main tasks are solved:

1) assessment of changes in a complex phenomenon;

2) determining the influence of individual factors on changes in a complex phenomenon;

3) comparison of the magnitude of a phenomenon with the magnitude of the past period, the magnitude of another territory, as well as with standards, plans, and forecasts.

Indices are classified according to three criteria:

1) according to the content of the indexed quantities;

2) according to the degree of coverage of the elements of the population;

3) according to the methods for calculating general indices.

According to the content of the indexed quantities, indices are divided into indices of quantitative (volume) indicators and indices of qualitative indicators. Indices of quantitative indicators include indices of the physical volume of industrial output, physical volume of sales, headcount, etc. Indices of qualitative indicators include indices of prices, costs, labor productivity, average wages, etc.

According to the degree of coverage of population units, indices are divided into two classes: individual and general. To characterize them, we introduce the following conventions adopted in the practice of using the index method:

q is the quantity (volume) of a product in physical terms; p is the unit price; z is the unit cost of production; t is the time spent producing a unit of product (labor intensity); w is output in value terms per unit of time; v is output in physical terms per unit of time; T is the total time spent or the number of employees.

To distinguish which period or object the indexed quantities belong to, it is customary to place subscripts at the bottom right of the corresponding symbol. For example, in dynamics indices the subscript 1 is, as a rule, used for the periods being compared (current, reporting), and the subscript 0 for the periods with which the comparison is made (base).

Individual indices serve to characterize changes in individual elements of a complex phenomenon (for example, a change in the volume of output of one type of product). They represent relative values ​​of dynamics, fulfillment of obligations, comparison of indexed values.

The individual index of the physical volume of products is determined as $i_q = \frac{q_1}{q_0}$.

From an analytical point of view, individual dynamics indices are similar to growth coefficients (rates) and characterize the change in the indexed value in the current period compared with the base period, i.e., they show how many times it has increased (decreased) or what percentage growth (decline) it represents. Index values are expressed as coefficients or percentages.

The general (composite) index reflects changes in all elements of a complex phenomenon.

The aggregate index is the basic form of an index. It is called aggregate because its numerator and denominator are each a set of “aggregates”.

Average indices, their definition.

In addition to aggregate indices, another form of them is used in statistics - weighted average indices. Their calculation is resorted to when the available information does not allow calculating the general aggregate index. Thus, if there is no data on prices, but there is information on the cost of products in the current period and individual price indices for each product are known, then the general price index cannot be determined as an aggregate one, but it is possible to calculate it as the average of the individual ones. In the same way, if the quantities of individual types of products produced are not known, but individual indices and the cost of production of the base period are known, then the general index of the physical volume of production can be determined as a weighted average value.

An average index is an index calculated as the average of the individual indices. The aggregate index is the basic form of a general index, so the average index must be identical to the aggregate index. When calculating average indices, two forms of averages are used: arithmetic and harmonic.

The arithmetic average index is identical to the aggregate index if the weights of the individual indices are the terms of the denominator of the aggregate index. Only in this case, the value of the index calculated using the arithmetic average formula will be equal to the aggregate index.
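The identity stated above can be checked numerically for the physical volume index. The sketch assumes the usual aggregate form (current quantities at base prices over base quantities at base prices) and weights the individual indices by the base-period values, i.e., by the terms of the aggregate index's denominator. The product data are hypothetical.

```python
import math

# Hypothetical data: base quantity q0, current quantity q1, base price p0
products = [
    {"q0": 100, "q1": 120, "p0": 5.0},
    {"q0": 200, "q1": 180, "p0": 10.0},
]

# Denominator of the aggregate index: value of base-period output
base_value = sum(p["q0"] * p["p0"] for p in products)

# Aggregate index of physical volume: sum(q1 * p0) / sum(q0 * p0)
aggregate_index = sum(p["q1"] * p["p0"] for p in products) / base_value

# Arithmetic average index: individual indices q1/q0 weighted by q0 * p0
average_index = sum(
    (p["q1"] / p["q0"]) * (p["q0"] * p["p0"]) for p in products
) / base_value

print(aggregate_index, average_index)
assert math.isclose(aggregate_index, average_index)
```

With any other set of weights the arithmetic average would generally differ from the aggregate index, which is the point the paragraph above makes.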

This article explains how to find the standard deviation. The material is essential for a full understanding of mathematics, so a math tutor should devote a separate lesson, or even several, to it.

The standard deviation makes it possible to evaluate the spread of values obtained by measuring a certain parameter. It is denoted by the symbol σ (the Greek letter "sigma").

The formula for calculating it is quite simple. To find the standard deviation, take the square root of the variance. So now you have to ask, “What is variance?”

What is variance

The definition of variance is as follows. Variance is the arithmetic mean of the squared deviations of the values from the mean.

To find the variance, perform the following calculations sequentially:

  • Determine the mean (the simple arithmetic mean of the series of values).
  • Then subtract the mean from each value and square the resulting difference (this gives the squared difference).
  • Finally, calculate the arithmetic mean of the resulting squared differences (why squares are used is explained below).

Let's look at an example. Let's say you and your friends decide to measure the height of your dogs (in millimeters). As a result of the measurements, you received the following height measurements (at the withers): 600 mm, 470 mm, 170 mm, 430 mm and 300 mm.

Let's calculate the mean, variance and standard deviation.

First let's find the mean. As you already know, to do this you add up all the measured values and divide by the number of measurements:

Mean = (600 + 470 + 170 + 430 + 300) / 5 = 1970 / 5 = 394 mm.

So, the mean (arithmetic mean) is 394 mm.

Now we need to determine the deviation of each dog's height from the mean: 206, 76, -224, 36 and -94 mm.

Finally, to calculate the variance, we square each of these differences and then find the arithmetic mean of the results:

Variance = (206² + 76² + (-224)² + 36² + (-94)²) / 5 = 108520 / 5 = 21704 mm².

Thus, the variance is 21704 mm².

How to find standard deviation

So how can we now calculate the standard deviation, knowing the variance? As we remember, we take its square root. That is, the standard deviation is equal to:

σ = √21704 ≈ 147 mm (rounded to the nearest whole number of millimeters).
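The calculation steps above translate directly into code. This is a minimal sketch for the five dog heights, treating them as the whole population (dividing by the number of values); the variable names are illustrative.

```python
import math

heights = [600, 470, 170, 430, 300]  # heights at the withers, mm

# Step 1: the mean
mean = sum(heights) / len(heights)

# Step 2: deviation of each value from the mean
deviations = [h - mean for h in heights]

# Step 3: mean of the squared deviations (population variance)
variance = sum(d ** 2 for d in deviations) / len(heights)

# Standard deviation: square root of the variance
std_dev = math.sqrt(variance)

print(mean, deviations, variance, round(std_dev))
```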

Using this method, we found that some dogs (for example, Rottweilers) are very large dogs. But there are also very small dogs (for example, dachshunds, but you shouldn’t tell them that).

The most interesting thing is that the standard deviation carries useful information. Now we can show which of the obtained height measurement results are within the interval that we get if we plot the standard deviation from the average (to both sides of it).

That is, using the standard deviation, we obtain a “standard” method that allows us to find out which of the values ​​is normal (statistical average), and which is extraordinarily large or, conversely, small.

What is standard deviation

But... everything changes slightly if we are analyzing sample data. In our example we considered the general population. That is, our 5 dogs were the only dogs in the world that interested us.

But if the data are a sample (values selected from a large population), the calculations need to be done differently.

If there are N values, then:

for a population, divide by N when calculating the variance (as we did above);

for a sample, divide by N - 1 when calculating the variance.

All other calculations are carried out similarly, including the determination of the average.

For example, if our five dogs are just a sample of the population of dogs (all dogs on the planet), we must divide by 4, not 5, namely:

Sample variance = 108520 / 4 = 27130 mm².

In this case, the standard deviation for the sample is √27130 ≈ 165 mm (rounded to the nearest whole number).
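Python's standard `statistics` module provides both variants, which makes the difference between dividing by N and by N - 1 easy to see. A short sketch with the same five heights:

```python
import statistics

heights = [600, 470, 170, 430, 300]  # heights at the withers, mm

# Population formulas divide by N
population_variance = statistics.pvariance(heights)
population_std = statistics.pstdev(heights)

# Sample formulas divide by N - 1
sample_variance = statistics.variance(heights)
sample_std = statistics.stdev(heights)

print(population_variance, round(population_std))
print(sample_variance, round(sample_std))
```

The sample figures are always somewhat larger than the population ones, and the gap shrinks as the number of values grows.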

We can say that we have made some “correction” in the case where our values ​​are just a small sample.

Note. Why exactly squared differences?

But why do we take exactly the squared differences when calculating the variance? Suppose that, when measuring some parameter, you obtained the following set of differences from the mean: 4; 4; -4; -4. If we simply add the differences together, the negative values cancel out the positive ones:

(4 + 4 - 4 - 4) / 4 = 0.

It turns out that this option is useless. Then maybe it is worth trying the absolute values of the deviations (that is, their moduli)?

(|4| + |4| + |-4| + |-4|) / 4 = 4.

At first glance, this works well (the resulting value, by the way, is called the mean absolute deviation), but not in all cases. Let's try another example. Suppose the measurement gives the following set of differences: 7; 1; -6; -2. Then the mean absolute deviation is:

(|7| + |1| + |-6| + |-2|) / 4 = 16 / 4 = 4.

Wow! Again we got a result of 4, although the differences have a much larger spread.

Now let's see what happens if we square the differences (and then take the square root of their mean).

For the first example it will be:

√((4² + 4² + (-4)² + (-4)²) / 4) = √(64 / 4) = 4.

For the second example it will be:

√((7² + 1² + (-6)² + (-2)²) / 4) = √(90 / 4) ≈ 4.74.

Now it’s a completely different matter! The greater the spread of the differences, the greater the standard deviation, which is what we were aiming for.
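The comparison in this note is easy to reproduce. The sketch below treats the two sets as differences from the mean (both sum to zero) and computes the plain mean, the mean absolute deviation and the root of the mean square for each; the function names are illustrative.

```python
import math


def mean_abs(values):
    """Mean of the absolute values (mean absolute deviation of the differences)."""
    return sum(abs(v) for v in values) / len(values)


def root_mean_square(values):
    """Square root of the mean of the squares."""
    return math.sqrt(sum(v * v for v in values) / len(values))


first = [4, 4, -4, -4]
second = [7, 1, -6, -2]

for diffs in (first, second):
    plain_mean = sum(diffs) / len(diffs)   # always 0: positives cancel negatives
    print(plain_mean, mean_abs(diffs), round(root_mean_square(diffs), 2))
```

The mean absolute deviation cannot tell the two sets apart, while the root of the mean square is larger for the second, more widely spread set.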

In fact, this method uses the same idea as when calculating the distance between points, only applied in a different way.

And from a mathematical point of view, using squares and square roots provides more benefits than we could get from absolute deviation values, making standard deviation applicable to other mathematical problems.

Sergey Valerievich told you how to find the standard deviation

When choosing units of observation, errors are possible that arise by chance, i.e., from events whose occurrence cannot be accurately predicted. These errors are objective and natural. When determining the accuracy of a sample study, the size of the error that can occur during sampling is assessed. Such errors are called random errors of representativeness (m).

In practice, the following formulas are used to determine the average sampling error in statistical research:

1) to calculate the average error ($m_M$) of the mean (M):

$m_M = \frac{\sigma}{\sqrt{n}}$, where σ is the standard deviation;

n is the sample size.

This formula is for a large sample; for a small sample, n - 1 is used in place of n: $m_M = \frac{\sigma}{\sqrt{n - 1}}$.
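A minimal sketch of the average error of the mean, with a flag for the small-sample variant that uses n - 1. The function name, the flag and the sample figures are hypothetical.

```python
import math


def standard_error_of_mean(sigma, n, small_sample=False):
    """Average error of the mean: sigma / sqrt(n),
    or sigma / sqrt(n - 1) for a small sample."""
    denominator = n - 1 if small_sample else n
    return sigma / math.sqrt(denominator)


# Hypothetical figures: sigma = 5.1 cm
print(standard_error_of_mean(5.1, 100))                     # large sample
print(standard_error_of_mean(5.1, 26, small_sample=True))   # small sample
```

Because the error shrinks with the square root of the sample size, quadrupling the sample only halves the error.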

92 Standard deviation. Calculation method, application in the work of a doctor.

An approximate method for assessing the variability of a variation series is to determine the limit, i.e. minimum and maximum values ​​of a quantitative characteristic, and amplitude - i.e. the difference between the largest and smallest value option (Vmax - Vmin). However, the limit and amplitude do not take into account the variant values ​​within the series.

The main generally accepted measure of the variability of a quantitative characteristic within a variation series is the standard deviation (σ, sigma).

The average duration of treatment in both hospitals is the same; however, the variation was greater at the second hospital.

The method for calculating the standard deviation includes the following steps:

1. Find the arithmetic mean (M).

2. Determine the deviations of the individual variants from the arithmetic mean (V - M = d). In medical statistics, deviations from the mean are denoted by d (deviate). The sum of all deviations is equal to zero (column 3, Table 5).

3. Square each deviation (column 4, Table 5).

4. Multiply the squares of the deviations by the corresponding frequencies, d²·p (column 5, Table 5).

5. Calculate the standard deviation using the formula:

$\sigma = \sqrt{\frac{\sum d^2 p}{n}}$ when n is greater than 30, or
$\sigma = \sqrt{\frac{\sum d^2 p}{n - 1}}$ when n is less than or equal to 30, where n is the number of all variants.

The method for calculating the standard deviation is given in Table 5.
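The steps of the method can be sketched for a frequency table as follows. The data of Table 5 are not reproduced in the text, so the values and frequencies below are hypothetical; the function name is illustrative, and the choice between n and n - 1 follows the threshold of 30 stated in the text.

```python
import math


def grouped_std(values, freqs):
    """Mean and standard deviation of a variation series given as
    variants V with frequencies p."""
    n = sum(freqs)
    mean = sum(v * p for v, p in zip(values, freqs)) / n       # step 1: M
    deviations = [v - mean for v in values]                     # step 2: d = V - M
    weighted_squares = [d ** 2 * p                              # steps 3-4: d^2 * p
                        for d, p in zip(deviations, freqs)]
    denominator = n if n > 30 else n - 1                        # step 5
    return mean, math.sqrt(sum(weighted_squares) / denominator)


# Hypothetical series: treatment duration in days and number of patients
days = [18, 19, 20, 21, 22]
patients = [2, 6, 10, 6, 2]

mean_days, sigma = grouped_std(days, patients)
print(mean_days, round(sigma, 2))
```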

The standard deviation makes it possible to establish how typical the mean is, to determine the limits of scatter of the series, to compare the variability of several distribution series, and to calculate the coefficient of variation (Cv).

Table 5. Calculation of the standard deviation (columns: number of days V; number of patients p; totals: M = 20, n = 95, Σ = 252).

Example: according to a special study, the average height of 7-year-old boys in city N was 117.7 cm (σ = 5.1 cm), and the average weight was 21.7 kg (σ = 2.4 kg). The variability of height and weight cannot be assessed by comparing the standard deviations directly, since height and weight are named quantities expressed in different units. Therefore, a relative value is used, the coefficient of variation:

$$C_v = \frac{\sigma}{M} \cdot 100\%$$

Comparison of the coefficients of variation for height (4.3%) and weight (11.2%) shows that weight has the higher coefficient of variation and is therefore the less stable characteristic.
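As a quick check of the height and weight example, the coefficient of variation can be computed directly from the quoted means and standard deviations. The function name below is illustrative only.

```python
def coefficient_of_variation(sigma, mean):
    """Return Cv: the standard deviation as a percentage of the arithmetic mean."""
    return sigma / mean * 100

cv_height = coefficient_of_variation(5.1, 117.7)  # height, cm
cv_weight = coefficient_of_variation(2.4, 21.7)   # weight, kg
print(f"height Cv = {cv_height:.1f}%, weight Cv = {cv_weight:.1f}%")
```

With these inputs the height coefficient is about 4.3% and the weight coefficient about 11.1%, slightly below the 11.2% quoted, presumably because of rounding in the source figures. The conclusion that weight varies more than height is unchanged.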

Average values are widely used in the daily work of healthcare workers. They are used to characterize physical development through the main anthropometric characteristics: height, weight, chest circumference, dynamometry, etc. Average values are used to assess a patient's condition by analyzing physiological and biochemical changes in the body: blood pressure, heart rate, body temperature, biochemical indicators, hormone levels, etc. Average values are also widely used in analyzing the activities of medical institutions; for example, when analyzing the work of hospitals, the average annual bed occupancy, the average length of stay of a patient in a bed, and similar indicators are calculated.

The standard deviation is usually used to compare the variability of series of the same type. If two series with different characteristics are compared (height and weight, average duration of hospital treatment and hospital mortality, etc.), a direct comparison of the sigma values is impossible, because the standard deviation is a named value expressed in absolute numbers. In these cases the coefficient of variation (Cv) is used; it is a relative value, the percentage ratio of the standard deviation to the arithmetic mean.

The coefficient of variation is calculated using the formula:

$$C_v = \frac{\sigma}{M} \cdot 100\%$$

The higher the coefficient of variation, the greater the variability of the series. A coefficient of variation of more than 30% is considered to indicate qualitative heterogeneity of the population.

81. Standard deviation, calculation method, application.

An approximate method for assessing the variability of a variation series is to determine its limits and amplitude, but these do not take into account the values of the variants within the series. The main generally accepted measure of the variability of a quantitative characteristic within a variation series is the standard deviation (σ, sigma). The larger the standard deviation, the greater the variability of the series.

The method for calculating the standard deviation includes the following steps:

1. Find the arithmetic mean (M).

2. Determine the deviations of the individual variants from the arithmetic mean (d = V - M). In medical statistics, deviations from the mean are designated d (deviate). The sum of all deviations is zero.

3. Square each deviation (d²).

4. Multiply the squares of the deviations by the corresponding frequencies (d²·p).

5. Find the sum of the products (Σd²·p).

6. Calculate the standard deviation using the formula:

$$\sigma = \sqrt{\frac{\sum d^2 p}{n}}$$ when n is greater than 30, or

$$\sigma = \sqrt{\frac{\sum d^2 p}{n - 1}}$$ when n is less than or equal to 30, where n is the number of all variants.

Significance of the standard deviation:

1. The standard deviation characterizes the spread of the variants around the mean (i.e., the variability of the variation series). The larger the sigma, the greater the diversity of the series.

2. The standard deviation is used for a comparative assessment of the degree of correspondence of the arithmetic mean to the variation series for which it was calculated.

Variations of mass phenomena obey the law of normal distribution. The curve representing this distribution is a smooth, symmetrical, bell-shaped curve (Gaussian curve). According to probability theory, in phenomena that obey the normal distribution there is a strict mathematical relationship between the arithmetic mean and the standard deviation. The theoretical distribution of variants in a homogeneous variation series obeys the three-sigma rule.

If, in a rectangular coordinate system, the values of a quantitative characteristic (variants) are plotted on the abscissa and the frequency of occurrence of each variant in the variation series is plotted on the ordinate, then the variants with larger and smaller values are located symmetrically on either side of the arithmetic mean.

It has been established that with a normal distribution of the characteristic:

68.3% of the variant values lie within M ± 1σ

95.5% of the variant values lie within M ± 2σ

99.7% of the variant values lie within M ± 3σ

3. The standard deviation makes it possible to establish normal values for clinical and biological parameters. In medicine, the interval M ± 1σ is usually taken as the normal range for the phenomenon being studied. A deviation of the estimated value from the arithmetic mean by more than 1σ indicates a deviation of the studied parameter from the norm.

4. In medicine, the three-sigma rule is used in pediatrics for individual assessment of the level of physical development of children (the sigma deviation method) and for the development of standards for children's clothing.

5. The standard deviation is necessary to characterize the degree of diversity of the characteristic being studied and to calculate the error of the arithmetic mean.
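The percentages quoted for the three-sigma rule can be checked numerically. For a normal distribution, the share of values within k standard deviations of the mean equals erf(k/√2); the sketch below uses only the Python standard library, and the function name is illustrative.

```python
from math import erf, sqrt

def share_within(k):
    """Share of a normal distribution lying within k standard deviations of the mean."""
    return erf(k / sqrt(2))

for k in (1, 2, 3):
    print(f"within {k} sigma: {share_within(k) * 100:.2f}%")
```

The printed shares are approximately 68.27%, 95.45% and 99.73%, which the text rounds to 68.3%, 95.5% and 99.7%.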

Arithmetic mean and harmonic mean

The essence and meaning of average values, their types

The most common form of statistical indicator is the average value. An indicator in the form of an average expresses the typical level of a characteristic in the population. The widespread use of averages is explained by the fact that they make it possible to compare the values of a characteristic among units belonging to different populations. For example, one can compare the average length of the working day, the average wage category of workers, and the average wage level at different enterprises.

The essence of averages is that they cancel out the deviations in the values of a characteristic in individual units of the population that are caused by random factors. Therefore, averages must be calculated for sufficiently large populations (in accordance with the law of large numbers). The reliability of an average also depends on the variability of the attribute values in the population. In general, the smaller the variation of a characteristic and the larger the population from which the average is determined, the more reliable it is.

The typicality of an average is also directly related to the homogeneity of the statistical population. The average reflects the typical level of the attribute only when it is calculated from a qualitatively homogeneous population. Otherwise, the method of averages is used in combination with the grouping method: if the population is heterogeneous, the general averages are replaced or supplemented by group averages calculated for qualitatively homogeneous groups.

The choice of the type of average is determined by the economic content of the indicator under study and by the source data. The following types of averages are most often used in statistics: power means (arithmetic, harmonic, geometric, quadratic, cubic, etc.), the chronological mean, and structural averages (mode and median).

The arithmetic mean is the one most often found in socio-economic research. It is used in the form of a simple mean and a weighted mean.

The simple arithmetic mean is calculated from ungrouped data by formula (4.1):

$$\bar{x} = \frac{\sum x}{n}$$

where x denotes the individual values of the characteristic (variants);

n is the number of units in the population.

Example. It is required to find the average output of a worker in a team consisting of 15 people, if the number of products produced by one worker (pieces) is known: 21; 20; 20; 19; 21; 19; 18; 22; 19; 20; 21; 20; 18; 19; 20.
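The calculation for this example can be sketched as follows, applying the simple arithmetic mean (sum of the values divided by their number) to the 15 figures given in the text. The variable names are illustrative.

```python
# Output of each of the 15 workers, pieces (data from the example).
output = [21, 20, 20, 19, 21, 19, 18, 22, 19, 20, 21, 20, 18, 19, 20]

mean_output = sum(output) / len(output)  # simple arithmetic mean
print(f"{sum(output)} pieces / {len(output)} workers = {mean_output:.1f} pieces")
```

The total is 297 pieces, so the average output is 19.8 pieces per worker.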

The weighted arithmetic mean is calculated from grouped data by formula (4.2):

$$\bar{x} = \frac{\sum x f}{\sum f}$$

where f is the frequency of repetition of the corresponding value of the attribute (variant);

∑f is the total number of population units (∑f = n).

Example. Based on the available data on the distribution of workers in a team according to the number of products they produce, it is necessary to find the average output of a worker in the team.
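The distribution table for this example is not reproduced here. As a sketch, the code below assumes the distribution is obtained by grouping the 15 individual outputs from the previous example (an assumption, not the original table) and applies the weighted arithmetic mean, i.e. the sum of value times frequency divided by the sum of frequencies.

```python
from collections import Counter

# Assumed source data: the 15 individual outputs, grouped into a frequency table.
output = [21, 20, 20, 19, 21, 19, 18, 22, 19, 20, 21, 20, 18, 19, 20]
freq = Counter(output)  # pieces produced -> number of workers

for pieces in sorted(freq):
    print(f"{pieces} pieces: {freq[pieces]} workers")

weighted_mean = sum(x * f for x, f in freq.items()) / sum(freq.values())
print(f"weighted mean = {weighted_mean:.1f} pieces")
```

The weighted mean of the grouped data equals the simple mean of the ungrouped data, as it should, since grouping only counts repeated values.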

Note 1. The average value of a characteristic in the population can be calculated both from the individual values of the characteristic and from group (partial) averages calculated for individual parts of the population. In the latter case, the weighted arithmetic mean formula is used, with the group (partial) averages ($\bar{x}_j$) taken as the values of the characteristic.

Example. There is data on the average length of service of workers in the plant's workshops. It is required to determine the average length of service of workers for the plant as a whole.

Note 2. When the values of the characteristic being averaged are given as intervals, the midpoints of these intervals (x') are taken as the values of the characteristic in the groups when calculating the arithmetic mean. The interval series is thus converted into a discrete series. The width of open intervals, if any (as a rule, the first and the last), is conventionally taken to be equal to the width of the adjacent intervals.

Example. There is data on the distribution of enterprise workers by wage level.

The harmonic mean is a modification of the arithmetic mean. It is used when the individual values of the characteristic, i.e. the variants (x), and the products of the variants and their frequencies (xf = M) are known, but the frequencies themselves (f) are unknown.

The weighted harmonic mean is calculated using formula (4.3):

$$\bar{x} = \frac{\sum M}{\sum \frac{M}{x}}$$

Example. It is required to determine the average wages of employees of an association consisting of three enterprises, if the wage fund and the average wages of employees for each enterprise are known.

The simple harmonic mean is used extremely rarely in statistical practice. When xf = M = const, the weighted harmonic mean reduces to the simple harmonic mean (4.4):

$$\bar{x} = \frac{n}{\sum \frac{1}{x}}$$

Example. Two cars traveled the same route. One of them moved at a speed of 60 km/h, the other at a speed of 80 km/h. It is required to determine the average speed of the cars along the route.
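Because both cars cover the same distance, the quantity held constant is distance (speed times time), so the simple harmonic mean applies: n divided by the sum of the reciprocals of the speeds. A minimal sketch with an illustrative function name:

```python
def harmonic_mean(values):
    """Simple harmonic mean: n divided by the sum of reciprocals."""
    return len(values) / sum(1 / x for x in values)

speeds = [60, 80]  # km/h, from the example
average_speed = harmonic_mean(speeds)
print(f"average speed = {average_speed:.1f} km/h")
```

The result is 480/7, about 68.6 km/h, which is lower than the arithmetic mean of 70 km/h because the slower car spends more time on the route.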

Other types of power means. Chronological mean

The geometric mean is used to calculate average rates of change (dynamics). It is used in the form of a simple mean (for ungrouped data) and a weighted mean (for grouped data).

Simple geometric mean (4.5):

$$\bar{x} = \sqrt[n]{\prod x}$$

where n is the number of attribute values;

Π is the product sign.

Weighted geometric mean (4.6):

$$\bar{x} = \sqrt[\sum f]{\prod x^{f}}$$
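The simple and weighted geometric means can be sketched as below: the n-th root of the product of the values, and the root of degree Σf of the product of the values raised to their frequencies. The growth factors are hypothetical, and `math.prod` requires Python 3.8 or later.

```python
from math import prod

def geometric_mean(values):
    """Simple geometric mean: n-th root of the product of the values."""
    return prod(values) ** (1 / len(values))

def weighted_geometric_mean(values, freqs):
    """Weighted geometric mean: root of degree sum(f) of the product of x**f."""
    return prod(x ** f for x, f in zip(values, freqs)) ** (1 / sum(freqs))

# Hypothetical yearly growth factors of some indicator.
growth = [1.10, 1.20, 1.05]
print(f"average growth factor = {geometric_mean(growth):.4f}")
```

Multiplying the starting level by the average growth factor once per year reproduces the same final level as applying the three individual factors in turn, which is why this mean suits rates of change.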

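The quadratic mean (root mean square) is used when calculating measures of variation. It is used in simple and weighted form.

Simple quadratic mean (4.7):

$$\bar{x} = \sqrt{\frac{\sum x^2}{n}}$$

Weighted quadratic mean (4.8):

$$\bar{x} = \sqrt{\frac{\sum x^2 f}{\sum f}}$$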
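A short sketch of the simple and weighted quadratic means follows: the square root of the (weighted) average of the squared values. The function names and the sample numbers are illustrative only.

```python
from math import sqrt

def quadratic_mean(values):
    """Simple quadratic mean: square root of the mean of the squares."""
    return sqrt(sum(x * x for x in values) / len(values))

def weighted_quadratic_mean(values, freqs):
    """Weighted quadratic mean: squares are weighted by their frequencies."""
    return sqrt(sum(x * x * f for x, f in zip(values, freqs)) / sum(freqs))

values = [1, 7]  # hypothetical data
print(quadratic_mean(values))         # sqrt((1 + 49) / 2) = 5.0
print(sum(values) / len(values))      # arithmetic mean, 4.0
```

For the same data the quadratic mean is never smaller than the arithmetic mean; here 5.0 against 4.0.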

The cubic mean is used to calculate skewness and kurtosis. It is used in simple and weighted form.

Simple cubic mean (4.9):

$$\bar{x} = \sqrt[3]{\frac{\sum x^3}{n}}$$

In a discrete variation series, the mode is determined quite simply: by the maximum frequency. In an interval variation series, the mode approximately corresponds to the center of the modal interval, that is, the interval with the highest frequency. A more precise value is calculated by a formula whose terms include the frequency of the interval following the modal one.

The median (Me) is the value of the attribute located in the middle of the ranked series. A ranked series is a series ordered in ascending or descending order of attribute values. The median divides the ranked series into two parts, one of which has attribute values no greater than the median, and the other no less.

For a ranked series with an odd number of terms, the median is the variant located in the center of the series. The position of the median is determined by the serial number of the unit of the series in accordance with formula (4.13):

$$N_{Me} = \frac{n + 1}{2}$$

where n is the number of members of the ranked series.

For a ranked series with an even number of terms, the median is the arithmetic mean of the two adjacent values located in the center of the series.

In an interval series, the median is calculated by a formula whose terms include the frequency of the median interval.

Example. The workers of a team of 9 people have the following tariff categories: 4; 3; 4; 5; 3; 3; 6; 2; 6. It is required to determine the modal and median values of the tariff category.

Since this team has the most workers of the 3rd category, this category is modal, i.e. Mo = 3.

To determine the median, let us rank the original series in ascending order of attribute values:

2; 3; 3; 3; 4; 4; 5; 6; 6.

The central value in this series is the fifth value of the attribute. Accordingly, Me = 4.
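The same result can be reproduced with Python's standard `statistics` module, shown here as a quick check of the example rather than as part of the original method.

```python
from statistics import median, mode

categories = [4, 3, 4, 5, 3, 3, 6, 2, 6]  # tariff categories of the 9 workers

ranked = sorted(categories)               # ranked series, ascending
position = (len(ranked) + 1) // 2         # serial number of the median for odd n

print(ranked)             # [2, 3, 3, 3, 4, 4, 5, 6, 6]
print(mode(categories))   # 3
print(median(categories)) # 4
```

Here `position` is 5, so the median is the fifth value of the ranked series.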

Example. It is required to determine the modal and median tariff category of factory workers based on the data from the following distribution series.

Since the original distribution series is discrete, the modal value is determined by the maximum frequency. In this example, the plant has the most workers of the 3rd category (f max = 30), i.e. this category is modal (Mo = 3).

Let us determine the position of the median. The initial distribution series is built on the basis of a ranked series, ordered by increasing values of the attribute. The middle of the series lies between the 50th and 51st serial numbers of the attribute values. Let us find out which group the workers with these serial numbers belong to. To do this, we calculate the accumulated frequencies. The accumulated frequencies indicate that the median value of the tariff category is equal to three (Me = 3), since the values of the characteristic with serial numbers from 39 to 68, including 50 and 51, are equal to 3.
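The accumulated-frequency reasoning can be sketched in code. The original distribution table is not reproduced here, so the frequencies below are hypothetical, chosen only to agree with what the text states: 100 workers in total, 30 workers in category 3, and serial numbers 39 to 68 falling in that category.

```python
from itertools import accumulate

categories = [1, 2, 3, 4, 5, 6]
workers = [13, 25, 30, 20, 8, 4]        # hypothetical, except f = 30 for category 3
cumulative = list(accumulate(workers))  # accumulated frequencies

def value_at(position):
    """Category of the worker with the given 1-based serial number."""
    for category, cum in zip(categories, cumulative):
        if position <= cum:
            return category

n = cumulative[-1]
# For an even n, the median is the mean of the two central values.
median_category = (value_at(n // 2) + value_at(n // 2 + 1)) / 2
modal_category = categories[workers.index(max(workers))]

print(cumulative)                       # [13, 38, 68, 88, 96, 100]
print(median_category, modal_category)  # 3.0 3
```

Serial numbers 50 and 51 both fall in the group whose accumulated frequency first reaches 68, which is category 3.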

Example. It is required to determine the modal and median wages of factory workers based on the data from the following distribution series.

Since the original distribution series is an interval series, the modal value of wages is calculated using the formula. In this case, the modal interval is 360-420, with a maximum frequency of 30.

The median wage is also calculated using the formula. In this case, the median interval is 360-420: its accumulated frequency is 70, while the accumulated frequency of the preceding interval is only 40, with a total number of units equal to 100.
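The median formula itself is not shown in the text. As a sketch, assume it is the usual linear interpolation within the median interval: Me = x0 + h · (n/2 - S) / f, where x0 is the lower bound of the median interval, h its width, S the accumulated frequency of the preceding interval, and f the frequency of the median interval. The inputs below come from the example; f = 70 - 40 = 30.

```python
def interval_median(lower, width, n, cum_before, freq):
    """Median of an interval series by linear interpolation (assumed formula)."""
    return lower + width * (n / 2 - cum_before) / freq

median_wage = interval_median(
    lower=360,      # lower bound of the median interval 360-420
    width=60,       # width of the median interval
    n=100,          # total number of units
    cum_before=40,  # accumulated frequency of the preceding interval
    freq=70 - 40,   # frequency of the median interval
)
print(median_wage)  # 380.0
```

Under this assumption the median wage is 380, which lies inside the median interval as expected. The mode is not computed here because the frequencies of the intervals adjacent to the modal one are not given.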