Method of scatter and precision diagrams. Basic statistical parameters of a large and small sample population and their characteristics Small sample formula

When controlling the quality of goods in economic research, an experiment can be conducted on the basis of a small sample.

A small sample is a non-continuous statistical survey in which the sample population is formed from a relatively small number of units of the general population. The size of a small sample usually does not exceed 30 units and can be as small as 4-5 units.

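The average error of a small sample is calculated using the formula:

$$\mu_{ss} = \sqrt{\frac{S^2}{n}},$$

where $S^2$ is the small sample variance.

When determining the variance, the number of degrees of freedom is n - 1:

$$S^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}.$$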

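The marginal error of a small sample is determined by the formula

$$\Delta_{ss} = t \, \mu_{ss},$$

where $\mu_{ss}$ is the average error of the small sample.

In this case, the value of the confidence coefficient t depends not only on the given confidence probability, but also on the number of sampling units n. For individual values of t and n, the confidence probability of a small sample is determined using special Student tables (Table 9.1.), which give the distribution of standardized deviations:

$$t = \frac{\bar{x} - \bar{X}}{\mu_{ss}},$$

where $\bar{x}$ is the sample mean and $\bar{X}$ is the general mean.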
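The calculation described above can be sketched in Python. This is a minimal illustration, assuming the average error is taken as sqrt(S^2 / n) with S^2 computed on n - 1 degrees of freedom; the function name `small_sample_errors` and the data are hypothetical, and the coefficient t must be looked up by the caller in a Student table.

```python
import math
from statistics import mean, variance


def small_sample_errors(values, t):
    """Return (mean, S^2, average error, marginal error) for a small sample.

    `t` is the Student coefficient chosen by the caller for the required
    confidence probability and n - 1 degrees of freedom.
    """
    n = len(values)
    x_bar = mean(values)
    s2 = variance(values)      # corrected variance: divides by n - 1
    mu = math.sqrt(s2 / n)     # average error of the small sample
    delta = t * mu             # marginal error
    return x_bar, s2, mu, delta


# Hypothetical data: 5 measurements. t = 2.776 is the two-sided Student
# value for a confidence probability of 0.95 and 4 degrees of freedom.
sample = [10.0, 12.0, 11.0, 13.0, 14.0]
x_bar, s2, mu, delta = small_sample_errors(sample, t=2.776)
print(f"mean = {x_bar:.2f}, S^2 = {s2:.2f}, mu = {mu:.3f}, delta = {delta:.3f}")
print(f"confidence limits: [{x_bar - delta:.2f}, {x_bar + delta:.2f}]")
```

Passing t as an argument keeps the sketch free of any table lookup, so the same function serves for 0.95 and 0.99.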

Since, in practice, a confidence probability of 0.95 or 0.99 is usually adopted for a small sample, the following values of the Student distribution are used to determine the marginal error of a small sample:

Ways to generalize sample characteristics to the population.

The sampling method is most often used to obtain characteristics of the population according to the corresponding sample indicators. Depending on the purposes of the research, this is done either by direct recalculation of sample indicators for the general population, or by calculating correction factors.

Direct recalculation method. The sample share or the sample average is extended to the general population, taking the sampling error into account.

Thus, in trade, the number of non-standard products received in a consignment is determined. To do this (taking into account the accepted degree of probability), the share of non-standard products in the sample is multiplied by the number of products in the entire batch of goods.

Method of correction factors. It is used in cases where the purpose of the sampling method is to clarify the results of continuous accounting.

In statistical practice, this method is used to clarify data from annual censuses of livestock owned by the population. To do this, after generalizing the data from the complete census, a 10% sample survey is used to determine the so-called “percentage of undercounting”.

Methods for selecting units from the general population.

In statistics, various methods of forming sample populations are used, which is determined by the objectives of the study and depends on the specifics of the object of study.

The main condition for conducting a sample survey is to prevent the occurrence of systematic errors arising from violation of the principle of equal opportunity for each unit of the general population to be included in the sample. Prevention of systematic errors is achieved through the use of scientifically based methods for forming a sample population.

There are the following methods for selecting units from the population:

1) individual selection - individual units are selected for the sample;

2) group selection - the sample includes qualitatively homogeneous groups or series of units being studied;

3) combined selection is a combination of individual and group selection.

Selection methods are determined by the rules for forming a sample population.

The sample can be:

Purely random;

Mechanical;

Typical;

Serial;

Combined.

Purely random sampling means that the sample population is formed by random (unintentional) selection of individual units from the general population. In this case, the number of units selected for the sample population is usually determined from the accepted sample proportion.

The sample proportion is the ratio of the number of units in the sample population n to the number of units in the general population N, i.e.

$$\frac{n}{N}.$$

So, with a 5% sample from a batch of 2,000 units, the sample size n is 100 units (5 × 2000 / 100), and with a 20% sample it is 400 units (20 × 2000 / 100), and so on.

Mechanical sampling consists in the fact that the selection of units in the sample population is made from the general population, divided into equal intervals (groups). In this case, the size of the interval in the population is equal to the inverse of the sample proportion.

So, with a 2% sample, every 50th unit is selected (1:0.02), with a 5% sample, every 20th unit (1:0.05), etc.

Thus, in accordance with the accepted proportion of selection, the general population is, as it were, mechanically divided into groups of equal size. From each group, only one unit is selected for the sample.
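The arithmetic of the sample proportion and the mechanical selection interval can be sketched as follows. The function names are hypothetical, and the random starting position within the first interval is an assumption of this sketch, not something the text prescribes.

```python
import random


def sample_size(population_size, proportion_pct):
    """Number of units in the sample for a given sample proportion (in %)."""
    return round(population_size * proportion_pct / 100)


def selection_interval(proportion_pct):
    """Interval of mechanical selection: the inverse of the sample proportion."""
    return round(100 / proportion_pct)


def mechanical_sample(units, proportion_pct, start=None):
    """Take one unit from each interval of the ordered population."""
    step = selection_interval(proportion_pct)
    if start is None:
        start = random.randrange(step)  # assumed: random position in the first interval
    return units[start::step]


print(sample_size(2000, 5))    # 5% of 2,000 units
print(sample_size(2000, 20))   # 20% of 2,000 units
print(selection_interval(2))   # every 50th unit
print(selection_interval(5))   # every 20th unit
print(len(mechanical_sample(list(range(2000)), 5, start=3)))
```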

An important feature of mechanical sampling is that the formation of a sample population can be carried out without resorting to compiling lists. In practice, the order in which the units of the population are actually located is often used. For example, the sequence of exit of finished products from a conveyor or production line, the order of placement of units of a batch of goods during storage, transportation, sales, etc.

Typical sample. In typical sampling, the population is first divided into homogeneous typical groups. Then, from each typical group, a purely random or mechanical sample is used to individually select units into the sample population.

Typical sampling is usually used when studying complex statistical populations, for example in a sample survey of the labor productivity of trade workers, who fall into separate groups by qualification.

An important feature of a typical sample is that it gives more accurate results compared to other methods of selecting units in the sample population.

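To determine the average error of a typical sample, the following formulas are used:

for repeated selection

$$\mu = \sqrt{\frac{\overline{\sigma^2}}{n}},$$

for non-repeated selection

$$\mu = \sqrt{\frac{\overline{\sigma^2}}{n}\left(1 - \frac{n}{N}\right)},$$

where $\overline{\sigma^2}$ is the average of the within-group variances. It is determined by the formula:

$$\overline{\sigma^2} = \frac{\sum \sigma_i^2 n_i}{\sum n_i},$$

where $\sigma_i^2$ is the variance within group i and $n_i$ is the number of units selected from group i.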
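A short Python sketch of the typical-sample error follows. It assumes the error uses the average of the within-group variances weighted by the number of units selected from each group, with the factor (1 - n/N) applied only for non-repeated selection; the function names and the numbers are hypothetical.

```python
import math


def mean_within_group_variance(group_variances, group_sizes):
    """Average of within-group variances, weighted by group sample sizes."""
    total = sum(group_sizes)
    return sum(v * n for v, n in zip(group_variances, group_sizes)) / total


def typical_sample_error(group_variances, group_sizes, population_size=None):
    """Average error of a typical (stratified) sample.

    If `population_size` is given, the non-repeated correction (1 - n/N)
    is applied; otherwise the repeated-selection formula is used.
    """
    n = sum(group_sizes)
    err2 = mean_within_group_variance(group_variances, group_sizes) / n
    if population_size is not None:
        err2 *= 1 - n / population_size
    return math.sqrt(err2)


# Hypothetical example: two typical groups with 60 and 40 sampled units.
variances = [4.0, 9.0]
sizes = [60, 40]
print(mean_within_group_variance(variances, sizes))
print(typical_sample_error(variances, sizes))
print(typical_sample_error(variances, sizes, population_size=1000))
```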

In single-stage sampling, each selected unit is immediately studied according to a given characteristic. This is the case with purely random and serial sampling.

In multi-stage sampling, individual groups are selected from the general population, and individual units are then selected from the groups. This is how a typical sample is made with a mechanical method of selecting units into the sample population.

Combined sampling can be two-stage. In this case, the population is first divided into groups. Then the groups are selected, and within the latter the individual units are selected.

Statistical data processing on personal computers and mainframe computers. There are special programs designed for teaching students, which contain detailed explanations of all procedures and tests to test their mastery.

As already noted, in the case of a small sample, both confidence probabilities and confidence limits of the general mean can be calculated only for a normally distributed population.

For small samples, the calculation of the average possible error is based on sample variances, so

Small samples are widely used to solve problems involving testing statistical hypotheses, especially hypotheses about averages.

For example, for a sample of 32 units, a pairwise correlation coefficient of 0.319 was obtained. The number of degrees of freedom for it is 30, since the calculation of r involves two quantities whose values are fixed: $\bar{x}$ and $\bar{y}$. Because of this, two degrees of freedom are lost: 32 - 2 = 30. Since the critical value for 30 degrees of freedom (at a significance level of 0.05) is 0.3494, the obtained value is lower than the critical value in absolute value. Accordingly, the hypothesis that the characteristics are related has not been reliably proven. The conclusion that there is no connection is also incorrect: it has not been reliably proven either. The table in Appendix 5 shows that with a small sample only close connections can be reliably established, whereas with a large population, for example 102 units, weak connections can also be reliably measured. This conclusion is important for practical work on correlation analysis.
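The comparison in this example can be reproduced with a small sketch. It relies on the standard relation between the correlation coefficient and the Student statistic, t = r·sqrt(n - 2) / sqrt(1 - r^2), which is an assumption brought in here rather than something stated in the text; the function names are hypothetical, and the critical t of 2.042 (30 degrees of freedom, two-sided 0.05 level) is taken from a Student table.

```python
import math


def t_statistic_for_r(r, n):
    """Student statistic for a pairwise correlation coefficient r from n units."""
    df = n - 2
    return r * math.sqrt(df) / math.sqrt(1 - r * r)


def critical_r(t_crit, df):
    """Smallest |r| that is significant for a given critical t and df."""
    return t_crit / math.sqrt(t_crit ** 2 + df)


r, n = 0.319, 32
t_crit = 2.042  # table value for 30 degrees of freedom at the 0.05 level
r_crit = critical_r(t_crit, n - 2)
print(f"critical r = {r_crit:.4f}")
print(f"t for r = {t_statistic_for_r(r, n):.3f}")
print("significant" if abs(r) > r_crit else "not reliably established")
```

Inverting the relation gives the critical r directly, which is how the value close to 0.3494 for 30 degrees of freedom arises.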

This suggests that on average the actual number of patients is 1.5 times the predicted value, meaning that the forecasting model used typically underestimates the number of patients presenting. In this case, it may be worth analyzing the applied model and making adjustments to it. Ideally, the average error is zero, i.e., negative and positive error values cancel each other out. However, it must be said that in our example the average value was obtained from a very small sample. A larger sample, such as a full year of data, would allow the likely accuracy of the forecast to be determined with a greater degree of confidence.

The average and maximum errors of a small sample are determined by the formulas

For a complete series of 15 values, the homogeneity criterion (Var) Normality check for a truncated set of data (for the 7 remaining stores) shows that all three series of values are normal. However, this raises doubts about the legitimacy of using statistical procedures on such a small sample. Even if we ignore this fact, a dependence of the form z = a + b x + b2y will not provide the analyst with significant information in this case, since there is a strong interdependence (multicollinearity) between the factors xn; this is evidenced by the high value of the pair correlation coefficient (on the truncated sample, z = -0.88).

After preliminary compilation of the questionnaire, it must be tested on a small sample to identify possible errors. Testing is different from preliminary searching. Search helps to clarify the research plan; when testing, the developed plan is tested and the cost of its implementation is assessed. If the test results are considered satisfactory, the completed questionnaire is used to conduct research on the appropriate sample.

Based on the data presented, the assessment of the regression dependence Рк(рп), which was mentioned above, can be presented in the form of a correlation equation, based on any established form of statistical connection for the entire selected time interval of 26 years. Constructing regressions for shorter time periods would be unreliable precisely because of the small sample size (small sample).

Distribution of normalized deviations in a small sample. Values of t for which probability) = p

If Ek > 0, then the curve is peaked; for Ek, the Method of Moments, as a rule, leads to consistent estimates. However, with small samples, estimates may be significantly biased and inefficient. The method of moments is quite effective for estimating the parameters of normally distributed random variables.

In some cases, the cost of conducting a survey is used as the main argument in determining the sample size. Thus, the marketing research budget provides for the costs of conducting certain surveys, which cannot be exceeded. Obviously, the value of the information received is not taken into account. However, in some cases, a small sample can give fairly accurate results.

If, based on the results of a small sample, it can be unambiguously concluded that the batch is suitable or, conversely, unsuitable, then quality control costs very little. If the first sample does not give a clear answer, you can take another sample - a single larger sample of samples will give a more accurate result. The control principle may be as follows

Based on the assumption that the general population from which the sample under study is taken has a smooth distribution curve, it is natural to assume that the dips and outliers that appear during grouping are random “noise” generated by the randomness of certain values falling into a small sample. Coarsening the grouping intervals is a method of filtering out this random “noise”. However, when the intervals are too long, it is no longer the “noise” that is “filtered,” but the “signal” itself, i.e., the features of the desired distribution law begin to be smoothed out.

For each of the noted types and varieties of documents, copies are collected, obtained by making an additional copy when the corresponding document is prepared on a typewriter or computer. In the small sample collected, there are about 30 copies of documents for each type or variety, covering the main

How to deal with small samples

Thus, the two-sided confidence interval for a small sample would be represented as follows:

The root of our difficulties lies in sampling. As Leibniz once reminded Bernoulli, nature is so varied and so complex that it is difficult for us to draw correct conclusions from what we observe. We only have access to crumbs of reality, and this leads us to erroneous conclusions, or we interpret small samples as a full reflection of the characteristics of a larger population.

The quality of the progressive standards in force at the enterprise is characterized by the level of their intensity. The distribution of workers by individual labor productivity is usually close to the so-called normal distribution and deviates almost symmetrically (with some asymmetry to the right) in both directions from the average level of performance. Moreover, as the number of workers increases, deviations of individual labor productivity from the average increasingly cancel each other out. Based on the formula for the maximum sampling error, it can be stated with reasonable certainty that if the maximum deviation of the individual labor productivity of individual workers from the industry average does not exceed M%, then according to probability theory, the limit of deviations of the average labor productivity of n randomly selected workers from the average will be equal to M/√n %, or adjusted for a small sample from a large N population

The last reason can sometimes be eliminated by introducing appropriate adjustments. Thus, for interval estimates of error with a small number (n) of observations from a normal distribution (see p. 50), quantiles of Student's distribution (Table 6), characteristic of a small sample from a normal population (with unknown m and σ), are used.

A superficial look at the problem, small samples for research, when individual parts replace the entire problem.

However, the way yt, xt is calculated leads to the loss of the first observation (if we do not have the previous observation). The number of degrees of freedom will decrease by one, which is not so significant for large samples, but for small samples it can lead to a loss of efficiency. This problem is usually overcome by using the Prais-Winsten correction

To estimate a small sample, the corrected standard deviation of the small sample and Student's law of probability distribution are used.

The theory of small samples was developed by the English statistician W. Gosset (who wrote under the pseudonym Student) at the beginning of the 20th century. In 1908, he constructed a special distribution that makes it possible to relate t and the confidence probability F(t) even with small samples. For n > 100, the Student distribution tables give the same results as the Laplace probability integral tables, at 30

The likelihood ratio test is unbiased and consistent; for large samples, -2 log λ has a chi-squared distribution with r

    18. Small sample theory.

    With a large number of units in the sample population (n>100), the distribution of random errors of the sample mean in accordance with A.M. Lyapunov’s theorem is normal or approaches normal as the number of observations increases.

    However, in the practice of statistical research in a market economy, one increasingly has to deal with small samples.

    A small sample is a sample observation whose number of units does not exceed 30.

    When assessing the results of a small sample, the population size is not used. To determine possible error limits, the Student's test is used.

    The value of σ is calculated based on sample observation data.

    This value is used only for the population under study, and not as an approximate estimate of σ in the population.

    The probabilistic assessment of the results of a small sample differs from the assessment in a large sample in that with a small number of observations, the probability distribution for the average depends on the number of selected units.

    However, for a small sample, the value of the confidence coefficient t is related to the probability assessment differently than for a large sample (since the distribution law differs from normal).

    According to the distribution law established by Student, the probable error of the distribution depends both on the value of the confidence coefficient t and on the sample size n.

    The average error of a small sample is calculated using the formula:

    $$\mu_{ss} = \sqrt{\frac{S^2}{n}},$$

    where $S^2$ is the small sample variance.

    In a small sample, the correction coefficient n/(n-1) must be taken into account. When determining the variance $S^2$, the number of degrees of freedom is equal to n - 1:

    $$S^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}.$$

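    The marginal error of a small sample is determined by the formula

    $$\Delta_{ss} = t \, \mu_{ss},$$

    where $\mu_{ss}$ is the average error of the small sample.

    In this case, the value of the confidence coefficient t depends not only on the given confidence probability, but also on the number of sampling units n. For individual values of t and n, the confidence probability of a small sample is determined using special Student tables, which give the distribution of standardized deviations:

    $$t = \frac{\bar{x} - \bar{X}}{\mu_{ss}},$$

    where $\bar{x}$ is the sample mean and $\bar{X}$ is the general mean.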


    19. Methods for selecting units in the sample population.

    1. The sample population must be large enough in size.

    2. The structure of the sample population should best reflect the structure of the general population

    3. The selection method must be random

    Depending on whether units already selected can take part in further selection, a distinction is made between the non-repetitive and repeated methods.

    Non-repetitive selection is a selection in which a unit included in the sample does not return to the population from which further selection is carried out.

    Calculation of the average error of a non-repetitive random sample:

    $$\mu = \sqrt{\frac{\sigma^2}{n}\left(1 - \frac{n}{N}\right)}$$

    Calculation of the maximum error of a non-repetitive random sample:

    $$\Delta = t \sqrt{\frac{\sigma^2}{n}\left(1 - \frac{n}{N}\right)}$$

    During repeated selection, the unit included in the sample, after the observed characteristics have been recorded, is returned to the original (general) population to take part in the further selection procedure.

    The average error of repeated simple random sampling is calculated as follows:

    $$\mu = \sqrt{\frac{\sigma^2}{n}}$$

    Calculation of the maximum error of repeated random sampling:

    $$\Delta = t \sqrt{\frac{\sigma^2}{n}}$$
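The difference between the two selection schemes can be illustrated with a minimal Python sketch. It assumes the usual textbook forms, sqrt(σ²/n) for repeated selection and the same expression multiplied by (1 - n/N) under the root for non-repetitive selection; the function names and the numbers are hypothetical.

```python
import math


def mean_error_repeated(variance, n):
    """Average error of a repeated random sample."""
    return math.sqrt(variance / n)


def mean_error_non_repetitive(variance, n, population_size):
    """Average error of a non-repetitive random sample (with 1 - n/N)."""
    return math.sqrt(variance / n * (1 - n / population_size))


def maximum_error(t, mean_error):
    """Maximum (marginal) error for a given confidence coefficient t."""
    return t * mean_error


# Hypothetical figures: variance 25, sample of 100 units out of 2,000.
mu_rep = mean_error_repeated(25.0, 100)
mu_non = mean_error_non_repetitive(25.0, 100, 2000)
print(f"repeated: {mu_rep:.4f}, non-repetitive: {mu_non:.4f}")
print(f"maximum error at t = 2: {maximum_error(2, mu_non):.4f}")
```

The non-repetitive error is never larger than the repeated one, and the two converge when the sample is a small share of the population.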

    By type of formation, sample populations are divided into individual, group and combined.

    The selection method determines the specific mechanism for selecting units from the general population and may be: purely random; mechanical; typical; serial; combined.

    Purely random selection is the most common method of selection in a random sample; it is also called drawing lots. A ticket with a serial number is prepared for each unit of the statistical population, and the required number of units is then drawn at random. Under these conditions, each unit has the same probability of being included in the sample.
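Drawing lots, with or without returning the drawn unit to the population, can be sketched with Python's standard `random` module. The function name `draw_lots` and the `seed` parameter are hypothetical conveniences of this sketch.

```python
import random


def draw_lots(units, n, repeated=False, seed=None):
    """Select n units at random, each unit having an equal chance.

    repeated=False: non-repetitive selection (a drawn unit is not returned).
    repeated=True:  repeated selection (a drawn unit is returned and may
                    be drawn again).
    """
    rng = random.Random(seed)
    if repeated:
        return rng.choices(units, k=n)  # sampling with replacement
    return rng.sample(units, n)         # sampling without replacement


tickets = list(range(1, 101))  # serial numbers 1..100
print(draw_lots(tickets, 10, seed=1))
print(draw_lots(tickets, 10, repeated=True, seed=1))
```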

    Mechanical sampling. It is used in cases where the general population is ordered in some way, i.e. there is a certain sequence in the arrangement of units.

    To determine the average error of mechanical sampling, the formula for the average error of purely random non-repetitive sampling is used.

    Typical selection. It is used when all units in the general population can be divided into several typical groups. Typical selection involves selecting units from each group in a purely random or mechanical way.

    For a typical sample, the standard error depends on the accuracy of the group means. Thus, in the formula for the maximum error of a typical sample, the average of the group variances is taken into account, i.e.

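    Serial selection. It is used in cases where population units are combined into small groups or series. The essence of serial sampling lies in the purely random or mechanical selection of series, within which a complete examination of units is carried out.

    With serial sampling, the magnitude of the sampling error depends not on the number of units studied, but on the number of surveyed series (s) and on the magnitude of intergroup dispersion:

    $$\mu = \sqrt{\frac{\delta^2}{s}\left(1 - \frac{s}{S}\right)},$$

    where $\delta^2$ is the intergroup (between-series) variance and S is the total number of series in the general population; for repeated selection of series the factor in parentheses is omitted.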
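As a sketch of how the serial-sample error behaves, the following Python code computes the between-series variance from the means of the surveyed series and divides it by the number of series. It assumes equal-sized series (so a simple average of series means is appropriate) and applies the (1 - s/S) correction only when the total number of series S is supplied; all names and numbers are hypothetical.

```python
import math
from statistics import mean


def between_series_variance(series_means):
    """Intergroup variance: spread of series means around their overall mean."""
    overall = mean(series_means)
    return sum((m - overall) ** 2 for m in series_means) / len(series_means)


def serial_sample_error(series_means, total_series=None):
    """Average error of a serial sample with s surveyed series.

    If `total_series` (S) is given, the non-repeated correction (1 - s/S)
    is applied; otherwise the repeated-selection form is used.
    """
    s = len(series_means)
    err2 = between_series_variance(series_means) / s
    if total_series is not None:
        err2 *= 1 - s / total_series
    return math.sqrt(err2)


# Hypothetical means of a characteristic in 4 surveyed series out of 40.
means = [10, 12, 14, 16]
print(between_series_variance(means))
print(serial_sample_error(means))
print(serial_sample_error(means, total_series=40))
```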

    Combined selection may go through one or more stages. A sample is called single-stage if the units of the population, once selected, are studied directly.

    A sample is called multi-stage if the selection takes place in successive stages and each stage of selection has its own unit of selection.

    Small sample method

    The main advantage of the small sample method is the ability to evaluate the dynamics of the process over time, reducing the time for computational procedures.

    Instantaneous samples of 5 to 20 units are taken at random at certain intervals of time. The sampling period is established empirically and depends on the stability of the process, determined by analyzing a priori information.

    For each instantaneous sample, the main statistical characteristics are determined. The instantaneous samples and their main statistical characteristics are presented in Appendix B.

    A hypothesis about the homogeneity of sample dispersion is put forward and tested using one of the possible criteria (Fisher’s criterion).

    Testing the hypothesis about the homogeneity of sample characteristics.

    To check the significance of the difference between arithmetic means in 2 series of measurements, measure G is introduced. Calculations are given in Appendix B

    The decision rule is formulated as follows:

    where tP is the value of the quantile of the normalized distribution at a given confidence probability P; α = 0.095, n = 10, tP = 2.78.

    When the inequality is satisfied, the hypothesis is confirmed that the difference between the sample means is not significant.

    Since the inequality is satisfied in all cases, the hypothesis that the difference between the sample means is not significant is confirmed.

    To test the hypothesis about the homogeneity of sample variances, the measure F0 is introduced as the ratio of the unbiased estimates of the variances of the results of the 2 series of measurements. The larger of the 2 estimates is taken as the numerator, so if Sx1 > Sx2, then

    $$F_0 = \frac{S_{x1}^2}{S_{x2}^2}.$$

    The calculation results are given in Appendix B.

    Then the value of the confidence probability P is specified and the value of F(K1; K2; α/2) is determined with K1 = n1 - 1 and K2 = n2 - 1.

    With P = 0.025 and K1 = 10 - 1 = 9 and K2 = 10 - 1 = 9, F(9; 9; 0.025/2) = 4.1.

    Decision rule: if F(K1; K2; α/2) > F0, then the hypothesis about the homogeneity of variances in the two samples is accepted.

    Since the condition F(K1; K2; α/2) > F0 is satisfied in all cases, the hypothesis of homogeneity of variances is accepted.
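The decision rule for the variance ratio can be sketched in Python as follows. The critical value is passed in by the caller (here the 4.1 quoted in the text is used purely as an example); the function names and the measurement data are hypothetical.

```python
from statistics import variance


def variance_ratio(sample_1, sample_2):
    """F0: ratio of unbiased variance estimates, larger one in the numerator."""
    v1, v2 = variance(sample_1), variance(sample_2)
    return max(v1, v2) / min(v1, v2)


def variances_homogeneous(sample_1, sample_2, f_critical):
    """Accept homogeneity of variances when F0 is below the critical value."""
    return variance_ratio(sample_1, sample_2) < f_critical


# Hypothetical instantaneous samples of 10 measurements each.
series_1 = [10.0, 10.2, 9.9, 10.1, 10.0, 9.8, 10.3, 10.1, 9.9, 10.2]
series_2 = [10.1, 10.4, 9.7, 10.0, 10.2, 9.9, 10.3, 9.8, 10.1, 10.0]
f0 = variance_ratio(series_1, series_2)
print(f"F0 = {f0:.3f}")
print("homogeneous" if variances_homogeneous(series_1, series_2, 4.1) else "not homogeneous")
```

Taking the larger estimate as the numerator keeps F0 at or above 1, so only the upper critical value is needed.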

    Thus, the hypothesis about the homogeneity of sample variances is confirmed, which indicates the stability of the process; the hypothesis about the homogeneity of sample means using the method of comparison of means is confirmed, this means that the center of dispersion has not changed and the process is in a stable state.

    Scatter and precision plot method

    Over a certain period of time, instant samples of 3 to 10 products are taken and the statistical characteristics of each sample are determined.

    The obtained data are plotted on diagrams, with time or the sample number k on the abscissa axis and, on the ordinate axis, the individual values xk or the value of one of the statistical characteristics (sample arithmetic mean, sample standard deviation). In addition, two horizontal lines Тв and Тн, which bound the tolerance range of the product, are drawn on the diagram.
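The data behind such a diagram can be prepared with a short Python sketch. It computes, for each instantaneous sample, the arithmetic mean and standard deviation and checks the individual values against the tolerance limits; the function name, the parameters `t_lower` and `t_upper` (standing for Тн and Тв) and the measurements are hypothetical.

```python
from statistics import mean, stdev


def chart_points(samples, t_lower, t_upper):
    """Per-sample statistics for a scatter and precision diagram."""
    points = []
    for k, sample in enumerate(samples, start=1):
        points.append({
            "k": k,                      # sample number on the abscissa
            "mean": mean(sample),        # sample arithmetic mean
            "stdev": stdev(sample),      # sample standard deviation
            # True when every individual value lies inside the tolerance range
            "within_tolerance": all(t_lower <= x <= t_upper for x in sample),
        })
    return points


# Hypothetical instantaneous samples of 3 products each.
samples = [
    [10.0, 10.2, 9.9],
    [10.4, 10.6, 10.5],
    [9.8, 10.0, 10.1],
]
points = chart_points(samples, t_lower=9.7, t_upper=10.5)
for p in points:
    print(p["k"], round(p["mean"], 3), round(p["stdev"], 3), p["within_tolerance"])
```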

    Instantaneous samples are given in Appendix B.


    Figure 1. Accuracy chart

    The diagram clearly shows the progress of the production process. It can be used to detect that the production process is unstable.

    In addition to the purely random sample with its clear probabilistic justification, there are other samples that are not completely random but are widely used, because strict purely random selection of units from the general population is not always possible in practice. Such samples include mechanical, typical, serial (or nested), multiphase and a number of others.

    It is rare for a population to be homogeneous; this is the exception rather than the rule. Therefore, when there are different types of phenomena in the population, it is often desirable to ensure a more even representation of the different types in the sample. This goal is successfully achieved by using typical sampling. The main difficulty is that we must have additional information about the entire population, which in some cases is difficult.

    A typical sample is also called a stratified sample; it is also used to obtain a more uniform representation of different regions in the sample, and in this case the sample is called regionalized.

    Thus, a typical sample is understood as a sample in which the general population is divided into typical subgroups formed according to one or more essential characteristics (for example, the population is divided into 3-4 subgroups according to average per capita income or level of education: primary, secondary, higher, etc.). Next, units can be selected for the sample from all typical groups in several ways, forming:

    • a) a typical sample with uniform placement, where an equal number of units are selected from different types (layers). This scheme works well if in the population the layers (types) do not differ very much from each other in the number of units;
    • b) typical sampling with proportional placement, when it is required (as opposed to uniform placement) that the proportion (%) of selection for all strata be the same (for example, 5 or 10%);
    • c) a typical sample with optimal placement, when the degree of variation of characteristics in different groups of the general population is taken into account. With this placement, the proportion of selection for groups with large variability of the trait increases, which ultimately leads to a decrease in random error.
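The three placement schemes listed above can be compared with a small Python sketch. Uniform and proportional placement follow directly from the descriptions; for optimal placement the sketch assumes the common rule of allocating in proportion to stratum size times stratum standard deviation (often called Neyman allocation), which the text does not name explicitly. Function names and figures are hypothetical, and the results are left unrounded.

```python
def uniform_allocation(n, stratum_sizes):
    """Equal number of sampled units from every stratum."""
    k = len(stratum_sizes)
    return [n / k] * k


def proportional_allocation(n, stratum_sizes):
    """Same selection proportion in every stratum."""
    total = sum(stratum_sizes)
    return [n * size / total for size in stratum_sizes]


def optimal_allocation(n, stratum_sizes, stratum_stdevs):
    """Assumed rule: allocate in proportion to size * standard deviation."""
    weights = [size * sd for size, sd in zip(stratum_sizes, stratum_stdevs)]
    total = sum(weights)
    return [n * w / total for w in weights]


sizes = [500, 300, 200]    # units in each typical group
stdevs = [2.0, 5.0, 10.0]  # variability of the characteristic in each group
print(uniform_allocation(100, sizes))
print(proportional_allocation(100, sizes))
print([round(x, 1) for x in optimal_allocation(100, sizes, stdevs)])
```

In this example the smallest but most variable group receives the largest share under the assumed optimal rule, in line with the description in item c).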

    The formula for the average error in a typical selection is similar to the usual sampling error for a purely random sample, with the only difference being that instead of the total variance, the average of the particular within-group variances is entered, which naturally leads to a decrease in error compared to a purely random sample. However, its use is not always possible (for many reasons). If there is no need for great precision, it is easier and cheaper to use serial sampling.

    Serial (cluster) sampling means that what is selected for the sample is not individual units of the population (for example, students) but whole series, or nests (for example, study groups). In other words, with serial (cluster) sampling, the observation unit and the sampling unit do not coincide: certain groups of adjacent units (nests) are selected, and the units included in these nests are subject to examination. So, for example, in a sample survey of housing conditions, we can randomly select a certain number of houses (sampling units) and then find out the living conditions of the families living in these houses (observation units).

    Series (nests) consist of units connected to each other territorially (districts, cities, etc.), organizationally (enterprises, workshops, etc.) or in time (for example, a set of units of products produced over a given period of time).

    Serial selection can be organized in the form of single-stage, two-stage or multi-stage selection.

    Randomly selected series are studied in full. Thus, serial sampling consists of two steps: random selection of series and a complete study of these series. Serial selection provides significant savings in manpower and resources and is therefore often used in practice. The error of serial selection differs from the error of purely random selection in that the interseries (intergroup) variance is used instead of the total variance, and the number of series is used instead of the sample size. The accuracy is usually not very high, but in some cases it is acceptable. A serial sample can be repeated or non-repetitive, and the series can be of equal or unequal size.
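
    The error rule just described can be sketched as follows for series of equal size. The function name `serial_sampling_error` and the input figures are hypothetical. The sketch assumes the usual textbook form: interseries variance divided by the number of selected series, with the factor (1 - r/R) added for non-repetitive selection.

```python
from math import sqrt


def serial_sampling_error(series_means, total_series=None):
    """Average error of a serial (cluster) sample with equal-sized series.

    series_means - mean of the trait in each selected series
    total_series - number of series R in the general population;
                   if given, non-repetitive selection is assumed
    """
    r = len(series_means)
    overall_mean = sum(series_means) / r
    # interseries (intergroup) variance of the series means
    between_var = sum((m - overall_mean) ** 2 for m in series_means) / r
    squared_error = between_var / r  # number of series replaces sample size
    if total_series is not None:
        squared_error *= 1 - r / total_series  # non-repetitive correction
    return sqrt(squared_error)


# Illustrative data: means of 4 selected series out of 20 in the population.
means = [10, 12, 14, 16]
print(serial_sampling_error(means))      # repeated selection
print(serial_sampling_error(means, 20))  # non-repetitive selection
```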

    Serial sampling can be organized according to different schemes. For example, a sample population can be formed in two stages: first, the series to be surveyed are selected at random; then, from each selected series, a certain number of units are also selected at random for direct observation (measurement, weighing, etc.). The error of such a sample depends both on the error of serial selection and on the error of individual selection. Multi-stage selection, as a rule, gives less accurate results than single-stage selection, because representativeness errors arise at each sampling stage. In this case, the sampling error formula for combined sampling must be used.

    Another form of selection is multiphase selection (with one, two, three or more phases). It differs in structure from multi-stage selection, since in multiphase selection the same selection units are used in each phase. Errors in multiphase sampling are calculated for each phase separately. The main feature of a two-phase sample is that the samples differ from each other according to three criteria: 1) the proportion of units studied in the first phase that are included again in the second and subsequent phases; 2) whether each sample unit of the first phase keeps an equal chance of being studied again; 3) the size of the interval separating the phases from each other.

    Let us dwell on one more type of selection, namely mechanical (or systematic) selection. It is probably the most common type, apparently because it is the simplest of all the selection techniques. In particular, it is much simpler than random selection, which requires the ability to use tables of random numbers. Mechanical selection also does not require additional information about the population and its structure. In addition, mechanical selection is closely related to proportional stratified selection, which leads to a reduction in sampling error.

    For example, the use of mechanical selection of members of a housing cooperative from a list compiled in the order of admission to this cooperative will ensure proportional representation of cooperative members with different lengths of experience. Using the same technique to select respondents from an alphabetical list of individuals ensures equal chances for surnames beginning with different letters, etc. The use of time sheets or other lists at enterprises or educational institutions, etc. can ensure the necessary proportionality in the representation of workers with different lengths of experience. Note that mechanical selection is widely used in sociology, in the study of public opinion, etc.
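
    A minimal sketch of mechanical selection from an ordered list is given below. The function name `mechanical_sample` and the list of members are hypothetical, and the sketch assumes a fixed step with a random starting point inside the first interval, which is one common way of carrying out this selection.

```python
import random


def mechanical_sample(units, n, rng=None):
    """Select n units from an ordered list at a fixed step (systematic selection)."""
    rng = rng or random.Random()
    step = len(units) // n  # selection interval
    if n < 1 or step < 1:
        raise ValueError("sample size must be between 1 and len(units)")
    start = rng.randrange(step)  # random start within the first interval
    return [units[start + i * step] for i in range(n)]


# Illustrative list: 100 cooperative members in order of admission.
members = [f"member_{i:03d}" for i in range(1, 101)]
print(mechanical_sample(members, 10, random.Random(1)))
```

    Because the list is ordered by admission date, every tenth member is taken and all periods of admission are represented in proportion. If the list length is not a multiple of n, the last few units of the list are never selected in this simplified version.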

    In order to reduce the magnitude of error and especially the costs of conducting a sampling study, various combinations of individual types of selection (mechanical, serial, individual, multiphase, etc.) are widely used. In such cases, more complex sampling errors should be calculated, which consist of errors occurring at different stages of the study.

    A small sample is a sample of fewer than 30 units. Small samples occur quite often in practice, for example when studying rare diseases or units possessing a rare trait. A small sample is also used when the research is expensive or involves the destruction of products or samples. Small samples are widely used in product quality surveys. The theoretical foundations for determining small sample errors were laid by the English scientist W. Gosset (pseudonym Student).

    It must be remembered that when determining the error for a small sample, the value (n - 1) should be taken instead of the sample size n; alternatively, before determining the average sampling error, the so-called corrected sample variance should be calculated (with (n - 1) in the denominator instead of n). Note that this correction is made only once: either when calculating the sample variance or when determining the error. The value (n - 1) is called the number of degrees of freedom. In addition, the normal distribution is replaced by the t-distribution (Student distribution), which is tabulated and depends on the number of degrees of freedom. The only parameter of the Student distribution is the value (n - 1). Let us emphasize once again that the correction (n - 1) is important and significant only for small samples; for n > 30 the difference practically disappears, approaching zero.
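
    The calculation described above can be sketched as follows, with n denoting the sample size. The function name `small_sample_interval` and the measurements are hypothetical. The table holds standard two-sided Student critical values for a confidence probability of 0.95 and 1 to 10 degrees of freedom; for other probabilities or sample sizes a full Student table is needed.

```python
from math import sqrt
from statistics import mean, variance  # variance() divides by n - 1

# Two-sided Student critical values for confidence probability 0.95,
# keyed by degrees of freedom (n - 1).
T_095 = {1: 12.706, 2: 4.303, 3: 3.182, 4: 2.776, 5: 2.571,
         6: 2.447, 7: 2.365, 8: 2.306, 9: 2.262, 10: 2.228}


def small_sample_interval(data, t_table=T_095):
    """Confidence interval for the mean of a small sample."""
    n = len(data)
    df = n - 1  # degrees of freedom
    if df not in t_table:
        raise ValueError("no tabulated t value for this sample size")
    s2 = variance(data)   # corrected variance: the n - 1 correction is made here
    mu = sqrt(s2 / n)     # average error; no second correction is applied
    delta = t_table[df] * mu  # marginal error
    centre = mean(data)
    return centre - delta, centre + delta


# Illustrative measurements from five tested items.
print(small_sample_interval([10, 12, 11, 13, 14]))
```

    Dividing the corrected variance by n gives the same result as dividing the uncorrected variance by (n - 1), which is why the correction must be applied only once.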

    So far we have been talking about random samples, i.e. samples in which the selection of units from the population is random (or almost random) and all units have an equal (or almost equal) probability of being included. However, units can also be selected non-randomly, when accessibility and purposefulness come first. In such cases one cannot speak of the representativeness of the resulting sample, and representativeness errors can be calculated only if information about the general population is available.

    There are several well-known schemes for forming a non-random sample, which have become widespread and are used mainly in sociological research: selection of available observation units, selection according to the Nuremberg method, targeted sampling when identifying experts, etc. Quota sampling is also important: the researcher forms it on the basis of a small number of significant parameters, and it gives a very close match to the general population. In other words, quota selection should provide the researcher with almost complete coincidence of the sample and general populations with respect to the chosen parameters. Purposefully bringing the two populations close together, but only for a limited range of indicators, is achieved, as a rule, with a sample of significantly smaller size than random selection would require. It is this circumstance that makes quota selection attractive for a researcher who is unable to rely on a large self-weighting random sample. A reduction in sample size usually also means a reduction in monetary costs and research time, which adds to the advantages of this selection method. Note also that quota sampling requires quite substantial preliminary information about the structure of the population. The selected characteristics (most often socio-demographic: gender, age, education) should correlate closely with the studied characteristics of the general population, i.e. the object of research.

    As already indicated, the sampling method makes it possible to obtain information about the general population with much less money, time and effort than with continuous observation. It is also clear that a complete study of the entire population is impossible in some cases, for example, when checking the quality of products, samples of which are destroyed.

    At the same time, it should be pointed out that the population is not a complete "black box": we still have some information about it. When conducting, for example, a sample study of the everyday life, property status, income and expenses of students, their opinions, interests, etc., we still have information about their total number and their grouping by gender, age, marital status, place of residence, year of study and other characteristics. This information is always used in sample research.

    There are two main ways of extending sample characteristics to the general population: the method of direct recalculation and the method of correction factors. Sample characteristics are recalculated, as a rule, taking confidence intervals into account, and the result can be expressed in absolute or relative values.
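
    A minimal sketch of direct recalculation for a share, such as the share of non-standard products in a batch, is given below. The function name `direct_recalculation` and the figures are hypothetical. The sketch assumes purely random non-repetitive selection and uses t = 2, which for large samples corresponds to a confidence probability of about 0.954.

```python
from math import sqrt


def direct_recalculation(sample_n, sample_hits, population_n, t=2.0):
    """Extend a sample share to the population as a range of absolute counts.

    sample_n     - number of units in the sample
    sample_hits  - units in the sample that have the trait
    population_n - number of units in the general population
    t            - confidence coefficient (assumed value, see lead-in)
    """
    w = sample_hits / sample_n  # sample share
    # average error of the share for non-repetitive random selection
    mu = sqrt(w * (1 - w) / sample_n * (1 - sample_n / population_n))
    delta = t * mu  # marginal error
    low = max(0.0, w - delta) * population_n
    high = min(1.0, w + delta) * population_n
    return low, high


# Illustrative batch: 10 non-standard items found among 100 sampled out of 1000.
low, high = direct_recalculation(100, 10, 1000)
print(f"non-standard items in the batch: from {low:.0f} to {high:.0f}")
```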

    It is quite appropriate to emphasize here that most of the statistical information relating to the economic life of society in its most diverse manifestations and types is based on sample data. Of course, they are supplemented by complete registration data and information obtained as a result of censuses (of population, enterprises, etc.). For example, all budget statistics (on income and expenses of the population) provided by Rosstat are based on data from a sample study. Information on prices, production volumes, and trade volumes, expressed in the corresponding indices, is also largely based on sample data.