Assessing the significance of differences between results. Reliability and statistical significance

Statistical reliability is essential in the computational practice of physical culture and sports (FCS). As noted earlier, several samples can be drawn from the same general population:

if they are drawn correctly, their mean indicators and the indicators of the general population differ from one another only within the representativeness error, at the accepted confidence level;

if they are drawn from different general populations, the difference between them turns out to be significant. Comparing samples is the essence of statistics;

if samples differ insignificantly, i.e., non-fundamentally, and actually belong to the same general population, the difference between them is called statistically unreliable.

A statistically reliable difference is one in which the compared samples differ significantly and fundamentally, that is, they belong to different general populations.

In FCS practice, assessing the statistical significance of differences between samples means solving a range of practical problems. For example, the introduction of new teaching methods, programs, exercise sets, tests, and control exercises is associated with their experimental verification, which should show that the test group differs fundamentally from the control group. For this purpose, special statistical methods, called criteria of statistical significance, are applied to detect the presence or absence of a statistically significant difference between samples.

All criteria are divided into two groups: parametric and non-parametric. Parametric criteria require a normal distribution law, which means that the main indicators of the normal law, the arithmetic mean and the standard deviation s, must be determined. Parametric criteria are the most accurate and reliable. Non-parametric criteria are based on the rank (ordinal) differences between the sample elements.

The main criteria of statistical significance used in FCS practice are the Student's test and the Fisher test.

The Student's t-test is named after the English scientist W. Gosset ("Student" was his pseudonym), who developed this method. The Student's t-test is parametric and is used to compare the absolute indicators (means) of samples. The samples may differ in size.

The Student's t-test is applied in the following sequence.

1. Find the Student's t value using the formula

t = |x̄1 − x̄2| / √(m1² + m2²),

where x̄1, x̄2 are the arithmetic means of the compared samples, and m1, m2 are the representativeness errors determined from the indicators of the compared samples.

2. FCS practice has shown that for sports research it is sufficient to accept a confidence level of P = 0.95.

For the confidence level P = 0.95 (α = 0.05) and the number of degrees of freedom k = n1 + n2 − 2, we find the critical (boundary) value of the criterion t_gr from the table in Appendix 4.

3. Based on the properties of the normal distribution law, the computed value t is compared with the critical value t_gr.

We draw the conclusions:

if t ≥ t_gr, the difference between the compared samples is statistically significant;

if t < t_gr, the difference is statistically insignificant.
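As a sketch, the three steps above can be carried out programmatically. The sample means and representativeness errors below are hypothetical (not from the text); t_gr = 2.02 is the table value quoted later for k = 34:

```python
import math

def student_t(mean1, mean2, m1, m2):
    # Student's criterion: t = |x1 - x2| / sqrt(m1^2 + m2^2),
    # where m1, m2 are the representativeness errors of the samples
    return abs(mean1 - mean2) / math.sqrt(m1 ** 2 + m2 ** 2)

# Hypothetical means and representativeness errors of two samples
t = student_t(52.4, 49.1, 1.2, 1.0)
t_gr = 2.02  # critical value from the table (P = 0.95, k = 34)

if t >= t_gr:
    print("the difference between the samples is statistically significant")
else:
    print("the difference is statistically insignificant")
```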

For researchers in the field of FCS, assessing statistical significance is the first step in solving a specific problem: establishing whether the compared samples differ fundamentally or not. The next step is to evaluate this difference from the pedagogical point of view, which is determined by the conditions of the problem.

Let us consider the application of the Student's test using a specific example.

Example 2.14. In a group of 18 subjects, heart rate (beats/min) was measured before (x_i) and after (y_i) a warm-up.

Assess the effectiveness of the warm-up based on heart rate. The initial data and calculations are presented in Tables 2.30 and 2.31.

Table 2.30

Processing heart rate indicators before warming up


The errors for both groups coincided, since the sample sizes are equal (the same group is studied under different conditions), and the standard deviations were s_x = s_y = 3 beats/min. Let us proceed to determining the Student's test:

We set the confidence level: P = 0.95.

The number of degrees of freedom is k = n1 + n2 − 2 = 18 + 18 − 2 = 34. From the table in Appendix 4 we find t_gr = 2.02.

Statistical inference. Since t = 11.62 and the critical value t_gr = 2.02, we have 11.62 > 2.02, i.e., t > t_gr; therefore, the difference between the samples is statistically significant.

Pedagogical conclusion. In terms of heart rate, the difference between the group's state before and after the warm-up is statistically significant, i.e., substantial and fundamental. Thus, based on the heart-rate indicator, the warm-up can be judged effective.
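Example 2.14 can be checked from its summary statistics. The individual heart-rate values are in the tables not reproduced here, so the mean difference below is back-computed from the reported t = 11.62; treat it as an assumption for illustration:

```python
import math

n = 18
s = 3.0                   # beats/min; s_x = s_y = 3 for both measurements
m = s / math.sqrt(n)      # representativeness error of each sample, about 0.71

# With sqrt(m^2 + m^2) = 1.0, the reported t = 11.62 corresponds to a
# difference of about 11.62 beats/min between the means before and after
diff = 11.62
t = diff / math.sqrt(m ** 2 + m ** 2)

print(round(t, 2), t > 2.02)  # 11.62 True
```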

The Fisher criterion is parametric. It is used to compare the variances of samples. In the practice of physical culture and sports this usually means comparing the stability of sports results or the stability of functional and technical indicators. The samples may differ in size.

The Fisher criterion is defined in the following sequence.

1. Find the Fisher criterion F using the formula

F = s1² / s2² (s1² ≥ s2²),

where s1², s2² are the variances of the compared samples.

The conditions of the Fisher criterion stipulate that the larger variance is placed in the numerator of the formula, so the number F is always at least one.

2. We set the confidence level P = 0.95 and determine the number of degrees of freedom for both samples: k1 = n1 − 1, k2 = n2 − 1.

3. Using the table in Appendix 4, we find the critical value of the criterion, F_gr.

4. Comparing F with F_gr allows us to formulate the conclusions:

if F > F_gr, the difference between the samples is statistically significant;

if F < F_gr, the difference between the samples is statistically insignificant.
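A minimal sketch of this procedure (the variances are hypothetical, chosen to mirror the F = 2.5 vs. F_gr = 2.4 comparison of Example 2.15 below):

```python
def fisher_f(var1, var2):
    # the larger variance goes in the numerator, so F >= 1 by construction
    s_max, s_min = max(var1, var2), min(var1, var2)
    return s_max / s_min

F = fisher_f(0.0125, 0.0050)   # hypothetical sample variances
F_gr = 2.4                     # table value for P = 0.95 (Appendix 6)

if F > F_gr:
    print("the difference between the samples is statistically significant")
else:
    print("the difference between the samples is statistically insignificant")
```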

Let us consider a specific example.

Example 2.15. Consider two groups of handball players: x_i (n1 = 16 people) and y_i (n2 = 18 people). In both groups, the take-off time (s) when throwing the ball into the goal was measured.

Are the take-off indicators homogeneous?

The initial data and basic calculations are presented in Tables 2.32 and 2.33.

Table 2.32

Processing of the take-off indicators of the first group of handball players


Let us define the Fisher criterion:





From the data in the table in Appendix 6 we find F_gr = 2.4.

Note that the table in Appendix 6 lists the numbers of degrees of freedom for both the larger and the smaller variance, and the grid becomes coarser as the numbers grow. Thus, the degrees of freedom of the larger variance follow the order 8, 9, 10, 11, 12, 14, 16, 20, 24, etc., and those of the smaller one 28, 29, 30, 40, 50, etc.

This is explained by the fact that as the sample size increases, the differences in the F-test decrease, so tabular values close to the original data can be used. Thus, in Example 2.15 the value k = 17 is absent from the table, and we can take the closest value, k = 16, which gives F_gr = 2.4.

Statistical inference. Since the Fisher test gives F = 2.5 > F_gr = 2.4, the difference between the samples is statistically significant.

Pedagogical conclusion. The take-off times (s) when throwing the ball into the goal differ significantly between the two groups of handball players. These groups should be considered different.

Further research should reveal the reason for this difference.

Example 2.20 (on the statistical reliability of samples). Has the football player's skill improved, if the time (s) from the signal to the kick was x_i at the beginning of the training period and y_i at its end?

The initial data and basic calculations are given in Tables 2.40 and 2.41.

Table 2.40

Processing of the time indicators from signal to kick at the beginning of training


Let us determine the difference between groups of indicators using the Student’s criterion:

With confidence level P = 0.95 and degrees of freedom k = n1 + n2 − 2 = 22 + 22 − 2 = 42, we find t_gr = 2.02 from the table in Appendix 4. Since t = 8.3 > t_gr = 2.02, the difference is statistically significant.

Let us determine the difference between groups of indicators using Fisher’s criterion:


From the table in Appendix 2, with confidence level P = 0.95 and degrees of freedom k = 22 − 1 = 21, the critical value is F_gr = 2.1. Since F = 1.53 < F_gr = 2.1, the difference in the dispersion of the initial data is statistically insignificant.

Statistical inference. In terms of the arithmetic mean, the difference between the groups of indicators is statistically significant. In terms of variance, the difference between the groups of indicators is statistically unreliable.

Pedagogical conclusion. The football player's skill has improved significantly, but attention should be paid to the stability of his performance indicators.

Preparing for work

Before performing this laboratory work in the discipline "Sports Metrology," all students of the study group must form work teams of 3-4 students each, in order to jointly complete the assignments of all the laboratory work.

In preparation for the work, read the relevant sections of the recommended literature (see Section 6 of these methodological instructions) and the lecture notes. Study Sections 1 and 2 of this laboratory work, as well as its work assignment (Section 4).

Prepare a report form on standard A4 sheets of writing paper and fill it in with the materials required for the work.

The report must contain:

a title page indicating the department (UC and TR), study group, the student's surname, first name, and patronymic, the number and title of the laboratory work, the date of its completion, and also the surname, academic degree, academic title, and position of the teacher accepting the work;

the goal of the work;

formulas with numerical values explaining the intermediate and final results of the calculations;

tables of measured and calculated values;

the graphic material required by the assignment;

brief conclusions on the results of each stage of the work assignment and on the work as a whole.

All graphs and tables must be drawn carefully using drawing tools. Conventional graphic and letter symbols must comply with the applicable GOST standards. The report may be prepared using a computer.

Work assignment

Before carrying out any measurements, each team member must study the rules of the sports game Darts given in Appendix 7; they are necessary for carrying out the following stages of the research.

Stage I of the research: "Testing the results of each team member's hits on the target in the sports game Darts for compliance with the normal distribution law using Pearson's χ² criterion and the three-sigma criterion"

1. Measure (test) your personal speed and coordination of actions by throwing darts 30-40 times at the circular target of the sports game Darts.

2. Arrange the measurement (test) results x_i (in points) as a variation series, enter them into Table 4.1 (columns …), perform all the necessary calculations, fill in the required tables, and draw the appropriate conclusions about the compliance of the obtained empirical distribution with the normal distribution law, by analogy with the similar calculations, tables, and conclusions of Example 2.12 given in Section 2 of these guidelines on pages 7-10.

Table 4.1

Correspondence of the speed and coordination of the subjects’ actions to the normal distribution law

No. rounded
Total

Stage II of the research

"Assessment of the average indicators of the general population of hits on the target in the sports game Darts by all students of the study group, based on the measurement results of the members of one team"

Assess the average indicators of speed and coordination of actions of all students in the study group (according to the group list in the class register), based on the results of hits on the Darts target by all team members obtained at the first stage of research in this laboratory work.

1. Record the measurement results for speed and coordination of actions when throwing darts at the circular Darts target for all members of your team (2-4 people), who represent a sample of measurement results from the general population (the measurement results of all students in the study group, e.g., 15 people), entering them in the second and third columns of Table 4.2.

Table 4.2

Processing of the indicators of speed and coordination of actions of the team members

No.
Total

In Table 4.2, the entries should be understood as the calculated average scores of your team members (see the calculation results in Table 4.1) obtained at the first stage of the research. Note that Table 4.2 usually contains the calculated average value of the measurement results obtained by each team member at the first stage only once, since the probability that the measurement results of different team members coincide is very small. In that case, as a rule, the frequency value in the corresponding column of Table 4.2 equals 1 in each row, and the number of members of your team is written in the "Total" row of that column.

2. Perform all the calculations needed to fill in Table 4.2, as well as the other calculations and conclusions, by analogy with the calculations and conclusions of Example 2.13 given in Section 2 of this methodological development on pages 13-14. Keep in mind that when calculating the representativeness error m, formula 2.4 on page 13 of this methodological development must be used, since the sample is small and the number of elements of the general population N is known and equal to the number of students in the study group according to the group register.

Stage III of the research

Evaluation of the effectiveness of the warm-up according to the indicator "Speed and coordination of actions" by each team member using the Student's t-test

Evaluate the effectiveness of the warm-up for throwing darts at the target in the sports game Darts, performed at the first stage of research in this laboratory work, by each team member according to the indicator "Speed and coordination of actions," using the Student's test, a parametric criterion of the statistical reliability of differences between samples.

… Total

2. Calculate the variances and standard deviations of the measurement results for the indicator "Speed and coordination of actions" given in Table 4.3 (see the similar calculations given immediately after Table 2.30 of Example 2.14 on page 16 of this methodological development).

3. Each team member: measure (test) your personal speed and coordination of actions after the warm-up,

… Total

5. Calculate the averages, variances, and standard deviations of the measurement results for the indicator "Speed and coordination of actions" after the warm-up given in Table 4.4, and write down the overall measurement result for the warm-up (see the similar calculations given immediately after Table 2.31 of Example 2.14 on page 17 of this methodological development).

6. Perform all the necessary calculations and draw conclusions by analogy with the calculations and conclusions of Example 2.14 given in Section 2 of this methodological development on pages 16-17. Keep in mind that when calculating the representativeness error m, formula 2.1 on page 12 of this methodological development must be used, since the number of elements in the general population N is unknown.

Stage IV of the research

Assessment of the uniformity (stability) of the indicator "Speed and coordination of actions" for two team members using the Fisher criterion

Assess the uniformity (stability) of the indicator "Speed and coordination of actions" for two team members using the Fisher criterion, based on the measurement results obtained at the third stage of research in this laboratory work.

To do this you need to do the following.

Using the data of Tables 4.3 and 4.4, the variances calculated from these tables at the third stage of research, and the method of calculating and applying the Fisher criterion to assess the uniformity (stability) of sports results given in Example 2.15 on pages 18-19 of this methodological development, draw the appropriate statistical and pedagogical conclusions.

Stage V of the research

Assessment of the groups of indicators "Speed and coordination of actions" of one team member before and after the warm-up

In tables of statistical results in coursework, diploma, and master's theses in psychology, the indicator "p" is always present.

For example, in accordance with the research objectives, differences in the level of life meaningfulness between teenage boys and girls were calculated.

Indicator | Mean, boys (20 people) | Mean, girls (5 people) | Mann-Whitney U test | Significance level (p)
Goals | 28.9 | 35.2 | 17.5 | 0.027*
Process | 30.1 | 32.0 | 38.5 | 0.435
Result | 25.2 | 29.0 | 29.5 | 0.164
Locus of control "I" | 20.3 | 23.6 | … | 0.067
Locus of control "Life" | 30.4 | 33.8 | 27.5 | 0.126
Meaningfulness of life | 98.9 | 111.2 | … | 0.103

* differences are statistically significant (p ≤ 0.05)

The right-hand column shows the value of "p", and it is by this value that one can determine whether the differences in the meaningfulness of life between boys and girls are significant or not. The rule is simple:

  • If the level of statistical significance "p" is less than or equal to 0.05, we conclude that the differences are significant. In the table above, the differences between boys and girls are significant for the "Goals" indicator (meaningfulness of life in the future): for girls this indicator is statistically significantly higher than for boys.
  • If the level of statistical significance "p" is greater than 0.05, we conclude that the differences are not significant. In the table above, the differences between boys and girls are not significant for all the other indicators.

Where does the level of statistical significance “p” come from?

The level of statistical significance is calculated by the statistical program along with the statistical criterion itself. In such programs you can also set a critical limit for the level of statistical significance, and the corresponding indicators will be highlighted by the program.

For example, in the STATISTICA program, when calculating correlations, you can set a "p" limit of, say, 0.05, and all statistically significant relationships will be highlighted in red.

If the statistical criterion is calculated manually, then the significance level “p” is determined by comparing the value of the resulting criterion with the critical value.
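When the criterion is computed by hand, a statistics library can stand in for the printed table of critical values. This is a sketch using scipy (an assumption; the document itself uses the tables in the appendices), shown for the t = 11.62, k = 34 case of Example 2.14:

```python
from scipy import stats

t_value, df = 11.62, 34

# two-sided critical value for P = 0.95, the analogue of the table's t_gr
t_gr = stats.t.ppf(0.975, df)
# exact two-sided significance level of the computed statistic
p = 2 * stats.t.sf(t_value, df)

print(t_value > t_gr, p < 0.05)
```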

What does the level of statistical significance “p” show?

All statistical calculations are approximate, and the level of this approximation is what "p" expresses. The significance level is written as a decimal, for example 0.023 or 0.965. Multiplying this number by 100 gives the p indicator as a percentage: 2.3% and 96.5%. These percentages reflect the probability that our assumption about a relationship, for example between aggression and anxiety, is wrong.

That is, suppose a correlation coefficient of 0.58 between aggression and anxiety was obtained at a statistical significance level of 0.05, i.e., an error probability of 5%. What exactly does this mean?

The correlation we identified means that the following pattern is observed in our sample: the higher the aggressiveness, the higher the anxiety. That is, if we take two teenagers, one of whom has higher anxiety than the other, then, knowing about the positive correlation, we can say that this teenager will also have higher aggressiveness. But since everything in statistics is approximate, in stating this we admit that we may be mistaken, and the probability of error is 5%. That is, having made 20 such comparisons in this group of adolescents, we may make one mistake in predicting the level of aggressiveness from anxiety.

Which level of statistical significance is better: 0.01 or 0.05

The level of statistical significance reflects the probability of error. Therefore, a result at p = 0.01 is more accurate than one at p = 0.05.

In psychological research, two permissible levels of statistical significance are accepted:

p = 0.01 - high reliability of the result of a comparative analysis or analysis of relationships;

p = 0.05 - sufficient accuracy.


Before collecting and studying data, experimental psychologists typically decide how the data will be analyzed statistically. Often the researcher sets a significance level, defined as the statistical value above (or below) which lie the values that allow the influence of factors to be considered non-random. Researchers usually express this level as a probability.

In many psychological experiments it is expressed as the "0.05 level" or the "0.01 level." This means that chance results will occur with a frequency of only 0.05 (1 time in 20) or 0.01 (1 time in 100). Results of statistical data analysis that meet the pre-established criterion (be it 0.05, 0.01, or even 0.001) are referred to below as statistically significant.

It should be noted that a result may not be statistically significant yet still be of some interest. Often, especially in preliminary studies or in experiments with a small number of subjects or a limited number of observations, results may not reach the level of statistical significance but suggest that further research with more precise control and more observations would achieve greater reliability. At the same time, the experimenter must be very careful not to purposefully change the experimental conditions in order to achieve the desired result at any cost.

In another example of a 2x2 design, Ji used two types of subjects and two types of tasks to study the influence of specialized knowledge on the memorization of information.

In this study, Ji examined the memorization of digits and chess positions (variable A) by children and adults (variable B), i.e., a 2x2 design. The children were 10 years old and good at chess, while the adults were new to the game. The first task required remembering the location of the pieces on the board, as it might occur during a normal game, and restoring it after the pieces were removed. The other task required memorizing a standard series of digits, as is usually done when measuring IQ.

It turned out that specialized knowledge, such as chess training, makes it easier to remember information related to that domain but has little effect on remembering digits. The adults, inexperienced in this ancient game, remembered fewer piece positions but were more successful at memorizing digits.

In the text of the report, Ji provides a statistical analysis that mathematically validates the presented results.

The 2x2 design is the simplest of all factorial designs. Increasing the number of factors or the number of levels of individual factors greatly increases the complexity of these designs.

PAID FEATURE. The statistical significance feature is only available on select plans. Check whether your plan includes it.

You can find out whether there are statistically significant differences in the answers received from different groups of respondents to survey questions. To use the statistical significance feature in SurveyMonkey, you must:

  • Enable the statistical significance feature when adding a comparison rule to a question in your survey, selecting the groups of respondents to compare so that the survey results are sorted into groups for visual comparison.
  • Examine the data tables for your survey questions to identify statistically significant differences in the replies received from the various groups of respondents.

View statistical significance

By following the steps below, you can create a survey that displays statistical significance.

1. Add closed-ended questions to your survey

In order to display statistical significance when analyzing results, you will need to apply a comparison rule to any question in your survey.

You can apply a comparison rule and calculate statistical significance in responses if your survey design uses one of the following question types:

It is necessary to make sure that the proposed answer options can be divided into complete groups. The answer options you select for comparison when creating a comparison rule will be used to organize the data into cross-tabs throughout the survey.

2. Collect answers

Once you've completed your survey, create a collector to distribute it. There are several ways to do this.

You must receive at least 30 responses for each response option you plan to use in your comparison rule to activate and view statistical significance.

Survey example

You want to find out whether men are significantly more satisfied with your products than women.

  1. Add two multiple choice questions to your survey:
    What is your gender? (male, female)
    Are you satisfied or dissatisfied with our product? (satisfied, dissatisfied)
  2. Make sure that at least 30 respondents select “male” for the gender question AND at least 30 respondents select “female” as their gender.
  3. Add a comparison rule to the question "What is your gender?" and select both answer options as your groups.
  4. Use the data table below the chart for the question "Are you satisfied or dissatisfied with our product?" to see whether any response options show a statistically significant difference.

What is a statistically significant difference?

A statistically significant difference means that statistical analysis has determined that the responses of one group of respondents differ significantly from the responses of another group. Statistical significance means that the numbers obtained differ meaningfully, and knowing this will greatly help you in data analysis. However, it is you who determine the importance of the results: you decide how to interpret the survey results and what actions to take based on them.

For example, you receive more complaints from female customers than from male customers. How can you determine whether this difference is real and whether you need to take action? A good way to check your observations is to conduct a survey that shows whether male customers really are significantly more satisfied with your product. Using a statistical formula, the statistical significance feature we offer lets you determine whether your product really appeals to men significantly more than to women. This will allow you to act on facts rather than guesswork.

Statistically significant difference

If your results are highlighted in the data table, it means that the two groups of respondents differ significantly from each other. The term "significant" does not mean that the resulting numbers have any particular importance, only that there is a statistical difference between them.

No statistically significant difference

If your results are not highlighted in the corresponding data table, it means that, despite a possible difference between the two figures being compared, there is no statistical difference between them.

Responses without statistically significant differences show that there is no significant difference between the two items being compared given the sample size you used, but this does not necessarily mean that the difference is not meaningful. By increasing the sample size, you may be able to detect a statistically significant difference.

Sample size

If you have a very small sample size, only very large differences between the two groups will be significant. If you have a very large sample size, both small and large differences will be counted as significant.

However, just because two numbers are statistically different does not mean that the difference has practical significance for you. You will have to decide for yourself which differences are meaningful for your survey.

Calculating Statistical Significance

We calculate statistical significance using a standard 95% confidence level. If an answer option is shown as statistically significant, it means that there is less than a 5% probability that the difference between the two groups arose by chance alone or due to sampling error (often written as p < 0.05).

To calculate statistically significant differences between groups, we use the following formulas:

Parameter | Description
a1 | the percentage of participants from the first group who answered the question in a certain way, multiplied by the sample size of that group
b1 | the percentage of participants from the second group who answered the question in a certain way, multiplied by the sample size of that group
Pooled sample proportion (p) | the combination of the two proportions from both groups
Standard error (SE) | an indicator of how much the observed proportion differs from the actual proportion; a lower value means the proportion is close to the actual one, a higher value means it differs substantially
Test statistic (t) | the number of standard deviations by which the given value differs from the mean
Statistical significance | if the absolute value of the test statistic is greater than 1.96* standard deviations from the mean, the difference is considered statistically significant

*1.96 is the value used for the 95% confidence level, because 95% of the range covered by the Student's t-distribution lies within 1.96 standard deviations of the mean.

Calculation example

Continuing with the example used above, let's find out whether the percentage of men who say they are satisfied with your product is significantly higher than the percentage of women.

Let's say 1,000 men and 1,000 women took part in your survey, and the result of the survey was that 70% of men and 65% of women say that they are satisfied with your product. Is the 70% level significantly higher than the 65% level?

Substitute the following data from the survey into the given formulas:

  • p1 (% of men satisfied with the product) = 0.7
  • p2 (% of women satisfied with the product) = 0.65
  • n1 (number of men surveyed) = 1000
  • n2 (number of women surveyed) = 1000

Since the absolute value of the test statistic is greater than 1.96, it means that the difference between men and women is significant. Compared to women, men are more likely to be satisfied with your product.
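This calculation can be reproduced with a short script implementing the formulas above: pooled proportion, standard error, and test statistic (a sketch; the function name and variables are mine, not SurveyMonkey's):

```python
import math

def two_proportion_z(p1, n1, p2, n2):
    # pooled sample proportion of the two groups
    p = (p1 * n1 + p2 * n2) / (n1 + n2)
    # standard error of the difference between the two proportions
    se = math.sqrt(p * (1 - p) * (1 / n1 + 1 / n2))
    # test statistic: the observed difference measured in standard errors
    return (p1 - p2) / se

z = two_proportion_z(0.70, 1000, 0.65, 1000)
print(round(z, 2), abs(z) > 1.96)   # 2.39 True
```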

Hiding statistical significance

How to hide statistical significance for all questions

  1. Click the down arrow to the right of the comparison rule in the left sidebar.
  2. Select Edit rule.
  3. Turn off the Show statistical significance toggle.
  4. Click Apply.

To hide statistical significance for one question, you need to:

  1. Click the Customize button above the question's chart.
  2. Open the Display Options tab.
  3. Uncheck the box next to Statistical significance.
  4. Click Save.

The corresponding display option is enabled automatically when the statistical significance display is turned on. If you clear this display option, the statistical significance display will also be disabled.

Turn on the statistical significance feature when adding a comparison rule to a question in your survey, then examine the data tables for your survey questions to determine whether there are statistically significant differences in the responses received from different groups of respondents.

The significance level is the probability that we considered the differences significant when they are in fact random.

When we indicate that the differences are significant at the 5% significance level, or p < 0.05, we mean that the probability that they are unreliable is 0.05.

When we indicate that the differences are significant at the 1% significance level, or p < 0.01, we mean that the probability that they are unreliable is 0.01.

In more formal language, the significance level is the probability of rejecting the null hypothesis when it is in fact true.

The error of rejecting the null hypothesis when it is in fact correct is called a Type I error (see Table 1).

Table 1. Null and alternative hypotheses and possible test conditions.

The probability of such an error is usually denoted α. Strictly speaking, we should write in parentheses not p < 0.05 or p < 0.01 but α < 0.05 or α < 0.01.

If the probability of error is α, then the probability of a correct decision is 1 − α. The smaller α is, the greater the probability of a correct decision.
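The meaning of α can be checked by simulation: if two samples are repeatedly drawn from the same population and compared at the 5% level, the null hypothesis is wrongly rejected in roughly 5% of trials. The sketch below uses a simple z-test for the difference of means with known σ = 1; the sample size and trial count are illustrative:

```python
import random
import statistics

random.seed(0)
n, trials, rejections = 30, 20000, 0
for _ in range(trials):
    # Both samples come from the SAME normal population, so H0 is true.
    a = [random.gauss(0, 1) for _ in range(n)]
    b = [random.gauss(0, 1) for _ in range(n)]
    # z-test for the difference of means with known sigma = 1:
    z = (statistics.mean(a) - statistics.mean(b)) / (2 / n) ** 0.5
    if abs(z) > 1.96:  # reject H0 at the 5% level
        rejections += 1

print(rejections / trials)  # close to alpha = 0.05
```

The observed rejection rate hovers near 0.05, which is exactly what the significance level promises: the long-run frequency of Type I errors.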

Historically, psychology has adopted the 5% level (p ≤ 0.05) as the lowest level of statistical significance, the 1% level (p ≤ 0.01) as sufficient, and the 0.1% level (p ≤ 0.001) as the highest. Tables of critical values therefore usually list the criterion values corresponding to p ≤ 0.05 and p ≤ 0.01, and sometimes p ≤ 0.001. For some criteria, the tables give the exact significance level for various empirical values; for example, for φ* = 1.56, p = 0.06.

However, until the level of statistical significance reaches p = 0.05, we still have no right to reject the null hypothesis. We will adhere to the following rule for rejecting the no-difference hypothesis (H0) and accepting the hypothesis that the differences are statistically significant (H1).

Rule for rejecting H0 and accepting H1

If the empirical value of the criterion is equal to or greater than the critical value corresponding to p ≤ 0.05, then H0 is rejected, but we cannot yet definitely accept H1.

If the empirical value of the criterion is equal to or greater than the critical value corresponding to p ≤ 0.01, then H0 is rejected and H1 is accepted.

Exceptions: the G sign test, Wilcoxon's T test, and the Mann-Whitney U test. For these criteria, the relationship is inverted: smaller empirical values indicate greater significance.

Fig. 4. Example of a "significance axis" for Rosenbaum's Q criterion.

The critical values of the criterion are designated Q0.05 and Q0.01, and the empirical value of the criterion Qem; in the figure it is enclosed in an ellipse.

To the right of the critical value Q0.01 extends the "zone of significance": it includes empirical values exceeding Q0.01, which are therefore certainly significant.

To the left of the critical value Q0.05 extends the "zone of insignificance": it includes empirical Q values below Q0.05, which are therefore certainly insignificant.

We see that Q0.05 = 6, Q0.01 = 9, and Qem = 8.

The empirical value of the criterion falls between Q0.05 and Q0.01. This is the "zone of uncertainty": we can already reject the hypothesis that the differences are unreliable (H0), but we cannot yet accept the hypothesis that they are reliable (H1).
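The three-zone decision rule can be written out directly. This is a sketch for a criterion where larger values mean greater significance (as with Rosenbaum's Q); the function name is mine, and the example values Q0.05 = 6, Q0.01 = 9, Qem = 8 follow the text:

```python
def decide(q_emp, q_05, q_01):
    """Classify an empirical criterion value against the two critical values."""
    if q_emp >= q_01:
        return "significance zone: reject H0 and accept H1"
    if q_emp >= q_05:
        return "zone of uncertainty: reject H0, but H1 is not yet accepted"
    return "insignificance zone: H0 is retained"

print(decide(8, 6, 9))  # the example Qem = 8 lands in the zone of uncertainty
```

Remember that for the G, T, and U criteria mentioned above the comparisons would be reversed, since for them smaller values are more significant.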

In practice, however, the researcher may treat as reliable any differences that do not fall into the zone of insignificance, declaring them reliable at p < 0.05 or indicating the exact significance level of the obtained empirical criterion value, for example p = 0.02. Using the standard tables found in any textbook on mathematical methods, this can be done for the Kruskal-Wallis H test, Friedman's χ²r test, Page's L test, and Fisher's φ* test.

The level of statistical significance, or critical test values, is determined differently when testing directional and non-directional statistical hypotheses.

With a directional statistical hypothesis, a one-tailed test is used; with a non-directional hypothesis, a two-tailed test. The two-tailed test is more stringent because it tests differences in both directions: an empirical value that corresponded to a significance level of p < 0.05 under a one-tailed test corresponds only to p < 0.10 under a two-tailed test.
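This doubling of the p-value can be seen numerically. The sketch below uses the standard normal distribution (via the stdlib error function) and z = 1.645, the classic one-tailed 5% cutoff; the function name is mine:

```python
import math

def p_one_tailed(z):
    """Upper-tail p-value under the standard normal distribution."""
    return 0.5 * math.erfc(z / math.sqrt(2))

z = 1.645
print(round(p_one_tailed(z), 3))      # ≈ 0.05, one-tailed
print(round(2 * p_one_tailed(z), 3))  # ≈ 0.10, two-tailed: both directions count
```

The same empirical value that clears the 5% bar for a directional hypothesis only reaches the 10% level when the hypothesis is non-directional, which is why the critical-value tables distinguish the two cases.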

Fortunately, we do not have to decide each time whether to apply a one-tailed or a two-tailed criterion. The tables of critical values are constructed so that directional hypotheses correspond to a one-tailed test and non-directional hypotheses to a two-tailed test, and the listed values satisfy the requirements of each. The researcher only needs to make sure that his hypotheses match, in meaning and form, the hypotheses given in the description of each criterion.