Sample regression equation.

LABORATORY WORK No. 4

Calculation of the sample correlation coefficient and construction of the empirical and theoretical regression line

Goal of the work: to become familiar with linear correlation; to develop the ability to calculate the sample correlation coefficient and to compile the equations of theoretical regression lines.

Content of the work: based on experimental data, calculate the sample correlation coefficient, construct a confidence interval for it with a given reliability, give a meaningful interpretation of the result, and construct the empirical and theoretical regression lines according to the proposed method.

Correlation method

In mathematical statistics, the correlation method is used to determine the relationship between phenomena. The peculiarity of studying such a relationship is that the influence of extraneous factors cannot be isolated. The correlation method is therefore used to determine, under a complex interaction of extraneous factors, what the relationship between the characteristics would be if the extraneous factors did not change, i.e., if the conditions of the experiment were the same.

Correlation theory considers two problems:

1) determining the parameters of the correlation relationship between the characteristics under study;

2) determining the closeness (strength) of this relationship.

The nature of the relationship between characteristics X and Y can be judged by the location of points in the coordinate system (the correlation field). If these points lie near a straight line, it is assumed that there is a linear relationship between the conditional average $\bar y_x$ and $x$. The equation

$$\bar y_x = f(x)$$

is called the regression equation of Y on X, and the equation

$$\bar x_y = \varphi(y)$$

is called the regression equation of X on Y. If both regression lines are straight, the correlation is linear.

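The regression line equations

$$\bar y_x - \bar y = \rho_{yx}(x - \bar x) \quad \text{and} \quad \bar x_y - \bar x = \rho_{xy}(y - \bar y)$$

are compiled on the basis of sample data given in the correlation table. Here

$\bar x$, $\bar y$ are the average values of the corresponding characteristics;

$\rho_{yx}$, $\rho_{xy}$ are the regression coefficients of Y on X and of X on Y, calculated using the formulas

$$\rho_{yx} = \frac{\overline{xy} - \bar x\,\bar y}{D_x}, \qquad \rho_{xy} = \frac{\overline{xy} - \bar x\,\bar y}{D_y},$$

where $\overline{xy}$ is the average value of the product of x and y;

$D_x$ and $D_y$ are the variances of characteristics X and Y.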
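The two regression coefficients can be computed directly from a correlation table. The following is a minimal Python sketch. It assumes the table is stored as a hypothetical dictionary that maps each pair (x, y) to its frequency n_xy. The function name and data layout are illustrative and are not part of the original method.

```python
def regression_coefficients(table):
    """Return (rho_yx, rho_xy) for a correlation table given as {(x, y): n_xy}."""
    n = sum(table.values())
    mean_x = sum(f * x for (x, _), f in table.items()) / n
    mean_y = sum(f * y for (_, y), f in table.items()) / n
    mean_xy = sum(f * x * y for (x, y), f in table.items()) / n
    # Sample variances: mean of squares minus square of the mean
    d_x = sum(f * x * x for (x, _), f in table.items()) / n - mean_x ** 2
    d_y = sum(f * y * y for (_, y), f in table.items()) / n - mean_y ** 2
    moment = mean_xy - mean_x * mean_y  # common numerator of both coefficients
    return moment / d_x, moment / d_y


# Hypothetical toy table in which every point lies on y = 2x
table = {(1, 2): 2, (2, 4): 2, (3, 6): 2}
rho_yx, rho_xy = regression_coefficients(table)
print(round(rho_yx, 6), round(rho_xy, 6))  # 2.0 0.5
```

The product rho_yx * rho_xy equals the squared correlation coefficient. It is 1 here because the toy points are exactly collinear.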

In straight-line correlation, the closeness of the relationship between characteristics is characterized by the sample correlation coefficient $r_B$, which takes values from -1 to +1.

A negative correlation coefficient indicates an inverse linear relationship between the characteristics being studied; a positive one indicates a direct linear relationship. If the correlation coefficient is 0, there is no linear relationship between the characteristics.

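The sample correlation coefficient is calculated using the formula:

$$r_B = \frac{\overline{xy} - \bar x\,\bar y}{\sigma_x \sigma_y}, \qquad (1)$$

where $\overline{xy}$ is the average value of the products of x and y;

$\bar x$ and $\bar y$ are the average values of the corresponding characteristics;

$\sigma_x$ and $\sigma_y$ are the standard deviations of characteristic X and of characteristic Y.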
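Formula (1) translates almost line for line into code. The following is a minimal Python sketch. It assumes the same hypothetical {(x, y): n_xy} dictionary layout for the correlation table; the helper names are illustrative.

```python
import math


def sample_correlation(table):
    """Formula (1): r_B = (mean(xy) - mean(x) * mean(y)) / (sigma_x * sigma_y)."""
    n = sum(table.values())

    def mean(g):
        # Frequency-weighted mean of g(x, y) over the correlation table
        return sum(f * g(x, y) for (x, y), f in table.items()) / n

    mean_x = mean(lambda x, y: x)
    mean_y = mean(lambda x, y: y)
    sigma_x = math.sqrt(mean(lambda x, y: x * x) - mean_x ** 2)
    sigma_y = math.sqrt(mean(lambda x, y: y * y) - mean_y ** 2)
    return (mean(lambda x, y: x * y) - mean_x * mean_y) / (sigma_x * sigma_y)


# Hypothetical tables: points on a falling line, then a scattered table
falling = {(1, 3): 1, (2, 2): 1, (3, 1): 1}
scattered = {(10, 1): 2, (20, 3): 5, (30, 2): 3}
print(round(sample_correlation(falling), 6))     # -1.0
print(-1 <= sample_correlation(scattered) <= 1)  # True
```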

METHOD OF PERFORMANCE OF THE WORK

Statistical data are given on the temperature Y of the lubricating oil in the rear axle of a car, depending on the ambient temperature X.

1. CALCULATION OF SAMPLE CORRELATION COEFFICIENT

We summarize these data in a correlation table.

Table 1. Correlation table ($n_y$ is the frequency of value y; $n_x$ is the frequency of value x)

Let's find the numerical characteristics of the sample

1.1. Let's find the average values of characteristics X and Y:

$$\bar x = \frac{\sum n_x x}{n} = 35.8, \qquad \bar y = \frac{\sum n_y y}{n} = 13.9.$$

1.2. Let's find the sample variances:

$$D_x = \overline{x^2} - \bar x^2 = 1513 - 1281.64 = 231.36, \qquad D_y = \overline{y^2} - \bar y^2.$$

1.3. Sample standard deviations:

$$\sigma_x = \sqrt{D_x} = \sqrt{231.36} \approx 15.21, \qquad \sigma_y = \sqrt{D_y}.$$

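1.4. Sample correlation moment (with $\bar x\,\bar y = 35.8 \cdot 13.9 = 497.62$):

$$\mu = \overline{xy} - \bar x\,\bar y = \frac{1}{50}(40 + 120 + 720 + 480 + 200 + 800 + 900 + 4200 + 1120 + 2160 + 4500 + 5280 + 4400 + 1320 + 1560) - 497.62 =$$

$$= \frac{27800}{50} - 497.62 = 556 - 497.62 = 58.38.$$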

1.5. Sample correlation coefficient:

$$r_B = \frac{\mu}{\sigma_x \sigma_y} = \frac{58.38}{15.21\,\sigma_y} \approx 0.77.$$
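The arithmetic of steps 1.2 to 1.4 can be re-checked using only numbers that appear in the text: $\bar x = 35.8$ and $\bar y = 13.9$, the mean of $x^2$ (1513), the sum 27800, and n = 50. The following is a minimal Python sketch. $D_y$ is not printed in the text, so this sketch does not recompute $r_B$. It stops at the regression coefficient $\rho_{yx} = \mu / D_x$.

```python
import math

# Summary values as printed in the worked example
n = 50
mean_x, mean_y = 35.8, 13.9
mean_x2 = 1513        # mean of x^2
sum_nxy_xy = 27800    # sum of n_xy * x * y over the correlation table

d_x = mean_x2 - mean_x ** 2            # sample variance D_x
sigma_x = math.sqrt(d_x)               # sample standard deviation sigma_x
mu = sum_nxy_xy / n - mean_x * mean_y  # sample correlation moment
rho_yx = mu / d_x                      # regression coefficient of Y on X

print(round(d_x, 2), round(sigma_x, 2), round(mu, 2), round(rho_yx, 2))
# 231.36 15.21 58.38 0.25
```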

2. Let's check the significance of the correlation coefficient. To do this, we compute the statistic

$$T_{obs} = \frac{r_B\sqrt{n-2}}{\sqrt{1-r_B^2}} \approx 8.3.$$

From the Student distribution table (Appendix), for the significance level most commonly used in engineering, $\alpha = 0.05$, and the number of degrees of freedom $k = n - 2 = 50 - 2 = 48$, we find $t_{cr}(0.05; 48) = 2.02$.

Since $T_{obs} = 8.3 > t_{cr} = 2.02$, the correlation coefficient differs significantly from zero. This means that the variables X and Y are related by a linear regression dependence.

Thus, the correlation coefficient $r_B \approx 0.77$ indicates a close direct linear relationship between the rear-axle lubricating oil temperature and the ambient air temperature.
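The significance check can be sketched in a few lines of Python. This sketch assumes the usual statistic $T = r\sqrt{n-2}/\sqrt{1-r^2}$ for testing the hypothesis that the population correlation coefficient is zero. The critical value 2.02 is copied from the text rather than computed.

```python
import math


def t_statistic(r, n):
    """Observed statistic T = r * sqrt(n - 2) / sqrt(1 - r^2)."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r * r)


r_b, n = 0.77, 50
t_obs = t_statistic(r_b, n)
t_cr = 2.02  # table value used in the text for alpha = 0.05, k = 48
print(f"T_obs = {t_obs:.2f}, significant: {abs(t_obs) > t_cr}")
```

With $r_B$ rounded to 0.77, this prints T_obs = 8.36, slightly above the text's 8.3 because of rounding. The conclusion $T_{obs} > t_{cr}$ is unchanged.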

3. Drawing up the empirical linear regression equations of Y on X and of X on Y.

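3.1. Empirical linear regression equation of Y on X:

$$\bar y_x = \bar y + \rho_{yx}(x - \bar x), \qquad \rho_{yx} = \frac{\mu}{D_x} = \frac{58.38}{231.36} \approx 0.25,$$

$$\bar y_x = 13.9 + 0.25(x - 35.8).$$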

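3.2. Empirical linear regression equation of X on Y:

$$\bar x_y = \bar x + \rho_{xy}(y - \bar y), \qquad \rho_{xy} = \frac{\mu}{D_y} \approx 2.34,$$

$$\bar x_y = 35.8 + 2.34(y - 13.9).$$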
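Both empirical regression equations can be written as small functions. The following is a minimal Python sketch that uses the rounded coefficients 0.25 and 2.34 quoted in the text. It confirms that both lines pass through the point of means.

```python
# Empirical regression lines from Section 3 (coefficients rounded as in the text)
MEAN_X, MEAN_Y = 35.8, 13.9
RHO_YX, RHO_XY = 0.25, 2.34


def y_on_x(x):
    """Conditional average of Y for a given x: 13.9 + 0.25 * (x - 35.8)."""
    return MEAN_Y + RHO_YX * (x - MEAN_X)


def x_on_y(y):
    """Conditional average of X for a given y: 35.8 + 2.34 * (y - 13.9)."""
    return MEAN_X + RHO_XY * (y - MEAN_Y)


# Both lines pass through the point of means (35.8, 13.9)
print(y_on_x(MEAN_X), x_on_y(MEAN_Y))  # 13.9 35.8
print(round(y_on_x(45.8), 2))          # 16.4
```

The two lines are different. In the (x, y) plane the line of X on Y has slope 1/2.34 ≈ 0.43, while the line of Y on X has slope 0.25. The lines coincide only when $|r_B| = 1$. As a consistency check, 0.25 · 2.34 ≈ 0.585 is close to $r_B^2 \approx 0.59$.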

4. CONSTRUCTION OF THE EMPIRICAL REGRESSION LINE OF Y ON X

To build the empirical regression line, we draw up Table 2.

Table 2.

Here $\bar y_x$ is the conditional average of the values of Y provided that X takes a certain value x, i.e.

$$\bar y_x = \frac{\sum_y n_{xy}\, y}{n_x}.$$

Taking the pairs of numbers $(x; \bar y_x)$ as the coordinates of points, plot them in a coordinate system and connect them with straight line segments. The resulting broken line is the empirical regression line.
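The points of the empirical regression line are the conditional averages $\bar y_x$. The following is a minimal Python sketch of how to obtain them. It assumes the correlation table is a hypothetical {(x, y): n_xy} dictionary. The returned pairs are the vertices of the broken line.

```python
from collections import defaultdict


def conditional_means(table):
    """Return sorted [(x, mean of y given x)] from a table {(x, y): n_xy}."""
    totals = defaultdict(float)  # sum of n_xy * y for each x
    counts = defaultdict(int)    # n_x for each x
    for (x, y), f in table.items():
        totals[x] += f * y
        counts[x] += f
    return [(x, totals[x] / counts[x]) for x in sorted(counts)]


# Hypothetical correlation table
table = {(10, 5): 3, (10, 7): 1, (20, 8): 2, (20, 10): 2}
print(conditional_means(table))  # [(10, 5.5), (20, 9.0)]
```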

The equation of the theoretical straight-line regression of Y on X is:

$$\bar y_x - \bar y = r_B \frac{\sigma_y}{\sigma_x}(x - \bar x),$$

where $\bar x$ is the sample mean of attribute X and $\bar y$ is the sample mean of attribute Y.

The straight-line regression equation of Y on X is therefore

$$\bar y_x - 13.9 = 0.25(x - 35.8),$$

or finally

$$\bar y_x = 0.25x + 4.95.$$

Let's build both regression lines (Fig. 1).

Fig. 1. Empirical and theoretical regression lines

5. Meaningful interpretation of the results.

There is a close direct linear correlation between the temperature of the lubricating oil of the vehicle's rear axle and the ambient air temperature ($r_B \approx 0.77$).

The equation $\bar y_x = 13.9 + 0.25(x - 35.8)$ characterizes how, on average, the temperature of the lubricating oil of a car's rear axle depends on the ambient temperature.

The linear regression coefficient $\rho_{yx} \approx 0.25$ means that when the ambient temperature is 1 degree higher, the temperature of the rear-axle lubricating oil is on average 0.25 degrees higher.

The equation $\bar x_y = 35.8 + 2.34(y - 13.9)$ characterizes how, on average, the ambient temperature corresponds to the temperature of the rear-axle lubricating oil. The regression coefficient $\rho_{xy} \approx 2.34$ means that a rear-axle oil temperature higher by 1 degree corresponds, on average, to an ambient air temperature higher by 2.34 degrees.

OPTIONS FOR INDIVIDUAL TASKS

1. The distribution of X, the cost of fixed production assets (million rubles), and Y, the average monthly output per worker, is given in the following table:

2. The distribution of 200 cylindrical lamp posts by length X (in cm) and by weight Y (in kg) is given in the following table:

3. The distribution of 100 firms by means of production X (in monetary units) and by daily output Y (in tons) is given in the following table:

Cover page of methodological Form

Ministry of Education and Science of the Republic of Kazakhstan

Chairman of the UMC _______________ «___» ___________ 20__

APPROVED:

Head of OPiMOUP _________________ «___» ___________ 20__

Approved by the educational and methodological council of the university

«___» ___________ 20__ Protocol No. ____

When studying the topic "Information from probability theory and mathematical statistics", special attention should be paid to the methods of presenting and processing statistical data; theoretical and sample characteristics; the general scheme for testing hypotheses; type I and type II errors; point and interval estimates; statistical properties of estimates; and the analysis of dependencies between two random variables.

Topic. The least squares method.

$h_1$, $h_2$ are the steps, i.e., the differences between two neighboring variants.

In this case, the sample correlation coefficient is

$$r_B = \frac{\sum n_{uv}uv - n\,\bar u\,\bar v}{n\,\sigma_u \sigma_v}.$$

The term $\sum n_{uv}uv$ is conveniently calculated using calculation table 1.

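The values $\bar u$, $\bar v$, $\sigma_u$, $\sigma_v$ can be found using the formulas

$$\bar u = \frac{\sum n_u u}{n}, \quad \bar v = \frac{\sum n_v v}{n}, \quad \sigma_u = \sqrt{\overline{u^2} - \bar u^2}, \quad \sigma_v = \sqrt{\overline{v^2} - \bar v^2}.$$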

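For the reverse transition, the following expressions are used:

$$\bar x = \bar u\,h_1 + C_1, \quad \bar y = \bar v\,h_2 + C_2, \quad \sigma_x = \sigma_u h_1, \quad \sigma_y = \sigma_v h_2.$$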

Example. Find the sample linear regression equation of Y on X based on the correlation table.

Solution. To simplify the calculations, we pass to the conditional variants, which are calculated using the formulas

$$u_i = \frac{x_i - C_1}{h_1}, \qquad v_j = \frac{y_j - C_2}{h_2},$$

and compile a transformed correlation table in the conditional variants.

Then we compile a new table. In each filled cell we enter the value $n_{uv}u$ in the upper right corner and the value $n_{uv}v$ in the lower left corner. We then sum the upper values along each row to obtain $V_j$ and the lower values along each column to obtain $U_i$, and calculate the products $v_j V_j$ and $u_i U_i$. The total of each of these products equals $\sum n_{uv}uv$.
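The bookkeeping of the product method can be mimicked in code. The following is a minimal Python sketch that assumes the usual layout: the upper-right entry of each cell is $n_{uv}u$ and the lower-left entry is $n_{uv}v$. It checks that the sum of $v_j V_j$ and the sum of $u_i U_i$ both equal $\sum n_{uv}uv$. The table of conditional variants is hypothetical.

```python
from collections import defaultdict


def product_method_sums(uv_table):
    """Return (sum of v_j * V_j, sum of u_i * U_i) for a table {(u, v): n_uv}."""
    V = defaultdict(float)  # row v_j: sum of upper-right entries n_uv * u
    U = defaultdict(float)  # column u_i: sum of lower-left entries n_uv * v
    for (u, v), f in uv_table.items():
        V[v] += f * u
        U[u] += f * v
    return sum(v * V[v] for v in V), sum(u * U[u] for u in U)


# Hypothetical table in conditional variants
uv_table = {(-1, -1): 2, (0, -1): 1, (0, 0): 4, (1, 0): 2, (1, 1): 2}
sum_vV, sum_uU = product_method_sums(uv_table)
direct = sum(f * u * v for (u, v), f in uv_table.items())
print(sum_vV, sum_uU, direct)  # 4.0 4.0 4
```

Computing both totals is a built-in check. If they disagree, an entry in the table was mis-copied.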


With a large number of trials, the same value x can occur $n_x$ times, the same value y can occur $n_y$ times, and the same pair of numbers (x; y) can occur $n_{xy}$ times, where $\sum n_x = \sum n_y = \sum n_{xy} = n$ is the sample size.

Therefore, the observational data are grouped, i.e., the frequencies $n_x$, $n_y$, $n_{xy}$ are counted. All grouped data are recorded in a table called a correlation table.
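Grouping raw observations into the frequencies $n_x$, $n_y$, $n_{xy}$ is a counting task. The following is a minimal Python sketch that uses `collections.Counter` on a hypothetical list of observed pairs.

```python
from collections import Counter

# Hypothetical raw observations (x, y); repeated pairs are what grouping exploits
observations = [(20, 10), (20, 10), (30, 12), (20, 12), (30, 12)]

n_xy = Counter(observations)               # frequency of each pair (x; y)
n_x = Counter(x for x, _ in observations)  # frequency of each value x
n_y = Counter(y for _, y in observations)  # frequency of each value y
n = len(observations)                      # sample size

print(n_xy[(30, 12)], n_x[20], n_y[12])  # 2 3 3
```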

If both regression lines of Y on X and X on Y are straight, then the correlation is linear.

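The sample equation of the straight regression line of Y on X has the form:

$$\bar y_x = \rho_{yx}x + b.$$

The parameters $\rho_{yx}$ and $b$, which are determined by the least squares method, are:

$$\rho_{yx} = r_B\frac{\sigma_y}{\sigma_x}, \qquad b = \bar y_B - \rho_{yx}\,\bar x_B,$$

where $\bar y_x$ is the conditional average; $\bar x_B$ and $\bar y_B$ are the sample averages of characteristics X and Y; $\sigma_x$ and $\sigma_y$ are the sample standard deviations; $r_B$ is the sample correlation coefficient.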
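We can check numerically that the least-squares slope equals $r_B\,\sigma_y/\sigma_x$ and that the intercept is $\bar y - \rho_{yx}\bar x$. The following is a minimal Python sketch. It assumes ungrouped (raw) data, so every frequency is 1. The data points are hypothetical.

```python
import math


def least_squares_line(xs, ys):
    """Slope k and intercept b minimizing sum((y - (k*x + b))**2) over the points."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    k = s_xy / s_xx
    return k, mean_y - k * mean_x


def slope_from_r(xs, ys):
    """The same slope written as rho_yx = r_B * sigma_y / sigma_x."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sigma_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs) / n)
    sigma_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys) / n)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n
    r = cov / (sigma_x * sigma_y)
    return r * sigma_y / sigma_x


xs = [1, 2, 3, 4, 5]              # hypothetical data
ys = [2.1, 3.9, 6.2, 7.8, 10.1]
k, b = least_squares_line(xs, ys)
print(round(k, 6), round(slope_from_r(xs, ys), 6))  # 1.99 1.99
```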

The sample equation of the straight-line regression of X on Y has the form:

$$\bar x_y - \bar x_B = r_B\frac{\sigma_x}{\sigma_y}(y - \bar y_B).$$

We assume that the observational data on characteristics X and Y are given as a correlation table with equally spaced variants.

Then we pass to the conditional variants:

$$u = \frac{x - C_1}{h_1}, \qquad v = \frac{y - C_2}{h_2},$$

where $C_1$ is the variant of trait X that has the highest frequency; $C_2$ is the variant of trait Y that has the highest frequency; $h_1$ is the step (the difference between two adjacent variants of X); $h_2$ is the step (the difference between two adjacent variants of Y).

Then the sample correlation coefficient is

$$r_B = \frac{\sum n_{uv}uv - n\,\bar u\,\bar v}{n\,\sigma_u \sigma_v}.$$

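The quantities $\bar u$, $\bar v$, $\sigma_u$, $\sigma_v$ can be found by the product method or directly by the formulas

$$\bar u = \frac{\sum n_u u}{n}, \quad \bar v = \frac{\sum n_v v}{n}, \quad \sigma_u = \sqrt{\overline{u^2} - \bar u^2}, \quad \sigma_v = \sqrt{\overline{v^2} - \bar v^2}.$$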

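Knowing these quantities, we find the parameters included in the regression equations using the formulas

$$\bar x_B = \bar u\,h_1 + C_1, \quad \bar y_B = \bar v\,h_2 + C_2, \quad \sigma_x = \sigma_u h_1, \quad \sigma_y = \sigma_v h_2.$$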
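The passage to conditional variants and back can be verified numerically. The following is a minimal Python sketch with a hypothetical equally spaced table. It assumes $C_1$, $h_1$, $C_2$, $h_2$ are chosen as described (most frequent variant and step). It checks the reverse-transition formulas and confirms that $r_B$ is the same in both coordinate systems.

```python
import math


def grouped_stats(table):
    """Means, standard deviations and r_B for a table {(a, b): frequency}."""
    n = sum(table.values())

    def mean(g):
        return sum(f * g(a, b) for (a, b), f in table.items()) / n

    mean_a, mean_b = mean(lambda a, b: a), mean(lambda a, b: b)
    sigma_a = math.sqrt(mean(lambda a, b: a * a) - mean_a ** 2)
    sigma_b = math.sqrt(mean(lambda a, b: b * b) - mean_b ** 2)
    r = (mean(lambda a, b: a * b) - mean_a * mean_b) / (sigma_a * sigma_b)
    return mean_a, mean_b, sigma_a, sigma_b, r


# Hypothetical equally spaced table {(x, y): n_xy}
table = {(10, 5): 2, (20, 5): 1, (20, 15): 4, (30, 15): 2, (30, 25): 2}
C1, h1 = 20, 10  # most frequent x variant and step in x
C2, h2 = 15, 10  # most frequent y variant and step in y

# Conditional variants u = (x - C1) / h1, v = (y - C2) / h2
uv_table = {((x - C1) / h1, (y - C2) / h2): f for (x, y), f in table.items()}
mean_u, mean_v, sigma_u, sigma_v, r_uv = grouped_stats(uv_table)
mean_x, mean_y, sigma_x, sigma_y, r_xy = grouped_stats(table)

# Reverse transition and invariance of r_B
print(math.isclose(mean_u * h1 + C1, mean_x), math.isclose(sigma_u * h1, sigma_x))
print(math.isclose(r_uv, r_xy))
```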

TYPICAL TEST PAPER FOR SECTION 6

12.1. Random Events

12.1.1. The box contains 6 identical pairs of black gloves and 4 identical pairs of beige gloves. Find the probability that two gloves drawn at random form a pair.

Consider the event A: two gloves drawn at random form a pair. Consider also the hypotheses $B_1$ (a pair of black gloves was drawn), $B_2$ (a pair of beige gloves was drawn), and $B_3$ (the drawn gloves are of different colors).

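By the multiplication theorem, the probability of hypothesis $B_1$ is the product of the probabilities that the first glove is black and that the second glove is black. The box holds 12 black and 8 beige gloves (20 in total), so

$$P(B_1) = \frac{12}{20}\cdot\frac{11}{19} = \frac{33}{95}.$$

Similarly, the probability of hypothesis $B_2$ is:

$$P(B_2) = \frac{8}{20}\cdot\frac{7}{19} = \frac{14}{95}.$$

Since hypotheses $B_1$, $B_2$ and $B_3$ form a complete group of events, the probability of hypothesis $B_3$ is:

$$P(B_3) = 1 - P(B_1) - P(B_2) = 1 - \frac{33}{95} - \frac{14}{95} = \frac{48}{95}.$$

According to the total probability formula, we have:

$$P(A) = P(B_1)P_{B_1}(A) + P(B_2)P_{B_2}(A) + P(B_3)P_{B_3}(A).$$

In this formula, $P_{B_1}(A) = 1$ is the probability that two black gloves form a pair. $P_{B_2}(A) = 1$ is the probability that two beige gloves form a pair. Finally, $P_{B_3}(A) = 0$ is the probability that gloves of different colors form a pair.

Thus, the probability that two gloves drawn at random form a pair is

$$P(A) = \frac{33}{95} + \frac{14}{95} = \frac{47}{95} \approx 0.49.$$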
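Exact fractions make the total-probability calculation easy to re-check. The following is a minimal Python sketch using `fractions.Fraction`. Like the solution above, it assumes that any two gloves of the same color form a pair, so left and right gloves are not distinguished.

```python
from fractions import Fraction

black, beige = 12, 8  # 6 pairs of black and 4 pairs of beige gloves
total = black + beige

p_b1 = Fraction(black, total) * Fraction(black - 1, total - 1)  # two black gloves
p_b2 = Fraction(beige, total) * Fraction(beige - 1, total - 1)  # two beige gloves
p_b3 = 1 - p_b1 - p_b2                                          # different colors

# Total probability formula with P_B1(A) = P_B2(A) = 1 and P_B3(A) = 0
p_a = p_b1 * 1 + p_b2 * 1 + p_b3 * 0
print(p_b1, p_b2, p_b3, p_a)  # 33/95 14/95 48/95 47/95
```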

12.1.2. The urn contains 3 white balls and 5 black balls. 3 balls are drawn at random, one at a time, and after each extraction they are returned to the urn. Find the probability that among the drawn balls there will be:

a) exactly two white balls, b) at least two white balls.

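Solution. This is sampling with replacement, i.e., the composition of the balls in the urn does not change from draw to draw. The probability of drawing a white ball is always $p = 3/8$, and that of drawing a black ball is $q = 5/8$.

a) When three balls are drawn, two of them must be white and one black. The black ball can be first, second, or third. Applying the addition and multiplication theorems together, we have:

$$P_a = 3\cdot\frac{3}{8}\cdot\frac{3}{8}\cdot\frac{5}{8} = \frac{135}{512} \approx 0.26.$$

b) Drawing at least two white balls means drawing either exactly two or exactly three white balls:

$$P_b = \frac{135}{512} + \left(\frac{3}{8}\right)^3 = \frac{135}{512} + \frac{27}{512} = \frac{162}{512} = \frac{81}{256} \approx 0.32.$$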
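Because the balls are returned, the draws form a Bernoulli scheme, and the factor 3 in part a) is the binomial coefficient C(3, 2). The following is a minimal Python sketch of that binomial formula with exact fractions.

```python
from fractions import Fraction
from math import comb

P_WHITE = Fraction(3, 8)  # 3 white out of 8 balls; unchanged because balls are returned
P_BLACK = 1 - P_WHITE


def prob_white(k, n=3):
    """Probability of exactly k white balls in n draws with replacement."""
    return comb(n, k) * P_WHITE ** k * P_BLACK ** (n - k)


exactly_two = prob_white(2)
at_least_two = prob_white(2) + prob_white(3)
print(exactly_two, at_least_two)  # 135/512 81/256
```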

12.1.3. The urn contains 6 white and 5 black balls. Three balls are drawn at random in succession without returning them to the urn. Find the probability that the third ball drawn is white.

Solution. If the third ball must be white, the first two balls can be white and white, white and black, black and white, or black and black. This gives four groups of mutually exclusive events. Applying the probability multiplication theorem to each of them and adding the results, we get:

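$$P = P_1(W)P_2(W)P_3(W) + P_1(W)P_2(B)P_3(W) + P_1(B)P_2(W)P_3(W) + P_1(B)P_2(B)P_3(W) =$$

$$= \frac{6}{11}\cdot\frac{5}{10}\cdot\frac{4}{9} + \frac{6}{11}\cdot\frac{5}{10}\cdot\frac{5}{9} + \frac{5}{11}\cdot\frac{6}{10}\cdot\frac{5}{9} + \frac{5}{11}\cdot\frac{4}{10}\cdot\frac{6}{9} = \frac{540}{990} = \frac{6}{11},$$

where W denotes a white ball and B a black ball.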
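The four mutually exclusive cases can be enumerated mechanically. The following is a minimal Python sketch that walks through every color sequence for the first two draws without replacement and accumulates exact probabilities.

```python
from fractions import Fraction
from itertools import product


def prob_third_white(white=6, black=5):
    """P(third ball is white) without replacement, enumerating the first two draws."""
    total = Fraction(0)
    for first_two in product("WB", repeat=2):  # WW, WB, BW, BB: mutually exclusive cases
        w, b, p = white, black, Fraction(1)
        for color in first_two:
            if color == "W":
                p *= Fraction(w, w + b)
                w -= 1
            else:
                p *= Fraction(b, w + b)
                b -= 1
        p *= Fraction(w, w + b)  # third ball is white
        total += p
    return total


print(prob_third_white())  # 6/11
```

The answer 6/11 equals the initial share of white balls. By symmetry, any given position in the sequence of draws is equally likely to hold any of the 11 balls.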