Least squares method of linear equations. Finding regression line parameters

3. Approximation of functions using the least squares method

The least squares method is used when processing experimental results to approximate (fit) experimental data with an analytical formula. The specific form of the formula is chosen, as a rule, on physical grounds. Such formulas may be, for example, a linear function y = ax + b, a quadratic trinomial y = ax² + bx + c, and others.

The essence of the least squares method is as follows. Let the measurement results be presented in the table:

Table 4

x | x_1 | x_2 | … | x_n
y | y_1 | y_2 | … | y_n

It is required to find an analytical dependence

$$y = f(x, a_0, a_1, \dots, a_m), \qquad (3.1)$$

where f is a known function and a_0, a_1, …, a_m are unknown constant parameters whose values must be found. In the least squares method, the approximation of function (3.1) to the experimental dependence is considered best if the condition

$$Q = \sum_{i=1}^{n} \left[ f(x_i, a_0, a_1, \dots, a_m) - y_i \right]^2 \to \min \qquad (3.2)$$

is satisfied, that is, the sum of squared deviations of the desired analytical function from the experimental values must be minimal.

Note that the function Q is called the residual.


Since the residual Q ≥ 0, it has a minimum. A necessary condition for a minimum of a function of several variables is that all partial derivatives of this function with respect to the parameters equal zero. Thus, finding the best values of the parameters of the approximating function (3.1), that is, the values for which Q = Q(a_0, a_1, …, a_m) is minimal, reduces to solving the system of equations:

$$\frac{\partial Q}{\partial a_k} = 0, \qquad k = 0, 1, \dots, m. \qquad (3.3)$$

The least squares method can be given the following geometric interpretation: among an infinite family of lines of a given type, one line is found for which the sum of the squared differences of the ordinates of the experimental points and the corresponding ordinates of the points found by the equation of this line will be the smallest.

Finding the parameters of a linear function

Let the experimental data be represented by the linear function y = ax + b. It is required to select the values of a and b for which the function

$$Q(a, b) = \sum_{i=1}^{n} (a x_i + b - y_i)^2 \qquad (3.4)$$

will be minimal. The necessary conditions for the minimum of function (3.4) are reduced to the system of equations:

$$\frac{\partial Q}{\partial a} = 2\sum_{i=1}^{n}(a x_i + b - y_i)\,x_i = 0, \qquad \frac{\partial Q}{\partial b} = 2\sum_{i=1}^{n}(a x_i + b - y_i) = 0.$$

After transformations, we obtain a system of two linear equations with two unknowns:

$$\begin{cases} a \sum_{i=1}^{n} x_i^2 + b \sum_{i=1}^{n} x_i = \sum_{i=1}^{n} x_i y_i, \\ a \sum_{i=1}^{n} x_i + b\,n = \sum_{i=1}^{n} y_i, \end{cases} \qquad (3.5)$$

by solving which we find the required values of the parameters a and b.
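As an illustration, here is a minimal Python sketch (not from the original text, which works in Excel; NumPy assumed) that assembles system (3.5) from the data sums and solves it numerically:

```python
import numpy as np

def fit_line(x, y):
    """Fit y = a*x + b by solving the normal equations (3.5)."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    n = len(x)
    A = np.array([[np.sum(x**2), np.sum(x)],
                  [np.sum(x),    n]])
    rhs = np.array([np.sum(x * y), np.sum(y)])
    a, b = np.linalg.solve(A, rhs)   # unique solution when the x_i are not all equal
    return a, b
```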

Finding the Parameters of a Quadratic Function

If the approximating function is the quadratic dependence y = ax² + bx + c, then its parameters a, b, c are found from the condition of the minimum of the function

$$Q(a, b, c) = \sum_{i=1}^{n} \left(a x_i^2 + b x_i + c - y_i\right)^2 \to \min. \qquad (3.6)$$

The conditions for the minimum of function (3.6) reduce to the system of equations:

$$\frac{\partial Q}{\partial a} = 0, \qquad \frac{\partial Q}{\partial b} = 0, \qquad \frac{\partial Q}{\partial c} = 0.$$

After transformations, we obtain a system of three linear equations with three unknowns:

$$\begin{cases} a \sum_{i=1}^{n} x_i^4 + b \sum_{i=1}^{n} x_i^3 + c \sum_{i=1}^{n} x_i^2 = \sum_{i=1}^{n} x_i^2 y_i, \\ a \sum_{i=1}^{n} x_i^3 + b \sum_{i=1}^{n} x_i^2 + c \sum_{i=1}^{n} x_i = \sum_{i=1}^{n} x_i y_i, \\ a \sum_{i=1}^{n} x_i^2 + b \sum_{i=1}^{n} x_i + c\,n = \sum_{i=1}^{n} y_i, \end{cases} \qquad (3.7)$$

by solving which we find the required values of the parameters a, b, and c.
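Analogously, a short sketch (again an illustration added here, not part of the source's Excel workflow) that solves system (3.7) for the quadratic fit:

```python
import numpy as np

def fit_parabola(x, y):
    """Fit y = a*x**2 + b*x + c by solving the normal equations (3.7)."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    n = len(x)
    A = np.array([[np.sum(x**4), np.sum(x**3), np.sum(x**2)],
                  [np.sum(x**3), np.sum(x**2), np.sum(x)],
                  [np.sum(x**2), np.sum(x),    n]])
    rhs = np.array([np.sum(x**2 * y), np.sum(x * y), np.sum(y)])
    return np.linalg.solve(A, rhs)   # returns (a, b, c)
```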

Example. Let the experiment yield the following table of values of x and y:

Table 5

y_i | 0.705 | 0.495 | 0.426 | 0.357 | 0.368 | 0.406 | 0.549 | 0.768

It is required to approximate the experimental data with linear and quadratic functions.

Solution. Finding the parameters of the approximating functions reduces to solving the systems of linear equations (3.5) and (3.7). To solve the problem we will use the Excel spreadsheet.

1. First, group Sheets 1 and 2 so that entries are made on both at once. Enter the experimental values x_i and y_i into columns A and B, starting from the second row (the column headings go in the first row). Then calculate the sums of these columns and place them in the tenth row.

In columns C–G, place the calculations and the sums of x_i², x_i³, x_i⁴, x_i·y_i, and x_i²·y_i, respectively.

2. Ungroup the sheets. We will carry out the further, analogous calculations for the linear dependence on Sheet 1 and for the quadratic dependence on Sheet 2.

3. Under the resulting table, form the matrix of coefficients A and the column vector of free terms b. We solve the system of linear equations as x = A⁻¹·b. To compute the inverse matrix and to multiply matrices, we use the Function Wizard and the MINVERSE and MMULT functions.
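Outside Excel, the same two spreadsheet steps can be mirrored in a short sketch (an illustration added here; NumPy assumed):

```python
import numpy as np

def solve_normal_system(A, rhs):
    """Spreadsheet-style solution: MINVERSE(A), then MMULT by the free terms."""
    A_inv = np.linalg.inv(A)   # analogue of MINVERSE
    return A_inv @ rhs         # analogue of MMULT
    # In practice np.linalg.solve(A, rhs) is preferable to forming the inverse.
```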

4. In the cell block H2:H9, using the obtained coefficients, we calculate the approximating values $y_i^{calc}$; in the block I2:I9, the deviations $\Delta y_i = y_i^{exp} - y_i^{calc}$; and in column J, the residual Q.

The resulting tables and the graphs built with the Chart Wizard are shown in Figures 6, 7, and 8.


Fig. 6. Table for calculating the coefficients of a linear function approximating the experimental data.


Fig. 7. Table for calculating the coefficients of a quadratic function approximating the experimental data.


Fig. 8. Graphical representation of the approximation of the experimental data by linear and quadratic functions.

Answer. The experimental data were approximated by the linear dependence y = 0.07881x + 0.442262 with residual Q = 0.165167 and by the quadratic dependence y = 3.115476x² − 5.2175x + 2.529631 with residual Q = 0.002103.

Tasks. Approximate the function given by the table with linear and quadratic functions.

Table 6 (each numbered row gives the values of y at the corresponding x)

№ \ x | 0.1   | 0.2   | 0.3   | 0.4   | 0.5   | 0.6   | 0.7   | 0.8
0     | 3.030 | 3.142 | 3.358 | 3.463 | 3.772 | 3.251 | 3.170 | 3.665
1     | 3.314 | 3.278 | 3.262 | 3.292 | 3.332 | 3.397 | 3.487 | 3.563
2     | 1.045 | 1.162 | 1.264 | 1.172 | 1.070 | 0.898 | 0.656 | 0.344
3     | 6.715 | 6.735 | 6.750 | 6.741 | 6.645 | 6.639 | 6.647 | 6.612
4     | 2.325 | 2.515 | 2.638 | 2.700 | 2.696 | 2.626 | 2.491 | 2.291
5     | 1.752 | 1.762 | 1.777 | 1.797 | 1.821 | 1.850 | 1.884 | 1.944
6     | 1.924 | 1.710 | 1.525 | 1.370 | 1.264 | 1.190 | 1.148 | 1.127
7     | 1.025 | 1.144 | 1.336 | 1.419 | 1.479 | 1.530 | 1.568 | 1.248
8     | 5.785 | 5.685 | 5.605 | 5.545 | 5.505 | 5.480 | 5.495 | 5.510
9     | 4.052 | 4.092 | 4.152 | 4.234 | 4.338 | 4.468 | 4.599 |

After leveling, we obtain a function of the following form: $g(x) = \sqrt[3]{x + 1} + 1$.

We can approximate these data using the linear relationship y = ax + b by calculating the corresponding parameters. To do this, we will apply the so-called least squares method. We will also need to make a drawing to check which line best fits the experimental data.


What exactly is OLS (least squares method)

The main thing we need to do is find the coefficients of the linear dependence at which the value of the function of two variables $F(a, b) = \sum_{i=1}^{n} (y_i - (a x_i + b))^2$ is the smallest. In other words, for certain values of a and b, the sum of squared deviations of the presented data from the resulting straight line will have a minimum value. This is the meaning of the least squares method. All we need to do to solve the example is to find the extremum of a function of two variables.

How to derive formulas for calculating coefficients

To derive the formulas for calculating the coefficients, we set up and solve a system of equations in two variables. To do this, we calculate the partial derivatives of the expression $F(a, b) = \sum_{i=1}^{n} (y_i - (a x_i + b))^2$ with respect to a and b and equate them to 0.

$$\begin{cases} \frac{\partial F(a,b)}{\partial a} = 0 \\ \frac{\partial F(a,b)}{\partial b} = 0 \end{cases} \Leftrightarrow \begin{cases} -2\sum_{i=1}^{n}(y_i - (a x_i + b))\,x_i = 0 \\ -2\sum_{i=1}^{n}(y_i - (a x_i + b)) = 0 \end{cases} \Leftrightarrow \begin{cases} a\sum_{i=1}^{n} x_i^2 + b\sum_{i=1}^{n} x_i = \sum_{i=1}^{n} x_i y_i \\ a\sum_{i=1}^{n} x_i + \sum_{i=1}^{n} b = \sum_{i=1}^{n} y_i \end{cases} \Leftrightarrow \begin{cases} a\sum_{i=1}^{n} x_i^2 + b\sum_{i=1}^{n} x_i = \sum_{i=1}^{n} x_i y_i \\ a\sum_{i=1}^{n} x_i + n b = \sum_{i=1}^{n} y_i \end{cases}$$

To solve the system of equations, you can use any method, for example substitution or Cramer's method. As a result, we obtain formulas for calculating the coefficients by the least squares method.

$$a = \frac{n \sum_{i=1}^{n} x_i y_i - \sum_{i=1}^{n} x_i \sum_{i=1}^{n} y_i}{n \sum_{i=1}^{n} x_i^2 - \left(\sum_{i=1}^{n} x_i\right)^2}, \qquad b = \frac{\sum_{i=1}^{n} y_i - a \sum_{i=1}^{n} x_i}{n}$$

We have calculated the values of the variables at which the function $F(a, b) = \sum_{i=1}^{n} (y_i - (a x_i + b))^2$ takes its minimum value. In the third section we will prove why this is so.

This is the application of the least squares method in practice. The formula used to find the parameter a includes the sums $\sum_{i=1}^{n} x_i$, $\sum_{i=1}^{n} y_i$, $\sum_{i=1}^{n} x_i y_i$, $\sum_{i=1}^{n} x_i^2$, as well as the parameter n, which denotes the number of experimental data points. We advise calculating each sum separately. The value of the coefficient b is calculated immediately after a.
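A minimal sketch of these closed-form formulas in plain Python (an illustration added here, not from the source):

```python
def ols_coefficients(x, y):
    """Closed-form least-squares coefficients for y = a*x + b."""
    n = len(x)
    sx = sum(x)
    sy = sum(y)
    sxy = sum(xi * yi for xi, yi in zip(x, y))
    sxx = sum(xi * xi for xi in x)
    a = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    b = (sy - a * sx) / n
    return a, b
```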

Let's go back to the original example.

Example 1

Here n equals five. To make it more convenient to calculate the required sums included in the coefficient formulas, let's fill out a table.

          | i = 1 | i = 2 | i = 3 | i = 4 | i = 5 | Σ
x_i       | 0     | 1     | 2     | 4     | 5     | 12
y_i       | 2.1   | 2.4   | 2.6   | 2.8   | 3.0   | 12.9
x_i · y_i | 0     | 2.4   | 5.2   | 11.2  | 15    | 33.8
x_i²      | 0     | 1     | 4     | 16    | 25    | 46

Solution

The fourth row contains the products of the values from the second and third rows for each individual i. The fifth row contains the squares of the values from the second row. The last column shows the row sums.

Let's use the least squares method to compute the coefficients a and b we need. To do this, we substitute the required values from the last column and calculate the sums:

$$a = \frac{5 \cdot 33.8 - 12 \cdot 12.9}{5 \cdot 46 - 12^2} = \frac{169 - 154.8}{230 - 144} = \frac{14.2}{86} \approx 0.165, \qquad b = \frac{12.9 - 0.165 \cdot 12}{5} \approx 2.184$$

It turns out that the required approximating straight line is y = 0.165x + 2.184. Now we need to determine which line better approximates the data: $g(x) = \sqrt[3]{x + 1} + 1$ or y = 0.165x + 2.184. Let's estimate using the least squares method.

To estimate the error, we find the sums of squared deviations of the data from the two lines, $\sigma_1 = \sum_{i=1}^{n} (y_i - (a x_i + b))^2$ and $\sigma_2 = \sum_{i=1}^{n} (y_i - g(x_i))^2$; the minimum value will correspond to the more suitable line.

$$\sigma_1 = \sum_{i=1}^{5} \left(y_i - (0.165 x_i + 2.184)\right)^2 \approx 0.019, \qquad \sigma_2 = \sum_{i=1}^{5} \left(y_i - \left(\sqrt[3]{x_i + 1} + 1\right)\right)^2 \approx 0.096$$

Answer: since σ₁ < σ₂, the straight line that best approximates the original data is y = 0.165x + 2.184.
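As a cross-check, the numbers of this example are easy to reproduce with a self-contained sketch (an illustration added here, not from the source):

```python
x = [0, 1, 2, 4, 5]
y = [2.1, 2.4, 2.6, 2.8, 3.0]
n = len(x)

sx, sy = sum(x), sum(y)
sxy = sum(xi * yi for xi, yi in zip(x, y))
sxx = sum(xi * xi for xi in x)

a = (n * sxy - sx * sy) / (n * sxx - sx * sx)   # ≈ 0.165
b = (sy - a * sx) / n                           # ≈ 2.184

sigma1 = sum((yi - (a * xi + b)) ** 2 for xi, yi in zip(x, y))               # ≈ 0.019
sigma2 = sum((yi - ((xi + 1) ** (1 / 3) + 1)) ** 2 for xi, yi in zip(x, y))  # ≈ 0.096
print(sigma1 < sigma2)   # True: the straight line fits better
```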

The least squares method is clearly shown in the graphical illustration: the red line marks the straight line $g(x) = \sqrt[3]{x + 1} + 1$, the blue line marks y = 0.165x + 2.184, and the original data are indicated by pink dots.

Let us explain why exactly approximations of this type are needed.

They can be used in tasks that require data smoothing, as well as in those where data must be interpolated or extrapolated. For example, in the problem discussed above, one could find the value of the observed quantity y at x = 3 or at x = 6. We have devoted a separate article to such examples.

Proof of the OLS method

For the function to take a minimum value at the computed a and b, it is necessary that at this point the matrix of the quadratic form of the second-order differential of the function $F(a, b) = \sum_{i=1}^{n} (y_i - (a x_i + b))^2$ be positive definite. Let's show how it should look.

Example 2

We have a second-order differential of the following form:

$$d^2 F(a; b) = \frac{\partial^2 F(a; b)}{\partial a^2}\, da^2 + 2\,\frac{\partial^2 F(a; b)}{\partial a\, \partial b}\, da\, db + \frac{\partial^2 F(a; b)}{\partial b^2}\, db^2$$

Solution

$$\frac{\partial^2 F(a; b)}{\partial a^2} = \frac{\partial}{\partial a}\left(-2\sum_{i=1}^{n} (y_i - (a x_i + b))\, x_i\right) = 2\sum_{i=1}^{n} x_i^2$$

$$\frac{\partial^2 F(a; b)}{\partial a\, \partial b} = \frac{\partial}{\partial b}\left(-2\sum_{i=1}^{n} (y_i - (a x_i + b))\, x_i\right) = 2\sum_{i=1}^{n} x_i$$

$$\frac{\partial^2 F(a; b)}{\partial b^2} = \frac{\partial}{\partial b}\left(-2\sum_{i=1}^{n} (y_i - (a x_i + b))\right) = 2\sum_{i=1}^{n} 1 = 2n$$

In other words, we can write it as $d^2 F(a; b) = 2\sum_{i=1}^{n} x_i^2\, da^2 + 2 \cdot 2\sum_{i=1}^{n} x_i\, da\, db + 2n\, db^2$.

We obtain the matrix of the quadratic form $M = \begin{pmatrix} 2\sum_{i=1}^{n} x_i^2 & 2\sum_{i=1}^{n} x_i \\ 2\sum_{i=1}^{n} x_i & 2n \end{pmatrix}$.

In this case, the values of the individual elements do not depend on a and b. Is this matrix positive definite? To answer this question, let's check whether its leading principal minors are positive.

We calculate the first-order leading principal minor: $2\sum_{i=1}^{n} x_i^2 > 0$. Since the points x_i do not all coincide, the inequality is strict. We will keep this in mind in the further calculations.

We calculate the second-order leading principal minor:

$$\det(M) = \begin{vmatrix} 2\sum_{i=1}^{n} x_i^2 & 2\sum_{i=1}^{n} x_i \\ 2\sum_{i=1}^{n} x_i & 2n \end{vmatrix} = 4\left(n\sum_{i=1}^{n} x_i^2 - \left(\sum_{i=1}^{n} x_i\right)^2\right)$$

After this, we proceed to prove the inequality $n\sum_{i=1}^{n} x_i^2 - \left(\sum_{i=1}^{n} x_i\right)^2 > 0$ by mathematical induction.

1. First we check whether the inequality holds for a small n, say n = 2:

$$2\sum_{i=1}^{2} x_i^2 - \left(\sum_{i=1}^{2} x_i\right)^2 = 2\left(x_1^2 + x_2^2\right) - (x_1 + x_2)^2 = x_1^2 - 2x_1 x_2 + x_2^2 = (x_1 - x_2)^2 > 0$$

We have obtained a valid inequality (provided that the values x₁ and x₂ do not coincide).

2. Let us assume that the inequality is true for n, i.e., that $n\sum_{i=1}^{n} x_i^2 - \left(\sum_{i=1}^{n} x_i\right)^2 > 0$.

3. Now we prove its validity for n + 1, i.e., that $(n+1)\sum_{i=1}^{n+1} x_i^2 - \left(\sum_{i=1}^{n+1} x_i\right)^2 > 0$ given that $n\sum_{i=1}^{n} x_i^2 - \left(\sum_{i=1}^{n} x_i\right)^2 > 0$.

We calculate:

$$(n+1)\sum_{i=1}^{n+1} x_i^2 - \left(\sum_{i=1}^{n+1} x_i\right)^2 = (n+1)\left(\sum_{i=1}^{n} x_i^2 + x_{n+1}^2\right) - \left(\sum_{i=1}^{n} x_i + x_{n+1}\right)^2 =$$

$$= n\sum_{i=1}^{n} x_i^2 + \sum_{i=1}^{n} x_i^2 + n x_{n+1}^2 + x_{n+1}^2 - \left(\sum_{i=1}^{n} x_i\right)^2 - 2 x_{n+1}\sum_{i=1}^{n} x_i - x_{n+1}^2 =$$

$$= \left\{ n\sum_{i=1}^{n} x_i^2 - \left(\sum_{i=1}^{n} x_i\right)^2 \right\} + \sum_{i=1}^{n}\left(x_{n+1}^2 - 2 x_{n+1} x_i + x_i^2\right) =$$

$$= \left\{ n\sum_{i=1}^{n} x_i^2 - \left(\sum_{i=1}^{n} x_i\right)^2 \right\} + (x_{n+1} - x_1)^2 + (x_{n+1} - x_2)^2 + \dots + (x_{n+1} - x_n)^2 > 0$$

The expression enclosed in curly braces is greater than 0 (by the inductive assumption of step 2), and the remaining terms are squares of numbers and therefore non-negative. The inequality is proven.

Answer: the found a and b correspond to the smallest value of the function $F(a, b) = \sum_{i=1}^{n} (y_i - (a x_i + b))^2$, which means that they are the required parameters of the least squares method (LSM).
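For a concrete data set, the positive definiteness is also easy to verify numerically; a minimal sketch (using the x values of Example 1, NumPy assumed):

```python
import numpy as np

def hessian_minors(x):
    """Leading principal minors of the matrix M of the quadratic form."""
    x = np.asarray(x, float)
    M = 2 * np.array([[np.sum(x**2), np.sum(x)],
                      [np.sum(x),    len(x)]])
    return M[0, 0], np.linalg.det(M)

m1, m2 = hessian_minors([0, 1, 2, 4, 5])
print(m1 > 0 and m2 > 0)   # True: M is positive definite, so (a, b) is a minimum
```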


The least squares method is widely used in econometrics, where its parameters admit a clear economic interpretation.

Linear regression reduces to finding an equation of the form

y = a + bx + ε, or ŷ = a + bx.

An equation of this form allows one, given values of the factor x, to obtain theoretical values of the resultant characteristic by substituting actual values of the factor x into it.

The construction of linear regression reduces to estimating its parameters a and b. Estimates of the linear regression parameters can be found by different methods.

The classical approach to estimating linear regression parameters is based on the least squares method (OLS).

The least squares method yields the estimates of the parameters a and b for which the sum of squared deviations of the actual values of the resultant characteristic y from the calculated (theoretical) values ŷ is minimal:

$$\sum (y - \hat{y})^2 \to \min$$

To find the minimum of the function, one calculates the partial derivatives with respect to each of the parameters a and b and sets them equal to zero.

Denote $S = \sum (y - a - bx)^2$; then:

$$\frac{\partial S}{\partial a} = -2\sum (y - a - bx) = 0, \qquad \frac{\partial S}{\partial b} = -2\sum x\,(y - a - bx) = 0.$$

Transforming, we obtain the following system of normal equations for estimating the parameters a and b:

$$\begin{cases} n a + b \sum x = \sum y, \\ a \sum x + b \sum x^2 = \sum xy. \end{cases}$$

Solving this system of normal equations either by sequential elimination of variables or by the method of determinants, we find the required estimates of the parameters a and b.

The parameter b is called the regression coefficient. Its value shows the average change in the result when the factor changes by one unit.

The regression equation is always supplemented with an indicator of the closeness of the connection. When linear regression is used, this indicator is the linear correlation coefficient $r_{xy}$. There are different modifications of the formula for the linear correlation coefficient; for example:

$$r_{xy} = b\,\frac{\sigma_x}{\sigma_y} = \frac{\overline{xy} - \bar{x}\cdot\bar{y}}{\sigma_x \sigma_y}.$$

As is known, the linear correlation coefficient lies within the limits $-1 \le r_{xy} \le 1$.

To assess the quality of the fit of the linear function, the square of the linear correlation coefficient, called the coefficient of determination, is calculated. The coefficient of determination characterizes the proportion of the variance of the resulting characteristic y explained by the regression in the total variance of the resulting characteristic:

$$r_{xy}^2 = 1 - \frac{\sigma_{resid}^2}{\sigma_y^2}.$$

Accordingly, the value $1 - r_{xy}^2$ characterizes the share of the variance of y caused by the influence of other factors not taken into account in the model.
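A minimal sketch of both indicators (illustrative Python, NumPy assumed; the function name is my own):

```python
import numpy as np

def correlation_and_r2(x, y):
    """Linear correlation coefficient r_xy and coefficient of determination."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    cov = np.mean(x * y) - x.mean() * y.mean()
    r = cov / (x.std() * y.std())
    return r, r**2   # r lies in [-1, 1]; r**2 is the explained share of variance
```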

Questions for self-control

1. What is the essence of the least squares method?

2. How many variables does pairwise regression involve?

3. What coefficient determines the closeness of the connection between the changes?

4. Within what limits is the coefficient of determination defined?

5. How is the parameter b estimated in correlation-regression analysis?


Nonlinear economic models. Nonlinear regression models. Transformation of variables. Elasticity coefficient.

If the relationships between economic phenomena are nonlinear, they are expressed using the corresponding nonlinear functions: for example, an equilateral hyperbola $y = a + b/x$, a second-degree parabola $y = a + bx + cx^2$, etc.

There are two classes of nonlinear regressions:

1. Regressions that are nonlinear with respect to the explanatory variables included in the analysis but linear with respect to the estimated parameters, for example:

polynomials of various degrees: $y = a + bx + cx^2$, $y = a + bx + cx^2 + dx^3$;

an equilateral hyperbola: $y = a + b/x$;

a semilogarithmic function: $y = a + b \ln x$.

2. Regressions that are nonlinear in the estimated parameters, for example:

a power function: $y = a x^b$;

an exponential function: $y = a b^x$;

an exponential function of base e: $y = e^{a + bx}$.

The total sum of squared deviations of the individual values of the resulting characteristic y from the average value ȳ is caused by the influence of many reasons. Let us conditionally divide the entire set of causes into two groups: the factor under study x and other factors.

If the factor does not influence the result, then the regression line on the graph is parallel to the Ox axis and ŷ = ȳ. Then the entire variance of the resulting characteristic is due to the influence of other factors, and the total sum of squared deviations coincides with the residual sum. If other factors do not influence the result, then y is functionally related to x and the residual sum of squares is zero. In this case, the sum of squared deviations explained by the regression coincides with the total sum of squares.

Since not all points of the correlation field lie on the regression line, their scatter is always the result both of the influence of the factor x, i.e., the regression of y on x, and of other causes (unexplained variation). The suitability of a regression line for forecasting depends on what part of the total variation of the trait y is accounted for by the explained variation.

Obviously, if the sum of squared deviations due to the regression is greater than the residual sum of squares, then the regression equation is statistically significant and the factor x has a significant impact on the result y.


The significance of the regression equation as a whole is assessed using Fisher's F-test. A null hypothesis is put forward that the regression coefficient equals zero, i.e., b = 0, and therefore the factor x does not affect the result y.

The immediate calculation of the F-test is preceded by an analysis of variance. Central to it is the decomposition of the total sum of squared deviations of the variable y from the average value ȳ into two parts, "explained" and "unexplained":

$\sum (y - \bar{y})^2$ - the total sum of squared deviations;

$\sum (\hat{y} - \bar{y})^2$ - the sum of squared deviations explained by the regression;

$\sum (y - \hat{y})^2$ - the residual sum of squared deviations.

Any sum of squared deviations is related to a number of degrees of freedom, i.e., to the number of independent variations of a characteristic. The number of degrees of freedom is related to the number of population units n and to the number of constants determined from them. In relation to the problem under study, the number of degrees of freedom shows how many independent deviations out of n possible are required to form a given sum of squares.

Dispersion per degree of freedom, D:

$$D_{total} = \frac{\sum (y - \bar{y})^2}{n - 1}, \qquad D_{factor} = \frac{\sum (\hat{y} - \bar{y})^2}{1}, \qquad D_{resid} = \frac{\sum (y - \hat{y})^2}{n - 2}.$$

The F-ratio (F-test):

$$F = \frac{D_{factor}}{D_{resid}}.$$

If the null hypothesis is true, the factor and residual dispersions do not differ from one another. For H₀ to be rejected, the factor dispersion must exceed the residual dispersion several times over. The statistician Snedecor developed tables of critical F values at different significance levels of the null hypothesis and different numbers of degrees of freedom. The tabulated F value is the maximum ratio of the dispersions that can occur by chance at a given probability level under the null hypothesis. The calculated F ratio is considered reliable if it is greater than the tabulated one.

In that case, the null hypothesis about the absence of a relationship between the characteristics is rejected and a conclusion is drawn about the significance of this relationship: if F_fact > F_table, then H₀ is rejected.

If the value is less than the tabulated one (F_fact < F_table), then the probability of the null hypothesis is higher than the specified level, and it cannot be rejected without a serious risk of drawing a wrong conclusion about the presence of a relationship. In this case, the regression equation is considered statistically insignificant, but H₀ is not rejected.
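For paired regression the F ratio is easy to compute directly (a sketch with my own names; the factor has 1 degree of freedom and the residual has n − 2):

```python
import numpy as np

def f_statistic(y, y_hat):
    """F ratio for paired regression: factor dispersion over residual dispersion."""
    y, y_hat = np.asarray(y, float), np.asarray(y_hat, float)
    n = len(y)
    ss_factor = np.sum((y_hat - y.mean()) ** 2)   # explained by regression, df = 1
    ss_resid = np.sum((y - y_hat) ** 2)           # residual, df = n - 2
    return (ss_factor / 1) / (ss_resid / (n - 2))
# Compare the result with the tabulated F value for (1, n - 2) degrees of freedom.
```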

Standard error of regression coefficient

To assess the significance of the regression coefficient, its value is compared with its standard error; that is, the actual value of Student's t-test, $t_b = b / m_b$, is determined and then compared with the tabulated value at a certain significance level and (n − 2) degrees of freedom.

Standard error of the parameter a:

$$m_a = \sigma_{resid}\sqrt{\frac{\sum x^2}{n \sum (x - \bar{x})^2}}.$$

The significance of the linear correlation coefficient is checked based on the magnitude of the error of the correlation coefficient $m_r$:

$$t_r = \frac{r_{xy}}{m_r}, \qquad m_r = \sqrt{\frac{1 - r_{xy}^2}{n - 2}}.$$

Total variance of the trait x:

$$\sigma_x^2 = \frac{\sum (x - \bar{x})^2}{n}.$$
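Putting the preceding formulas together, a small illustrative sketch of the t-test for the slope (the helper name is my own; NumPy assumed):

```python
import numpy as np

def t_for_slope(x, y, b):
    """Student's t statistic for the regression coefficient b."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    n = len(x)
    a = y.mean() - b * x.mean()                        # intercept of the fitted line
    resid = y - (a + b * x)
    s2 = np.sum(resid ** 2) / (n - 2)                  # residual variance per df
    m_b = np.sqrt(s2 / np.sum((x - x.mean()) ** 2))    # standard error of b
    return b / m_b   # compare with the tabulated t at (n - 2) degrees of freedom
```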

Multiple Linear Regression

Model building

Multiple regression is a regression of an effective characteristic on two or more factors, i.e., a model of the form y = f(x₁, x₂, …, x_p).

Regression can give good results in modeling if the influence of other factors affecting the object of study can be neglected. The behavior of individual economic variables cannot be controlled; that is, it is not possible to ensure that "all other conditions are equal" when assessing the influence of one factor under study. In this case, one should try to identify the influence of other factors by introducing them into the model, i.e., by constructing a multiple regression equation: y = a + b₁x₁ + b₂x₂ + … + b_p x_p + ε.

The main goal of multiple regression is to build a model with a large number of factors, while determining the influence of each of them separately as well as their combined impact on the modeled indicator. The specification of the model includes two ranges of issues: the selection of factors and the choice of the type of the regression equation.
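Multiple-regression estimation by least squares fits in a few lines with NumPy's generic solver (a sketch added here, not the source's notation):

```python
import numpy as np

def fit_multiple(X, y):
    """Estimate y = a + b1*x1 + ... + bp*xp by least squares."""
    X = np.asarray(X, float)                       # shape (n, p): one column per factor
    design = np.column_stack([np.ones(len(X)), X])  # prepend the intercept column
    coef, *_ = np.linalg.lstsq(design, np.asarray(y, float), rcond=None)
    return coef                                     # [a, b1, ..., bp]
```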

The method of least squares (OLS) allows one to estimate various quantities using the results of many measurements containing random errors.

Characteristics of OLS

The main idea of this method is that the sum of squared errors is taken as the criterion of the accuracy of the solution, which one strives to minimize. When using this method, both numerical and analytical approaches can be applied.

In particular, as a numerical implementation, the least squares method involves taking as many measurements of an unknown random variable as possible. The more measurements, the more accurate the solution. Based on this set of measurements (initial data), another set of candidate solutions is obtained, from which the best one is then selected. If the set of solutions is parameterized, the least squares method reduces to finding the optimal values of the parameters.

In the analytical approach, a functional is defined on the set of initial data (measurements) and on the expected set of solutions; it can be expressed by a formula obtained as a certain hypothesis requiring confirmation. In this case, the least squares method reduces to finding the minimum of this functional on the set of squared errors of the original data.

Note that it is not the errors themselves that are summed, but the squares of the errors. Why? The point is that deviations of measurements from the exact value are often both positive and negative. When determining an average, simple summation can lead to an incorrect conclusion about the quality of the estimate, since the mutual cancellation of positive and negative values reduces the effective power of the sample of multiple measurements, and consequently the accuracy of the assessment.

To prevent this from happening, the squared deviations are summed. Moreover, in order to equalize the dimension of the measured quantity and of the final estimate, the square root is extracted from the sum of the squared errors.
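That last step is simply the root-mean-square error; a minimal sketch:

```python
import math

def rmse(y, y_hat):
    """Root of the mean squared error: same units as the measured quantity."""
    return math.sqrt(sum((v - w) ** 2 for v, w in zip(y, y_hat)) / len(y))
```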

Some applications of OLS

OLS is widely used in various fields. For example, in probability theory and mathematical statistics the method is used to determine such a characteristic of a random variable as the standard deviation, which describes the width of the range of values of the random variable.

The least squares method finds the widest application in various fields of science and practical activity: physics, chemistry, biology, economics, sociology, psychology, and so on and so forth. By the will of fate, I often have to deal with economics, and therefore today I will arrange for you a trip to an amazing country called Econometrics =) ...How could you not want that?! It's very good there; you just need to make up your mind! But what you probably definitely want is to learn how to solve problems by the least squares method. And especially diligent readers will learn to solve them not only accurately but also VERY QUICKLY ;-) But first, the general statement of the problem and an accompanying example:

Suppose that in a certain subject area we study indicators that have a quantitative expression. There is every reason to believe that the indicator y depends on the indicator x. This assumption can be either a scientific hypothesis or based on basic common sense. Let's leave science aside, however, and explore more appetizing areas, namely grocery stores. Let's denote by:

x – the retail area of a grocery store, sq. m,

y – the annual turnover of a grocery store, million rubles.

It is absolutely clear that the larger the store area, the greater in most cases its turnover will be.

Suppose that after carrying out observations/experiments/calculations/dances with a tambourine we have numerical data at our disposal:

With grocery stores, I think everything is clear: x₁ is the area of the 1st store, y₁ is its annual turnover; x₂ is the area of the 2nd store, y₂ is its annual turnover, and so on. By the way, it is not at all necessary to have access to classified materials: a fairly accurate estimate of trade turnover can be obtained by means of mathematical statistics. However, let's not get distracted; the commercial espionage course is paid separately =)

The tabular data can also be written as points $(x_i, y_i)$ and depicted in the familiar Cartesian coordinate system.

Let's answer an important question: How many points are needed for a qualitative study?

The more, the better. The minimum acceptable set consists of 5-6 points. In addition, when the amount of data is small, "anomalous" results must not be included in the sample. For example, a small elite store can earn orders of magnitude more than "its colleagues," thereby distorting the general pattern that you need to find!

To put it very simply, we need to select a function whose graph passes as close as possible to the points. Such a function is called an approximating (from approximation) or theoretical function. Generally speaking, an obvious "contender" immediately appears here: a high-degree polynomial whose graph passes through ALL the points. But this option is complicated and often simply incorrect (since the graph will "loop" all the time and poorly reflect the main trend).

Thus, the sought function must be quite simple and at the same time reflect the dependence adequately. As you might guess, one of the methods for finding such functions is called the least squares method. First, let's look at its essence in general terms. Let some function f(x) approximate the experimental data:


How do we evaluate the accuracy of this approximation? Let us calculate the differences (deviations) between the experimental and functional values, $y_i - f(x_i)$ (we study the drawing). The first thought that comes to mind is to estimate how large their sum is, but the problem is that the differences can be negative, and deviations would cancel each other out in such a summation. Therefore, as an estimate of the accuracy of the approximation, one is tempted to take the sum of the moduli of the deviations:

$$\sum_{i=1}^{n} |y_i - f(x_i)|$$

(in case anyone doesn't know: $\sum$ is the summation sign, and i is an auxiliary "counter" variable taking values from 1 to n).

By approximating the experimental points with different functions, we will obtain different values of this sum, and obviously, where the sum is smaller, that function is more accurate.

Such a method exists and is called the least modulus method. However, in practice the least squares method has become far more widespread; here the possible negative values are eliminated not by the modulus but by squaring the deviations:

$$\sum_{i=1}^{n} (y_i - f(x_i))^2,$$

after which efforts are aimed at selecting a function f(x) such that the sum of squared deviations is as small as possible. This, actually, is where the name of the method comes from.
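A tiny illustration (with hypothetical residuals) of why the signed sum fails as a criterion while the modulus and the square both work:

```python
deviations = [0.5, -0.5, 0.3, -0.3]   # hypothetical residuals y_i - f(x_i)

signed = sum(deviations)                    # 0.0  -- cancellation hides the misfit
abs_sum = sum(abs(d) for d in deviations)   # 1.6  -- least-modulus criterion
sq_sum = sum(d * d for d in deviations)     # 0.68 -- least-squares criterion
```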

And now we return to another important point: as noted above, the selected function should be quite simple, but there are many such functions: linear, hyperbolic, exponential, logarithmic, quadratic, etc. And, of course, one would immediately like to "reduce the field of activity." Which class of functions should be chosen for research? A primitive but effective technique:

– The easiest way is to plot the points on the drawing and analyze their location. If they tend to lie along a straight line, then you should look for the equation of a line y = ax + b with optimal values of a and b. In other words, the task is to find SUCH coefficients that the sum of squared deviations is the smallest.

If the points are located, for example, along a hyperbola, then the linear function will obviously give a poor approximation. In this case, we look for the most "favorable" coefficients of the hyperbola equation $y = a + b/x$, those that give the minimum sum of squares.

Now note that in both cases we are talking about a function of two variables whose arguments are the sought parameters of the dependence: F(a, b). And essentially we need to solve a standard problem: to find the minimum of a function of two variables.

Let's recall our example: suppose that the "store" points tend to lie along a straight line and there is every reason to believe a linear dependence of turnover on retail area. Let's find coefficients "a" and "b" such that the sum of squared deviations $F(a, b) = \sum_{i=1}^{n} (y_i - (a x_i + b))^2$ is the smallest. Everything is as usual: first the 1st-order partial derivatives. According to the linearity rule, you can differentiate right under the summation sign:

If you want to use this information for an essay or term paper, I will be very grateful for a link in your list of sources; you will find such detailed calculations in few places:

Let's create the standard system:

$$\begin{cases} \frac{\partial F}{\partial a} = -2\sum_{i=1}^{n} (y_i - (a x_i + b))\,x_i = 0 \\ \frac{\partial F}{\partial b} = -2\sum_{i=1}^{n} (y_i - (a x_i + b)) = 0 \end{cases}$$

We reduce each equation by “two” and, in addition, “break up” the sums:

Note: analyze independently why "a" and "b" can be taken out of the summation sign. By the way, formally this can also be done with the sum: $\sum_{i=1}^{n} b = nb$.

Let's rewrite the system in "applied" form:

$$\begin{cases} a\sum x_i^2 + b\sum x_i = \sum x_i y_i \\ a\sum x_i + nb = \sum y_i \end{cases}$$

after which the algorithm for solving our problem begins to emerge: Do we know the coordinates of the points? We do. Can we find the sums? Easily. We set up the simplest system of two linear equations in two unknowns ("a" and "b"). We solve the system, for example, by Cramer's method, and obtain a stationary point. Checking the sufficient condition for an extremum, we can verify that at this point the function attains precisely a minimum. The check involves additional calculations, so we will leave it behind the scenes (if necessary, the missing steps can be looked up). We draw the final conclusion:

The resulting function approximates the experimental points best, at least compared to any other linear function. Roughly speaking, its graph passes as close as possible to these points. In the tradition of econometrics, the resulting approximating function is also called the paired linear regression equation.

The problem under consideration is of great practical importance. In our example, the equation allows one to predict what turnover ("y") the store will have at one or another value of the retail area (one or another value of "x"). Yes, the resulting forecast will be only a forecast, but in many cases it will turn out to be quite accurate.

I will analyze just one problem with "real" numbers, since there are no difficulties in it: all the calculations are at the level of the 7th-8th grade school curriculum. In 95 percent of cases you will be asked to find just a linear function, but at the very end of the article I will show that finding the equations of the optimal hyperbola, exponential, and some other functions is no more difficult.

In fact, all that remains is to distribute the promised goodies - so that you can learn to solve such examples not only accurately, but also quickly. We carefully study the standard:

Task

As a result of studying the relationship between two indicators, the following pairs of numbers were obtained:

Using the least squares method, find the linear function that best approximates the empirical (experimental) data. Make a drawing on which to plot the experimental points and the graph of the approximating function in a Cartesian rectangular coordinate system. Find the sum of squared deviations between the empirical and theoretical values. Find out whether the proposed function would better (from the point of view of the least squares method) approximate the experimental points.

Please note that the "x" values are natural numbers, and this has a characteristic meaningful interpretation, which I will talk about a little later; but they, of course, can also be fractional. In addition, depending on the content of a particular task, both "x" and "y" values can be completely or partially negative. Well, we have been given a "faceless" task, and we begin its solution:

We find the coefficients of the optimal function as the solution of the system:

$$\begin{cases} a\sum x_i^2 + b\sum x_i = \sum x_i y_i \\ a\sum x_i + nb = \sum y_i \end{cases}$$

For more compact notation, the "counter" variable can be omitted, since it is already clear that the summation is carried out from 1 to n.

It is more convenient to calculate the required sums in tabular form:


Calculations can be carried out on a microcalculator, but it is much better to use Excel: both faster and without errors.

Thus, we get the following system:

Here you can multiply the second equation by 3 and subtract the 2nd from the 1st equation term by term. But this is luck: in practice, systems are often not a gift, and in such cases Cramer's method saves the day. The determinant of the system $\Delta \neq 0$, which means the system has a unique solution.

Let's check. I understand that you don't want to, but why let errors slip through where they can be caught with certainty? We substitute the found solution into the left-hand side of each equation of the system:

The right-hand sides of the corresponding equations are obtained, which means that the system is solved correctly.

Thus, the desired approximating function is found: of all linear functions, it is the one that best approximates the experimental data.

Unlike the direct dependence of the store's turnover on its area, the found dependence is inverse (the principle "the more, the less"), and this fact is immediately revealed by the negative slope. The function tells us that with an increase of a certain indicator by 1 unit, the value of the dependent indicator decreases on average by 0.65 units. As they say, the higher the price of buckwheat, the less of it is sold.

To plot the graph of the approximating function, we find two of its values:

and execute the drawing:


The constructed straight line is called a trend line (namely, a linear trend line; in the general case a trend is not necessarily a straight line). Everyone is familiar with the expression "to be in trend," and I think this term needs no additional comment.

Let's calculate the sum of squared deviations between the empirical and theoretical values. Geometrically, this is the sum of the squares of the lengths of the "raspberry-colored" segments (two of which are so small that they are not even visible).

Let's summarize the calculations in a table:


Again, they can be done manually; just in case, I’ll give an example for the 1st point:

but it is much more effective to do it in the already known way:

We repeat once again: what is the meaning of the obtained result? Of all linear functions, the found function has the smallest indicator, that is, within its family it is the best approximation. And here, by the way, the final question of the problem is not accidental: what if the proposed exponential function would approximate the experimental points better?

Let's find the corresponding sum of squared deviations; to distinguish them, I will denote it by the letter "epsilon." The technique is exactly the same:


And again, just in case, calculations for the 1st point:

In Excel we use the standard function EXP (syntax can be found in Excel Help).

Conclusion: the sum of squared deviations for the exponential function turned out larger, which means that the exponential function approximates the experimental points worse than the straight line.

But it should be noted here that "worse" does not yet mean "wrong." Now I have built a graph of this exponential function, and it also passes close to the points; so much so that without analytical research it is difficult to say which function is more accurate.

This concludes the solution, and I return to the question of the natural values of the argument. In various studies, usually economic or sociological ones, natural "x's" are used to number months, years, or other equal time intervals. Consider, for example, the following problem.