Least squares method. Approximation of experimental data

Let us approximate the function by a polynomial of degree 2. To do this, we calculate the coefficients of the normal system of equations:

, ,

Let's create a normal least squares system, which has the form:

The solution to the system is easy to find:, , .

Thus, a polynomial of the 2nd degree is found: .
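The data for this example are not reproduced above, so the following is only a minimal sketch of how the normal system for a degree-2 polynomial can be assembled and solved. The data points are hypothetical, and `fit_quadratic` is an illustrative name, not something from the original text.

```python
import numpy as np

def fit_quadratic(x, y):
    """Fit y ~ c0 + c1*x + c2*x**2 by solving the normal equations."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Entry (j, k) of the normal matrix is sum(x**(j + k));
    # entry j of the right-hand side is sum(y * x**j).
    A = np.array([[np.sum(x ** (j + k)) for k in range(3)] for j in range(3)])
    rhs = np.array([np.sum(y * x ** j) for j in range(3)])
    return np.linalg.solve(A, rhs)

# Hypothetical points lying exactly on 1 + 2x + 3x^2
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1 + 2 * v + 3 * v ** 2 for v in xs]
coeffs = fit_quadratic(xs, ys)  # coefficients in the order c0, c1, c2
```

Because the hypothetical points lie exactly on a parabola, the recovered coefficients should match the generating ones up to rounding.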

Theoretical information

Example 2. Finding the optimal degree of a polynomial.

Example 3. Derivation of a normal system of equations for finding the parameters of the empirical dependence.

Let us derive the system of equations that determines the coefficients of the function providing the root-mean-square approximation of a given function at given points. We compose the function to be minimized and write down the necessary condition for its extremum:

Then the normal system will take the form:

We have obtained a linear system of equations for the unknown parameters, which is easily solved.


Experimental data on the values of the variables x and y are given in the table.

As a result of their alignment, the function is obtained

Using the least squares method, approximate these data by a linear dependence y = ax + b (find the parameters a and b). Determine which of the two lines fits the experimental data better (in the sense of the least squares method). Make a drawing.

The essence of the least squares method (LSM).

The task is to find the coefficients of the linear dependence for which the function of two variables a and b, $F(a, b) = \sum_{i=1}^{n} (y_i - (a x_i + b))^2$, takes its smallest value. That is, for these a and b the sum of squared deviations of the experimental data from the found straight line is the smallest. This is the whole point of the least squares method.

Thus, solving the example comes down to finding the extremum of a function of two variables.

Deriving formulas for finding coefficients.

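A system of two equations in two unknowns is set up and solved. We find the partial derivatives of the function $F(a, b) = \sum_{i=1}^{n} (y_i - (a x_i + b))^2$ with respect to the variables a and b and equate them to zero.

We solve the resulting system of equations by any method (for example, by substitution or by Cramer's rule) and obtain the formulas for the coefficients of the least squares method (LSM):

$$a = \frac{n\sum_{i=1}^{n} x_i y_i - \sum_{i=1}^{n} x_i \sum_{i=1}^{n} y_i}{n\sum_{i=1}^{n} x_i^2 - \left(\sum_{i=1}^{n} x_i\right)^2}, \qquad b = \frac{\sum_{i=1}^{n} y_i - a\sum_{i=1}^{n} x_i}{n}.$$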

For the a and b found in this way, the function takes its smallest value. The proof of this fact is given below.

That is the whole method of least squares. The formula for the parameter a contains the sums $\sum x_i$, $\sum y_i$, $\sum x_i y_i$, $\sum x_i^2$ and the parameter n, the number of experimental data points. We recommend calculating these sums separately.

The coefficient b is found after a has been calculated.

It's time to remember the original example.


In our example n = 5. For convenience, we fill out a table with the sums that appear in the formulas for the required coefficients.

The values in the fourth row of the table are obtained by multiplying the values of the 2nd row by the values of the 3rd row for each number i.

The values in the fifth row of the table are obtained by squaring the values in the 2nd row for each number i.

The values in the last column of the table are the sums of the values across the rows.

We use the formulas of the least squares method to find the coefficients a and b, substituting the corresponding values from the last column of the table:

Hence, y = 0.165x + 2.184 is the desired approximating straight line.

It remains to find out which of the two curves, y = 0.165x + 2.184 or the function obtained earlier by smoothing, better approximates the original data, that is, to make an estimate using the least squares method.

Error estimation of the least squares method.

To do this, calculate the sums of squared deviations of the original data from each of the two curves; the smaller value corresponds to the curve that better approximates the original data in the sense of the least squares method.

Since the sum for the straight line is the smaller of the two, the line y = 0.165x + 2.184 better approximates the original data.
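The comparison described here can be sketched in a few lines. The data and both candidate curves below are hypothetical (the original table is not reproduced in the text), and `sum_squared_deviations` is an illustrative name.

```python
def sum_squared_deviations(f, xs, ys):
    """Sum of squared deviations of the data (xs, ys) from the curve y = f(x)."""
    return sum((y - f(x)) ** 2 for x, y in zip(xs, ys))

# Hypothetical data and two hypothetical candidate curves
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.1, 1.9, 3.2, 3.8]

def line_1(x):
    return x + 1.0

def line_2(x):
    return 0.5 * x + 2.0

s1 = sum_squared_deviations(line_1, xs, ys)
s2 = sum_squared_deviations(line_2, xs, ys)

# The curve with the smaller sum fits better in the least squares sense.
better = "line_1" if s1 < s2 else "line_2"
```

The same function works for any callable, so a nonlinear smoothing function can be compared with a straight line in exactly the same way.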

Graphic illustration of the least squares (LS) method.

Everything is clearly visible on the graphs. The red line is the found straight line y = 0.165x + 2.184, the blue line is the function obtained earlier by smoothing, and the pink dots are the original data.

Why is this needed, why all these approximations?

I personally use it for data smoothing, interpolation and extrapolation problems (in the original example, one might be asked to find the value of the observed quantity y at x = 3 or at x = 6 using the least squares method).

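For the function $F(a, b) = \sum_{i=1}^{n} (y_i - (a x_i + b))^2$ to take its smallest value at the found a and b, it is sufficient that the matrix of the quadratic form of the second-order differential of F be positive definite at this point. Let us show it.

The second-order differential has the form:

$$d^2F = \frac{\partial^2 F}{\partial a^2}\,da^2 + 2\,\frac{\partial^2 F}{\partial a\,\partial b}\,da\,db + \frac{\partial^2 F}{\partial b^2}\,db^2.$$

That is,

$$d^2F = 2\sum_{i=1}^{n} x_i^2\,da^2 + 4\sum_{i=1}^{n} x_i\,da\,db + 2n\,db^2.$$

Therefore, the matrix of the quadratic form is

$$M = \begin{pmatrix} 2\sum_{i=1}^{n} x_i^2 & 2\sum_{i=1}^{n} x_i \\ 2\sum_{i=1}^{n} x_i & 2n \end{pmatrix},$$

and the values of its elements do not depend on a and b.

Let us show that the matrix is positive definite. For this, its leading principal minors must be positive.

The first-order leading principal minor is $2\sum_{i=1}^{n} x_i^2 > 0$. The inequality is strict because the points do not coincide. We assume this in what follows.

The second-order leading principal minor is

$$\det M = 4n\sum_{i=1}^{n} x_i^2 - 4\left(\sum_{i=1}^{n} x_i\right)^2 = 4\left(n\sum_{i=1}^{n} x_i^2 - \left(\sum_{i=1}^{n} x_i\right)^2\right).$$

The inequality $n\sum_{i=1}^{n} x_i^2 - \left(\sum_{i=1}^{n} x_i\right)^2 > 0$ can be proved by mathematical induction.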
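Induction is one route to the inequality $n\sum x_i^2 - (\sum x_i)^2 > 0$. As an alternative sketch (not necessarily the argument the original text had in mind), it also follows from an identity that can be checked by expanding the right-hand side:

```latex
n\sum_{i=1}^{n} x_i^2 - \Bigl(\sum_{i=1}^{n} x_i\Bigr)^2
  = \frac{1}{2}\sum_{i=1}^{n}\sum_{j=1}^{n} (x_i - x_j)^2
  = \sum_{1 \le i < j \le n} (x_i - x_j)^2 .
```

The right-hand side is a sum of squares, so it is non-negative, and it is strictly positive as soon as at least two of the $x_i$ differ, which is the assumption made above.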

Conclusion: the found values of a and b correspond to the smallest value of the function and are therefore the required parameters of the least squares method.

Developing a forecast using the least squares method. Example of problem solution

Extrapolation is a scientific research method based on extending past and present trends, patterns, and relationships to the future development of the forecast object. Extrapolation methods include the moving average method, the exponential smoothing method, and the least squares method.

The essence of the least squares method is to minimize the sum of squared deviations between the observed and calculated values. The calculated values are found from the selected equation, the regression equation. The smaller the distance between the actual and calculated values, the more accurate the forecast based on the regression equation.

A theoretical analysis of the essence of the phenomenon being studied, whose change is reflected by a time series, serves as the basis for choosing a curve. Sometimes considerations about the nature of the growth of the levels of the series are taken into account. Thus, if output is expected to grow in an arithmetic progression, smoothing is performed along a straight line. If growth turns out to follow a geometric progression, smoothing must be done using an exponential function.

Working formula of the least squares method: $Y_{t+1} = a \cdot X + b$, where t + 1 is the forecast period; $Y_{t+1}$ is the predicted indicator; a and b are coefficients; X is the time index.

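The coefficients a and b are calculated by the following formulas:

$$a = \frac{n\sum X Y_f - \sum X \sum Y_f}{n\sum X^2 - \left(\sum X\right)^2}, \qquad b = \frac{\sum Y_f - a\sum X}{n},$$

where $Y_f$ are the actual values of the time series and n is the number of levels of the time series.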
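A minimal sketch of a trend forecast with the working formula Y = a*X + b. It assumes the time index X simply runs 1, 2, ..., n and uses the standard least squares expressions for the slope a and intercept b; the series is hypothetical, and `fit_trend` and `forecast` are illustrative names.

```python
def fit_trend(levels):
    """Return (a, b) of the trend Y = a*X + b for X = 1..n."""
    n = len(levels)
    xs = list(range(1, n + 1))
    sx = sum(xs)
    sy = sum(levels)
    sxy = sum(x * y for x, y in zip(xs, levels))
    sxx = sum(x * x for x in xs)
    a = (n * sxy - sx * sy) / (n * sxx - sx ** 2)
    b = (sy - a * sx) / n
    return a, b

def forecast(levels, steps):
    """Extrapolate the fitted trend to periods n+1, ..., n+steps."""
    a, b = fit_trend(levels)
    n = len(levels)
    return [a * (n + k) + b for k in range(1, steps + 1)]

# Hypothetical series growing by 0.5 per period
series = [2.0, 2.5, 3.0, 3.5]
a, b = fit_trend(series)
next_two = forecast(series, 2)
```

Refitting whenever a new level is appended to the series is the practical consequence of the short forecast horizon.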

Smoothing time series using the least squares method serves to reflect the pattern of development of the phenomenon being studied. In the analytical expression of a trend, time is considered as an independent variable, and the levels of the series act as a function of this independent variable.

The development of a phenomenon depends not on how many years have passed since the starting point, but on which factors influenced its development, in what direction and with what intensity. Hence it is clear that the development of a phenomenon over time is the result of the action of these factors.

Correctly establishing the type of curve, the type of analytical dependence on time, is one of the most difficult tasks of predictive analysis.

The type of function that describes the trend, whose parameters are determined by the least squares method, is in most cases selected empirically, by constructing a number of functions and comparing them with each other by the value of the mean square error, calculated by the formula:

$$\sigma = \sqrt{\frac{\sum (Y_f - Y_r)^2}{n - p}},$$

where $Y_f$ are the actual values of the time series; $Y_r$ are the calculated (smoothed) values of the time series; n is the number of levels of the time series; p is the number of parameters in the formula describing the trend (development tendency).
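A sketch of the error measure used to compare candidate trend functions. It assumes the mean square error has the form sqrt(sum((actual - fitted)^2) / (n - p)), which is what the symbol definitions above suggest; the numbers are hypothetical and `trend_error` is an illustrative name.

```python
import math

def trend_error(actual, fitted, p):
    """Mean square error of a trend with p fitted parameters."""
    n = len(actual)
    if n <= p:
        raise ValueError("need more levels than parameters")
    ss = sum((a - f) ** 2 for a, f in zip(actual, fitted))
    return math.sqrt(ss / (n - p))

# Hypothetical actual levels and values smoothed by a straight line (p = 2)
actual = [2.1, 2.4, 3.1, 3.4]
fitted = [2.0, 2.5, 3.0, 3.5]
err_linear = trend_error(actual, fitted, p=2)
```

Among several candidate functions, the one with the smallest value would be preferred; the `n - p` divisor penalizes functions with more parameters.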

Disadvantages of the least squares method:

  • when trying to describe the economic phenomenon being studied using a mathematical equation, the forecast will be accurate for a short period of time and the regression equation should be recalculated as new information becomes available;
  • the complexity of selecting a regression equation that is solvable using standard computer programs.

An example of using the least squares method to develop a forecast

Task. There are data characterizing the unemployment rate in the region, %

  • Construct a forecast of the unemployment rate in the region for November, December, January using the following methods: moving average, exponential smoothing, least squares.
  • Calculate the errors in the resulting forecasts using each method.
  • Compare the results and draw conclusions.

Least squares solution

To solve this, we will draw up a table in which we will make the necessary calculations:

ε = 28.63/10 = 2.86%; the forecast accuracy is high.
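The figure above is a sum of per-period relative errors divided by the number of periods. A sketch of that calculation, assuming each relative error is |actual - calculated| / actual * 100; the values are hypothetical (the original table is not reproduced) and `mean_relative_error` is an illustrative name.

```python
def mean_relative_error(actual, calculated):
    """Average relative error, in percent."""
    n = len(actual)
    total = sum(abs(a - c) / a * 100 for a, c in zip(actual, calculated))
    return total / n

# Hypothetical actual and calculated values
actual = [2.0, 4.0]
calculated = [1.9, 4.2]
eps = mean_relative_error(actual, calculated)
```

With these hypothetical numbers each period is off by 5%, so the average is about 5%, which would count as high accuracy under the "less than 10%" rule used in this example.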

Conclusion: Comparing the results obtained with the moving average method, the exponential smoothing method and the least squares method, we can say that the average relative error of the exponential smoothing method falls within the range of 20-50%. This means that the accuracy of the forecast in this case is only satisfactory.

In the first and third cases, the forecast accuracy is high, since the average relative error is less than 10%. But the moving average method gave more reliable results (forecast for November: 1.52%, for December: 1.53%, for January: 1.49%), since its average relative error is the smallest, 1.13%.

Least squares method (LSM)

The greater the number of experimental points, the more accurate the statistical estimate of the coefficients (because the Student coefficient decreases) and the closer the estimate is to that of the general population.

Obtaining values at each experimental point often involves significant labor, so a compromise number of experiments is often carried out that gives a workable estimate without excessive labor costs. As a rule, the number of experimental points for a linear least squares dependence with two coefficients is chosen in the region of 5-7 points.

A Brief Theory of Least Squares for Linear Relationships

Let's say we have a set of experimental data in the form of pairs of values [`y_i`, `x_i`], where `i` is the number of one experimental measurement from 1 to `n`; `y_i` is the value of the measured quantity at point `i`; `x_i` is the value of the parameter we set at point `i`.

As an example, consider the operation of Ohm's law. By changing the voltage (potential difference) between sections of an electrical circuit, we measure the amount of current passing through this section. Physics gives us a dependence found experimentally:

`I = U/R`,
where `I` is the current strength; `R` - resistance; `U` - voltage.

In this case, `y_i` is the current value being measured, and `x_i` is the voltage value.

As another example, consider the absorption of light by a solution of a substance. Chemistry gives us the formula:

`A = ε l C`,
where `A` is the optical density of the solution; `ε` is the absorption (extinction) coefficient of the solute; `l` is the path length of the light through the cuvette with the solution; `C` is the concentration of the dissolved substance.

In this case, `y_i` is the measured value of optical density `A`, and `x_i` is the concentration value of the substance that we specify.

We will consider the case when the relative error in setting `x_i` is significantly smaller than the relative error in measuring `y_i`. We will also assume that all measured values `y_i` are random and normally distributed.

In the case of a linear dependence of `y` on `x`, we can write the theoretical dependence:
`y = a + b x`.

From a geometric point of view, the coefficient `b` denotes the tangent of the angle of inclination of the line to the `x` axis, and the coefficient `a` - the value of `y` at the point of intersection of the line with the `y` axis (at `x = 0`).

Finding the regression line parameters.

In an experiment, the measured values of `y_i` cannot lie exactly on the theoretical straight line because of measurement errors, which are always present in practice. Therefore, the linear equation must be represented by a system of equations:
`y_i = a + b x_i + ε_i` (1),
where `ε_i` is the unknown measurement error of `y` in the `i`-th experiment.

Dependence (1) is also called a regression, i.e. a statistically significant dependence of one quantity on another.

The task of restoring the dependence is to find the coefficients `a` and `b` from the experimental points [`y_i`, `x_i`].

To find the coefficients `a` and `b`, the least squares method (LSM) is usually used. It is a special case of the maximum likelihood principle.

Let's rewrite (1) in the form `ε_i = y_i - a - b x_i`.

Then the sum of squared errors will be
`Φ = sum_(i=1)^(n) ε_i^2 = sum_(i=1)^(n) (y_i - a - b x_i)^2`. (2)

The principle of least squares is to minimize the sum (2) with respect to the parameters `a` and `b`.

The minimum is achieved when the partial derivatives of the sum (2) with respect to the coefficients `a` and `b` are equal to zero:
`frac(partial Φ)(partial a) = frac(partial sum_(i=1)^(n) (y_i - a - b x_i)^2)(partial a) = 0`
`frac(partial Φ)(partial b) = frac(partial sum_(i=1)^(n) (y_i - a - b x_i)^2)(partial b) = 0`

Expanding the derivatives, we obtain a system of two equations with two unknowns:
`sum_(i=1)^(n) (2a + 2bx_i - 2y_i) = sum_(i=1)^(n) (a + bx_i - y_i) = 0`
`sum_(i=1)^(n) (2bx_i^2 + 2ax_i - 2x_iy_i) = sum_(i=1)^(n) (bx_i^2 + ax_i - x_iy_i) = 0`

Opening the brackets and moving the sums that do not depend on the required coefficients to the other side, we obtain a system of linear equations:
`sum_(i=1)^(n) y_i = a n + b sum_(i=1)^(n) x_i`
`sum_(i=1)^(n) x_iy_i = a sum_(i=1)^(n) x_i + b sum_(i=1)^(n) x_i^2`

Solving the resulting system, we find formulas for the coefficients `a` and `b`:

`a = frac(sum_(i=1)^(n) y_i sum_(i=1)^(n) x_i^2 - sum_(i=1)^(n) x_i sum_(i=1)^(n) x_iy_i) (n sum_(i=1)^(n) x_i^2 - (sum_(i=1)^(n) x_i)^2)` (3.1)

`b = frac(n sum_(i=1)^(n) x_iy_i - sum_(i=1)^(n) x_i sum_(i=1)^(n) y_i) (n sum_(i=1)^(n) x_i^2 - (sum_(i=1)^(n) x_i)^2)` (3.2)

These formulas have solutions when `n > 1` (the line can be constructed using at least 2 points) and when the determinant `D = n sum_(i=1)^(n) x_i^2 - (sum_(i=1)^(n) x_i)^2 != 0`, i.e. when the `x_i` points in the experiment are different (i.e. when the line is not vertical).
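Formulas (3.1) and (3.2) translate directly into code. The sketch below follows the convention of this section, `y = a + b x` (so `a` is the intercept and `b` the slope); the measurements are hypothetical Ohm's law data, and `linear_fit` is an illustrative name.

```python
def linear_fit(xs, ys):
    """Return (a, b) of y = a + b*x using formulas (3.1) and (3.2)."""
    n = len(xs)
    sx = sum(xs)
    sy = sum(ys)
    sxx = sum(x * x for x in xs)
    sxy = sum(x * y for x, y in zip(xs, ys))
    d = n * sxx - sx ** 2  # determinant D
    if n < 2 or d == 0:
        raise ValueError("need at least two points with different x values")
    a = (sy * sxx - sx * sxy) / d
    b = (n * sxy - sx * sy) / d
    return a, b

# Hypothetical measurements: voltage U (x) and current I (y) for R = 2 ohm
voltage = [1.0, 2.0, 3.0, 4.0]
current = [0.5, 1.0, 1.5, 2.0]
a, b = linear_fit(voltage, current)  # b estimates 1/R
```

The explicit check on `D` mirrors the solvability condition stated above: identical `x_i` values give a zero determinant.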

Estimation of errors of regression line coefficients

For a more accurate assessment of the error in calculating the coefficients `a` and `b`, a large number of experimental points is desirable. When `n = 2`, it is impossible to estimate the error of the coefficients, because the approximating line will uniquely pass through two points.

The error of the random variable `V` is determined by the law of error accumulation (error propagation)
`S_V^2 = sum_(i=1)^p (frac(partial f)(partial z_i))^2 S_(z_i)^2`,
where `p` is the number of parameters `z_i` with error `S_(z_i)`, which affect the error `S_V`;
`f` is a function of the dependence of `V` on `z_i`.

Let us write down the law of error accumulation for the error of coefficients `a` and `b`
`S_a^2 = sum_(i=1)^(n)(frac(partial a)(partial y_i))^2 S_(y_i)^2 + sum_(i=1)^(n)(frac(partial a )(partial x_i))^2 S_(x_i)^2 = S_y^2 sum_(i=1)^(n)(frac(partial a)(partial y_i))^2 `,
`S_b^2 = sum_(i=1)^(n)(frac(partial b)(partial y_i))^2 S_(y_i)^2 + sum_(i=1)^(n)(frac(partial b )(partial x_i))^2 S_(x_i)^2 = S_y^2 sum_(i=1)^(n)(frac(partial b)(partial y_i))^2 `,
because `S_(x_i)^2 = 0` (we previously made a reservation that the error `x` is negligible).

`S_y^2 = S_(y_i)^2` - error (variance, squared standard deviation) in the measurement of `y`, assuming that the error is uniform for all values ​​of `y`.

Substituting formulas for calculating `a` and `b` into the resulting expressions we get

`S_a^2 = S_y^2 frac(sum_(i=1)^(n) (sum_(j=1)^(n) x_j^2 - x_i sum_(j=1)^(n) x_j)^2) (D^2) = S_y^2 frac((n sum_(i=1)^(n) x_i^2 - (sum_(i=1)^(n) x_i)^2) sum_(i=1)^(n) x_i^2) (D^2) = S_y^2 frac(sum_(i=1)^(n) x_i^2) (D)` (4.1)

`S_b^2 = S_y^2 frac(sum_(i=1)^(n) (n x_i - sum_(j=1)^(n) x_j)^2) (D^2) = S_y^2 frac(n (n sum_(i=1)^(n) x_i^2 - (sum_(i=1)^(n) x_i)^2)) (D^2) = S_y^2 frac(n) (D)` (4.2)

In most real experiments, the value of `S_y` is not measured. Measuring it requires several parallel measurements (experiments) at one or several points of the plan, which increases the time (and possibly the cost) of the experiment. Therefore, it is usually assumed that the deviation of `y` from the regression line can be considered random. The estimate of the variance of `y` in this case is calculated using the formula

`S_y^2 = S_(y, rest)^2 = frac(sum_(i=1)^n (y_i - a - b x_i)^2) (n-2)`.

The `n-2` divisor appears because our number of degrees of freedom has decreased due to the calculation of two coefficients using the same sample of experimental data.

This estimate is also called the residual variance relative to the regression line `S_(y, rest)^2`.

The significance of coefficients is assessed using the Student’s t test

`t_a = frac(|a|) (S_a)`, `t_b = frac(|b|) (S_b)`

If the calculated criteria `t_a`, `t_b` are less than the tabulated criteria `t(P, n-2)`, then it is considered that the corresponding coefficient is not significantly different from zero with a given probability `P`.

To assess the quality of the description of a linear relationship, you can compare `S_(y, rest)^2` and `S_(bar y)` relative to the mean using the Fisher criterion.

`S_(bar y) = frac(sum_(i=1)^n (y_i - bar y)^2) (n-1) = frac(sum_(i=1)^n (y_i - (sum_(i=1)^n y_i) /n)^2) (n-1)` - sample estimate of the variance of `y` relative to the mean.

To assess how effectively the regression equation describes the dependence, the Fisher coefficient is calculated
`F = S_(bar y) / S_(y, rest)^2`,
which is compared with the tabulated Fisher coefficient `F(P, n-1, n-2)`.

If `F > F(P, n-1, n-2)`, the difference between describing the relationship `y = f(x)` by the regression equation and describing it by the mean is considered statistically significant with probability `P`. That is, the regression describes the dependence better than the spread of `y` around the mean.
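The quantities of this section can be computed together. The sketch below uses formulas (3.1), (3.2), (4.1), (4.2), the residual variance with the `n-2` divisor, and the `t` and `F` ratios as defined above. The data are hypothetical and `regression_diagnostics` is an illustrative name. The tabulated critical values `t(P, n-2)` and `F(P, n-1, n-2)` are not computed here; they still have to be looked up.

```python
import math

def regression_diagnostics(xs, ys):
    """Coefficients of y = a + b*x with their errors, t ratios and F ratio."""
    n = len(xs)
    if n < 3:
        raise ValueError("need n > 2 to estimate the residual variance")
    sx, sy = sum(xs), sum(ys)
    sxx = sum(x * x for x in xs)
    sxy = sum(x * y for x, y in zip(xs, ys))
    d = n * sxx - sx ** 2
    a = (sy * sxx - sx * sxy) / d
    b = (n * sxy - sx * sy) / d
    # Residual variance relative to the regression line (divisor n - 2)
    s2_res = sum((y - a - b * x) ** 2 for x, y in zip(xs, ys)) / (n - 2)
    s_a = math.sqrt(s2_res * sxx / d)  # from (4.1)
    s_b = math.sqrt(s2_res * n / d)    # from (4.2)
    # Variance of y relative to its mean (divisor n - 1)
    mean_y = sy / n
    s2_mean = sum((y - mean_y) ** 2 for y in ys) / (n - 1)
    return {
        "a": a, "b": b, "s2_res": s2_res, "s_a": s_a, "s_b": s_b,
        "t_a": abs(a) / s_a, "t_b": abs(b) / s_b, "F": s2_mean / s2_res,
    }

# Hypothetical noisy data close to y = 2x
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]
diag = regression_diagnostics(xs, ys)
```

For these hypothetical data the slope has a much larger `t` ratio than the intercept, which is what one would expect when the true line passes close to the origin. A perfect fit gives a zero residual variance, and the ratios are then undefined.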



The least squares method is the determination of the unknown parameters a, b, c, … of an assumed functional dependence

y = f(x,a,b,c,…),

that provide a minimum of the mean square (variance) of the error

$$\sigma^2 = \frac{1}{n} \sum_{i=1}^{n} \left[ y_i - f(x_i, a, b, c, \ldots) \right]^2, \qquad (24)$$

where $x_i$, $y_i$ is the set of pairs of numbers obtained from the experiment.

Since the condition for an extremum of a function of several variables is that its partial derivatives equal zero, the parameters a, b, c, … are determined from the system of equations:

$$\frac{\partial \sigma^2}{\partial a} = 0; \quad \frac{\partial \sigma^2}{\partial b} = 0; \quad \frac{\partial \sigma^2}{\partial c} = 0; \quad \ldots \qquad (25)$$

It must be remembered that the least squares method is used to select the parameters after the form of the function y = f(x) has been determined.

If, from theoretical considerations, no conclusions can be drawn about what the empirical formula should be, then one has to be guided by visual representations, primarily by graphical representations of the observed data.

In practice, one is most often limited to the following types of functions:

1) linear, y = ax + b;

2) quadratic, y = ax² + bx + c.

Least squares method

The least squares method (OLS, Ordinary Least Squares) is one of the basic methods of regression analysis for estimating the unknown parameters of regression models from sample data. The method is based on minimizing the sum of squares of the regression residuals.

It should be noted that the term least squares method can refer to a method for solving a problem in any area if the solution consists in, or satisfies, some criterion of minimizing the sum of squares of some functions of the required variables. Therefore, the least squares method can also be used for an approximate representation (approximation) of a given function by other (simpler) functions, when finding a set of quantities that satisfy equations or constraints whose number exceeds the number of these quantities, etc.

The essence of OLS

Let some (parametric) model of a probabilistic (regression) relationship between the (explained) variable y and a set of factors (explanatory variables) x be given:

$$y = f(x, b) + \varepsilon,$$

where $b = (b_1, \ldots, b_k)$ is the vector of unknown model parameters and $\varepsilon$ is the random model error.

Let there also be sample observations of the values of these variables. Let t be the observation number ($t = 1, \ldots, n$). Then $y_t, x_t$ are the values of the variables in the t-th observation. For given values of the parameters b, the theoretical (model) values of the explained variable y can be calculated:

$$\hat{y}_t = f(x_t, b).$$

The size of the residuals depends on the values of the parameters b:

$$e_t = y_t - \hat{y}_t = y_t - f(x_t, b).$$

The essence of the (ordinary, classical) least squares method is to find the parameters b for which the sum of the squares of the residuals (Residual Sum of Squares, RSS) is minimal:

$$\hat{b}_{OLS} = \arg\min_b RSS(b), \qquad RSS(b) = e^T e = \sum_{t=1}^{n} e_t^2 = \sum_{t=1}^{n} \left(y_t - f(x_t, b)\right)^2.$$

In the general case, this problem can be solved by numerical optimization (minimization) methods. In this case one speaks of nonlinear least squares (NLS or NLLS, Non-Linear Least Squares). In many cases an analytical solution can be obtained. To solve the minimization problem, one finds the stationary points of the function RSS(b) by differentiating it with respect to the unknown parameters b, equating the derivatives to zero, and solving the resulting system of equations:

$$\sum_{t=1}^{n} \left(y_t - f(x_t, b)\right) \frac{\partial f(x_t, b)}{\partial b} = 0.$$

If the model's random errors are normally distributed, have the same variance, and are uncorrelated, the OLS parameter estimates coincide with the maximum likelihood estimates (MLE).

OLS in the case of a linear model

Let the regression dependence be linear:

$$y_t = \sum_{j=1}^{k} b_j x_{tj} + \varepsilon_t = x_t^T b + \varepsilon_t.$$

Let y be the column vector of observations of the explained variable, and X the $(n \times k)$ matrix of factor observations (the rows of the matrix are the vectors of factor values in a given observation, the columns are the vectors of values of a given factor in all observations). The matrix representation of the linear model is:

$$y = Xb + \varepsilon.$$

Then the vector of estimates of the explained variable and the vector of regression residuals are

$$\hat{y} = Xb, \qquad e = y - \hat{y} = y - Xb.$$

Accordingly, the sum of squares of the regression residuals is

$$RSS = e^T e = (y - Xb)^T (y - Xb).$$

Differentiating this function with respect to the vector of parameters and equating the derivatives to zero, we obtain a system of equations (in matrix form):

$$(X^T X)\, b = X^T y.$$

The solution of this system of equations gives the general formula for least squares estimates for a linear model:

$$\hat{b}_{OLS} = (X^T X)^{-1} X^T y = \left(\tfrac{1}{n} X^T X\right)^{-1} \tfrac{1}{n} X^T y = V_x^{-1} C_{xy}.$$

For analytical purposes, the last representation of this formula is useful. If the data in a regression model are centered, then in this representation the first matrix, $V_x$, is the sample covariance matrix of the factors, and the second, $C_{xy}$, is the vector of covariances of the factors with the dependent variable. If in addition the data are also normalized by the standard deviation (that is, ultimately standardized), then the first matrix is the sample correlation matrix of the factors, and the second vector is the vector of sample correlations of the factors with the dependent variable.
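A minimal sketch of the matrix form of the estimate, solving the normal equations (X^T X) b = X^T y with NumPy. The design matrix and observations are hypothetical, and `ols` is an illustrative name. Solving the linear system is used here instead of forming the inverse explicitly, which is a common numerical choice and not something the text prescribes.

```python
import numpy as np

def ols(X, y):
    """OLS estimate for the linear model y = X b + eps."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    # Normal equations: (X^T X) b = X^T y
    return np.linalg.solve(X.T @ X, X.T @ y)

# Hypothetical data: a constant column plus one factor, y = 1 + 2x exactly
x = np.array([0.0, 1.0, 2.0, 3.0])
X = np.column_stack([np.ones_like(x), x])
y = np.array([1.0, 3.0, 5.0, 7.0])

b_hat = ols(X, y)
residuals = y - X @ b_hat
```

With a constant column in `X`, the residuals of the fitted model sum to zero, which is another way of stating that the fitted line passes through the point of sample means.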

An important property of OLS estimates for models with a constant: the constructed regression line passes through the center of gravity of the sample data, that is, the following equality holds:

$$\bar{y} = \hat{b}_1 + \sum_{j=2}^{k} \hat{b}_j \bar{x}_j.$$

In particular, in the extreme case when the only regressor is a constant, we find that the OLS estimate of the only parameter (the constant itself) is equal to the average value of the explained variable. That is, the arithmetic mean, known for its good properties from the laws of large numbers, is also a least squares estimate: it satisfies the criterion of the minimum sum of squared deviations from it.

Example: simplest (pairwise) regression

In the case of paired linear regression $y_t = a + b x_t + \varepsilon_t$, the calculation formulas simplify (matrix algebra is not needed):

$$\hat{b} = \frac{\overline{xy} - \bar{x}\,\bar{y}}{\overline{x^2} - \bar{x}^2}, \qquad \hat{a} = \bar{y} - \hat{b}\,\bar{x}.$$

Properties of OLS estimators

First of all, we note that for linear models OLS estimates are linear estimates, as follows from the formula above. For OLS estimates to be unbiased, it is necessary and sufficient that the most important condition of regression analysis hold: the mathematical expectation of the random error, conditional on the factors, must be zero. This condition is satisfied, in particular, if

  1. the mathematical expectation of random errors is zero, and
  2. factors and random errors are independent random variables.

The second condition, the exogeneity of the factors, is fundamental. If it does not hold, almost any estimates can be expected to be extremely unsatisfactory: they will not even be consistent (that is, even a very large amount of data does not yield good estimates in this case). In the classical case, a stronger assumption is made that the factors are deterministic, unlike the random error, which automatically means that the exogeneity condition holds. In the general case, for the estimates to be consistent it is sufficient that the exogeneity condition hold together with the convergence of the matrix $\frac{1}{n} X^T X$ to some non-singular matrix as the sample size increases to infinity.

For the (ordinary) least squares estimates to be, in addition to consistent and unbiased, also efficient (the best in the class of linear unbiased estimates), additional properties of the random error must hold:

  1. constant (identical) variance of the random errors in all observations (no heteroscedasticity);
  2. no correlation (autocorrelation) between the random errors in different observations.

These assumptions can be formulated for the covariance matrix of the random error vector: $V(\varepsilon) = \sigma^2 I$.

A linear model that satisfies these conditions is called classical. OLS estimates for classical linear regression are unbiased, consistent and the most efficient estimates in the class of all linear unbiased estimates (in the English literature the abbreviation BLUE, Best Linear Unbiased Estimator, is sometimes used; in Russian literature the Gauss-Markov theorem is more often cited). As is easy to show, the covariance matrix of the vector of coefficient estimates is:

$$V(\hat{b}_{OLS}) = \sigma^2 (X^T X)^{-1}.$$

Generalized OLS

The least squares method allows for broad generalization. Instead of minimizing the sum of squares of the residuals, one can minimize some positive definite quadratic form of the vector of residuals, $e^T W e$, where W is some symmetric positive definite weight matrix. Ordinary least squares is a special case of this approach, in which the weight matrix is proportional to the identity matrix. As is known from the theory of symmetric matrices (or operators), such matrices admit a decomposition $W = P^T P$. Consequently, the functional can be represented as $e^T P^T P e = (Pe)^T (Pe) = e_*^T e_*$, that is, as the sum of squares of some transformed “residuals”. Thus, we can distinguish a class of least squares methods, the LS methods (Least Squares).

It has been proven (Aitken's theorem) that for a generalized linear regression model (in which no restrictions are imposed on the covariance matrix of random errors), the most efficient estimates (in the class of linear unbiased estimates) are the so-called generalized least squares estimates (GLS, Generalized Least Squares): the LS method with a weight matrix equal to the inverse covariance matrix of the random errors, $W = V_\varepsilon^{-1}$.

It can be shown that the formula for GLS estimates of the parameters of a linear model has the form

$$\hat{b}_{GLS} = (X^T V_\varepsilon^{-1} X)^{-1} X^T V_\varepsilon^{-1} y.$$

The covariance matrix of these estimates is accordingly

$$V(\hat{b}_{GLS}) = (X^T V_\varepsilon^{-1} X)^{-1}.$$

In fact, the essence of GLS lies in a certain (linear) transformation (P) of the original data and the application of ordinary OLS to the transformed data. The purpose of this transformation is that for the transformed data, the random errors already satisfy the classical assumptions.
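The equivalence between the generalized estimate and ordinary least squares on transformed data can be checked numerically. In this sketch the transformation P is taken as the inverse of the Cholesky factor of the error covariance matrix V, which is one possible choice satisfying P^T P = V^{-1}. The matrices are hypothetical and the function names are illustrative.

```python
import numpy as np

def gls_direct(X, y, V):
    """GLS estimate from the closed-form formula with weight matrix V^{-1}."""
    V_inv = np.linalg.inv(V)
    return np.linalg.solve(X.T @ V_inv @ X, X.T @ V_inv @ y)

def gls_by_transformation(X, y, V):
    """Transform the data with P = L^{-1}, where V = L L^T, then apply OLS."""
    L = np.linalg.cholesky(V)
    X_t = np.linalg.solve(L, X)  # P X
    y_t = np.linalg.solve(L, y)  # P y
    return np.linalg.solve(X_t.T @ X_t, X_t.T @ y_t)

# Hypothetical design matrix, observations and error covariance matrix
X = np.column_stack([np.ones(4), [0.0, 1.0, 2.0, 3.0]])
y = np.array([1.1, 2.9, 5.2, 6.8])
V = np.array([
    [2.0, 0.5, 0.0, 0.0],
    [0.5, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.3],
    [0.0, 0.0, 0.3, 1.5],
])

b_direct = gls_direct(X, y, V)
b_transformed = gls_by_transformation(X, y, V)
```

Both routes should agree up to rounding. When V is proportional to the identity matrix, both reduce to the ordinary estimate.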

Weighted OLS

In the case of a diagonal weight matrix (and therefore a diagonal covariance matrix of random errors), we have the so-called weighted least squares (WLS). In this case, the weighted sum of squares of the model residuals is minimized, that is, each observation receives a “weight” inversely proportional to the variance of the random error in that observation: $e^T W e = \sum_{t=1}^{n} \frac{e_t^2}{\sigma_t^2}$. In fact, the data are transformed by weighting the observations (dividing by a quantity proportional to the estimated standard deviation of the random errors), and ordinary OLS is applied to the weighted data.

Some special cases of using OLS in practice

Approximation of linear dependence

Let us consider the case when, as a result of studying the dependence of a certain scalar quantity y on a certain scalar quantity x (this could be, for example, the dependence of voltage U on current I: $U = I \cdot R$, where R is a constant, the resistance of the conductor), measurements of these quantities were carried out, yielding the values $x_i$ and the corresponding values $y_i$. The measurement data are recorded in a table.

Table. Measurement results (measurement no., $x_i$, $y_i$).

The question is: what value of the coefficient k best describes the dependence $y = kx$? According to the least squares method, this value should be such that the sum of the squared deviations of the values $y_i$ from the values $k x_i$,

$$\sum_{i=1}^{n} (y_i - k x_i)^2,$$

is minimal.

The sum of squared deviations has a single extremum, a minimum, so k can be found from the condition that its derivative with respect to k vanishes:

$$\sum_{i=1}^{n} (y_i - k x_i)\, x_i = 0.$$

Let us find the value of the coefficient k from this formula. To do this, we transform its left side:

$$\sum_{i=1}^{n} x_i y_i - k \sum_{i=1}^{n} x_i^2 = 0 \quad\Longrightarrow\quad k = \frac{\sum_{i=1}^{n} x_i y_i}{\sum_{i=1}^{n} x_i^2}.$$

The last formula gives the value of the coefficient k, which is what was required in the problem.
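For a proportional dependence y = k x, the least squares coefficient is the ratio of two sums, k = sum(x*y) / sum(x^2). A short sketch with hypothetical current and voltage readings; `fit_proportional` is an illustrative name.

```python
def fit_proportional(xs, ys):
    """Least squares coefficient k of the dependence y = k*x."""
    sxx = sum(x * x for x in xs)
    if sxx == 0:
        raise ValueError("all x values are zero")
    sxy = sum(x * y for x, y in zip(xs, ys))
    return sxy / sxx

# Hypothetical readings: current I (x) and voltage U (y); k estimates R
current = [1.0, 2.0, 3.0]
voltage = [2.1, 3.9, 6.0]
k = fit_proportional(current, voltage)
```

Unlike the two-parameter line, this model forces the fit through the origin, which is appropriate only when the physics (as in U = I R) says so.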


Until the beginning of the 19th century, scientists had no definite rules for solving a system of equations in which the number of unknowns is less than the number of equations; until then, ad hoc techniques were used that depended on the type of equations and on the ingenuity of the calculators, so different calculators, working from the same observational data, reached different conclusions. Gauss (1795) was the first to use the method, and Legendre (1805) independently discovered and published it under its modern name (French: Méthode des moindres quarrés). Laplace related the method to probability theory, and the American mathematician Adrain (1808) considered its probability-theoretic applications. The method was disseminated and improved through further research by Encke, Bessel, Hansen and others.

Alternative uses of OLS

The idea of the least squares method can also be used in other cases not directly related to regression analysis. The fact is that the sum of squares is one of the most common proximity measures for vectors (the Euclidean metric in finite-dimensional spaces).

One application is the “solution” of systems of linear equations in which the number of equations is greater than the number of variables,

$$Ax = b,$$

where the matrix A is not square but rectangular, of size $m \times n$ with $m > n$.

Such a system of equations, in the general case, has no solution (if the rank of the augmented matrix is actually greater than the number of variables). Therefore, this system can be “solved” only in the sense of choosing a vector x that minimizes the “distance” between the vectors Ax and b. To do this, one can apply the criterion of minimizing the sum of squares of the differences between the left and right sides of the equations, that is, $(Ax - b)^T (Ax - b) \to \min$. It is easy to show that solving this minimization problem leads to the following system of equations:

$$A^T A x = A^T b \quad\Longrightarrow\quad x = (A^T A)^{-1} A^T b.$$
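A small numerical illustration of “solving” an overdetermined system in the least squares sense. The system below is hypothetical: three equations in two unknowns with no exact solution. The sketch solves the normal equations and, for comparison, calls NumPy's `numpy.linalg.lstsq`, which minimizes the same sum of squares.

```python
import numpy as np

# Hypothetical overdetermined system: x1 = 1, x2 = 1, x1 + x2 = 3
A = np.array([
    [1.0, 0.0],
    [0.0, 1.0],
    [1.0, 1.0],
])
b = np.array([1.0, 1.0, 3.0])

# Route 1: normal equations A^T A x = A^T b
x_normal = np.linalg.solve(A.T @ A, A.T @ b)

# Route 2: library least squares solver
x_lstsq, *_ = np.linalg.lstsq(A, b, rcond=None)

# Sum of squared differences between the two sides at the solution
residual_ss = float(np.sum((A @ x_normal - b) ** 2))
```

Here A^T A = [[2, 1], [1, 2]] and A^T b = [4, 4], so both routes give x1 = x2 = 4/3. For ill-conditioned matrices the library solver is usually the safer choice, since forming A^T A squares the condition number.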

After smoothing, we obtain a function of the following form: $g(x) = \sqrt[3]{x + 1} + 1$.

We can approximate these data by the linear relationship y = ax + b by calculating the corresponding parameters. To do this, we apply the least squares method. We will also need to make a drawing to check which line fits the experimental data best.

What exactly is OLS (least squares method)

The main thing we need to do is find the coefficients of the linear dependence for which the function of two variables

$$F(a, b) = \sum_{i=1}^{n} \bigl(y_i - (a x_i + b)\bigr)^2$$

takes its smallest value. In other words, for these values of $a$ and $b$ the sum of the squared deviations of the data from the resulting straight line is minimal. This is the meaning of the least squares method. To solve the example, all we need to do is find the extremum of a function of two variables.

How to derive formulas for calculating coefficients

To derive the formulas for the coefficients, we set up and solve a system of two equations in two unknowns. To do this, we compute the partial derivatives of $F(a, b) = \sum_{i=1}^{n} \bigl(y_i - (a x_i + b)\bigr)^2$ with respect to $a$ and $b$ and equate them to 0:

$$\begin{cases} \dfrac{\partial F(a, b)}{\partial a} = 0 \\[2mm] \dfrac{\partial F(a, b)}{\partial b} = 0 \end{cases} \Leftrightarrow \begin{cases} -2 \sum_{i=1}^{n} \bigl(y_i - (a x_i + b)\bigr) x_i = 0 \\[2mm] -2 \sum_{i=1}^{n} \bigl(y_i - (a x_i + b)\bigr) = 0 \end{cases}$$

$$\Leftrightarrow \begin{cases} a \sum_{i=1}^{n} x_i^2 + b \sum_{i=1}^{n} x_i = \sum_{i=1}^{n} x_i y_i \\[2mm] a \sum_{i=1}^{n} x_i + \sum_{i=1}^{n} b = \sum_{i=1}^{n} y_i \end{cases} \Leftrightarrow \begin{cases} a \sum_{i=1}^{n} x_i^2 + b \sum_{i=1}^{n} x_i = \sum_{i=1}^{n} x_i y_i \\[2mm] a \sum_{i=1}^{n} x_i + n b = \sum_{i=1}^{n} y_i \end{cases}$$

Any method can be used to solve this system, for example substitution or Cramer's rule. The result is a pair of formulas for calculating the coefficients by the least squares method:

$$\begin{cases} a = \dfrac{n \sum_{i=1}^{n} x_i y_i - \sum_{i=1}^{n} x_i \sum_{i=1}^{n} y_i}{n \sum_{i=1}^{n} x_i^2 - \left(\sum_{i=1}^{n} x_i\right)^2} \\[4mm] b = \dfrac{\sum_{i=1}^{n} y_i - a \sum_{i=1}^{n} x_i}{n} \end{cases}$$

We have found the values of the variables at which the function $F(a, b) = \sum_{i=1}^{n} \bigl(y_i - (a x_i + b)\bigr)^2$ takes its minimum value. In the third section we will prove why this is so.

This is how the least squares method is applied in practice. The formula for the parameter $a$ includes the sums $\sum_{i=1}^{n} x_i$, $\sum_{i=1}^{n} y_i$, $\sum_{i=1}^{n} x_i y_i$, $\sum_{i=1}^{n} x_i^2$, as well as the parameter $n$, which denotes the number of experimental data points. We advise you to calculate each sum separately. The value of the coefficient $b$ is calculated immediately after $a$.
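The coefficient formulas translate directly into code. The following is a minimal sketch in plain Python; the function name `fit_line` and the test points are illustrative choices, not part of the original text.

```python
def fit_line(xs, ys):
    """Return (a, b) of the least-squares line y = a*x + b."""
    n = len(xs)
    # Each sum is computed separately, as the text recommends.
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)

    denom = n * sum_x2 - sum_x ** 2
    if denom == 0:
        raise ValueError("all x values coincide; the line is not determined")

    a = (n * sum_xy - sum_x * sum_y) / denom
    b = (sum_y - a * sum_x) / n  # b is calculated immediately after a
    return a, b

# Points lying exactly on y = 2x + 1 are recovered exactly.
a, b = fit_line([0, 1, 2, 3], [1, 3, 5, 7])
print(a, b)  # 2.0 1.0
```

The check for a zero denominator covers the degenerate case in which all the x values are equal and no unique line exists.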

Let's go back to the original example.

Example 1

Here n is equal to five. To make it easier to calculate the sums that appear in the coefficient formulas, let's fill out a table.

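| | i = 1 | i = 2 | i = 3 | i = 4 | i = 5 | $\sum_{i=1}^{5}$ |
|---|---|---|---|---|---|---|
| $x_i$ | 0 | 1 | 2 | 4 | 5 | 12 |
| $y_i$ | 2.1 | 2.4 | 2.6 | 2.8 | 3 | 12.9 |
| $x_i y_i$ | 0 | 2.4 | 5.2 | 11.2 | 15 | 33.8 |
| $x_i^2$ | 0 | 1 | 4 | 16 | 25 | 46 |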


The fourth row contains the products of the values in the second and third rows for each i. The fifth row contains the squares of the values in the second row. The last column contains the sums of the values in each row.

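Let's use the least squares method to calculate the coefficients $a$ and $b$. To do this, we substitute the sums from the last column of the table into the formulas:

$$\begin{cases} a = \dfrac{n \sum_{i=1}^{n} x_i y_i - \sum_{i=1}^{n} x_i \sum_{i=1}^{n} y_i}{n \sum_{i=1}^{n} x_i^2 - \left(\sum_{i=1}^{n} x_i\right)^2} \\[4mm] b = \dfrac{\sum_{i=1}^{n} y_i - a \sum_{i=1}^{n} x_i}{n} \end{cases} \Rightarrow \begin{cases} a = \dfrac{5 \cdot 33.8 - 12 \cdot 12.9}{5 \cdot 46 - 12^2} \\[4mm] b = \dfrac{12.9 - a \cdot 12}{5} \end{cases} \Rightarrow \begin{cases} a \approx 0.165 \\ b \approx 2.184 \end{cases}$$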

It turns out that the required approximating straight line is $y = 0.165x + 2.184$. Now we need to determine which line approximates the data better: $g(x) = \sqrt[3]{x + 1} + 1$ or $y = 0.165x + 2.184$. Let's compare them in the least squares sense.

To estimate the error, we find the sums of squared deviations of the data from the two lines, $\sigma_1 = \sum_{i=1}^{n} \bigl(y_i - (a x_i + b)\bigr)^2$ and $\sigma_2 = \sum_{i=1}^{n} \bigl(y_i - g(x_i)\bigr)^2$; the smaller value corresponds to the better-fitting line.

$$\sigma_1 = \sum_{i=1}^{n} \bigl(y_i - (a x_i + b)\bigr)^2 = \sum_{i=1}^{5} \bigl(y_i - (0.165 x_i + 2.184)\bigr)^2 \approx 0.019$$

$$\sigma_2 = \sum_{i=1}^{n} \bigl(y_i - g(x_i)\bigr)^2 = \sum_{i=1}^{5} \bigl(y_i - (\sqrt[3]{x_i + 1} + 1)\bigr)^2 \approx 0.096$$

Answer: since $\sigma_1 < \sigma_2$, the straight line that best approximates the original data is $y = 0.165x + 2.184$.
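The whole example can be reproduced with a short script. This is a sketch rather than a definitive implementation; in particular, it takes g(x) to be the cube root of (x + 1) plus 1, a reading of the formula that reproduces the quoted value of about 0.096 for the second sum.

```python
xs = [0, 1, 2, 4, 5]
ys = [2.1, 2.4, 2.6, 2.8, 3.0]

n = len(xs)
sum_x, sum_y = sum(xs), sum(ys)
sum_xy = sum(x * y for x, y in zip(xs, ys))
sum_x2 = sum(x * x for x in xs)

# Least-squares coefficients of y = a*x + b
a = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
b = (sum_y - a * sum_x) / n

def g(x):
    """Second candidate curve; assumed to be cbrt(x + 1) + 1."""
    return (x + 1) ** (1 / 3) + 1

# Sums of squared deviations for the two candidates
sigma1 = sum((y - (a * x + b)) ** 2 for x, y in zip(xs, ys))
sigma2 = sum((y - g(x)) ** 2 for x, y in zip(xs, ys))

print(f"a = {a:.3f}, b = {b:.3f}")
print(f"sigma1 = {sigma1:.3f}, sigma2 = {sigma2:.3f}")
```

Because the coefficients are kept at full precision here, the first sum may differ from a hand calculation with rounded coefficients in the later decimal places.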

The least squares method is clearly shown in the graphical illustration. The red line is the curve $g(x) = \sqrt[3]{x + 1} + 1$, the blue line is $y = 0.165x + 2.184$. The original data are marked by pink dots.

Let us explain why exactly approximations of this type are needed.

They can be used in tasks that require data smoothing, as well as in those where data must be interpolated or extrapolated. For example, in the problem discussed above, one could find the value of the observed quantity y at x = 3 or at x = 6. We have devoted a separate article to such examples.
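As a small illustration of interpolation and extrapolation, the fitted line can simply be evaluated at the points of interest. The sketch below uses the rounded coefficients from the example; the helper name `predict` is hypothetical.

```python
a, b = 0.165, 2.184  # rounded coefficients of the fitted line from the example

def predict(x):
    """Value of the fitted line y = a*x + b at x."""
    return a * x + b

y_inside = predict(3)   # x = 3 lies between measured points: interpolation
y_outside = predict(6)  # x = 6 lies beyond the last point (x = 5): extrapolation
print(round(y_inside, 3), round(y_outside, 3))
```

Extrapolated values should be treated with more caution than interpolated ones, since nothing in the data confirms that the linear trend continues outside the measured range.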

Proof of the OLS method

For the function $F(a, b) = \sum_{i=1}^{n} \bigl(y_i - (a x_i + b)\bigr)^2$ to attain a minimum at the computed $a$ and $b$, it is sufficient that the matrix of the quadratic form of its second-order differential be positive definite at that point. Let's show that this is the case.

Example 2

We have a second order differential of the following form:

$$d^2 F(a; b) = \frac{\partial^2 F(a; b)}{\partial a^2}\, da^2 + 2\, \frac{\partial^2 F(a; b)}{\partial a\, \partial b}\, da\, db + \frac{\partial^2 F(a; b)}{\partial b^2}\, db^2$$

The second partial derivatives are:

$$\frac{\partial^2 F(a; b)}{\partial a^2} = \frac{\partial}{\partial a}\left(-2 \sum_{i=1}^{n} \bigl(y_i - (a x_i + b)\bigr) x_i\right) = 2 \sum_{i=1}^{n} x_i^2$$

$$\frac{\partial^2 F(a; b)}{\partial a\, \partial b} = \frac{\partial}{\partial b}\left(-2 \sum_{i=1}^{n} \bigl(y_i - (a x_i + b)\bigr) x_i\right) = 2 \sum_{i=1}^{n} x_i$$

$$\frac{\partial^2 F(a; b)}{\partial b^2} = \frac{\partial}{\partial b}\left(-2 \sum_{i=1}^{n} \bigl(y_i - (a x_i + b)\bigr)\right) = 2 \sum_{i=1}^{n} 1 = 2n$$

In other words, we can write: $d^2 F(a; b) = 2 \sum_{i=1}^{n} x_i^2\, da^2 + 2 \cdot 2 \sum_{i=1}^{n} x_i\, da\, db + 2n\, db^2$.

We obtain the matrix of the quadratic form:

$$M = \begin{pmatrix} 2 \sum_{i=1}^{n} x_i^2 & 2 \sum_{i=1}^{n} x_i \\ 2 \sum_{i=1}^{n} x_i & 2n \end{pmatrix}$$

The elements of this matrix do not depend on $a$ and $b$. Is the matrix positive definite? To answer this question, let's check whether its leading principal minors are positive.

The first-order leading principal minor is $2 \sum_{i=1}^{n} x_i^2 > 0$. Since the points $x_i$ do not coincide, the inequality is strict. We will keep this in mind in the calculations that follow.

We calculate the second-order leading principal minor:

$$\det(M) = \begin{vmatrix} 2 \sum_{i=1}^{n} x_i^2 & 2 \sum_{i=1}^{n} x_i \\ 2 \sum_{i=1}^{n} x_i & 2n \end{vmatrix} = 4 \left( n \sum_{i=1}^{n} x_i^2 - \left( \sum_{i=1}^{n} x_i \right)^2 \right)$$

After this, we prove the inequality $n \sum_{i=1}^{n} x_i^2 - \left( \sum_{i=1}^{n} x_i \right)^2 > 0$ by mathematical induction.

  1. Let's check that the inequality holds for the base case n = 2:

$$2 \sum_{i=1}^{2} x_i^2 - \left( \sum_{i=1}^{2} x_i \right)^2 = 2 \left( x_1^2 + x_2^2 \right) - (x_1 + x_2)^2 = x_1^2 - 2 x_1 x_2 + x_2^2 = (x_1 - x_2)^2 > 0$$

We have obtained a valid inequality (provided that the values $x_1$ and $x_2$ do not coincide).

  2. Assume that the inequality is true for n, i.e. $n \sum_{i=1}^{n} x_i^2 - \left( \sum_{i=1}^{n} x_i \right)^2 > 0$.
  3. Now we prove that it is true for n + 1, i.e. that $(n + 1) \sum_{i=1}^{n+1} x_i^2 - \left( \sum_{i=1}^{n+1} x_i \right)^2 > 0$ whenever $n \sum_{i=1}^{n} x_i^2 - \left( \sum_{i=1}^{n} x_i \right)^2 > 0$.

We calculate:

$$(n + 1) \sum_{i=1}^{n+1} x_i^2 - \left( \sum_{i=1}^{n+1} x_i \right)^2 = (n + 1) \left( \sum_{i=1}^{n} x_i^2 + x_{n+1}^2 \right) - \left( \sum_{i=1}^{n} x_i + x_{n+1} \right)^2 =$$

$$= n \sum_{i=1}^{n} x_i^2 + n x_{n+1}^2 + \sum_{i=1}^{n} x_i^2 + x_{n+1}^2 - \left( \left( \sum_{i=1}^{n} x_i \right)^2 + 2 x_{n+1} \sum_{i=1}^{n} x_i + x_{n+1}^2 \right) =$$

$$= \left\{ n \sum_{i=1}^{n} x_i^2 - \left( \sum_{i=1}^{n} x_i \right)^2 \right\} + n x_{n+1}^2 - 2 x_{n+1} \sum_{i=1}^{n} x_i + \sum_{i=1}^{n} x_i^2 =$$

$$= \left\{ n \sum_{i=1}^{n} x_i^2 - \left( \sum_{i=1}^{n} x_i \right)^2 \right\} + \left( x_{n+1}^2 - 2 x_{n+1} x_1 + x_1^2 \right) + \left( x_{n+1}^2 - 2 x_{n+1} x_2 + x_2^2 \right) + \ldots + \left( x_{n+1}^2 - 2 x_{n+1} x_n + x_n^2 \right) =$$

$$= \left\{ n \sum_{i=1}^{n} x_i^2 - \left( \sum_{i=1}^{n} x_i \right)^2 \right\} + (x_{n+1} - x_1)^2 + (x_{n+1} - x_2)^2 + \ldots + (x_{n+1} - x_n)^2 > 0$$

The expression in curly braces is greater than 0 (by the assumption made in step 2), and the remaining terms are non-negative, since they are all squares of numbers. The inequality is proved.

Answer: the values of $a$ and $b$ found above correspond to the smallest value of the function $F(a, b) = \sum_{i=1}^{n} \bigl(y_i - (a x_i + b)\bigr)^2$, which means that they are the required parameters of the least squares method (LSM).
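The positive definiteness argument can be checked numerically for a concrete data set. The sketch below uses the x values from Example 1 and plain Python. The pairwise-difference sum is an extra cross-check that is not part of the proof above: for any real numbers, n times the sum of squares minus the square of the sum equals the sum of squared pairwise differences.

```python
xs = [0, 1, 2, 4, 5]  # x values from Example 1
n = len(xs)
sum_x = sum(xs)
sum_x2 = sum(x * x for x in xs)

# Entries of the matrix M; they depend only on the x values, not on a, b or y.
m11, m12, m22 = 2 * sum_x2, 2 * sum_x, 2 * n

minor1 = m11                     # first-order leading principal minor
minor2 = m11 * m22 - m12 * m12   # second-order minor, det(M)

# n*sum(x^2) - (sum x)^2 written as a sum of squared pairwise differences
pairwise = sum((xs[i] - xs[j]) ** 2
               for i in range(n) for j in range(i + 1, n))

print(minor1, minor2, 4 * pairwise)  # 92 344 344
```

Both minors are positive, so the matrix is positive definite for this data set, in agreement with the general proof.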

If a physical quantity y depends on another quantity x, then this dependence can be studied by measuring y at different values of x. The measurements yield a series of values:

x 1, x 2, ..., x i, ..., x n;

y 1 , y 2 , ..., y i , ... , y n .

Based on the data of such an experiment, it is possible to plot the dependence y = ƒ(x). The resulting curve makes it possible to judge the form of the function ƒ(x). However, the constant coefficients that enter into this function remain unknown. They can be determined using the least squares method. Experimental points, as a rule, do not lie exactly on the curve. The least squares method requires that the sum of the squares of the deviations of the experimental points from the curve, i.e. $\sum_{i=1}^{n} \bigl(y_i - f(x_i)\bigr)^2$, be the smallest.

In practice, this method is most often (and most simply) used in the case of a linear relationship, i.e. when

y = kx or y = a + bx.

Linear dependence is very widespread in physics. Even when the relationship is nonlinear, one usually tries to plot the graph so as to obtain a straight line. For example, if the refractive index of glass n is assumed to be related to the light wavelength λ by the relation n = a + b/λ², then the dependence of n on λ⁻² is plotted.

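Consider the dependence y = kx (a straight line passing through the origin). Let us form the quantity φ, the sum of the squares of the deviations of our points from the straight line:

$$\varphi = \sum_{i=1}^{n} (y_i - k x_i)^2.$$

The value of φ is never negative and is smaller the closer our points lie to the straight line. The least squares method states that k should be chosen so that φ has a minimum:

$$\frac{d\varphi}{dk} = -2 \sum_{i=1}^{n} x_i (y_i - k x_i) = 0, \quad \text{whence} \quad k = \frac{\sum_{i=1}^{n} x_i y_i}{\sum_{i=1}^{n} x_i^2}. \qquad (19)$$

The calculation shows that the root-mean-square error in determining k is

$$S_k = \sqrt{\frac{1}{n - 1} \cdot \frac{\sum_{i=1}^{n} (y_i - k x_i)^2}{\sum_{i=1}^{n} x_i^2}}, \qquad (20)$$

where n is the number of measurements.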
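A fit through the origin can be sketched as follows. The sketch assumes the usual expressions for this case: k is the sum of the products of x and y divided by the sum of squared x, and the squared error of k is the sum of squared residuals divided by (n − 1) times the sum of squared x. These expressions reproduce the numbers in Example 1 below. The function name `fit_through_origin` and the sample points are illustrative only.

```python
import math

def fit_through_origin(xs, ys):
    """Least-squares slope k of y = k*x and its root-mean-square error."""
    n = len(xs)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    k = sum_xy / sum_x2

    # Sum of squared deviations of the points from the fitted line
    resid2 = sum((y - k * x) ** 2 for x, y in zip(xs, ys))
    s_k = math.sqrt(resid2 / ((n - 1) * sum_x2))
    return k, s_k

# Synthetic points scattered around y = 2x (made up for illustration)
k, s_k = fit_through_origin([1, 2, 3], [2.1, 3.9, 6.0])
print(k, s_k)
```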

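Let us now consider a slightly more difficult case, when the points must satisfy the formula y = a + bx (a straight line that does not pass through the origin).

The task is to find the best values of a and b from the available set of values $x_i$, $y_i$.

Let us again form the quadratic form φ, equal to the sum of the squared deviations of the points $x_i$, $y_i$ from the straight line,

$$\varphi = \sum_{i=1}^{n} (y_i - a - b x_i)^2,$$

and find the values of a and b for which φ has a minimum:

$$\frac{\partial \varphi}{\partial a} = -2 \sum_{i=1}^{n} (y_i - a - b x_i) = 0,$$

$$\frac{\partial \varphi}{\partial b} = -2 \sum_{i=1}^{n} x_i (y_i - a - b x_i) = 0.$$

The joint solution of these equations gives

$$b = \frac{\sum_{i=1}^{n} (x_i - \bar{x})\, y_i}{\sum_{i=1}^{n} (x_i - \bar{x})^2}, \qquad (21)$$

$$a = \bar{y} - b \bar{x}, \qquad (22)$$

where $\bar{x}$ and $\bar{y}$ are the arithmetic means of the values $x_i$ and $y_i$.

The root mean square errors of determination of b and a are equal to

$$S_b = \sqrt{\frac{1}{n - 2} \cdot \frac{\sum_{i=1}^{n} (y_i - b x_i - a)^2}{\sum_{i=1}^{n} (x_i - \bar{x})^2}}, \qquad (23)$$

$$S_a = \sqrt{\left( \frac{1}{n} + \frac{\bar{x}^2}{\sum_{i=1}^{n} (x_i - \bar{x})^2} \right) \frac{\sum_{i=1}^{n} (y_i - b x_i - a)^2}{n - 2}}. \qquad (24)$$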

When processing measurement results using this method, it is convenient to collect all the data in a table in which all the sums appearing in formulas (19)–(24) are calculated in advance. The forms of these tables are given in the examples below.
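The general straight-line case can be sketched in the same way. The sketch assumes the standard centered expressions for the slope, the intercept and their errors (with n − 2 in the denominator of the residual variance), which reproduce the numbers in Example 2 below. The function name `fit_line_with_errors` and the sample points are illustrative only.

```python
import math

def fit_line_with_errors(xs, ys):
    """Fit y = a + b*x; return (a, b, s_a, s_b)."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n

    # Sum of squared deviations of x from its mean
    sxx = sum((x - x_mean) ** 2 for x in xs)

    b = sum((x - x_mean) * y for x, y in zip(xs, ys)) / sxx
    a = y_mean - b * x_mean

    # Sum of squared residuals of the fitted line
    resid2 = sum((y - b * x - a) ** 2 for x, y in zip(xs, ys))

    s_b = math.sqrt(resid2 / ((n - 2) * sxx))
    s_a = math.sqrt((1 / n + x_mean ** 2 / sxx) * resid2 / (n - 2))
    return a, b, s_a, s_b

# Synthetic points scattered around y = 1 + 2x (made up for illustration)
a, b, s_a, s_b = fit_line_with_errors([0, 1, 2, 3], [1.1, 2.9, 5.2, 6.8])
print(a, b, s_a, s_b)
```

Centering x around its mean keeps the intermediate sums small, which mirrors the layout of the tables used in the examples.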

Example 1. The basic equation of the dynamics of rotational motion, ε = M/J (a straight line passing through the origin), was studied. The angular acceleration ε of a certain body was measured at different values of the moment of force M. It is required to determine the moment of inertia of this body. The measured values of the moment of force and the angular acceleration are listed in the second and third columns of Table 5.

Table 5

| n | M, N·m | ε, s⁻² | M² | M·ε | ε − kM | (ε − kM)² |
|---|---|---|---|---|---|---|
| 1 | 1.44 | 0.52 | 2.0736 | 0.7488 | 0.039432 | 0.001555 |
| 2 | 3.12 | 1.06 | 9.7344 | 3.3072 | 0.018768 | 0.000352 |
| 3 | 4.59 | 1.45 | 21.0681 | 6.6555 | -0.08181 | 0.006693 |
| 4 | 5.90 | 1.92 | 34.81 | 11.328 | -0.049 | 0.002401 |
| 5 | 7.45 | 2.56 | 55.5025 | 19.072 | 0.073725 | 0.005435 |
| ∑ | – | – | 123.1886 | 41.1115 | – | 0.016436 |

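Using formula (19) we determine:

$$k = \frac{\sum M_i \varepsilon_i}{\sum M_i^2} = \frac{41.1115}{123.1886} \approx 0.3337\ \text{kg}^{-1} \cdot \text{m}^{-2}.$$

To determine the root mean square error, we use formula (20):

$$S_k = \sqrt{\frac{1}{5 - 1} \cdot \frac{0.016436}{123.1886}} \approx 0.005775\ \text{kg}^{-1} \cdot \text{m}^{-2}.$$

According to formula (18) we have

$$J = \frac{1}{k} \approx 2.996\ \text{kg} \cdot \text{m}^2; \qquad \frac{S_J}{J} = \frac{S_k}{k}.$$

$S_J$ = (2.996 · 0.005775)/0.3337 = 0.05185 kg·m².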

Having set the confidence level P = 0.95, we find t = 2.78 from the table of Student coefficients for n = 5 and determine the absolute error ΔJ = 2.78 · 0.05185 = 0.1441 ≈ 0.2 kg·m².

Let's write the result in the form:

J = (3.0 ± 0.2) kg·m².
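The numbers of this example can be reproduced with a few lines of Python. This is a sketch under stated assumptions: k is the ratio of the sum of M·ε to the sum of M², the squared error of k is the sum of squared residuals divided by (n − 1) times the sum of M², J = 1/k, and the relative error of J is taken equal to the relative error of k. The Student coefficient 2.78 is the value quoted in the text.

```python
import math

M = [1.44, 3.12, 4.59, 5.90, 7.45]    # moment of force, N*m (Table 5)
eps = [0.52, 1.06, 1.45, 1.92, 2.56]  # angular acceleration (Table 5)
n = len(M)

sum_M2 = sum(m * m for m in M)
sum_Me = sum(m * e for m, e in zip(M, eps))

# Slope of eps = k*M and its root-mean-square error
k = sum_Me / sum_M2
resid2 = sum((e - k * m) ** 2 for m, e in zip(M, eps))
s_k = math.sqrt(resid2 / ((n - 1) * sum_M2))

# Moment of inertia; the relative error is assumed equal to that of k
J = 1 / k
s_J = J * s_k / k

t_student = 2.78  # quoted in the text for n = 5, P = 0.95
delta_J = t_student * s_J

print(f"k = {k:.4f}, S_k = {s_k:.6f}")
print(f"J = {J:.3f}, S_J = {s_J:.5f}, delta_J = {delta_J:.4f}")
```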

Example 2. Let's calculate the temperature coefficient of metal resistance using the least squares method. Resistance depends linearly on temperature

$R_t = R_0 (1 + \alpha t) = R_0 + R_0 \alpha t$.

The free term determines the resistance $R_0$ at a temperature of 0 °C, and the slope coefficient is the product of the temperature coefficient α and the resistance $R_0$.

The results of measurements and calculations are given in Table 6.

Table 6

| n | t, °C | r, Ohm | t − t̄ | (t − t̄)² | (t − t̄)r | r − bt − a | (r − bt − a)², 10⁻⁶ |
|---|---|---|---|---|---|---|---|
| 1 | 23 | 1.242 | -62.8333 | 3948.028 | -78.039 | 0.007673 | 58.8722 |
| 2 | 59 | 1.326 | -26.8333 | 720.0278 | -35.581 | -0.00353 | 12.4959 |
| 3 | 84 | 1.386 | -1.83333 | 3.361111 | -2.541 | -0.00965 | 93.1506 |
| 4 | 96 | 1.417 | 10.16667 | 103.3611 | 14.40617 | -0.01039 | 107.898 |
| 5 | 120 | 1.512 | 34.16667 | 1167.361 | 51.66 | 0.021141 | 446.932 |
| 6 | 133 | 1.520 | 47.16667 | 2224.694 | 71.69333 | -0.00524 | 27.4556 |
| ∑ | 515 | 8.403 | – | 8166.833 | 21.5985 | – | 746.804 |
| ∑/n | 85.83333 | 1.4005 | – | – | – | – | – |

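Using formulas (21), (22) we determine

$$\alpha R_0 = \frac{\sum (t_i - \bar{t})\, r_i}{\sum (t_i - \bar{t})^2} = \frac{21.5985}{8166.833} \approx 0.002645\ \text{Ohm/deg},$$

$$R_0 = \bar{R} - \alpha R_0\, \bar{t} = 1.4005 - 0.002645 \cdot 85.83333 = 1.1735\ \text{Ohm}.$$

Let's find the error in the determination of α. Since $\alpha = (\alpha R_0)/R_0$, then according to formula (18) we have:

$$S_\alpha = \alpha \sqrt{\left( \frac{S_{\alpha R_0}}{\alpha R_0} \right)^2 + \left( \frac{S_{R_0}}{R_0} \right)^2}.$$

Using formulas (23), (24) we have

$$S_{\alpha R_0} = \sqrt{\frac{1}{6 - 2} \cdot \frac{746.804 \cdot 10^{-6}}{8166.833}} \approx 0.000151\ \text{Ohm/deg},$$

$$S_{R_0} = \sqrt{\left( \frac{1}{6} + \frac{85.83333^2}{8166.833} \right) \frac{746.804 \cdot 10^{-6}}{6 - 2}} \approx 0.014126\ \text{Ohm},$$

so that $\alpha = 0.002645 / 1.1735 \approx 0.00225$ deg⁻¹ and $S_\alpha \approx 0.000132$ deg⁻¹.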

Having set the confidence level P = 0.95, we find t = 2.57 from the table of Student coefficients for n = 6 and determine the absolute error Δα = 2.57 · 0.000132 ≈ 0.000338 deg⁻¹.

α = (23 ± 4) · 10⁻⁴ deg⁻¹ at P = 0.95.
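The calculation for this example can be sketched in Python as follows. The sketch assumes the standard centered least-squares expressions for the slope, the intercept and their errors, and it combines the relative errors of the slope and the intercept in quadrature as if they were independent. That simplification is an assumption, but it reproduces the value 0.000132 used in the text. The Student coefficient 2.57 is the value quoted in the text.

```python
import math

t = [23, 59, 84, 96, 120, 133]                  # temperature (Table 6)
r = [1.242, 1.326, 1.386, 1.417, 1.512, 1.520]  # resistance, Ohm (Table 6)
n = len(t)

t_mean = sum(t) / n
r_mean = sum(r) / n
stt = sum((ti - t_mean) ** 2 for ti in t)

# Slope b = alpha*R0 and intercept a = R0 of the line r = a + b*t
b = sum((ti - t_mean) * ri for ti, ri in zip(t, r)) / stt
a = r_mean - b * t_mean

# Sum of squared residuals and the errors of slope and intercept
resid2 = sum((ri - b * ti - a) ** 2 for ti, ri in zip(t, r))
s_b = math.sqrt(resid2 / ((n - 2) * stt))
s_a = math.sqrt((1 / n + t_mean ** 2 / stt) * resid2 / (n - 2))

# alpha = b / a; relative errors combined in quadrature (assumed independent)
alpha = b / a
s_alpha = alpha * math.sqrt((s_b / b) ** 2 + (s_a / a) ** 2)

t_student = 2.57  # quoted in the text for n = 6, P = 0.95
delta_alpha = t_student * s_alpha

print(f"alpha*R0 = {b:.6f}, R0 = {a:.4f}")
print(f"S_R0 = {s_a:.6f}, S_alpha = {s_alpha:.6f}, delta_alpha = {delta_alpha:.6f}")
```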

Example 3. It is required to determine the radius of curvature of the lens using Newton's rings. The radii of Newton's rings r m were measured and the numbers of these rings m were determined. The radii of Newton's rings are related to the radius of curvature of the lens R and the ring number by the equation

$$r_m^2 = m \lambda R - 2 d_0 R,$$

where $d_0$ is the thickness of the gap between the lens and the plane-parallel plate (or the deformation of the lens), and λ is the wavelength of the incident light.

λ = (600 ± 6) nm.

Let $r_m^2 = y$; $m = x$; $\lambda R = b$; $-2 d_0 R = a$; then the equation takes the form y = a + bx.


The results of measurements and calculations are entered into table 7.

Table 7

| n | x = m | y = r², 10⁻² mm² | m − m̄ | (m − m̄)² | (m − m̄)y | y − bx − a, 10⁻⁴ | (y − bx − a)², 10⁻⁶ |
|---|---|---|---|---|---|---|---|
| 1 | 1 | 6.101 | -2.5 | 6.25 | -0.152525 | 12.01 | 1.44229 |
| 2 | 2 | 11.834 | -1.5 | 2.25 | -0.17751 | -9.6 | 0.930766 |
| 3 | 3 | 17.808 | -0.5 | 0.25 | -0.08904 | -7.2 | 0.519086 |
| 4 | 4 | 23.814 | 0.5 | 0.25 | 0.11907 | -1.6 | 0.0243955 |
| 5 | 5 | 29.812 | 1.5 | 2.25 | 0.44718 | 3.28 | 0.107646 |
| 6 | 6 | 35.760 | 2.5 | 6.25 | 0.894 | 3.12 | 0.0975819 |
| ∑ | 21 | 125.129 | – | 17.5 | 1.041175 | – | 3.12176 |
| ∑/n | 3.5 | 20.8548333 | – | – | – | – | – |
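The straight-line fit for the Newton's rings data can be sketched as follows. The sketch assumes the standard centered least-squares expressions for the slope and intercept, takes y in mm² (the tabulated values multiplied by 10⁻²), and uses the nominal wavelength of 600 nm without propagating its uncertainty. The resulting radius of curvature is only what these assumptions give, not a figure quoted from the text.

```python
m = [1, 2, 3, 4, 5, 6]  # ring numbers (x in Table 7)
# Squared ring radii in mm^2 (Table 7 values times 1e-2)
y = [6.101e-2, 11.834e-2, 17.808e-2, 23.814e-2, 29.812e-2, 35.760e-2]
n = len(m)

m_mean = sum(m) / n
y_mean = sum(y) / n
smm = sum((mi - m_mean) ** 2 for mi in m)

# Slope b = lambda*R and intercept a = -2*d0*R of the line y = a + b*x
b = sum((mi - m_mean) * yi for mi, yi in zip(m, y)) / smm
a = y_mean - b * m_mean

wavelength_mm = 600e-6  # 600 nm expressed in mm (nominal value)
R = b / wavelength_mm   # radius of curvature, mm

print(f"b = {b:.6f} mm^2, a = {a:.6f} mm^2, R = {R:.2f} mm")
```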