The least squares method is a mathematical (statistical) technique used to smooth time series, identify the form of a correlation between random variables, and so on. It consists in approximating the function that describes a given phenomenon by a simpler function. Moreover, the latter is chosen so that the standard deviation (see Dispersion) of the actual levels of the function at the observed points from the smoothed ones is the smallest.

For example, from available data $(x_i, y_i)$, $i = 1, 2, \ldots, n$, a line $y = a + bx$ is constructed for which the sum of squared deviations

$$S(a, b) = \sum_{i=1}^{n} \bigl(y_i - (a + b x_i)\bigr)^2$$

is minimal; that is, a function of two parameters is minimized: $a$, the intercept on the ordinate axis, and $b$, the slope of the line.

The equations giving the necessary conditions for minimizing the function $S(a, b)$ are called normal equations. Not only linear functions (fitting a straight line) but also quadratic, parabolic, exponential and other functions are used as approximating functions. For an example of fitting a straight line to a time series, see Fig. M.2, where the sum of squared distances $(y_1 - \bar{y}_1)^2 + (y_2 - \bar{y}_2)^2 + \ldots$ is the smallest, and the resulting line best reflects the trend of a series of observations of a certain indicator over time.
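As a rough illustration of fitting such a trend line, here is a minimal numpy sketch (the periodic values are invented for the example; `np.polyfit` with degree 1 performs exactly this least-squares line fit):

```python
import numpy as np

# Hypothetical observations of some indicator over 8 periods (illustrative data only).
t = np.arange(1, 9)
y = np.array([3.1, 3.4, 3.9, 4.1, 4.6, 4.8, 5.3, 5.5])

# Fit the trend line y = a + b*t by least squares (degree-1 polynomial).
b, a = np.polyfit(t, y, deg=1)   # np.polyfit returns the highest-degree coefficient first

trend = a + b * t
print("intercept a =", round(a, 3), ", slope b =", round(b, 3))
print("sum of squared deviations:", round(float(np.sum((y - trend) ** 2)), 4))
```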

For unbiased OLS estimates, it is necessary and sufficient that the most important condition of regression analysis be fulfilled: the mathematical expectation of the random error, conditional on the factors, must be equal to zero. This condition is met, in particular, if (1) the mathematical expectation of the random errors is zero, and (2) the factors and random errors are independent random variables. The first condition can be considered always fulfilled for models with a constant, since the constant absorbs any non-zero mathematical expectation of the errors. The second condition, the exogeneity of the factors, is fundamental. If this property is not met, then almost any estimates will be extremely unsatisfactory: they will not even be consistent (that is, even a very large amount of data does not allow high-quality estimates to be obtained in this case).

The most common method of statistical estimation of the parameters of regression equations is the least squares method. This method is based on a number of assumptions about the nature of the data and the results of the model. The main ones are a clear division of the original variables into dependent and independent, the uncorrelatedness of the factors included in the equations, the linearity of the relationship, the absence of autocorrelation of the residuals, their zero mathematical expectation and their constant variance.

One of the main hypotheses of OLS is the assumption that the variances of the deviations $e_i$ are equal, i.e. that their spread around the mean (zero) value of the series is stable. This property is called homoscedasticity. In practice, the variances of the deviations are quite often unequal, that is, heteroscedasticity is observed. This may be due to various reasons. For example, there may be errors in the source data. Occasional inaccuracies in the source information, such as errors in the order of magnitude of a number, can have a significant impact on the results. Often, a larger spread of the deviations $e_i$ is observed at large values of the dependent variable(s). If the data contain a significant error, then, naturally, the deviation of the model value calculated from the erroneous data will also be large. In order to reduce the influence of this error, we assign such data a smaller weight in the calculation than all the others. This idea is implemented in weighted least squares.

Approximation of experimental data is a method based on replacing the experimentally obtained data with an analytical function that passes most closely through, or coincides at the nodal points with, the original values (the data obtained during an experiment). Currently, there are two ways to define such an analytical function:

By constructing an interpolation polynomial of degree n that passes directly through all points of a given data array. In this case, the approximating function is presented in the form of an interpolation polynomial in Lagrange form or an interpolation polynomial in Newton form.

By constructing an approximating polynomial of degree n that passes in the immediate vicinity of the points of a given data array, as illustrated by the sketch below. In this way the approximating function smooths out the random noise (or errors) that may arise during the experiment: the values measured during the experiment depend on random factors that fluctuate according to their own random laws (measurement or instrument errors, inaccuracies or experimental errors). In this case, the approximating function is determined using the least squares method.
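The difference between the two approaches can be seen in a short sketch (the measurement values are made up for illustration; `np.polyfit` fits a polynomial by least squares, and with the degree equal to the number of points minus one it reduces to interpolation):

```python
import numpy as np

# Hypothetical noisy measurements at 6 nodal points (illustrative data only).
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([1.1, 2.0, 2.9, 4.2, 4.8, 6.1])

# 1) Interpolation: a degree-5 polynomial passes exactly through all 6 points.
interp_coeffs = np.polyfit(x, y, deg=len(x) - 1)

# 2) Approximation: a degree-1 polynomial passes near the points,
#    smoothing out the random measurement noise (least squares fit).
approx_coeffs = np.polyfit(x, y, deg=1)

print("interpolation residuals:", np.polyval(interp_coeffs, x) - y)   # ~0 at the nodes
print("approximation residuals:", np.polyval(approx_coeffs, x) - y)   # small but nonzero
```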

The least squares method (in the English literature, Ordinary Least Squares, OLS) is a mathematical method based on determining an approximating function that is constructed in the closest proximity to the points of a given array of experimental data. The closeness of the original and approximating functions is determined by a numerical measure, namely: the sum of squared deviations of the experimental data from the approximating curve $F(x)$ should be the smallest.

Figure: an approximating curve constructed using the least squares method.

The least squares method is used:

To solve overdetermined systems of equations when the number of equations exceeds the number of unknowns;

To find a solution in the case of ordinary (not overdetermined) nonlinear systems of equations;

To approximate point values ​​with some approximating function.

The approximating function in the least squares method is determined from the condition of the minimum of the sum of squared deviations of the calculated approximating function from the given array of experimental data. This criterion of the least squares method is written as the following expression:

$$S = \sum_{i=1}^{N} \bigl(F(x_i) - y_i\bigr)^2 \rightarrow \min,$$

where $F(x_i)$ are the values of the calculated approximating function at the nodal points, and $y_i$ is the given array of experimental data at the nodal points.

The quadratic criterion has a number of "good" properties, such as differentiability and the provision of a unique solution to the approximation problem with polynomial approximating functions.

Depending on the conditions of the problem, the approximating function is a polynomial of degree $m$:

$$F(x) = a_0 + a_1 x + a_2 x^2 + \ldots + a_m x^m.$$

The degree of the approximating function does not depend on the number of nodal points, but it must always be less than the dimension (the number of points) of the given experimental data array.

∙ If the degree of the approximating function is m = 1, then we approximate the tabulated function with a straight line (linear regression).

∙ If the degree of the approximating function is m = 2, then we approximate the tabulated function with a quadratic parabola (quadratic approximation).

∙ If the degree of the approximating function is m = 3, then we approximate the tabulated function with a cubic parabola (cubic approximation).

In the general case, when it is necessary to construct an approximating polynomial of degree $m$ for given table values, the condition for the minimum of the sum of squared deviations over all nodal points is rewritten in the following form:

$$S = \sum_{i=1}^{N} \Bigl( \sum_{j=0}^{m} a_j x_i^{\,j} - y_i \Bigr)^2 \rightarrow \min,$$

where $a_0, a_1, \ldots, a_m$ are the unknown coefficients of the approximating polynomial of degree $m$, and $N$ is the number of table values given.

A necessary condition for the existence of a minimum of the function is that its partial derivatives with respect to the unknown variables $a_0, a_1, \ldots, a_m$ equal zero. As a result, we obtain the following system of equations:

$$\frac{\partial S}{\partial a_k} = 2\sum_{i=1}^{N} \Bigl( \sum_{j=0}^{m} a_j x_i^{\,j} - y_i \Bigr) x_i^{\,k} = 0, \qquad k = 0, 1, \ldots, m.$$

Let us transform the resulting linear system of equations: open the brackets and move the free terms to the right-hand side. The resulting system of linear algebraic equations takes the following form:

$$\sum_{j=0}^{m} a_j \sum_{i=1}^{N} x_i^{\,j+k} = \sum_{i=1}^{N} y_i x_i^{\,k}, \qquad k = 0, 1, \ldots, m.$$

This system of linear algebraic equations can be rewritten in matrix form:

$$\begin{pmatrix} N & \sum x_i & \ldots & \sum x_i^m \\ \sum x_i & \sum x_i^2 & \ldots & \sum x_i^{m+1} \\ \vdots & \vdots & \ddots & \vdots \\ \sum x_i^m & \sum x_i^{m+1} & \ldots & \sum x_i^{2m} \end{pmatrix} \begin{pmatrix} a_0 \\ a_1 \\ \vdots \\ a_m \end{pmatrix} = \begin{pmatrix} \sum y_i \\ \sum x_i y_i \\ \vdots \\ \sum x_i^m y_i \end{pmatrix}.$$

As a result, we obtain a system of linear equations of dimension $m + 1$ in $m + 1$ unknowns. This system can be solved by any method for solving systems of linear algebraic equations (for example, Gaussian elimination). As a result of the solution, the unknown parameters of the approximating function are found that provide the minimum sum of squared deviations of the approximating function from the original data, i.e. the best possible quadratic approximation. It should be remembered that if even one value of the source data changes, all coefficients change their values, since they are completely determined by the source data.

Approximation of source data by linear dependence

(linear regression)

As an example, let us consider the technique for determining the approximating function when it is specified as a linear dependence $F(x) = a + bx$. In accordance with the least squares method, the condition for the minimum of the sum of squared deviations is written in the following form:

$$S(a, b) = \sum_{i=1}^{N} \bigl(a + b x_i - y_i\bigr)^2 \rightarrow \min,$$

where $(x_i, y_i)$ are the coordinates of the table nodes, and $a$, $b$ are the unknown coefficients of the approximating function, which is specified as a linear dependence.

A necessary condition for the existence of a minimum of the function is that its partial derivatives with respect to the unknown variables $a$ and $b$ equal zero. As a result, we obtain the following system of equations:

$$\frac{\partial S}{\partial a} = 2\sum_{i=1}^{N}\bigl(a + b x_i - y_i\bigr) = 0, \qquad \frac{\partial S}{\partial b} = 2\sum_{i=1}^{N}\bigl(a + b x_i - y_i\bigr) x_i = 0.$$

Let us transform the resulting linear system of equations:

$$a N + b\sum_{i=1}^{N} x_i = \sum_{i=1}^{N} y_i, \qquad a\sum_{i=1}^{N} x_i + b\sum_{i=1}^{N} x_i^2 = \sum_{i=1}^{N} x_i y_i.$$

We solve the resulting system of linear equations. The coefficients of the approximating function in analytical form are determined as follows (Cramer's rule):

$$b = \frac{N\sum_{i=1}^{N} x_i y_i - \sum_{i=1}^{N} x_i \sum_{i=1}^{N} y_i}{N\sum_{i=1}^{N} x_i^2 - \Bigl(\sum_{i=1}^{N} x_i\Bigr)^2}, \qquad a = \frac{\sum_{i=1}^{N} y_i - b\sum_{i=1}^{N} x_i}{N}.$$

These coefficients ensure the construction of a linear approximating function in accordance with the criterion of minimizing the sum of squared deviations of the approximating function from the given tabular values (experimental data).

Algorithm for implementing the least squares method

1. Initial data:

An array of experimental data with the number of measurements N is specified

The degree of the approximating polynomial (m) is specified

2. Calculation algorithm:

2.1. The coefficients for constructing the system of equations of dimension $(m+1)\times(m+1)$ are determined:

$A_{kj} = \sum_{i=1}^{N} x_i^{\,k+j}$ — the coefficients of the system of equations (left-hand side), where $j = 0, \ldots, m$ is the column index and $k = 0, \ldots, m$ is the row index of the square matrix of the system;

$B_k = \sum_{i=1}^{N} y_i x_i^{\,k}$ — the free terms of the system of linear equations (right-hand side).

2.2. Formation of the system of linear equations of dimension $(m+1)\times(m+1)$.

2.3. Solving the system of linear equations to determine the unknown coefficients $a_0, a_1, \ldots, a_m$ of the approximating polynomial of degree $m$.

2.4. Determination of the sum of squared deviations of the approximating polynomial from the original values at all nodal points:

$$S_{\min} = \sum_{i=1}^{N} \Bigl( \sum_{j=0}^{m} a_j x_i^{\,j} - y_i \Bigr)^2.$$

The found value of the sum of squared deviations is the minimum possible.
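A minimal Python sketch of this algorithm (the data and the function name are illustrative assumptions; the normal-equation matrix is built exactly as in steps 2.1-2.4, and `np.linalg.solve` stands in for the Gaussian elimination mentioned above):

```python
import numpy as np

def least_squares_poly(x, y, m):
    """Fit a polynomial of degree m to the points (x, y) by least squares.

    Returns the coefficients a_0..a_m and the minimal sum of squared deviations.
    """
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    # 2.1. Coefficients of the (m+1) x (m+1) system of normal equations.
    A = np.array([[np.sum(x ** (k + j)) for j in range(m + 1)] for k in range(m + 1)])
    B = np.array([np.sum(y * x ** k) for k in range(m + 1)])
    # 2.2-2.3. Solve the linear system for the unknown polynomial coefficients.
    a = np.linalg.solve(A, B)
    # 2.4. Sum of squared deviations at the nodal points.
    residuals = sum(a[j] * x ** j for j in range(m + 1)) - y
    return a, float(np.sum(residuals ** 2))

# Illustrative data:
coeffs, s_min = least_squares_poly([0, 1, 2, 3, 4], [1.0, 2.1, 2.9, 4.2, 4.8], m=1)
print(coeffs, s_min)
```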

Approximation using other functions

It should be noted that when approximating the original data in accordance with the least squares method, the logarithmic function, exponential function and power function are sometimes used as the approximating function.

Logarithmic approximation

Let us consider the case when the approximating function is given by a logarithmic function of the form $F(x) = a \ln x + b$. Since this function is linear in the unknown coefficients $a$ and $b$, they are again determined from the condition of the minimum of the sum of squared deviations, which leads to a system of normal equations of the same type as above.

Suppose now that experimental data are given (they are listed in the table of Example 1 below) and that after smoothing (aligning) them we obtain a function of the following form: $g(x) = \sqrt[3]{x + 1} + 1$.

We can also approximate these data with the linear relationship $y = ax + b$ by calculating the corresponding parameters. To do this, we will need to apply the so-called least squares method. We will also make a drawing to check which of the two functions aligns the experimental data better.


What exactly is OLS (least squares method)

The main thing we need to do is to find the coefficients of the linear dependence at which the value of the function of two variables $F(a, b) = \sum_{i=1}^{n} \bigl(y_i - (a x_i + b)\bigr)^2$ is the smallest. In other words, for certain values of $a$ and $b$, the sum of the squared deviations of the presented data from the resulting straight line will have a minimum value. This is the meaning of the least squares method. Thus, to solve the example, all we need to do is find the extremum of a function of two variables.

How to derive formulas for calculating coefficients

In order to derive formulas for calculating the coefficients, we need to set up and solve a system of equations in two variables. To do this, we calculate the partial derivatives of the expression $F(a, b) = \sum_{i=1}^{n} \bigl(y_i - (a x_i + b)\bigr)^2$ with respect to $a$ and $b$ and equate them to 0:

$$\begin{cases} \dfrac{\partial F(a, b)}{\partial a} = 0 \\[4pt] \dfrac{\partial F(a, b)}{\partial b} = 0 \end{cases}
\;\Leftrightarrow\;
\begin{cases} -2\sum\limits_{i=1}^{n}\bigl(y_i - (a x_i + b)\bigr) x_i = 0 \\[4pt] -2\sum\limits_{i=1}^{n}\bigl(y_i - (a x_i + b)\bigr) = 0 \end{cases}
\;\Leftrightarrow\;
\begin{cases} a\sum\limits_{i=1}^{n} x_i^2 + b\sum\limits_{i=1}^{n} x_i = \sum\limits_{i=1}^{n} x_i y_i \\[4pt] a\sum\limits_{i=1}^{n} x_i + n b = \sum\limits_{i=1}^{n} y_i \end{cases}$$

To solve this system of equations, you can use any method, for example substitution or Cramer's rule. As a result, we obtain the formulas that are used to calculate the coefficients by the least squares method:

$$a = \frac{n\sum_{i=1}^{n} x_i y_i - \sum_{i=1}^{n} x_i \sum_{i=1}^{n} y_i}{n\sum_{i=1}^{n} x_i^2 - \Bigl(\sum_{i=1}^{n} x_i\Bigr)^2}, \qquad b = \frac{\sum_{i=1}^{n} y_i - a\sum_{i=1}^{n} x_i}{n}.$$

We have calculated the values of the variables at which the function $F(a, b) = \sum_{i=1}^{n} \bigl(y_i - (a x_i + b)\bigr)^2$ takes its minimum value. We will prove in the proof section below why this is indeed the minimum.

This is the application of the least squares method in practice. The formula used to find the parameter $a$ includes the sums $\sum_{i=1}^{n} x_i$, $\sum_{i=1}^{n} y_i$, $\sum_{i=1}^{n} x_i y_i$, $\sum_{i=1}^{n} x_i^2$, as well as $n$, the number of experimental data points. We advise you to calculate each sum separately. The value of the coefficient $b$ is calculated immediately after $a$.

Let's go back to the original example.

Example 1

Here we have $n$ equal to five. To make it more convenient to calculate the required sums included in the coefficient formulas, let us fill out the table.

i        | 1    | 2    | 3    | 4    | 5    | Sum (i = 1..5)
x_i      | 0    | 1    | 2    | 4    | 5    | 12
y_i      | 2.1  | 2.4  | 2.6  | 2.8  | 3    | 12.9
x_i y_i  | 0    | 2.4  | 5.2  | 11.2 | 15   | 33.8
x_i^2    | 0    | 1    | 4    | 16   | 25   | 46

Solution

The fourth row contains the products of the values of the second and third rows for each $i$. The fifth row contains the squares of the values of the second row. The last column shows the sums of the values in each row.

Let us use the least squares method to calculate the coefficients $a$ and $b$ we need. To do this, we substitute the required sums from the last column of the table into the formulas:

$$a = \frac{n\sum_{i=1}^{n} x_i y_i - \sum_{i=1}^{n} x_i \sum_{i=1}^{n} y_i}{n\sum_{i=1}^{n} x_i^2 - \Bigl(\sum_{i=1}^{n} x_i\Bigr)^2} = \frac{5 \cdot 33.8 - 12 \cdot 12.9}{5 \cdot 46 - 12^2} \approx 0.165, \qquad b = \frac{\sum_{i=1}^{n} y_i - a\sum_{i=1}^{n} x_i}{n} = \frac{12.9 - 0.165 \cdot 12}{5} \approx 2.184.$$

It turns out that the required approximating straight line is $y = 0.165x + 2.184$. Now we need to determine which function better approximates the data: $g(x) = \sqrt[3]{x + 1} + 1$ or $y = 0.165x + 2.184$. Let us estimate this using the least squares criterion.

To estimate the error, we find the sums of squared deviations of the data from the line and from the curve, $\sigma_1 = \sum_{i=1}^{n} \bigl(y_i - (a x_i + b)\bigr)^2$ and $\sigma_2 = \sum_{i=1}^{n} \bigl(y_i - g(x_i)\bigr)^2$; the smaller value corresponds to the better-fitting function.

$$\sigma_1 = \sum_{i=1}^{5} \bigl(y_i - (0.165 x_i + 2.184)\bigr)^2 \approx 0.019, \qquad \sigma_2 = \sum_{i=1}^{5} \bigl(y_i - (\sqrt[3]{x_i + 1} + 1)\bigr)^2 \approx 0.096.$$

Answer: since $\sigma_1 < \sigma_2$, the line that best approximates the original data is $y = 0.165x + 2.184$.

The least squares method is clearly shown in the graphical illustration. The red curve marks $g(x) = \sqrt[3]{x + 1} + 1$, the blue line marks $y = 0.165x + 2.184$, and the original data are indicated by pink dots.
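The coefficients and both error sums from this example can be checked with a few lines of numpy (the cube-root form of $g$ is taken as read above; the printed values should reproduce 0.165, 2.184, 0.019 and 0.096 up to rounding):

```python
import numpy as np

x = np.array([0, 1, 2, 4, 5], dtype=float)
y = np.array([2.1, 2.4, 2.6, 2.8, 3.0])

n = len(x)
a = (n * np.sum(x * y) - np.sum(x) * np.sum(y)) / (n * np.sum(x ** 2) - np.sum(x) ** 2)
b = (np.sum(y) - a * np.sum(x)) / n
print(round(a, 3), round(b, 3))                           # ≈ 0.165, 2.184

sigma1 = np.sum((y - (a * x + b)) ** 2)                   # deviations from the fitted line
sigma2 = np.sum((y - (np.cbrt(x + 1) + 1)) ** 2)          # deviations from g(x)
print(round(float(sigma1), 3), round(float(sigma2), 3))   # ≈ 0.019, 0.096
```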

Let us explain why exactly approximations of this type are needed.

They can be used in tasks that require data smoothing, as well as in those where data must be interpolated or extrapolated. For example, in the problem discussed above, one could find the value of the observed quantity y at x = 3 or at x = 6. We have devoted a separate article to such examples.

Proof of the OLS method

In order for the function to take a minimum value at the calculated $a$ and $b$, it is necessary that at the given point the matrix of the quadratic form $d^2 F(a, b)$ of the function $F(a, b) = \sum_{i=1}^{n} \bigl(y_i - (a x_i + b)\bigr)^2$ be positive definite. Let us show what it looks like.

Example 2

We have a second order differential of the following form:

$$d^2 F(a, b) = \frac{\partial^2 F(a, b)}{\partial a^2}\, da^2 + 2\frac{\partial^2 F(a, b)}{\partial a\, \partial b}\, da\, db + \frac{\partial^2 F(a, b)}{\partial b^2}\, db^2.$$

Solution

$$\frac{\partial^2 F(a,b)}{\partial a^2} = \frac{\partial}{\partial a}\Bigl(-2\sum_{i=1}^{n}\bigl(y_i - (a x_i + b)\bigr) x_i\Bigr) = 2\sum_{i=1}^{n} x_i^2,$$

$$\frac{\partial^2 F(a,b)}{\partial a\,\partial b} = \frac{\partial}{\partial b}\Bigl(-2\sum_{i=1}^{n}\bigl(y_i - (a x_i + b)\bigr) x_i\Bigr) = 2\sum_{i=1}^{n} x_i,$$

$$\frac{\partial^2 F(a,b)}{\partial b^2} = \frac{\partial}{\partial b}\Bigl(-2\sum_{i=1}^{n}\bigl(y_i - (a x_i + b)\bigr)\Bigr) = 2\sum_{i=1}^{n} 1 = 2n.$$

In other words, we can write: $d^2 F(a,b) = 2\sum_{i=1}^{n} x_i^2\, da^2 + 2\Bigl(2\sum_{i=1}^{n} x_i\Bigr)\, da\, db + 2n\, db^2$.

We have obtained the matrix of the quadratic form

$$M = \begin{pmatrix} 2\sum_{i=1}^{n} x_i^2 & 2\sum_{i=1}^{n} x_i \\ 2\sum_{i=1}^{n} x_i & 2n \end{pmatrix}.$$

In this case, the values of the individual elements do not depend on $a$ and $b$. Is this matrix positive definite? To answer this question, let us check whether its leading principal (corner) minors are positive.

We calculate the first-order leading minor: $2\sum_{i=1}^{n} x_i^2 > 0$. Since the points $x_i$ do not all coincide, the inequality is strict. We will keep this in mind in the further calculations.

We calculate the second-order leading minor:

$$\det(M) = \begin{vmatrix} 2\sum_{i=1}^{n} x_i^2 & 2\sum_{i=1}^{n} x_i \\ 2\sum_{i=1}^{n} x_i & 2n \end{vmatrix} = 4\Bigl( n\sum_{i=1}^{n} x_i^2 - \Bigl(\sum_{i=1}^{n} x_i\Bigr)^2 \Bigr).$$

After this, we proceed to prove the inequality $n\sum_{i=1}^{n} x_i^2 - \bigl(\sum_{i=1}^{n} x_i\bigr)^2 > 0$ by mathematical induction.

1. Let us check that this inequality holds for the base case $n = 2$:

$$2\sum_{i=1}^{2} x_i^2 - \Bigl(\sum_{i=1}^{2} x_i\Bigr)^2 = 2\bigl(x_1^2 + x_2^2\bigr) - (x_1 + x_2)^2 = x_1^2 - 2 x_1 x_2 + x_2^2 = (x_1 - x_2)^2 > 0.$$

We have obtained a correct inequality (provided the values $x_1$ and $x_2$ do not coincide).

2. Let us assume that the inequality is true for $n$, i.e. $n\sum_{i=1}^{n} x_i^2 - \bigl(\sum_{i=1}^{n} x_i\bigr)^2 > 0$.

3. Now let us prove its validity for $n + 1$, i.e. that $(n+1)\sum_{i=1}^{n+1} x_i^2 - \bigl(\sum_{i=1}^{n+1} x_i\bigr)^2 > 0$ if $n\sum_{i=1}^{n} x_i^2 - \bigl(\sum_{i=1}^{n} x_i\bigr)^2 > 0$.

We calculate:

$$\begin{aligned}
(n+1)\sum_{i=1}^{n+1} x_i^2 - \Bigl(\sum_{i=1}^{n+1} x_i\Bigr)^2
&= (n+1)\Bigl(\sum_{i=1}^{n} x_i^2 + x_{n+1}^2\Bigr) - \Bigl(\sum_{i=1}^{n} x_i + x_{n+1}\Bigr)^2 \\
&= n\sum_{i=1}^{n} x_i^2 + n x_{n+1}^2 + \sum_{i=1}^{n} x_i^2 + x_{n+1}^2 - \Bigl(\sum_{i=1}^{n} x_i\Bigr)^2 - 2 x_{n+1}\sum_{i=1}^{n} x_i - x_{n+1}^2 \\
&= \Bigl( n\sum_{i=1}^{n} x_i^2 - \Bigl(\sum_{i=1}^{n} x_i\Bigr)^2 \Bigr) + \sum_{i=1}^{n}\bigl(x_{n+1}^2 - 2 x_{n+1} x_i + x_i^2\bigr) \\
&= \Bigl( n\sum_{i=1}^{n} x_i^2 - \Bigl(\sum_{i=1}^{n} x_i\Bigr)^2 \Bigr) + (x_{n+1}-x_1)^2 + (x_{n+1}-x_2)^2 + \ldots + (x_{n+1}-x_n)^2 > 0.
\end{aligned}$$

The expression in the first parentheses is greater than 0 (by the assumption of step 2), and the remaining terms are non-negative, since they are squares of numbers. The inequality is proven.

Answer: the found $a$ and $b$ correspond to the smallest value of the function $F(a, b) = \sum_{i=1}^{n} \bigl(y_i - (a x_i + b)\bigr)^2$, which means that they are the required parameters of the least squares method (LSM).
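As a quick numerical sanity check of this argument (using the $x_i$ from the example above; the matrix built here is exactly the matrix $M$ of the quadratic form, and its eigenvalues come out positive whenever the $x_i$ do not all coincide):

```python
import numpy as np

x = np.array([0.0, 1.0, 2.0, 4.0, 5.0])   # the x_i from the example above

# Matrix of the quadratic form of the second differential of F(a, b).
M = np.array([[2 * np.sum(x ** 2), 2 * np.sum(x)],
              [2 * np.sum(x),      2 * len(x)]])

print(np.linalg.eigvalsh(M))                        # both eigenvalues are positive
print(len(x) * np.sum(x ** 2) - np.sum(x) ** 2)     # n*sum(x^2) - (sum x)^2 > 0
```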


The ordinary least squares (OLS) method is a mathematical method used to solve various problems, based on minimizing the sum of squared deviations of certain functions from the desired variables. It can be used to "solve" overdetermined systems of equations (when the number of equations exceeds the number of unknowns), to find solutions in the case of ordinary (not overdetermined) nonlinear systems of equations, and to approximate point values of some function. OLS is one of the basic methods of regression analysis for estimating the unknown parameters of regression models from sample data.


History

Until the beginning of the 19th century, scientists did not have definite rules for solving a system of equations in which the number of unknowns is less than the number of equations; until that time, ad hoc techniques were used that depended on the type of equations and on the ingenuity of the calculators, and therefore different calculators, starting from the same observational data, came to different conclusions. Gauss (1795) was the first to apply the method, and Legendre (1805) independently discovered and published it under its modern name (French: Méthode des moindres quarrés). Laplace connected the method with probability theory, and the American mathematician Adrain (1808) considered its probability-theoretic applications. The method was spread and improved by the further research of Encke, Bessel, Hansen and others.

The essence of the least squares method

Let $x$ be a set of $n$ unknown variables (parameters), and let $f_i(x)$, $i = 1, \ldots, m$, with $m > n$, be a set of functions of these variables. The task is to select values of $x$ such that the values of these functions are as close as possible to certain values $y_i$. Essentially, we are talking about the "solution" of the overdetermined system of equations $f_i(x) = y_i$, $i = 1, \ldots, m$, in the indicated sense of maximal closeness of the left- and right-hand sides of the system. The essence of the least squares method is to take as the "measure of closeness" the sum of squared deviations of the left- and right-hand sides $|f_i(x) - y_i|$. Thus, the essence of LSM can be expressed as follows:

$$\sum_i e_i^2 = \sum_i \bigl(y_i - f_i(x)\bigr)^2 \rightarrow \min_x.$$

If the system of equations has a solution, then the minimum of the sum of squares will be zero, and exact solutions of the system can be found analytically or, for example, by various numerical optimization methods. If the system is overdetermined, that is, loosely speaking, the number of independent equations exceeds the number of sought variables, then the system has no exact solution, and the least squares method allows one to find some "optimal" vector $x$ in the sense of maximal closeness of the vectors $y$ and $f(x)$, or maximal closeness of the deviation vector $e$ to zero (closeness understood in the sense of Euclidean distance).

Example - system of linear equations

In particular, the method of least squares can be used to "solve" a system of linear equations

$$Ax = b,$$

where $A$ is a rectangular matrix of size $m \times n$, $m > n$ (i.e., the number of rows of the matrix $A$ is greater than the number of sought variables).

In the general case, such a system of equations has no solution. Therefore, this system can be "solved" only in the sense of choosing a vector $x$ that minimizes the "distance" between the vectors $Ax$ and $b$. To do this, one can apply the criterion of minimizing the sum of squares of the differences between the left- and right-hand sides of the system's equations, that is, $(Ax - b)^T (Ax - b) \rightarrow \min$. It is easy to show that solving this minimization problem leads to the following system of equations:

$$A^T A x = A^T b \;\Rightarrow\; x = (A^T A)^{-1} A^T b.$$

Using the pseudoinverse operator, the solution can also be written as $x = A^{+} b$, where $A^{+}$ is the pseudoinverse matrix of $A$. This problem can also be "solved" using the so-called weighted least squares method (see below), when different equations of the system receive different weights for theoretical reasons. A strict justification and determination of the limits of the substantive applicability of the method were given by A. A. Markov and A. N. Kolmogorov.
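A small numpy illustration of "solving" an overdetermined linear system in this sense (random data purely for demonstration; `np.linalg.lstsq` computes this least-squares solution directly, and the normal-equations formula gives the same answer when $A$ has full column rank):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(10, 3))       # 10 equations, 3 unknowns: overdetermined
b = rng.normal(size=10)

x_lstsq, *_ = np.linalg.lstsq(A, b, rcond=None)    # least-squares "solution"
x_normal = np.linalg.solve(A.T @ A, A.T @ b)       # via the normal equations

print(np.allclose(x_lstsq, x_normal))              # True
```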

OLS in regression analysis (data approximation)

Let there be $n$ values of some variable $y$ (these may be the results of observations, experiments, etc.) and corresponding values of the variables $x$. The task is to approximate the relationship between $y$ and $x$ by some function known up to certain unknown parameters $b$, that is, in fact, to find the best values of the parameters $b$ that bring the values $f(x, b)$ as close as possible to the actual values $y$. In fact, this reduces to the case of "solving" an overdetermined system of equations with respect to $b$:

$$f(x_t, b) = y_t, \qquad t = 1, \ldots, n.$$

In regression analysis and in particular in econometrics, probabilistic models of dependence between variables are used

$$y_t = f(x_t, b) + \varepsilon_t,$$

where $\varepsilon_t$ are the so-called random errors of the model.

Accordingly, deviations of the observed values $y$ from the model values $f(x, b)$ are already assumed in the model itself. The essence of the (ordinary, classical) least squares method is to find parameters $b$ for which the sum of squared deviations (errors; for regression models they are often called regression residuals) $e_t$ is minimal:

$$\hat{b}_{OLS} = \arg\min_b RSS(b),$$

where $RSS$ (Residual Sum of Squares) is defined as:

$$RSS(b) = e^T e = \sum_{t=1}^{n} e_t^2 = \sum_{t=1}^{n} \bigl(y_t - f(x_t, b)\bigr)^2.$$

In the general case, this problem can be solved by numerical optimization (minimization) methods. In this case one speaks of nonlinear least squares (NLS or NLLS, Non-Linear Least Squares). In many cases an analytical solution can be obtained. To solve the minimization problem, it is necessary to find the stationary points of the function $RSS(b)$ by differentiating it with respect to the unknown parameters $b$, equating the derivatives to zero and solving the resulting system of equations:

$$\sum_{t=1}^{n} \bigl(y_t - f(x_t, b)\bigr)\frac{\partial f(x_t, b)}{\partial b} = 0.$$
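For the nonlinear case, a minimal sketch using `scipy.optimize.least_squares` (the exponential model and the synthetic data are assumptions made only for illustration):

```python
import numpy as np
from scipy.optimize import least_squares

# Synthetic data from a hypothetical model y = b0 * exp(b1 * x) + noise.
rng = np.random.default_rng(1)
x = np.linspace(0, 2, 30)
y = 2.0 * np.exp(0.7 * x) + rng.normal(scale=0.05, size=x.size)

def residuals(b):
    # Residual vector e_t = y_t - f(x_t, b); the solver minimizes the sum of its squares.
    return y - b[0] * np.exp(b[1] * x)

fit = least_squares(residuals, x0=[1.0, 0.0])   # numerical minimization of RSS(b)
print(fit.x)                                     # close to [2.0, 0.7]
```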

OLS in the case of linear regression

Let the regression dependence be linear:

$$y_t = \sum_{j=1}^{k} b_j x_{tj} + \varepsilon_t = x_t^T b + \varepsilon_t.$$

Let $y$ be the column vector of observations of the explained variable, and let $X$ be the $(n \times k)$ matrix of factor observations (the rows of the matrix are the vectors of factor values in a given observation, and the columns are the vector of values of a given factor in all observations). The matrix representation of the linear model has the form:

$$y = Xb + \varepsilon.$$

Then the vector of estimates of the explained variable and the vector of regression residuals will be equal

$$\hat{y} = Xb, \qquad e = y - \hat{y} = y - Xb.$$

Accordingly, the sum of squares of the regression residuals will be equal to

$$RSS = e^T e = (y - Xb)^T (y - Xb).$$

Differentiating this function with respect to the parameter vector $b$ and equating the derivatives to zero, we obtain a system of equations (in matrix form):

$$(X^T X) b = X^T y.$$

In deciphered matrix form, this system of equations looks like this:

$$\begin{pmatrix}
\sum x_{t1}^2 & \sum x_{t1} x_{t2} & \sum x_{t1} x_{t3} & \ldots & \sum x_{t1} x_{tk} \\
\sum x_{t2} x_{t1} & \sum x_{t2}^2 & \sum x_{t2} x_{t3} & \ldots & \sum x_{t2} x_{tk} \\
\sum x_{t3} x_{t1} & \sum x_{t3} x_{t2} & \sum x_{t3}^2 & \ldots & \sum x_{t3} x_{tk} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
\sum x_{tk} x_{t1} & \sum x_{tk} x_{t2} & \sum x_{tk} x_{t3} & \ldots & \sum x_{tk}^2
\end{pmatrix}
\begin{pmatrix} b_1 \\ b_2 \\ b_3 \\ \vdots \\ b_k \end{pmatrix}
=
\begin{pmatrix} \sum x_{t1} y_t \\ \sum x_{t2} y_t \\ \sum x_{t3} y_t \\ \vdots \\ \sum x_{tk} y_t \end{pmatrix},$$

where all sums are taken over all admissible values of $t$.

If a constant is included in the model (as usual), then $x_{t1} = 1$ for all $t$, so the upper left corner of the matrix of the system contains the number of observations $n$, and the remaining elements of the first row and first column are simply the sums of the variable values $\sum x_{tj}$; the first element of the right-hand side of the system is $\sum y_t$.

The solution of this system of equations gives the general formula for least squares estimates for a linear model:

$$\hat{b}_{OLS} = (X^T X)^{-1} X^T y = \Bigl(\tfrac{1}{n} X^T X\Bigr)^{-1} \tfrac{1}{n} X^T y = V_x^{-1} C_{xy}.$$

For analytical purposes, the last representation of this formula turns out to be useful (in the system of equations, when dividing by $n$, arithmetic means appear instead of sums). If in a regression model the data are centered, then in this representation the first matrix has the meaning of the sample covariance matrix of the factors, and the second is the vector of covariances of the factors with the dependent variable. If, in addition, the data are also normalized to the standard deviation (that is, ultimately standardized), then the first matrix has the meaning of the sample correlation matrix of the factors, and the second vector is the vector of sample correlations of the factors with the dependent variable.
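A compact numpy sketch of this estimator on simulated data (the design matrix, true coefficients and noise level are assumptions for the demonstration):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 200
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])   # constant + two factors
beta_true = np.array([1.0, 2.0, -0.5])
y = X @ beta_true + rng.normal(scale=0.3, size=n)

# OLS estimate b = (X'X)^{-1} X'y, computed via a linear solve for numerical stability.
b_hat = np.linalg.solve(X.T @ X, X.T @ y)
print(b_hat)                                                 # close to [1.0, 2.0, -0.5]
```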

An important property of OLS estimates for models with a constant is that the constructed regression line passes through the center of gravity of the sample data, that is, the following equality holds:

$$\bar{y} = \hat{b}_1 + \sum_{j=2}^{k} \hat{b}_j \bar{x}_j.$$

In particular, in the extreme case when the only regressor is a constant, we find that the OLS estimate of the single parameter (the constant itself) is equal to the mean value of the explained variable. That is, the arithmetic mean, known for its good properties from the laws of large numbers, is also a least squares estimate: it satisfies the criterion of the minimum sum of squared deviations from it.

The simplest special cases

In the case of paired linear regression $y_t = a + b x_t + \varepsilon_t$, when the linear dependence of one variable on another is estimated, the calculation formulas are simplified (one can do without matrix algebra). The system of equations has the form:

$$\begin{pmatrix} 1 & \bar{x} \\ \bar{x} & \overline{x^2} \end{pmatrix} \begin{pmatrix} a \\ b \end{pmatrix} = \begin{pmatrix} \bar{y} \\ \overline{xy} \end{pmatrix}.$$

From here it is easy to find the coefficient estimates:

$$\hat{b} = \frac{\operatorname{Cov}(x, y)}{\operatorname{Var}(x)} = \frac{\overline{xy} - \bar{x}\,\bar{y}}{\overline{x^2} - \bar{x}^2}, \qquad \hat{a} = \bar{y} - \hat{b}\,\bar{x}.$$
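These averaged formulas are easy to apply directly; the sketch below recomputes the line from the worked example earlier in the text (intercept ≈ 2.184, slope ≈ 0.165) using only sample means:

```python
import numpy as np

x = np.array([0, 1, 2, 4, 5], dtype=float)
y = np.array([2.1, 2.4, 2.6, 2.8, 3.0])

b = (np.mean(x * y) - x.mean() * y.mean()) / (np.mean(x ** 2) - x.mean() ** 2)  # slope
a = y.mean() - b * x.mean()                                                      # intercept
print(a, b)   # ≈ 2.184, 0.165
```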

Although in the general case models with a constant are preferable, in some cases it is known from theoretical considerations that the constant $a$ must be equal to zero. For example, in physics the relationship between voltage and current is $U = I \cdot R$; when measuring voltage and current, it is necessary to estimate the resistance. In this case we are dealing with the model $y = bx$. Then, instead of a system of equations, we have the single equation

$$\Bigl(\sum x_t^2\Bigr) b = \sum x_t y_t.$$

Therefore, the formula for estimating the single coefficient has the form

$$\hat{b} = \frac{\sum_{t=1}^{n} x_t y_t}{\sum_{t=1}^{n} x_t^2} = \frac{\overline{xy}}{\overline{x^2}}.$$
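A hypothetical illustration of this no-constant model with made-up voltage and current readings (the estimate of the resistance is the least-squares slope through the origin):

```python
import numpy as np

I = np.array([0.10, 0.20, 0.30, 0.40, 0.50])   # current, A (made-up readings)
U = np.array([1.02, 1.98, 3.05, 3.96, 5.01])   # voltage, V (made-up readings)

R_hat = np.sum(I * U) / np.sum(I ** 2)          # slope of U = R*I fitted through the origin
print(R_hat)                                    # ≈ 10 ohms
```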

The case of a polynomial model

If the data are fitted by a polynomial regression function of one variable $f(x) = b_0 + \sum_{i=1}^{k} b_i x^i$, then, treating the powers $x^i$ as separate factors for each $i$, one can estimate the model parameters from the general formula for estimating the parameters of a linear model. To do this, it suffices to take into account that with such an interpretation $x_{ti} x_{tj} = x_t^i x_t^j = x_t^{i+j}$ and $x_{tj} y_t = x_t^j y_t$. Consequently, the matrix equations in this case take the form:

$$\begin{pmatrix}
n & \sum_t x_t & \ldots & \sum_t x_t^k \\
\sum_t x_t & \sum_t x_t^2 & \ldots & \sum_t x_t^{k+1} \\
\vdots & \vdots & \ddots & \vdots \\
\sum_t x_t^k & \sum_t x_t^{k+1} & \ldots & \sum_t x_t^{2k}
\end{pmatrix}
\begin{bmatrix} b_0 \\ b_1 \\ \vdots \\ b_k \end{bmatrix}
=
\begin{bmatrix} \sum_t y_t \\ \sum_t x_t y_t \\ \vdots \\ \sum_t x_t^k y_t \end{bmatrix}.$$

Statistical properties of OLS estimators

First of all, we note that for linear models, OLS estimates are linear estimates, as follows from the above formula. For unbiased OLS estimates, it is necessary and sufficient to fulfill the most important condition of regression analysis: the mathematical expectation of a random error, conditional on the factors, must be equal to zero. This condition, in particular, is satisfied if

  1. the mathematical expectation of random errors is zero, and
  2. factors and random errors are independent random variables.

The second condition, the condition of exogeneity of the factors, is fundamental. If this property is not met, then we can assume that almost any estimates will be extremely unsatisfactory: they will not even be consistent (that is, even a very large amount of data does not allow high-quality estimates to be obtained in this case). In the classical case, a stronger assumption is made, namely the determinism of the factors as opposed to the randomness of the error, which automatically means that the exogeneity condition is met. In the general case, for the consistency of the estimates it is sufficient that the exogeneity condition holds together with the convergence of the matrix $V_x$ to some non-singular matrix as the sample size increases to infinity.

In order for the (ordinary) least squares estimates to be, in addition to consistent and unbiased, also efficient (the best in the class of linear unbiased estimates), additional properties of the random error must hold:

∙ constant (identical) variance of the random errors in all observations (no heteroscedasticity);

∙ absence of correlation (autocorrelation) of the random errors in different observations with each other.

These assumptions can be formulated for the covariance matrix of the random error vector: $V(\varepsilon) = \sigma^2 I$.

A linear model that satisfies these conditions is called classical. OLS estimates for classical linear regression are unbiased, consistent and the most efficient estimates in the class of all linear unbiased estimates (in the English literature the abbreviation BLUE, Best Linear Unbiased Estimator, is sometimes used; in the Russian literature the Gauss-Markov theorem is more often cited). As is easy to show, the covariance matrix of the vector of coefficient estimates is equal to:

$$V(\hat{b}_{OLS}) = \sigma^2 (X^T X)^{-1}.$$

Efficiency means that this covariance matrix is "minimal" (any linear combination of the coefficients, and in particular the coefficients themselves, have minimal variance), that is, in the class of linear unbiased estimators the OLS estimators are best. The diagonal elements of this matrix, the variances of the coefficient estimates, are important parameters of the quality of the obtained estimates. However, it is not possible to calculate this covariance matrix directly, because the variance of the random errors is unknown. It can be proven that an unbiased and consistent (for the classical linear model) estimate of the variance of the random errors is the quantity

$$s^2 = RSS/(n - k).$$

Substituting this value into the formula for the covariance matrix, we obtain an estimate of the covariance matrix. The resulting estimates are also unbiased and consistent. It is also important that the estimate of the error variance (and hence the variance of the coefficients) and the estimates of the model parameters are independent random variables, which makes it possible to obtain test statistics for testing hypotheses about the model coefficients.
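A sketch of these formulas in numpy (simulated data, so the true coefficients and noise level are assumptions; $s^2$ and the coefficient standard errors are computed from the residual sum of squares):

```python
import numpy as np

rng = np.random.default_rng(3)
n, k = 100, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, k - 1))])
y = X @ np.array([1.0, 2.0, -0.5]) + rng.normal(scale=0.3, size=n)

b_hat = np.linalg.solve(X.T @ X, X.T @ y)
e = y - X @ b_hat
s2 = (e @ e) / (n - k)                      # unbiased estimate of the error variance
cov_b = s2 * np.linalg.inv(X.T @ X)         # estimated covariance matrix of b_hat
print(np.sqrt(np.diag(cov_b)))              # standard errors of the coefficients
```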

Generalized OLS

It should be noted that if the classical assumptions are not met, OLS parameter estimates are not the most efficient (although they remain unbiased and consistent). The estimate of the covariance matrix deteriorates even more: it becomes biased and inconsistent, which means that statistical conclusions about the quality of the constructed model can be extremely unreliable. One way to address the latter problem is to use special estimates of the covariance matrix that remain consistent under violations of the classical assumptions (White standard errors and Newey-West standard errors). Another approach is the so-called generalized least squares method.

The least squares method allows a broad generalization. Instead of minimizing the sum of squares of the residuals, one can minimize some positive definite quadratic form of the residual vector $e^T W e$, where $W$ is a symmetric positive definite weight matrix. Ordinary least squares is the special case of this approach in which the weight matrix is proportional to the identity matrix. As is known, a symmetric matrix (or operator) admits a decomposition $W = P^T P$. Therefore, this functional can be represented as $e^T P^T P e = (Pe)^T Pe = e_*^T e_*$, that is, as the sum of squares of certain transformed "residuals". Thus, one can distinguish a whole class of least squares methods: LS methods (Least Squares).

It has been proven (Aitken's theorem) that for a generalized linear regression model (in which no restrictions are imposed on the covariance matrix of the random errors), the most efficient estimates (in the class of linear unbiased estimates) are the so-called generalized least squares (GLS) estimates, obtained by the LS method with a weight matrix equal to the inverse covariance matrix of the random errors: $W = V_\varepsilon^{-1}$.

It can be shown that the formula for the GLS estimates of the parameters of a linear model has the form

$$\hat{b}_{GLS} = (X^T V^{-1} X)^{-1} X^T V^{-1} y.$$

The covariance matrix of these estimates is accordingly equal to

$$V(\hat{b}_{GLS}) = (X^T V^{-1} X)^{-1}.$$

In fact, the essence of GLS lies in a certain (linear) transformation ($P$) of the original data and the application of ordinary OLS to the transformed data. The purpose of this transformation is that for the transformed data the random errors already satisfy the classical assumptions.
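A minimal sketch of the GLS formula (the error covariance matrix is assumed known and diagonal purely for illustration):

```python
import numpy as np

rng = np.random.default_rng(4)
n = 50
X = np.column_stack([np.ones(n), rng.normal(size=n)])
V = np.diag(rng.uniform(0.5, 2.0, size=n))                 # assumed (known) error covariance
y = X @ np.array([1.0, 3.0]) + rng.multivariate_normal(np.zeros(n), V)

Vinv = np.linalg.inv(V)
b_gls = np.linalg.solve(X.T @ Vinv @ X, X.T @ Vinv @ y)    # (X'V^{-1}X)^{-1} X'V^{-1}y
print(b_gls)                                               # close to [1.0, 3.0]
```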

Weighted OLS

In the case of a diagonal weight matrix (and hence a diagonal covariance matrix of the random errors), we have so-called weighted least squares (WLS). In this case, the weighted sum of squares of the model residuals is minimized, that is, each observation receives a "weight" that is inversely proportional to the variance of the random error in that observation: $e^T W e = \sum_{t=1}^{n} \frac{e_t^2}{\sigma_t^2}$. In fact, the data are transformed by weighting the observations (dividing by a quantity proportional to the assumed standard deviation of the random errors), and ordinary OLS is applied to the weighted data.
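A short weighted-least-squares sketch (heteroscedastic noise is simulated with a variance pattern that is assumed known; each observation is divided by its error standard deviation and ordinary OLS is applied to the weighted data):

```python
import numpy as np

rng = np.random.default_rng(5)
n = 200
x = rng.uniform(1, 10, size=n)
sigma = 0.1 * x                                  # error spread grows with x (assumed known)
y = 1.0 + 0.5 * x + rng.normal(scale=sigma)

X = np.column_stack([np.ones(n), x])
w = 1.0 / sigma                                  # weight each observation by 1/sigma_t
b_wls = np.linalg.lstsq(X * w[:, None], y * w, rcond=None)[0]
print(b_wls)                                     # close to [1.0, 0.5]
```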
