The least squares method by hand. Linear regression

Least squares method

The least squares method (OLS, Ordinary Least Squares) is one of the basic methods of regression analysis for estimating the unknown parameters of regression models from sample data. The method is based on minimizing the sum of squares of the regression residuals.

It should be noted that the least squares method as such can be applied to a problem in any area whenever the solution is defined by, or satisfies, a criterion of minimizing the sum of squares of some functions of the unknown variables. It can therefore also be used for an approximate representation (approximation) of a given function by other (simpler) functions, for finding a set of quantities satisfying equations or constraints whose number exceeds the number of these quantities, and so on.

The essence of the least squares method

Suppose a (parametric) model of a probabilistic (regression) relationship between the explained variable y and a set of factors (explanatory variables) x is given:

y = f(x, b) + ε,

where b is the vector of unknown model parameters and ε is the random model error.

Let there also be sample observations of the values of these variables. Let t be the observation number (t = 1, …, n, where n is the sample size). Then x_t and y_t are the values of the variables in the t-th observation. For given values of the parameters b, one can then calculate the theoretical (model) values of the explained variable y:

ŷ_t = f(x_t, b).

The residuals e_t = y_t − ŷ_t = y_t − f(x_t, b) depend on the values of the parameters b.

The essence of the (ordinary, classical) least squares method is to find the parameters b for which the sum of the squares of the residuals (RSS, Residual Sum of Squares) is minimal:

RSS(b) = Σ_t e_t² = Σ_t ( y_t − f(x_t, b) )² → min.

In the general case, this problem can be solved by numerical optimization (minimization) methods; in that case one speaks of nonlinear least squares (NLS or NLLS, Non-Linear Least Squares). In many cases an analytical solution can be obtained. To solve the minimization problem, one finds the stationary points of the function by differentiating it with respect to the unknown parameters b, equating the derivatives to zero and solving the resulting system of equations:

∂RSS/∂b_j = −2 Σ_t ( y_t − f(x_t, b) ) · ∂f(x_t, b)/∂b_j = 0,   j = 1, …, k.

If the random errors of the model are normally distributed, have the same variance and are uncorrelated, the OLS parameter estimates coincide with the maximum likelihood (ML) estimates.
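For the nonlinear case, here is a minimal sketch of minimizing the residual sum of squares numerically; the exponential model y = b0·exp(b1·x) and the synthetic data are assumptions made only for illustration, and it relies on NumPy and SciPy:

    import numpy as np
    from scipy.optimize import least_squares

    # Illustrative nonlinear model y = b0 * exp(b1 * x) plus noise (made-up data).
    rng = np.random.default_rng(0)
    x = np.linspace(0.0, 2.0, 30)
    y = 2.0 * np.exp(0.7 * x) + rng.normal(scale=0.1, size=x.size)

    def residuals(b):
        # e_t = y_t - f(x_t, b); least_squares minimizes the sum of their squares.
        return y - b[0] * np.exp(b[1] * x)

    fit = least_squares(residuals, x0=[1.0, 1.0])
    print(fit.x)  # numerical estimates of b0 and b1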

OLS in the case of a linear model

Let the regression dependence be linear:

y_t = b_1 x_{t1} + b_2 x_{t2} + … + b_k x_{tk} + ε_t.

Let y be the column vector of observations of the explained variable and X the matrix of factor observations (the rows of the matrix are the vectors of factor values in a given observation, the columns are the vectors of values of a given factor in all observations). The matrix representation of the linear model is:

y = Xb + ε.

Then the vector of estimates of the explained variable and the vector of regression residuals are

ŷ = Xb,   e = y − ŷ = y − Xb.

Accordingly, the sum of squares of the regression residuals is

RSS(b) = e^T e = (y − Xb)^T (y − Xb).

Differentiating this function with respect to the vector of parameters b and equating the derivatives to zero, we obtain a system of equations (in matrix form):

X^T X b = X^T y.

The solution of this system of equations gives the general formula for the least squares estimates for the linear model:

b̂ = (X^T X)^{-1} X^T y = ( (1/n) X^T X )^{-1} ( (1/n) X^T y ).
For analytical purposes, the latter representation of this formula is useful. If the data in the regression model are centered, then in this representation the first matrix is the sample covariance matrix of the factors, and the second is the vector of covariances of the factors with the dependent variable. If, in addition, the data are also normalized by the standard deviation (that is, ultimately standardized), then the first matrix is the sample correlation matrix of the factors and the second vector is the vector of sample correlations of the factors with the dependent variable.
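A minimal NumPy sketch of this formula; the design matrix and response below are made-up toy data, and a column of ones plays the role of the constant:

    import numpy as np

    # Toy data: a constant term plus two factors.
    X = np.array([[1.0, 2.0, 1.0],
                  [1.0, 3.0, 0.0],
                  [1.0, 5.0, 2.0],
                  [1.0, 7.0, 3.0],
                  [1.0, 8.0, 5.0]])
    y = np.array([3.1, 4.2, 7.9, 10.1, 12.8])

    # b_hat = (X'X)^{-1} X'y, written via a linear solve instead of an explicit inverse.
    b_hat = np.linalg.solve(X.T @ X, X.T @ y)

    # Cross-check with the library least-squares routine.
    b_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)
    print(b_hat, b_lstsq)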

An important property of OLS estimates for models with a constant: the constructed regression line passes through the center of gravity of the sample data, that is, the equality

ȳ = x̄^T b̂

is satisfied, where x̄ is the vector of sample means of the factors and ȳ is the sample mean of the explained variable.

In particular, in the extreme case when the only regressor is a constant, we find that the OLS estimate of the single parameter (the constant itself) is equal to the average value of the explained variable. That is, the arithmetic mean, known for its good properties from the laws of large numbers, is also a least squares estimate: it minimizes the sum of squared deviations from it.

Example: simplest (pairwise) regression

In the case of paired linear regression y = a + bx, the calculation formulas are simpler (matrix algebra is not needed):

b̂ = Σ_t (x_t − x̄)(y_t − ȳ) / Σ_t (x_t − x̄)²,   â = ȳ − b̂·x̄,

where x̄ and ȳ are the sample means of the factor and of the explained variable.
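A short sketch of these pairwise formulas; the x and y arrays are arbitrary illustrative numbers:

    import numpy as np

    x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])   # illustrative data
    y = np.array([2.1, 2.9, 4.2, 4.8, 6.1])

    x_mean, y_mean = x.mean(), y.mean()
    b = np.sum((x - x_mean) * (y - y_mean)) / np.sum((x - x_mean) ** 2)  # slope
    a = y_mean - b * x_mean                                              # intercept
    print(a, b)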

Properties of OLS estimators

First of all, we note that for linear models OLS estimates are linear estimates, as follows from the formula above. For OLS estimates to be unbiased, it is necessary and sufficient that the most important condition of regression analysis hold: the mathematical expectation of the random error, conditional on the factors, must be equal to zero. This condition is satisfied, in particular, if

  1. the mathematical expectation of random errors is zero, and
  2. factors and random errors are independent random variables.

The second condition, the condition of exogeneity of the factors, is fundamental. If this property is not met, then we can expect almost any estimates to be extremely unsatisfactory: they will not even be consistent (that is, even a very large amount of data does not allow us to obtain high-quality estimates in this case). In the classical case, a stronger assumption is made that the factors are deterministic, as opposed to the random error, which automatically means that the exogeneity condition is met. In the general case, for consistency of the estimates it is sufficient that the exogeneity condition hold together with the convergence of the matrix (1/n)·X^T X to some non-singular matrix as the sample size increases to infinity.

In order for the (ordinary) least squares estimates to be not only consistent and unbiased but also efficient (the best in the class of linear unbiased estimates), additional properties of the random error must hold:

  1. the variance of the random errors is constant (the same) in all observations (homoscedasticity);
  2. the random errors in different observations are uncorrelated (no autocorrelation).

These assumptions can be formulated for the covariance matrix of the random error vector:

V(ε) = σ² I_n.

A linear model that satisfies these conditions is called classical. OLS estimates for classical linear regression are unbiased, consistent and the most efficient estimates in the class of all linear unbiased estimates (in the English-language literature the abbreviation BLUE, Best Linear Unbiased Estimator, is sometimes used; in the Russian literature the Gauss-Markov theorem is more often cited). As is easy to show, the covariance matrix of the vector of coefficient estimates is:

V(b̂) = σ² (X^T X)^{-1}.
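As a sketch, continuing the toy matrix example above and assuming the classical conditions hold, this covariance matrix and the standard errors of the coefficients can be computed with the usual unbiased estimate of σ²:

    import numpy as np

    # X, y as in the earlier toy example; n observations, k parameters.
    X = np.array([[1.0, 2.0, 1.0],
                  [1.0, 3.0, 0.0],
                  [1.0, 5.0, 2.0],
                  [1.0, 7.0, 3.0],
                  [1.0, 8.0, 5.0]])
    y = np.array([3.1, 4.2, 7.9, 10.1, 12.8])
    n, k = X.shape

    b_hat = np.linalg.solve(X.T @ X, X.T @ y)
    resid = y - X @ b_hat
    sigma2_hat = resid @ resid / (n - k)          # unbiased estimate of the error variance
    cov_b = sigma2_hat * np.linalg.inv(X.T @ X)   # V(b_hat) = sigma^2 (X'X)^{-1}
    std_errors = np.sqrt(np.diag(cov_b))
    print(std_errors)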

Generalized least squares (GLS)

The least squares method allows for a broad generalization. Instead of minimizing the sum of squares of the residuals, one can minimize some positive definite quadratic form of the vector of residuals, e^T W e, where W is some symmetric positive definite weight matrix. Ordinary least squares is the special case of this approach in which the weight matrix is proportional to the identity matrix. As is known from the theory of symmetric matrices (or operators), such a matrix admits a decomposition W = P^T P. Consequently, the specified functional can be represented as e^T W e = (Pe)^T (Pe), that is, it can be represented as the sum of squares of some transformed "residuals". Thus, one can distinguish a whole class of least squares methods: LS methods (Least Squares).

It has been proven (Aitken's theorem) that for a generalized linear regression model (in which no restrictions are imposed on the covariance matrix of the random errors), the most efficient estimates (in the class of linear unbiased estimates) are the so-called generalized least squares (GLS, Generalized Least Squares) estimates: the LS method with a weight matrix equal to the inverse covariance matrix of the random errors, W = V(ε)^{-1}.

It can be shown that the formula for the GLS estimates of the parameters of a linear model has the form

b̂_GLS = (X^T V^{-1} X)^{-1} X^T V^{-1} y.

The covariance matrix of these estimates will accordingly be equal to

V(b̂_GLS) = (X^T V^{-1} X)^{-1}.

In fact, the essence of GLS lies in a certain (linear) transformation (P) of the original data and the application of ordinary OLS to the transformed data. The purpose of this transformation is that for the transformed data the random errors already satisfy the classical assumptions.

Weighted least squares (WLS)

In the case of a diagonal weight matrix (and hence a diagonal covariance matrix of the random errors), we have so-called weighted least squares (WLS). In this case, the weighted sum of squares of the model residuals, Σ_t e_t² / σ_t², is minimized, that is, each observation receives a "weight" inversely proportional to the variance of the random error in that observation. In fact, the data are transformed by weighting the observations (dividing by an amount proportional to the assumed standard deviation of the random errors), and ordinary OLS is applied to the weighted data.
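A minimal weighted least squares sketch; the heteroscedastic error standard deviations are assumed known here, which is an idealization, and the data are invented. Each observation is divided by its error standard deviation and ordinary OLS is applied to the transformed data:

    import numpy as np

    X = np.array([[1.0, 1.0],
                  [1.0, 2.0],
                  [1.0, 3.0],
                  [1.0, 4.0],
                  [1.0, 5.0]])                      # constant + one factor (toy data)
    y = np.array([1.2, 2.1, 2.8, 4.2, 4.9])
    sigma = np.array([0.1, 0.1, 0.5, 0.5, 1.0])     # assumed error standard deviations

    # Transform (whiten) the data: divide each observation by sigma_t, then run ordinary OLS.
    Xw = X / sigma[:, None]
    yw = y / sigma
    b_wls, *_ = np.linalg.lstsq(Xw, yw, rcond=None)

    # Equivalent analytic form: (X' W X)^{-1} X' W y with W = diag(1 / sigma^2).
    W = np.diag(1.0 / sigma**2)
    b_check = np.linalg.solve(X.T @ W @ X, X.T @ W @ y)
    print(b_wls, b_check)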

Some special cases of using the least squares method in practice

Approximation of linear dependence

Let us consider the case when, as a result of studying the dependence of a certain scalar quantity y on a certain scalar quantity x (this could be, for example, the dependence of voltage U on current strength I: U = I·R, where R is a constant, the resistance of the conductor), n measurements of these quantities were carried out, yielding the values x_1, …, x_n and the corresponding values y_1, …, y_n. The measurement data are recorded in a table.

Table. Measurement results (measurements no. 1-6 with the corresponding measured values x_i and y_i).

The question is: what value of the coefficient k can be chosen to best describe the dependence y = kx? According to the least squares method, this value should be the one for which the sum of the squared deviations of the values y_i from the values kx_i,

S(k) = Σ_i ( y_i − k x_i )²,

is minimal.

The sum of squared deviations has a single extremum, a minimum, which allows us to use this criterion. Let us find from it the value of the coefficient k. Setting the derivative with respect to k to zero and transforming the left side, we obtain:

dS/dk = −2 Σ_i x_i ( y_i − k x_i ) = 0,   whence   k = Σ_i x_i y_i / Σ_i x_i².

The last formula allows us to find the value of the coefficient k, which is what was required in the problem.
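A tiny sketch of this special case; the current and voltage readings are made up for illustration, and the model is y = kx with no intercept:

    import numpy as np

    x = np.array([0.5, 1.0, 1.5, 2.0, 2.5])   # e.g. current readings (illustrative)
    y = np.array([1.1, 2.0, 3.2, 3.9, 5.1])   # e.g. voltage readings (illustrative)

    k = np.sum(x * y) / np.sum(x ** 2)   # k = sum(x_i * y_i) / sum(x_i^2)
    print(k)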

History

Until the beginning of the 19th century, scientists had no definite rules for solving a system of equations in which the number of unknowns is less than the number of equations; until that time, ad hoc techniques were used that depended on the type of equations and on the ingenuity of the calculators, and therefore different calculators, starting from the same observational data, came to different conclusions. Gauss (1795) was the first to use the method, and Legendre (1805) independently discovered and published it under its modern name (French: Méthode des moindres quarrés). Laplace related the method to probability theory, and the American mathematician Adrain (1808) considered its probability-theoretic applications. The method was spread and improved by further research by Encke, Bessel, Hansen and others.

Alternative uses of OLS

The idea of ​​the least squares method can also be used in other cases not directly related to regression analysis. The fact is that the sum of squares is one of the most common proximity measures for vectors (Euclidean metric in finite-dimensional spaces).

One application is the "solution" of systems of linear equations in which the number of equations is greater than the number of variables:

Ax = b,

where the matrix A is not square but rectangular, of size m × n with m > n.

Such a system of equations generally has no exact solution (if the rank of the augmented matrix is actually greater than the number of variables). Therefore, this system can be "solved" only in the sense of choosing a vector x that minimizes the "distance" between the vectors Ax and b. To do this, one can apply the criterion of minimizing the sum of squares of the differences between the left and right sides of the system's equations, that is, ‖Ax − b‖² → min. It is easy to show that solving this minimization problem leads to solving the following system of equations:

A^T A x = A^T b.
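A short sketch for an overdetermined system; the three equations in two unknowns below are made up for illustration:

    import numpy as np

    A = np.array([[1.0, 1.0],
                  [1.0, 2.0],
                  [1.0, 3.0]])      # 3 equations, 2 unknowns (illustrative)
    b = np.array([1.0, 2.1, 2.9])

    # Least-squares "solution": minimizes ||Ax - b||^2, i.e. solves A'A x = A'b.
    x_ls, residual, rank, _ = np.linalg.lstsq(A, b, rcond=None)
    print(x_ls)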

Approximation of experimental data is a method based on replacing experimentally obtained data with an analytical function that passes closest to, or coincides at the nodal points with, the original values (the data obtained during an experiment or observation). Currently, there are two ways to define such an analytical function:

By constructing an interpolation polynomial of degree n that passes directly through all points of a given data array. In this case, the approximating function is presented as an interpolation polynomial in Lagrange form or an interpolation polynomial in Newton form.

By constructing an approximating polynomial of degree n that passes in the closest proximity to the points of a given data array. Thus, the approximating function smooths out the random noise (or errors) that may arise during the experiment: the values measured during the experiment depend on random factors that fluctuate according to their own random laws (measurement or instrument errors, inaccuracy or experimental errors). In this case, the approximating function is determined using the least squares method.

The least squares method (in the English-language literature, Ordinary Least Squares, OLS) is a mathematical method based on determining the approximating function that is constructed in the closest proximity to the points of a given array of experimental data. The closeness of the original data and the approximating function F(x) is determined by a numerical measure, namely: the sum of squared deviations of the experimental data from the approximating curve F(x) should be the smallest.

Approximating curve constructed using the least squares method

The least squares method is used:

To solve overdetermined systems of equations when the number of equations exceeds the number of unknowns;

To find a solution in the case of ordinary (not overdetermined) nonlinear systems of equations;

To approximate point values ​​with some approximating function.

The approximating function in the least squares method is determined from the condition of the minimum of the sum of squared deviations of the calculated approximating function from the given array of experimental data. This criterion of the least squares method is written as the following expression:

S = Σ_{i=1}^{N} ( F(x_i) − y_i )² → min,

where F(x_i) are the values of the calculated approximating function at the nodal points x_i, and y_i are the given experimental data at the nodal points.

The quadratic criterion has a number of "good" properties, such as differentiability and the fact that it provides a unique solution to the approximation problem with polynomial approximating functions.

Depending on the conditions of the problem, the approximating function is a polynomial of degree m:

F(x) = a_0 + a_1·x + a_2·x² + … + a_m·x^m.

The degree of the approximating function does not depend on the number of nodal points, but its dimension (the number of coefficients) must always be less than the dimension (number of points) of the given experimental data array.

∙ If the degree of the approximating function is m=1, then we approximate the tabular function with a straight line (linear regression).

∙ If the degree of the approximating function is m=2, then we approximate the table function with a quadratic parabola (quadratic approximation).

∙ If the degree of the approximating function is m=3, then we approximate the table function with a cubic parabola (cubic approximation).

In the general case, when it is necessary to construct an approximating polynomial of degree m for given tabular values, the condition for the minimum of the sum of squared deviations over all nodal points is rewritten in the following form:

S(a_0, a_1, …, a_m) = Σ_{i=1}^{N} ( a_0 + a_1·x_i + … + a_m·x_i^m − y_i )² → min,

where a_0, a_1, …, a_m are the unknown coefficients of the approximating polynomial of degree m, and N is the number of tabular values specified.

A necessary condition for the existence of a minimum of a function is that its partial derivatives with respect to the unknown variables a_0, a_1, …, a_m equal zero. As a result, we obtain the following system of equations:

∂S/∂a_k = 2 Σ_{i=1}^{N} ( a_0 + a_1·x_i + … + a_m·x_i^m − y_i )·x_i^k = 0,   k = 0, 1, …, m.

Let's transform the resulting linear system of equations: open the brackets and move the free terms to the right-hand side. As a result, the system of linear algebraic equations takes the following form:

a_0·Σ x_i^k + a_1·Σ x_i^{k+1} + … + a_m·Σ x_i^{k+m} = Σ y_i·x_i^k,   k = 0, 1, …, m.

This system of linear algebraic equations can be rewritten in matrix form:

C·a = d,   where C_{kj} = Σ_i x_i^{k+j} (k, j = 0, …, m) and d_k = Σ_i y_i·x_i^k.

As a result, a system of linear equations of dimension m+1 in m+1 unknowns is obtained. This system can be solved by any method for solving systems of linear algebraic equations (for example, the Gaussian method). As a result of the solution, the unknown parameters of the approximating function are found that provide the minimum sum of squared deviations of the approximating function from the original data, i.e., the best possible quadratic approximation. It should be remembered that if even one value of the source data changes, all coefficients will change their values, since they are completely determined by the source data.
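As a sketch, the same coefficients can also be obtained with a library polynomial fit, which internally solves an equivalent least-squares problem; the data and the degree m = 2 below are chosen arbitrarily for illustration:

    import numpy as np

    x = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0])   # illustrative nodal points
    y = np.array([1.1, 1.9, 4.2, 8.8, 17.0, 26.1])
    m = 2                                           # degree of the approximating polynomial

    coeffs = np.polyfit(x, y, m)          # coefficients, highest degree first
    print(coeffs)
    print(np.polyval(coeffs, x) - y)      # residuals at the nodal points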

Approximation of source data by linear dependence

(linear regression)

As an example, let's consider the technique for determining the approximating function specified as a linear dependence F(x) = a_0 + a_1·x. In accordance with the least squares method, the condition for the minimum of the sum of squared deviations is written in the following form:

S(a_0, a_1) = Σ_{i=1}^{N} ( a_0 + a_1·x_i − y_i )² → min,

where x_i, y_i are the coordinates of the table nodes, and a_0, a_1 are the unknown coefficients of the approximating function, which is specified as a linear dependence.

A necessary condition for the existence of a minimum of the function is that its partial derivatives with respect to the unknown variables a_0 and a_1 equal zero. As a result, we obtain the following system of equations:

∂S/∂a_0 = 2 Σ_i ( a_0 + a_1·x_i − y_i ) = 0,
∂S/∂a_1 = 2 Σ_i ( a_0 + a_1·x_i − y_i )·x_i = 0.

Let us transform the resulting linear system of equations:

a_0·N + a_1·Σ x_i = Σ y_i,
a_0·Σ x_i + a_1·Σ x_i² = Σ x_i·y_i.

We solve the resulting system of linear equations. The coefficients of the approximating function in analytical form are determined as follows (Cramer's rule):

a_1 = ( N·Σ x_i·y_i − Σ x_i·Σ y_i ) / ( N·Σ x_i² − (Σ x_i)² ),
a_0 = ( Σ y_i − a_1·Σ x_i ) / N.

These coefficients ensure the construction of a linear approximating function in accordance with the criterion of minimizing the sum of squared deviations of the approximating function from the given tabular values (experimental data).

Algorithm for implementing the least squares method

1. Initial data:

An array of experimental data with the number of measurements N is specified

The degree of the approximating polynomial (m) is specified

2. Calculation algorithm:

2.1. The coefficients for constructing the system of equations of dimension (m+1) × (m+1) are determined:

C_{kj} = Σ_{i=1}^{N} x_i^{k+j} - coefficients of the system of equations (left-hand side), where j is the index of the column number of the square matrix of the system;

d_k = Σ_{i=1}^{N} y_i·x_i^k - free terms of the system of linear equations (right-hand side), where k is the index of the row number of the square matrix of the system.

2.2. Formation of the system of linear equations of dimension (m+1) × (m+1).

2.3. Solving a system of linear equations to determine the unknown coefficients of an approximating polynomial of degree m.

2.4. Determination of the sum of squared deviations of the approximating polynomial from the original values at all nodal points:

S = Σ_{i=1}^{N} ( F(x_i) − y_i )².

The found value of the sum of squared deviations is the minimum possible.
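A compact sketch of the algorithm above; the data array and the degree are illustrative. It forms the sums of powers for the left-hand side and the free terms for the right-hand side, solves the (m+1) × (m+1) system, and evaluates the sum of squared deviations:

    import numpy as np

    x = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0])   # illustrative experimental data
    y = np.array([1.1, 1.9, 4.2, 8.8, 17.0, 26.1])
    m = 2                                           # degree of the approximating polynomial

    # 2.1. Coefficients of the normal equations: C[k, j] = sum(x_i^(k+j)), d[k] = sum(y_i * x_i^k).
    C = np.array([[np.sum(x ** (k + j)) for j in range(m + 1)] for k in range(m + 1)])
    d = np.array([np.sum(y * x ** k) for k in range(m + 1)])

    # 2.2-2.3. Solve the (m+1) x (m+1) linear system for the polynomial coefficients a_0..a_m.
    a = np.linalg.solve(C, d)

    # 2.4. Sum of squared deviations at the nodal points.
    F = sum(a[k] * x ** k for k in range(m + 1))
    S = np.sum((F - y) ** 2)
    print(a, S)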

Approximation using other functions

It should be noted that when approximating the original data with the least squares method, a logarithmic function, an exponential function or a power function is sometimes used as the approximating function.

Logarithmic approximation

Let's consider the case when the approximating function is given by a logarithmic function of the form

F(x) = a_0 + a_1·ln(x).

Such a dependence is linear in the transformed variable u = ln(x), so its coefficients can be found by the same least squares scheme applied to the pairs (ln(x_i), y_i).
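A minimal sketch of this reduction; the data are made up, and x must be positive for the logarithm to be defined:

    import numpy as np

    x = np.array([1.0, 2.0, 4.0, 8.0, 16.0])   # illustrative data, x > 0
    y = np.array([0.9, 1.6, 2.4, 3.1, 3.8])

    # Fit y ~ a0 + a1 * ln(x): ordinary linear least squares in the variable u = ln(x).
    u = np.log(x)
    a1, a0 = np.polyfit(u, y, 1)   # slope a1, intercept a0
    print(a0, a1)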

The least squares method has many applications, as it allows an approximate representation of a given function by other, simpler ones. LSM can be extremely useful in processing observations, and it is actively used to estimate some quantities from the results of measurements of others that contain random errors. In this article, you will learn how to implement least squares calculations in Excel.

Statement of the problem using a specific example

Suppose there are two indicators X and Y. Moreover, Y depends on X. Since OLS interests us from the point of view of regression analysis (in Excel its methods are implemented using built-in functions), we should immediately move on to considering a specific problem.

So, let X be the retail space of a grocery store, measured in square meters, and Y be the annual turnover, determined in millions of rubles.

It is required to make a forecast of what turnover (Y) the store will have if it has this or that retail space. Obviously, the function Y = f (X) is increasing, since the hypermarket sells more goods than the stall.

A few words about the correctness of the initial data used for prediction

Let's say we have a table built using data for n stores.

According to mathematical statistics, the results will be more or less correct if data on at least 5-6 objects are examined. In addition, "anomalous" results cannot be used. In particular, an elite small boutique can have a turnover several times greater than the turnover of large mass-market retail outlets.

The essence of the method

The table data can be depicted on the Cartesian plane as points M_1(x_1, y_1), …, M_n(x_n, y_n). Now the solution of the problem reduces to selecting an approximating function y = f(x) whose graph passes as close as possible to the points M_1, M_2, …, M_n.

Of course, you can use a high-degree polynomial, but this option is not only difficult to implement, but also simply incorrect, since it will not reflect the main trend that needs to be detected. The most reasonable solution is to search for the straight line y = ax + b, which best approximates the experimental data, or more precisely, the coefficients a and b.

Accuracy assessment

With any approximation, assessing its accuracy is of particular importance. Let us denote by e_i the difference (deviation) between the functional and experimental values for the point x_i, i.e., e_i = y_i − f(x_i).

Obviously, to assess the accuracy of the approximation, you could use the sum of the deviations: when choosing a straight line for an approximate representation of the dependence of Y on X, preference would be given to the one with the smallest value of the sum of e_i over all points under consideration. However, not everything is so simple, since along with positive deviations there will also be negative ones.

The issue can be resolved using the absolute values of the deviations or their squares. The latter approach is the most widely used. It is used in many areas, including regression analysis (in Excel it is implemented using two built-in functions), and has long proven to be effective.

Least squares method

Excel, as you know, has a built-in AutoSum function that allows you to calculate the sum of all values located in a selected range. Thus, nothing prevents us from calculating the value of the expression (e_1² + e_2² + e_3² + … + e_n²).

In mathematical notation this looks like:

S = Σ_{i=1}^{n} e_i² = Σ_{i=1}^{n} ( y_i − f(x_i) )².

Since the decision was initially made to approximate using a straight line, we have:

S(a, b) = Σ_{i=1}^{n} ( y_i − (a·x_i + b) )².

Thus, the task of finding the straight line that best describes the specific dependence of the quantities X and Y comes down to calculating the minimum of a function of two variables:

To do this, you need to equate to zero the partial derivatives with respect to the variables a and b and solve a simple system of two equations with two unknowns of the form:

∂S/∂a = −2 Σ_i x_i ( y_i − a·x_i − b ) = 0,
∂S/∂b = −2 Σ_i ( y_i − a·x_i − b ) = 0.

After some simple transformations, including division by 2 and rearrangement of the sums, we get:

a·Σ x_i² + b·Σ x_i = Σ x_i·y_i,
a·Σ x_i + b·n = Σ y_i.

Solving it, for example by Cramer's method, we obtain a stationary point with certain coefficients a* and b*. This is the minimum, i.e., to predict what turnover a store will have for a certain area, the straight line y = a*x + b* is suitable; it is a regression model for the example in question. Of course, it will not give the exact result, but it will help you get an idea of whether purchasing a specific area on store credit will pay off.
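For readers who want to double-check the result outside the spreadsheet, here is a sketch with made-up (area, turnover) pairs that computes the same a* and b* as the fit described above; all the numbers are invented for illustration:

    import numpy as np

    area = np.array([50.0, 80.0, 120.0, 200.0, 350.0, 500.0])   # X, sq. m (invented)
    turnover = np.array([12.0, 18.0, 25.0, 40.0, 70.0, 95.0])   # Y, mln rub. (invented)

    a_star, b_star = np.polyfit(area, turnover, 1)   # slope a*, intercept b*
    print(a_star, b_star)
    print(a_star * 150.0 + b_star)   # forecast of turnover for a 150 sq. m store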

How to Implement Least Squares in Excel

Excel has a function for calculating values ​​using least squares. It has the following form: “TREND” (known Y values; known X values; new X values; constant). Let's apply the formula for calculating OLS in Excel to our table.

To do this, enter the “=” sign in the cell in which the result of the calculation using the least squares method in Excel should be displayed and select the “TREND” function. In the window that opens, fill in the appropriate fields, highlighting:

  • range of known values ​​for Y (in this case, data for trade turnover);
  • range x 1 , …x n , i.e. the size of retail space;
  • both known and unknown values ​​of x, for which you need to find out the size of the turnover (for information about their location on the worksheet, see below).

In addition, the formula contains the logical argument "Const". If you enter 0 (FALSE) in the corresponding field, the calculations are carried out assuming that b = 0; if you enter 1 (TRUE) or omit the argument, the constant b is calculated normally.

If you need to find out the forecast for more than one x value, then after entering the formula you should not press “Enter”, but you need to type the combination “Shift” + “Control” + “Enter” on the keyboard.

Some features

Regression analysis can be accessible even to dummies. The Excel formula for predicting the value of an array of unknown variables—TREND—can be used even by those who have never heard of least squares. It is enough just to know some of the features of its work. In particular:

  • If you arrange the range of known values ​​of the variable y in one row or column, then each row (column) with known values ​​of x will be perceived by the program as a separate variable.
  • If a range with known x is not specified in the TREND window, then when using a function in Excel, the program will treat it as an array consisting of integers, the number of which corresponds to the range with the given values ​​of the y variable.
  • To output an array of “predicted” values, the expression for calculating the trend must be entered as an array formula.
  • If new x values ​​are not specified, then the TREND function considers them equal to the known ones. If the known x values are also not specified, then the array 1; 2; 3; 4; … is taken as the argument, commensurate with the range of the given y values.
  • The range containing the new x values must have the same number of rows or columns per independent variable as the range with the known x values; in other words, it must be commensurate with the independent variables.
  • An array with known x values ​​can contain multiple variables. However, if we are talking about only one, then it is required that the ranges with the given values ​​of x and y be proportional. In the case of several variables, it is necessary that the range with the given y values ​​fit in one column or one row.

FORECAST function

Forecasting in Excel is implemented using several functions. One of them is called FORECAST. It is similar to TREND, i.e., it gives the result of a least squares calculation, but only for a single X value for which the value of Y is unknown.

Now you know formulas in Excel for dummies that allow you to predict the future value of a particular indicator according to a linear trend.

Having chosen the type of the regression function, i.e. the form of the considered model of the dependence of Y on X (or X on Y), for example a linear model y_x = a + bx, it is necessary to determine the specific values of the model coefficients.

For different values of a and b, it is possible to construct an infinite number of dependences of the form y_x = a + bx; that is, there is an infinite number of straight lines on the coordinate plane, but we need the dependence that best corresponds to the observed values. Thus, the task comes down to selecting the best coefficients.

We look for the linear function a+bx based only on a certain number of available observations. To find the function with the best fit to the observed values, we use the least squares method.

Let us denote: Y_i is the value calculated from the equation Y_i = a + bx_i; y_i is the measured value; ε_i = y_i − Y_i is the difference between the measured value and the value calculated from the equation, i.e. ε_i = y_i − a − bx_i.

The least squares method requires that the ε_i, the differences between the measured y_i and the values Y_i calculated from the equation, be minimal. Consequently, we find the coefficients a and b so that the sum of the squared deviations of the observed values from the values on the regression line is the smallest:

S(a, b) = Σ_i ( y_i − a − b·x_i )² → min.

By examining this function of the arguments a and b for an extremum using derivatives, we can show that the function takes its minimum value if the coefficients a and b are solutions of the system:

a·n + b·Σ x_i = Σ y_i,
a·Σ x_i + b·Σ x_i² = Σ x_i·y_i.   (2)

If we divide both sides of the normal equations by n, we get:

a + b·x̄ = ȳ,
a·x̄ + b·(Σ x_i²/n) = Σ x_i·y_i / n,

where x̄ = Σ x_i / n and ȳ = Σ y_i / n.   (3)

From the first equation, a = ȳ − b·x̄; substituting this value of a into the second equation, we obtain:

b = ( Σ x_i·y_i / n − x̄·ȳ ) / ( Σ x_i² / n − (x̄)² ).   (4)

In this case, b is called the regression coefficient; a is called the free term of the regression equation and is calculated using the formula:

a = ȳ − b·x̄.   (5)

The resulting straight line is an estimate of the theoretical regression line. We have:

y_x = a + b·x.

So, y_x = a + b·x is the linear regression equation.

Regression can be direct (b > 0) or inverse (b < 0).

Example 1. The results of measuring the values of X and Y are given in the table:

x_i: −2, 0, 1, 2, 4
y_i: 0.5, 1, 1.5, 2, 3

Assuming that there is a linear relationship y = a + bx between X and Y, determine the coefficients a and b using the least squares method.

Solution. Here n = 5,
Σx_i = −2 + 0 + 1 + 2 + 4 = 5;
Σx_i² = 4 + 0 + 1 + 4 + 16 = 25;
Σx_i·y_i = (−2)·0.5 + 0·1 + 1·1.5 + 2·2 + 4·3 = 16.5;
Σy_i = 0.5 + 1 + 1.5 + 2 + 3 = 8;

and the normal system (2) has the form:

5a + 5b = 8,
5a + 25b = 16.5.

Solving this system, we get b = 0.425, a = 1.175. Therefore, y = 1.175 + 0.425x.
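The result of Example 1 can be checked with a few lines; this is just a numerical cross-check of the same formulas:

    import numpy as np

    x = np.array([-2.0, 0.0, 1.0, 2.0, 4.0])
    y = np.array([0.5, 1.0, 1.5, 2.0, 3.0])
    n = len(x)

    b = (n * np.sum(x * y) - np.sum(x) * np.sum(y)) / (n * np.sum(x ** 2) - np.sum(x) ** 2)
    a = (np.sum(y) - b * np.sum(x)) / n
    print(a, b)   # expected: a = 1.175, b = 0.425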

Example 2. There is a sample of 10 observations of economic indicators (X) and (Y).

x_i: 180, 172, 173, 169, 175, 170, 179, 170, 167, 174
y_i: 186, 180, 176, 171, 182, 166, 182, 172, 169, 177

You need to find a sample regression equation of Y on X. Construct a sample regression line of Y on X.

Solution. 1. Let's sort the data by the values of x_i (each y_i stays with its x_i). We get a new table:

x_i: 167, 169, 170, 170, 172, 173, 174, 175, 179, 180
y_i: 169, 171, 166, 172, 180, 176, 177, 182, 182, 186

To simplify the calculations, we will draw up a calculation table in which we will enter the necessary numerical values.

x_i    y_i    x_i²    x_i·y_i
167 169 27889 28223
169 171 28561 28899
170 166 28900 28220
170 172 28900 29240
172 180 29584 30960
173 176 29929 30448
174 177 30276 30798
175 182 30625 31850
179 182 32041 32578
180 186 32400 33480
Σx_i = 1729    Σy_i = 1761    Σx_i² = 299105    Σx_i·y_i = 304696
x̄ = 172.9    ȳ = 176.1    Σx_i²/n = 29910.5    Σx_i·y_i/n = 30469.6

According to formula (4), we calculate the regression coefficient:

b = (30469.6 − 172.9·176.1) / (29910.5 − 172.9²) = 21.91 / 16.09 ≈ 1.362,

and according to formula (5),

a = ȳ − b·x̄ = 176.1 − 1.362·172.9 ≈ −59.34.

Thus, the sample regression equation is y = −59.34 + 1.362x.
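The same coefficients can be reproduced from the raw data of Example 2; this is only a numerical cross-check with NumPy:

    import numpy as np

    x = np.array([167, 169, 170, 170, 172, 173, 174, 175, 179, 180], dtype=float)
    y = np.array([169, 171, 166, 172, 180, 176, 177, 182, 182, 186], dtype=float)
    n = len(x)

    b = (np.sum(x * y) / n - x.mean() * y.mean()) / (np.sum(x ** 2) / n - x.mean() ** 2)
    a = y.mean() - b * x.mean()
    print(a, b)   # approximately -59.34 and 1.362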
Let's plot the points (x_i, y_i) on the coordinate plane and draw the regression line.


Fig. 4. The observed values and the regression line.

Figure 4 shows how the observed values are located relative to the regression line. To numerically assess the deviations of y_i from Y_i, where y_i are the observed values and Y_i are the values determined by the regression, let us create a table:

x_i    y_i    Y_i    Y_i − y_i
167 169 168.055 -0.945
169 171 170.778 -0.222
170 166 172.140 6.140
170 172 172.140 0.140
172 180 174.863 -5.137
173 176 176.225 0.225
174 177 177.587 0.587
175 182 178.949 -3.051
179 182 184.395 2.395
180 186 185.757 -0.243

The values Y_i are calculated from the regression equation.

The noticeable deviation of some observed values ​​from the regression line is explained by the small number of observations. When studying the degree of linear dependence of Y on X, the number of observations is taken into account. The strength of the dependence is determined by the value of the correlation coefficient.

Example.

Experimental data on the values of the variables x and y are given in the table.

As a result of aligning (smoothing) them, a certain approximating function was obtained.

Using the least squares method, approximate these data by a linear dependence y = ax + b (find the parameters a and b). Find out which of the two lines better (in the sense of the least squares method) aligns the experimental data. Make a drawing.

The essence of the least squares method (LSM).

The task is to find the coefficients of the linear dependence at which the function of the two variables a and b,

S(a, b) = Σ_i ( y_i − (a·x_i + b) )²,

takes the smallest value. That is, for the found a and b, the sum of the squared deviations of the experimental data from the found straight line will be the smallest. This is the whole point of the least squares method.

Thus, solving the example comes down to finding the extremum of a function of two variables.

Deriving formulas for finding coefficients.

A system of two equations in two unknowns is compiled and solved. We find the partial derivatives of the function S(a, b) with respect to the variables a and b and equate these derivatives to zero:

∂S/∂a = −2 Σ_i x_i ( y_i − a·x_i − b ) = 0,
∂S/∂b = −2 Σ_i ( y_i − a·x_i − b ) = 0.

We solve the resulting system of equations by any method (for example, by the substitution method) and obtain formulas for finding the coefficients by the least squares method (LSM):

a = ( n·Σ x_i·y_i − Σ x_i·Σ y_i ) / ( n·Σ x_i² − (Σ x_i)² ),
b = ( Σ y_i − a·Σ x_i ) / n.

For the found a and b, the function S(a, b) takes its smallest value. The proof of this fact is given below.

That's the whole least squares method. The formula for finding the parameter a contains the sums Σx_i, Σy_i, Σx_i·y_i, Σx_i² and the parameter n, the number of experimental points. We recommend calculating the values of these sums separately. The coefficient b is found after calculating a.

It's time to remember the original example.

Solution.

In our example n = 5. We fill out the table for the convenience of calculating the sums that appear in the formulas for the required coefficients.

The values ​​in the fourth row of the table are obtained by multiplying the values ​​of the 2nd row by the values ​​of the 3rd row for each number i.

The values ​​in the fifth row of the table are obtained by squaring the values ​​in the 2nd row for each number i.

The values ​​in the last column of the table are the sums of the values ​​across the rows.

We use the formulas of the least squares method to find the coefficients a and b, substituting into them the corresponding values from the last column of the table:

Hence, y = 0.165x + 2.184 is the desired approximating straight line.

It remains to find out which of the lines, y = 0.165x + 2.184 or the function obtained earlier, better approximates the original data, that is, to make the estimate using the least squares method.

Error estimation of the least squares method.

To do this, you need to calculate the sums of squared deviations of the original data from each of these lines; the smaller value corresponds to the line that better approximates the original data in the sense of the least squares method.

Since the sum of squared deviations for y = 0.165x + 2.184 turned out to be smaller, this straight line better approximates the original data.

Graphic illustration of the least squares (LS) method.

Everything is clearly visible in the graphs. The red line is the found straight line y = 0.165x + 2.184, the blue line is the function obtained earlier, and the pink dots are the original data.

Why is this needed, why all these approximations?

I personally use it to solve problems of data smoothing and problems of interpolation and extrapolation (in the original example, one might be asked to find the value of the observed quantity y at x = 3 or at x = 6 using the least squares method). But we'll talk more about this later in another section of the site.

Proof.

For the function S(a, b) to take its smallest value at the found a and b, it is necessary that at this point the matrix of the quadratic form of the second-order differential of the function be positive definite. Let us show this.