Why is it valid to use squared Euclidean distances in high dimensions in multiple regression?

scherezade29pc

Answered question

2022-07-14

Euclidean distance is not linear in high dimensions. However, in multiple regression the idea is to minimize squared distances from data points to a hyperplane.
Other data analysis techniques, such as nearest neighbors, have been considered problematic for their reliance on Euclidean distances, and dimensionality reduction techniques have been proposed as a remedy.
Why is this not a problem in multiple regression?

Answer & Explanation

Anaya Gregory

Beginner, 2022-07-15, Added 14 answers

I'm not sure how regression applies to this question, but Euclidean distance is valid in any number of dimensions, as shown by the Pythagorean theorem, where:
$A^2 + B^2 + C^2 + \dots + X^2 = Y^2$, where $X, Y$ are arbitrary variables
For example, we know that $3^2 + 4^2 = 5^2$ can be a diagonal on the front of a box. If the depth of the box is 12, then we can also know that the distance between opposite corners is shown by the equation $3^2 + 4^2 + 12^2 = 13^2$, because the $5^2$ in the $(5, 12, 13)$ calculations can be replaced with $3^2 + 4^2$.
Likewise, if we have a 4D box, $3^2 + 4^2 + 12^2 + 84^2 = 85^2$, because the triples $(3, 4, 5)$, $(5, 12, 13)$, $(13, 84, 85)$ can all be similarly joined into a quintuple. The process can be reversed for dimensional reduction. For example, $3^2 + 4^2 + 12^2 + 84^2 = 3^2 + 12.64911064^2 + 84^2 = 85^2$, where $12.64911064 \approx \sqrt{4^2 + 12^2} = \sqrt{160}$ merges two of the sides into one.
At least one form of regression works on minimizing distances, and the corner-to-corner distances here are the shortest [straight-line] distance. The example has shown integer solutions but works the same with non-integers such as those found by $A = x_1 - x_0$, $B = y_1 - y_0$, $C = z_1 - z_0$.
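As a quick check of the arithmetic above, here is a minimal Python sketch. The helper name `euclidean_distance` and the sample points are illustrative assumptions, not part of the original answer; the function just applies the Pythagorean sum to coordinate differences in any number of dimensions.

```python
import math


def euclidean_distance(p, q):
    """Straight-line distance between two points with the same number of coordinates."""
    if len(p) != len(q):
        raise ValueError("points must have the same number of coordinates")
    # Sum the squared coordinate differences (x1 - x0, y1 - y0, ...), then take the root.
    return math.sqrt(sum((b - a) ** 2 for a, b in zip(p, q)))


# 3D box with sides 3, 4, 12: the corner-to-corner distance is 13.
d3 = euclidean_distance((0, 0, 0), (3, 4, 12))

# 4D box with sides 3, 4, 12, 84: the corner-to-corner distance is 85.
d4 = euclidean_distance((0, 0, 0, 0), (3, 4, 12, 84))

# "Dimensional reduction" as described above: merge the sides 4 and 12
# into the single length sqrt(4^2 + 12^2), roughly 12.649.
merged = math.hypot(4, 12)
d_reduced = euclidean_distance((0, 0, 0), (3, merged, 84))

# Non-integer coordinates work the same way.
d_float = euclidean_distance((1.5, 2.0), (4.5, 6.0))

print(d3, d4, d_reduced, d_float)
```

The merged-side distance agrees with the 4D distance up to floating-point rounding, which is why a tolerance-based comparison such as `math.isclose` is the safer check there.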
There are methods of finding missing values if one is unknown and I have been compiling many of these in a paper. If these would be useful, I can probably translate them to your application if I can understand what your application is.
Ethen Frey

Beginner, 2022-07-16, Added 6 answers

The nice thing about squared Euclidean distance is that it is about the simplest distance-like function that is differentiable everywhere, so you can set the derivative (or the partial derivatives) to zero to find an extreme value.
It also has other nice properties. For example, the value that minimizes the total squared distance to a set of values is their mean (i.e., minimizing $\sum_{i=1}^{n} (x_i - a)^2$ with respect to $a$ gives $a = (1/n) \sum_{i=1}^{n} x_i$).
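For readers who want the intermediate step, the claim about the mean can be worked out as follows. This is a short derivation sketch that uses the same symbols $x_i$, $a$, and $n$ as the sentence above.

```latex
\frac{d}{da} \sum_{i=1}^{n} (x_i - a)^2
  = -2 \sum_{i=1}^{n} (x_i - a) = 0
\;\Longrightarrow\;
n a = \sum_{i=1}^{n} x_i
\;\Longrightarrow\;
a = \frac{1}{n} \sum_{i=1}^{n} x_i ,
\qquad
\frac{d^2}{da^2} \sum_{i=1}^{n} (x_i - a)^2 = 2n > 0 .
```

The positive second derivative confirms that the stationary point is a minimum rather than a maximum.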
Whether this is what you want is another matter.
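To connect the derivative argument to regression, here is a minimal sketch of a least-squares line fit with a single predictor, written in plain Python. The function names `fit_line` and `sum_squared_residuals` and the sample data are assumptions made for illustration. The closed-form slope and intercept come from setting the two partial derivatives of the sum of squared residuals to zero; multiple regression follows the same idea with more coefficients.

```python
def fit_line(xs, ys):
    """Return (intercept, slope) minimizing the sum of squared residuals."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two (x, y) pairs of equal length")
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical; slope is undefined")
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    # Setting both partial derivatives to zero gives these closed forms.
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


def sum_squared_residuals(xs, ys, intercept, slope):
    """Sum of squared differences between observed and fitted y values."""
    return sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))


# Made-up sample data, roughly following y = 2x.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

b0, b1 = fit_line(xs, ys)
best = sum_squared_residuals(xs, ys, b0, b1)

# Nudging either coefficient away from the fitted value increases the sum.
worse_intercept = sum_squared_residuals(xs, ys, b0 + 0.1, b1)
worse_slope = sum_squared_residuals(xs, ys, b0, b1 - 0.1)

print(b0, b1, best, worse_intercept, worse_slope)
```

Note that the residuals here are measured along the y axis only, so each squared term involves a single coordinate difference regardless of how many predictors the model has.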
