Why is it valid to use squared Euclidean distances in high dimensions in multiple regression?

Euclidean distance is not linear in high dimensions. However, in multiple regression the idea is to minimize square distances from data points to a hyperplane.

Other data analysis techniques have been considered problematic for their reliance on Euclidean distances (nearest neighbors), and dimensionality reduction techniques have been proposed.

Why is this not a problem in multiple regression?


Anaya Gregory · Accepted Answer

I'm not sure how regression applies to this question, but Euclidean distance is valid in any number of dimensions, as shown by the Pythagorean theorem:

$$A^2 + B^2 + C^2 + \cdots + X^2 = Y^2$$

where $X, Y$ are arbitrary variables.

For example, we know that $3^2 + 4^2 = 5^2$ can describe a diagonal on the front of a box. If the depth of the box is $12$, then the distance between opposite corners is given by the equation $3^2 + 4^2 + 12^2 = 13^2$, because the $5^2$ in the $(5, 12, 13)$ calculation can be replaced with $3^2 + 4^2$.

Likewise, for a 4D box we have $3^2 + 4^2 + 12^2 + 84^2 = 85^2$, because the triples $(3, 4, 5)$, $(5, 12, 13)$, $(13, 84, 85)$ can all be similarly joined into a quintuple.

The process can be reversed for dimensional reduction. For example, merging the second and third terms into $\sqrt{4^2 + 12^2} = \sqrt{160} \approx 12.64911064$ gives

$$3^2 + 4^2 + 12^2 + 84^2 = 3^2 + \left(\sqrt{160}\right)^2 + 84^2 = 85^2$$

At least one form of regression works by minimizing distances, and the corner-to-corner distances here are the shortest (straight-line) distances. The example uses integer solutions, but it works the same way with non-integers such as those found by

$$A = x_1 - x_0, \quad B = y_1 - y_0, \quad C = z_1 - z_0, \quad \cdots$$

There are methods for finding a missing value if one is unknown, and I have been compiling many of these in a paper. If these would be useful, I can probably translate them to your application if I can understand what your application is.
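For anyone who wants to check the arithmetic, here is a minimal Python sketch of the chained Pythagorean examples above. The function names `euclidean_norm` and `euclidean_distance` are just illustrative helpers made up for this post, not part of any library.

```python
import math


def euclidean_norm(components):
    """Length of a vector: square root of the sum of squared components."""
    return math.sqrt(sum(c * c for c in components))


def euclidean_distance(p, q):
    """Straight-line distance between points p and q of equal dimension."""
    return euclidean_norm([b - a for a, b in zip(p, q)])


# Box diagonals in 2, 3, and 4 dimensions, built from the joined triples.
diag_2d = euclidean_norm([3, 4])
diag_3d = euclidean_norm([3, 4, 12])
diag_4d = euclidean_norm([3, 4, 12, 84])

# "Dimensional reduction": merge the 4 and 12 edges into a single edge
# of length sqrt(160), then recompute the diagonal in 3 dimensions.
merged_edge = euclidean_norm([4, 12])
reduced_diag = euclidean_norm([3, merged_edge, 84])

# The same formula applied to coordinate differences A = x1 - x0, etc.
corner_to_corner = euclidean_distance((0, 0, 0, 0), (3, 4, 12, 84))
```

The merged version agrees with the 4D diagonal up to floating-point rounding, which is all the "reversal" amounts to here.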

Ethen Frey · Accepted Answer

The nice thing about squared Euclidean distance is that it is about the simplest distance function that is differentiable everywhere, so you can set the derivative (or partial derivatives) to zero to get an extreme value.

It also has other nice properties, such as the fact that the value minimizing the sum of squared distances to a set of values is their mean (i.e., minimizing $\sum_{i=1}^{n} (x_i - a)^2$ with respect to $a$ gives $a = (1/n) \sum_{i=1}^{n} x_i$).

Whether this is what you want is another matter.
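A small numerical check of the mean property may help. This is only a sketch: the data in `xs` are made-up values, and `sum_squared_deviations` and `derivative_wrt_a` are hypothetical helper names chosen for illustration.

```python
from statistics import mean


def sum_squared_deviations(a, xs):
    """Objective: sum over i of (x_i - a)^2."""
    return sum((x - a) ** 2 for x in xs)


def derivative_wrt_a(a, xs):
    """d/da of the objective, which is -2 * sum over i of (x_i - a)."""
    return -2 * sum(x - a for x in xs)


xs = [2.0, 3.0, 7.0, 8.0]  # made-up sample values (assumption)
a_star = mean(xs)

# Setting the derivative to zero picks out the mean.
slope_at_mean = derivative_wrt_a(a_star, xs)

# Moving away from the mean in either direction increases the objective.
at_mean = sum_squared_deviations(a_star, xs)
just_below = sum_squared_deviations(a_star - 0.1, xs)
just_above = sum_squared_deviations(a_star + 0.1, xs)
```

Multiple regression uses the same idea, with one partial derivative per coefficient set to zero instead of a single derivative in $a$.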
