kolutastmr

2022-07-01

Clustering data using mixed integer linear programming
I am trying to understand if it is possible to use mixed integer linear programming (MILP) in order to perform a basic clustering operation to a dataset D. I know there exists standard algorithms for it but I am particularly interested in MILP.
Assume one has the dataset $D={D}_{1}\cup {D}_{2}$ with:
${D}_{1}=\left[1,2,2,3,4,3,1,0,2,2,1\right]\phantom{\rule{0ex}{0ex}}{D}_{2}=\left[7,6,5,8,7,7,4,7,9,8,6\right]$
Is there some MILP algorithm that can cluster the concatenated and shuffled data SD, with S some permutation acting on D, which can cluster each element of D to either ${D}_{1}$ or ${D}_{2}$?

Dobermann82

Expert

Step 1
Let continuous variables ${x}_{1}$ and ${x}_{2}$ represent the centers of the two clusters. Let binary variable ${y}_{i}$ indicate whether element index i (with value ${d}_{i}$) is in ${D}_{1}$. Now minimize $\sum _{i}|{d}_{i}-{c}_{1}|{y}_{i}+\sum _{i}|{d}_{i}-{c}_{2}|\left(1-{y}_{i}\right),$, which you can linearize as follows. Introduce nonnegative continuous variables ${z}_{i,1}$ and ${z}_{i,2}$ to represent the summands, and enforce ${y}_{i}=1\phantom{\rule{thickmathspace}{0ex}}⟹\phantom{\rule{thickmathspace}{0ex}}{z}_{i,1}\ge |{d}_{i}-{c}_{1}|$ and ${y}_{i}=0\phantom{\rule{thickmathspace}{0ex}}⟹\phantom{\rule{thickmathspace}{0ex}}{z}_{i,1}\ge |{d}_{i}-{c}_{2}|$.
Step 2
The problem is to minimize $\sum _{i}{z}_{i,1}+\sum _{i}{z}_{i,2}$
subject to linear big-M constraints
$\begin{array}{rl}\left({d}_{i}-{c}_{1}\right)-{z}_{i,1}& \le {M}_{1}\left(1-{y}_{i}\right)\\ -\left({d}_{i}-{c}_{1}\right)-{z}_{i,1}& \le {M}_{1}\left(1-{y}_{i}\right)\\ \left({d}_{i}-{c}_{2}\right)-{z}_{i,2}& \le {M}_{2}{y}_{i}\\ -\left({d}_{i}-{c}_{2}\right)-{z}_{i,2}& \le {M}_{2}{y}_{i}\end{array}$

Do you have a similar question?