kolutastmr

Answered

2022-07-01

Clustering data using mixed integer linear programming

I am trying to understand if it is possible to use mixed integer linear programming (MILP) in order to perform a basic clustering operation to a dataset D. I know there exists standard algorithms for it but I am particularly interested in MILP.

Assume one has the dataset $D={D}_{1}\cup {D}_{2}$ with:

${D}_{1}=[1,2,2,3,4,3,1,0,2,2,1]\phantom{\rule{0ex}{0ex}}{D}_{2}=[7,6,5,8,7,7,4,7,9,8,6]$

Is there some MILP algorithm that can cluster the concatenated and shuffled data SD, with S some permutation acting on D, which can cluster each element of D to either ${D}_{1}$ or ${D}_{2}$?

I am trying to understand if it is possible to use mixed integer linear programming (MILP) in order to perform a basic clustering operation to a dataset D. I know there exists standard algorithms for it but I am particularly interested in MILP.

Assume one has the dataset $D={D}_{1}\cup {D}_{2}$ with:

${D}_{1}=[1,2,2,3,4,3,1,0,2,2,1]\phantom{\rule{0ex}{0ex}}{D}_{2}=[7,6,5,8,7,7,4,7,9,8,6]$

Is there some MILP algorithm that can cluster the concatenated and shuffled data SD, with S some permutation acting on D, which can cluster each element of D to either ${D}_{1}$ or ${D}_{2}$?

Answer & Explanation

Dobermann82

Expert

2022-07-02Added 15 answers

Step 1

Let continuous variables ${x}_{1}$ and ${x}_{2}$ represent the centers of the two clusters. Let binary variable ${y}_{i}$ indicate whether element index i (with value ${d}_{i}$) is in ${D}_{1}$. Now minimize $\sum _{i}|{d}_{i}-{c}_{1}|{y}_{i}+\sum _{i}|{d}_{i}-{c}_{2}|(1-{y}_{i}),$, which you can linearize as follows. Introduce nonnegative continuous variables ${z}_{i,1}$ and ${z}_{i,2}$ to represent the summands, and enforce ${y}_{i}=1\phantom{\rule{thickmathspace}{0ex}}\u27f9\phantom{\rule{thickmathspace}{0ex}}{z}_{i,1}\ge |{d}_{i}-{c}_{1}|$ and ${y}_{i}=0\phantom{\rule{thickmathspace}{0ex}}\u27f9\phantom{\rule{thickmathspace}{0ex}}{z}_{i,1}\ge |{d}_{i}-{c}_{2}|$.

Step 2

The problem is to minimize $\sum _{i}{z}_{i,1}+\sum _{i}{z}_{i,2}$

subject to linear big-M constraints

$\begin{array}{rl}({d}_{i}-{c}_{1})-{z}_{i,1}& \le {M}_{1}(1-{y}_{i})\\ -({d}_{i}-{c}_{1})-{z}_{i,1}& \le {M}_{1}(1-{y}_{i})\\ ({d}_{i}-{c}_{2})-{z}_{i,2}& \le {M}_{2}{y}_{i}\\ -({d}_{i}-{c}_{2})-{z}_{i,2}& \le {M}_{2}{y}_{i}\end{array}$

Let continuous variables ${x}_{1}$ and ${x}_{2}$ represent the centers of the two clusters. Let binary variable ${y}_{i}$ indicate whether element index i (with value ${d}_{i}$) is in ${D}_{1}$. Now minimize $\sum _{i}|{d}_{i}-{c}_{1}|{y}_{i}+\sum _{i}|{d}_{i}-{c}_{2}|(1-{y}_{i}),$, which you can linearize as follows. Introduce nonnegative continuous variables ${z}_{i,1}$ and ${z}_{i,2}$ to represent the summands, and enforce ${y}_{i}=1\phantom{\rule{thickmathspace}{0ex}}\u27f9\phantom{\rule{thickmathspace}{0ex}}{z}_{i,1}\ge |{d}_{i}-{c}_{1}|$ and ${y}_{i}=0\phantom{\rule{thickmathspace}{0ex}}\u27f9\phantom{\rule{thickmathspace}{0ex}}{z}_{i,1}\ge |{d}_{i}-{c}_{2}|$.

Step 2

The problem is to minimize $\sum _{i}{z}_{i,1}+\sum _{i}{z}_{i,2}$

subject to linear big-M constraints

$\begin{array}{rl}({d}_{i}-{c}_{1})-{z}_{i,1}& \le {M}_{1}(1-{y}_{i})\\ -({d}_{i}-{c}_{1})-{z}_{i,1}& \le {M}_{1}(1-{y}_{i})\\ ({d}_{i}-{c}_{2})-{z}_{i,2}& \le {M}_{2}{y}_{i}\\ -({d}_{i}-{c}_{2})-{z}_{i,2}& \le {M}_{2}{y}_{i}\end{array}$

Most Popular Questions