RedKMeans k-means clustering



groups points in space into k clusters.  similar to the KMeans quark by Dan Stowell, but implemented slightly differently and built on RedVector.


*new(k, max)

k is the number of centroids.  the default is 5.

max is the maximum number of tries in undecided/borderline situations.  the default is 15.
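
for orientation, here is a minimal sketch of textbook k-means on plain arrays.  it is not the RedKMeans source: the name ~kmeansSketch, the random initial centroids, the squared euclidean distance and the use of max as a simple cap on iterations are all assumptions made for illustration.

```supercollider
(
// illustrative sketch only, not the RedKMeans implementation
~kmeansSketch= {|points, k= 5, max= 15|
	var centroids= points.scramble.keep(k);  // assumption: start from k randomly picked points
	var classifications= Array.fill(points.size, 0);
	var changed= true, tries= 0;
	while({changed and: {tries<max}}, {
		changed= false;
		// assignment step: each point gets the index of its nearest centroid
		points.do{|p, i|
			var nearest= centroids.collect{|c| (p-c).squared.sum}.minIndex;
			if(nearest!=classifications[i], {
				classifications[i]= nearest;
				changed= true;
			});
		};
		// update step: move each centroid to the mean of its members
		centroids= centroids.collect{|c, j|
			var members= points.select{|p, i| classifications[i]==j};
			if(members.isEmpty, {c}, {members.sum/members.size});  // keep empty centroids in place
		};
		tries= tries+1;
	});
	[centroids, classifications]
};
~sketchData= {[640.0.rand, 480.0.rand]}.dup(200);  // 200 random 2d points
~sketchResult= ~kmeansSketch.value(~sketchData, 4, 15);
~sketchResult[0].postln;  // the 4 centroids as [x, y] pairs
~sketchResult[1].postln;  // one centroid index per point
)
```

the loop stops as soon as no point changes cluster, or after max passes, whichever comes first.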

reset

this will reset any previously calculated centroids on next update.

classify(vec)

test only one vector on the current centroids.  rarely needed.
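
conceptually, classifying one vector means finding the nearest centroid.  a small sketch on plain arrays, assuming squared euclidean distance (the helper ~nearestIndex is hypothetical and not part of the class):

```supercollider
(
// hypothetical helper: index of the centroid closest to vec
~nearestIndex= {|vec, centroids|
	centroids.collect{|c| (vec-c).squared.sum}.minIndex;
};
~nearestIndex.value([320, 240], [[100, 100], [300, 250], [600, 400]]).postln;  // posts 1
)
```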

update(vectors)

vectors should be an array of RedVector objects.  the vectors can be of any dimension.

<>k

<>max

<>centroids

an array containing the result after the update method is called.

it holds k number of RedVector objects representing the found centroids.

<classifications

an array containing the result after the update method is called.

it holds indices and has the same size as the vectors array.

each index represents which centroid each vector belongs to.

so to look up the closest centroid for vector i, do centroids[classifications[i]].
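
the same lookup can be used to gather the members of each cluster.  the sketch below uses small stand-in arrays in the shapes described above so that it runs on its own.  with a real instance you would read centroids and classifications from the RedKMeans object instead.

```supercollider
(
// stand-in data, made up for illustration
~centroids= [[10, 10], [50, 50], [90, 20]];
~vectors= [[12, 9], [48, 52], [88, 25], [11, 14], [55, 47]];
~classifications= [0, 1, 2, 0, 1];  // one centroid index per vector

// closest centroid for vector 3
~centroids[~classifications[3]].postln;  // posts [ 10, 10 ]

// collect the vectors belonging to each centroid
~clusters= ~centroids.collect{|cent, j|
	~vectors.select{|vec, i| ~classifications[i]==j};
};
~clusters.collect{|c| c.size}.postln;  // posts [ 2, 2, 1 ]
)
```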



a= RedKMeans(5);

b= {RedVector2D[640, 480].rand}.dup(1000); //create some test 2d points

a.update(b); //calculate.  results will be in variables centroids and classifications

a.centroids; //returns a list with 5 centroids

a.classifications; //returns a list with 1000 centroid indices.  one for each vector


(
w= Window("k-means plot", Rect(100, 100, 640, 480), false);
w.view.background= Color.black;
w.drawHook= {
	var colors= {|i| Color.hsv(i/a.k, 1, 1)}.dup(a.k);  //one color per centroid
	b.do{|vec, i|
		Pen.fillColor= colors[a.classifications[i]];
		Pen.fillRect(Rect.aboutRedVector2D(vec, 1));
	};
	Pen.fillColor= Color.white;
	a.centroids.do{|cent, i|
		Pen.strokeColor= colors[i];
		Pen.strokeOval(Rect.aboutRedVector2D(cent, 8));
		Pen.stringAtPoint(i.asString, Point(cent[0], cent[1]));
	};
};
w.front;
)


a.k= 8; //add some more centroids

a.update(b); //call update on the same data to recalculate and find the new centroids

a.centroids.size; //will be 8

w.refresh; //now plot again.  3 new centroids added and the previous ones made room


b= {RedVector2D[640, 480].rand}.dup(1000); //replace the data completely

a.update(b); //call update again

w.refresh; //data changed and old centroids adapted by moving a bit


b= b.copyRange(0, 499); //strip away half the data

a.update(b); //call update again

w.refresh; //data loss and old centroids adapted by moving a tiny bit


a.reset; //forget centroids

a.update(b);

w.refresh; //same data but new centroids found



a.classify(RedVector2D[320, 240]); //test middle point and see which class it would belong to


w.close