Inspired by the partitioning problem I set about to implement a well known algorithm, k-means clustering, from memory, for fun! ... and, for science... Interestingly, this is somehow the opposite of the partitioning problem. In the partitioning problem we tried to maximize variation of categorical variables within groups, whereas here we're trying to find groups of elements that are the most "similar" to each other in a n-dimensional continuous space. The main idea of k-means is the following - if you have a bunch of elements and a given number of clusters: Create the initial clusters randomly within the space spanned by the elements (typically (always?) you would pick randomly from your elements). Lob all elements into the cluster with the nearest center (using some Euclidean distance metric typically). Recenter each cluster on the average of its elements, If necessary move the elements to their now nearest clusters. Repeat the "re-centering" and mov
A series of notes to myself.