Skip to main content

Partitioning - or, students into n groups - in Ruby

A while back a friend of mine asked if I could automate the creation of groups in a class of students - to maximize the variation within each group, but minimize the difference between them. This sounded like an interesting problem, so I set about to solve it in my own naive experimental Monte Carlo (inspired) way - without looking into ways this has been solved (surely elegantly) before. The result was this little Ruby script:
First I just require some code I have previously written. (I guess I really should make them into gems or something instead of copying code around...) names.rb tries to guess sex from a person's name and/or title, and countries.rb maps countries to continents, regions etc with some fuzzy matching of names. (I'm looking at you Democratic Republic of Kongo, Koreas etc.) Easy-peasy.

Then I set some standard variables. (Maybe I'll make this dynamic in the future, why not?) The most interesting entry here is the classifiers hash. This maps the columns we want to use later on to maximize the difference of to their respective weights. (This should of course be updated to match whatever is in the input file...) Another semi-useful parameter is number_of_runs. This is the number of simulations the partitioner will perform before keeping the best partitioning. And of course number_of_groups - the number of groups that the students will be split over.
Høst i Tete D'Or #1

count is just a variant of my count function already covered here in the blog (link).

I slurp up the list of participants as an array of hashes, from the file name specified above, converting the headers as specified in the variable_names_map hash and generating some standard variables in the process - unless they are already present.

The important function in my naive approach is the scoring_function. This basically looks at the extremes one variable at the time. That is, it checks to see if any value (or category) of a variable is over or underrepresented, based on the expected number of elements with this value. samples is a count of number of elements in the group. I iterate over all categories of the variable to find the worst sinner (squared). So if we are looking at sex, it will check male, and then female and generate a score based on the category that is most off from the expected value (The number of participants with that characteristics divided by number of participants in this group).

The penalty_function simply assigns scores to each group in the current partition using the scoring_function per classifier multiplied by the weight of the respective classifiers.

The main loop of the program runs the simulations number_of_runs times. I clone the participants list and sample from that one so that I can delete participants as I add them to the partitions as I generete them. (Maybe this step is unecessary if Ruby's sample function doesn't loop before all the elements have been exhausted...) Update: I halved the runtime by rather sampling once - a list of the same length as the original list and popping elements from that sampled list. The way I then instansiatiate the empty partitions hash is quite a neat Ruby feature/trick/hack. The default value of a hash (or array) can take blocks as constructors! This means that if you try to pull something non-existant from a hash a new element will be generated based on that block. In this case I generate an empty array when that occurs. (An alternative to that would be to use the ||= operator while adding elements, but that is maybe slightly less elegant...)
Høst i Tete D'Or #2
After this loop I just write out the best partition.

Voila - it takes about 10 5 seconds on my current computer(s) to generate these groups when number of runs is set to 10000. Not sure if the algorithm is scientifically sound (and it is quite possible rather an overkill), but it seems to actually do the job when assigning a flock of students to groups and was the first thing that sprang to my mind when wanting to create something quickly (It took way longer to write up this blogpost than the code) and... dirtily... 

The group work got done and groups seemed fairly balanced.

- Penalizes small groups more heavily than larger groups (if some groups has more participants than other)?
- Variables with more categories dominates the score? (There's probably a useful statistical tool to solve this... (Normalisation?))
- Variables have to be categoric (age has to be grouped).

Potential future work:
- Split out the scoring function (and possible the penalty function) as a stand alone library.
- Let the script take arguments for useful parameters (filename, number_of_runs, group_size, variable_names_map, classifiers etc...)
- Could it be interesting to look at continuos variables (ie. age) as well as categorical ones? 
- Rather than discarding previous partitions one could let new ones evolve from a certain number of branches? (Local minimum problems?)

(BTW; God jul!)


Popular posts from this blog

Fix your rapid blinking Marantz SR-6004 using nothing but 3 fingers - and a thumb

A couple of years ago my (most of the time excellent) Maranz SR6004 acted up. It did't want to turn itself on. Properly. Just stood there and blinked rapidly. Its little red light that is. At me. The solution was so simple that I didn't bother to write it down as I was sure to remember it. Alas, no. Some weeks ago it did it again. (Can it be the heat?) Just stood there blinking rapidly at me. The manual just said - as it said last time around - that it was time to return the unit to it's maker. Or similar. Some googling led me to this page:  The technical term for what I had experienced seems to be "The Pop of Death". Aïe. But!, humongous letters said: YOU CAN SOMETIMES RESET THE UNIT BY PRESSING SURR MODE, CLEAR AND EXIT SIMULTANEOUSLY And so I did. And so it was fixed. And all was well. (And now I have written it down for the next time.)

Fix upside down Skype video in Ubuntu 12.10 [UPDATED]

When launching Skype in 64-bit Ubuntu 12.10 on my Asus U35J the webcam image was all topsy-turvy. Since I don't live in Australia, or something (tsk-tsk), this was not really cutting it for me.  Some quick googling led me to this forum post:   After making sure that the necessary packages was installed (notably  libv4l-0) I adapted the command from the forum post to: LD_PRELOAD=/usr/lib/i386-linux-gnu/libv4l/ skype and voila, the image was OK. Next step is for this to be set to default, which seems to be outlined here (in steps 2 and 3):  (Actually this post seems to cover most of what is useful from the forum post above...) UPDATE (19/04/2013): Since my laptop was working fine, I decided it was about time to fix it. Also I wanted to

Using a Raspberry Pi as a MIDI USB/5-pin bridge

In my constant... need... to get everything music instrument related to communicate with each other, I wanted to look into ways to get some of my keyboards/synths with only MIDI over USB to talk to devices with regular good old-fashioned 5-pin MIDI ports from the eighties. Cables! First I had a quick look at off the shelf solutions. The most interesting one being the Kenton MIDI USB Host – providing MIDI host functionality for USB devices as well as regular MIDI in and out in a small box. Unfortunately it is rather expensive (~125 €) and a reliable online source warned me that it was not entirely stable in collaboration with my OP-1, so I started thinking of more... home-grown solutions. I decided to try to use my old Raspberry Pi and see if that would serve as a USB host with a borrowed MIDI USB adapter. (Thanks Simon.) A cheaper, and, as an added boon, a nerdier solution. Step 1: Get the USB MIDI device up and running This was the easy part. The device I have been lent