A few weeks ago, the 2010 rendition of Google’s Summer of Code came to an end, and with it, the official (summer-wise, at least) end of my spectral clustering project for Apache Mahout.
I wasn’t 100% successful, as you can read about in greater detail on my project blog, but I did pretty well: Mahout now has two additional clustering algorithms, both specialized to exploit the spectral properties of the graphs that underlie the data. One of the algorithms works flawlessly (but still needs a bit of optimizing), while the other needs only slight tweaking to bring it up to code.
All in all, pretty freaking successful.
Now that my project has concluded, though, I have to consider how it was that I earned a passing mark on my final evaluation. To do so, I want to take a step back and view the project from a broader perspective, one that includes the open source movement as a whole.
The open source vs. closed source debate notwithstanding, open source has its own demons to handle. I realize I was extremely fortunate this past summer not only to have a fantastic mentor in the form of Isabel Drost – co-founder of and committer with Apache Mahout – but also to work with a community as cohesive as the one surrounding the Mahout project, and indeed Apache in general. Those folks don’t mess around; they’re more than happy to answer any questions we newbies have (though I’m fairly sure I tested the patience of those on the mailing list…I probably averaged one question per day), but there isn’t any off-topic chatter whatsoever (a problem I’ve looked at previously).
Open source projects tend to follow the same heuristic that generational garbage collectors rely on: namely, those that have been around a while tend to stay around, and those that are new tend to die out quickly. Nowhere is this more apparent than on SourceForge, or on any other open source hosting site for that matter. There are certainly very popular and well-maintained projects on these hosts, but the odds of a new project gaining ground are, frankly, minuscule.
And it isn’t just for lack of interest, either.
My guess is this: after simple lack of interest or direction, internal strife is the biggest reason open source projects fail. I’ve seen dynamics in other open source communities similar to the way I would often characterize my fraternity: you have a few dozen highly intelligent individuals under the same roof, with a few dozen extremely sound – but mutually exclusive – opinions on every issue.
Unless you possess unearthly powers of moderation, this will often lead to internal division and the formation of camps rallying around specific philosophies. Cue: project forking (if you’re lucky), or project shutdown (probably more common).
I wish there were an easy answer; though if there were, I suspect the current political climate in the United States would not be what it is, either.
As such, I count my blessings that I was not only given a challenging and enjoyable summer project, but also surrounded by such a healthy and drama-free environment. I am sure Mahout and Apache in general have had their fair share of conflict, strife, and plain-old trollz. But as I suspected from the start, they’re the Big Man on the Open Source Campus.
Plus it helps that the Mahout project lends itself to more academically-minded individuals with extensive backgrounds in statistical machine learning 😛