April showers do indeed bring May flowers…but only if you pass your exams. Which usually requires being indisposed to the point of completely missing the transition from rain to flowers.

Oh, academia. ANYWHO.

My apologies for the lack of updates lately. We’re in the middle of crunch time here in the college town of Pittsburgh, and while my academic schedule is on the downhill, there’s still plenty going on that requires my full and undistracted attention.

The reason for the downhill is the fact that, last Thursday, I delivered and successfully defended my master’s thesis. It was roughly a 25-minute presentation, followed by several questions from the (rather large, at least for a master’s thesis…had a good 15 people there who weren’t my committee) audience. Once the audience finished, the committee brought me in for some more thorough grilling. I’ll post the presentation slides, the thesis write-up, and the overview of the thesis itself on my wiki soon enough, but the upshot was that the questions weren’t surprising. I was aware of the soft spots in my thesis, but it was theoretically sound enough for them to pass it with relative ease.

Just about everyone who proofread my thesis / helped me fix my slides / had anything to do with helping me flesh out ideas said that it was an excellent idea, regardless of the results I got. The problem was, my results actually disproved my hypothesis. As in, there wasn’t any correlation between the things I was trying to correlate. That was difficult to handle at first…yeah ok, I never did quite handle it at all.

But this is the point of science. Even when the hypothesis is wrong, it still tells us something. As a consequence of this thesis, I learned that protein homology is nowhere near as good a method for determining protein location as I’d previously thought (seriously: if proteins are similar in structure or sequence, wouldn’t you think that they’d behave similarly? Evidently, if there is a correlation, it’s nowhere near as strong as I’d have thought). And so, without further ado, the biggest roadblock between me and a master’s degree in computational biology is gone.

…and the rest of the world, which I’ve kept at bay until now, clamors for my attention once more. 😛

  • Google Summer of Code: Yes, ladies and gentlemen, I am participating in it once again, and I can say without reservation that this will be my most enjoyable, and by far most challenging, project yet. I’m working on the Apache Mahout project, which is a machine learning toolbox built on the Hadoop map/reduce framework, essentially allowing algorithms that are typically very computationally intensive to be parallelized. My project is to implement a new clustering algorithm, EigenCuts, on the framework. The professor who published the algorithm is a member of the UPitt faculty and a co-researcher with several of the PhD students in my lab, so I will have very experienced hands to help with the hardcore theory of the algorithm, plus the wonderful Apache Mahout community to help with the implementation 🙂 I’m implementing something that is highly coveted by some of my coworkers, and it’s extremely relevant to the work I’ll be doing in the fall (and far, far beyond) as a PhD student m’self. Basically, it’s going to be an awesome summer.
  • Finish the drill: With GSoC on the horizon, it’s tough to stay focused on the fact that I still have a machine learning final exam on Thursday evening. It’s all the basics of machine learning, and I’m comfortable with a good deal of it – I have an A in the class as of right now. However, there are several concepts I’m still iffy about: Information Theory (basically, the properties and intuitions of entropy and mutual information), Hypothesis Testing (one-sided and two-sided), Expectation Maximization (the equations really mess with my mind), Reinforcement [Q] Learning (same problem as EM), and Genetic Algorithms (probably the least problematic for me of anything on this list). Intuitively, I understand just about everything on this list, but once I get into the details, some of it gets a little hazy.
  • Getting m’self organized: Then there are lots of other little tasks that really, really need attending to. For one, I need to start packing. Commencement is in 13 days, and immediately upon the conclusion of the ceremony, my family and I will be departing Pittsburgh for the summer (yep, I’m living in the ATL for the whooooole summer!). For another, Google Code Jam’s qualification round is this Friday evening, so I need to be ready for that – it starts at 7pm, and could go who knows how long. Finally, I need to find time to start setting up my development environment for GSoC. I’ve been meeting with professors, reading up on documentation, even poking through the code, but I need to be able to modify the codebase and rebuild the project on my own (obviously).
  • The Lady: I’m traveling to New York next week to see my beautiful girlfriend graduate after four brutal – and incredibly productive – years at NYU. It’ll be the last time (at least in the foreseeable future) we’ll get to be in New York, so it’s definitely going to be something to savor. Congratulations to her on a job amazingly well done!

I’m riding the downhill, but the end is still quite a ways in the distance, so this is no time to downshift. Got some pretty exciting stuff going on, so stay tuned 🙂


