If you’re reading the title of this post and wondering what the crap I’m going on about, I’d like to share with you one of my favorite quotes from the wonderful, awesome, and fantastic PBS TV show Arthur:
D.W.: “Speaking of my birthday…”
Arthur: “We weren’t talking about your birthday.”
D.W.: “Well now that we are…”
Pretty much the best segue ever. So before I get to new year’s shenanigans, I’d like to share a little about what I spent my time working on in the weeks leading up to the holiday season.
It’s not working, it’s not working…HOLY CRAP IT’S WORKING
I may or may not have mentioned previously (and I’m too lazy to look) that I spent most of November and early December drawing up a massive grant proposal with CMU@Qatar to unify their cloud computing hardware with our software engineering / computer vision / statistics / biomedical expertise. Part of that proposal was a small venture–a “technical report” (TR), as the jargon goes–that implements a very small slice of the overall proposal, and more subjectively shows that we do, in fact, know what we’re talking about.
The TR was a source of unending frustration. After pulling numerous all-nighters just to get the original 20+ page proposal rolled out, we had no time to rest before diving into this thing. To add insult to injury, while the amount of writing required was less than that of the proposal, we actually had to code something and show meaningful results.
So we got to work: our goal was to have a MapReduce chunk of code working on a standard Mahout and Hadoop setup. We’d written some code a while back in Matlab and Python which could take a video of moving cilia, do some freaking cool optical flow calculations, and make a very good estimate of the frequency at which the cilia were beating.
In fact, if you’re interested, here’s a link to the Python code (scroll down to where it says “STEP 7b” in the comments–that explicitly generates the ciliary beat frequency, or “CBF”). The goal was to rewrite this Python code in a MapReduce form that could be run in a distributed fashion on a Hadoop cluster.
Wellllllll, it didn’t start off particularly well. In fact, of the 14 days we had between the submission of the proposal and the deadline for the TR, we spent the first 10 days barking up not only the wrong tree, but on the wrong continent. Without [hopefully!] getting too technical, we wanted to use autoregressive (AR) models to represent the cilia’s behavior, and use the parameters of the models to calculate the frequency. Apparently, AR models suck at this use case for anything but perfectly periodic data.
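For the curious, here is a minimal sketch of what “use the parameters of the model to calculate the frequency” can look like. Everything in it is an assumption for illustration: the model order (AR(2)), the least-squares fit, and the names `fit_ar2` and `ar2_frequency` are mine, not what we actually ran. The idea is that a second-order AR model with complex poles has an oscillatory mode, and the angle of those poles gives the frequency.

```python
import math


def fit_ar2(x):
    """Least-squares fit of x[t] = a1*x[t-1] + a2*x[t-2]; returns (a1, a2).

    Hypothetical helper: solves the 2x2 normal equations by hand.
    """
    s11 = s12 = s22 = b1 = b2 = 0.0
    for t in range(2, len(x)):
        p1, p2 = x[t - 1], x[t - 2]
        s11 += p1 * p1
        s12 += p1 * p2
        s22 += p2 * p2
        b1 += x[t] * p1
        b2 += x[t] * p2
    det = s11 * s22 - s12 * s12
    if det == 0:
        raise ValueError("signal is degenerate; cannot fit an AR(2) model")
    a1 = (b1 * s22 - b2 * s12) / det
    a2 = (s11 * b2 - s12 * b1) / det
    return a1, a2


def ar2_frequency(a1, a2, fs):
    """Frequency in Hz implied by the AR(2) poles, or None if they are real.

    The poles are the roots of z^2 - a1*z - a2 = 0; fs is the sampling rate.
    """
    disc = a1 * a1 + 4.0 * a2
    if disc >= 0:
        return None  # real poles: the fitted model has no oscillatory mode
    angle = math.atan2(math.sqrt(-disc) / 2.0, a1 / 2.0)
    return angle * fs / (2.0 * math.pi)


# Usage: a clean sinusoid sampled at an assumed 100 frames per second.
fs = 100.0
signal = [math.sin(2 * math.pi * 12.0 * t / fs + 0.3) for t in range(200)]
a1, a2 = fit_ar2(signal)
estimate = ar2_frequency(a1, a2, fs)
```

A pure sinusoid satisfies the AR(2) recurrence exactly, so in this idealized case the estimate comes back at the true frequency. That is the “perfectly periodic” situation; the sketch says nothing about how the fit behaves on real cilia data.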
This took us 10 days to figure out, and explained why the results we were getting seemed to have absolutely no correlation with the “true” answers whatsoever. At which point, we were stuck: what do we do now?
I remember this vividly: my advisor and I met in his office, and we settled on a significantly-less-elegant algebraic solution, opting more or less to port the Python code we’d written to MapReduce: imagine simply having each individual node calculate the frequency of a single pixel. That was our approach.
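To make the “one pixel per node” idea concrete, here is a rough sketch in plain Python rather than the Java/Mahout code we actually wrote. The function names (`dominant_frequency`, `mapper`, `reducer`, `run`) are hypothetical, and picking the largest bin of a naive DFT is an assumed stand-in for the real frequency calculation. Each map call sees one pixel’s time series and emits that pixel’s frequency; a single reduce step collects the results.

```python
import cmath
import math


def dominant_frequency(series, fs):
    """Return the frequency (Hz) of the strongest non-DC bin of a naive DFT."""
    n = len(series)
    mean = sum(series) / n
    centered = [v - mean for v in series]
    best_k, best_power = 0, -1.0
    for k in range(1, n // 2 + 1):
        coeff = sum(
            v * cmath.exp(-2j * math.pi * k * t / n)
            for t, v in enumerate(centered)
        )
        power = abs(coeff) ** 2
        if power > best_power:
            best_k, best_power = k, power
    return best_k * fs / n


def mapper(pixel_id, series, fs):
    """Map step: one pixel's time series in, (pixel_id, frequency) out."""
    yield pixel_id, dominant_frequency(series, fs)


def reducer(pairs):
    """Reduce step (a single reducer): gather all per-pixel frequencies."""
    return dict(pairs)


def run(pixels, fs):
    """Local stand-in for the cluster: apply mapper per pixel, then reduce."""
    emitted = []
    for pixel_id, series in pixels.items():
        emitted.extend(mapper(pixel_id, series, fs))
    return reducer(emitted)


# Usage: two made-up pixels, 64 frames at an assumed 64 frames per second.
fs = 64
pixels = {
    (0, 0): [math.sin(2 * math.pi * 5 * t / 64) for t in range(64)],
    (0, 1): [math.sin(2 * math.pi * 12 * t / 64) for t in range(64)],
}
frequencies = run(pixels, fs)
```

Because no pixel depends on any other, the map calls can run on as many nodes as are available, which is what makes the less elegant approach a natural fit for Hadoop.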
It was just a few days to deadline, and we decided I’d spend the next 4-6 hours hacking away, and if at the end I still had zip, we’d throw in the towel and move on to better things.
In order to justify any results I got, they had to match the results from our Matlab and Python pipelines, as we’d been assured by the folks who supply us with the data that those were spot on with manual conclusions. With that in mind, here are the results graphs for our Matlab and Python pipelines (respectively):
Pretty much identical. Imagine the depths of my shock when, after only 4.5 hours of hacking away in Java, creating an entirely new and self-contained Mahout driver, and successfully running it on the second try (first try failed out with some sort of input format failure; a very basic and stupid mistake that was easily corrected), the result it spat out was this:
And to answer the inevitable pedantic question: yes, I know there’s only ever 1 Reducer, it’s not exactly production-level code 😛 But it works, and really freaking fast, too.
Holidays, traveling, new year’s, and GETTING FAAAAT
This year’s Christmas was a little different. With my older sister in Morocco, and my parents having moved from Atlanta to Athens, the circumstances weren’t exactly familiar. But to be incredibly sappy for one second: my only requirement for a fantastic holiday season is the people. And even though we were missing a big part of our family (we’re all pretty big parts…our family is somewhat energetic), it was still fantastic.
Following Christmas, my family and I piled into a van and drove 14+ hours to Illinois to visit our relatives for a few days. As always, it was great to see everyone again; particularly as the years pass–and especially now that I’m down to only one grandparent–that drive, while long and somewhat stressful, will always be worthwhile.
Well…I suppose I should qualify that final statement with one minor caveat: I don’t think anyone will disagree with me when I implore everyone to put family first during the holidays. I have no doubt that some situations are more difficult than others, but if there was ever a time to set aside petty bickering, it’s the holiday season.
Trying to convert your extended family to fundamentalist faiths, when already fully aware said family won’t budge, would fall under the category of “petty bickering”. That’s all I’m going to say about that.
I departed from Illinois out of Midway and arrived in Pittsburgh just in time to hitch a ride with the Lady to her place in Ohio for some good-old-fashioned New Year’s celebrations involving tons of junk food, lots of video games, and most importantly, NERTZ!
For anyone who has never played this game: PLAY IT NAOW. In particular, once you start exceeding two or three people at once, this game gets insane. Think “multiplayer speed solitaire with a swearing problem.” You’ll curse about as often as you do when you play Mario Kart. It’s the ultimate New Year’s game: for the past two years in a row, we’ve played for at least 5 hours straight. It’s addicting, it’s frustrating, it’s invigorating, and it’s unbelievably fun.
I’ve been back in the ‘burgh since this past Sunday, and I have a to-do list that takes up my entire marker board. Every time I scratch one item off, I add another one. But I’m making progress! I specifically set up my vacation to not involve anything research-related, fully aware that I’d pay for it later, but man it was nice not to worry about that stuff for a couple weeks. And frankly, after the breather, it’s nice to be back to work again.
Here’s to 2012!…even though it’ll all come to a screeching halt on Dec. 21!