It’s the most wonderful time of the summer

I’ve noticed an interesting trend in this blog’s visitor statistics of late: quite a few more hits are coming from programming searches, specifically searches for Google’s Summer of Code. It makes sense, given that Google just closed its application period for mentoring organizations and will post the list of open source organizations participating in this year’s rendition on March 18 (tomorrow!), followed very quickly by the official opening of the student application period on March 23. Go visit the site for more details! [shameless plug]

Still, I thought I’d respect the wishes of the viewers and post about my latest undertaking over spring break: I’m well into the process of converting my WordPress Synchronizer into an official WordPress plugin.

This project started a few years back. I’ve been running a local install of WordPress on my MacBook since May of 2006 as a little place to jot down notes, funny observations, reminders…just about anything. Obviously, this sort of mobility makes it very easy to keep notes: I don’t even need an internet connection. The problem arises when I want to show particular entries to…particular people. 😉 I didn’t have a way of doing that, short of opening my blog up to the world, which I wasn’t interested in doing.

Since then, I’ve come up with and implemented a few solutions that have worked fairly well up to now.  The idea behind solving this problem has generally remained the same: set up a remote blog at a static URL, implement some sort of authorization system for viewing that blog, and then selectively sync entries from my local blog to my remote blog.

Seems simple enough.  What has changed enormously is the method of implementation.  As you may gather from reading my Wiki entry, the first version of the synchronizer was very crude and extremely error-prone.  It made raw SELECT queries to the database, inserted the information into an HTML form, and POST-ed the data to a specially crafted PHP page on the other end that would parse out the data and feed it via INSERT statements into the remote blog.  Robustness wasn’t exactly what this first attempt was known for, but it worked at the time.

(Un?)fortunately, what killed this iteration was a cataclysmic hard drive failure. Rather than go through the arduous process of inventing custom POST fields and data validation again, I took another route in re-implementing this functionality: XML-RPC. WordPress implements a wide range of XML-RPC APIs (Blogger, MetaWeblog, Movable Type, etc.), and using this interface would be a great deal simpler, more secure, and much more robust. Plus, it wouldn’t require any specialized receiving code on the remote blog.

This worked decently well for a while. Unfortunately, a recent release of WordPress broke this functionality. I had made the (apparently fatal) assumption that posts receive sequential post IDs; that is, a new post will have the previous post’s ID plus 1. A recent upgrade of WordPress changed that by also handing out post IDs, in sequence, to drafts and post edits, leaving large gaps between the IDs of final, published entries. This threw my synchronizer out of alignment: the post IDs on the remote blog no longer matched those on the local one, so posts started duplicating themselves remotely.
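One way to avoid depending on how either blog numbers its posts is to remember, for each local post, which remote post it became. The PHP below is a minimal sketch under that assumption: `SyncMap`, `sync_all()`, and `fake_remote_new_post()` are hypothetical names, and the fake remote deliberately skips IDs to mimic drafts and edits consuming them. In a real plugin the mapping could plausibly be stored in post meta (for example with `update_post_meta()`), but that storage choice is an assumption, not what the synchronizer actually does.

```php
<?php
// Hypothetical sketch: track which remote post each local post became,
// instead of assuming both blogs hand out matching, sequential IDs.

final class SyncMap
{
    /** @var array<int, int> local post ID => remote post ID */
    private array $localToRemote = [];

    public function record(int $localId, int $remoteId): void
    {
        $this->localToRemote[$localId] = $remoteId;
    }

    public function remoteIdFor(int $localId): ?int
    {
        return $this->localToRemote[$localId] ?? null;
    }

    public function isSynced(int $localId): bool
    {
        return isset($this->localToRemote[$localId]);
    }
}

// Stand-in for the remote blog: it assigns its own IDs, which need not line
// up with the local ones (here every new post "uses up" three IDs).
function fake_remote_new_post(int &$nextRemoteId): int
{
    $id = $nextRemoteId;
    $nextRemoteId += 3;
    return $id;
}

// Create a remote post only for local posts that have no mapping yet;
// returns how many remote posts were created on this pass.
function sync_all(SyncMap $map, array $localIds, int &$nextRemoteId): int
{
    $created = 0;
    foreach ($localIds as $localId) {
        if (!$map->isSynced($localId)) {
            $map->record($localId, fake_remote_new_post($nextRemoteId));
            $created++;
        }
    }
    return $created;
}

$map = new SyncMap();
$nextRemoteId = 100;
$localPosts = [7, 12, 13]; // published local IDs, already non-contiguous

$firstPass = sync_all($map, $localPosts, $nextRemoteId);  // creates 3 remote posts
$secondPass = sync_all($map, $localPosts, $nextRemoteId); // creates none
```

Because a post is created remotely only when it has no mapping yet, running the same sync twice creates nothing the second time, which is exactly the duplication that the ID arithmetic failed to prevent.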

So rather than continue to wrestle with third-party woes, I decided to take it one step further and make it a full-fledged WordPress plugin. This would let me use the internal API and security checks, further alleviating detail-induced headaches and giving me a very nice sandbox to play in. The only downside has been the learning curve: familiarizing myself with WordPress’s insides and how “actions” and “filters” (collectively, “hooks”) are registered with the system.

Oh yes, I spent a few hours wading through the WordPress Codex, learning how plugins are “hooked” into the system:

add_action($action_name, $callback_function);
add_filter($filter_name, $callback_function);

The two work very similarly. “Actions” run when WordPress reaches some specific event (publishing an entry, updating a page, etc.), while “filters” are essentially data intermediaries: they receive data on its way between the database and the browser and return a possibly modified version of it. Yes, their names speak for themselves. 😛
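To make the distinction concrete, here is a toy, self-contained PHP model of the two kinds of hook. It is not WordPress’s implementation: `ToyHooks` and its methods are hypothetical, and only the hook names `publish_post` (an action) and `the_content` (a filter) are borrowed from WordPress. The difference it shows is that action callbacks run for their side effects and their return values are discarded, while each filter callback takes a value and hands back a (possibly modified) one to the next callback in the chain.

```php
<?php
// Toy model of actions vs. filters. NOT WordPress's implementation; the class
// and method names are hypothetical and exist only to show the contrast.

final class ToyHooks
{
    /** @var array<string, callable[]> */
    private array $actions = [];
    /** @var array<string, callable[]> */
    private array $filters = [];

    public function addAction(string $name, callable $cb): void
    {
        $this->actions[$name][] = $cb;
    }

    public function addFilter(string $name, callable $cb): void
    {
        $this->filters[$name][] = $cb;
    }

    // Actions: every callback runs for its side effects; return values are ignored.
    public function doAction(string $name, ...$args): void
    {
        foreach ($this->actions[$name] ?? [] as $cb) {
            $cb(...$args);
        }
    }

    // Filters: each callback receives the current value and returns a new one,
    // which is passed on to the next callback; the final value is returned.
    public function applyFilters(string $name, $value, ...$args)
    {
        foreach ($this->filters[$name] ?? [] as $cb) {
            $value = $cb($value, ...$args);
        }
        return $value;
    }
}

$hooks = new ToyHooks();
$log = [];

// Hypothetical plugin code: note each newly published post ID (an action)...
$hooks->addAction('publish_post', function (int $postId) use (&$log): void {
    $log[] = "queued post $postId for sync";
});

// ...and adjust content on its way out (two chained filters).
$hooks->addFilter('the_content', fn(string $html): string => trim($html));
$hooks->addFilter('the_content', fn(string $html): string => $html . ' [synced]');

$hooks->doAction('publish_post', 42);
$out = $hooks->applyFilters('the_content', '  Hello  ');
```

Here `$log` ends up holding one message for post 42, and `$out` is `'Hello [synced]'`, because the second filter sees the already-trimmed string.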

Beyond that, there was still the issue of creating menu items in the administration panel, which can be done by:

add_menu_page($pagetitle, $menutitle, $accesslevel, $file);
add_submenu_page($parent, $pagetitle, $menutitle, $accesslevel, $file);

There are also some specialized functions that are equivalent to add_submenu_page() with a specific existing menu (such as Options, Management, or Themes) in place of the $parent parameter: add_options_page(), add_management_page(), and add_theme_page().

So now I can hook functionality into WordPress through callbacks, and provide menu options in the administration panel for configuring the plugin itself. At the moment, I’ve completed the configuration page and have templates in place for the pages where the actual data synchronization is performed. Unfortunately, there are three major hurdles left to clear before the implementation is complete. To wit:

Data Representation. Coming up with a sound way of representing synchronized entries in the database is proving difficult. Previously, I simply stored the date of the most recent synchronization, and any entries with later timestamps were queued for remote syncing. However, this doesn’t allow for selective synchronization: picking and choosing which entries are synced and which are not.
Process Efficiency.  The easiest way to synchronize two data sets is to compare all the elements of both sets.  The problem here is that, with blogs, the data sets are collections of entries, and with time those sets will only increase in size.  This means the synchronization process will suffer progressive performance degradation.  At present, I already have over 3000 entries, and the XML-RPC package crashes when I try to retrieve every single one at once.
Remote Rules.  What assumptions can I logically make about the remote blog?  Should I assume it won’t be touched other than when synchronizations occur?  Should I assume it’s just another regular blog that is regularly updated, but which occasionally receives entries from another blog as well?  These sorts of questions, depending on their answers, introduce a whole host of implementation concerns – most notably, how do I classify a “unique” entry when synchronizing it?
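As a rough illustration of how the first two hurdles might interact, the sketch below keeps sync state per entry (a “selected” flag plus the entry’s own last-synced timestamp) rather than one global date, and returns only the entries that are selected and have changed since their last push, in fixed-size batches so that no single request has to carry thousands of posts. Everything here (the array fields, `pending_batches()`, `mark_synced()`, the batch size) is hypothetical, and it assumes the remote blog is changed only by the synchronizer, which is just one possible answer to the “remote rules” question.

```php
<?php
// Hypothetical sketch: per-entry sync state instead of a single global
// "last synchronized" date. Field names and functions are illustrative only,
// and the remote blog is assumed to be changed only by the synchronizer.

/**
 * IDs of entries that are selected and have changed since their own last
 * sync, sorted and split into batches of at most $batchSize.
 *
 * @param array<int, array{modified: int, selected: bool, lastSynced: ?int}> $entries keyed by local post ID
 * @return int[][]
 */
function pending_batches(array $entries, int $batchSize): array
{
    $pending = [];
    foreach ($entries as $id => $entry) {
        $stale = $entry['lastSynced'] === null || $entry['modified'] > $entry['lastSynced'];
        if ($entry['selected'] && $stale) {
            $pending[] = $id;
        }
    }
    sort($pending);
    return array_chunk($pending, $batchSize);
}

/** Record that the given entries were pushed to the remote blog at $now. */
function mark_synced(array &$entries, array $ids, int $now): void
{
    foreach ($ids as $id) {
        $entries[$id]['lastSynced'] = $now;
    }
}

// Timestamps are small integers purely for readability.
$entries = [
    1 => ['modified' => 100, 'selected' => true,  'lastSynced' => null], // never synced
    2 => ['modified' => 100, 'selected' => false, 'lastSynced' => null], // not chosen
    3 => ['modified' => 300, 'selected' => true,  'lastSynced' => 200],  // edited since last sync
    4 => ['modified' => 150, 'selected' => true,  'lastSynced' => 200],  // already up to date
    5 => ['modified' => 100, 'selected' => true,  'lastSynced' => null], // never synced
];

$batches = pending_batches($entries, 2); // [[1, 3], [5]]
foreach ($batches as $batch) {
    // A real plugin would send this batch to the remote blog here.
    mark_synced($entries, $batch, 400);
}
$after = pending_batches($entries, 2); // []: nothing left to push
```

This version still scans every local entry in memory; in a plugin, the same selection would more likely be pushed into a database query so that only candidate rows are loaded at all.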

These challenges are helping shape plans for my content management system, which I plan to “break ground” on this summer while at IBM in Raleigh (on weekends, of course!). Since I will be at Extreme Blue, I will not be participating in this year’s Summer of Code (much as I wish I could). I mean, I could, but I suspect I wouldn’t have time to sleep or eat. Perhaps next summer will feature my return to open source programming: Apache, Joomla!, Drupal, Pidgin, and WordPress are doing some pretty amazing things, and I’d love to get involved with them.

Apply for Google Summer of Code! You won’t regret it. Particularly if you apply for Joomla! and your mentor is Amy. 🙂 Leslie Hawthorn and the rest of the Google team are spectacular.

As for me…time to get back to coalescent theory, computational learning theory, and Tagalongs (yes, the Girl Scout cookie, buahaha).


About Shannon Quinn

Oh hai!
This entry was posted in Blogging, GSoC, Internet, Programming.
