A case for unit testing

So my project of late is to get the 2.0 iteration of the vaunted Twitterbot off the ground. The codebase has gone through a complete revamp and is (I think) a much, much better architectural approach, one that makes the bot extensible and robust.

I think the two greatest improvements, from a high level, are 1) moving the data-aggregation out of specific Actions and into its own independent Action that runs continuously, and 2) implementing process management so each Action can run in its own “sandbox”, so to speak.

The major drawback to these improvements, however, is that this stuff is not trivial to debug, which is where I’m currently stuck. The basic structure is that there’s a main and fairly transparent “wrapper” that more or less just loops indefinitely, paging through each Action and asking when it’s supposed to run. If there’s an Action that’s supposed to run, the wrapper forks off a new child process for it. Once the Action completes its run, it ends and the wrapper cleans up the process.

Here’s the main loop (in twitterbot.php):

    while (true) {
      // exit once a signal handler has set the flag
      if ($this->exit) {
        return;
      }

      // determine the next action that should fire
      $now = time();
      $action = $this->next();
      if ($action == null) {
        // in this case, just the aggregator is running,
        // so we can in fact safely quit!
        $this->exit = true;
        continue;
      }
      if ($now < $action->getNextAttempt()) {
        // sleep until the next action has to fire
        sleep($action->getNextAttempt() - $now);
        continue;
      }
      $action->setNextAttempt();
      $pid = pcntl_fork();
      if ($pid === -1) {
        // fork failed: no child was created, so log it and keep looping
        error_log('pcntl_fork() failed');
        continue;
      }
      if ($pid > 0) {
        // parent process: remember which Action this child is running
        $this->current[$pid] = $action;
      } else {
        // child process: the alarm (SIGALRM) enforces the Action's timeout
        pcntl_alarm($action->getTimeout());
        exit($action->run());
      }
    }

As you can see, this loops indefinitely and checks for any Action that is ready to run (if none are ready, it sleeps until they are). When an Action is ready, it forks off a child process for that Action, and then returns to indefinite looping. There are signal handlers prior to this loop set up to catch any exit signals, at which point this loop gracefully terminates via the $this->exit boolean.
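For illustration, the shutdown path described above could be wired up roughly like this. It is a minimal sketch rather than the bot's actual code: the class `LoopSketch` and its method names are hypothetical, and only `pcntl_signal()`, `pcntl_signal_dispatch()` and `posix_kill()` are real PHP functions (they need the pcntl and posix extensions).

```php
<?php
// Hypothetical stand-in for the wrapper class; only the exit flag mirrors the post.
class LoopSketch
{
    public $exit = false;

    public function installHandlers()
    {
        // On SIGTERM or SIGINT, just flip the flag; the loop notices it on its next check.
        foreach (array(SIGTERM, SIGINT) as $signal) {
            pcntl_signal($signal, array($this, 'handleSignal'));
        }
    }

    public function handleSignal($signo)
    {
        $this->exit = true;
    }

    public function loop()
    {
        $passes = 0;
        while (!$this->exit) {
            $passes++;
            // ... scheduling and forking of Actions would go here ...
            pcntl_signal_dispatch(); // run the handlers for any pending signals
        }
        return $passes;
    }
}

$loop = new LoopSketch();
$loop->installHandlers();
posix_kill(getmypid(), SIGTERM); // simulate an external "please quit" signal
$passes = $loop->loop();
```

Because the handler only sets a flag, a test can send the signal to its own process and then check that the loop returns, without needing a second process at all.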

The aggregator, responsible for constantly reading and saving Twitter posts, also has its own signal handlers, since it runs outside the indefinite loop’s process.

So…how do I make unit tests for these sorts of things? How can I unit test the process forking and signal handling? How can I implement a mock MySQL database so as to have total control over the state of the database (PHPUnit seems to do something along these lines, but the documentation is near-nonexistent)? How can I provide a fake stream for the aggregator to siphon from?

There’s still a bunch of other stuff that isn’t implemented terribly well in this bot, but it’s lower priority than getting unit tests up and running, since all the other major functionality is implemented at this point.

  • Database connection management for every Action. Since each Action (most likely!) depends on tweets stored by the aggregator, it needs to establish a connection with the database to read those tweets. Unfortunately, due to the nature of process forking, these database connections have to be made after the process is forked. Which makes for an awkward situation, given that the main process has to first construct all the child Actions so they can provide the loop with information about when they’re going to execute next. My current fix is to have each Action check, at the start of its run() method, whether or not the connection has been initialized, and if not, to do so. But that’s awkward at best.
  • Aggregator process management. This thing is in a somewhat awkward position as well. It is the Phirehose project, and it works exceptionally well and is very stable, but fitting it into the overall architecture of this program was a little difficult. It fit best as yet another entirely separate daemon process, but doing so introduces the difficulty of shutting it down. Currently, when started, the run.php script stores the process IDs of both the loop and the aggregator in a text file so it can kill them off later, but this feels a bit awkward, particularly from an error-handling perspective. I would prefer the text file store only a single process ID, which when killed destroys all the other processes as well. But since the aggregator is forked off before the loop even begins, there’s no way for the loop to handle the destruction of the aggregator unless its constructor is explicitly fed the aggregator process ID. Which would be odd.
  • Other random stuffs…
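The connection workaround from the first bullet could be folded into a base class so that individual Actions don't each repeat the check. What follows is only a sketch under assumptions: `LazyAction`, `CountTweets` and the connection factory are hypothetical names, and a fake object stands in for the MySQL connection so the example runs without a database.

```php
<?php
// Hypothetical base class; the real Action class in the bot may look different.
abstract class LazyAction
{
    private $connect;   // callable that opens a new database connection
    private $db = null; // stays null until run() is called (i.e. in the child)

    public function __construct(callable $connect)
    {
        // Only the factory is stored, so nothing is opened in the parent process.
        $this->connect = $connect;
    }

    protected function db()
    {
        if ($this->db === null) {
            $this->db = call_user_func($this->connect);
        }
        return $this->db;
    }

    public function run()
    {
        return $this->execute($this->db());
    }

    abstract protected function execute($db);
}

// Example Action that counts tweets on whatever "connection" it is given.
class CountTweets extends LazyAction
{
    protected function execute($db)
    {
        return count($db->tweets);
    }
}

$opened = 0;
$action = new CountTweets(function () use (&$opened) {
    $opened++; // track how many connections were opened
    $fake = new stdClass(); // fake connection, an assumption for this sketch
    $fake->tweets = array('first', 'second');
    return $fake;
});
$openedBeforeRun = $opened;
$result = $action->run();
```

The parent can still construct every Action and ask it about its schedule, while the connection itself is only opened inside `run()`, which in the bot would be after the fork. Passing the factory in also gives a unit test a place to substitute a fake.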

Also, I love stories like this one:

About Shannon Quinn

Oh hai!
This entry was posted in Programming, twitterbot.

6 Responses to A case for unit testing

  1. Steve says:

    When I built the USPS calculator, I had to mock the calls to the web service. I used simpletest to do it (http://www.simpletest.org/) and it worked out nicely. I had to refactor a bunch of the code so upon calling, I could inject the values it was expecting.

    • magsol says:

      Ahhh, mock objects! Could very well be the answer here. The only problem is I don’t really use curl, at least not explicitly; I use a TwitterOAuth implementation to interact with Twitter (another git project), and Phirehose (yet another git project) uses sockets to access Twitter’s Streaming API to aggregate posts. Making mock objects out of TwitterOAuth would be ideal; Phirehose I need to explicitly test, specifically in the case of sending a signal to quit the application.

      Anyway, just spitballing here. These are really useful resources; simpletest seems to have significantly more complete documentation than PHPUnit, so I may switch to that. Thanks very much d00d!

      • Steve says:

        FYI, CurlService is just a wrapper for the actual call to curl_exec(). You could look at wrapping the TwitterOAuth, Phirehose and the DBStorage objects and creating them when you instantiate the script. It could then be used as part of the function, runAggregator(oauth, phire, db) or something like that. Because the loop() is more structural and dependent on the overall structure, you’ll have to abstract that out into something where you can fake its value.

      • Steve says:

        I’ve updated the gist to show the CurlService.php
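One possible reading of the suggestion to abstract the loop out: pull the loop's decision (exit, sleep, or fork) into a function with no side effects, so a test can feed it fake times. This is a sketch only; `decideStep()` is a hypothetical helper, not something in the bot or in the gist.

```php
<?php
// Hypothetical helper: decides what the loop should do, without sleeping or forking.
// $nextAttempt is the timestamp of the next Action, or null when none is scheduled.
function decideStep($now, $nextAttempt)
{
    if ($nextAttempt === null) {
        return array('exit', 0); // nothing left to schedule
    }
    if ($now < $nextAttempt) {
        return array('sleep', $nextAttempt - $now); // seconds until the Action is due
    }
    return array('fork', 0); // the Action is due now
}
```

The real loop would then call this with `time()` and the next Action's `getNextAttempt()` and act on the result, which leaves only the thin sleep/fork layer untested.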

  2. I don’t know very much about PHP or MySQL (read: I’m a blank canvas for painting), but I have done a fair amount of unit testing along the lines of breaking apart very complex tasks into very easy to understand and unit-testable code, so here are a few questions for you (in response to your other questions):

    How can I unit test the process forking and signal handling?
    So, you’ve got 2 tasks: Process Forking and Signal Handling. It seems very possible to create a dummy process that, when the aggregator forks it off as a child process, puts out a certain signal. Think simple: the first thing this process does after it’s been forked is send a signal to you that says “Hey, I Succeeded!” in the most stupid way possible. Remember: it does not need to be complex at all. These are unit tests, they are not what you’re delivering to a client, they’re what you use to verify functionality.

    How can I implement a mock MySQL database so as to have total control over the state of the database?
    Again: think simple. Then, think stupidly simple. I don’t know much about databases (they’re bases…for your data….right?), but I guessed that loading a database from a flat text file could very well be an option. If you need something complex in your database, start simple, make a unit test for it, and then add more complexities and unit tests together so you can be sure everything’s working as it should. Keep separate files of your tests (no need to ruin previous work!), and, as always, use source control (I’m assuming that’s already the case).

    Hopefully that helped. I’ve found that, when creating unit tests, it pays to act like the really stupid developer that has no mind for higher level details and can only do one thing at a time. If you have to think too hard on any one detail, the structure is too complex.
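The "dummy process" idea above might look something like this in PHP. It is a sketch under assumptions: `forkAndCollect()` is a hypothetical helper, and the child reports success through its exit status rather than a signal, which is about as simple as it gets. The `pcntl_*` functions are real and require the pcntl extension.

```php
<?php
// Fork a trivial child that exits with a known code, then read that code in the parent.
function forkAndCollect($exitCode)
{
    $pid = pcntl_fork();
    if ($pid === -1) {
        throw new RuntimeException('pcntl_fork() failed');
    }
    if ($pid === 0) {
        // child: do nothing except report back through its exit status
        exit($exitCode);
    }
    // parent: block until this child finishes, then decode its status
    pcntl_waitpid($pid, $status);
    return pcntl_wifexited($status) ? pcntl_wexitstatus($status) : null;
}

$code = forkAndCollect(7);
```

If the parent gets back the code the child was told to exit with, the fork-and-reap plumbing works, and that can be checked before any real Action is involved.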

    • magsol says:

      The database I should be able to manage, I think: most PHP unit-testing frameworks have some sort of built-in mechanism for creating (like Under mentioned) mock database connections that won’t actually change the internal state of the *real* database, just test that connections are created and destroyed correctly and that certain conditions based on the connection’s return value are tested. The hard part is the process management.

      The reason it’s a hard part is because all the issues I’ve run into with implementing the process management have been consequences of the database itself; apparently database handles don’t like being shared across processes, so much so that they literally die the moment they’re accessed by a child process. Which makes things highly coupled and very difficult to debug, much less unit test. I like the general approach of “stupid simple”, but I’d have to sit down and think about exactly what that would entail under these circumstances. Right now, the bot *seems* to function correctly, but until I have unit test coverage I’m not considering it a robust application.
