I’ve been somewhat out of the loop (mentally and blog-ily) for the last several days on account of this operating systems project that continues to be a veritable black hole into which all my time is sucked.
I’m making substantial progress. But for the last 18-20 hours, I have been stymied by the same error that I cannot, for the life of me, fix.
For this project, I’m writing a multithreaded HTTP proxy (yes, in C…I know, I know, commence drooling). I’ve got it working almost perfectly…except for this one scenario where it CRASHES.
A la Windows.
Actually, it’s not even a scenario I can pinpoint. I’m making some small, minute mistake in my memory calculations, and eventually it kills the proxy. To illustrate this a bit more specifically, a small introduction to HTTP and its message structure is in order.
When a client issues an HTTP request (say, you type “www.google.com” into your browser), the request that is sent (the essential parts of it, anyway) looks like this:
GET / HTTP/1.0
The “host” indicates…the host. The “/” indicates the file to be retrieved (in this case, the main file, or whatever google.com decides the main file is).
And for those of you who don’t habla español, el niño is Spanish for…the niño!
Ok I confess, I’m operating off four hours of sleep. And five the night before that. So bear with me.
The response the server sends back (in addition to the actual webpage itself) looks like this:
HTTP/1.0 200 OK
Simple enough. Now, to make things a little more complicated, throw a proxy into the mix. This sits in between the client (you) and the server (Google) and routes the requests between them. From the server’s perspective, the requests don’t look any different. But that’s because the proxy (ahem) “tampers” with the requests. Kind of. Sort of.
Here’s what the same request as before looks like when it goes from the client to the proxy:
There’s an absolute URL after the “GET” instruction. It’s the proxy’s job to strip this out before passing the request on to the server, so the server sees a header that looks just like the first one.
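For the curious, here is a minimal sketch of that stripping step. It is not the stripAbsURL() from my proxy: strip_abs_url() is a hypothetical helper, and it assumes the request is a NUL-terminated string whose first line looks like “GET http://host/path HTTP/1.0”. It measures everything from pointers into the original string, so the allocation size and the copy sizes come from the same arithmetic.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper (an illustration, not the post's stripAbsURL()).
 * Returns a newly malloc'd, NUL-terminated copy of `request` with the
 * "scheme://host" part removed from the request target, or NULL if the
 * input has no space after the method or if malloc() fails.
 * The caller owns the result and must free() it exactly once. */
char *strip_abs_url(const char *request)
{
    assert(request != NULL);

    const char *sp = strchr(request, ' ');      /* space after "GET" */
    if (sp == NULL)
        return NULL;

    const char *target = sp + 1;                /* start of the URL */
    const char *scheme_end = strstr(target, "://");
    const char *target_end = strpbrk(target, " \r\n");

    /* No "://" inside the request target: already relative, so copy as-is. */
    if (scheme_end == NULL || (target_end != NULL && scheme_end > target_end)) {
        size_t n = strlen(request) + 1;
        char *copy = malloc(n);
        if (copy != NULL)
            memcpy(copy, request, n);
        return copy;
    }

    /* Skip the host: the path starts at the first '/' after "://". */
    const char *host = scheme_end + 3;
    const char *path = host + strcspn(host, "/ \r\n");

    size_t prefix_len = (size_t)(target - request);  /* "GET " */
    size_t tail_len = strlen(path);                  /* rest of the header */
    size_t need_slash = (*path != '/');              /* "http://host" alone */

    char *out = malloc(prefix_len + need_slash + tail_len + 1);
    if (out == NULL)
        return NULL;

    memcpy(out, request, prefix_len);
    if (need_slash)
        out[prefix_len] = '/';
    memcpy(out + prefix_len + need_slash, path, tail_len + 1); /* with NUL */
    return out;
}
```

Given “GET http://www.google.com/ HTTP/1.0” followed by a blank line, this hands back “GET / HTTP/1.0” plus the same blank line, which is what the server expects to see.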
Enter my problem. *whine*
I use straight memory manipulation to cut out that absolute URL (memcpy(), to be exact), and even though I’ve quadruple-checked the arithmetic, I still think the calculations are incorrect: at fairly random points, but overall very early on, the proxy core dumps (with a SIGABRT), leaving a stack trace and notifications from glibc that vary between “double free()” and “corrupted malloc()”.
The culprit, I think, is the function I wrote to hack the absolute URL out of the header, stripAbsURL(). It essentially works by physically duplicating every bit of the original header, except for the absolute URL, effectively stripping it out. It then returns this result.
For anyone who is interested, here’s the portion of code that Valgrind (a memory checker) keeps flagging as problematic:
start = strchr((char*)header, ' ');
end = ((char*)start) + strlen(toFind) + 1;
memcpy(newHeader, header, ((char*)start - (char*)header + 1));
memcpy(((char*)newHeader + ((char*)start - (char*)header + 1)), end, (oldLength - (strlen(toFind) - 1)));
“header” points to the request header from the client. “toFind” is a duplicate string of the absolute URL so its length can be determined without having to search “header” for it. “oldLength” is the number of bytes (or characters) in “header”. The rest should be (eventually) self-explanatory.
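One way to sanity-check arithmetic like this is to compute, separately, how many bytes actually remain after “end” and how many bytes the second memcpy() is told to copy. The sketch below does that with two hypothetical helpers (tail_bytes_available() and tail_bytes_copied() are not in my proxy). It assumes “toFind” holds the scheme and host without the trailing slash (for example “http://www.google.com”) and that “oldLength” does not count a terminating NUL; if either assumption is off, the numbers shift.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: bytes that really remain in the header after the
 * URL, i.e. from `end` up to header + oldLength. */
size_t tail_bytes_available(const char *header, size_t oldLength,
                            const char *toFind)
{
    const char *start = strchr(header, ' ');
    assert(start != NULL);

    /* Same quantity as (start - header + 1) in the first memcpy(). */
    size_t prefix = (size_t)(start - header) + 1;

    return oldLength - prefix - strlen(toFind);
}

/* Hypothetical helper: the byte count handed to the second memcpy()
 * in the snippet above. */
size_t tail_bytes_copied(size_t oldLength, const char *toFind)
{
    return oldLength - (strlen(toFind) - 1);
}
```

Under those assumptions, the second count is larger than the first by the prefix length plus one, whatever the URL is. For a “GET ” request that is five bytes read past the end of “header” and, if “newHeader” was sized to fit exactly, five bytes written past the end of it. That kind of overrun is consistent with glibc complaining later, inside free(), rather than at the memcpy() itself.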
The proxy crashes shortly after this function is called. It looks something like this:
newHeader = stripAbsURL(header, oldLength, newLength);
free(newHeader); /* CRAAAAASHHHHSDfjskfjga;;dsfkfk/////~~~splat */
Meanwhile, with all the sleep I’m losing on this project’s behalf, my own memory is segfaulting left and right…
Oh. It’s my 23rd birthday in two days. Where’s my cane?
You know who you are. 😉