Process of Elimination

(this is a continuation of the post about computer issues I made last month)

I am still having problems. The same problems, in fact: random freezes that tend to correlate with periods of intense computational activity (gaming, stress testing, multiple virtual machines, etc.). After a CPU replacement, memory tests, and two motherboard RMAs, I am wiping the slate clean and starting from scratch in trying to pin down the cause. I’m laying out all the culprits that could be causing freezes like this, and I’ll examine the plausibility of each one given the tests I have performed so far.

I’m posting this here so that anyone with any suggestions can toss out their $0.02.

Software

  • Windows XP
  • Windows Vista
  • Hardware driver
  • Other (video games, applications, etc)

Hardware

  • Power supply
  • Memory
  • Graphics cards
  • CPU
  • IDE components
  • Hard drive
  • Motherboard
  • Case

Now. Let’s dissect these one by one, examining the plausibility of each given what has been done so far.

Windows XP: Negative

This was the first operating system I installed after assembling Ronon, and the freezes were present here. Plus, I have not used XP since mid-fall.

Windows Vista: Very Unlikely

The freezes have continued to plague this operating system as well as XP, implying that the problem is operating system-independent. Still, tests with another operating system like Ubuntu may be prudent.

Hardware Driver: Unlikely

This is always a possibility. There could be some driver that is misbehaving and causing the freezes. Still, the pattern of freezes – the fact that they disappeared for months, then reappeared without any changes in the software configuration, plus the fact that freezes are more prevalent during periods of high activity – implies a cause that is elsewhere.

Other: Negative

There is a very, very slight possibility that this is the case, but it’s extremely remote. The fact that the freezes have occurred while various programs have been running eliminates any single program as the cause. Freezes have occurred with and without web browsing, with and without VMs running, and so on.

Power Supply: Possible

It’s possible that a loose cable, or even microsurges from a faulty power supply, is the cause of the problems. This could be examined by reseating all the power supply connections to the motherboard, and checking in the BIOS that proper voltages are being sent to the other components (RAM and CPU in particular). Still, a faulty power supply does not usually result in infrequent and unpredictable freezes; it usually causes total system failures.
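The voltage check described above could be scripted once the readings are copied out of the BIOS hardware monitor. The sketch below assumes the usual ATX tolerance of ±5% on the positive rails; the function name and the sample readings are made up for illustration and are not measurements from this machine.

```python
# Nominal ATX rail voltages; +/-5% is the tolerance assumed here.
NOMINAL_RAILS = {"+12V": 12.0, "+5V": 5.0, "+3.3V": 3.3}
TOLERANCE = 0.05


def out_of_spec(readings, tolerance=TOLERANCE):
    """Return rails whose reading is off nominal by more than the tolerance."""
    bad = {}
    for rail, nominal in NOMINAL_RAILS.items():
        measured = readings.get(rail)
        if measured is None:
            continue  # rail not reported by the BIOS
        deviation = abs(measured - nominal) / nominal
        if deviation > tolerance:
            bad[rail] = round(deviation * 100, 1)  # percent off nominal
    return bad


# Hypothetical readings, typed in by hand from the BIOS screen.
sample = {"+12V": 11.2, "+5V": 5.08, "+3.3V": 3.31}
print(out_of_spec(sample))
```

Keep in mind that readings taken while sitting idle in the BIOS will not show a rail sagging under load, which is when the freezes tend to happen.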

Memory: Very Unlikely

This is a possibility, as freezes are a classic symptom of faulty RAM. However, in all likelihood, if this were the case it would have been highlighted after running memtest86 for hours straight on the memory; those tests revealed nothing. Furthermore, using only one of the two RAM sticks, and swapping them in between crashes, indicated nothing: either both are faulty or neither of them is. Adjusting voltages manually in the BIOS has also not yielded any change. Still, I’ll be returning these as per Corsair’s generous RMA offer.

Graphics Cards: Very Unlikely

Like memory, a classic symptom of misbehaving graphics cards is random freezes and crashes. Swapping the cards out and using one at a time has been inconclusive; crashes have occurred with each card individually, with both cards installed together, and with both cards in parallel (SLI). Either both cards are faulty, or it is a rare case of graphics card / motherboard incompatibility. The latter seems far-fetched, given that the motherboard uses an nVidia chipset.

CPU: Possible

This was my first culprit, due to cooling issues I’d had early on. However, Core 2 Quads typically run hotter than most CPUs, and I also upgraded my heatsink in addition to RMAing my CPU with Intel. The replacement had no effect. The only symptom that points to a possible CPU issue now is the fact that dropping the FSB in the BIOS from its stock 1333MHz to 1066MHz stabilizes the system; raising the FSB to stock speeds with one DIMM results in immediate BSODs, and with two DIMMs results in the freezing that has characterized this whole issue.
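For reference, here is what that FSB drop does to the core clock. This is a rough sketch, assuming the FSB figures are the quad-pumped rating (so the base clock is a quarter of the quoted number) and that the CPU is a Q9300 with its 7.5x multiplier; the function name is just for illustration.

```python
def core_clock_mhz(fsb_rated_mhz, multiplier, pump_factor=4):
    """Core clock = (rated FSB / pump factor) * CPU multiplier."""
    base_clock = fsb_rated_mhz / pump_factor
    return base_clock * multiplier


Q9300_MULTIPLIER = 7.5  # assumed: stock multiplier for a Core 2 Quad Q9300

for fsb in (1333, 1066):
    clock = core_clock_mhz(fsb, Q9300_MULTIPLIER)
    print(f"FSB {fsb} MHz -> core clock ~{clock:.0f} MHz")
```

Under those assumptions the CPU runs at roughly 2.5GHz at the stock FSB and roughly 2.0GHz at the lowered setting, so the stable configuration is also a noticeably slower one.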

IDE Components: Negative

This is an unlikely scenario anyway, since IDE drives typically don’t cause freezes like this. Still, even after unplugging their cables from the motherboard, the freezes continued.

Hard Drive: Negative

This is a weak possibility, as no clicking or grinding has been observed from the main hard drive. Testing could be done with the spare hard drive (also testing the OS postulate from before). Still, other than the freezes, everything operates normally, which is not typical hard drive misbehavior. Slowdowns would occur, and hangs would be much more frequent and random. As it is, freezes are random but gravitate toward periods of high processing activity, not necessarily high hard drive activity.

Motherboard: Possible

This was my chief culprit until the motherboard came back from Asus and the problem persisted. The motherboard was fixed (according to Asus) and returned. I find it unlikely that they would have returned it to me with the same problem, so either any problem was fixed, or there never was a problem. Still, the symptoms – freezing and BSODs – point to hardware problems, and since these can be incited specifically through removal of DIMMs and tweaking the CPU’s FSB, the common thread in those two actions is the motherboard.

Case: Very Unlikely

I’m actually not even sure what the case itself could be doing to crash the system, but given its role in housing the hardware, I suppose it’s possible that a short somewhere might be behind these issues.
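The verdicts above boil down to a short list of remaining suspects. As a rough sketch (the ranking order and helper names are illustrative, not any kind of standard), they could be tallied like this:

```python
# Verdict labels, ranked from most to least suspicious.
RANK = {"Possible": 0, "Unlikely": 1, "Very Unlikely": 2, "Negative": 3}

# Verdicts copied from the section headings above.
verdicts = {
    "Windows XP": "Negative",
    "Windows Vista": "Very Unlikely",
    "Hardware driver": "Unlikely",
    "Other software": "Negative",
    "Power supply": "Possible",
    "Memory": "Very Unlikely",
    "Graphics cards": "Very Unlikely",
    "CPU": "Possible",
    "IDE components": "Negative",
    "Hard drive": "Negative",
    "Motherboard": "Possible",
    "Case": "Very Unlikely",
}


def ranked(verdicts):
    """Sort (component, verdict) pairs from most to least suspicious."""
    return sorted(verdicts.items(), key=lambda item: RANK[item[1]])


def still_suspect(verdicts, level="Possible"):
    """List the components whose verdict matches the given level."""
    return [name for name, verdict in verdicts.items() if verdict == level]


for name, verdict in ranked(verdicts):
    print(f"{verdict:<14}{name}")
```

Calling `still_suspect(verdicts)` leaves the power supply, the CPU, and the motherboard as the components worth testing next.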


13 Responses to Process of Elimination

  1. magsol says:

    Update:

    I manually changed the voltages in the BIOS from AUTO to 2.11V (stock is 2.1V) for the DRAM. Freezes still occurred, so I set the timings as well, changing them from AUTO to 5-5-5-15 (stock for my memory). Freezes still occurred.

    At this point, it is very unlikely that the problem lies with the graphics cards, as freezing occurs with more frequency when I have VMs running, and those do not put additional stress on the GPUs.

    Things still to try:
    -change the DRAM command rate from AUTO to 2T (stock for my memory)
    -change VCore voltage from AUTO to Q9300 stock
    -change DRAM tRFC from AUTO to 51-60 (stock for my motherboard)

    More suggestions are welcome!

  2. Colin says:

    Man you’ve got more patience than I do. I would have taken a baseball bat to it by now.

    • magsol says:

      I would say it’s because this was my GT graduation gift, but that would be a non sequitur. I would say it’s because I love computers, but nobody loves computers this much.

      If I’m honest with myself, there’s really no other logical explanation other than that I’m pretty much crazy. 😛

    • Matt Roe says:

      Yeah, my “process of elimination” would have been to eliminate the computer… violently.

  3. Matt Roe says:

    Hmm… Now that I think about it, I think I had similar issues with the first computer I ever built back in high school. Random freezes and such. Fortunately, before I could get too pissed or try any process of elimination, the power supply got excited and let all its smoke loose.
    Now granted that was a really cheap PSU, but it seems that most of my subsequent weird computer issues could be tied back to the power supply. So much so that I keep an extra on hand for this sort of testing. I’d let you borrow it but… yeah.
    Good luck!

    • magsol says:

      I’m really leaning heavily towards the PSU myself. Corsair was kind enough to offer to replace the DIMMs simply for the sake of eliminating them as a potential candidate. That would leave only the motherboard and PSU suspect, and since Asus insists the motherboard is fine, well…

      I just did a hell of a lot of BIOS voltage tweaking last night, to no avail. Tweaked the DRAM, the CPU, the timings, the tRFC (didn’t even know what that was before now)…nothing. All I did was either drastically increase the frequency of freezing (as in, “Welcome to Win–“), or dial it back to 30 minutes successful stress testing before it conked out again.

      I got some advice from a Belgian friend of mine who is a hardware expert, and on his recommendation I have both a motherboard and a PSU lined up for purchase on NewEgg, should I definitively prove one of them bunk, or just get fed up and replace them both. So I guess I can rest assured knowing that, at the very least, another month and $350 from now I’ll have a functioning desktop…

  4. Strully says:

    I’ma go with power supply- that’s been it for me (on desktops) in at least two separate cases. Also, I’m not sure what the voltage discrepancy to the RAM is all about, is it that not enough is being supplied? That looks like a thing too.

    • magsol says:

      The thing is, with faulty PSUs, you usually get complete failure to even boot, not intermittent freezes or crashes. If it is the PSU, it’d be the first time I (and numerous other experts to boot) have ever heard of such behavior.

      Most RAM requires 1.8V to function at stock speeds, so most motherboards supply that by default. My memory requires 2.1V (it’s uber high-performance stuffs). While motherboards these days are technically smart enough to figure out what voltages each component requires, sometimes it’s nice to help the board out a bit by giving it specific voltage levels, hence I set the RAM’s voltage manually to 2.11V.

      It had no effect, so the point is moot anyway, but just FYI.

  5. magsol says:

    Update:

    As per the requests of the members of the Corsair support forums, I have run multiple memtest+ runs on various memory configurations, all passing. I have also changed the voltages in the BIOS for HT and NB to 1.22V and 1.4V, respectively. The tests yielded no failures, and the voltage changes did not stop the freezes.

    However! Once I dropped the CPU’s FSB from its stock 1333MHz to 1066MHz (to match the memory speed 1:1), the system stabilized. I’ve been running it for days at full load and no freezes. Following some of the memory tests, whenever I would raise the FSB back to its stock 1333, the system would BSOD shortly after POSTing, just before reaching the Windows welcome screen.

    This is the first time I’ve been met with a BSOD. Furthermore, the system is perfectly stable with the FSB at 1066MHz (an effective 2.0GHz processor). Raising the voltage for the CPU does nothing.

    Furthermore, last night I installed an update to my nVidia graphics driver (182.06 to 182.08). For some reason, this stabilized my system somewhat at the stock 1333MHz FSB, but only temporarily. It will still freeze after several hours, but the BSODs are gone again, and it takes much longer for the system to freeze up.

    I am so confused.

