[Twisted-Python] Re: strange server crash

Mon Mar 23 15:23:32 EDT 2009

"Alec Matusis" <matusis at yahoo.com> writes:

> I tested for bad RAM when the server was installed 3 months ago;
> I ran a memtest for a day.  This is an 8-core server, and I
> run one twistd process per core, and surprisingly, only one crashed
> out of 8. Would you think that the effect of bad RAM would be
> confined to just one server out of 8?

Sure, since it totally depends on just what bit of memory was faulty.
Just about any behavior you can think of can happen.  It could be
anything from a simple, quick SEGV in a single process, to an
inexplicable failure at some point post-corruption if the corrupted
memory wasn't in code, up to a full kernel crash if it just happened
to hit a key kernel structure or code path.  And if the fault isn't a
hard failure, you can keep running until the system just happens to
use that memory again.

Of course, that's not to say the problem has to be memory or hardware
related, but the more inexplicable the behavior and/or system state at
the point of crash, the more I'd be inclined to consider it.

I will also say that, for some reason, I've run into more memory chip
failures in systems in the past year or so than in at least the prior
5+ years, using consistent, name-brand sources for the chips.  Maybe
the increasing densities (with most non-server systems still being
non-ECC) or the heavier use that systems now make of memory are at
fault, but regardless, I'm more likely nowadays to consider faulty
memory and run a scan than I would have been a few years ago.

-- David