Hello Sebastien,
Ah, I'm afraid you need to enable some debugging flags in your kernel
(CONFIG_FRAME_POINTER, CONFIG_DEBUG_SPINLOCK_SLEEP,
CONFIG_DEBUG_SPINLOCK, and of course CONFIG_DEBUG_KERNEL).
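A rough way to see which of these options a given configuration file
actually enables is sketched below. The function name check_debug_flags
is made up for illustration, and the file you pass to it is whatever
.config your running kernel was built from.

```shell
#!/bin/sh
# Hypothetical helper: report which of the debugging options named above
# are set to "y" in a kernel configuration file.
check_debug_flags() {
    cfg="$1"
    if [ ! -r "$cfg" ]; then
        echo "cannot read $cfg" >&2
        return 2
    fi
    missing=0
    for opt in CONFIG_DEBUG_KERNEL CONFIG_DEBUG_SPINLOCK \
               CONFIG_DEBUG_SPINLOCK_SLEEP CONFIG_FRAME_POINTER; do
        # Anchor the pattern so CONFIG_DEBUG_SPINLOCK does not also
        # match CONFIG_DEBUG_SPINLOCK_SLEEP.
        if grep -q "^${opt}=y\$" "$cfg"; then
            echo "$opt: enabled"
        else
            echo "$opt: NOT enabled"
            missing=1
        fi
    done
    # Non-zero exit status if at least one option is not enabled.
    return "$missing"
}
```

For example, check_debug_flags /usr/src/linux/.config (the path is an
assumption; use wherever your kernel source tree lives).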
Grepping .config produces the following:
# CONFIG_FRAME_POINTER is not set
CONFIG_DEBUG_SPINLOCK_SLEEP=y
CONFIG_DEBUG_SPINLOCK_SLEEP=y
How can that be ??
CONFIG_DEBUG_KERNEL=y
Well, maybe you can't recompile your kernel but then you have to hook up
a serial console to your crashing machine. Also make sure you're not
running in X while this happens. But I assume someone running a server
would not install X anyways ;).
A bit long, sent privately ...
I've seen it and I don't really think there's anything special in it.
o biosdecode
command not found
Not so important, I saw the relevant pieces in the dmidecode output.
o .config (or possibly zcat /proc/config.gz)
file not found
You must have some .config file for your kernel configuration.
Don't know, I've only seen one stack trace ... and still hoping it won't
crash again.
I thought you had multiple crashes already?
First problem: Alt-SysRq-t produces tons of output, and I can only copy
the last few lines. Second problem: I don't know how to use ksymoops.
You then must hook up a serial connection to your machine. ksymoops is
self-explanatory, just dump your output into a file and run ksymoops < file.
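As a minimal sketch of that last step, assuming the serial output has
already been saved to a file on the second machine: the wrapper name
decode_oops is hypothetical, and the only thing it does beyond
"ksymoops < file" is refuse to run on a missing or empty capture.

```shell
#!/bin/sh
# Hypothetical wrapper around "ksymoops < file": refuse to run when the
# capture file is missing or empty, or when ksymoops is not installed.
decode_oops() {
    log="$1"
    if [ ! -s "$log" ]; then
        echo "no captured output in '$log'" >&2
        return 1
    fi
    if ! command -v ksymoops >/dev/null 2>&1; then
        echo "ksymoops not found in PATH" >&2
        return 127
    fi
    # Decode the addresses in the captured trace into symbol names.
    ksymoops < "$log"
}
```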
Yes, for me the following line in the bootloader configuration helped:
append="pci=noapic"
I'll try that if it crashes again (which is likely as I haven't changed
anything kernel-related since the last crash).
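After rebooting with such an append line, one way to confirm that the
kernel really received the parameter is to look at /proc/cmdline. The
following is only a sketch; the helper name has_boot_param is made up,
and the optional second argument exists purely so it can be tried on a
saved copy of the command line.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the given parameter appears as a whole
# word on the kernel command line (read from /proc/cmdline by default).
has_boot_param() {
    param="$1"
    cmdline_file="${2:-/proc/cmdline}"
    # Put each parameter on its own line, then look for an exact,
    # whole-line match of the requested one.
    tr -s '[:space:]' '\n' < "$cmdline_file" | grep -qxF -- "$param"
}
```

For example: has_boot_param pci=noapic && echo "parameter is active"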
The reason this option helps is that ACPI itself is rather flaky, and
having the PCI routing go over the APIC can cause major havoc on newer
motherboards. Intel is working on providing the necessary patches, but
it seems that the ACPI specification is not exactly a piece of cake,
and there are probably no two motherboard manufacturers that interpret
it in the same way. Since you've got a NIC which has shown major
stability issues in the past (as also noted by others in this thread),
I suspect this could be the problem.
Your remark that the end of the trace showed IPVS-related information is
another indication that at the top of the stack there would have been a
call to the networking API and then on into the NIC driver's hooks.
It's all speculation of course, but we have given you a few suggestions
on how you can narrow down the cause of those unfortunate events.
Best regards and good luck,
Roberto Nibali, ratz
--
echo '[q]sa[ln0=aln256%Pln256/snlbx]sb3135071790101768542287578439snlbxq' | dc