On 8/18/06, Graeme Fowler <graeme@xxxxxxxxxxx> wrote:
Beyond ensuring that the machines' network settings are good, that
they're not accumulating errors at the hardware level (check ifconfig
output), and that they're not interrupting themselves off the planet
(/proc/interrupts is a good place to start), I have no idea.
Hi. Sorry for the late reply. Work work work and no play.
I've checked ifconfig output and see this:
eth2      Link encap:Ethernet  HWaddr 00:02:A5:08:E3:73
          inet addr:172.16.0.102  Bcast:172.16.255.255  Mask:255.255.0.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:76727063 errors:3044 dropped:0 overruns:0 frame:3044
          TX packets:76774485 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:3882941796 (3703.0 Mb)  TX bytes:3900571779 (3719.8 Mb)

eth2      Link encap:Ethernet  HWaddr 00:02:A5:09:79:CD
          inet addr:172.16.0.101  Bcast:172.16.255.255  Mask:255.255.0.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:1432209 errors:156 dropped:0 overruns:0 frame:156
          TX packets:1432784 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:232381530 (221.6 Mb)  TX bytes:230872325 (220.1 Mb)
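
To put those error counts in proportion, here is a minimal Python sketch that pulls the counters out of an RX line in the format pasted above and works out what share of received packets had errors. The name `rx_error_ratio` is made up for illustration, and the regular expression assumes the classic net-tools ifconfig layout.

```python
import re

# Matches the RX counter line of the classic net-tools ifconfig output.
RX_LINE = re.compile(
    r"RX packets:(\d+) errors:(\d+) dropped:(\d+) overruns:(\d+) frame:(\d+)"
)

def rx_error_ratio(ifconfig_text):
    """Return (packets, errors, frame, errors / packets) for the first RX line."""
    m = RX_LINE.search(ifconfig_text)
    if m is None:
        raise ValueError("no RX counter line found")
    packets, errors, _dropped, _overruns, frame = map(int, m.groups())
    ratio = errors / packets if packets else 0.0
    return packets, errors, frame, ratio

if __name__ == "__main__":
    # The two RX lines from the boxes above.
    for line in (
        "RX packets:76727063 errors:3044 dropped:0 overruns:0 frame:3044",
        "RX packets:1432209 errors:156 dropped:0 overruns:0 frame:156",
    ):
        packets, errors, frame, ratio = rx_error_ratio(line)
        print(f"{errors} errors in {packets} packets = {ratio:.4%} (frame: {frame})")
```

For the two boxes this works out to roughly 0.004% and 0.011% of received packets. As far as I know, `frame` counts received frames with framing or alignment problems, which usually points at cabling, a duplex mismatch or the NIC itself rather than at anything in the application.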
Now I don't know for sure where the errors come from, or what 'frame' means,
but I'm sure it's not very good. I've looked into /proc/interrupts and I see
that on one box all the NICs share IRQ 15, and on the other IRQ 11. There's
also a huge number in front of each interrupt that keeps increasing. I
suppose that's not very good either:
11: 369029512 XT-PIC eth2, eth0, eth1
15: 3131945 XT-PIC eth2, eth0, eth1
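
The counter in /proc/interrupts only ever grows, so the total by itself says little; what matters is how fast it grows. A rough Python sketch of measuring that, assuming the usual layout of an IRQ number, a colon, one counter per CPU and then the controller name. The helper names `irq_count` and `irq_rate` are hypothetical.

```python
import time

def irq_count(interrupts_text, irq):
    """Sum the per-CPU counters for one IRQ number in /proc/interrupts text."""
    for line in interrupts_text.splitlines():
        label, _, rest = line.partition(":")
        if label.strip() != str(irq):
            continue
        total = 0
        for field in rest.split():
            if not field.isdigit():
                break  # first non-numeric field is the controller name
            total += int(field)
        return total
    raise KeyError(f"IRQ {irq} not found")

def irq_rate(irq, interval=5.0, path="/proc/interrupts"):
    """Return interrupts per second for one IRQ, sampled over `interval` seconds."""
    with open(path) as f:
        before = irq_count(f.read(), irq)
    time.sleep(interval)
    with open(path) as f:
        after = irq_count(f.read(), irq)
    return (after - before) / interval
```

Calling `irq_rate(11)` on the one box and `irq_rate(15)` on the other, once while idle and once under load, should show whether the shared interrupt line is really being hammered or the large total is just the result of long uptime.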
It still sounds to me like the fault lies below the application layer.
Speaking of interrupts; you say you have eth0/1 bonded. Please make sure
that you haven't got several hundred megs worth of traffic looping
around your ethernet because of that. If you have you could be dropping
packets simply because your kernels cannot keep up with the traffic load
- a layer 2 loop somewhere could cause an effective DoS condition like
this quite trivially.
I would love to, but I don't know how.
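
One possible way to look for looping traffic, sketched in Python: sample the packet counters in /proc/net/dev twice and print packets per second for each interface. A loop would presumably show up as a very high, sustained rate on eth0/eth1 even when the cluster is doing nothing. The function names are made up for illustration, and the field positions assume the standard /proc/net/dev column order (RX packets second, TX packets tenth).

```python
import time

def read_packet_counts(net_dev_text):
    """Map interface name -> (rx_packets, tx_packets) from /proc/net/dev text."""
    counts = {}
    for line in net_dev_text.splitlines()[2:]:  # skip the two header lines
        name, sep, rest = line.partition(":")
        fields = rest.split()
        if not sep or len(fields) < 10:
            continue
        counts[name.strip()] = (int(fields[1]), int(fields[9]))
    return counts

def packet_rates(interval=5.0, path="/proc/net/dev"):
    """Return interface -> (rx packets/s, tx packets/s) over `interval` seconds."""
    with open(path) as f:
        before = read_packet_counts(f.read())
    time.sleep(interval)
    with open(path) as f:
        after = read_packet_counts(f.read())
    return {
        name: ((after[name][0] - rx) / interval, (after[name][1] - tx) / interval)
        for name, (rx, tx) in before.items()
        if name in after
    }
```

Running `packet_rates()` on both boxes while they are idle gives a baseline; thousands of packets per second on an idle bonded pair would be worth chasing at the switch.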
What mode is your bond interface in?
active-slave
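
The kernel reports what the bond is actually doing in /proc/net/bonding/<bond name>, so the mode can be double-checked there; "active-slave" most likely corresponds to what the bonding driver calls active-backup (mode 1), but that is a guess. A small Python sketch for reading the relevant lines follows. The bond name `bond0` in the usage note and the function name `bonding_summary` are assumptions.

```python
def bonding_summary(bonding_text):
    """Pick the mode, active slave and link status out of a bonding status file."""
    wanted = ("Bonding Mode", "Currently Active Slave", "MII Status")
    summary = {}
    for line in bonding_text.splitlines():
        key, sep, value = line.partition(":")
        key = key.strip()
        # Keep only the first occurrence: the bond-level entry comes
        # before the per-slave sections that repeat "MII Status".
        if sep and key in wanted and key not in summary:
            summary[key] = value.strip()
    return summary
```

Typical use would be `bonding_summary(open("/proc/net/bonding/bond0").read())`, run on both boxes to confirm they agree on the mode and on which slave is active.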
I've never used heartbeat, so I can't really suggest anything else.
Anyone else got any clever ideas?
Graeme
Thanks,
Sebastian