Re: Lost packets and dead/warntime

To:	"LinuxVirtualServer.org users mailing list." <lvs-users@xxxxxxxxxxxxxxxxxxxxxx>
Subject:	Re: Lost packets and dead/warntime
From:	"Sebastian Vieira" <sebvieira@xxxxxxxxx>
Date:	Fri, 1 Sep 2006 10:27:28 +0200

On 8/18/06, Graeme Fowler <graeme@xxxxxxxxxxx> wrote:


> Beyond ensuring that the machines' network settings are good, that
> they're not accumulating errors at the hardware level (check ifconfig
> output), and that they're not interrupting themselves off the planet
> (/proc/interrupts is a good place to start), I have no idea.



Hi. Sorry for the late reply. Work work work and no play.

I've checked ifconfig output and see this:

eth2      Link encap:Ethernet  HWaddr 00:02:A5:08:E3:73
         inet addr:172.16.0.102  Bcast:172.16.255.255  Mask:255.255.0.0
         UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
         RX packets:76727063 errors:3044 dropped:0 overruns:0 frame:3044
         TX packets:76774485 errors:0 dropped:0 overruns:0 carrier:0
         collisions:0 txqueuelen:1000
         RX bytes:3882941796 (3703.0 Mb)  TX bytes:3900571779 (3719.8 Mb)


eth2      Link encap:Ethernet  HWaddr 00:02:A5:09:79:CD
         inet addr:172.16.0.101  Bcast:172.16.255.255  Mask:255.255.0.0
         UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
         RX packets:1432209 errors:156 dropped:0 overruns:0 frame:156
         TX packets:1432784 errors:0 dropped:0 overruns:0 carrier:0
         collisions:0 txqueuelen:1000
         RX bytes:232381530 (221.6 Mb)  TX bytes:230872325 (220.1 Mb)
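
To put the counters above in proportion, here is a minimal Python sketch that pulls the packet, error and frame counts out of an `RX packets:` line and expresses the errors as a fraction of received packets. The function name `rx_error_rate` is made up for illustration, and the pattern assumes the net-tools ifconfig layout shown above.

```python
import re

def rx_error_rate(rx_line):
    """Return (packets, errors, frame, errors/packets) for an ifconfig RX line."""
    m = re.search(r"RX packets:(\d+) errors:(\d+).*?frame:(\d+)", rx_line)
    if m is None:
        raise ValueError("not an ifconfig RX counters line")
    packets, errors, frame = (int(g) for g in m.groups())
    rate = errors / packets if packets else 0.0
    return packets, errors, frame, rate

# The two RX lines quoted above, keyed by the address of each eth2.
rx_lines = {
    "172.16.0.102": "RX packets:76727063 errors:3044 dropped:0 overruns:0 frame:3044",
    "172.16.0.101": "RX packets:1432209 errors:156 dropped:0 overruns:0 frame:156",
}

for host, line in rx_lines.items():
    packets, errors, frame, rate = rx_error_rate(line)
    print(f"{host}: {errors} errors / {packets} packets = {rate:.4%} "
          f"(frame errors: {frame})")
```

For these two interfaces that works out to roughly 0.004% and 0.011% of received packets, and in both cases every RX error is also counted as a frame error.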


Now I don't know for sure where the errors come from, or what 'frame' means,
but I'm sure it's not very good. I've looked into /proc/interrupts and I see
that on one box all NICs are sharing IRQ 15, on the other IRQ 11. But there's
a huge number in front of the interrupt that keeps changing (increasing). I
suppose that's not very good either:

11:  369029512          XT-PIC  eth2, eth0, eth1

15:    3131945          XT-PIC  eth2, eth0, eth1
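
The number in that column is a running count of interrupts since boot, so it is expected to keep increasing; what matters more is how fast it grows. A minimal Python sketch for reading such a line and turning two samples into a rate follows. The function names are made up for illustration, the pattern assumes the single-CPU layout shown above, and the second sample in the example is a hypothetical value, not a measurement from these machines.

```python
import re

def parse_irq_line(line):
    """Split a single-CPU /proc/interrupts line into its fields."""
    m = re.match(r"\s*(\d+):\s+(\d+)\s+(\S+)\s+(.+)$", line)
    if m is None:
        raise ValueError("not a single-CPU /proc/interrupts line")
    irq, count, controller, devices = m.groups()
    return int(irq), int(count), controller, [d.strip() for d in devices.split(",")]

def interrupts_per_second(count_before, count_after, seconds):
    """Average interrupt rate between two readings of the same counter."""
    return (count_after - count_before) / seconds

irq, count, controller, devices = parse_irq_line(
    "11:  369029512          XT-PIC  eth2, eth0, eth1")
print(irq, count, controller, devices)

# Hypothetical second reading taken 10 seconds later, for illustration only.
later = count + 50000
print(interrupts_per_second(count, later, 10), "interrupts/s")
```

Because eth0, eth1 and eth2 share one IRQ line on each box, the counter is the total for all three NICs and cannot be attributed to a single interface.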


> It still sounds to me like the fault lies below the application layer.
>
> Speaking of interrupts; you say you have eth0/1 bonded. Please make sure
> that you haven't got several hundred megs worth of traffic looping
> around your ethernet because of that.

I would love to, but I don't know how.

> If you have you could be dropping
> packets simply because your kernels cannot keep up with the traffic load
> - a layer 2 loop somewhere could cause an effective DoS condition like
> this quite trivially.
>
> What mode is your bond interface in?



active-backup (active/slave)
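
One way to confirm the mode and see which slave is currently carrying traffic is to read the status file that the Linux bonding driver exposes under /proc/net/bonding/. The Python sketch below parses that text. The function name is made up for illustration, the bond name bond0 mentioned afterwards is an assumption, and the sample text is illustrative only, not output from the machines in this thread.

```python
def parse_bonding_status(text):
    """Extract mode, active slave and slave list from bonding status text."""
    status = {"mode": None, "active_slave": None, "slaves": []}
    for raw in text.splitlines():
        key, sep, value = raw.partition(":")
        if not sep:
            continue  # skip blank lines and lines without a "key: value" shape
        key, value = key.strip(), value.strip()
        if key == "Bonding Mode":
            status["mode"] = value
        elif key == "Currently Active Slave":
            status["active_slave"] = value
        elif key == "Slave Interface":
            status["slaves"].append(value)
    return status

# Illustrative sample only: NOT taken from the machines in this thread.
sample = """\
Bonding Mode: fault-tolerance (active-backup)
Currently Active Slave: eth0
Slave Interface: eth0
Slave Interface: eth1
"""

print(parse_bonding_status(sample))
```

On a real box the input would come from something like `open("/proc/net/bonding/bond0").read()`, assuming the bond is named bond0. In active-backup mode only the currently active slave should be passing traffic, which gives a starting point for checking whether frames are looping between the two ports.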

> I've never used heartbeat, so I can't really suggest anything else.
>
> Anyone else got any clever ideas?
>
> Graeme



Thanks,

Sebastian
