I've seen several machines running 2.2.12/2.2.13 kernels lose their network
connections after a long period of fine operation. Tonight our main LVS box
fell off the net. When I visited the box, it had not crashed at all, but it was
no longer communicating via its Ethernet port (Intel eepro100).
The evil evidence:
eth0      Link encap:Ethernet  HWaddr 00:90:27:50:A8:DE
          inet addr:172.16.0.20  Bcast:172.16.255.255  Mask:255.255.0.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:15 errors:288850 dropped:0 overruns:0 frame:0
          TX packets:2147483647 errors:1 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:100
          Interrupt:10 Base address:0xd000
Check out the TX packets number! That's 2^31-1.
Prior to the rollover, inbound and outbound packet counts were roughly equal.
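
For anyone who wants to check the arithmetic, here is a minimal sketch that
pulls the TX count out of the ifconfig line above and compares it with the
largest value a signed 32-bit integer can hold. The helper name tx_packets is
made up for illustration, and this only confirms the number itself. It does not
show that the driver or the kernel actually keeps the counter in a signed
32-bit type.

```python
import re

# Largest value representable in a signed 32-bit integer.
INT32_MAX = 2**31 - 1


def tx_packets(ifconfig_line):
    """Extract the TX packet count from an ifconfig 'TX packets:' line.

    Hypothetical helper for illustration; it is not part of any tool.
    """
    match = re.search(r"TX packets:(\d+)", ifconfig_line)
    if match is None:
        raise ValueError("no 'TX packets:' field found")
    return int(match.group(1))


# The TX line copied from the ifconfig output in this report.
line = "TX packets:2147483647 errors:1 dropped:0 overruns:0 carrier:0"
count = tx_packets(line)

print(count == INT32_MAX)  # True
print(hex(count))          # 0x7fffffff
```
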
I think this has happened to non-LVS systems as well, on 2.2 kernels.
Bringing eth0 down and back up with ifconfig did nothing. A reboot (ugh) was necessary.
This is absolutely dreadful. Is this 'feature' a property of the eepro100
driver, or something in the kernel itself? Either way, it's a killer.