Hi Nicolas,
> > Unfortunately, the web servers are freezing for some seconds from time
> > to time. The phenomenon occurs at irregular intervals (1-5 minutes),
> > independently on each server.
> I guess this issue is not related to LVS.
No one knows ;) so I posted my questions to all the lists related to the
cluster components. But I was sure that answers from the LVS list would
be more useful than answers from the Apache-users list ;)
> Linux generally freezes because of I/O operations, that is to say
> network I/O or more probably disk I/O.
>
> First thing is to try to identify which process(es) is(are) blocked on
> some I/O. There are many possible issues with apache+php+mysql trio.
I tried running strace on Apache once, but because of the performance
impact the freezes occurred less often then. I caught one with 41 active
Apache processes; during the freeze they were:
- 28 in select() on sockets,
- 3 in read() from sockets,
- 1 in writev() to a socket,
- 1 in shutdown(send) of a socket,
- 1 in lseek() of a file,
- remaining tasks in userspace.
I've attached one of the straces below. As far as I understand,
everything points to a network issue, either hardware or driver, doesn't
it?
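To put numbers on that breakdown, here is a small Python sketch that only adds up the counts listed above. The names (`blocked`, `summarize`) are made up for illustration, and the "socket" classification simply keys off the wording of each list entry.

```python
# Counts reported above for the one freeze caught with strace.
ACTIVE_PROCESSES = 41
blocked = {
    "select() on sockets": 28,
    "read() from sockets": 3,
    "writev() to a socket": 1,
    "shutdown(send) of a socket": 1,
    "lseek() of a file": 1,
}

def summarize(total, counts):
    """Return (in_syscall, in_userspace, on_sockets) for one snapshot."""
    in_syscall = sum(counts.values())
    # Hypothetical classification: any entry whose label mentions a socket.
    on_sockets = sum(n for name, n in counts.items() if "socket" in name)
    return in_syscall, total - in_syscall, on_sockets

in_syscall, in_userspace, on_sockets = summarize(ACTIVE_PROCESSES, blocked)
print(f"{in_syscall} in a system call, {in_userspace} in userspace")
print(f"{on_sockets} of {ACTIVE_PROCESSES} "
      f"({on_sockets / ACTIVE_PROCESSES:.0%}) in socket calls")
```
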
> A 'vmstat 1' and 'pstree -p', before and during freeze, would be very
> useful to diagnose your problem. Also, please include some basic hardware
> information about your realservers.
Here's a 'vmstat 1' output, with timestamps added and formatted to fit
in a mail:
timestamp procs -----------memory----------
           r  b   swpd   free   buff  cache
          ---swap-- -----io----  --system-- ----cpu----
            si   so    bi    bo    in    cs us sy id wa
12:55:15:  0  0   8004  20632 179640 513564
             0    0   140     0  2488  1487 25 12 63  0
12:55:16:  1  1   8004  21160 179656 513644
             0    0    92     0  2859  1795 26 14 60  0
12:55:17:  0  0   8004  20976 179688 513692
             0    0    76     0  3110  1760 35  9 56  0
12:55:36: 72 76   8004  21424 179740 513720
             0    0    44  2892  2805  1421 25 16 59  0
12:55:37: 44  0   8004   7596 178164 504112
             0    0   980 12792 32140 14528 16  6 78  0
12:55:38:  2  0   8004   8248 177952 503232
             0    0   192     0  6297  2909 65 17 18  0
12:55:39:  4  0   8004   8360 177876 503392
             0    0   340     0  4019  2312 36 17 48  0
I've noticed that cache memory usage decreases at every freeze.
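The listing also jumps from 12:55:17 straight to 12:55:36, so no sample was printed for 19 seconds. A rough Python sketch for spotting such gaps in a timestamped 'vmstat 1' log is below. It assumes the timestamps were added as each line was printed and have the HH:MM:SS: form shown above; `find_gaps` and its parameters are hypothetical names.

```python
from datetime import datetime

def find_gaps(timestamps, interval=1.0, slack=1.0):
    """Return (before, after, seconds) for each pair of consecutive samples
    whose spacing exceeds interval + slack seconds."""
    times = [datetime.strptime(t.rstrip(":"), "%H:%M:%S") for t in timestamps]
    gaps = []
    for prev, cur in zip(times, times[1:]):
        delta = (cur - prev).total_seconds()
        if delta < 0:  # the series crossed midnight
            delta += 24 * 3600
        if delta > interval + slack:
            gaps.append((prev.strftime("%H:%M:%S"),
                         cur.strftime("%H:%M:%S"), delta))
    return gaps

# Timestamps copied from the vmstat listing above.
samples = ["12:55:15:", "12:55:16:", "12:55:17:", "12:55:36:",
           "12:55:37:", "12:55:38:", "12:55:39:"]
for before, after, seconds in find_gaps(samples):
    print(f"no sample between {before} and {after} ({seconds:.0f} s)")
```
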
Realservers hardware
====================
CPU: Intel P4 2.7GHz
Board: Tyan (?)
Mem: 1GB (2x512MB PC266 CL2)
HD: IDE Maxtor 6Y080P0, 80GB, UDMA 5, readahead 8, 16 bit IO
Network:
private: Intel 82540EM Gigabit (e1000 driver)
public: Intel 82801BD PRO/100 VE (e100 driver)
Kernel: 2.4.20 (Debian testing)
libc6: 2.3.2 (Debian testing)
Apache: 1.3.29 (vanilla)
PHP: 4.3.3 (vanilla)
Apache DocumentRoot and LogDir on separate ext3 partitions.
An NFS mount is also accessed occasionally by PHP for some centralized
stuff (but I've seen no related access before, during or after the
freezes).
I use the noarp module to prevent the real servers from answering ARP
requests for the VIPs.
strace output
=============
During the freeze, most of the Apache processes froze for about 20
seconds, between 14:01:22 and 14:01:43, but this one was a bit longer:
14:01:06 write(3, "HTTP/1.1 200 OK\r\nDate: Wed, 29 O"..., 2465)
= 2465 <0.000032>
14:01:06 gettimeofday({1067432466, 38726}, NULL)
= 0 <0.000006>
14:01:06 times({tms_utime=77, tms_stime=34, tms_cutime=0, tms_cstime=0})
= 7367521 <0.000006>
14:01:06 shutdown(3, 1 /* send */)
= 0 <0.000009>
14:01:06 select(4, [3], NULL, NULL, {2, 0})
= 1 (in [3], left {1, 470000}) <37.634582>
14:01:43 read(3, "", 512)
= 0 <0.000013>
14:01:43 close(3) = 0 <0.000022>
14:01:43 rt_sigaction(SIGUSR1, {0x4004a600, [],
SA_RESTORER|SA_INTERRUPT, 0x401b35f8}, {SIG_IGN}, 8)
= 0 <0.000008>
14:01:43 close(3)
= -1 EBADF (Bad file descriptor) <0.000006>
14:01:43 semop(327681, 0x8086f74, 1)
= 0 <0.000010>
14:01:43 select(24, [19 20 21 22 23], NULL, NULL, NULL)
= 1 (in [22]) <0.000014>
14:01:43 accept(22, {sa_family=AF_INET, sin_port=htons(2600),
sin_addr=inet_addr("xxx.xxx.xxx.xxx")}, [16])
= 3 <0.000017>
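In the trace above, the select() call was given a 2-second timeout but is reported as taking 37.634582 seconds. A minimal Python sketch for pulling such slow calls out of a longer trace follows. It assumes each call is on a single unwrapped line, as strace writes it with the -t and -T options (the lines above were wrapped for mail); `slow_calls` and the 1-second threshold are illustrative choices.

```python
import re

# "HH:MM:SS name(args...) = result <seconds>" as printed by strace -t -T.
LINE = re.compile(r"^(\d{2}:\d{2}:\d{2}) (\w+)\(.*<(\d+\.\d+)>\s*$")

def slow_calls(lines, threshold=1.0):
    """Return (start_time, syscall, seconds) for calls slower than threshold."""
    found = []
    for line in lines:
        m = LINE.match(line)
        if m and float(m.group(3)) > threshold:
            found.append((m.group(1), m.group(2), float(m.group(3))))
    return found

# Three calls from the trace above, joined back onto one line each.
trace = [
    '14:01:06 shutdown(3, 1 /* send */) = 0 <0.000009>',
    '14:01:06 select(4, [3], NULL, NULL, {2, 0})'
    ' = 1 (in [3], left {1, 470000}) <37.634582>',
    '14:01:43 read(3, "", 512) = 0 <0.000013>',
]
for start, name, seconds in slow_calls(trace):
    print(f"{start} {name} took {seconds:.2f} s")
```
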
I hope you or someone else can get something useful out of this
information... I'm at my wits' end and the pressure from my boss is
increasing with every freeze...
Thanks,
Jan