
Re: LVS/Apache cluster freezes from time to time

To: "LinuxVirtualServer.org users mailing list." <lvs-users@xxxxxxxxxxxxxxxxxxxxxx>
Subject: Re: LVS/Apache cluster freezes from time to time
From: Jan Abraham <jan_abraham@xxxxxxx>
Date: Fri, 31 Oct 2003 14:25:29 +0100
Hi Nicolas,

> > Unfortunately, the web servers are freezing for some seconds from time
> > to time. The phenomenon occurs at irregular intervals (1-5 minutes)
> > independently on each server.
> I guess this issue is not related to LVS.

No one knows ;) so I posted my question to all the lists related to the
parts of the cluster. But I was sure that answers on the LVS list would
be more useful than answers on the Apache-users list ;)


> Linux generally freezes because of I/O operations, that is to say 
> network I/O or more probably disk I/O.
> 
> First thing is to try to identify which process(es) is(are) blocked on
> some I/O. There are many possible issues with apache+php+mysql trio. 

I tried running strace on Apache once, but because of the performance
impact the freezes occurred less often. I caught one with 41 active
Apache processes. During the freeze they were
- 28 in select() on sockets,
-  3 in read() from sockets,
-  1 in writev() to a socket,
-  1 in shutdown(send) of a socket,
-  1 in lseek() on a file,
- remaining tasks in userspace.
I've attached one of the straces below. As far as I understand it,
everything points to a network issue, either hardware or driver, doesn't
it?


> A 'vmstat 1' and 'pstree -p', before and during freeze, would be very 
> useful to diagnose your problem. Also, please include some basic hardware 
> information about your realservers.

Here's a 'vmstat 1' output, with timestamps added and reformatted to fit
in a mail:

timestamp procs -----------memory----------
           r  b   swpd   free   buff  cache
                        ---swap--  -----io----  --system-- ----cpu----
                          si   so     bi    bo    in    cs us sy id wa

12:55:15:  0  0   8004  20632 179640 513564    
                           0    0    140     0  2488  1487 25 12 63  0
12:55:16:  1  1   8004  21160 179656 513644
                           0    0     92     0  2859  1795 26 14 60  0
12:55:17:  0  0   8004  20976 179688 513692
                           0    0     76     0  3110  1760 35  9 56  0
12:55:36: 72 76   8004  21424 179740 513720
                           0    0     44  2892  2805  1421 25 16 59  0
12:55:37: 44  0   8004   7596 178164 504112
                           0    0    980 12792 32140 14528 16  6 78  0
12:55:38:  2  0   8004   8248 177952 503232
                           0    0    192     0  6297  2909 65 17 18  0
12:55:39:  4  0   8004   8360 177876 503392
                           0    0    340     0  4019  2312 36 17 48  0

I've noticed that cache memory usage drops with every freeze.
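
The 'vmstat 1' samples above jump from 12:55:17 to 12:55:36, so vmstat
itself produced no output for 19 seconds. A minimal sketch of how such
gaps could be picked out of timestamped samples automatically. The
function name find_gaps and its parameters are made up for this
illustration, and it assumes HH:MM:SS stamps that do not wrap past
midnight:

```python
from datetime import datetime

def find_gaps(stamps, interval=1.0, slack=1.0):
    """Return (before, after, seconds) for every pair of consecutive
    samples that are more than interval + slack seconds apart."""
    times = [datetime.strptime(s, "%H:%M:%S") for s in stamps]
    gaps = []
    for i in range(1, len(times)):
        delta = (times[i] - times[i - 1]).total_seconds()
        if delta > interval + slack:
            gaps.append((stamps[i - 1], stamps[i], delta))
    return gaps

# Timestamps copied from the vmstat output above.
samples = ["12:55:15", "12:55:16", "12:55:17", "12:55:36",
           "12:55:37", "12:55:38", "12:55:39"]
print(find_gaps(samples))
# prints [('12:55:17', '12:55:36', 19.0)]
```

Running the same check on every realserver would show whether the
freezes really happen independently on each machine.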


Realservers hardware
====================

CPU:     Intel P4 2.7GHz
Board:   Tyan (?)
Mem:     1GB (2x512MB PC266 CL2)
HD:      IDE Maxtor 6Y080P0, 80GB, UDMA 5, readahead 8, 16 bit IO
Network:
private: Intel 82540EM Gigabit (e1000 driver)
 public: Intel 82801BD PRO/100 VE (e100 driver)

Kernel: 2.4.20 (Debian testing)
libc6:  2.3.2  (Debian testing)
Apache: 1.3.29 (vanilla)
PHP:    4.3.3  (vanilla)

Apache DocumentRoot and LogDir on separate ext3 partitions.

An NFS mount is also accessed, though rarely, by PHP for some centralized
stuff (but I've seen no related access before, during or after the
freezes).

I use the noarp module to prevent the real servers from sending ARP
replies for the VIPs.



strace output
=============
During the freeze, most of the Apache processes froze for about 20
seconds, from 14:01:22 to 14:01:43, but this one froze a bit longer:

14:01:06 write(3, "HTTP/1.1 200 OK\r\nDate: Wed, 29 O"..., 2465)
         = 2465 <0.000032>
14:01:06 gettimeofday({1067432466, 38726}, NULL)
         = 0 <0.000006>
14:01:06 times({tms_utime=77, tms_stime=34, tms_cutime=0, tms_cstime=0})
         = 7367521 <0.000006>
14:01:06 shutdown(3, 1 /* send */)
         = 0 <0.000009>
14:01:06 select(4, [3], NULL, NULL, {2, 0})         
         = 1 (in [3], left {1, 470000}) <37.634582>
14:01:43 read(3, "", 512)
         = 0 <0.000013>
14:01:43 close(3)         = 0 <0.000022>
14:01:43 rt_sigaction(SIGUSR1, {0x4004a600, [],
         SA_RESTORER|SA_INTERRUPT, 0x401b35f8}, {SIG_IGN}, 8)
         = 0 <0.000008>
14:01:43 close(3)
         = -1 EBADF (Bad file descriptor) <0.000006>
14:01:43 semop(327681, 0x8086f74, 1)
         = 0 <0.000010>
14:01:43 select(24, [19 20 21 22 23], NULL, NULL, NULL)
         = 1 (in [22]) <0.000014>
14:01:43 accept(22, {sa_family=AF_INET, sin_port=htons(2600),
         sin_addr=inet_addr("xxx.xxx.xxx.xxx")}, [16])
         = 3 <0.000017>
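
In the trace above, the only slow call is the select() with a 2-second
timeout, for which strace reports 37.6 seconds. A minimal sketch of how
slow calls could be filtered out of a longer trace. It assumes each call
is on one unwrapped line (the lines above were wrapped for mail) that
ends with the duration in angle brackets; the name slow_calls and the
1-second threshold are arbitrary choices for this example:

```python
import re

# timestamp, syscall name, and the trailing <seconds> duration
LINE = re.compile(r"^(\d{2}:\d{2}:\d{2}) (\w+)\(.*<(\d+\.\d+)>\s*$")

def slow_calls(lines, threshold=1.0):
    """Return (timestamp, syscall, seconds) for calls that took at
    least `threshold` seconds; lines that do not match are skipped."""
    hits = []
    for line in lines:
        m = LINE.match(line)
        if m and float(m.group(3)) >= threshold:
            hits.append((m.group(1), m.group(2), float(m.group(3))))
    return hits

# Three calls from the trace above, joined back onto single lines.
trace = [
    '14:01:06 shutdown(3, 1 /* send */) = 0 <0.000009>',
    '14:01:06 select(4, [3], NULL, NULL, {2, 0}) = 1 (in [3], left {1, 470000}) <37.634582>',
    '14:01:43 read(3, "", 512) = 0 <0.000013>',
]
print(slow_calls(trace))
# prints [('14:01:06', 'select', 37.634582)]
```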


I hope you or someone else can get something useful out of this
information... I'm at my wits' end and the pressure from my boss is
increasing with every freeze...

Thanks,
Jan

