LVS and NFS

To: lvs-users@xxxxxxxxxxxxxxxxxxxxxx
Subject: LVS and NFS
From: Steven Lang <slang@xxxxxxxxxxx>
Date: Wed, 26 Sep 2001 13:44:38 -0700

I know this has been discussed before, but has anybody actually done much with it?

The setup I have is as follows:

Clients (4):
        PIII 933
        3x100Mbit Ethernet, bonded (see the bonding sketch after this list)
Director (1):
        PIII 933
        1x100Mbit Ethernet
Realservers (2):
        PIII 933
        1x100Mbit Ethernet
        1Gbit Fibre Channel
        GFS shared filesystem cluster running on a hardware RAID array
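
The client NICs are bonded roughly like this (typed from memory, so treat it as a
sketch; the interface names, addresses, and module options are placeholders and
depend on the bonding driver version):

        # /etc/modules.conf
        alias bond0 bonding
        options bond0 mode=0 miimon=100

        # bring up the bond and enslave the three NICs
        ifconfig bond0 10.0.1.1 netmask 255.255.255.0 up
        ifenslave bond0 eth0 eth1 eth2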

The primary protocol I am interested in here is NFS.  I have the director set 
up for DR with LC scheduling, no persistence, and UDP connections timing out 
after 5 seconds.  I figured the only time a client really needs to keep hitting 
the same host is while reading a file, so that the servers are not all in 
contention for the same file, which seems to cost performance in GFS.  Those 
accesses all come in a quick series, so there is not much need to keep the 
traffic on the same host beyond 5 seconds.
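
For reference, the director config amounts to roughly the following (again from 
memory, so treat it as a sketch; the VIP and realserver addresses are 
placeholders, and the timeout syntax depends on the ipvsadm/kernel version):

        # UDP virtual service for NFS on the VIP, least-connection scheduling,
        # no persistence, direct routing (-g) to both realservers
        ipvsadm -A -u 192.168.1.100:2049 -s lc
        ipvsadm -a -u 192.168.1.100:2049 -r 192.168.1.11 -g
        ipvsadm -a -u 192.168.1.100:2049 -r 192.168.1.12 -g

        # drop the UDP timeout to 5 seconds (arguments are tcp tcpfin udp;
        # a 0 leaves that value at its current setting)
        ipvsadm --set 0 0 5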

Now initially it was fine.  Because both realservers have the same view of the 
filesystem on the same device, NFS is perfectly happy being served from the two 
hosts even though each client only performed a single mount.
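
For anyone who wants to reproduce the setup, the export side is nothing special: 
both realservers export the same GFS mount point and the clients mount the VIP 
once.  The paths and the fsid= option below are illustrative; fsid= should only 
matter if the shared device does not show up with the same device number on 
both hosts, since the file handles have to match between the two servers:

        # /etc/exports, identical on both realservers
        /gfs    *(rw,no_root_squash,fsid=1)

        # on each client, a single NFS mount against the VIP
        mount -t nfs -o nfsvers=3,udp 192.168.1.100:/gfs /mnt/gfs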

However, I am starting to see a few funny things, and I don't know whether they 
are caused by GFS, NFS, or the fact that I am load balancing both.  The first 
problem showed up when I started a stress test to see how the performance was.  
As expected, running through LVS matched the performance I got when I manually 
split the traffic between the NFS servers.  However, I was also writing the log 
file from this test across LVS onto the GFS filesystem, and it was in this log 
file that I started to see problems.  As the file was being written, 
occasionally a chunk of data would be replaced with all zeros.  My initial 
thought was that the writes were going to different GFS hosts, and that one did 
not yet have the previously written data.  But AFAIK GFS is pretty good about 
synchronizing data, and in computer terms 5 seconds is a long time, which is 
how long the data would need to stay out of sync, since that is the minimum 
before a connection can switch to the other server for the next write.

So I stopped that test and took NFS and LVS out of the picture, instead running 
the stress test coordinator on one of the realservers and writing the log 
directly to the shared filesystem, in a different file.  However, the host I 
*had* been using to write the log is now reporting a stale NFS file handle on 
the file it had been writing.  All the other hosts see it fine.

I'm tempted to think this is a GFS bug, but I am not ruling anything out at 
the moment.

On another note, has anybody else had success with NFS over LVS?  Any config 
recommendations?  Perhaps an alternative to GFS that would work well under the 
kind of load I am putting it under?  (We are probably going to try CODA next.)

