Distributed file system, which one would you consider?

To: "LinuxVirtualServer.org users mailing list." <lvs-users@xxxxxxxxxxxxxxxxxxxxxx>
Subject: Distributed file system, which one would you consider?
From: Jan Klopper <janklopper@xxxxxxxxx>
Date: Fri, 22 Apr 2005 14:05:18 +0200
Hi,

I'm currently running a very high-traffic LAMP cluster.
This cluster does a few thousand small reads per second, so I use ReiserFS for local storage to keep those reads as fast as possible.

I use Unison to replicate all changes from the FTP server (which is not inside the cluster) to all of the cluster nodes and back every 2 minutes. This gives me two problems and also some advantages.
Problems:
- A 2 minute wait time (more if the third server updates the FTP server through Unison and the second server only gets the file on the next round). If a server updates files or creates new ones (for example a cache file), they won't get propagated to the other servers for a few minutes.
- Unison is pretty CPU and bandwidth hungry, hence the 2 minute interval.
Advantages:
- All servers use the same MySQL, so they will create the same cache files anyway.
- All servers use their own super speedy local ReiserFS storage (as if they were not in a cluster at all).

Now I looked at Coda, but it doesn't support 2-way replication properly.
(And I want it to do multiple updates, to eliminate the central master node, e.g. each node updating both of its neighbours.)
GFS needs special storage hardware, and I don't have that, nor do I think it's the way forward for Linux clusters.

Some others don't look production ready, or don't look designed for this work.

What I would see as the perfect solution would be something like this:

Place hooks in the filesystem to run the cluster tool on write.
The cluster tool propagates the changes to both neighbouring servers and also sends a unique ID. The tool stores the ID as handled for a while (100,000 IDs max, to ensure a DoS attack is not possible?).

The other servers' cluster tools listen for and receive the ID and check whether they have handled it previously; since they haven't, they ask for the files. The servers write the files without triggering their own update mechanism, but trigger a propagation tool with the received ID. This would update files around the entire cluster, would have configurable paths, and would not cause servers to keep updating each other in an endless loop.
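
To make the idea a bit more concrete, here is a rough in-memory sketch of the scheme described above. It is only an illustration under several assumptions, not a real tool: the names Node, make_ring, local_write and receive are made up, the "filesystem" is a plain dict, the write hook is simulated by calling local_write directly, and the file contents travel together with the ID instead of being requested in a second step. The only number taken from the description is the 100,000 cap on remembered IDs.

```python
import uuid
from collections import OrderedDict

MAX_SEEN_IDS = 100_000  # cap on remembered IDs, bounds memory use


class Node:
    """One cluster member (hypothetical; a dict stands in for local disk)."""

    def __init__(self, name):
        self.name = name
        self.files = {}            # path -> contents
        self.seen = OrderedDict()  # update IDs already handled, oldest first
        self.neighbours = []       # filled in by make_ring()

    def _remember(self, update_id):
        self.seen[update_id] = True
        if len(self.seen) > MAX_SEEN_IDS:
            self.seen.popitem(last=False)  # forget the oldest ID

    def local_write(self, path, data):
        """What the (assumed) filesystem hook would call on a local write."""
        update_id = uuid.uuid4().hex
        self.files[path] = data
        self._remember(update_id)
        self._propagate(update_id, path, data)
        return update_id

    def receive(self, update_id, path, data):
        """Handle an update from a neighbour; returns False if already seen."""
        if update_id in self.seen:
            return False           # already handled, this stops the loop
        self._remember(update_id)
        self.files[path] = data    # written without firing the local hook
        self._propagate(update_id, path, data)
        return True

    def _propagate(self, update_id, path, data):
        for neighbour in self.neighbours:
            neighbour.receive(update_id, path, data)


def make_ring(names):
    """Connect nodes in a ring so each one has two neighbours."""
    nodes = [Node(n) for n in names]
    for i, node in enumerate(nodes):
        node.neighbours = [nodes[i - 1], nodes[(i + 1) % len(nodes)]]
    return nodes


if __name__ == "__main__":
    ring = make_ring(["web1", "web2", "web3", "web4"])
    ring[0].local_write("cache/page.html", b"<html>...</html>")
    print(all("cache/page.html" in n.files for n in ring))
```

In this sketch every node marks the ID as handled before passing the update on, so an update that comes back around the ring is dropped instead of being forwarded again. Two writes to the same path from different nodes would still race, which is the locking problem this scheme does not try to solve.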

I know this doesn't handle file locks etc., but I think it handles most simple scenarios (LAMP apps, for example).

Any thoughts on this?

greets
Jan
