On Thu, 15 Dec 2005, Graeme Fowler wrote:
On Thu, 2005-12-15 at 12:39 -0800, Joseph Mack NA3T wrote:
Whenever I've had NFS mounts and machines crash, I wind up
with stale file handles, which I can only fix by rebooting
both the server and the client.
Have talked to the local expert about this and have a better
understanding now.
You get a stale file handle when the client has a
file or directory open and the server stops serving that
file or directory. This error is part of the protocol.
client                          server

                                export /home/user/
                                ls /home/user
                                foo
mount /home/user
cd /home/user/
ls ./
foo
cd foo
ls
..listing of files in foo
                                unexport /home/user/
ls
stale file handle

df and mount will hang (or possibly return
after a long timeout). The error goes away
when the server comes back.

                                export /home/user
ls
.. listing of files in foo
The stale file handle will mess up the client till
the server comes back. Since foo is on the same
piece of disk real-estate, it comes back with the same
file handle when the server reappears.
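
On the client, a stale handle shows up to a program as the ESTALE errno. A minimal sketch of a check for it, assuming Python on the client; the function name is_stale and the injectable stat argument are made up for this example. On a hard mount the stat call may simply block, like df and mount do, rather than return an error.

```python
import errno
import os


def is_stale(path, stat=os.stat):
    """Return True if touching path fails with ESTALE (stale file handle).

    The stat parameter is a hypothetical hook so the check can be
    exercised without a real NFS mount.
    """
    try:
        stat(path)
    except OSError as exc:
        if exc.errno == errno.ESTALE:
            return True
        raise  # any other error is not a stale handle, so pass it on
    return False
```

Something like is_stale("/home/user/foo") could be polled from a monitoring script, keeping in mind that the call itself can hang while the server is away.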
An irrecoverable problem:

client                          server

                                export /home/user
mount /home/user
cd /home/user
ls ./
foo
(i.e. as before so far)

Now do something different: an irreversible
change on the server.

                                rmdir /home/user/foo
ls ./
stale file handle
                                mkdir /home/user/foo
ls ./
stale file handle
Now when /home/user/foo is recreated, it's on a new piece of
disk real-estate and will have a different file handle. The
client is hung and you can't umount /home/user (maybe you
can with umount -f). If you can't umount /home/user, you
will have to reboot the client (in this case the
realserver).
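
The difference between the two cases can be sketched with a toy model. This is not how any real NFS server is implemented; the class ToyNfsServer, the helper try_ls and the integer handles are all invented for illustration. The only assumption carried over from above is that a handle names one particular piece of disk real-estate, so a recreated directory gets a new one.

```python
import errno
import itertools


class ToyNfsServer:
    """Toy model: a handle is valid only while the object it names exists."""

    def __init__(self):
        self._next_id = itertools.count(1)
        self._handles = {}       # path -> handle
        self.exported = False

    def mkdir(self, path):
        # A newly created directory always gets a fresh handle.
        self._handles[path] = next(self._next_id)

    def rmdir(self, path):
        del self._handles[path]

    def lookup(self, path):
        return self._handles[path]

    def readdir(self, handle):
        # The request carries only the handle, never the path.
        if not self.exported or handle not in self._handles.values():
            raise OSError(errno.ESTALE, "Stale file handle")
        return "listing of files in foo"


def try_ls(server, handle):
    try:
        return server.readdir(handle)
    except OSError as exc:
        if exc.errno != errno.ESTALE:
            raise
        return "stale file handle"


def demo():
    server = ToyNfsServer()
    server.mkdir("/home/user/foo")
    server.exported = True
    handle = server.lookup("/home/user/foo")   # client: cd foo

    results = []
    server.exported = False                    # unexport
    results.append(try_ls(server, handle))
    server.exported = True                     # export again, same handle
    results.append(try_ls(server, handle))

    server.rmdir("/home/user/foo")             # the irreversible change
    server.mkdir("/home/user/foo")             # same name, new handle
    results.append(try_ls(server, handle))
    return results


print(demo())
```

The second ls works because the old handle is still in the server's table after the re-export; the third fails because the handle the client is holding no longer names anything, even though the path exists again.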
In my experience so far, I very occasionally get stale file handles
reported in the logs (Courier IMAP server) but they're nicely shut off
by the NFS client when detected and forgotten about.
Hmm, it depends on what caused the stale file handle. I
guess you can be careful about deleting files/directories that
clients have open.
The servers (the
NetApp filer appliances) don't ever, ever complain about them.
The error doesn't occur at that end.
You say your setup is reliable, so maybe you don't have to
deal with the problem, but would your setup survive pulling
a few power cables, waiting 30 minutes and plugging them back
in?
Unfortunately we've suffered two total power outages (only short, but
still total) in the last twelve months - long, long story - and
everything survived. Even the fact that the realservers came up before
the filers didn't cause a problem, once the filers reappeared the
realservers remounted the filesystems and the show continued.
OK, what if the servers went down and the clients
(realservers) stayed up, all with open file handles?
Presumably they just wait till the servers come up again.
Maybe the Linux NFS client is better now than it used to be? Correction
- it definitely is better than it used to be.
The problem I'm describing is part of the protocol.
Joe
--
Joseph Mack NA3T EME(B,D), FM05lw North Carolina
jmack (at) wm7d (dot) net - azimuthal equidistant map
generator at http://www.wm7d.net/azproj.shtml
Homepage http://www.austintek.com/ It's GNU/Linux!