[Linux-HA] Filesystem agent hangs because of fuser
Martin Fick
mogulguy at yahoo.com
Thu May 18 09:56:41 MDT 2006
Hi,
I have been testing a heartbeat setup for a
drbd-backed nfs system. I have read most of the
various pages out there on nfs setups and have been
mostly successful with the results.
I am also mounting the nfs dirs on both the primary
and secondary machines. I have read in some places
that this is not a very good idea, but I believe
other people are doing it successfully. I am, however,
running into a problem. I think it is really a
heartbeat issue, and one that could affect other
people in other scenarios too.
The issue is with the Filesystem resource agent: when
heartbeat shuts it down "gracefully", the agent uses
fuser, which can easily hang if there are nfs mounted
dirs on the current machine. I have not really
investigated what fuser is being used for (I imagine
to see whether the filesystem is in use before
attempting to unmount it), but it seems to lock up,
which leaves heartbeat unable to shut down.
This is particularly noticeable with my setup, since
the shutdown sequence consists of:
1) shut down nfs-server
2) shut down IP
3) shut down (unmount) Filesystem -> HANG
4) shut down drbd (make secondary) -> never reached
All of this happens while the filesystem served by the
nfs-server is still mounted on the same machine.
So even if the answer is simply "don't do this", I
still think there is a potential problem for any user
of the Filesystem ResourceAgent who happens to have
any (even unrelated) nfs directories mounted on the
same machine. To confirm this, I performed a simple
test:
1) fuser a file on a local drive: fuser /tmp/a
-> returns right away.
2) kill the (remote) nfs-server for a locally mounted
nfs filesystem
3) fuser a file on a local drive: fuser /tmp/a
-> sometimes returns right away / sometimes locks up
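The test above can be made repeatable by running fuser
under a deadline, so that a hang is reported instead of
blocking the shell. What follows is only a minimal
sketch: run_with_deadline is a hypothetical helper name,
not part of heartbeat or the Filesystem agent, and the
return value 124 is chosen merely to mirror the
convention of GNU timeout. It polls the child rather than
waiting on it, on the assumption that a process blocked
on a dead nfs server may not react to signals.

```shell
# run_with_deadline SECONDS COMMAND [ARGS...]
# Hypothetical helper: runs COMMAND in the background and polls it once
# per second. Returns COMMAND's exit status, or 124 if COMMAND is still
# running after SECONDS.
run_with_deadline() {
    _rwd_limit=$1
    shift
    "$@" &
    _rwd_pid=$!
    _rwd_elapsed=0
    # kill -0 sends no signal; it only tests whether the pid still exists.
    while kill -0 "$_rwd_pid" 2>/dev/null; do
        if [ "$_rwd_elapsed" -ge "$_rwd_limit" ]; then
            # Best effort only: a process stuck in uninterruptible I/O may
            # not die until that I/O completes, so do not wait for it here.
            kill -KILL "$_rwd_pid" 2>/dev/null || true
            return 124
        fi
        sleep 1
        _rwd_elapsed=$((_rwd_elapsed + 1))
    done
    # The command finished on its own; collect its exit status.
    wait "$_rwd_pid"
}
```

For example, "run_with_deadline 10 fuser /tmp/a" followed
by a check of $? against 124 would show whether fuser
came back within 10 seconds.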
Now, I would expect fuser to lock up if I were checking
a file on the nfs filesystem, but it sometimes locks
up even when checking a local file. It appears that
if anything else on the system (another fuser, or even
a df) is locked up waiting on the nfs filesystem, the
local fuser (/tmp/a) will lock up too! This seems like
a bug in fuser, and it could prevent services from
failing over properly under heartbeat.
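Since the hang seems to depend only on some nfs
filesystem being mounted on the machine, a quick way to
see whether a node is exposed is to list its nfs mounts.
This is a rough sketch, assuming a Linux /proc/mounts
whose fields are device, mount point and filesystem type,
in that order; list_nfs_mounts is a hypothetical helper
name, and the optional file argument exists only so the
function can be tried against a saved copy of the table.

```shell
# list_nfs_mounts [MOUNTS_FILE]
# Hypothetical helper: prints the mount point of every entry whose
# filesystem type is exactly nfs or nfs4. The exact match skips the
# nfsd pseudo-filesystem that an nfs server mounts for itself.
# MOUNTS_FILE defaults to /proc/mounts.
list_nfs_mounts() {
    awk '$3 == "nfs" || $3 == "nfs4" { print $2 }' "${1:-/proc/mounts}"
}
```

Empty output from list_nfs_mounts would mean that no nfs
filesystem is mounted, so the fuser hang described above
should not be triggered by nfs on that node.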
-Martin