[Linux-HA] Stale NFS File Handles, even with fsid=1234
pk at q-leap.com
Tue Jul 10 01:49:59 MDT 2007
Stefan Lasiewski wrote:
> OS: RHEL4 u4 , x86_64
> Heartbeat version: heartbeat-2.0.8-2.el4.centos
> Two servers: fs1 is 'primary'. fs2 is 'standby'.
> Client name is app1 , running RHEL4 u4 i386
What kernel version is that? Make sure it is not vulnerable
to this: http://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2006-3468
> - Each server exports /export1 . I use BakBone::Replicator to keep the
> fs1:/export1 & fs2:/export2 in sync.
I don't know this Replicator; does it keep the inodes in sync?
If it does not, then it's not the right product for you.
Why are you not using DRBD?
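Since DRBD replicates at the block level, both nodes end up with byte-identical filesystems, inode numbers included. A minimal sketch of what such a resource might look like (the device, backing disk, and addresses are hypothetical and must be adapted):

```
# /etc/drbd.conf sketch -- block-level replication keeps inode numbers
# identical on both nodes, which is what NFS failover requires.
resource r0 {
    protocol C;                 # synchronous replication
    on fs1 {
        device    /dev/drbd0;
        disk      /dev/sdb1;    # hypothetical backing device
        address   10.0.0.1:7788;
        meta-disk internal;
    }
    on fs2 {
        device    /dev/drbd0;
        disk      /dev/sdb1;
        address   10.0.0.2:7788;
        meta-disk internal;
    }
}
```

The exported filesystem then lives on /dev/drbd0 and is only mounted on whichever node is currently primary.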
> - On both fs1 & fs2, /var/lib/nfs is a symlink to /export1/nfs .
> fs1:/export1/nfs is copied to fs2:/export1/nfs .
> - fs1:/etc/exports is identical to fs2:/etc/exports . Each server
> should be exporting similar filesystems with identical values.
The problem is that they should be identical filesystems as well --
same inode numbers, not just the same file contents.
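Pinning the fsid on both servers at least makes the clients see the same filesystem identity after failover. A sketch of what the matching /etc/exports line could look like on both fs1 and fs2, using the fsid=1234 from the subject (the client subnet is a hypothetical placeholder):

```
# /etc/exports -- identical on fs1 and fs2; fsid pins the filesystem
# identity so file handles refer to the same export on either server
/export1  192.168.1.0/24(rw,sync,fsid=1234)
```

Note this only fixes the filesystem half of the file handle; the inode numbers inside it still have to match, which is why the replication method matters.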
> When I shut down fs1 (primary NFS server), the following happens:
> - fs1 removes bond0
> - fs1 shuts down nfslock
> - fs1 shuts down nfs
> - fs2 brings up bond0
> - fs2 starts up nfslock
> - fs2 starts up nfs
This doesn't look right to me: shouldn't fs2 start nfs first
and bring up bond0 last? Otherwise, as soon as the address
is available again, the clients will resume sending requests
to the NFS service, but since the server hasn't started nfs yet,
they will receive an error.
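With heartbeat v1-style configuration, that ordering falls out of the haresources line: resources are started left to right and stopped right to left. A sketch (the IP address is a hypothetical placeholder for the service address on bond0):

```
# /etc/ha.d/haresources sketch -- identical on both nodes.
# Startup is left to right, shutdown right to left, so listing the
# address last brings it up only after nfslock and nfs are running.
fs1 nfslock nfs IPaddr::192.168.1.10
```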
> However, the NFS clients get the following errors:
> - I get a 'Stale NFS File Handle' on all NFS files and directories
> - /var/log/messages prints errors like the following:
> Jul 9 18:51:34 app902 kernel: nfs_update_inode: inode number mismatch
> Jul 9 18:51:34 app902 kernel: expected (0:13/0x2b30003), got
> Jul 9 18:51:35 app902 kernel: nfs_update_inode: inode number mismatch
> Jul 9 18:51:35 app902 kernel: expected (0:18/0x3970001), got
Exactly that could trigger the error described in the CVE mentioned above.
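The "inode number mismatch" messages show why content-only replication breaks NFS: the file handle the client holds encodes the inode number, and a copy of a file is a different inode. A quick local illustration (the /tmp paths are just for the demo):

```shell
# Copying a file creates a new inode, even though the contents are
# identical -- exactly what invalidates NFS file handles after failover.
echo data > /tmp/orig
cp /tmp/orig /tmp/copy
stat -c '%i %n' /tmp/orig
stat -c '%i %n' /tmp/copy   # different inode number, same contents
```

Run `stat -c '%i'` on the same path on fs1 and fs2 after a sync; if the numbers differ, stale handles on failover are guaranteed.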
> These errors go away if I simply remount the filesystems with a
> 'umount -a -t nfs && mount -a -t nfs'.
Possibly, but that is not what you want: you want the applications
using the NFS share to continue uninterrupted.