[LinuxFailSafe] Problems with interface monitoring

Martin Bene martin.bene@icomedias.com
Tue, 6 Aug 2002 10:17:34 +0200


Hi,

There seems to be a problem with the interface monitoring in FailSafe:
I've had it declare up and working eth interfaces as dead and cause all
kinds of trouble because of that.

OK, looking further, I find that the /usr/lib/failsafe/bin/ping_wrapper
script that comes with ha_ifd is missing from the RPM specfile and thus
also from the production environment.
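
A quick way to see whether an installed node is affected is to check for
the script directly. This is only a minimal sketch: the path is the one
named above, while the helper name is_installed_executable is made up
for illustration and is not part of FailSafe.

```python
import os

# Path taken from the message above; adjust it if your install prefix differs.
PING_WRAPPER = "/usr/lib/failsafe/bin/ping_wrapper"


def is_installed_executable(path):
    """Return True if path is a regular file the current user may execute."""
    return os.path.isfile(path) and os.access(path, os.X_OK)


if __name__ == "__main__":
    state = "present" if is_installed_executable(PING_WRAPPER) else "MISSING"
    print(f"{PING_WRAPPER}: {state}")
```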

This means that if there are no inbound packets on a box for some time,
FailSafe can't generate inbound traffic by doing a broadcast ping, and
the interface gets marked as sick unnecessarily.
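
To make the failure mode concrete, here is a rough sketch of the kind of
check described above. It is not the ha_ifd code: reading the counter
from /proc/net/dev, the function names, and the two-sample scheme are
all assumptions made for illustration. The point is only that when the
traffic-provoking helper cannot be run, an idle but healthy interface is
indistinguishable from a dead one.

```python
def rx_packets(proc_net_dev_text, iface):
    """Extract the received-packets counter for iface from /proc/net/dev text."""
    for line in proc_net_dev_text.splitlines():
        name, sep, rest = line.partition(":")
        if sep and name.strip() == iface:
            return int(rest.split()[1])  # fields: rx bytes, rx packets, ...
    raise KeyError(iface)


def interface_looks_alive(read_rx, provoke_traffic, wait):
    """Decide whether an interface is alive from its inbound packet counter.

    read_rx:         callable returning the current rx packet count
    provoke_traffic: callable that tries to generate inbound packets
                     (e.g. a broadcast ping); may raise OSError if the
                     helper script is missing
    wait:            callable that pauses for one sampling interval
    """
    before = read_rx()
    wait()
    if read_rx() != before:
        return True          # saw inbound traffic without any help
    try:
        provoke_traffic()    # idle link: ask the neighbours to answer us
    except OSError:
        pass                 # helper missing: nothing can provoke replies
    wait()
    return read_rx() != before
```

With a missing helper, the OSError branch is taken and the second sample
can only differ if unrelated traffic happens to arrive in the meantime.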

Of course it takes rather special circumstances to run into this: normal
cluster operation will cause some traffic on all interfaces configured
for heartbeat and/or control. So this will only bite you when
 * there's just one node left in the cluster
 * the other nodes are completely down (running with just inetd is
   sufficient to get RPC answers)
 * there's a (fairly short) period without traffic on an interface.

Fix - not tested, but should be ok:


diff -ur FailSafe_old/build/spec FailSafe/build/spec
--- FailSafe_old/build/spec     Thu Aug  1 18:51:10 2002
+++ FailSafe/build/spec Tue Aug  6 10:04:30 2002
@@ -401,6 +401,7 @@

 # CI/cmd/interface/ifd
 %attr(755, root, root) $(DIST_PREFIX)/bin/ha_ifd
+%attr(755, root, root) $(DIST_PREFIX)/bin/ping_wrapper

 # CI/cmd/interface/ifdadmin
 %attr(755, root, root) $(DIST_PREFIX)/bin/ha_ifdadmin

Bye, Martin
********************************************************
Martin Bene,                 CTO
icomedias GmbH,              A-8020 Graz, Entenplatz 1b
t +43 (316) 721671-14,       f +43 (316) 721671-26
e martin.bene@icomedias.com, i http://www.icomedias.com
********************************************************