juri at koschikode.com
Mon Apr 23 06:15:48 MDT 2001
Lionel Cottin wrote:
> Hi all,
> I searched the archive for a script that checks the network status and
> initiates a takeover if a failure is detected, but without success. So I
> will write it!
Though it might be OK to have a lightweight monitoring script to check
the network connectivity, I think most other people use mon for this (and
for other monitoring purposes). But go ahead if you feel like it!
> The idea is to create an HA resource (netmon for example), which will ping a
> gateway or any other IP address, to ensure the active node is reachable from
> the public network or another network.
> For example, haresources will look like:
> node-x IPaddr::192.168.0.x Service::Argument ipmon::192.168.168.25
> So, if I have understood how heartbeat works:
> 1/ Heartbeat starts and executes "ipmon 192.168.168.254 status"; the script
> is supposed to write "stopped" or "running" on STDOUT, OK?
AFAIK heartbeat only calls the _first_ resource - in your example
IPaddr::192.168.0.x - with the status parameter.
> 2/ If the result is "stopped", then heartbeat executes "ipmon 192.168.168.254
> start"; otherwise, it writes a line in ha.log and starts ipmon; am I OK?
Partially. As above, only if ipmon is the first resource. Assuming it is,
then you are right for the "stopped" case. If the answer is "running"
then it just does nothing because heartbeat assumes it already has the
resources.
> 3/ Every x seconds, heartbeat executes "ipmon 192.168.168.254 status" and if
> the result is "running", it continues; but if the result is "stopped", it
> initiates a give-up of the resources (we are on the active node), and the
> passive node initiates its takeover. OK?
Nope. All resources are only called _once_ during failover. After that,
they are never called until the next failover/transition.
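
To make the three answers concrete, here is a minimal Python sketch of the
calling sequence as I understand it (AFAIK, not taken from heartbeat's own
code). The names acquire_resources and run are made up for illustration;
run stands in for executing a resource script with an action.

```python
def acquire_resources(resources, run):
    """Model of one takeover, following the description in this mail.

    resources: resource names in haresources order.
    run: hypothetical callable (resource, action) -> str that stands in
         for executing the resource script and returning its STDOUT.
    Returns the list of (resource, action) calls that were made.
    """
    calls = []
    if not resources:
        return calls

    # Only the FIRST resource is asked for its status.
    first = resources[0]
    calls.append((first, "status"))
    if "running" in run(first, "status"):
        # Assumed to be held already: nothing else is called.
        return calls

    # Otherwise every resource is started once, in order.
    for res in resources:
        calls.append((res, "start"))
        run(res, "start")

    # There is no periodic "status" polling after this point.
    return calls
```

With the haresources line from the example, ipmon would only ever see
"start" (and "stop" on release), never a repeated "status".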
> Any comments on the 3 steps will be very appreciated;
After thinking about it for a moment, I don't think it's a good idea to
implement this as a resource controlled by heartbeat, because _this_
resource should control heartbeat - not vice versa.
I did this with mon. Have mon monitor the gateway; if the gateway is
reachable, start heartbeat; if the gateway is not reachable, stop
heartbeat.
This could be run on both nodes - so if _one_ node loses the network
connectivity (say the active one) then mon stops heartbeat and the
secondary will take over.
If the network fails completely (e.g. broken gateway) heartbeat will be
stopped on both nodes - no races (I hope).
In the latter case your script would lead to an alternating starting and
stopping of the nodes. Not good IMO.
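
For illustration only, the decision logic of the mon-based approach could
look roughly like this in Python. This is not mon configuration and not what
I actually run; the init script path, the ping options (Linux iputils ping)
and all function names are assumptions.

```python
import subprocess

GATEWAY = "192.168.168.254"               # address from the example above
HEARTBEAT_INIT = "/etc/init.d/heartbeat"  # assumption: init script location


def gateway_reachable(host, count=3, timeout_s=2):
    """Return True if ping got at least one reply (iputils ping exits 0)."""
    result = subprocess.run(
        ["ping", "-c", str(count), "-W", str(timeout_s), host],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def decide(reachable, heartbeat_running):
    """Map the observed state to an action: 'start', 'stop' or None."""
    if reachable and not heartbeat_running:
        return "start"   # network is back: let this node join again
    if not reachable and heartbeat_running:
        return "stop"    # node is cut off: release resources to the peer
    return None          # state already matches, do nothing


def apply(action):
    """Run the init script with the chosen action, if there is one."""
    if action is not None:
        subprocess.run([HEARTBEAT_INIT, action], check=False)
```

Run on both nodes, a periodic apply(decide(gateway_reachable(GATEWAY), running))
stops heartbeat on whichever node loses the gateway. If the gateway itself
dies, both nodes get "stop" and stay stopped, instead of handing the
resources back and forth.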