IP Fail Over-Back Prog's
jalmasi at partner-banka.tel.hr
Wed Sep 9 04:13:28 MDT 1998
> 1. It should be practical.
> 2. Foolproof and stable.
> 3. Should NOT be resource hungry.
> 4. Flexible, precise, and easy to install and configure.
> 5. Should have more options, like manual and scheduled failover.
> Working (my personal thinking):
> Check whether the master IP is in use by connecting directly to some port
> on that IP.
>  |-- No:  Become primary.
>  |-- Yes: Are you configured to be primary?
>       |-- Yes: Contact the secondary, schedule failback for the nearest
>       |        even time (30 sec to 1-5 min), and become primary.
>       |-- No:  Run the sniffer and check (30 sec to 5 min) for network load.
>                No load: create load and check for success.
>                 |-- Yes: OK, return.
>                 |-- No:  Retry, and fail over next time.
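The first step of the quoted proposal (connect to some port on the master IP) could be sketched roughly as below. This is only an illustration: the function name, the timeout default, and the choice of Python are assumptions, not part of the proposal. Note that a refused connection is treated as "not in use" here, although strictly it only shows that nothing is listening on that port.

```python
import socket


def port_answers(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        # create_connection resolves the host and performs the TCP connect.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, unreachable hosts, and timeouts.
        return False
```

A node would call something like port_answers(master_ip, 80) and become primary when it returns False; which port to probe is left to configuration.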
This should be modular. Let's start with this main loop:
Master up? <--------------+
 +-- No:  Takeover        |
 +-- Yes: wait a little --+
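To make the loop concrete, here is a minimal sketch. The check is passed in as a callable so that it can be swapped for a socket probe, an external script, or anything else. All names here (failover_loop, script_check, the interval default) are made up for illustration.

```python
import subprocess
import time


def script_check(command):
    """Build a check that runs an external command; exit status 0 means 'master is up'."""
    def check():
        return subprocess.run(command).returncode == 0
    return check


def failover_loop(master_is_up, takeover, interval=5.0, sleep=time.sleep):
    """Poll master_is_up() until it reports failure, then call takeover() once."""
    while master_is_up():
        sleep(interval)  # "Yes: wait a little"
    takeover()           # "No: Takeover"
```

With a hypothetical check script, a backup node could run failover_loop(script_check(["/usr/local/sbin/check-master"]), become_primary), where both the path and become_primary are placeholders.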
The master-up check should be configurable; it could even be a shell script...
we could provide several shell scripts and checking programs. I like your
sniffer idea, but the cluster nodes are connected to the same SCSI bus. The most
reliable way to determine whether a node is up is probably via the SCSI bus...
if the SCSI bus dies, the cluster will not work anyway. So, I think this IS a
good idea but as a node-up
> Primary: a) Create alias to get server up.
>          b) Take control of disk.
>          c) Fork Sniffer Application and run command/scripts as req.
For all nodes, the cluster software should initialize in the same manner. The
first step for a node should be to determine who is master. I'd like to be able
to start the cluster without the original master, and with no reconfiguration...
so I suggest priorities. Each node should have a priority... when the cluster
starts, the node with the highest priority becomes the master. Note that this is
actually a takeover situation... a node says "I'm the boss", the other nodes
respond OK. THEN the primary node mounts the shared disks. That is, the sniffer
application should be started first.
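The priority rule could look something like the sketch below. It assumes each node already knows the configured priorities and which nodes are currently answering; the function name and the tie-break by node name are my own assumptions, and the announce/OK exchange between nodes is not modelled.

```python
def elect_master(priorities, alive):
    """Return the live node with the highest priority, or None if no node is up.

    priorities: dict mapping node name to a numeric priority (higher wins).
    alive: iterable of node names that are currently responding.
    """
    candidates = [node for node in alive if node in priorities]
    if not candidates:
        return None
    # Tie-break on the node name so every node reaches the same answer.
    return max(candidates, key=lambda node: (priorities[node], node))
```

With priorities {"a": 10, "b": 5}, the cluster elects "a" when both nodes are up and "b" when only "b" is up, so it can start without the original master and with no reconfiguration. Only after the election would the winner mount the shared disks.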