[Linux-ha-dev] heartbeat gmain source priority inversion with rexmit and dead node detection

Lars Ellenberg lars.ellenberg at linbit.com
Fri Apr 27 08:11:37 MDT 2012


On Thu, Apr 26, 2012 at 10:56:30AM +0900, renayama19661014 at ybb.ne.jp wrote:
> Hi All,
> 
> We ran a test that simulated a remote cluster environment,
> and we tested packet loss.

You may be interested in this patch I have lying around for ages.

It may be incomplete for one corner case:
On a seriously misconfigured and overloaded system,
I have seen reports of a single send_local_status() call
(which is basically a single send_cluster_msg())
that took longer to execute than deadtime
(without even returning to the mainloop!).

This corner case should be handled by a watchdog.
But without a watchdog, and without stonith,
the CCM got confused: one node saw a leave followed by a
re-join after a partition event, while the other node did not
even notice that it had left and rejoined the membership...
and Pacemaker ended up with a DC on both nodes :-/

So I guess send_local_status() could do with an explicit call to
check_for_timeouts(), but that may need recursion protection.

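A minimal sketch of what such recursion protection could look like,
assuming a single-threaded main loop and assuming the timeout check can
end up back in the send path. Every name ending in _sketch, the guard
flag and the counter are hypothetical stand-ins, not heartbeat code:

```c
#include <stdbool.h>

/* Counts how often the (stand-in) timeout check really ran. */
static int timeout_checks_run = 0;

/* Recursion guard: true while a timeout check is in progress.
 * A plain flag is enough only because we assume one thread. */
static bool in_timeout_check = false;

static void guarded_check_for_timeouts_sketch(void);

/* Hypothetical stand-in for send_local_status(): after sending,
 * it explicitly asks for a deadtime check. */
static void send_local_status_sketch(void)
{
	/* ... the (possibly very slow) send would happen here ... */
	guarded_check_for_timeouts_sketch();
}

/* Hypothetical stand-in for check_for_timeouts(). We assume that
 * declaring a node dead may itself trigger another status send. */
static void check_for_timeouts_sketch(void)
{
	timeout_checks_run++;
	send_local_status_sketch();	/* would recurse without the guard */
}

static void guarded_check_for_timeouts_sketch(void)
{
	if (in_timeout_check) {
		/* Already checking further up the stack: do nothing. */
		return;
	}
	in_timeout_check = true;
	check_for_timeouts_sketch();
	in_timeout_check = false;
}
```

With the guard in place, the nested call made from inside the timeout
check returns immediately, so each outer send runs the check exactly once.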

I should really polish and push my queue some day soon...

Cheers,


diff --git a/heartbeat/hb_rexmit.c b/heartbeat/hb_rexmit.c
--- a/heartbeat/hb_rexmit.c
+++ b/heartbeat/hb_rexmit.c
@@ -168,6 +168,7 @@ send_rexmit_request( gpointer data)
 	if (STRNCMP_CONST(node->status, UPSTATUS) != 0 &&
 	    STRNCMP_CONST(node->status, ACTIVESTATUS) !=0) {
 		/* no point requesting rexmit from a dead node. */
+		g_hash_table_remove(rexmit_hash_table, ri);
 		return FALSE;
 	}
 
@@ -243,7 +244,7 @@ schedule_rexmit_request(struct node_info
 	ri->seq = seq;
 	ri->node = node;
 	
-	sourceid = Gmain_timeout_add_full(G_PRIORITY_HIGH - 1, delay, 
+	sourceid = Gmain_timeout_add_full(PRI_REXMIT, delay, 
 					  send_rexmit_request, ri, NULL);
 	G_main_setall_id(sourceid, "retransmit request", config->heartbeat_ms/2, 10);
 	
diff --git a/heartbeat/heartbeat.c b/heartbeat/heartbeat.c
--- a/heartbeat/heartbeat.c
+++ b/heartbeat/heartbeat.c
@@ -1585,7 +1585,7 @@ master_control_process(void)
 
 	send_local_status();
 
-	if (G_main_add_input(G_PRIORITY_HIGH, FALSE, 
+	if (G_main_add_input(PRI_POLL, FALSE, 
 			     &polled_input_SourceFuncs) ==NULL){
 		cl_log(LOG_ERR, "master_control_process: G_main_add_input failed");
 	}
diff --git a/include/hb_api_core.h b/include/hb_api_core.h
--- a/include/hb_api_core.h
+++ b/include/hb_api_core.h
@@ -40,6 +40,12 @@
 #define	PRI_READPKT		(PRI_SENDPKT+1)
 #define	PRI_FIFOMSG		(PRI_READPKT+1)
 
+/* PRI_POLL is where the timeout checks on deadtime happen.
+ * Better be sure rexmit requests for lost packets
+ * from a now dead node do not preempt detecting it as being dead. */
+#define PRI_POLL		(G_PRIORITY_HIGH)
+#define PRI_REXMIT		PRI_POLL
+
 #define PRI_CHECKSIGS		(G_PRIORITY_DEFAULT)
 #define PRI_FREEMSG		(PRI_CHECKSIGS+1)
 #define	PRI_CLIENTMSG		(PRI_FREEMSG+1)

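The priority change in the patch relies on the GLib convention that a
numerically lower value means a higher priority (G_PRIORITY_HIGH is -100).
A small sketch of the before and after ordering; the SKETCH_, OLD_ and NEW_
names and the preempts_sketch() helper are made up for illustration, and the
helper only models "strictly higher priority wins", not the real dispatcher:

```c
/* Mirrors the value of GLib's G_PRIORITY_HIGH so that the sketch
 * compiles without GLib headers. */
#define SKETCH_G_PRIORITY_HIGH	(-100)

enum {
	/* Before the patch: rexmit requests ran at G_PRIORITY_HIGH - 1,
	 * polled input (the deadtime checks) at G_PRIORITY_HIGH. */
	OLD_PRI_REXMIT	= SKETCH_G_PRIORITY_HIGH - 1,
	OLD_PRI_POLL	= SKETCH_G_PRIORITY_HIGH,

	/* After the patch: both share the same priority. */
	NEW_PRI_POLL	= SKETCH_G_PRIORITY_HIGH,
	NEW_PRI_REXMIT	= NEW_PRI_POLL
};

/* Returns 1 if a ready source at priority a is dispatched ahead of,
 * and can therefore starve, a ready source at priority b.
 * Lower number means higher priority. */
static int preempts_sketch(int a, int b)
{
	return a < b;
}
```

Under this model, preempts_sketch(OLD_PRI_REXMIT, OLD_PRI_POLL) is 1, while
preempts_sketch(NEW_PRI_REXMIT, NEW_PRI_POLL) is 0: pending rexmit requests
no longer outrank the deadtime check.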
