[Linux-ha-dev] total ordering protocol in multicast
gshi at ncsa.uiuc.edu
Thu Jun 10 16:43:21 MDT 2004
At 10:38 AM 6/10/2004 +0800, Zhao, Forrest wrote:
>I have two questions about these two familis of protocols:
>1 Which protocol do you think has a better scalability?
IMHO the trans/total or transis has better scalability since it is genuinely distributed.
Totem have a multi-ring protocol whose latency is claimed to be much better than single-protocol when
there are lots of nodes.
>2 Is there any performance (e.g. throughput) comparison data between these protocols?
They gives some performance data but it is not comparable due to different configurations.
Totem claims it can "operate at higher throughput than other protocol" because it has accurate flow control due to its token-ring protocol.
Transis claims "comparably good performance results", and a crucial reason is that it has a novel flow control mechanism.
>From: linux-ha-dev-bounces at lists.linux-ha.org [mailto:linux-ha-dev-bounces at lists.linux-ha.org] On Behalf Of Guochun Shi
>Sent: Thursday, June 10, 2004 3:00 AM
>To: linux-ha-dev at lists.linux-ha.org
>Cc: Spencer, Bob
>Subject: [Linux-ha-dev] total ordering protocol in multicast
>In Monday's meeting we discussed Spread ,TIPC, and the possibility they can be used in heartbeat. Here I want to continue discussion on total ordering protocol
>Generally there are two familis of protocols
>The first one is so-called "pre-transmission" protocols in that one node needs to contend for an ordering capability to order messages. The Amoeba system, Totem protocol and Isis all belongs to this category.
>The Amoeba system has a central controller, every message is sent to it via point to point communication and the central controller put a global sequence number in it and
>multicast it. The Totem protocol uses a revolving token that holds a sequence-number for messages. The holder of the token can emit one or more multicast messages and update
>the token sequence number accordingly. The Isis ABCAST protocol also employs a token holder within each group of communication processes. ABCAST messages are multicast at will, and their delivery is delayed by all receiving processes except for the token holder. Periodically, the token holder sends a message indicating its order of delivery for all received ABCAST messages, and all the other processes comply with it.
>Spread is similar to Totem protocol according to Bob's description.
>The advantage of "pre-transmission" protocol is that once the ordering capability is obtained, it is simple and efficient to order all messages. The disadvantage is that it has a bottleneck, either the token or the central-controller. ABCAST and Abmoeba also require extra messages to place an order to messages.
>The other category is so-called "post-transmission" protocols that are completely distributed algorithms that build a total order from the local information and reach agreement. Trans/total and Transis belong to this. In Trans protocol ACKs are added into the next message one node is to multicast, thus eliminate pure ACK messages. ACK for a message is only sent once. If one node get an ACK for a message but have not get the message, an NACK will be send out. The causal ordering is constructed by chains of ACKs and NACKs. A simple example:
>A1 is send out(A denotes process, 1 is the local seq number)
>B1, a1 is send out (a1 denotes ACK for messageA1)
>upon receiving the message C1,b1 and no NACK is send out, we can be sure that this node have received all A1 B1 C1 messages because C1 ACKs B1 and B1 ACKs A1. Besides that we have a casual ordering of the message A1->B1->C1
>Totol is the protocol to compute a total order based the causal order computed in the Trans, it is complicated but requires no extra messages.
>Transis is a derived protocol from the Trans/total; it slightly modifies trans protocol and uses a different method (still complicated) to compute a total order.
>The advantage of post-transmission is that every node can transmit at will and compute the total order based on the local received messages. No extra message is required thus save bandwidth. The disadvantage is that it needs messages from some number of nodes ( this number is portion of the total node numbers) to compute the total order thus there is a delay between receive and delivery of a message
>comments are welcome
>Linux-HA-Dev: Linux-HA-Dev at lists.linux-ha.org
>Home Page: http://linux-ha.org/
More information about the Linux-HA-Dev