[Linux-ha-dev] Lost packets

Andrew lists at beekhof.homeip.net
Mon Aug 9 11:56:59 MDT 2004


Hello HA messaging gurus,

I'm getting a bunch of "Irretrievably lost" packets when I don't believe 
I should be.

On first inspection, it appears that large messages are being silently 
dropped by either the sender or receiver.

The receiver asks for a retransmit several times, and each time the 
sender complies, until finally the message is marked "Irretrievably 
lost".

I suspect this is new behavior (IIRC the old behavior for messages that 
were too large was to complain about authorization) and would 
appreciate any guidance in resolving it.


Andrew

Here is an extract from the logs:

[ All messages are unordered broadcast messages that were sent while 
test2 was alive.  The crmd (sender) only ever runs on test1 and was 
started after test2 joined the membership. ]

Aug  9 19:34:25 test1 heartbeat[2207]: info: Retransmitting pkt 677
Aug  9 19:34:25 test1 heartbeat[2207]: info: MSG: Dumping message with 
13 fields
Aug  9 19:34:25 test1 heartbeat[2207]: info: MSG[0] : [t=CRM]
Aug  9 19:34:25 test1 heartbeat[2207]: info: MSG[1] : [info=A CRM xml 
message]
Aug  9 19:34:25 test1 heartbeat[2207]: info: MSG[2] : [xml=<crm_message 
timestamp="1092072865" version="0.5" message_type="request" 
sys_to="crmd" sys_from="dc" 
crm_msg_reference="replace-dc-1092072865-7"><options 
op="replace"/><cib_fragment section="status"><cib 
timestamp="1092072865" version="1" generated="false" last_written="Mon 
Aug  9 19:34:25 2004&#10;" generation="24"><status 
timestamp="1092072865"><node_state id="test1" uname="test1" 
crmd="online" join="member" expected="member" timestamp="1092072865" 
in_ccm="true"><lrm replace="lrm_agents"><
Aug  9 19:34:25 test1 heartbeat[2207]: info: MSG[3] : [from_id=crmd]
Aug  9 19:34:25 test1 heartbeat[2207]: info: MSG[4] : [to_id=crmd]
Aug  9 19:34:25 test1 heartbeat[2207]: info: MSG[5] : [src=test1]
Aug  9 19:34:25 test1 heartbeat[2207]: info: MSG[6] : 
[(1)srcuuid=0x80c7174]
Aug  9 19:34:25 test1 heartbeat[2207]: info: MSG[7] : [seq=2a5]
Aug  9 19:34:25 test1 heartbeat[2207]: info: MSG[8] : [hg=3a]
Aug  9 19:34:25 test1 heartbeat[2207]: info: MSG[9] : [ts=4117b5a1]
Aug  9 19:34:25 test1 heartbeat[2207]: info: MSG[10] : [ld=0.00 0.00 
0.00 4/48 2360]
Aug  9 19:34:25 test1 heartbeat[2207]: info: MSG[11] : [ttl=5]
Aug  9 19:34:25 test1 heartbeat[2207]: info: MSG[12] : [auth=1 1cf50cfd]
Aug  9 19:34:25 test1 heartbeat[2207]: info: Rexmit STRING conversion: 
[>>> t=CRM info=A CRM xml message xml=<crm_message 
timestamp="1092072865" version="0.5" message_type="request" 
sys_to="crmd" sys_from="dc" 
crm_msg_reference="replace-dc-1092072865-7"><options 
op="replace"/><cib_fragment section="status"><cib 
timestamp="1092072865" version="1" generated="false" last_written="Mon 
Aug  9 19:34:25 2004&#10;" generation="24"><status 
timestamp="1092072865"><node_state id="test1" uname="test1" 
crmd="online" join="member" expected="member" timestamp="109
Aug  9 19:34:27 test1 heartbeat[2207]: info: Retransmitting pkt 677

[ for the sake of clarity I'll not print the dump every time ]

Aug  9 19:34:27 test1 heartbeat[2207]: info: Retransmitting pkt 678
Aug  9 19:34:28 test1 heartbeat[2207]: info: Retransmitting pkt 677
Aug  9 19:34:28 test1 heartbeat[2207]: info: Retransmitting pkt 678
Aug  9 19:34:29 test1 heartbeat[2207]: info: Retransmitting pkt 677
Aug  9 19:34:29 test1 heartbeat[2207]: info: Retransmitting pkt 678

[you get the idea...]

Aug  9 19:35:12 test1 heartbeat[2207]: info: Retransmitting pkt 677
Aug  9 19:35:13 test1 heartbeat[2207]: info: Retransmitting pkt 678
Aug  9 19:35:13 test1 heartbeat[2207]: info: Retransmitting pkt 707
Aug  9 19:35:13 test1 heartbeat[2207]: info: Retransmitting pkt 708
Aug  9 19:35:13 test1 heartbeat[2207]: info: Retransmitting pkt 677
Aug  9 19:35:14 test2 heartbeat[3646]: ERROR: Irretrievably lost 
packet: node test1 seq 677
Aug  9 19:35:14 test1 heartbeat[2207]: ERROR: Cannot rexmit pkt 677: 
seqno too low
Aug  9 19:35:14 test1 heartbeat[2207]: ERROR: Irretrievably lost 
packet: node test1 seq 677
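
For anyone wanting to quantify this from a full log, here is a minimal 
sketch that tallies retransmissions per packet and collects the packets 
reported as lost. It is not part of heartbeat; the function names 
(summarize, dumped_seqs) are made up for illustration, and the patterns 
assume only the message wording seen in the extract above. It also 
assumes the seq= field in the dumps is hexadecimal, which is consistent 
with this log (2a5 hex is 677 decimal, and ts=4117b5a1 hex is 
1092072865, the timestamp in the XML).

```python
import re
from collections import Counter

# Patterns are based only on the log lines quoted above; other heartbeat
# versions may word these messages differently (assumption).
REXMIT_RE = re.compile(r"Retransmitting pkt (\d+)")
LOST_RE = re.compile(r"Irretrievably lost packet: node (\S+) seq (\d+)")
SEQ_FIELD_RE = re.compile(r"\[seq=([0-9a-fA-F]+)\]")


def summarize(log_text):
    """Return (retransmit count per packet, set of (node, seq) reported lost)."""
    # The mail client wrapped long log lines, so collapse all whitespace
    # before matching messages that may span a line break.
    flat = " ".join(log_text.split())
    rexmits = Counter(int(n) for n in REXMIT_RE.findall(flat))
    lost = {(node, int(seq)) for node, seq in LOST_RE.findall(flat)}
    return rexmits, lost


def dumped_seqs(log_text):
    """Decode seq= fields from message dumps, assuming they are hexadecimal."""
    return [int(h, 16) for h in SEQ_FIELD_RE.findall(log_text)]


# A few lines copied from the extract above, including one wrapped line.
sample = """\
Aug  9 19:34:25 test1 heartbeat[2207]: info: Retransmitting pkt 677
Aug  9 19:34:25 test1 heartbeat[2207]: info: MSG[7] : [seq=2a5]
Aug  9 19:34:27 test1 heartbeat[2207]: info: Retransmitting pkt 677
Aug  9 19:34:27 test1 heartbeat[2207]: info: Retransmitting pkt 678
Aug  9 19:35:14 test2 heartbeat[3646]: ERROR: Irretrievably lost 
packet: node test1 seq 677
"""

rexmits, lost = summarize(sample)
for seq, count in sorted(rexmits.items()):
    print(f"pkt {seq}: retransmitted {count} time(s)")
for node, seq in sorted(lost):
    print(f"lost: node {node} seq {seq}")
print("seq values in dumps:", dumped_seqs(sample))
```

Run against the whole log, the per-packet counts show how many 
retransmit attempts were made before a packet was given up on, and the 
decoded seq values let a dump be matched to the "Retransmitting pkt" 
line it belongs to.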


