[Linux-ha-dev] Lost packets
Andrew
lists at beekhof.homeip.net
Mon Aug 9 11:56:59 MDT 2004
Hello HA messaging gurus,
I'm getting a bunch of "Irretrievably lost" packets when I don't believe
I should be.
On first inspection, it appears that large messages are being silently
dropped by either the sender or receiver.
The receiver asks for a retransmit several times, and each time the
sender complies, until finally the message is marked "Irretrievably
lost".
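If oversized messages are the cause, one way to surface the problem instead of losing packets silently is to check the serialized size before handing a message to the transport. The sketch below is a minimal illustration under stated assumptions: `send_checked`, `MessageTooLarge`, the `transmit` callback and the `max_bytes` limit are all hypothetical stand-ins, since the report gives neither the real send call nor the real size limit.

```python
class MessageTooLarge(ValueError):
    """Raised instead of silently dropping a message that exceeds the limit."""


def send_checked(transmit, wire: bytes, max_bytes: int) -> None:
    """Pass `wire` to `transmit` only if it fits; otherwise fail loudly.

    `transmit` and `max_bytes` are hypothetical: they stand in for the real
    transport call and its maximum message size, which are not known here.
    """
    if len(wire) > max_bytes:
        raise MessageTooLarge(
            f"message of {len(wire)} bytes exceeds limit of {max_bytes} bytes"
        )
    transmit(wire)


# Usage with a list standing in for the network and a tiny illustrative limit.
sent = []
send_checked(sent.append, b"t=CRM", max_bytes=16)
try:
    send_checked(sent.append, b"x" * 17, max_bytes=16)
except MessageTooLarge as err:
    print(err)  # message of 17 bytes exceeds limit of 16 bytes
```

Raising at the sender turns an invisible retransmit loop into an immediate, attributable error at the point where the message is built.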
I suspect this is new behavior (IIRC the old behavior for messages that
were too large was to complain about authorization) and would
appreciate any guidance in resolving it.
Andrew
Here is an extract from the logs:
[ All messages are unordered broadcast messages that were sent while
test2 was alive. The crmd (sender) only ever runs on test1 and was
started after test2 had joined the membership. ]
Aug 9 19:34:25 test1 heartbeat[2207]: info: Retransmitting pkt 677
Aug 9 19:34:25 test1 heartbeat[2207]: info: MSG: Dumping message with 13 fields
Aug 9 19:34:25 test1 heartbeat[2207]: info: MSG[0] : [t=CRM]
Aug 9 19:34:25 test1 heartbeat[2207]: info: MSG[1] : [info=A CRM xml message]
Aug 9 19:34:25 test1 heartbeat[2207]: info: MSG[2] : [xml=<crm_message
timestamp="1092072865" version="0.5" message_type="request"
sys_to="crmd" sys_from="dc"
crm_msg_reference="replace-dc-1092072865-7"><options
op="replace"/><cib_fragment section="status"><cib
timestamp="1092072865" version="1" generated="false" last_written="Mon
Aug 9 19:34:25 2004 " generation="24"><status
timestamp="1092072865"><node_state id="test1" uname="test1"
crmd="online" join="member" expected="member" timestamp="1092072865"
in_ccm="true"><lrm replace="lrm_agents"><
Aug 9 19:34:25 test1 heartbeat[2207]: info: MSG[3] : [from_id=crmd]
Aug 9 19:34:25 test1 heartbeat[2207]: info: MSG[4] : [to_id=crmd]
Aug 9 19:34:25 test1 heartbeat[2207]: info: MSG[5] : [src=test1]
Aug 9 19:34:25 test1 heartbeat[2207]: info: MSG[6] : [(1)srcuuid=0x80c7174]
Aug 9 19:34:25 test1 heartbeat[2207]: info: MSG[7] : [seq=2a5]
Aug 9 19:34:25 test1 heartbeat[2207]: info: MSG[8] : [hg=3a]
Aug 9 19:34:25 test1 heartbeat[2207]: info: MSG[9] : [ts=4117b5a1]
Aug 9 19:34:25 test1 heartbeat[2207]: info: MSG[10] : [ld=0.00 0.00 0.00 4/48 2360]
Aug 9 19:34:25 test1 heartbeat[2207]: info: MSG[11] : [ttl=5]
Aug 9 19:34:25 test1 heartbeat[2207]: info: MSG[12] : [auth=1 1cf50cfd]
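The numeric fields in this dump appear to be printed in hexadecimal: seq=2a5 is 677 in decimal, the same packet the log says is being retransmitted, and ts=4117b5a1 is 1092072865, which matches the timestamp attribute inside the XML payload. A small sketch for decoding them, assuming only the "MSG[n] : [field=value]" layout visible above; `decode_hex_fields` is a hypothetical helper, not part of heartbeat.

```python
import re

# Matches dump lines such as "MSG[7] : [seq=2a5]" and captures name and value.
FIELD_RE = re.compile(r"MSG\[\d+\] : \[(\w+)=([^\]]*)\]")


def decode_hex_fields(lines, names=("seq", "ts")):
    """Return {name: int} for the requested fields, parsing values as hex.

    Treating these fields as hex is an assumption inferred from the log
    (seq=2a5 lines up with "Retransmitting pkt 677"), not documented behavior.
    """
    out = {}
    for line in lines:
        m = FIELD_RE.search(line)
        if m and m.group(1) in names:
            out[m.group(1)] = int(m.group(2), 16)
    return out


dump = [
    "Aug 9 19:34:25 test1 heartbeat[2207]: info: MSG[7] : [seq=2a5]",
    "Aug 9 19:34:25 test1 heartbeat[2207]: info: MSG[9] : [ts=4117b5a1]",
]
print(decode_hex_fields(dump))  # {'seq': 677, 'ts': 1092072865}
```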
Aug 9 19:34:25 test1 heartbeat[2207]: info: Rexmit STRING conversion:
[>>> t=CRM info=A CRM xml message xml=<crm_message
timestamp="1092072865" version="0.5" message_type="request"
sys_to="crmd" sys_from="dc"
crm_msg_reference="replace-dc-1092072865-7"><options
op="replace"/><cib_fragment section="status"><cib
timestamp="1092072865" version="1" generated="false" last_written="Mon
Aug 9 19:34:25 2004 " generation="24"><status
timestamp="1092072865"><node_state id="test1" uname="test1"
crmd="online" join="member" expected="member" timestamp="109
Aug 9 19:34:27 test1 heartbeat[2207]: info: Retransmitting pkt 677
[ for the sake of clarity I'll not print the dump every time ]
Aug 9 19:34:27 test1 heartbeat[2207]: info: Retransmitting pkt 678
Aug 9 19:34:28 test1 heartbeat[2207]: info: Retransmitting pkt 677
Aug 9 19:34:28 test1 heartbeat[2207]: info: Retransmitting pkt 678
Aug 9 19:34:29 test1 heartbeat[2207]: info: Retransmitting pkt 677
Aug 9 19:34:29 test1 heartbeat[2207]: info: Retransmitting pkt 678
[you get the idea...]
Aug 9 19:35:12 test1 heartbeat[2207]: info: Retransmitting pkt 677
Aug 9 19:35:13 test1 heartbeat[2207]: info: Retransmitting pkt 678
Aug 9 19:35:13 test1 heartbeat[2207]: info: Retransmitting pkt 707
Aug 9 19:35:13 test1 heartbeat[2207]: info: Retransmitting pkt 708
Aug 9 19:35:13 test1 heartbeat[2207]: info: Retransmitting pkt 677
Aug 9 19:35:14 test2 heartbeat[3646]: ERROR: Irretrievably lost packet: node test1 seq 677
Aug 9 19:35:14 test1 heartbeat[2207]: ERROR: Cannot rexmit pkt 677: seqno too low
Aug 9 19:35:14 test1 heartbeat[2207]: ERROR: Irretrievably lost packet: node test1 seq 677
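The final "seqno too low" error suggests that by 19:35:14 packet 677 had aged out of the sender's retransmission history, so further requests for it could no longer be served. A minimal model of that, assuming a fixed-size history of recently sent packets; the class name, its methods and the capacity are hypothetical and are not taken from heartbeat's source.

```python
from collections import OrderedDict


class RexmitHistory:
    """Hypothetical bounded store of recently sent packets, keyed by seqno.

    Packets are sent in increasing seqno order, so the first key is always
    the lowest seqno still held. Once `capacity` is exceeded the oldest
    packet is evicted and can no longer be retransmitted.
    """

    def __init__(self, capacity):
        self.capacity = capacity
        self.packets = OrderedDict()  # seqno -> payload, oldest first

    def send(self, seqno, payload):
        self.packets[seqno] = payload
        if len(self.packets) > self.capacity:
            self.packets.popitem(last=False)  # evict the oldest packet

    def rexmit(self, seqno):
        # Anything below the oldest retained seqno has already been evicted.
        if not self.packets or seqno < next(iter(self.packets)):
            raise LookupError(f"Cannot rexmit pkt {seqno}: seqno too low")
        return self.packets[seqno]


h = RexmitHistory(capacity=3)  # tiny illustrative capacity
for seq in (677, 678, 679):
    h.send(seq, f"pkt {seq}")
print(h.rexmit(677))  # pkt 677
h.send(680, "pkt 680")  # evicts 677
try:
    h.rexmit(677)
except LookupError as err:
    print(err)  # Cannot rexmit pkt 677: seqno too low
```

In this model, retransmitting a packet that never arrives only postpones the failure: if every copy is lost in transit (for instance because, as suspected above, it is too large to get through), the packet is eventually evicted and declared lost.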