[Linux-HA] Message hist queue is filling up
Matt Wilder
grewaru at gmail.com
Tue Dec 5 11:25:43 MST 2006
FYI: I am running heartbeat 2.0.7, with the patch mentioned earlier in this
thread, under FreeBSD 6.1-RELEASE-p3.
On 12/5/06, ha at ew.nsci.us <ha at ew.nsci.us> wrote:
> On Tue, 5 Dec 2006, Matt Wilder wrote:
>
> > Greetings,
> >
> > I applied the patch pointed to above with no issues. I have installed
> > the patched version and restarted heartbeat on both nodes, and the 99%
> > CPU issue appears to be gone. However, I am still getting the
> > following messages in syslog, and it seems as if resource handover isn't
> > working quite right. Can anyone point me to what these messages mean?
> > I can provide more logs if necessary.
> >
>
> I am posting a me-too. We had the same problem with a node doing this and
> have not found a resolution. The node ran out of disk space and hung.
> Ultimately I ripped out anything heartbeat related I could find and
> deleted anything that was left which might be heartbeat related on that
> node. Next I removed the 2.0.5 rpm, and reinstalled with 2.0.7. After
> reinstalling, we had the same error and the node would not see the
> cluster. The only thing I can think of is to stop the entire cluster,
> upgrade to 2.0.7, and start again. Unfortunately we have not had a moment
> to restart the cluster to do this over the past month or so; the node with
> problems is still offline. Originally the entire cluster was 2.0.5. Now
> the cluster is all 2.0.5 except for the node which was having trouble,
> which is now 2.0.7 'cause yum installed the latest version (FC5).
>
> Any thoughts?
>
> -Eric
>
>
> > Thanks.
> >
> > Primary Node (active):
> > Dec 5 12:25:49 glider1 lrmd: [886]: WARN: G_SIG_dispatch: Dispatch function for SIGCHLD was delayed 1000 ms (> 100 ms) before being called (GSource: 0x522418)
> > Dec 5 12:25:49 glider1 crmd: [888]: WARN: do_dc_join_finalize:join_dc.c join-2: We are still in a transition. Delaying until the TE completes.
> > Dec 5 12:25:49 glider1 crmd: [888]: WARN: do_dc_join_finalize:join_dc.c join-2: We are still in a transition. Delaying until the TE completes.
> > Dec 5 12:25:51 glider1 tengine: [899]: notice: run_graph:graph.c Transition 1: (Complete=18, Pending=0, Fired=0, Skipped=2, Incomplete=0)
> > Dec 5 12:29:52 glider1 heartbeat: [837]: ERROR: Message hist queue is filling up (151 messages in queue)
> > Dec 5 12:29:54 glider1 heartbeat: [837]: ERROR: Message hist queue is filling up (152 messages in queue)
> > Dec 5 12:29:56 glider1 heartbeat: [837]: ERROR: Message hist queue is filling up (153 messages in queue)
> > Dec 5 12:29:58 glider1 heartbeat: [837]: ERROR: Message hist queue is filling up (154 messages in queue)
> >
> > Secondary node:
> > Dec 5 12:30:03 glider2 heartbeat: [559]: ERROR: Irretrievably lost packet: node glider1.domainit.com seq 135
> > Dec 5 12:30:03 glider2 heartbeat: [559]: ERROR: Irretrievably lost packet: node glider1.domainit.com seq 135
> > Dec 5 12:30:18 glider2 heartbeat: [559]: ERROR: Irretrievably lost packet: node glider1.domainit.com seq 143
> > Dec 5 12:30:28 glider2 heartbeat: [559]: ERROR: Irretrievably lost packet: node glider1.domainit.com seq 148
> > Dec 5 12:30:34 glider2 heartbeat: [559]: ERROR: Irretrievably lost packet: node glider1.domainit.com seq 151
> > Dec 5 12:30:39 glider2 heartbeat: [559]: ERROR: Irretrievably lost packet: node glider1.domainit.com seq 153
> >
> >
> >
> > On 11/30/06, Matt Wilder <grewaru at gmail.com> wrote:
> >> I will look into this, as I am also having the 99% CPU issue.
> >>
> >> Any idea whether this will make it into a release?
> >>
> >>
> >> On 11/30/06, Oren Nechushtan <oren at forescout.com> wrote:
> >> > Hi,
> >> > We've encountered something like that in the past.
> >> > Check out the messages titled "[Linux-HA] RE: 99% CPU heartbeat & rexmit
> >> > (seqno too low)" from September 2006. The (unofficial) patch there solved
> >> > it for us, though it may need minor changes to apply to the current code.
> >> >
> >> > Best,
> >> > Oren.
> >> >
> >> > > -----Original Message-----
> >> > > From: linux-ha-bounces at lists.linux-ha.org
> >> > > [mailto:linux-ha-bounces at lists.linux-ha.org]On Behalf Of Matt Wilder
> >> > > Sent: Thursday, November 30, 2006 8:03 PM
> >> > > To: General Linux-HA mailing list
> >> > > Subject: Re: [Linux-HA] Message hist queue is filling up
> >> > >
> >> > >
> >> > > What would cause this to happen? There are no network connectivity
> >> > > issues between the two nodes.
> >> > >
> >> > > On 11/30/06, Serge Dubrouski <sergeyfd at gmail.com> wrote:
> >> > > > Lost packets between the nodes in the cluster.
> >> > > >
> >> > > > On 11/30/06, Matt Wilder <grewaru at gmail.com> wrote:
> >> > > > > Can anyone tell me what is causing the following heartbeat messages
> >> > > > > to show up in syslog? I have checked network connectivity between
> >> > > > > the two machines in my cluster and everything looks fine. These
> >> > > > > messages are occurring on a semi-frequent basis and do not seem to
> >> > > > > be stopping.
> >> > > > >
> >> > > > > Node1 syslog (currently serving all resources):
> >> > > > > Nov 28 18:06:36 glider1 heartbeat: [80229]: ERROR: Message hist queue is filling up (196 messages in queue)
> >> > > > > Nov 28 18:06:38 glider1 heartbeat: [80229]: ERROR: Message hist queue is filling up (197 messages in queue)
> >> > > > > Nov 28 18:06:40 glider1 heartbeat: [80229]: ERROR: Message hist queue is filling up (198 messages in queue)
> >> > > > > Nov 28 18:06:42 glider1 heartbeat: [80229]: ERROR: Message hist queue is filling up (199 messages in queue)
> >> > > > > Nov 28 18:06:44 glider1 heartbeat: [80229]: ERROR: Message hist queue is filling up (200 messages in queue)
> >> > > > > Nov 28 18:06:50 glider1 last message repeated 3 times
> >> > > > > Nov 28 18:06:50 glider1 heartbeat: [80229]: ERROR: Cannot rexmit pkt 614508 for glider2.domainit.com: seqno too low
> >> > > > > Nov 28 18:06:52 glider1 heartbeat: [80229]: ERROR: Message hist queue is filling up (200 messages in queue)
> >> > > > > Nov 28 18:06:56 glider1 last message repeated 2 times
> >> > > > > Nov 28 18:06:56 glider1 heartbeat: [80229]: ERROR: Cannot rexmit pkt 614511 for glider2.domainit.com: seqno too low
> >> > > > > Nov 28 18:06:58 glider1 heartbeat: [80229]: ERROR: Message hist queue is filling up (200 messages in queue)
> >> > > > > Nov 28 18:07:06 glider1 last message repeated 4 times
> >> > > > >
> >> > > > >
> >> > > > > Node2 syslog:
> >> > > > > Nov 28 18:05:05 glider2 heartbeat: [568]: ERROR: Irretrievably lost packet: node glider1.domainit.com seq 614508
> >> > > > > Nov 28 18:05:11 glider2 heartbeat: [568]: ERROR: Irretrievably lost packet: node glider1.domainit.com seq 614511
> >> > > > > _______________________________________________
> >> > > > > Linux-HA mailing list
> >> > > > > Linux-HA at lists.linux-ha.org
> >> > > > > http://lists.linux-ha.org/mailman/listinfo/linux-ha
> >> > > > > See also: http://linux-ha.org/ReportingProblems
> >> > > > >
> >> > > >
> >> > >
> >> >
> >>
> >
>
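For readers trying to follow Serge's one-line diagnosis: the paired log lines in this thread (glider2 reports seq 614508 as "Irretrievably lost", and glider1 later reports "Cannot rexmit pkt 614508 ... seqno too low") are consistent with a sender that keeps only a bounded history of sent messages for retransmission. The Python sketch below is a toy model of that idea under stated assumptions, not heartbeat's actual code. The class name SendHistory, its ack() method, the drop-oldest eviction rule and the capacity of 200 (taken from where glider1's queue count stops growing) are all hypothetical.

```python
from __future__ import annotations

from collections import OrderedDict


class SendHistory:
    """Toy model of a sender's bounded retransmit history (hypothetical).

    Not heartbeat's implementation: a sketch of why a retransmit request
    can fail with "seqno too low" once old messages have been evicted.
    """

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.packets: OrderedDict[int, str] = OrderedDict()  # seq -> payload, oldest first
        self.next_seq = 1

    def send(self, payload: str) -> int:
        seq = self.next_seq
        self.next_seq += 1
        self.packets[seq] = payload
        # Assumption: when the history is full, the oldest entry is dropped.
        while len(self.packets) > self.capacity:
            self.packets.popitem(last=False)
        return seq

    def ack(self, seq: int) -> None:
        # Assumption: once the peer confirms everything up to seq,
        # those entries no longer need to be kept for retransmission.
        for old in [s for s in self.packets if s <= seq]:
            del self.packets[old]

    def rexmit(self, seq: int) -> str:
        if seq >= self.next_seq:
            raise LookupError(f"Cannot rexmit pkt {seq}: not sent yet")
        if seq not in self.packets:
            # Sequence numbers only grow and entries leave from the old end,
            # so a missing seq is older than anything still held.
            raise LookupError(f"Cannot rexmit pkt {seq}: seqno too low")
        return self.packets[seq]


def demo() -> list[str]:
    # 200 is where glider1's queue count stopped growing in the Nov 28 log;
    # treating it as a hard cap is an assumption.
    hist = SendHistory(capacity=200)
    for i in range(250):
        hist.send(f"msg {i}")  # no acks arrive, so the history fills and wraps
    errors = []
    for seq in (10, 60, 250):
        try:
            hist.rexmit(seq)
        except LookupError as exc:
            errors.append(str(exc))
    return errors


if __name__ == "__main__":
    print(demo())
```

Running it prints a single "seqno too low" error for seq 10, because after 250 unacknowledged sends only seqs 51 to 250 remain. In this model the history only drains through ack(), so a queue count that climbs and then sticks at its ceiling points at acknowledgements not getting back to the sender rather than at the retransmit logic itself.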