[Linux-HA] Seeing forced LRM refreshes

Christian Rishøj christian at rishoj.net
Tue Nov 6 06:04:03 MST 2007


On 11/5/07, Andrew Beekhof <beekhof at gmail.com> wrote:
> On 11/3/07, Christian Rishøj <christian at rishoj.net> wrote:
> > On 11/3/07, Andrew Beekhof <beekhof at gmail.com> wrote:
> > >
> > > On Nov 3, 2007, at 5:01 AM, Christian Rishøj wrote:
> > >
> > > >
> > > > On 2 Nov 2007, at 14:36, Andrew Beekhof wrote:
> > > >
> > > >>
> > > >> On Nov 1, 2007, at 1:32 AM, Christian Rishøj wrote:
> > > >>
> > > >>> Hi,
> > > >>>
> > > >>> Running 2.1.2 on Ubuntu, 2.6.23 x86_64.
> > > >>>
> > > >>> I am seeing a lot of "failed to get the value of field lrm_opstatus
> > > >>> from a ha_msg" in the syslog.
> > > >>>
> > > > >>> These seem to give rise to "do_lrm_invoke: Forcing a local LRM
> > > > >>> refresh", which in turn seems to restart the services, causing
> > > > >>> interruption.
> > > >>>
> > > >>> What may be the cause of this? Syslog extract, CIB and
> > > >>> configuration attached.
> > > >>
> > > >> Basically, it's this option:
> > > >>
> > > >>   <nvpair id="remove-after-stop" name="remove-after-stop"
> > > >> value="true"/>
> > > >>
> > > >> I'd not enable this option.  Is there a particular reason you
> > > >> enabled it?
> > > >
> > > > Yes. I was seeing resources being restarted whenever I made changes
> > > > > to seemingly unrelated resources in the CIB. After a while I
> > > > tracked the problem down to some leftover state in the CIB/LRM from
> > > > > previously deleted resources. Hoping to prevent state being left
> > > > > behind when deleting resources in the future, I added
> > > >
> > > >   <nvpair id="remove-after-stop" name="remove-after-stop"
> > > > value="true"/>
> > > >
> > > > as well as
> > > >
> > > > >   <nvpair id="stop-orphan-resources" name="stop-orphan-resources"
> > > > > value="true"/>
> > > >
> > > > to the cluster options.
> > > >
> > > > I am removing the former now, but would appreciate a hint on "best
> > > > practices" to avoid the problem I was seeing at the time.
> > >
> > > That depends on what the "seemingly unrelated" changes were
> >
> > As I remember the situation at that time, I had an IPaddr2 resource by
> > itself, with no dependencies specified. I believe I changed the
> > cidr_netmask.
>
> That would be enough to restart the IP and anything that depended on it...

Right. However, at the time, independent resources were restarted as
well. It turned out to be some leftover state from previously defined
resources (same parameters, different ids). Heartbeat would report
something like "nothing known about orphan resource XX running on YY"
and "making sure orphan resource XX is stopped". Pruning this leftover
state solved the problems of unexpected restarts.
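
A minimal sketch of how such leftover state could be spotted, assuming
the CIB has been dumped to XML and follows the usual layout
(configuration/resources for defined resources, lrm_resource entries
under each node_state in the status section). The sample CIB, the ids
"ip_new" and "ip_old", and the helper find_orphans are made up for
illustration; clone instance ids (e.g. "rsc:0") are not handled.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily trimmed CIB dump. Element names are assumed
# from the common Heartbeat 2.x CIB layout, not taken from this thread.
SAMPLE_CIB = """
<cib>
  <configuration>
    <resources>
      <primitive id="ip_new" class="ocf" provider="heartbeat" type="IPaddr2"/>
    </resources>
  </configuration>
  <status>
    <node_state id="node-1" uname="node-1">
      <lrm id="node-1">
        <lrm_resources>
          <lrm_resource id="ip_new" type="IPaddr2"/>
          <lrm_resource id="ip_old" type="IPaddr2"/>
        </lrm_resources>
      </lrm>
    </node_state>
  </status>
</cib>
"""

# Tags that define a resource in the configuration section (assumed list).
RESOURCE_TAGS = ("primitive", "group", "clone", "master_slave")


def find_orphans(cib_xml):
    """Return sorted (node, resource id) pairs that have LRM state
    in the status section but no definition in the configuration."""
    root = ET.fromstring(cib_xml)

    # Collect the ids of all currently defined resources.
    configured = set()
    resources = root.find("configuration/resources")
    if resources is not None:
        for elem in resources.iter():
            if elem.tag in RESOURCE_TAGS and elem.get("id"):
                configured.add(elem.get("id"))

    # Any lrm_resource whose id is not defined is leftover state.
    orphans = []
    for node in root.iter("node_state"):
        for res in node.iter("lrm_resource"):
            rid = res.get("id")
            if rid not in configured:
                orphans.append((node.get("uname", "?"), rid))
    return sorted(orphans)


if __name__ == "__main__":
    for node, rid in find_orphans(SAMPLE_CIB):
        print("leftover LRM state for %s on %s" % (rid, node))
```

Here "ip_old" stands in for a previously defined resource with the same
parameters but a different id; it is reported because only "ip_new" is
still present in the configuration.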

Regards
Christian

