[Linux-ha-dev] pgsql RA improvements

Serge Dubrouski sergeyfd at gmail.com
Fri Feb 23 11:31:15 MST 2007


Sorry, I just found that my version won't work properly on Solaris.
The corrected one is attached. Sorry for sending so many messages
:-)

On 2/23/07, Serge Dubrouski <sergeyfd at gmail.com> wrote:
> Attached is the patch the way I'd like it to be.
>
> On 2/23/07, Serge Dubrouski <sergeyfd at gmail.com> wrote:
> > And I don't like the idea of removing the PID file in the "start" function.
> > The standard approach is to remove it after stopping the application.
> > Otherwise it could lead to an attempt to start a second copy of the
> > application.
> >
> > On 2/23/07, Serge Dubrouski <sergeyfd at gmail.com> wrote:
> > > I like the idea of the patch, but honestly I don't like how it's
> > > implemented. It should call (as Andrew suggested) the "monitor" function
> > > to check whether pgsql is up or down instead of spreading the same code
> > > all around the script. I'd like to review the idea and prepare another
> > > patch if everybody agrees.
> > >
> > > On 2/23/07, Keisuke MORI <kskmori at intellilink.co.jp> wrote:
> > > > Hi,
> > > >
> > > > We have found several problems with the pgsql RA through our testing.
> > > > It 'fails to failover' in some scenarios.
> > > > I'm proposing a patch to fix them.
> > > >
> > > > Problem description:
> > > >
> > > > 1) The first 'monitor' may fail even if the postmaster was
> > > >   successfully launched.
> > > >
> > > >   This is because 'start' of the pgsql RA may return before the
> > > >   postmaster is ready to answer a psql query issued by
> > > >   'monitor', since 'start' only checks for the existence of the
> > > >   postmaster process. The postmaster can take a few minutes to
> > > >   become ready to answer, particularly when it needs to recover
> > > >   the database after a crash. Even if no recovery is necessary,
> > > >   we observed that it sometimes fails in some of our test cases.
> > > >
> > > > 2) The postmaster fails to start up when a 'postmaster.pid' file
> > > >   was left over from a previous crash.
> > > >
> > > > 3) 'stop' does not execute the fast mode shutdown effectively,
> > > >   because it executes the immediate mode shutdown at the very
> > > >   next moment.  The fast mode shutdown can take a few minutes
> > > >   to finish flushing the database log.
> > > >
> > > >   This isn't a critical problem, but it may make the failover
> > > >   take longer to complete (according to our database team).
> > > >   It is preferable to wait for the fast mode shutdown to
> > > >   complete for as long as possible.
> > > >
> > > >
> > > > Proposals to fix:
> > > >
> > > > 1) In 'start', wait until the postmaster is ready to answer by
> > > >   performing the same check that 'monitor' does.
> > > >   The maximum time to wait for startup to complete can be
> > > >   customized by an additional parameter 'start_wait'.
> > > >
> > > > 2) Add cleanup code for 'postmaster.pid' on stop and before start.
> > > >
> > > > 3) In 'stop', wait until the postmaster completes the fast
> > > >   mode shutdown.
> > > >   The maximum time to wait for shutdown to complete can be
> > > >   customized by an additional parameter 'stop_wait'.
> > > >
> > > >
> > > > The attached patch is for the latest -dev.
> > > >
> > > > Regards,
> > > >
> > > > Keisuke MORI
> > > > NTT DATA Intellilink Corporation
> > > >
> > > >
> > >
> >
>
>
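
For anyone skimming the thread without opening the attachment, here is a
rough sketch of the wait-in-start and wait-in-stop idea discussed above. It is
not the attached patch. All function names (wait_for, start_sketch,
stop_sketch, launch_postmaster, monitor_check, request_fast_shutdown,
request_immediate_shutdown, postmaster_is_down) are made up for illustration,
the 60 and 10 second defaults are assumptions, and I am assuming the
'start_wait' and 'stop_wait' parameters would arrive with the usual
OCF_RESKEY_ prefix. It sticks to plain sh constructs, but it has not been
tried on Solaris.

```shell
#!/bin/sh
# Sketch only: every function name below is hypothetical and is not
# taken from the attached patch.

# Poll a check command once per second until it succeeds or until
# max_wait seconds have been spent sleeping.
# usage: wait_for max_wait command [args...]
wait_for() {
    wf_max=$1
    shift
    wf_waited=0
    until "$@"; do
        if [ "$wf_waited" -ge "$wf_max" ]; then
            return 1
        fi
        sleep 1
        wf_waited=`expr "$wf_waited" + 1`
    done
    return 0
}

# 'start': launch, then block until the same check 'monitor' uses passes.
# launch_postmaster and monitor_check must be provided by the RA.
start_sketch() {
    launch_postmaster || return 1
    wait_for "${OCF_RESKEY_start_wait:-60}" monitor_check
}

# 'stop': ask for a fast shutdown, give it stop_wait seconds, and only
# then fall back to an immediate shutdown (assumed 10 second grace).
stop_sketch() {
    request_fast_shutdown
    if wait_for "${OCF_RESKEY_stop_wait:-60}" postmaster_is_down; then
        return 0
    fi
    request_immediate_shutdown
    wait_for 10 postmaster_is_down
}
```

The point of routing both paths through one polling helper is that 'start'
and 'stop' reuse the monitor-style check instead of each carrying its own
copy of it.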
-------------- next part --------------
A non-text attachment was scrubbed...
Name: pgsql.in.patch
Type: application/octet-stream
Size: 2718 bytes
Desc: not available
Url : http://lists.community.tummy.com/pipermail/linux-ha-dev/attachments/20070223/dbfbaf37/pgsql.in-0001.obj