[Linux-ha-dev] [PATCH] manage PostgreSQL 9.0 streaming replication using Master/Slave
Takatoshi MATSUO
matsuo.tak at gmail.com
Tue Feb 8 06:03:14 MST 2011
Hi
Thank you for comments.
> First of all thanks for doing this. It's definitely a neat option to
> have. Unfortunately I haven't yet worked with ProstgreSQL 9.0. Could
> you please elaborate little bit more on how it works? What happens if
> master dies? Who is responsible for creating/deleting recovery.conf
> file?
This patch does not setup recovery.conf for hot standby automatically,
so you need setup it manually before starting hot standby.
If recovery.conf exists, resource agent will become slave.
Die of master causes fail-over correctly.
> Also please be more careful with quoting variables in your if statements.
I checked and quoted.
I elaborate.
**************** Condition *********************************
1. Streaming replication works correctly.
2. Streaming replication works correctly if you swap
primary server for hot standby server.
FYA I wrote my configuration at the end of the post.
See "Example of configuration".
**************** Setup and start ***************************
1. You decide which server do you want to be master.
In this instance I assume that you choice node1.
2. node1# rm /var/lib/pgsql/9.0/data/recovery.conf
3. node1# /etc/init.d/heartbeat start
Wait for "ONLINE node1".
4. node1# crm configure load update pgsql.crm
5. Wait for master started as below
node1# crm_mon -1 -fA
------------------------------------------------------------
============
Last updated: Tue Feb 8 18:51:00 2011
Stack: Heartbeat
Current DC: node1 (0c140f90-7de3-438f-b1b5-3b9722bbde21) -
partition with quorum
Version: 1.0.10-da7075976b5ff0bee71074385f8fd02f296ec8a3
2 Nodes configured, unknown expected votes
3 Resources configured.
============
Online: [ node1 ]
vip (ocf::heartbeat:IPaddr2): Started node1
Master/Slave Set: msPostgresql
Masters: [ node1 ]
Stopped: [ postgresql:1 ]
Clone Set: clnPingd
Started: [ node1 ]
Stopped: [ pingd:1 ]
Node Attributes:
* Node node1:
+ master-postgresql:0 : 1000
+ pgsql-status : PRI
+ ping_set : 100
Migration summary:
* Node node1:
------------------------------------------------------------
If start operation fails, please check PostgreSQL state using "crm_mon -A".
"pgsql-status" line shows something.
6. Backup node1's data and restore to node2.
In my case I use rsync command as below.
node2# psql -h 192.168.2.107 -U postgres -c "SELECT pg_start_backup('label')"
node2# rsync -avr --delete --exclude=postmaster.pid
192.168.2.107:/var/lib/pgsql/9.0/data/ /var/lib/pgsql/9.0/data/
node2# psql -h 192.168.2.107 -U postgres -c "SELECT pg_stop_backup()"
7. Put recovery.conf for node2.
node2# cp recovery.conf /var/lib/pgsql/9.0/data/
node2# chown postgres:postgres /var/lib/pgsql/9.0/data/recovery.conf
8. node2# /etc/init.d/heartbeat start
9. Wait for slave started as below.
node2# crm_mon -1 -fA
------------------------------------------------------------
============
Last updated: Tue Feb 8 19:41:00 2011
Stack: Heartbeat
Current DC: node1 (0c140f90-7de3-438f-b1b5-3b9722bbde21) -
partition with quorum
Version: 1.0.10-da7075976b5ff0bee71074385f8fd02f296ec8a3
2 Nodes configured, unknown expected votes
3 Resources configured.
============
Online: [ node1 node2 ]
vip (ocf::heartbeat:IPaddr2): Started node1
Master/Slave Set: msPostgresql
Masters: [ node1 ]
Slaves: [ node2 ]
Clone Set: clnPingd
Started: [ node1 node2 ]
Node Attributes:
* Node node1:
+ master-postgresql:0 : 1000
+ pgsql-status : PRI
+ ping_set : 100
* Node node2:
+ master-postgresql:1 : 100
+ pgsql-status : HS
+ ping_set : 100
Migration summary:
* Node node1:
* Node node2:
------------------------------------------------------------
If master doesn't exist in other node or recovery.conf doesn't
exist in node2,
start operation fails.
********************** Other Operation ***************************************
1. If node1's master dies, node2 becomes a master.
node1# killall -9 postgres
node1# crm_mon -1 -fA
------------------------------------------------------------
============
Last updated: Tue Feb 8 20:12:34 2011
Stack: Heartbeat
Current DC: node1 (0c140f90-7de3-438f-b1b5-3b9722bbde21) -
partition with quorum
Version: 1.0.10-da7075976b5ff0bee71074385f8fd02f296ec8a3
2 Nodes configured, unknown expected votes
3 Resources configured.
============
Online: [ node1 node2 ]
vip (ocf::heartbeat:IPaddr2): Started node2
Master/Slave Set: msPostgresql
Masters: [ node2 ]
Stopped: [ postgresql:0 ]
Clone Set: clnPingd
Started: [ node1 node2 ]
Node Attributes:
* Node node1:
+ master-postgresql:0 : -INFINITY
+ pgsql-status : STOP:with errors
+ ping_set : 100
* Node node2:
+ master-postgresql:1 : 1000
+ pgsql-status : PRI
+ ping_set : 100
Migration summary:
* Node node1:
postgresql:0: migration-threshold=1 fail-count=1
* Node node2:
Failed actions:
postgresql:0_monitor_9000 (node=node1, call=14, rc=7,
status=complete): not running
------------------------------------------------------------
2. To recover node1, backup node2's data and restore to node1.
node1# psql -h 192.168.2.207 -U postgres -c "SELECT pg_start_backup('label')"
node1# rsync -avr --delete --exclude=postmaster.pid
192.168.2.207:/var/lib/pgsql/9.0/data/ /var/lib/pgsql/9.0/data/
node1# psql -h 192.168.2.207 -U postgres -c "SELECT pg_stop_backup()"
3. Put recovery.conf for node1.
node1# cp recovery.conf /var/lib/pgsql/9.0/data/
node1# chown postgres:postgres /var/lib/pgsql/9.0/data/recovery.conf
4. Remove flag file of repressing start.
node1# rm /var/lig/pgsql/PGSQL.5432.repress_start
5. node1# crm resource cleanup msPostgresql
6. Node1 becomes slave as below.
node1# crm_mon -1 -fA
------------------------------------------------------------
============
Last updated: Tue Feb 8 20:32:04 2011
Stack: Heartbeat
Current DC: node1 (0c140f90-7de3-438f-b1b5-3b9722bbde21) -
partition with quorum
Version: 1.0.10-da7075976b5ff0bee71074385f8fd02f296ec8a3
2 Nodes configured, unknown expected votes
3 Resources configured.
============
Online: [ node1 node2 ]
vip (ocf::heartbeat:IPaddr2): Started node2
Master/Slave Set: msPostgresql
Masters: [ node2 ]
Slaves: [ node1 ]
Clone Set: clnPingd
Started: [ node1 node2 ]
Node Attributes:
* Node node1:
+ master-postgresql:0 : 100
+ pgsql-status : HS
+ ping_set : 100
* Node node2:
+ master-postgresql:1 : 1000
+ pgsql-status : PRI
+ ping_set : 100
Migration summary:
* Node node1:
* Node node2:
------------------------------------------------------------
********************** Example of configuration ****************************
* postgresql.conf (common configuration for node1 and node2)
- Please mkdir /var/lib/pgsql/9.0/data/pg_archive preliminarily
- To cause fail-over correctly, set listen_addresses = '*'
if you use virtual IP together.
---------------------------------------------------------------------------
listen_addresses = '*'
port = 5432
wal_level = hot_standby
max_wal_senders = 3
wal_keep_segments = 16
archive_mode = on
archive_command = 'cp %p /var/lib/pgsql/9.0/data/pg_archive/%f'
---------------------------------------------------------------------------
* pg_hba.conf (common configuration for node1 and node2)
- 192.168.2.0/24 is LAN for replication.
- Node1 has 192.168.2.107 and Node2 has 192.168.2.207.
---------------------------------------------------------------------------
local all all trust
host all all 127.0.0.1/32 trust
host all all ::1/128 trust
host all all 192.168.2.0/24 trust
host replication all 192.168.2.0/24 trust
---------------------------------------------------------------------------
* recovery.conf for node1
---------------------------------------------------------------------------
standby_mode = 'on'
primary_conninfo = 'host=192.168.2.207 port=5432 user=postgres'
restore_command = 'cp /var/lib/pgsql/9.0/data/pg_archive/%f %p'
trigger_file = '/var/lib/pgsql/PGSQL.5432.trigger'
---------------------------------------------------------------------------
* recovery.conf for node2
---------------------------------------------------------------------------
standby_mode = 'on'
primary_conninfo = 'host=192.168.2.107 port=5432 user=postgres'
restore_command = 'cp /var/lib/pgsql/9.0/data/pg_archive/%f %p'
trigger_file = '/var/lib/pgsql/PGSQL.5432.trigger'
---------------------------------------------------------------------------
* pgsql.crm for crm command
---------------------------------------------------------------------------
property \
no-quorum-policy="ignore" \
stonith-enabled="false" \
startup-fencing="false" \
rsc_defaults \
resource-stickiness="INFINITY" \
migration-threshold="1"
ms msPostgresql postgresql \
meta \
master-max="1" \
master-node-max="1" \
clone-max="2" \
clone-node-max="1" \
notify="true"
clone clnPingd pingd \
meta \
clone-max="2" \
clone-node-max="1"
primitive vip ocf:heartbeat:IPaddr2 \
params \
ip="172.20.24.127" \
nic="eth0" \
cidr_netmask="16" \
op start timeout="60s" interval="0s" on-fail="restart" \
op monitor timeout="60s" interval="10s" on-fail="restart" \
op stop timeout="60s" interval="0s" on-fail="block"
primitive postgresql ocf:heartbeat:pgsql \
params \
pgctl="/usr/pgsql-9.0/bin/pg_ctl" \
psql="/usr/pgsql-9.0/bin/psql" \
pgdata="/var/lib/pgsql/9.0/data/" \
start_opt="-p 5432" \
rep_mode="async" \
op start timeout="3600s" interval="0s" on-fail="restart" \
op monitor timeout="60s" interval="10s" on-fail="restart" \
op monitor timeout="60s" interval="9s" on-fail="restart"
role="Master" \
op promote timeout="3600s" interval="0s" on-fail="restart" \
op demote timeout="60s" interval="0s" on-fail="block" \
op stop timeout="60s" interval="0s" on-fail="block"
primitive pingd ocf:pacemaker:pingd \
params \
name="ping_set" \
host_list="172.20.24.103" \
multiplier="100" \
dampen="0" \
op start timeout="60s" interval="0s" on-fail="restart" \
op monitor timeout="60s" interval="10s" on-fail="restart" \
op stop timeout="60s" interval="0s" on-fail="ignore"
location rsc_location-1 vip \
rule -inf: not_defined ping_set or ping_set lt 100
location rsc_location-2 msPostgresql \
rule -inf: not_defined ping_set or ping_set lt 100 \
rule -inf: defined fail-count-vip
colocation rsc_colocation-1 inf: vip clnPingd
colocation rsc_colocation-2 inf: msPostgresql clnPingd
colocation rsc_colocation-3 inf: msPostgresql:Master vip
order rsc_order-1 0: clnPingd msPostgresql
order rsc_order-2 0: msPostgresql vip
order rsc_order-3 0: vip msPostgresql:promote
---------------------------------------------------------------------------
Regards,
Takatoshi MATSUO
-------------- next part --------------
A non-text attachment was scrubbed...
Name: pgsql.patch
Type: application/octet-stream
Size: 14764 bytes
Desc: not available
Url : http://lists.linux-ha.org/pipermail/linux-ha-dev/attachments/20110208/afeaf52f/attachment-0001.obj
More information about the Linux-HA-Dev
mailing list