It is planned in PostgreSQL 9.2 to have support for cascading replication, which is the ability to add slaves under other slaves. In 9.2, slave-slave connections are only asynchronous.

This feature has been introduced by this commit.
commit 5286105800c7d5902f98f32e11b209c471c0c69c
Author: Simon Riggs
Date: Tue Jul 19 03:40:03 2011 +0100
 
Cascading replication feature for streaming log-based replication.
Standby servers can now have WALSender processes, which can work with
either WALReceiver or archive_commands to pass data. Fully updated
docs, including new conceptual terms of sending server, upstream and
downstream servers. WALSenders terminated when promote to master.
 
Fujii Masao, review, rework and doc rewrite by Simon Riggs

So let’s try to use it with the following configuration.

Cascading replication graph

Data files of master, slaves and archive folder are located in $HOME/bin/postgres.
You need to checkout the master branch of PostgreSQL GIT to try this feature.

The settings are pretty similar to what you can find here.
However, you have to take care of the following settings in postgresql.conf:

  • Master
    wal_level = hot_standby
    archive_mode = on
    archive_command = 'cp -i %p $HOME/bin/postgres/archive/%f'
    max_wal_senders = 10
  • Slave 1
    hot_standby = on
    port = 15432
  • Slave 2
    hot_standby = on
    port = 15433
  • Slave 1-1
    hot_standby = on
    port = 25432
  • Slave 1-2
    hot_standby = on
    port = 25433

You can setup slave1, and slave2 as synchronously with master by using synchronous_standby_names in postgresql.conf of master.
This is set a little bit later after initializing the master backup.

You need to change recovery.conf of slave 1 and slave 2 with those parameters:
standby_mode = on
primary_conninfo = 'host=localhost port=5432 application_name=$SLAVE_NAME'
restore_command = 'cp -i $HOME/bin/postgres/archive/%f %p'

$SLAVE_NAME being slave1 for slave 1, and slave2 for slave 2.

Then as, slave 1-1 and 1-2 have to connect to slave 1, use the same values as above for standby_mode and restore_command, but setup primary_conninfo like this:
primary_conninfo = 'host=localhost port=15432 application_name=$SLAVE_NAME'
$SLAVE_NAME being slave11 for slave 1-1, and slave12 for slave 1-2.

Here is the dirty script used to start up the system. This has been written quickly, so sorry for the bad quality :) .
#!/bin/bash
#Master has port 5432
#Slave1 has port 15432
#Slave2 has port 15433
#Slave1 has port 25432
#Slave2 has port 25433
 
PSQL_FOLDER=$HOME/bin/postgres
PSQL_BIN=$PSQL_FOLDER/bin #Binary folder
PSQL_CONFIG=$PSQL_FOLDER/config #Folder containing all the configuration files
PSQL_MASTER=$PSQL_FOLDER/master #Master data folder
 
PSQL_SLAVE1=$PSQL_FOLDER/slave1 #Slave 1 data folder
PSQL_SLAVE2=$PSQL_FOLDER/slave2 #Slave 2 data folder
PSQL_SLAVE11=$PSQL_FOLDER/slave11 #Slave 11 data folder
PSQL_SLAVE12=$PSQL_FOLDER/slave12 #Slave 12 data folder
PSQL_ARCHIVE=$PSQL_FOLDER/archive #Archive folder
 
#clean up, take down violently all the processes
killall postgres
rm -rf $PSQL_MASTER $PSQL_SLAVE1 $PSQL_SLAVE2 $PSQL_SLAVE11 $PSQL_SLAVE12 $PSQL_ARCHIVE
mkdir $PSQL_MASTER $PSQL_SLAVE1 $PSQL_SLAVE2 $PSQL_SLAVE11 $PSQL_SLAVE12 $PSQL_ARCHIVE
sleep 1
 
#Initialize master
$PSQL_BIN/initdb -D $PSQL_MASTER
cp $PSQL_CONFIG/postgresql.conf.master $PSQL_MASTER/postgresql.conf
cp $PSQL_CONFIG/pg_hba.conf.master $PSQL_MASTER/pg_hba.conf
 
#Start master
$PSQL_BIN/postgres -D $PSQL_MASTER &
#Wait a little before server start up, let it finish initialization
echo "Master started"
sleep 2
 
#Initialize slave
#This is used to start the backup so as slave does not have to recover from
#scratch when being build. It definitely accelerates standby start up
$PSQL_BIN/psql postgres -c "select pg_start_backup('backup')"
cp -pr $PSQL_MASTER/* $PSQL_SLAVE1/
cp -pr $PSQL_MASTER/* $PSQL_SLAVE2/
cp -pr $PSQL_MASTER/* $PSQL_SLAVE11/
cp -pr $PSQL_MASTER/* $PSQL_SLAVE12/
$PSQL_BIN/psql postgres -c "select pg_stop_backup()"
echo "Backup performed"
sleep 1
 
#Setup synchronous commit mode on master
echo "synchronous_standby_names='slave1,slave2'" >> $PSQL_MASTER/postgresql.conf
$PSQL_BIN/pg_ctl reload -D $PSQL_MASTER
echo "Force master to synchronize mode with slave1 (priority 1) and slave 2 (priority 2)"
sleep 1
 
#Then finish by copying all the configuration files for slaves
cp $PSQL_CONFIG/postgresql.conf.slave1 $PSQL_SLAVE1/postgresql.conf
cp $PSQL_CONFIG/recovery.conf.slave1 $PSQL_SLAVE1/recovery.conf
cp $PSQL_CONFIG/postgresql.conf.slave2 $PSQL_SLAVE2/postgresql.conf
cp $PSQL_CONFIG/recovery.conf.slave2 $PSQL_SLAVE2/recovery.conf
cp $PSQL_CONFIG/postgresql.conf.slave11 $PSQL_SLAVE11/postgresql.conf
cp $PSQL_CONFIG/recovery.conf.slave11 $PSQL_SLAVE11/recovery.conf
cp $PSQL_CONFIG/postgresql.conf.slave12 $PSQL_SLAVE12/postgresql.conf
cp $PSQL_CONFIG/recovery.conf.slave12 $PSQL_SLAVE12/recovery.conf
 
#Delete unnecessary xlog files and postmaster pid files
rm $PSQL_SLAVE1/postmaster.pid
rm -r $PSQL_SLAVE1/pg_xlog/*
rm $PSQL_SLAVE2/postmaster.pid
rm -r $PSQL_SLAVE2/pg_xlog/*
rm $PSQL_SLAVE11/postmaster.pid
rm -r $PSQL_SLAVE11/pg_xlog/*
rm $PSQL_SLAVE12/postmaster.pid
rm -r $PSQL_SLAVE12/pg_xlog/*
 
#Start Slave 1
chmod 700 $PSQL_SLAVE1
$PSQL_BIN/postgres -D $PSQL_SLAVE1 &
echo "Slave 1 started"
 
#Start Slave 2
chmod 700 $PSQL_SLAVE2
$PSQL_BIN/postgres -D $PSQL_SLAVE2 &
echo "Slave 2 started"
 
#Start Slave 11
chmod 700 $PSQL_SLAVE11
$PSQL_BIN/postgres -D $PSQL_SLAVE11 &
echo "Slave 11 started"
 
#Start Slave 12
chmod 700 $PSQL_SLAVE12
$PSQL_BIN/postgres -D $PSQL_SLAVE12 &
echo "Slave 12 started"
 
exit 0

Configuration files and scripts can be found here.

OK now let’s check if it works well.
$ psql postgres
postgres=# select application_name,state,sync_priority,sync_state from pg_stat_replication;
application_name | state | sync_priority | sync_state
------------------+-----------+---------------+------------
slave1 | streaming | 1 | sync
slave2 | streaming | 2 | potential
(2 rows)

On master, slave1 has priority 1 for synchronization (synchronous_standby_nodes has been set up ‘slave1,slave2′). It looks to be correctly synchronized.

Then let’s do the same check from slave1.
$ psql -p 15432 postgres
postgres=# select application_name,state,sync_priority,sync_state from pg_stat_replication;
application_name | state | sync_priority | sync_state
------------------+-----------+---------------+------------
slave12 | streaming | 0 | async
slave11 | streaming | 0 | async
(2 rows)

slave11 and slave12 are correctly linked to slave1. Yippee.

Why not some additional check with some data…
$ #Connection from master
$ psql -p 5432 postgres
postgres=# create table aa (a int);
CREATE TABLE
postgres=# insert into aa values (1),(2);
INSERT 0 2
postgres=# select * from aa;
a
---
1
2
(2 rows)
postgres=# \q

Has the slave of a slave been updated? Connection to let’s say… slave12
$ psql -p 25433 postgres
postgres=# select * from aa;
a
---
1
2
(2 rows)

OK, that rocks. I let you imagine then how to use that as an HA solution ;) .

The post presents a proposal to implement an HA solution based on PostgreSQL streaming replication and Standby node structure. This solution is still in construction, so the final implementation design that will be chosen for Postgres-XC may slightly change.

Before reading this post and if you are not experienced with PostgreSQL 9.0/9.1 features, you should refer to the background about PostgreSQL Master/Slave fallback.

Synchronous streaming replication for Postgres-XC

This part specifies how synchronous streaming replication will be implemented in Postgres-XC. Even if functionalities in PostgreSQL 9.1 already implemented are pretty stable, some extensions related to node communication control have to be designed to have a real HA solution.

Some background

    Postgres-XC is a multi-master database cluster based on PostgreSQL.

  • It is made of a unique global transaction manager which feeds consistently transaction IDs and snapshots in the cluster to each node.
  • Nodes are made of 2 kinds of nodes: Coordinator and Datanode.
    • Coordinators are holding other node information. A coordinator is able to communicate with other Coordinators and other Datanodes through a connection pooler. This connection pooler saves all the connection parameters to nodes (host name, port number) to be able to distribute connection with a libpq string protocol depending on database name and user name. Coordinators also have all the catalog information, and primarily the distribution information of each table in the database. With this information, Coordinator is able to push down SQL queries to correct Datanodes and then merge results that are sent back to application according a new plan type called RemoteQuery. A coordinator does not hold table data, and all the data contained each Coordinator is the same. So one Coordinator is the clone of another one.
    • Datanodes react more or less like a PostgreSQL normal instance. They hold database data. What has been added is an interface to permit Datanodes to receive from Coordinators transaction IDs, snapshots, timestamp values instead of requesting them locally.

Limitations

  • Postgres-XC does not support yet tuple relocation from one node to another (impossible to update for instance column that holds the distribution key), so this design is limited to the case where the cluster has a fixed number of master nodes (for Datanodes and Coordinators (?)).
  • It is not thought here about trying to add or delete a Datanode on the fly. By that, it means that cluster configuration is not changed in a way that it modifies the node number and data distribution.
  • With those assumptions what remains is a cluster with a fixed size
  • This specification is based on PostgreSQL 9.1, but design is though to take into account as many replication features as possible.

Specifications

Here are the list of functionalities that will be added for the support of synchronous streaming replication. Most of them concern node management, master/slave identification and slave promotion system.

  • In no way Postgres-XC nodes are thought as being able to make system calls to be able to kick a slave initdb or something like this. NO WAY!
Catalog addition

A catalog called pgxc_nodes will be added with the following columns:

  • node type
  • node ID number
  • node host name
  • node port number
  • node immediate master ID
  • connection type: replication or not?

This table has as a primary key constraint on node number and node type.
“node immediate master ID” is the node ID that a slave is using to connect to a master or another slave (case of cascade replication, not implemented in PostgreSQL 9.1 though). This catalog is created on Coordinator.

Cluster startup
  • As a limitation, all the configuration files of postgres-XC coordinator nodes only contain master Coordinator numbers.
    With that, the initialization of the catalog table pgxc_nodes is made only with data of master nodes. In the case of a master node, “node immediate master ID” is filled with 0.
  • Once the cluster is up with a fixed number of nodes, the administrator has he possibility to update pgxc_nodes catalog with slaves already on that have already there configuration files set correctly to connect to the wanted node.
SQL interface
Adding a slave node after cluster start up

Following SQL is sent to Coordinators:

CREATE [COORDINATOR | DATANODE] SLAVE
IDENTIFIED BY ID $id_num
WITH CONNECTION ('$host1:$port1',...,'$hostN:$portN')
ON MASTER $master_number.
{ REPLICATION TO [SYNCHRONOUS | ASYNCHRONOUS] }

If only 1 host/port couple is specified, the same values are applied for all the coordinators. In case multiple host/port values are specified, they have to correspond to the number of Coordinators in the cluster. The following SQL is sent to all the coordinators. If replication option is not specified, slave is contacted to get the information.

Promoting a slave as a master

Following SQL is sent to Coordinators:

ALTER [COORDINATOR | DATANODE] SLAVE id PROMOTE TO MASTER
{WITH CONNECTION ( [:$port | $host: | $host:$port] )};

This will modify pgxc_nodes like this for a coordinator for example:

  • Former tuples:
    • Master: C, ID: 1, host:localhost, port:5432, master ID: 0
    • Slave: C, ID: 4, host:localhost, port:5433, master ID: 1
  • New tuples:
    • Former master: erased
    • Slaved promoted: C, ID: 1, host:localhost, port:5432, master ID: 0

The following restrictions apply at promotion

  • Before promoting the slave as a new master it is necessary to restart slave new parameters. Postgres-XC does not take responsabilities in kicking new nodes.
  • Promotion can be made on a synchronous slave only, this check is made on pgxc_nodes.
  • Before promoting, a check is made on slave to be sure that it has not been modified from synchronous to asynchronous mode when beginning the promotion. This check is done locally on pgxc_nodes.
  • When promoting, make a check on slave node to be sure that its standby mode is off. This has to be kicked from an external utility and not by XC itself.
Changing a slave status

Following SQL is sent to Coordinators:

ALTER [COORDINATOR | DATANODE] SLAVE $id REPLICATION TO [SYNCHRONOUS | ASYNCHRONOUS];
ALTER [COORDINATOR | DATANODE] SLAVE $id ID TO $new_id;

The following rules are applied:

  • Take an exclusive lock on pgxc_table to make other backends waiting on pgxc_nodes
  • The lock is taken externally with LOCK TABLE and sent to all the Coordinators first. Then the table is updated. Then lock is released on all the Coordinators from remote Coordinator
  • When changing replication mode, connect to slave node and check if mode has effectively been changed correctly by an external application kick.
Disabling a slave node from cluster

Following SQL is Sent to Datanodes:

DROP [COORDINATOR | DATANODE] SLAVE $id;

The following rules are applied:

  • Take an exclusive lock (SHARE ROW EXCLUSIVE MODE?) on pgxc_nodes and send this lock to all the Coordinators before performing the deletion from pgxc_nodes
  • Lock is released once deletion on all the nodes is completed
Pooler process modification

Connection pooling has to be modified with following guidelines:

  • At initialization phase, Pooler fills in the catalog table pgxc_nodes with initial value in global memory context with values found in postgresql.conf: pgxc_nodes has to be accessible from postmaster and child processes.
  • When a new slave is added with a new code ID, Pooler caches this new connection data on each Coordinator once pgxc_nodes has been updated on Coordinator associated with new node ID.
  • When a slave is dropped, Pooler information cached is updated also.
  • Pooler saves in shared memory information related to master nodes at cluster initialization.

Important:

Pooler only remains in charge in distributing connections. It does not have to know if connection is to a slave or a master. This is the reponsability of postgres child as it takes a row shared lock on pgxc_nodes when beginning a transaction on certain nodes.

Postmaster child process modification
  • A child returns an error to application in case it cannot read pgxc_nodes.
  • When a postmaster child determines the list of nodes for transaction, it needs to know if current transaction is read-only or write depending on SQL. Then node list is built from information in pgxc_nodes when requesting new connections.
  • When a child postmaster uses connection information of a slave/master when taking new connections, it takes a row shared lock on the associated tuples of pgxc_nodes where it took connections. This preserves catalog modification when running a transaction on those nodes.
    • If the transaction is read-only, connections to master/slave are both possible. Choice is made with round-robin.
    • If the transaction first requested read-only connections, but launches on the way a DML, new connections are requested from pooler to necessary masters.
    • If transaction was first write, then does read operations, keep going with connections to master.
©2010-2013 Michael Paquier All content is ©Copyright of Otacoo.com 2010-2013. Privacy Policy - Terms of Use