It is planned in PostgreSQL 9.2 to have support for cascading replication, which is the ability to add slaves under other slaves. In 9.2, slave-slave connections are only asynchronous.

This feature has been introduced by this commit.
commit 5286105800c7d5902f98f32e11b209c471c0c69c
Author: Simon Riggs
Date: Tue Jul 19 03:40:03 2011 +0100
 
Cascading replication feature for streaming log-based replication.
Standby servers can now have WALSender processes, which can work with
either WALReceiver or archive_commands to pass data. Fully updated
docs, including new conceptual terms of sending server, upstream and
downstream servers. WALSenders terminated when promote to master.
 
Fujii Masao, review, rework and doc rewrite by Simon Riggs

So let’s try to use it with the following configuration.

Cascading replication graph

Data files of master, slaves and archive folder are located in $HOME/bin/postgres.
You need to checkout the master branch of PostgreSQL GIT to try this feature.

The settings are pretty similar to what you can find here.
However, you have to take care of the following settings in postgresql.conf:

  • Master
    wal_level = hot_standby
    archive_mode = on
    archive_command = 'cp -i %p $HOME/bin/postgres/archive/%f'
    max_wal_senders = 10
  • Slave 1
    hot_standby = on
    port = 15432
  • Slave 2
    hot_standby = on
    port = 15433
  • Slave 1-1
    hot_standby = on
    port = 25432
  • Slave 1-2
    hot_standby = on
    port = 25433

You can setup slave1, and slave2 as synchronously with master by using synchronous_standby_names in postgresql.conf of master.
This is set a little bit later after initializing the master backup.

You need to change recovery.conf of slave 1 and slave 2 with those parameters:
standby_mode = on
primary_conninfo = 'host=localhost port=5432 application_name=$SLAVE_NAME'
restore_command = 'cp -i $HOME/bin/postgres/archive/%f %p'

$SLAVE_NAME being slave1 for slave 1, and slave2 for slave 2.

Then as, slave 1-1 and 1-2 have to connect to slave 1, use the same values as above for standby_mode and restore_command, but setup primary_conninfo like this:
primary_conninfo = 'host=localhost port=15432 application_name=$SLAVE_NAME'
$SLAVE_NAME being slave11 for slave 1-1, and slave12 for slave 1-2.

Here is the dirty script used to start up the system. This has been written quickly, so sorry for the bad quality :) .
#!/bin/bash
#Master has port 5432
#Slave1 has port 15432
#Slave2 has port 15433
#Slave1 has port 25432
#Slave2 has port 25433
 
PSQL_FOLDER=$HOME/bin/postgres
PSQL_BIN=$PSQL_FOLDER/bin #Binary folder
PSQL_CONFIG=$PSQL_FOLDER/config #Folder containing all the configuration files
PSQL_MASTER=$PSQL_FOLDER/master #Master data folder
 
PSQL_SLAVE1=$PSQL_FOLDER/slave1 #Slave 1 data folder
PSQL_SLAVE2=$PSQL_FOLDER/slave2 #Slave 2 data folder
PSQL_SLAVE11=$PSQL_FOLDER/slave11 #Slave 11 data folder
PSQL_SLAVE12=$PSQL_FOLDER/slave12 #Slave 12 data folder
PSQL_ARCHIVE=$PSQL_FOLDER/archive #Archive folder
 
#clean up, take down violently all the processes
killall postgres
rm -rf $PSQL_MASTER $PSQL_SLAVE1 $PSQL_SLAVE2 $PSQL_SLAVE11 $PSQL_SLAVE12 $PSQL_ARCHIVE
mkdir $PSQL_MASTER $PSQL_SLAVE1 $PSQL_SLAVE2 $PSQL_SLAVE11 $PSQL_SLAVE12 $PSQL_ARCHIVE
sleep 1
 
#Initialize master
$PSQL_BIN/initdb -D $PSQL_MASTER
cp $PSQL_CONFIG/postgresql.conf.master $PSQL_MASTER/postgresql.conf
cp $PSQL_CONFIG/pg_hba.conf.master $PSQL_MASTER/pg_hba.conf
 
#Start master
$PSQL_BIN/postgres -D $PSQL_MASTER &
#Wait a little before server start up, let it finish initialization
echo "Master started"
sleep 2
 
#Initialize slave
#This is used to start the backup so as slave does not have to recover from
#scratch when being build. It definitely accelerates standby start up
$PSQL_BIN/psql postgres -c "select pg_start_backup('backup')"
cp -pr $PSQL_MASTER/* $PSQL_SLAVE1/
cp -pr $PSQL_MASTER/* $PSQL_SLAVE2/
cp -pr $PSQL_MASTER/* $PSQL_SLAVE11/
cp -pr $PSQL_MASTER/* $PSQL_SLAVE12/
$PSQL_BIN/psql postgres -c "select pg_stop_backup()"
echo "Backup performed"
sleep 1
 
#Setup synchronous commit mode on master
echo "synchronous_standby_names='slave1,slave2'" >> $PSQL_MASTER/postgresql.conf
$PSQL_BIN/pg_ctl reload -D $PSQL_MASTER
echo "Force master to synchronize mode with slave1 (priority 1) and slave 2 (priority 2)"
sleep 1
 
#Then finish by copying all the configuration files for slaves
cp $PSQL_CONFIG/postgresql.conf.slave1 $PSQL_SLAVE1/postgresql.conf
cp $PSQL_CONFIG/recovery.conf.slave1 $PSQL_SLAVE1/recovery.conf
cp $PSQL_CONFIG/postgresql.conf.slave2 $PSQL_SLAVE2/postgresql.conf
cp $PSQL_CONFIG/recovery.conf.slave2 $PSQL_SLAVE2/recovery.conf
cp $PSQL_CONFIG/postgresql.conf.slave11 $PSQL_SLAVE11/postgresql.conf
cp $PSQL_CONFIG/recovery.conf.slave11 $PSQL_SLAVE11/recovery.conf
cp $PSQL_CONFIG/postgresql.conf.slave12 $PSQL_SLAVE12/postgresql.conf
cp $PSQL_CONFIG/recovery.conf.slave12 $PSQL_SLAVE12/recovery.conf
 
#Delete unnecessary xlog files and postmaster pid files
rm $PSQL_SLAVE1/postmaster.pid
rm -r $PSQL_SLAVE1/pg_xlog/*
rm $PSQL_SLAVE2/postmaster.pid
rm -r $PSQL_SLAVE2/pg_xlog/*
rm $PSQL_SLAVE11/postmaster.pid
rm -r $PSQL_SLAVE11/pg_xlog/*
rm $PSQL_SLAVE12/postmaster.pid
rm -r $PSQL_SLAVE12/pg_xlog/*
 
#Start Slave 1
chmod 700 $PSQL_SLAVE1
$PSQL_BIN/postgres -D $PSQL_SLAVE1 &
echo "Slave 1 started"
 
#Start Slave 2
chmod 700 $PSQL_SLAVE2
$PSQL_BIN/postgres -D $PSQL_SLAVE2 &
echo "Slave 2 started"
 
#Start Slave 11
chmod 700 $PSQL_SLAVE11
$PSQL_BIN/postgres -D $PSQL_SLAVE11 &
echo "Slave 11 started"
 
#Start Slave 12
chmod 700 $PSQL_SLAVE12
$PSQL_BIN/postgres -D $PSQL_SLAVE12 &
echo "Slave 12 started"
 
exit 0

Configuration files and scripts can be found here.

OK now let’s check if it works well.
$ psql postgres
postgres=# select application_name,state,sync_priority,sync_state from pg_stat_replication;
application_name | state | sync_priority | sync_state
------------------+-----------+---------------+------------
slave1 | streaming | 1 | sync
slave2 | streaming | 2 | potential
(2 rows)

On master, slave1 has priority 1 for synchronization (synchronous_standby_nodes has been set up ‘slave1,slave2′). It looks to be correctly synchronized.

Then let’s do the same check from slave1.
$ psql -p 15432 postgres
postgres=# select application_name,state,sync_priority,sync_state from pg_stat_replication;
application_name | state | sync_priority | sync_state
------------------+-----------+---------------+------------
slave12 | streaming | 0 | async
slave11 | streaming | 0 | async
(2 rows)

slave11 and slave12 are correctly linked to slave1. Yippee.

Why not some additional check with some data…
$ #Connection from master
$ psql -p 5432 postgres
postgres=# create table aa (a int);
CREATE TABLE
postgres=# insert into aa values (1),(2);
INSERT 0 2
postgres=# select * from aa;
a
---
1
2
(2 rows)
postgres=# \q

Has the slave of a slave been updated? Connection to let’s say… slave12
$ psql -p 25433 postgres
postgres=# select * from aa;
a
---
1
2
(2 rows)

OK, that rocks. I let you imagine then how to use that as an HA solution ;) .

This study is made with PostgreSQL 9.1.1, released a couple of days before this post is written.
Unlogged tables are a new performance feature of PostgreSQL 9.1, created by Robert Hass. So, by guessing from this feature name, those tables are not logged in the database system :) . More precisely, those tables do not use at all WAL (Write ahead log) that insure a safe database crash.
Those tables are a good performance gain to contain data that do not especially need to survive from a crash, they are truncated automatically after a crash or unclean shutdown.
Unlogged tables are shared among sessions, and are not deleted when a session ends. Autovacuum runs on them.

So, in what cases could you use it.

  • On web applications, for session parameters
  • Data caching (Web page caching, why not?)
  • Application status, imagine that you add a on/off lock switch on your application that an admin could modify at will. This is not necessary at database server crash and could be reinitialized at a default value if necessary.
  • And many other things

In order to define an unlogged table, you need to use a new extension keyword called UNLOGGED (surprise!).
CREATE UNLOGGED TABLE aa (a int, b int);

This is a performance feature, so let’s see how much gain you could expect with pgbench.
Environment used is a 2.6GHz Dual core i5 with 4GB of memory.
PostgreSQL server has the following settings:

  • shared_buffers = 1GB
  • synchronous_commit = off
  • checkpoint_segments = 32
  • checkpoint_completion_target = 0.9

By default, pgbench is not able to use unlogged tables, so the code has been a bit modified to change all DDL definitions when tests are made on unlogged tables.
First, pgbench can be found in contrib directory. Once installed, you can initialize with pgbench with the following commands:
createdb benchtest
pgbench -i -s $SCALE_FACTOR benchtest

SCALE_FACTOR is used at 10 and 100 for this study. Roughly, it represents the number of tables. I do not advice using default value to avoid lock contention.

Then you can launch pgbench with commands like:
pgbench -c $CLIENT_NUM -T 300 benchtest
CLIENT_NUM is the number of clients connected to the database. Here we use successively 1, 24 and 48.
For each configuration, 5 tests of a duration of 5 minutes are made. The lowest and highest values are not taken into account, and the average based on the other values is calculated.

Here are the results found in TPS (transaction/second).

Clients Scale factor Normal tables Unlogged tables Gain (Unlogged – Perm)/avg(Unlogged, Perm)
1 10 561.63 632.55 11.87%
24 10 1419.30 1678.23 16.71%
48 10 1323.78 1555.40 16.08%
1 100 510.25 436.87 13.22%
24 100 1252.38 1493.44 17.55%
48 100 1260.09 1462.92 14.89%

So in short, in the environment tested unlogged tables have shown an increase of output by 13~17%.

©2010-2013 Michael Paquier All content is ©Copyright of Otacoo.com 2010-2013. Privacy Policy - Terms of Use