It is planned in PostgreSQL 9.2 to have support for cascading replication, which is the ability to add slaves under other slaves. In 9.2, slave-slave connections are only asynchronous.

This feature has been introduced by this commit.
commit 5286105800c7d5902f98f32e11b209c471c0c69c
Author: Simon Riggs
Date: Tue Jul 19 03:40:03 2011 +0100
 
Cascading replication feature for streaming log-based replication.
Standby servers can now have WALSender processes, which can work with
either WALReceiver or archive_commands to pass data. Fully updated
docs, including new conceptual terms of sending server, upstream and
downstream servers. WALSenders terminated when promote to master.
 
Fujii Masao, review, rework and doc rewrite by Simon Riggs

So let’s try to use it with the following configuration.

Cascading replication graph

Data files of master, slaves and archive folder are located in $HOME/bin/postgres.
You need to checkout the master branch of PostgreSQL GIT to try this feature.

The settings are pretty similar to what you can find here.
However, you have to take care of the following settings in postgresql.conf:

  • Master
    wal_level = hot_standby
    archive_mode = on
    archive_command = 'cp -i %p $HOME/bin/postgres/archive/%f'
    max_wal_senders = 10
  • Slave 1
    hot_standby = on
    port = 15432
  • Slave 2
    hot_standby = on
    port = 15433
  • Slave 1-1
    hot_standby = on
    port = 25432
  • Slave 1-2
    hot_standby = on
    port = 25433

You can setup slave1, and slave2 as synchronously with master by using synchronous_standby_names in postgresql.conf of master.
This is set a little bit later after initializing the master backup.

You need to change recovery.conf of slave 1 and slave 2 with those parameters:
standby_mode = on
primary_conninfo = 'host=localhost port=5432 application_name=$SLAVE_NAME'
restore_command = 'cp -i $HOME/bin/postgres/archive/%f %p'

$SLAVE_NAME being slave1 for slave 1, and slave2 for slave 2.

Then as, slave 1-1 and 1-2 have to connect to slave 1, use the same values as above for standby_mode and restore_command, but setup primary_conninfo like this:
primary_conninfo = 'host=localhost port=15432 application_name=$SLAVE_NAME'
$SLAVE_NAME being slave11 for slave 1-1, and slave12 for slave 1-2.

Here is the dirty script used to start up the system. This has been written quickly, so sorry for the bad quality :) .
#!/bin/bash
#Master has port 5432
#Slave1 has port 15432
#Slave2 has port 15433
#Slave1 has port 25432
#Slave2 has port 25433
 
PSQL_FOLDER=$HOME/bin/postgres
PSQL_BIN=$PSQL_FOLDER/bin #Binary folder
PSQL_CONFIG=$PSQL_FOLDER/config #Folder containing all the configuration files
PSQL_MASTER=$PSQL_FOLDER/master #Master data folder
 
PSQL_SLAVE1=$PSQL_FOLDER/slave1 #Slave 1 data folder
PSQL_SLAVE2=$PSQL_FOLDER/slave2 #Slave 2 data folder
PSQL_SLAVE11=$PSQL_FOLDER/slave11 #Slave 11 data folder
PSQL_SLAVE12=$PSQL_FOLDER/slave12 #Slave 12 data folder
PSQL_ARCHIVE=$PSQL_FOLDER/archive #Archive folder
 
#clean up, take down violently all the processes
killall postgres
rm -rf $PSQL_MASTER $PSQL_SLAVE1 $PSQL_SLAVE2 $PSQL_SLAVE11 $PSQL_SLAVE12 $PSQL_ARCHIVE
mkdir $PSQL_MASTER $PSQL_SLAVE1 $PSQL_SLAVE2 $PSQL_SLAVE11 $PSQL_SLAVE12 $PSQL_ARCHIVE
sleep 1
 
#Initialize master
$PSQL_BIN/initdb -D $PSQL_MASTER
cp $PSQL_CONFIG/postgresql.conf.master $PSQL_MASTER/postgresql.conf
cp $PSQL_CONFIG/pg_hba.conf.master $PSQL_MASTER/pg_hba.conf
 
#Start master
$PSQL_BIN/postgres -D $PSQL_MASTER &
#Wait a little before server start up, let it finish initialization
echo "Master started"
sleep 2
 
#Initialize slave
#This is used to start the backup so as slave does not have to recover from
#scratch when being build. It definitely accelerates standby start up
$PSQL_BIN/psql postgres -c "select pg_start_backup('backup')"
cp -pr $PSQL_MASTER/* $PSQL_SLAVE1/
cp -pr $PSQL_MASTER/* $PSQL_SLAVE2/
cp -pr $PSQL_MASTER/* $PSQL_SLAVE11/
cp -pr $PSQL_MASTER/* $PSQL_SLAVE12/
$PSQL_BIN/psql postgres -c "select pg_stop_backup()"
echo "Backup performed"
sleep 1
 
#Setup synchronous commit mode on master
echo "synchronous_standby_names='slave1,slave2'" >> $PSQL_MASTER/postgresql.conf
$PSQL_BIN/pg_ctl reload -D $PSQL_MASTER
echo "Force master to synchronize mode with slave1 (priority 1) and slave 2 (priority 2)"
sleep 1
 
#Then finish by copying all the configuration files for slaves
cp $PSQL_CONFIG/postgresql.conf.slave1 $PSQL_SLAVE1/postgresql.conf
cp $PSQL_CONFIG/recovery.conf.slave1 $PSQL_SLAVE1/recovery.conf
cp $PSQL_CONFIG/postgresql.conf.slave2 $PSQL_SLAVE2/postgresql.conf
cp $PSQL_CONFIG/recovery.conf.slave2 $PSQL_SLAVE2/recovery.conf
cp $PSQL_CONFIG/postgresql.conf.slave11 $PSQL_SLAVE11/postgresql.conf
cp $PSQL_CONFIG/recovery.conf.slave11 $PSQL_SLAVE11/recovery.conf
cp $PSQL_CONFIG/postgresql.conf.slave12 $PSQL_SLAVE12/postgresql.conf
cp $PSQL_CONFIG/recovery.conf.slave12 $PSQL_SLAVE12/recovery.conf
 
#Delete unnecessary xlog files and postmaster pid files
rm $PSQL_SLAVE1/postmaster.pid
rm -r $PSQL_SLAVE1/pg_xlog/*
rm $PSQL_SLAVE2/postmaster.pid
rm -r $PSQL_SLAVE2/pg_xlog/*
rm $PSQL_SLAVE11/postmaster.pid
rm -r $PSQL_SLAVE11/pg_xlog/*
rm $PSQL_SLAVE12/postmaster.pid
rm -r $PSQL_SLAVE12/pg_xlog/*
 
#Start Slave 1
chmod 700 $PSQL_SLAVE1
$PSQL_BIN/postgres -D $PSQL_SLAVE1 &
echo "Slave 1 started"
 
#Start Slave 2
chmod 700 $PSQL_SLAVE2
$PSQL_BIN/postgres -D $PSQL_SLAVE2 &
echo "Slave 2 started"
 
#Start Slave 11
chmod 700 $PSQL_SLAVE11
$PSQL_BIN/postgres -D $PSQL_SLAVE11 &
echo "Slave 11 started"
 
#Start Slave 12
chmod 700 $PSQL_SLAVE12
$PSQL_BIN/postgres -D $PSQL_SLAVE12 &
echo "Slave 12 started"
 
exit 0

Configuration files and scripts can be found here.

OK now let’s check if it works well.
$ psql postgres
postgres=# select application_name,state,sync_priority,sync_state from pg_stat_replication;
application_name | state | sync_priority | sync_state
------------------+-----------+---------------+------------
slave1 | streaming | 1 | sync
slave2 | streaming | 2 | potential
(2 rows)

On master, slave1 has priority 1 for synchronization (synchronous_standby_nodes has been set up ‘slave1,slave2′). It looks to be correctly synchronized.

Then let’s do the same check from slave1.
$ psql -p 15432 postgres
postgres=# select application_name,state,sync_priority,sync_state from pg_stat_replication;
application_name | state | sync_priority | sync_state
------------------+-----------+---------------+------------
slave12 | streaming | 0 | async
slave11 | streaming | 0 | async
(2 rows)

slave11 and slave12 are correctly linked to slave1. Yippee.

Why not some additional check with some data…
$ #Connection from master
$ psql -p 5432 postgres
postgres=# create table aa (a int);
CREATE TABLE
postgres=# insert into aa values (1),(2);
INSERT 0 2
postgres=# select * from aa;
a
---
1
2
(2 rows)
postgres=# \q

Has the slave of a slave been updated? Connection to let’s say… slave12
$ psql -p 25433 postgres
postgres=# select * from aa;
a
---
1
2
(2 rows)

OK, that rocks. I let you imagine then how to use that as an HA solution ;) .

This study is made with PostgreSQL 9.1.1, released a couple of days before this post is written.
Unlogged tables are a new performance feature of PostgreSQL 9.1, created by Robert Hass. So, by guessing from this feature name, those tables are not logged in the database system :) . More precisely, those tables do not use at all WAL (Write ahead log) that insure a safe database crash.
Those tables are a good performance gain to contain data that do not especially need to survive from a crash, they are truncated automatically after a crash or unclean shutdown.
Unlogged tables are shared among sessions, and are not deleted when a session ends. Autovacuum runs on them.

So, in what cases could you use it.

  • On web applications, for session parameters
  • Data caching (Web page caching, why not?)
  • Application status, imagine that you add a on/off lock switch on your application that an admin could modify at will. This is not necessary at database server crash and could be reinitialized at a default value if necessary.
  • And many other things

In order to define an unlogged table, you need to use a new extension keyword called UNLOGGED (surprise!).
CREATE UNLOGGED TABLE aa (a int, b int);

This is a performance feature, so let’s see how much gain you could expect with pgbench.
Environment used is a 2.6GHz Dual core i5 with 4GB of memory.
PostgreSQL server has the following settings:

  • shared_buffers = 1GB
  • synchronous_commit = off
  • checkpoint_segments = 32
  • checkpoint_completion_target = 0.9

By default, pgbench is not able to use unlogged tables, so the code has been a bit modified to change all DDL definitions when tests are made on unlogged tables.
First, pgbench can be found in contrib directory. Once installed, you can initialize with pgbench with the following commands:
createdb benchtest
pgbench -i -s $SCALE_FACTOR benchtest

SCALE_FACTOR is used at 10 and 100 for this study. Roughly, it represents the number of tables. I do not advice using default value to avoid lock contention.

Then you can launch pgbench with commands like:
pgbench -c $CLIENT_NUM -T 300 benchtest
CLIENT_NUM is the number of clients connected to the database. Here we use successively 1, 24 and 48.
For each configuration, 5 tests of a duration of 5 minutes are made. The lowest and highest values are not taken into account, and the average based on the other values is calculated.

Here are the results found in TPS (transaction/second).

Clients Scale factor Normal tables Unlogged tables Gain (Unlogged – Perm)/avg(Unlogged, Perm)
1 10 561.63 632.55 11.87%
24 10 1419.30 1678.23 16.71%
48 10 1323.78 1555.40 16.08%
1 100 510.25 436.87 13.22%
24 100 1252.38 1493.44 17.55%
48 100 1260.09 1462.92 14.89%

So in short, in the environment tested unlogged tables have shown an increase of output by 13~17%.

When tuning a PostgreSQL server, one the major setting parameters is the one controlling the amount of shared memory allowed with shared_buffers.
PostgreSQL has a default shared_buffers value at 32MB, what is enough for small configurations but it is said that this parameter should be set at 25% of the system’s RAM. This allows your system to keep a good performance in parallel with the database server.
So in the case of a machine with 4GB of RAM, you should set shared_buffers at 1GB.

In the case of ubuntu servers, you may find the following error when starting a PostgreSQL instance.
FATAL: could not create shared memory segment: 無効な引数です
DETAIL: Failed system call was shmget(key=5432001, size=1122263040, 03600).
HINT: This error usually means that PostgreSQL's request for a shared memory segment exceeded your kernel's SHMMAX parameter. You can either reduce the request size or reconfigure the kernel with larger SHMMAX. To reduce the request size (currently 1122263040 bytes), reduce PostgreSQL's shared memory usage, perhaps by reducing shared_buffers or max_connections.
If the request size is already small, it's possible that it is less than your kernel's SHMMIN parameter, in which case raising the request size or reconfiguring SHMMIN is called for.
The PostgreSQL documentation contains more information about shared memory configuration.

This means that Linux kernel cannot allow more shared memory than the kernel can.
In order to prevent that, customize the memory parameters of your machine kernel.
(for 1GB)
sysctl -w kernel.shmmax=1073741824
sysctl -w kernel.shmall=262144
(for 2GB)
sysctl -w kernel.shmmax=2147483648
sysctl -w kernel.shmall=524288

You need root rights to modify those parameters.

Using sysctl will not reinitialize those parameters at reboot. For a more permanent solution, add the following lines to /etc/sysctl.conf.
(for 1GB)
kernel.shmall = 262144
kernel.shmmax = 1073741824
(for 2GB)
kernel.shmall = 524288
kernel.shmmax = 2147483648

This post presents some basics when using foreign data wrappers with PostgreSQL for external files.
FOREIGN DATA WRAPPER is a part of SQL/MED (Management of external data with SQL) and its implementation has begun since Postgres 8.4. This mechanism is based on COPY FROM to import data files directly into your database.
Those tests have been done with 9.2 (development version).

First be sure that the contrib module file_fdw is correctly installed for your server.
cd /to/postgres/folder/contrib/file_fdw
make install

At the time of this post, PostgreSQL tar just contains a fdw library for external files (file_fdw). Some complementary work for PostgreSQL fdw will be done as a development for 9.2.

If you do not install that, you may get the following error when trying to create an extension.
CREATE EXTENSION file_fdw;
ERROR: could not open extension control file "/to/install/folder/share/extension/file_fdw.control": cannot find the following file

Let’s then take a try.
First create a simple text file that will be converted. This file has a CVS format
michael@boheme:~ $ cat ~/data/test.data
1,5,a
2,4,b
3,3,c
4,2,d
5,1,e

Then time to create the extension necessary for the fdw.
template1=# CREATE EXTENSION file_fdw;
CREATE EXTENSION

Then you need to create a *server* that will pinpoint to your file on your server.
template1=# CREATE SERVER test_server FOREIGN DATA WRAPPER file_fdw;
CREATE SERVER

As a last step, you only need to create a table referred in a foreign server
template1=# CREATE FOREIGN TABLE testdata (
id1 int,
id2 int,
text1 char(1)
) SERVER test_server
OPTIONS ( filename '/home/michael/data/test.data', format 'csv' );
CREATE FOREIGN TABLE

Finally try to look at your data:
template1=# select * from testdata;
id1 | id2 | text1
-----+-----+-------
1 | 5 | a
2 | 4 | b
3 | 3 | c
4 | 2 | d
5 | 1 | e
(5 rows)

And you’re done, congrats!

© 2012 Michael Paquier All content is ©Copyright of Otacoo.com 2010-2012. Privacy Policy - Terms of Use