Postgres-XC, write-scalable multi-master symetric cluster based on PostgreSQL, version XC 1.0beta1 has been released.
This beta version is based on PostgreSQL 9.1beta2, and all the fixes of PostgreSQL 9.1 stable branch will be backported in Postgres-XC 1.0 stabilized version.
It is under PostgreSQL license.

You can download the binary directly from here.
This tarball includes all the HTML and man documentation.

A PDF file containing all the references is also available here: Reference PDF.

Compared to version 0.9.7, the following features have been added:

  • Fast query shipping (FQS), quick identification of expressions in a query that can be pushed down to remote nodes
  • SERIAL types
  • TABLESPACE
  • Utility to clean up 2PC transactions in cluster (pgxc_clean)
  • Utility for initialization of GTM (global transaction manager, utility called initgtm)
  • Relation-size functions
  • Regression stabilization

The project can be followed on Source Forge.
And we use a couple of GIT repositories for development:

Postgres-XC tutorial at PGCon in Ottawa this May will use a 1.0 version, so be sure to touch this beta version to have an idea of what is Postgres-XC before coming!
Since the last release, a special effort has been made on stabilization and performance improvement, but be sure to give your feedback in order to provide a stable 1.0 release for PostgreSQL community.

The release of Postgres-XC 0.9.7 has been done in Source Forge and in GIT.
Source forge page is here.
The source tarball and manuals are available here.

Postgres-XC 0.9.7, based on PostgreSQL 9.1, contains a bunch of new features:

  • SELECT INTO/CREATE TABLE AS
  • INSERT SELECT complete support
  • Node subsets and sub clusters (data of table distributed on a portion of nodes)
  • Cluster node management with SQL and catalog extensions
  • Dynamic reload of pool connection information
  • Window functions
  • DML remote planning (partial expression push-down)
  • DEFAULT values (non-shippable expressions: volatile functions, etc.) => Use of DEFAULT nextval(‘seq’) available.
  • GTM Standby synchronous, asynchronous recovery

Compared to Postgres-XC 0.9.6, in 0.9.7 the cluster setting is mainly SQL-based and does not use GUC params anymore. So be sure to read the Install manual before creating a cluster.
HTML documentation and man pages are also incorporated in the downloadable tarball, and the reference PDF document is built from the same SGML source.

The code is also available on GIT:
git://postgres-xc.git.sourceforge.net/gitroot/postgres-xc/postgres-xc
A new maintenance branch called XC0_9_7_PG9_1 only for 0.9.7.

With last week’s commit:
commit 2a406e56dea3e750e74cc38115e83e30217b4822
Author: Michael P
Date: Thu Dec 8 13:50:25 2011 +0900
 
Simplify node DDL grammar and supress slave management part
 
New grammar uses WITH clause of CREATE TABLE in this manner:
CREATE/ALTER NODE nodename WITH (
[ TYPE = ('coordinator' | 'datanode'),]
[ HOST = 'string',]
[ PORT = portnum,]
[ PRIMARY,]
[ PREFERRED ]);
This applies to CREATE/ALTER NODE.
Grammar simplification results in the deletion in related_to column
of pgxc_node catalog.
 
Documentation is updated in consequence.
 
This commit solves also an issue with variable names sharing same
format between GTM and XC nodes.

The grammar of node DDL (CREATE NODE, ALTER NODE, DROP NODE) has been simplified and made more consistent with PostgreSQL. Now those queries are adapted as follows:

  • Use of the same WITH clause as CREATE TABLE
  • No slave nodes taken into account anymore: now a slave node needs to have the same name as its master. This facilitates also failover.

In consequence, here is how to create a cluster with a freshly-started Coordinator called coord1 and 2 Datanodes called dn1 and dn2 using respectively ports 15433 and 15434.
postgres=# create node dn1 with (type = 'datanode', port = 15433, host = 'localhost');
CREATE NODE
postgres=# create node dn2 with (type = 'datanode', port = 15434, host = 'localhost');
CREATE NODE
postgres=# select pgxc_pool_reload();
pgxc_pool_reload
------------------
t
(1 row)
postgres=# select pgxc_pool_check();
pgxc_pool_check
-----------------
t
(1 row)
postgres=# select * from pgxc_node;
node_name | node_type | node_port | node_host | nodeis_primary | nodeis_preferred
-----------+-----------+-----------+-----------+----------------+------------------
coord1 | C | 5432 | localhost | f | f
dn1 | D | 15433 | localhost | f | f
dn2 | D | 15434 | localhost | f | f
(3 rows)

In bonus to this article, you can find here an updated version of the script that can setup a cluster on a local machine with the following options:

  • -c to indicate the number of Coordinators
  • -n to indicate the number of Datanodes

If you came at this page, it means that you got interest in a cluster solution based on PostgreSQL.
Currently developed for version 0.9.7, Postgres-XC has been largely improved with the way cluster is being set.

Just lately, I committed this commit.
Support for dynamic pooler/session connection information cache reload
 
A new system function called pgxc_pool_reload has been added.
If called, this function reloads connection information to remote nodes
in a consistent way with the following process:
1) A lock is taken on pooler forbidding new connection requests
2) Database pools (user and database-dependant pools) are reloaded
depending on the node information located on catalog pgxc_node.
The following rules are followed depending on node connection
information modification:
- node whose node and port value is changed has its connections
dropped and this node pool is deleted from each database pool
- node deleted is deleted from each database pool
- node unchanged is kept as is. However, its index value is changed
depending on the new cluster configuration.
- node created is added to each database pool
3) Lock is released
4) Session that invocated pgxc_pool_reload signals all the other
server sessions to reconnect to pooler to allow each agent to update
with newest connection information and reload session information
related to remote node handles. This has as effect to abort current
transactions and to remove all the temporary and prepared objects
on session. Then a WARNING message is sent back to client to inform
about the cluster configuration modification.
5) Session that invocated pgxc_pool_reload reconnects to pooler by
itself and reloads its session information related to remote
node handles. No WARNING message is sent back to client to inform
about the session reload.
This operation is limited to local Coordinator and returns a boolean
depending on the success of the operation. If pooler data is consistent
with catalog information when pgxc_pool_reload is invocated, nothing is
done but a success message is returned.
 
This has the following simplifications for cluster settings:
- cluster_nodes.sql is deleted.
- a new mandatory option --nodename is used to specify the node name
of the node initialized. This allows to set up pgxc_node catalog
with the node itself. pgxc_node_name in postgresql.conf is also
set automatically.
- CREATE/ALTER/DROP node are launched on local Coordinator only, meaning
that when a cluster is set up, it is necessary to create node information
on each Coordinator and then upload this information to pooler and sessions
by invocaing pgxc_pool_reload.
 
This optimization avoids to have to restart a Coordinator when changing
cluster configuration and solves security problems related to cluster_nodes.sql
that could be edited with all types of SQL even if its first target was only NODE
DDL.

So what is behing this looooong commit text? Well, it is a feature that will simplify your life.
It is strongly related the feature called Node DDL that has been committed at the end of October. Just to recall, node DDL is a feature allowing to manage the cluster nodes with catalog tables such as you don’t have to bother about heavy settings in postgresql.conf. However, even if node DDL have been supported, it does not mean that dropping, creating or altering a node is visible to the connection pooling. You had to restart a node, increasing by that much the downtime of each Coordinators.

This commit, in one word, introduces this => pgxc_pool_reload. It is a new system function used to check whose details are described here used to reload all the server sessions and pooler connection information without having to restart a Coordinator. In other words, it simplifies the way to set up a cluster.

Now let’s enter in the main subject: the cluster setting, what can be done with the following steps:

  • Initialize the nodes with initdb
  • Create a global transaction manager and start it
  • Start up all the nodes
  • Connect to a Coordinator
  • Create all the nodes initialized with node DDL
  • Reload connection data with “select pgxc_node_reload();”

Here are a couple of details:

  • There is a new mandatory option in initdb called –nodename that is used to setup the name of the node being initialized. This is a Postgres-XC specific option. This option is used to define itself in pgxc_node catalog the node being initialized. It also sets automatically pgxc_node_name in postgresql.conf.
  • You can check the consistency of the information cached in pooler and catalogs by calling the system function pgxc_pool_check. It returns a boolean on operation success or failure.
  • The specifications of node DDL is located at those pages: CREATE NODE, DROP NODE and ALTER NODE
  • Invocating pgxc_pool_reload aborts the current transaction, and drops all the prepared and temporary objects in session. This is effective in all the session of the server
  • Node DDL run locally, so you need to launch the same node DDL on all Coordinators of the cluster. This allows more smoothness in case Coordinators view the same Datanode with different IPs.

It is also possible to manipulate cluster nodes even after initialization. It doesn’t matter how many times you change it as long as pgxc_pool_reload is used to update data cached in sessions and connection pool.

Here is also a bonus, a script that you can use to setup easily a cluster with a chosen number of Coordinators and Datanodes on a local machine. Port numbers are fixed, but it helps in trying Postgres-XC.
#!/bin/bash
#Otacoo.com
 
#Build cluster from scratch and run pg_regress
#1) Build the XC cluster: 1GTM with Coordinators (default 1) and Datanodes (default 2) defined
#2) Run pg_regress if wanted
 
#Take and check options
EXPECTED_ARGS=0
FLAG_REGRESS=0
NUM_COORDS=1
NUM_DATANODES=2
 
#Treat options
while getopts 'c:n:r' OPTION
do
  case $OPTION in
  c) #Number of Coordinators
    NUM_COORDS="$OPTARG"
    EXPECTED_ARGS=$(($EXPECTED_ARGS + 2))
    ;;
  n) #Number of Datanodes
    NUM_DATANODES="$OPTARG"
    EXPECTED_ARGS=$(($EXPECTED_ARGS + 2))
    ;;
  r) #Run regressions or not?
    FLAG_REGRESS=1
    EXPECTED_ARGS=$(($EXPECTED_ARGS + 1))
    ;;
  ?) echo "Usage: `basename $0` [-c num_coords] [-n num datanodes] [-r]\n"
    echo "Example: `basename $0` -c 4 -n 4 -r"
    exit 0
    ;;
  esac
done
 
#Check number of arguments
if [ $# -ne $EXPECTED_ARGS ]
then
  echo "Usage: `basename $0` [-c num_coords] [-n num datanodes] [-r]\n"
  echo "Example: `basename $0` -c 4 -n 4 -r"
  exit 1
fi
 
#Setup Default values
#GTM has a unique value
#Coordinator ports are mapped from 5432
#Datanode ports are mapped from 15432
#All the machines run on local host
COORD_PORT_START=5431
DN_PORT_START=15432
COORD_PORTS[1]=$COORD_PORT_START
DN_PORTS[1]=$DN_PORT_START
for i in $(seq 1 $NUM_COORDS)
do
  COORD_PORTS[$i]=$(($COORD_PORT_START + $i))
done
for i in $(seq 1 $NUM_DATANODES)
do
  DN_PORTS[$i]=$(($DN_PORT_START + $i))
done
GTM_PORT=7777
PSQL_FOLDER=$HOME/pgsql
 
#Finish calculating dependencies between folders
PSQL_SHARE=$PSQL_FOLDER/share
PSQL_BIN=$PSQL_FOLDER/bin
GTM_DATA=$PSQL_FOLDER/gtm
LOG_DATA=$PSQL_FOLDER/log
 
#Setup data folders
for i in $(seq 1 $NUM_COORDS)
do
  COORD_DATAS[$i]=$PSQL_FOLDER/coord$i
done
for i in $(seq 1 $NUM_DATANODES)
do
  DN_DATAS[$i]=$PSQL_FOLDER/datanode$i
done
 
#Kill all the processes that may remain
#in the most atrocious way possible as they meritated it
#OK this is not very clean...
echo "Take out Postgres-XC processes"
kill -9 `ps ux | grep "bin/gtm" | cut -d " " -f 2-3`
killall postgres gtm psql
sleep 2
 
#Check if data folders exist or not and create them
echo "Creating data folders"
for folder in $GTM_DATA $LOG_DATA ${COORD_DATAS[@]} ${DN_DATAS[@]}
do
  if [ ! -d $CODE_REPO_GIT ]
  then
    mkdir $folder
  fi
done
 
#Clean up all the data folders
echo "Clean up data folders"
for folder in $GTM_DATA $LOG_DATA ${COORD_DATAS[@]} ${DN_DATAS[@]}
do
  rm -r $folder/*
done
sleep 1
 
#OK, let's begin the show...
 
#make initialization
echo "Initializing PGXC nodes"
for i in $(seq 1 $NUM_DATANODES)
do
  $PSQL_BIN/initdb --locale=POSIX --nodename dn$i -D ${DN_DATAS[$i]}
done
for i in $(seq 1 $NUM_COORDS)
do
  $PSQL_BIN/initdb --locale=POSIX --nodename coord$i -D ${COORD_DATAS[$i]}
done
 
#copy all configuration files to remote machin
echo "Copy of configuration files"
#Create an empty GTM conf file and add host/port data
touch $GTM_DATA/gtm.conf
echo "nodename = 'one'" >> $GTM_DATA/gtm.conf
echo "listen_addresses = '*'" >> $GTM_DATA/gtm.conf
echo "port = 7777" >> $GTM_DATA/gtm.conf
echo "log_file = 'gtm.log'" >> $GTM_DATA/gtm.conf
 
#Node common settings
OPTIONS="logging_collector = on\n"\
"gtm_port = $GTM_PORT\n"\
"datestyle = 'postgres, mdy'\n"\
"timezone = 'PST8PDT'\n"\
"default_text_search_config = 'pg_catalog.english'\n"\
"log_statement = 'all'\n"\
"log_min_messages = debug1\n"\
"log_min_error_statement = debug1\n"\
"max_prepared_transactions = 20\n"
 
#Pooler options
POOLER_BASE_PORT=6667
#Coordinator settings
for i in $(seq 1 $NUM_COORDS)
do
  echo -e $OPTIONS >> ${COORD_DATAS[$i]}/postgresql.conf
  POOLER_NUM=$(($POOLER_BASE_PORT + $i))
  echo -e "pooler_port = $POOLER_NUM\n" >> ${COORD_DATAS[$i]}/postgresql.conf
done
#Datanode settings
for i in $(seq 1 $NUM_DATANODES)
do
  echo -e $OPTIONS >> ${DN_DATAS[$i]}/postgresql.conf
done
 
#launch gtm
echo "launch GTM"
$PSQL_BIN/gtm -x 10000 -D $GTM_DATA &
sleep 1
 
#launch datanodes
echo "launch Datanodes..."
for i in $(seq 1 $NUM_DATANODES)
do
  $PSQL_BIN/postgres -X -i -p ${DN_PORTS[$i]} -D ${DN_DATAS[$i]} > $LOG_DATA/datanode$i.log &
done
sleep 1
 
#launch coordinators
echo "launching Coordinators..."
for i in $(seq 1 $NUM_COORDS)
do
  $PSQL_BIN/postgres -C -i -p ${COORD_PORTS[$i]} -D ${COORD_DATAS[$i]} > $LOG_DATA/coord$i.log &
done
sleep 1
 
#Initialize Coordinators with cluster data
echo "initializing Coordinators..."
for i in $(seq 1 $NUM_COORDS)
do
  #Datanode connection info
  for j in $(seq 1 $NUM_DATANODES)
  do
    NODE_NAME=dn$j
    NODE_PORT=${DN_PORTS[$j]}
    $PSQL_BIN/psql -p ${COORD_PORTS[$i]} -c "CREATE NODE $NODE_NAME WITH (HOSTIP = 'localhost', NODE MASTER, NODEPORT = $NODE_PORT);" postgres
  done
  #Other Coordinator info
  for j in $(seq 1 $NUM_COORDS)
  do
    if [ "$i" -eq "$j" ]
    then
      continue
    fi
    NODE_NAME=coord$j
    NODE_PORT=${COORD_PORTS[$j]}
    $PSQL_BIN/psql -p ${COORD_PORTS[$i]} -c "CREATE NODE $NODE_NAME WITH (HOSTIP = 'localhost', COORDINATOR MASTER, NODEPORT = $NODE_PORT);" postgres
  done
  #reload data
  $PSQL_BIN/psql -p ${COORD_PORTS[$i]} -c "SELECT pgxc_pool_reload();" postgres
done
 
if [ "$FLAG_REGRESS" == 1 ]
then
  echo "running pg_regress"
  pgregress
fi
 
exit `echo $?`

You can also download it from this link.

Just yesterday I committed that.
commit 2aea0c2e0e01031f5dd4260b6985dc0ed4eadc50
Author: Michael P
Date: Tue Nov 15 09:54:54 2011 +0900
 
 Support for data distribution among a subset of datanodes
 
 CREATE TABLE has been extended with the following clause:
 CREATE TABLE ...
 [ TO ( GROUP groupname | NODE nodename [, ... ] ) ]
 
 This clause allows to distribute data among a subset of nodes
 listed by a node list, or a group alias.
 Node groups can be defined with CREATE NODE GROUP.
 
 The base structure for this support was added with commit
 56a90674444df1464c8e7012c6113efd7f9bc7db, but check of mapping of
 subsets of node list was still missing for the management of join
 push down and materialization evaluation in planner.

So what the hell is it??? Simply a feature that allows you to better control the data distributed among your Postgres-XC cluster.

Let’s take an example of a cluster with 2 Coordinators and 4 Datanodes.
postgres=# select oid,node_name from pgxc_node;
oid | node_name
-------+-----------
11133 | coord1
11134 | coord2
11135 | dn1
11136 | dn2
11137 | dn3
11138 | dn4
(6 rows)

Prior to this functionality, creating a table forced you to distribute the data among all the datanodes of your cluster.
postgres=# create table test (a int);
CREATE TABLE
postgres=# select nodeoids from pgxc_class where pcrelid = 'test'::regclass;
nodeoids
-------------------------
11135 11136 11137 11138
(1 row)

Now, CREATE TABLE has a new clause extension to be able to create a table only on a subset of nodes.
This is documented here.
The new clause is written as:
CREATE TABLE
...
[ TO ( GROUP groupname | NODE nodename [, ... ] ) ]

So you can specify a list of node names or a node group. A node group is simply an alias for a node list.
Let’s try it.

  • test12_rep is a replicated table whose data is distributed in nodes 1 and 2
  • test34_hash is a hash table whose data is distributed in nodes 3 and 4
  • test234_rep is a replicated table whose data is distributed in nodes 2, 3 and 4

postgres=# create table test12_rep (a int) distribute by replication to node dn1,dn2;
CREATE TABLE
postgres=# create table test34_hash (a int) distribute by hash(a) to node dn3,dn4;
CREATE TABLE
postgres=# create node group dn234 with dn2,dn3,dn4;
CREATE NODE GROUP
postgres=# create table test234_rep (a int) distribute by replication to group dn234;
CREATE TABLE
-- Check the node subset for distribution
postgres=# select nodeoids from pgxc_class where pcrelid = 'test12_rep'::regclass;
nodeoids
-------------
11135 11136
(1 row)
postgres=# select nodeoids from pgxc_class where pcrelid = 'test34_hash'::regclass;
nodeoids
-------------
11137 11138
(1 row)
postgres=# select nodeoids from pgxc_class where pcrelid = 'test234_rep'::regclass;
nodeoids
-------------------
11136 11137 11138
(1 row)

Now let’s insert some data.
postgres=# insert into test12_rep values (1),(2),(3);
INSERT 0 3
postgres=# insert into test234_rep values (1),(2),(3);
INSERT 0 3
postgres=# insert into test34_hash values (1),(2),(3);
INSERT 0 3

Then is data of test12_rep correctly distributed?
postgres=# execute direct on node dn1 'select * from test12_rep';
a
---
1
2
3
(3 rows)
postgres=# execute direct on node dn2 'select * from test12_rep';
a
---
1
2
3
(3 rows)
postgres=# execute direct on node dn3 'select * from test12_rep';
a
---
(0 rows)
postgres=# execute direct on node dn4 'select * from test12_rep';
a
---
(0 rows)

test12_rep is only replicated in nodes dn1 and dn2 only.

Let’s do the same checks for test234_rep and test34_hash.
--First for test34_hash
postgres=# execute direct on node dn1 'select * from test34_hash';
a
---
(0 rows)
postgres=# execute direct on node dn2 'select * from test34_hash';
a
---
(0 rows)
postgres=# execute direct on node dn3 'select * from test34_hash';
a
---
1
2
(2 rows)
postgres=# execute direct on node dn4 'select * from test34_hash';
a
---
3
(1 row)
--Then for test234_rep
postgres=# execute direct on node dn1 'select * from test234_rep';
a
---
(0 rows)
postgres=# execute direct on node dn2 'select * from test234_rep';
a
---
1
2
3
(3 rows)
postgres=# execute direct on node dn3 'select * from test234_rep';
a
---
1
2
3
(3 rows)
postgres=# execute direct on node dn4 'select * from test234_rep';
a
---
1
2
3
(3 rows)

So test234_rep is correctly replicated in nodes 2, 3 and 4. test34_hash is correctly hash-partitioned in nodes 3 and 4.

Now let’s do some join and push down checks.
postgres=# explain verbose select a from test34_hash join test234_rep using (a);
QUERY PLAN
-------------------------------------------------------------------
 Data Node Scan (Node Count [2]) (cost=0.00..0.00 rows=0 width=0)
  Output: test34_hash.a
(2 rows)

In this case replicated table test234_rep is completely mapped by test34_hash so a push down is possible to nodes 3 and 4 directly.

postgres=# explain verbose select a from test34_hash join test12_rep using (a);
QUERY PLAN
------------------------------------------------------
 Nested Loop (cost=0.00..2.04 rows=1 width=4)
  Output: test34_hash.a
  Join Filter: (test34_hash.a = test12_rep.a)
  -> Materialize (cost=0.00..1.01 rows=1 width=4)
      Output: test34_hash.a
      -> Data Node Scan (Node Count [2]) on test34_hash (cost=0.00..1.01 rows=1000 width=4)
        Output: test34_hash.a
  -> Materialize (cost=0.00..1.01 rows=1 width=4)
      Output: test12_rep.a
      -> Data Node Scan (Node Count [1]) on test12_rep (cost=0.00..1.01 rows=1000 width=4)
        Output: test12_rep.a
(11 rows)

In this case test34_hash and test12_rep are distributed on a disjoint list of nodes, so performing a join needs to first fetch data from Datanodes then materialize it on Coordinator.

There is still no way to change the table distribution type or the node list after table creation. This is one of the next plans, based on ALTER TABLE this time.

©2010-2013 Michael Paquier All content is ©Copyright of Otacoo.com 2010-2013. Privacy Policy - Terms of Use