Postgres-XC, read&write-scalable multi-master symmetric cluster based on PostgreSQL, version 1.0.0 is released.
This project is seen as an open-source alternative to costly products such as OracleRAC. Postgres-XC is based on the code of PostgreSQL, so it can naturally use all its technologies, which are enhaunced to have a shared-nothing multi-master PostgreSQL-based database cluster.

This first stable version is based on PostgreSQL 9.1.4. All the patches in PostgreSQL 9.1 stable branch have been merged up to commit 873d1c1 (1st of June 2012).
This includes the security fix related to pg_crypto dated of 30th of May.
You can download the source tarball directly from here
This tarball contains all the HTML and man documentation.

30 bug fixes have been made since release of beta2, with some notable enhancements:

  • Support for EXTENSION is fixed
  • Stabilization of the use of slave nodes in cluster
  • Fix of a bug related to read-only transactions, improving performance by 15%
  • Support of compilation for MacOSX

About the scalability of this release, Postgres-XC 1.0.0 scales to a factor of 3 when compared to a standalone server PostgreSQL 9.1.3 on 5 nodes using a benchmark called DBT-1.

Compared to version Postgres-XC 0.9.7, the following features have been added:

  • Fast query shipping (FQS), quick identification of expressions in a query that can be pushed down to remote nodes
  • SERIAL types
  • TABLESPACE
  • Utility to clean up 2PC transactions in cluster (pgxc_clean)
  • Utility for initialization of GTM (global transaction manager, utility called initgtm)
  • Relation-size functions and locking functions
  • Regression stabilization

The documentation of 1.0, including release notes, is available here.

The project can be followed on SourceForge.
And a couple of GIT repositories are used for development:

The core team is currently working in the addition of new features for the next major release including:

  • Merge with PostgreSQL 9.2
  • Data redistribution functionality, changing table distribution in cluster with a simple ALTER TABLE
  • New functionalities related to online node addition and deletion for a better user experience
  • Triggers
  • Planner improvements
  • Global constraints

The roadmap of the project is located here in section Roadmap.

The project is under the same license as PostgreSQL, now managed under a single entity called “Postgres-XC Development Group”.
In order to keep in touch with the project, whose development follows the same model as PostgreSQL, you can register to the following mailing lists:

  • postgres-xc-general@lists.sourceforge.net, for general questions. Registration can be done here
  • postgres-xc-developers@lists.sourceforge.net. hachers mailing list. Registration can be done here

This short manual targets PostgreSQL users looking for a smooth introduction to dblink.

dblink is a PostgreSQL contrib module that can be found in the folder contrib/dblink. It is treated as an extension, meaning that the installation of this module is in two phases, explained in this post a bit later.
The goal of this module is to provide simple functionalities to connect and interact with remote database servers from a given PostgreSQL server to which your client application or driver is connected.

The first thing that you need to do is to install the sources of dblink. You can do it easily by installing all the modules of PostgreSQL at once from source code.
./configure --prefix $INSTALL_FOLDER
make install-world

$INSTALL_FOLDER is the folder where you wish to install PostgreSQL binaries.

Or if you wish only to install dblink (you might have already installed PostgreSQL ressources), do it directly from its source folder.
cd contrib/dblink
make install

The installed files for dblink can be found in $INSTALL_FOLDER/share/extensions.
$ cd $INSTALL_FOLDER/share/extension
$ ls dblink*
dblink--1.0.sql dblink--unpackaged--1.0.sql dblink.control

For the purpose of this demonstration, two PostgreSQL servers called server1 and server2 are created on the same local server with port values respectively of 5432 and 5433.

Some data will be inserted on server2, and the goal is to fetch this data to server1 using dblink.

Let’s first prepare server 2 and create some data on it.
$ psql -p 5433 postgres
psql (9.2beta1)
Type "help" for help.
postgres=# create table tab (a int, b varchar(3));
CREATE TABLE
postgres=# insert into tab values (1, 'aaa'), (2,'bbb'), (3,'ccc');
INSERT 0 3

So now that the remote server2 is ready to work, all the remaining tasks need to be done on server1.

The sources of dblink have been installed, but they are not yet active on server1. dblink is treated as an extension, which is a functionality that has been introduced since PostgreSQL 9.1. In order to activate a new extension module, here dblink, on a PostgreSQL server, the following commands are necessary.
$ psql postgres
psql (9.2beta1)
Type "help" for help.
postgres=# CREATE EXTENSION dblink;
CREATE EXTENSION
postgres=# \dx
List of installed extensions
Name | Version | Schema | Description
---------+---------+------------+--------------------------------------------------------------
dblink | 1.0 | public | connect to other PostgreSQL databases from within a database
plpgsql | 1.0 | pg_catalog | PL/pgSQL procedural language
(2 rows)

You can then confirm that the extension has been activated by using \dx from a psql client.

Now let’s fetch the data from server2 with dblink while connecting on server1. The function dblink can be invocated to fetch data as it uses as return type “SETOF record”. This implies that the function has to be called in FROM clause.
postgres=# select * from dblink('port=5433 dbname=postgres', 'select * from tab') as t1 (a int, b varchar(3));
a | b
---+-----
1 | aaa
2 | bbb
3 | ccc
(3 rows)

Do not forget to use aliases in the FROM clause to avoid errors of the following type:
postgres=# select * from dblink_exec('port=5433 dbname=postgres', 'select * from tab');
ERROR: statement returning results not allowed

It is also possible to do more fancy stuff with dblink functions.
dblink_connect allows you to create a permanent connection to a remote server. Such connections are defined by names you can choose. This avoids to have to create new connections to remote servers all the time at invocating of function dblink, allowing to gain more time by maintaining connections alive. In case you wish to use the connection created, simply invocate its name when using dblink functions.

Execution of other queries, like DDL or DML, can be done with function dblink_exec.
postgres=# select dblink_exec('port=5433 dbname=postgres', 'create table aa (a int, b int)');
dblink_exec
--------------
CREATE TABLE
(1 row)

dblink has a dozen of functions that allows to control remote database servers from a single connection point.
be sure to have a look at it!

First you need to download the latest version of Postgres-XC from here.

Then open the tarball and install the binaries as you would do with a normal PostgresSQL.
configure --prefix=$INSTALL_FOLDER
make
make install

$INSTALL_FOLDER is the folder where to install the sources. In this post $PATH redirects to $INSTALL_FOLDER so no need to specify a folder when launching commands.

Next, the goal is to install a cluster when few simple commands. Once you are done, your cluster will have the same shape as picture below.
Simple 1 Coordinator/2Datanode cluster
Assuming that you are familiar with Postgres-XC architecture, this cluster is made with 1 Coordinator (to which your application connects), 2 Datanodes (meaning that your table data can be distributed up to 2 nodes) and a GTM, mandatory unique component distributing transaction ID and snapshot in the cluster.
If you are not familiar with the architecture, you can still refer to documents located here. Among the documents available, the tutorial done at PGCon 2012 is a good beginning.
For simplicity’s sake, all the nodes are installed on a local machine.

Like PostgreSQL, each node of Postgres-XC needs a data folder. All of them are located in $DATA_FOLDER.
So let’s move in and initialize each node.
cd $HOME/pgsql
initgtm -Z gtm -D gtm # Initialize GTM
initdb -D datanode1 --nodename dn1 # Initialize Datanode 1
initdb -D datanode2 --nodename dn2 # Initialize Datanode 2
initdb -D coord1 --nodename co1 # Initialize Coordinator 1

Then you need to modify manually the port value of Datanode 1 and Datanode 2 in each postgresql.conf.
cd datanode1 # or `cd datanode2`
vim postgresql.conf

Then change the line “#port = 5432″ by “port = 15432″ for Datanode 1, and “port = 15433″ for Datanode 2.

Then it is time to start up the cluster.
gtm -D gtm & # Start-up GTM
postgres -X -D datanode1 -i & # Start Datanode 1
postgres -X -D datanode2 -i & # Start Datanode 2
postgres -C -D coord1 -i & # Start Coordinator 1

What remains is to set up the Coordinator to make him know about Datanode 1 and 2.
So connect to coordinator 1.
psql postgres
Then launch that to finish setting up cluster:
CREATE NODE dn1 WITH (TYPE='datanode', PORT=15432);
CREATE NODE dn2 WITH (TYPE='datanode', PORT=15433);
select pgxc_pool_reload();

And you are done.
Now you can connect to Coordinator 1 and test your newly-made cluster.
12 short commands have been enough once binaries have been installed.

Postgres-XC, write-scalable multi-master symmetric cluster based on PostgreSQL, version 1.0beta2 has been released.
This beta version is based on PostgreSQL 9.1.3. All the patches in PostgreSQL 9.1 stable branch have been merged up to commit 1c0e678 (4th of May 2012).

You can download the tarball directly from here.
This tarball contains all the HTML and man documentation.

The following enhancements have been made since release of 1.0beta1:

  • Redaction of release notes, summarizing all the features added in Postgres-XC since the creation of the project
  • Support for make world
  • Regressions stabilized (no failures for 139 tests)
  • Fix of more than 50 bugs
  • Merge with stable branch of PostgreSQL 9.1 (600~ commits)

Compared to version Postgres-XC 0.9.7, the following features have been added:

  • Fast query shipping (FQS), quick identification of expressions in a query that can be pushed down to remote nodes
  • SERIAL types
  • TABLESPACE
  • Utility to clean up 2PC transactions in cluster (pgxc_clean)
  • Utility for initialization of GTM (global transaction manager, utility called initgtm)
  • Relation-size functions
  • Regression stabilization

The release notes of 1.0 are directly available here.

The project can be followed on Source Forge.
And the project uses a couple of GIT repositories for development:

Postgres-XC 1.0beta2 will be used during the Postgres-XC tutorial at PGCon in Ottawa, so be sure to touch this beta version to have an idea of what Postgres-XC is before attending!

One of the features that has been really improved the last couple of weeks is the stabilization of remote query planning for DML for Postgres-XC standard planner. And this has consequences on rules, because a rule is fired on Coordinators by design, and you need to provide a global way to plan queries correctly with remote nodes. Just to recall, a rule is the possibility to define an alternative action when doing an INSERT, UPDATE or DELETE on a table.
Another important point is that the query of a rules is not planned at the moment of the rule creation, but after rule is fired, however it doesn’t change the fact that a correct query planning is needed at a moment or another.

A rule can for example be used to define DML on views.
A view is roughly a projection of table data into a wanted shape, and it is by default not possible to define DML actions on it.

Let’s take an example, here are a simple table and a simple view.
postgres=# CREATE TEMP TABLE t1 (a int PRIMARY KEY, b int) DISTRIBUTE BY HASH(a);
NOTICE: CREATE TABLE / PRIMARY KEY will create implicit index "t1_pkey" for table "t1"
CREATE TABLE
postgres=# INSERT INTO t1 VALUES (1,1),(2,2),(3,3),(4,4);
INSERT 0 4
postgres=# CREATE VIEW t1_v AS SELECT a AS a_v, b AS b_v FROM t1;
NOTICE: view "t1_v" will be a temporary view
CREATE VIEW

When trying to UPDATE the view, you will get the following error.
postgres=# UPDATE t1_v SET b_v = 2 WHERE a_v = 3;
ERROR: cannot update view "t1_v"
HINT: You need an unconditional ON UPDATE DO INSTEAD rule or an INSTEAD OF UPDATE trigger.

So let’s define a view on it and check that an UPDATE on a view is possible.
postgres=# CREATE RULE t1_upd AS ON UPDATE TO t1_v DO INSTEAD UPDATE t1 SET b = new.b_v WHERE a = old.a_v AND b = old.b_v;
CREATE RULE
postgres=# UPDATE t1_v SET b_v = 2 WHERE a_v = 3;
UPDATE 1
postgres=# select * from t1_v;
a_v | b_v
-----+-----
1 | 1
2 | 2
4 | 4
3 | 2
(4 rows)

So yes, RULES are now completely supported in Postgres-XC. And it is included in 1.0. The secret of how it works? The thing that took me 4 weeks to figure out?
Well, some extensions have been added in standard planner for DELETE, INSERT and UPDATE remote planning (ModifyTable used as TopPlan). For people willing to look at the code, all the secrets are located in functions create_remoteinsert_plan, create_remoteupdate_plan and create_remotedelete_plan of createplan.c. Those functions have been built and adapted to former scan plan of PostgreSQL to react as a wrapper for inner remote table scans within PostgreSQL standard planner. The trick is to create a DML query generated based on the scan plans generated for each tables that has to be updated, deleted or insert.

One of the limitations? The use of constraints.
Depending on the distribution strategy used, you may not be able to check the consistency of constraints globally.
But this is another story…

©2010-2013 Michael Paquier All content is ©Copyright of Otacoo.com 2010-2013. Privacy Policy - Terms of Use