Just today this commit has happened in Postgres-XC GIT repository.
commit d09a42f2aac08a909ad9c23b534f11c6e7f16cee
Author: Michael P
Date: Tue Feb 21 09:02:04 2012 +0900
 
Support for SERIAL types
 
SERIAL columns in table use default values based on nextval of sequences
to auto-generate values. In vanilla Postgres, table creation with serial
column(s) is made with the following process:
1 - Create sequence(s)
2 - Create table
3 - Alter sequence(s) to change it as being owned by the column of table
previously created to manage object dependency.
 
In Postgres-XC, the sequence creation is made such as the object is created
on all the nodes, so a boolean flag associated to the serial process is
added to bypass sequence creation on remote nodes in case a sequence is created
within a serial process, and the query sent to remote nodes is the one given
by client application and it is sent only once when table is created on local
node.
 
Regression tests are all updated in consequence.

This means that now serial types, used for auto-incrementing column values based on a sequence, are now available in Postgres-XC. This functionality is widely used by framework applications, so this is really an asset for the coming release. Here is a short demonstration.
postgres=# create table aa (a serial, b varchar(10));
NOTICE: CREATE TABLE will create implicit sequence "aa_a_seq" for serial column "aa.a"
CREATE TABLE
postgres=# insert into aa (b) values ('aaa');
INSERT 0 1
postgres=# insert into aa (b) values ('bbb');
INSERT 0 1
postgres=# insert into aa (b) values ('ccc');
INSERT 0 1
postgres=# select * from aa;
a | b
---+-----
1 | aaa
2 | bbb
3 | ccc
(3 rows)
postgres=# \d
List of relations
Schema | Name | Type | Owner
--------+----------+----------+---------
public | aa | table | michael
public | aa_a_seq | sequence | michael
(2 rows)
postgres=# \d aa
Table "public.aa"
Column | Type | Modifiers
--------+-----------------------+------------------------------------------------
a | integer | not null default nextval('aa_a_seq'::regclass)
b | character varying(10) |

Enjoy!

A replication and clustering event for PostgreSQL has been hold in Paris on 2nd of February 2012. I had the chance to be invited to give a presentation about Postgres-XC and here is a summary of what happened there.

First a couple of words about the place, called “Le comptoir general”, used by associations to organize events. The place has been designed in an old-fashion way to well, show it as a kind of voodoo or sorcery place. Have a look at the pictures to make you an image of how it is.


The conference, organized by Dalibo, a French company specialized in consulting of PostgreSQL, was free of participation with a number of places limited at 70/80.

The presentation about Postgres-XC went well, you can find the presentation document here. Initially planned to last 45-50 minutes including questions, I finally stayed 1,5 hour on stage. The public was great, they asked a lot of pertinent questions and made me feel they understood what Postgres-XC is and what the project is aiming as an open-source software. Here is a summary of the main questions asked.

  • Could Postgres-XC be used as an alternative for Oracle RAC? Yes, definitely.
  • What means “transparent transaction management”? A way to manage transaction in a global way such as application sees a database cluster as if it is a single instance. This transparent management uses an MVCC-based method with transaction ID and snapshot fed in consistent way to all the cluster nodes.
  • Is it possible to use normal Postgres instead of a Datanode? No, there are corner cases where a Datanode needs a global snapshot like autovacuum.
  • Can we set a Coordinator and a Datanode on the same server? You can set as many nodes as you wish of the types you wish, you need just avoid port conflicts.
  • Is it possible to change the table definition on a single node? No, you need to have consistent table definition on all the nodes. You can however manage users and roles like normal Postgres, there is also a pooler protocol to separate user and database-dedicated connections to avoid inter-user conflicts on connections. So why not changing the access of a column for one special user or group of users globally?
  • Like RAC, in case of a read query that failed due to a node failure, do you have a functionality to target another node still alive inside the same transaction? There is no such functionality yet, but once we had slave node management in the code, this is a functionality we want to add. So definitely yes.
  • Can we participate in the project? Like PostgreSQL, Postgres-XC is community-based. All contributions, even minimal are warmly welcome.

There have been a couple of other presentations in the conference about Londiste, Slony and streaming replication usage. However, the point that caught my attention is the presentation made by Simon Riggs about the future of replication features in PostgreSQL. In order to make PostgreSQL multi-master, his idea is to implement a replication type called circular replication, feature already done in MySQL and Oracle. What is circular replication? Have a look at the picture below.

As can be understood from the name, such a cluster configuration has database master instances placed in circle. You can have those masters located at several places in the world. If an update occurs on one of the masters, it is sent circularly to the next master, until the update circle completes. In case a master fails, what you do is simply skip it for the time being and move forward to the next node in circle. In the diagram masters are noted as M and slaves as S (asynchronous/synchronous standby nodes, cascade of nodes…). It takes a certain time to make the update available to all the nodes, that is why it is a kind of asynchronous multi-master configuration.
Circular replication clearly targets applications using non-critical operations. It is not possible to imagine operations on this structure for billing or ordering systems that cannot have any transaction loss. However, this could be used for social networking and blogging network, where there are a lot of reads occurring and users do not really care if there is a delay occurring during the update. Well, there are issues that easily come out with such configurations. A simple one is what happens to sequences. The delay occurring between masters makes it hard to use serial tables or global frameworks. However, it is great to see that PostgreSQL community is going forward on implementing more and more things that will make it a more advanced database in the world.

When using a database cluster where nodes are completely dependent on the network behavior like it is the case for shared-nothing based architecture, an essential step in integrating a new application is its design to maximize the performance and enhance the cluster strengths knowing its capabilities. The problem with shared-nothing architectures is that queries that may run very quickly in a single instance environment might take a bunch of time in your cluster, especially for queries having aggregates that need global results for all the nodes in the cluster or queries needing subsequent results from internal sub-queries. The nightmare of most of database developers trying to optimize an application is always to face queries of the type “SELECT *”, fetching all the tuples of a table in a single shot. Such queries are already costly for single database instances, but just don’t imagine how much you might load your application when fetching all the tuples of your table with additional node layers forcing your database to fetch results from multiple nodes.
I would say by experience that there are three guidelines when customizing an application for a cluster:

  • Choose the number of nodes in cluster and servers where they are located to minimize overall network load.
  • Customize queries such as they request results from a minimal number of nodes.
  • Design the data distribution strategy of your tables to minimize the results to be fetched to a single place for join conditions.

After this short digression, the aim of this post is not to discuss deeply about such conditions, it is to enlighten a feature that your application needs to target when you design its database. Well, the name of this feature is in the title, called “Fast Query Shipping”. It is not really a feature in itself, more a goal applications should try to reach as much as possible to improve their performance on database cluster softwares. Fast query shipping (FQS) is the ability for a cluster to evaluate if a query can be completely shipped to a remote node in the cluster, making it a simple send and receive, minimizing the plan cost on the node planning the query and the data transfer cost because the data fetched from remote node with such a query is minimized to exactly what the application targets.

To be honest, googling “Fast Query shipping” does not bring any result except on Postgres-XC. The basic implementation of this feature has been done with the following commit.
commit 191d55ebf1faf897aed51f1b5fdcd71ec3ccdc6c
Author: Ashutosh Bapat
Date: Thu Feb 2 16:59:04 2012 +0530
 
Add the support for Fast Query Shipping (FQS), a method to identify
whether a query can be sent to the datanode/s as it is for evaluation and do so
if deemed fit. In such cases, we create a plan with a single RemoteQuery node
corresponding to the query and avoid the planning phase on coordinator.
 
A query tree walker analyses all the nodes in the query tree and finds out the
conditions under which the query is shippable and detects presence of
expressions which can not be evaluated on the datanode. It looks at the
relations involved in the query and deducts whether JOINs between these
relations can be evaluated on a single datanode.
 
Adds testcases xc_FQS and xc_FQS_join to test the fast query shipping
functionality and make it independent of cluster configuration.

So, in this case, an extension of PostgreSQL planner has been done exclusively for Postgres-XC to evaluate if a query is entirely shippable to its dedicated remote node. This planning step determines the list of target nodes where to launch the query. The query can be basically shipped as depending on a lot of conditions like analysis of clauses, but basically you cannot ship a query if it contains expressions that cannot be evaluated on a remote node. A simple expression following that is the next value of a sequence, or timestamps. In a more general way it is a volatile or stable functions. There are also other expressions that cannot be shipped like window functions, GROUP BY clauses, aggregates, etc. Sometimes you may be able to ship entirely a query having an aggregate function, but targeting a single query. Well, there are a lot of cases possible, and you might look at the code in details if you are interested about each corner case.

Let’s have a look at some simple test cases with replicated and hash tables:
postgres=# create table rep (a int) distribute by replication;
CREATE TABLE
postgres=# create table hash (a int) distribute by hash(a);
CREATE TABLE

For a table replicated on all nodes, shipping results from a single node is sufficient for the simple “select *”. For a distribute table, all the nodes are targetted and results are sent back as such. This is in such configuration that you will get the most efficient queries running in a cluster environment.
postgres=# explain select * from rep;
QUERY PLAN
----------------------------------------------------------------------------
Data Node Scan on "__REMOTE_FQS_QUERY__" (cost=0.00..0.00 rows=0 width=0)
Node/s: dn1
(2 rows)
postgres=# explain select * from hash;
QUERY PLAN
----------------------------------------------------------------------------
Data Node Scan on "__REMOTE_FQS_QUERY__" (cost=0.00..0.00 rows=0 width=0)
Node/s: dn1, dn2
(2 rows)

A session parameter called enable_fast_query_shipping is available to set switch this feature to on/off. Let’s see what happens.
postgres=# SET enable_fast_query_shipping TO false;
SET
postgres=# explain select * from hash;
QUERY PLAN
-------------------------------------------------------------------
Result (cost=0.00..1.01 rows=1000 width=4)
-> Data Node Scan on hash (cost=0.00..1.01 rows=1000 width=4)
Node/s: dn1, dn2
(3 rows)
postgres=# explain select * from rep;
QUERY PLAN
------------------------------------------------------------------
Result (cost=0.00..1.01 rows=1000 width=4)
-> Data Node Scan on rep (cost=0.00..1.01 rows=1000 width=4)
Node/s: dn1
(3 rows)

Here what happens is that the query is not using any FQS features, so what happens is that you do not directly fetch the results from the node, but you also materialize them on the Coordinator where query is launched before sending them back to client.

This was a small introduction of the fast query shipping feature, just do not forget to test it.

The release of Postgres-XC 0.9.7 has been done in Source Forge and in GIT.
Source forge page is here.
The source tarball and manuals are available here.

Postgres-XC 0.9.7, based on PostgreSQL 9.1, contains a bunch of new features:

  • SELECT INTO/CREATE TABLE AS
  • INSERT SELECT complete support
  • Node subsets and sub clusters (data of table distributed on a portion of nodes)
  • Cluster node management with SQL and catalog extensions
  • Dynamic reload of pool connection information
  • Window functions
  • DML remote planning (partial expression push-down)
  • DEFAULT values (non-shippable expressions: volatile functions, etc.) => Use of DEFAULT nextval(‘seq’) available.
  • GTM Standby synchronous, asynchronous recovery

Compared to Postgres-XC 0.9.6, in 0.9.7 the cluster setting is mainly SQL-based and does not use GUC params anymore. So be sure to read the Install manual before creating a cluster.
HTML documentation and man pages are also incorporated in the downloadable tarball, and the reference PDF document is built from the same SGML source.

The code is also available on GIT:
git://postgres-xc.git.sourceforge.net/gitroot/postgres-xc/postgres-xc
A new maintenance branch called XC0_9_7_PG9_1 only for 0.9.7.

This week, I spent a long time working on this commit. Just by looking at the date, commit happened before leaving for week-end :)
commit 8ef0c48acadec3c9888d302888a7d279d82323e5
Author: Michael P
Date: Fri Jan 13 16:05:00 2012 +0900
 
Improve target list selection for remote DML queries
 
This commit makes remote DML planning generally available
for replicated and hash tables. There are still issues
related to node selection for round robin tables though.
The target list of UPDATE and DELETE using coordinator quals
was set to fetch only CTID when generating SELECT in their
inner plan generated by create_remotequery_plan.
 
Their target list is rewritten to include the columns in quals
so as to be able to evaluate those quals correctly on Coordinator.
In addition remote planning for UPDATE has been improved to be
able to target correct node when launching query.
 
A new regression test called xc_remote is added, it uses the
parameter enable_fast_query_shipping to force all the queries
to go through standard planner. Tests are done on replicated,
hash and round robin tables.

In all the examples of this article, those two tables are used with the following cluster configuration of Postgres-XC cluster.
db=# select node_name, node_type from pgxc_node; -- 1 Coordinator, 2 Datanodes
node_name | node_type
-----------+-----------
coord1 | C
dn1 | D
dn2 | D
(3 rows)
db=# create table aa (a int, b timestamp) distribute by hash(a);
CREATE TABLE
db=# create table bb (a int, b timestamp) distribute by replication;
CREATE TABLE

For database clusters in general, it is essential to have an efficient and consistent way to manage queries on both local and remote nodes. Efficiency is important to reduce data load on the system. Consistency is even more important to avoid dirty data in your database. So, about queries in general, let’s use an example. SELECT queries may contain expressions that can be evaluated on remote nodes. A common example for that is when the expression is a constant.
db=# explain verbose select * from aa where a = 1;
QUERY PLAN
---------------------------------------------------
Data Node Scan (cost=0.00..0.00 rows=0 width=0)
  Output: a, b
  Node/s: dn1
  Remote query: SELECT a, b FROM aa WHERE (a = 1)
(4 rows)

In this case the query can be completely shipped to the remote node, returning correct results.

Expressions that cannot be pushed down are those who need to be evaluated on local nodes with all the necessary data fetched from remote nodes. For example, let’s take the replicated table bb. We want to select data on it with a time-based expression. Each node of the cluster (at least in the case of Postgres-XC) is located on a different server, each server having a different time line.So, is the following SQL shippable?
SELECT a from bb where b < now();
The answer is no. What is necessary to do is to get all the tuples (a,b) from table bb (a is necessary to send back result), and then apply the time based condition on all the results (explaining why b is necessary).
This results in the following plan.
db=# explain verbose select a from bb where b < now();
QUERY PLAN
-----------------------------------------------------------------
 Result (cost=0.00..1.01 rows=1000 width=4)
 Output: a
 -> Data Node Scan on bb (cost=0.00..1.01 rows=1000 width=4)
   Output: a, b
   Node/s: dn1
   Remote query: SELECT a, b FROM ONLY bb WHERE true
   Coordinator quals: (bb.b < now())
(7 rows)

Well, Postgres-XC has already a lot of mechanisms to manage SELECT and INSERT queries. But what was missing are the parts related to UPDATE and DELETE. So the new functionality committed this week allows to use complex expressions.
For example, in the case of update, you can run sequence and time based updates needing local node evaluation to run consistently.
db=# insert into bb values (1,now());
INSERT 0 1
db=# insert into bb values (2,now());
INSERT 0 1
db=# insert into bb values (3,now());
INSERT 0 1
db=# select * from bb;
a | b
---+---------------------------------
1 | Fri Jan 13 06:26:32.872665 2012
2 | Fri Jan 13 06:26:38.261489 2012
3 | Fri Jan 13 06:26:40.943182 2012
(3 rows)
db=# update bb set a = nextval('seq'), b = now();
UPDATE 3
db=# select * from bb;
a | b
---+---------------------------------
1 | Fri Jan 13 06:28:01.273496 2012
2 | Fri Jan 13 06:28:01.273496 2012
3 | Fri Jan 13 06:28:01.273496 2012
(3 rows)
db=# explain verbose update bb set a = nextval('seq'), b = now();
QUERY PLAN
-----------------------------------------------------------------------
 Update on public.bb (cost=0.00..11.01 rows=1000 width=6)
 Node/s: dn1, dn2
 Remote query: UPDATE public.bb SET a = $1, b = $2 WHERE ctid = $3
 -> Result (cost=0.00..11.01 rows=1000 width=6)
   Output: nextval('seq'::regclass), now(), ctid
   -> Data Node Scan on bb (cost=0.00..1.01 rows=1000 width=6)
      Output: ctid
      Node/s: dn1
      Remote query: SELECT ctid FROM ONLY bb WHERE true
(9 rows)

You need here to select all the data to be updated from remote nodes, then you have to apply the time base expression (now) and the sequence value (nextval), and finally push those values to dedicated remote nodes.

This works also with WHERE clauses using non-shippable expressions.
db=# explain verbose update bb set a = nextval('seq'), b = now() WHERE b < now();
QUERY PLAN
---------------------------------------------------------------------------------
 Update on public.bb (cost=0.00..11.02 rows=1000 width=14)
  Node/s: dn1, dn2
  Remote query: UPDATE public.bb SET a = $1, b = $2 WHERE b = $3 AND ctid = $4
  -> Result (cost=0.00..11.02 rows=1000 width=14)
    Output: nextval('seq'::regclass), now(), b, ctid
    -> Data Node Scan on bb (cost=0.00..1.01 rows=1000 width=14)
      Output: b, ctid
      Node/s: dn1
      Remote query: SELECT b, ctid FROM ONLY bb WHERE true
      Coordinator quals: (bb.b < now())
(10 rows)

Here what is added is a condition to pre-select a subset of rows. Such operation is costly though because you have to fetch all the rows of the table first in inner plan.

The same kind of crazy SQL are also possible for DELETE with mixing shippable and non-shippable expressions.
db=# explain verbose delete from bb where a = 2 and b < now();
QUERY PLAN
------------------------------------------------------------------------------
 Delete on public.bb (cost=0.00..1.02 rows=1000 width=18)
 Node/s: dn1, dn2
 Remote query: DELETE FROM public.bb WHERE a = $1 AND b = $2 AND ctid = $3
 -> Result (cost=0.00..1.02 rows=1000 width=18)
   Output: a, b, ctid
   -> Data Node Scan on bb (cost=0.00..1.02 rows=1000 width=18)
     Output: a, b, ctid
     Node/s: dn1
     Remote query: SELECT a, b, ctid FROM ONLY bb WHERE (a = 2)
     Coordinator quals: (bb.b < now())
(10 rows)

You can notice here that the constant expression "a = 2" is shipped in the most inner plan, improving query efficiency by that much.
A lot of things are now possible, and all this stuff will be included in release 0.9.7!

All content is ©Copyright of Otacoo.com 2010-2011. Privacy Policy - Terms of Use