Postgres-XC, write-scalable multi-master symmetric cluster based on PostgreSQL, version 1.0beta2 has been released.
This beta version is based on PostgreSQL 9.1.3. All the patches in PostgreSQL 9.1 stable branch have been merged up to commit 1c0e678 (4th of May 2012).

You can download the tarball directly from here.
This tarball contains all the HTML and man documentation.

The following enhancements have been made since release of 1.0beta1:

  • Redaction of release notes, summarizing all the features added in Postgres-XC since the creation of the project
  • Support for make world
  • Regressions stabilized (no failures for 139 tests)
  • Fix of more than 50 bugs
  • Merge with stable branch of PostgreSQL 9.1 (600~ commits)

Compared to version Postgres-XC 0.9.7, the following features have been added:

  • Fast query shipping (FQS), quick identification of expressions in a query that can be pushed down to remote nodes
  • SERIAL types
  • TABLESPACE
  • Utility to clean up 2PC transactions in cluster (pgxc_clean)
  • Utility for initialization of GTM (global transaction manager, utility called initgtm)
  • Relation-size functions
  • Regression stabilization

The release notes of 1.0 are directly available here.

The project can be followed on Source Forge.
And the project uses a couple of GIT repositories for development:

Postgres-XC 1.0beta2 will be used during the Postgres-XC tutorial at PGCon in Ottawa, so be sure to touch this beta version to have an idea of what Postgres-XC is before attending!

One of the features that has been really improved the last couple of weeks is the stabilization of remote query planning for DML for Postgres-XC standard planner. And this has consequences on rules, because a rule is fired on Coordinators by design, and you need to provide a global way to plan queries correctly with remote nodes. Just to recall, a rule is the possibility to define an alternative action when doing an INSERT, UPDATE or DELETE on a table.
Another important point is that the query of a rules is not planned at the moment of the rule creation, but after rule is fired, however it doesn’t change the fact that a correct query planning is needed at a moment or another.

A rule can for example be used to define DML on views.
A view is roughly a projection of table data into a wanted shape, and it is by default not possible to define DML actions on it.

Let’s take an example, here are a simple table and a simple view.
postgres=# CREATE TEMP TABLE t1 (a int PRIMARY KEY, b int) DISTRIBUTE BY HASH(a);
NOTICE: CREATE TABLE / PRIMARY KEY will create implicit index "t1_pkey" for table "t1"
CREATE TABLE
postgres=# INSERT INTO t1 VALUES (1,1),(2,2),(3,3),(4,4);
INSERT 0 4
postgres=# CREATE VIEW t1_v AS SELECT a AS a_v, b AS b_v FROM t1;
NOTICE: view "t1_v" will be a temporary view
CREATE VIEW

When trying to UPDATE the view, you will get the following error.
postgres=# UPDATE t1_v SET b_v = 2 WHERE a_v = 3;
ERROR: cannot update view "t1_v"
HINT: You need an unconditional ON UPDATE DO INSTEAD rule or an INSTEAD OF UPDATE trigger.

So let’s define a view on it and check that an UPDATE on a view is possible.
postgres=# CREATE RULE t1_upd AS ON UPDATE TO t1_v DO INSTEAD UPDATE t1 SET b = new.b_v WHERE a = old.a_v AND b = old.b_v;
CREATE RULE
postgres=# UPDATE t1_v SET b_v = 2 WHERE a_v = 3;
UPDATE 1
postgres=# select * from t1_v;
a_v | b_v
-----+-----
1 | 1
2 | 2
4 | 4
3 | 2
(4 rows)

So yes, RULES are now completely supported in Postgres-XC. And it is included in 1.0. The secret of how it works? The thing that took me 4 weeks to figure out?
Well, some extensions have been added in standard planner for DELETE, INSERT and UPDATE remote planning (ModifyTable used as TopPlan). For people willing to look at the code, all the secrets are located in functions create_remoteinsert_plan, create_remoteupdate_plan and create_remotedelete_plan of createplan.c. Those functions have been built and adapted to former scan plan of PostgreSQL to react as a wrapper for inner remote table scans within PostgreSQL standard planner. The trick is to create a DML query generated based on the scan plans generated for each tables that has to be updated, deleted or insert.

One of the limitations? The use of constraints.
Depending on the distribution strategy used, you may not be able to check the consistency of constraints globally.
But this is another story…

This year, Postgres-XC has a tutorial at PGCon. Yeah.
Based on the schedule of PGCon 2012, XC team has 3 hours to make you masters of this cluster based on PostgreSQL.

We are working hard at shaping up the presentation flow following those guidelines.

  1. Postgres-XC, what it is and what it is not
  2. Postgres-XC elements — Global Transaction Manager, Coordinator and Datanode
  3. How to design a Postgres-XC cluster — cluster configuration and table design
  4. Build and installation
  5. How to configure Postgres-XC
  6. How to test Postgres-XC
  7. Cluster-wide backup and restore
  8. High availability and component failure
  9. Postgres-XC as a community, be a developer!

Well, to be honest, 3 hours is short for a tutorial that would need a full course of 2 days. So we are planning to have some demos running during the presentation, but for sure we will not be able to show as much as we wish during this short time.
So, you, who is reading this post and will of course participate in Postgres-XC tutorial at PGCon, what do you expect from this tutorial? Are there things you would like to see more than others based on the guidelines written above?

Feel free to post your opinions here, we’ll try to take into account everybody’s viewpoint!

Postgres-XC, write-scalable multi-master symetric cluster based on PostgreSQL, version XC 1.0beta1 has been released.
This beta version is based on PostgreSQL 9.1beta2, and all the fixes of PostgreSQL 9.1 stable branch will be backported in Postgres-XC 1.0 stabilized version.
It is under PostgreSQL license.

You can download the binary directly from here.
This tarball includes all the HTML and man documentation.

A PDF file containing all the references is also available here: Reference PDF.

Compared to version 0.9.7, the following features have been added:

  • Fast query shipping (FQS), quick identification of expressions in a query that can be pushed down to remote nodes
  • SERIAL types
  • TABLESPACE
  • Utility to clean up 2PC transactions in cluster (pgxc_clean)
  • Utility for initialization of GTM (global transaction manager, utility called initgtm)
  • Relation-size functions
  • Regression stabilization

The project can be followed on Source Forge.
And we use a couple of GIT repositories for development:

Postgres-XC tutorial at PGCon in Ottawa this May will use a 1.0 version, so be sure to touch this beta version to have an idea of what is Postgres-XC before coming!
Since the last release, a special effort has been made on stabilization and performance improvement, but be sure to give your feedback in order to provide a stable 1.0 release for PostgreSQL community.

Here are some thoughts about trigger events in a database cluster environment, those design thoughts are particularly related to Postgres-XC, scaling-out database cluster based on PostgreSQL.

In PostgreSQL, triggers have the following format:
CREATE [ CONSTRAINT ] TRIGGER name { BEFORE | AFTER | INSTEAD OF } { event [ OR ... ] }
  ON table
  [ FROM referenced_table_name ]
  { NOT DEFERRABLE | [ DEFERRABLE ] { INITIALLY IMMEDIATE | INITIALLY DEFERRED } }
  [ FOR [ EACH ] { ROW | STATEMENT } ]
  [ WHEN ( condition ) ]
  EXECUTE PROCEDURE function_name ( arguments )

A trigger event can be fired with the following events.
    INSERT
    UPDATE [ OF column_name [, ... ] ]
    DELETE
    TRUNCATE

OK, up to now nothing is new and it is plainly what you can find in PostgreSQL documentation.

However, let’s do a short explanation of what are triggers in a database (really basic). A trigger is the possibility to perform automatic action after a special event in a database. Triggers are usually used to make some checks on constraints in database, to do some additional operations on tables like statistics for an application independent on PostgreSQL system. In a Web environment, a direct application of trigger is the possibility to fire for example an email after a certain event occurred on the database. There are a lot of applications around that. So roughly, a trigger is fired after a certain event, which is data modification on database.

PostgreSQL provides a good granularity when defining the conditions of a trigger firing. All the following elements constitute an event that could fire a trigger.

  • Table on which the SQL operation is done.
  • Type of SQL operation happening on table, basically a DML (INSERT, UPDATE, DELETE) or a TRUNCATE.
  • Boolean condition determining if the trigger will be fired (WHEN, INSTEAD OF). It can be a condition on the old and new values during the DML operation. In this case you cannot use old values with an INSERT and new values with a DELETE.
  • Transaction commit time (DEFERRABLE). A trigger firing can be deferred at a transaction commit or at the statement time.
  • Statement or tuple-based firing. In the case of a statement-based firing, trigger is fired with each query. For a tuple-based trigger, this trigger is fired each time a tuple of the table is modified. Let’s imagine that you have a system with a trigger that is fired when a tuple from a table A is deleted. If you delete 1,000,000 tuples in this table A, this trigger will be fired 1,000,000 times. Here it would be pretty costly.
  • After or before the data is being modified (AFTER/BEFORE).

Once a trigger is fired, it launches a procedure that may interact with the data modified by the awakening action. It is usually a plpgsql function. Well, as it can be a very costly operation, take care when you design your client application running on top of the database.

Now, let’s move to the root of the problem, which is the goal of this post. What are the conditions under which we should manage triggers in a clustering environment? By a clustering environment, I mean a database cluster where you have a client application connecting to one node of this cluster, but it needs to interact with remote nodes. In this case, the main point is to determine if a trigger can be fired on local node, the node where the application is connected to, or on remote node, the node where the database modification happens (and you may also have data modification happening on local node, why not?!). So, we need to determine under which conditions a trigger is shippable to a remote node or not.
Without suspense, here are the two conditions that I suppose are necessary and sufficient (other ideas are welcome) to determine if a trigger is shippable or not in a database cluster.

  1. The shippability of the procedure fired by trigger
  2. The shippability of the query used by application, the one modifying database and firing the trigger event

In PostgreSQL, a procedure is shippable if it is immutable, meaning that it does not modify the content of the database if launched. A query is shippable if it does not contain expressions that are not shippable.
Here are some simple examples of shippable queries:
SELECT * from aa WHERE a = 1;
INSERT INTO aa VALUES (1),(2);

Non-shippable queries:
SELECT * FROM aa WHERE a = nextval('seq');
INSERT INTO aa VALUES (currval('seq'), nextval('seq'));

Please note that query shippability depends also on a bunch of other conditions like join conditions and distribution types of the tables with which is for example interacting the query. For example by doing a join on two tables whose data are on different nodes, we need to fetch the data from remote node to local node, then perform the join on local node. There are a lot of cases to consider depending on the way your database clustering application distributes data among the cluster (column, sharding, etc.).
The choice of those 2 conditions is simply made by analyzing which part of the trigger firing conditions may contain elements that are clustering-dependent. In this case query conditioning the firing and the procedure fired by trigger are the only two conditions that need to be checked.

Those two conditions lead to 4 cases to determine where a trigger can be fired.

  • If query is shippable and procedure is shippable, trigger is fired on remote node.
  • If query is not shippable and procedure is shippable, trigger is fired on local node.
  • If query is shippable and procedure is not shippable, trigger is fired on local node.
  • If query is not shippable and procedure is not shippable, trigger is fired on local node.

This analysis has been a part of some design work for Postgres-XC in order to support triggers fully in a database cluster.
Once again, those are only thoughts, and any opinion is welcome. So feel free to comment.

©2010-2013 Michael Paquier All content is ©Copyright of Otacoo.com 2010-2013. Privacy Policy - Terms of Use