One of the features that has been really improved the last couple of weeks is the stabilization of remote query planning for DML for Postgres-XC standard planner. And this has consequences on rules, because a rule is fired on Coordinators by design, and you need to provide a global way to plan queries correctly with remote nodes. Just to recall, a rule is the possibility to define an alternative action when doing an INSERT, UPDATE or DELETE on a table.
Another important point is that the query of a rules is not planned at the moment of the rule creation, but after rule is fired, however it doesn’t change the fact that a correct query planning is needed at a moment or another.

A rule can for example be used to define DML on views.
A view is roughly a projection of table data into a wanted shape, and it is by default not possible to define DML actions on it.

Let’s take an example, here are a simple table and a simple view.
postgres=# CREATE TEMP TABLE t1 (a int PRIMARY KEY, b int) DISTRIBUTE BY HASH(a);
NOTICE: CREATE TABLE / PRIMARY KEY will create implicit index "t1_pkey" for table "t1"
CREATE TABLE
postgres=# INSERT INTO t1 VALUES (1,1),(2,2),(3,3),(4,4);
INSERT 0 4
postgres=# CREATE VIEW t1_v AS SELECT a AS a_v, b AS b_v FROM t1;
NOTICE: view "t1_v" will be a temporary view
CREATE VIEW

When trying to UPDATE the view, you will get the following error.
postgres=# UPDATE t1_v SET b_v = 2 WHERE a_v = 3;
ERROR: cannot update view "t1_v"
HINT: You need an unconditional ON UPDATE DO INSTEAD rule or an INSTEAD OF UPDATE trigger.

So let’s define a view on it and check that an UPDATE on a view is possible.
postgres=# CREATE RULE t1_upd AS ON UPDATE TO t1_v DO INSTEAD UPDATE t1 SET b = new.b_v WHERE a = old.a_v AND b = old.b_v;
CREATE RULE
postgres=# UPDATE t1_v SET b_v = 2 WHERE a_v = 3;
UPDATE 1
postgres=# select * from t1_v;
a_v | b_v
-----+-----
1 | 1
2 | 2
4 | 4
3 | 2
(4 rows)

So yes, RULES are now completely supported in Postgres-XC. And it is included in 1.0. The secret of how it works? The thing that took me 4 weeks to figure out?
Well, some extensions have been added in standard planner for DELETE, INSERT and UPDATE remote planning (ModifyTable used as TopPlan). For people willing to look at the code, all the secrets are located in functions create_remoteinsert_plan, create_remoteupdate_plan and create_remotedelete_plan of createplan.c. Those functions have been built and adapted to former scan plan of PostgreSQL to react as a wrapper for inner remote table scans within PostgreSQL standard planner. The trick is to create a DML query generated based on the scan plans generated for each tables that has to be updated, deleted or insert.

One of the limitations? The use of constraints.
Depending on the distribution strategy used, you may not be able to check the consistency of constraints globally.
But this is another story…

This week, a great feature has been added by commit 8a05756, completed by commit caf1554 in Postgres-XC GIT repository.
commit 8a05756a702051d55a35ec3f4953f381f977b53a
Author: Pavan Deolasee Date: Wed Dec 14 09:35:53 2011 +0530
 
Implement support for CREATE TABLE AS, SELECT INTO and INSERT INTO
statements. We start by fixing the INSERT INTO support. For every result
relation, we now build a corresponding RemoteQuery node so that the
inserts can be carried out at the remote datanodes. Subsequently, at
the coordinator at execution time, instead of inserting the resulting tuples
in a local heap, we invoke remote execution and insert the rows in the
remote datanodes. This works nicely even for prepared queries, multiple
values clause for insert as well as any other mechanism of generating
tuples.
 
We use this infrastructure to then support CREATE TABLE AS SELECT (CTAS).
The query is transformed into a CREATE TABLE statement followed by
INSERT INTO statement and then run through normal planning/execution.
 
There are many regression cases that need fixing because these statements
now work correctly. This patch fixes many of them. Few might still be
failing, but they seem unrelated to the work itself and might be a
side-effect. We will fix them once this patch gets in.

Simply, this is the support for CREATE TABLE AS and SELECT INTO. All the possible combinations of INSERT SELECT are also possible whatever the type of table used.

Let’s see through a couple of examples with this cluster of 1 Coordinator and 4 Datanodes.
postgres=# select oid,node_name,node_type from pgxc_node;
oid | node_name | node_type
-------+-----------+-----------
11133 | coord1 | C
16384 | dn1 | D
16385 | dn2 | D
16386 | dn3 | D
16387 | dn4 | D
(5 rows

Let’s create a table and populate it with some data.
postgres=# create table a as select generate_series(1,100);
INSERT 0 100
postgres=# select count(*) from a;
count
-------
100
(1 row)

The data is distributed through the cluster of the 4 Datanodes.
postgres=# execute direct on node dn4 'select count(*) from a';
count
-------
27
(1 row)
postgres=# execute direct on node dn3 'select count(*) from a';
count
-------
19
(1 row)
postgres=# execute direct on node dn2 'select count(*) from a';
count
-------
31
(1 row)
postgres=# execute direct on node dn1 'select count(*) from a';
count
-------
23
(1 row)

CREATE TABLE AS is not only limited to global tables, you can define a distribution type, a subset of nodes, and of course the table can be unlogged or temporary. Here the table is distributed by round robin on datanodes dn1 and dn2.
postgres=# create table c distribute by round robin to node dn1,dn2 as select * from b;
INSERT 0 100
postgres=# execute direct on node dn1 'select count(*) from c';
count
-------
50
(1 row)
postgres=# execute direct on node dn2 'select count(*) from c';
count
-------
50
(1 row)
postgres=# execute direct on node dn3 'select count(*) from c';
count
-------
0
(1 row)
postgres=# execute direct on node dn4 'select count(*) from c';
count
-------
0
(1 row)

However, SELECT INTO does not have any extension for distribution type and node subsets. The reason for that is because SELECT INTO is by default a SELECT query, CREATE TABLE AS is a DDL. So in this case table created is distributed by hash on all the nodes.
postgres=# select * into d from b;
INSERT 0 100
postgres=# select pclocatortype,nodeoids from pgxc_class where pcrelid = 'd'::regclass;
-[ RECORD 1 ]-+------------------------
pclocatortype | H
nodeoids | 16384 16385 16386 16387

Yeah, that rocks.

©2010-2013 Michael Paquier All content is ©Copyright of Otacoo.com 2010-2013. Privacy Policy - Terms of Use