This week, a great feature has been added by commit 8a05756, completed by commit caf1554 in Postgres-XC GIT repository.
commit 8a05756a702051d55a35ec3f4953f381f977b53a
Author: Pavan Deolasee Date: Wed Dec 14 09:35:53 2011 +0530
 
Implement support for CREATE TABLE AS, SELECT INTO and INSERT INTO
statements. We start by fixing the INSERT INTO support. For every result
relation, we now build a corresponding RemoteQuery node so that the
inserts can be carried out at the remote datanodes. Subsequently, at
the coordinator at execution time, instead of inserting the resulting tuples
in a local heap, we invoke remote execution and insert the rows in the
remote datanodes. This works nicely even for prepared queries, multiple
values clause for insert as well as any other mechanism of generating
tuples.
 
We use this infrastructure to then support CREATE TABLE AS SELECT (CTAS).
The query is transformed into a CREATE TABLE statement followed by
INSERT INTO statement and then run through normal planning/execution.
 
There are many regression cases that need fixing because these statements
now work correctly. This patch fixes many of them. Few might still be
failing, but they seem unrelated to the work itself and might be a
side-effect. We will fix them once this patch gets in.

Simply, this is the support for CREATE TABLE AS and SELECT INTO. All the possible combinations of INSERT SELECT are also possible whatever the type of table used.

Let’s see through a couple of examples with this cluster of 1 Coordinator and 4 Datanodes.
postgres=# select oid,node_name,node_type from pgxc_node;
oid | node_name | node_type
-------+-----------+-----------
11133 | coord1 | C
16384 | dn1 | D
16385 | dn2 | D
16386 | dn3 | D
16387 | dn4 | D
(5 rows

Let’s create a table and populate it with some data.
postgres=# create table a as select generate_series(1,100);
INSERT 0 100
postgres=# select count(*) from a;
count
-------
100
(1 row)

The data is distributed through the cluster of the 4 Datanodes.
postgres=# execute direct on node dn4 'select count(*) from a';
count
-------
27
(1 row)
postgres=# execute direct on node dn3 'select count(*) from a';
count
-------
19
(1 row)
postgres=# execute direct on node dn2 'select count(*) from a';
count
-------
31
(1 row)
postgres=# execute direct on node dn1 'select count(*) from a';
count
-------
23
(1 row)

CREATE TABLE AS is not only limited to global tables, you can define a distribution type, a subset of nodes, and of course the table can be unlogged or temporary. Here the table is distributed by round robin on datanodes dn1 and dn2.
postgres=# create table c distribute by round robin to node dn1,dn2 as select * from b;
INSERT 0 100
postgres=# execute direct on node dn1 'select count(*) from c';
count
-------
50
(1 row)
postgres=# execute direct on node dn2 'select count(*) from c';
count
-------
50
(1 row)
postgres=# execute direct on node dn3 'select count(*) from c';
count
-------
0
(1 row)
postgres=# execute direct on node dn4 'select count(*) from c';
count
-------
0
(1 row)

However, SELECT INTO does not have any extension for distribution type and node subsets. The reason for that is because SELECT INTO is by default a SELECT query, CREATE TABLE AS is a DDL. So in this case table created is distributed by hash on all the nodes.
postgres=# select * into d from b;
INSERT 0 100
postgres=# select pclocatortype,nodeoids from pgxc_class where pcrelid = 'd'::regclass;
-[ RECORD 1 ]-+------------------------
pclocatortype | H
nodeoids | 16384 16385 16386 16387

Yeah, that rocks.

Just yesterday I committed that.
commit 2aea0c2e0e01031f5dd4260b6985dc0ed4eadc50
Author: Michael P
Date: Tue Nov 15 09:54:54 2011 +0900
 
 Support for data distribution among a subset of datanodes
 
 CREATE TABLE has been extended with the following clause:
 CREATE TABLE ...
 [ TO ( GROUP groupname | NODE nodename [, ... ] ) ]
 
 This clause allows to distribute data among a subset of nodes
 listed by a node list, or a group alias.
 Node groups can be defined with CREATE NODE GROUP.
 
 The base structure for this support was added with commit
 56a90674444df1464c8e7012c6113efd7f9bc7db, but check of mapping of
 subsets of node list was still missing for the management of join
 push down and materialization evaluation in planner.

So what the hell is it??? Simply a feature that allows you to better control the data distributed among your Postgres-XC cluster.

Let’s take an example of a cluster with 2 Coordinators and 4 Datanodes.
postgres=# select oid,node_name from pgxc_node;
oid | node_name
-------+-----------
11133 | coord1
11134 | coord2
11135 | dn1
11136 | dn2
11137 | dn3
11138 | dn4
(6 rows)

Prior to this functionality, creating a table forced you to distribute the data among all the datanodes of your cluster.
postgres=# create table test (a int);
CREATE TABLE
postgres=# select nodeoids from pgxc_class where pcrelid = 'test'::regclass;
nodeoids
-------------------------
11135 11136 11137 11138
(1 row)

Now, CREATE TABLE has a new clause extension to be able to create a table only on a subset of nodes.
This is documented here.
The new clause is written as:
CREATE TABLE
...
[ TO ( GROUP groupname | NODE nodename [, ... ] ) ]

So you can specify a list of node names or a node group. A node group is simply an alias for a node list.
Let’s try it.

  • test12_rep is a replicated table whose data is distributed in nodes 1 and 2
  • test34_hash is a hash table whose data is distributed in nodes 3 and 4
  • test234_rep is a replicated table whose data is distributed in nodes 2, 3 and 4

postgres=# create table test12_rep (a int) distribute by replication to node dn1,dn2;
CREATE TABLE
postgres=# create table test34_hash (a int) distribute by hash(a) to node dn3,dn4;
CREATE TABLE
postgres=# create node group dn234 with dn2,dn3,dn4;
CREATE NODE GROUP
postgres=# create table test234_rep (a int) distribute by replication to group dn234;
CREATE TABLE
-- Check the node subset for distribution
postgres=# select nodeoids from pgxc_class where pcrelid = 'test12_rep'::regclass;
nodeoids
-------------
11135 11136
(1 row)
postgres=# select nodeoids from pgxc_class where pcrelid = 'test34_hash'::regclass;
nodeoids
-------------
11137 11138
(1 row)
postgres=# select nodeoids from pgxc_class where pcrelid = 'test234_rep'::regclass;
nodeoids
-------------------
11136 11137 11138
(1 row)

Now let’s insert some data.
postgres=# insert into test12_rep values (1),(2),(3);
INSERT 0 3
postgres=# insert into test234_rep values (1),(2),(3);
INSERT 0 3
postgres=# insert into test34_hash values (1),(2),(3);
INSERT 0 3

Then is data of test12_rep correctly distributed?
postgres=# execute direct on node dn1 'select * from test12_rep';
a
---
1
2
3
(3 rows)
postgres=# execute direct on node dn2 'select * from test12_rep';
a
---
1
2
3
(3 rows)
postgres=# execute direct on node dn3 'select * from test12_rep';
a
---
(0 rows)
postgres=# execute direct on node dn4 'select * from test12_rep';
a
---
(0 rows)

test12_rep is only replicated in nodes dn1 and dn2 only.

Let’s do the same checks for test234_rep and test34_hash.
--First for test34_hash
postgres=# execute direct on node dn1 'select * from test34_hash';
a
---
(0 rows)
postgres=# execute direct on node dn2 'select * from test34_hash';
a
---
(0 rows)
postgres=# execute direct on node dn3 'select * from test34_hash';
a
---
1
2
(2 rows)
postgres=# execute direct on node dn4 'select * from test34_hash';
a
---
3
(1 row)
--Then for test234_rep
postgres=# execute direct on node dn1 'select * from test234_rep';
a
---
(0 rows)
postgres=# execute direct on node dn2 'select * from test234_rep';
a
---
1
2
3
(3 rows)
postgres=# execute direct on node dn3 'select * from test234_rep';
a
---
1
2
3
(3 rows)
postgres=# execute direct on node dn4 'select * from test234_rep';
a
---
1
2
3
(3 rows)

So test234_rep is correctly replicated in nodes 2, 3 and 4. test34_hash is correctly hash-partitioned in nodes 3 and 4.

Now let’s do some join and push down checks.
postgres=# explain verbose select a from test34_hash join test234_rep using (a);
QUERY PLAN
-------------------------------------------------------------------
 Data Node Scan (Node Count [2]) (cost=0.00..0.00 rows=0 width=0)
  Output: test34_hash.a
(2 rows)

In this case replicated table test234_rep is completely mapped by test34_hash so a push down is possible to nodes 3 and 4 directly.

postgres=# explain verbose select a from test34_hash join test12_rep using (a);
QUERY PLAN
------------------------------------------------------
 Nested Loop (cost=0.00..2.04 rows=1 width=4)
  Output: test34_hash.a
  Join Filter: (test34_hash.a = test12_rep.a)
  -> Materialize (cost=0.00..1.01 rows=1 width=4)
      Output: test34_hash.a
      -> Data Node Scan (Node Count [2]) on test34_hash (cost=0.00..1.01 rows=1000 width=4)
        Output: test34_hash.a
  -> Materialize (cost=0.00..1.01 rows=1 width=4)
      Output: test12_rep.a
      -> Data Node Scan (Node Count [1]) on test12_rep (cost=0.00..1.01 rows=1000 width=4)
        Output: test12_rep.a
(11 rows)

In this case test34_hash and test12_rep are distributed on a disjoint list of nodes, so performing a join needs to first fetch data from Datanodes then materialize it on Coordinator.

There is still no way to change the table distribution type or the node list after table creation. This is one of the next plans, based on ALTER TABLE this time.

Arrays can be created easily in PostgreSQL using the additional syntax [] when defining a column of a table.
CREATE TABLE aa (a int primary key, b int[]);
CREATE TABLE bb (a int primary key, b varchar(5)[]);

Arrays follow some special grammar. You can insert array data directly with ‘{data1,data2}’ format or by using things like ARRAY[data1,data2].
postgres=# INSERT INTO aa VALUES (1, '{1,2,3,4}');
INSERT 0 1
postgres=# INSERT INTO aa VALUES (2, ARRAY[1,2,3,4]);
INSERT 0 1
postgres=# select * from aa;
a | b
---+-----------
1 | {1,2,3,4}
2 | {1,2,3,4}
(2 rows)

An array in postgres does not have any dimension restrictions. You can create arrays with multiple dimensions if desired.
postgres=# INSERT INTO aa VALUES (3, '{{1,2},{3,4}}');
INSERT 0 1
postgres=# INSERT INTO aa VALUES (4, ARRAY[ARRAY[1,2],ARRAY[3,4]]);
INSERT 0 1
postgres=# select * from aa;
a | b
---+---------------
1 | {1,2,3,4}
2 | {1,2,3,4}
3 | {{1,2},{3,4}}
4 | {{1,2},{3,4}}
(4 rows)

A special function called array_dims allows to get dimensions of an array.
postgres=# select a, array_dims(b) from aa;
a | array_dims
---+------------
1 | [1:4]
2 | [1:4]
3 | [1:2][1:2]
4 | [1:2][1:2]
(4 rows)

An array length can be obtained by array_length.
postgres=# select array_length(b,1) from aa where a = 1;
array_length
--------------
5
(1 row)

There are another couple of useful functions like:

  • array_append, array_prepend, to add values directly to an array
  • array_cat, to assemble arrays

Here is an example.
postgres=# update aa set b = array_append(b, 5) where a = 1;
UPDATE 1
postgres=# select * from aa where a = 1;
a | b
---+-------------
1 | {1,2,3,4,5}
(1 row)

The contribution module int_array contains additional functions on integer arrays to sort elements.

The last function that looks useful for array manipulation are unnest and array_string. array_string returns data of a array as a string (Oh!) with a given separator.
postgres=# select array_to_string(b,';') from aa where a = 1;
array_to_string
-----------------
1;2;3;4;5
(1 row)

This is particularly useful for array manipulation on application side.

unnest decomposes array into single elements. This can be used to refer to foreign tables in IN clauses for example.
postgres=# select unnest(b) from aa where a = 1;
unnest
--------
1
2
3
4
5
(5 rows)
postgres=# create table cc (a int, b char(2));
CREATE TABLE
postgres=# insert into cc values (1, 'Aa'), (2, 'Bb'), (3, 'Cc'), (4, 'Dd'), (6, 'Ff');
INSERT 0 5
postgres=# select b from cc where a in (select unnest(b) from aa where a = 1);
b
----
Aa
Bb
Cc
Dd
(4 rows)

unnest is implemented internally since postgres 8.4. If you use an older version, you can defined it with that.
CREATE OR REPLACE FUNCTION unnest(anyarray)
RETURNS SETOF anyelement AS
$BODY$
SELECT $1[i] FROM
generate_series(array_lower($1,1),
array_upper($1,1)) i;
$BODY$
LANGUAGE 'sql' IMMUTABLE;

Hope you enjoyed this post.

Here is the list of latest data release in the Monster Portable 3rd Wiki.

  • Monster Item data accessible from here for each monster, for both Minions and Bosses. Each monster page contains the available items by classified by quest level with the probability to get each item.
  • Shared Armor data, by “shared”, I mean armor usable by both Gunners and Swordsman. This data had a huge update!
  • Swordsman Armor data, a lot of missing armors have been added. This is now complete at 95%. It also includes armor that are only obtainable from event quests.

Armor Data has been integrated to make you an easy navigation. Don’t hesitate to check also the list of armors!

Of course, all the data in Japanese has been completely translated in English.
Both are written to help players go through the game.

The next focus will be Gunner Armors. Data is ready, but its full upload requires a bit of time.

To finish, a nice gameplay video of Deviljho with Lance.

© 2012 Michael Paquier All content is ©Copyright of Otacoo.com 2010-2012. Privacy Policy - Terms of Use