After getting a machine running Mac OS X Lion, here are the applications that I consider absolutely mandatory.
This list is more of a memo than anything else, but you might find it useful.

  • ClamXav, a free open-source antivirus. It can be downloaded from the Mac App Store. Scans are light and quick.
  • LibreOffice, a great free resource for documentation, with no Oracle logo included.
  • Xcode, a free development kit for Mac applications. It includes a gcc compiler and can be found in the Mac App Store.
  • iTerm(2), an alternative to the default terminal of Mac OS X, which is a pain to use. iTerm is not fully compatible with Mac OS X Lion, but iTerm2 can be a solution.
  • VLC, a video player. I have been using it for a couple of years, and it has always been astonishingly light and fast.
  • MacPorts, an alternative way to manage packages from the command line.
  • Minecraft, no comments on this one…

And that’s all…

After long weeks of battle, the following commit landed this week in the Postgres-XC Git repository.
commit 56a90674444df1464c8e7012c6113efd7f9bc7db
Author: Michael P
Date: Thu Oct 27 10:57:30 2011 +0900
 
Support for Node and Node Group DDL
 
Node information is not anymore supported by node number using
GUC parameters but node names.
Node connection information is taken from a new catalog table
called pgxc_node. Node group information can be found in pgxc_group.
 
Node connection information is taken from catalog when user session
begins and sticks with it for the duration of the session. This brings
more flexibility to the cluster settings. Cluster node information can
now be set when node is initialized with initdb using cluster_nodes.sql
located in share directory.
 
This commits adds support for the following new DDL:
- CREATE NODE
- ALTER NODE
- DROP NODE
- CREATE NODE GROUP
- DROP NODE GROUP
 
The following parameters are deleted from postgresql.conf:
- num_data_nodes
- preferred_data_nodes
- data_node_hosts
- data_node_ports
- primary_data_node
- num_coordinators
- coordinator_hosts
- coordinator_ports
pgxc_node_id is replaced by pgxc_node_name to identify the node-self.
 
Documentation is added for the new queries. Functionalities such as
EXECUTE DIRECT, CLEAN CONNECTION use node names instead of node numbers now.

So what is it about? Until now, Postgres-XC relied on a heavy configuration to set up node connection information. There were 8 parameters dedicated to Coordinators and Datanodes, and those parameters had to follow a special format.
Now, SQL queries such as CREATE NODE can be issued to set up cluster connection information, and this information is cached once the user session is up.
For the time being, a file called cluster_nodes.sql has to be placed in the share folder for initdb. Soon, functionality will be added to update pooler connection information whenever node information is inserted, updated or deleted.
This makes cluster setup a lot simpler. Nodes are no longer identified by their position number in a GUC string, but by a unique global name that stays consistent across the whole cluster.

Here are some examples of cluster settings.
1 Coordinator and 2 Datanodes:
CREATE NODE coord1 WITH (HOSTIP = 'localhost', COORDINATOR MASTER, NODEPORT = $COORD1_PORT);
CREATE NODE dn1 WITH (HOSTIP = 'localhost', NODE MASTER, NODEPORT = $DN1_PORT, PREFERRED);
CREATE NODE dn2 WITH (HOSTIP = 'localhost', NODE MASTER, NODEPORT = $DN2_PORT, PRIMARY);

2 Coordinators and 2 Datanodes:
CREATE NODE coord2 WITH (HOSTIP = 'localhost', COORDINATOR MASTER, NODEPORT = $COORD2_PORT);
CREATE NODE coord1 WITH (HOSTIP = 'localhost', COORDINATOR MASTER, NODEPORT = $COORD1_PORT);
CREATE NODE dn2 WITH (HOSTIP = 'localhost', NODE MASTER, NODEPORT = $DN2_PORT, PRIMARY);
CREATE NODE dn1 WITH (HOSTIP = 'localhost', NODE MASTER, NODEPORT = $DN1_PORT, PREFERRED);
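
Since cluster_nodes.sql is just a list of such statements, it can be generated from a short description of the cluster. Here is a minimal sketch in Python, assuming the CREATE NODE syntax shown above. The Node class, the helper names and the port numbers are made up for illustration and are not part of Postgres-XC.

```python
from dataclasses import dataclass


@dataclass
class Node:
    """Hypothetical description of one cluster node (not a Postgres-XC structure)."""
    name: str
    kind: str            # "coordinator" or "datanode"
    host: str
    port: int
    primary: bool = False
    preferred: bool = False


def create_node_sql(node: Node) -> str:
    """Build one CREATE NODE statement using the syntax shown in the examples above."""
    role = "COORDINATOR MASTER" if node.kind == "coordinator" else "NODE MASTER"
    options = [f"HOSTIP = '{node.host}'", role, f"NODEPORT = {node.port}"]
    if node.primary:
        options.append("PRIMARY")
    if node.preferred:
        options.append("PREFERRED")
    return f"CREATE NODE {node.name} WITH ({', '.join(options)});"


def cluster_sql(nodes) -> str:
    """Join the statements of all nodes, one per line."""
    return "\n".join(create_node_sql(n) for n in nodes)


if __name__ == "__main__":
    # Example ports only; use the ones of your own cluster.
    cluster = [
        Node("coord1", "coordinator", "localhost", 5432),
        Node("dn1", "datanode", "localhost", 15451, preferred=True),
        Node("dn2", "datanode", "localhost", 15452, primary=True),
    ]
    print(cluster_sql(cluster))
```

The output can be redirected to a file and used as cluster_nodes.sql. Because nodes are identified by name, the order of the statements does not matter.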

So, what happens in the cluster for 2 Datanodes and 2 Coordinators?
postgres=# select oid,* from pgxc_node;
-[ RECORD 1 ]----+----------
oid | 11133
node_name | coord1
node_type | C
node_related | 0
node_port | 5432
node_host | localhost
nodeis_primary | f
nodeis_preferred | f
-[ RECORD 2 ]----+----------
oid | 11134
node_name | coord2
node_type | C
node_related | 0
node_port | 5452
node_host | localhost
nodeis_primary | f
nodeis_preferred | f
-[ RECORD 3 ]----+----------
oid | 11135
node_name | dn1
node_type | D
node_related | 0
node_port | 15451
node_host | localhost
nodeis_primary | f
nodeis_preferred | t
-[ RECORD 4 ]----+----------
oid | 11136
node_name | dn2
node_type | D
node_related | 0
node_port | 15452
node_host | localhost
nodeis_primary | t
nodeis_preferred | f

Other functionalities now also work with node names, like EXECUTE DIRECT and CLEAN CONNECTION:
postgres=# clean connection to node dn1 for database postgres;
CLEAN CONNECTION
postgres=# execute direct on node dn1 'select oid,* from pgxc_node where node_type = ''D''';
-[ RECORD 1 ]----+----------
oid | 11135
node_name | dn1
node_type | D
node_related | 0
node_port | 15451
node_host | localhost
nodeis_primary | f
nodeis_preferred | t
-[ RECORD 2 ]----+----------
oid | 11136
node_name | dn2
node_type | D
node_related | 0
node_port | 15452
node_host | localhost
nodeis_primary | t
nodeis_preferred | f

pg_regress is a PostgreSQL test module that lets you check whether a PostgreSQL server has been installed correctly.

Until now, the development of Postgres-XC has focused on scalability and performance, without always checking whether the implementation stuck to PostgreSQL standards.
However, for Postgres-XC to be considered a product, it has to pass those regression tests.
This is also the easiest way to check that it respects the SQL rules guaranteed by PostgreSQL, which makes it user-friendly software.

So, why pass the regression tests?

  1. Prove that XC can be stable
  2. Make the implementation of new functionalities more efficient. All the SQL test cases are already in the regression tests, so checking whether an implementation is correct becomes faster and safer. Passing the regression tests also makes the basics of Postgres-XC much stronger.

Well, are those regression tests sufficient?
No, they are only a base that protects the basics of the cluster product when running SQL queries. As a cluster, Postgres-XC also needs tests for:

  1. High-availability (node failure, security)
  2. Performance (write-scalability)
  3. regression tests specific to Postgres-XC (CREATE TABLE has been extended with DISTRIBUTE BY [REPLICATION | HASH(column) | ROUNDROBIN | MODULO(column)])

Let’s talk a little bit more about pg_regress.

All its files are located in src/test/regress.
The most common usage is an installation check, which basically consists in typing the following command in src/test/regress:
make installcheck
This command launches the regression tests against a PostgreSQL server listening on the default port 5432. It ends up invoking pg_regress along these lines:
./pg_regress --inputdir=. --dlpath=. --multibyte=SQL_ASCII --psqldir=/home/ioltas/pgsql/bin --schedule=./serial_schedule

Let’s have a look at what pg_regress is made of. You can find the following folders:

  • data, all the external data, used mainly for COPY
  • input, input data for SQL queries that depend on the environment where the regression tests are launched: COPY, TABLESPACE… Those files have the suffix .source, and are saved in folder sql after generation
  • output, output files whose content is modified depending on the environment where the regression tests are installed
  • expected, all the expected results. Those files have the suffix .out and the same base name as the sql or source files
  • sql, all the files containing the SQL queries to run for the regression tests. They have the same base name as the corresponding expected result files .out.
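
To make the naming convention concrete, here is a small Python sketch that maps a test name to its files and lists the tests in sql that have no matching file in expected. The helper names are mine, and the lookup is simplified: it only knows about the plain name.sql and name.out pairs described above.

```python
from pathlib import Path


def regress_files(regress_dir, test_name):
    """Return the query file and expected-result file for one test (simplified view)."""
    root = Path(regress_dir)
    return {
        "sql": root / "sql" / f"{test_name}.sql",
        "expected": root / "expected" / f"{test_name}.out",
    }


def missing_expected(regress_dir):
    """List test names found in sql/ that have no matching .out file in expected/."""
    root = Path(regress_dir)
    names = sorted(p.stem for p in (root / "sql").glob("*.sql"))
    return [n for n in names if not regress_files(root, n)["expected"].exists()]


if __name__ == "__main__":
    # Assumes the script is run from the top of the source tree.
    print(missing_expected("src/test/regress"))
```

A check like this is handy after adding a new test case, to make sure its expected output has not been forgotten.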

In Postgres-XC, tables are distributed by round robin by default, or by hash if the first column can be distributed, so the order of the rows returned by SELECT queries cannot be controlled.
As the regression tests have to give the same results whatever the cluster configuration (they cannot depend on the number of Coordinators and Datanodes), SELECT queries are sometimes completed with ORDER BY.
For some types for which ORDER BY is of no help, like box or point, the table is created as a replicated one (use of keyword DISTRIBUTE BY REPLICATION at the end of CREATE TABLE).
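
The effect of the number of Datanodes on row order can be illustrated with a toy model in Python. This is only a sketch and not how Postgres-XC is implemented: it assumes that rows are dealt to the Datanodes in turn and that the Coordinator returns the rows of each node one node after the other.

```python
def distribute_round_robin(rows, num_datanodes):
    """Deal rows to the Datanodes in turn, like a round-robin table (toy model)."""
    nodes = [[] for _ in range(num_datanodes)]
    for i, row in enumerate(rows):
        nodes[i % num_datanodes].append(row)
    return nodes


def select_all(nodes):
    """Simplified Coordinator: return the rows of each node, one node after the other."""
    return [row for node in nodes for row in node]


if __name__ == "__main__":
    rows = [3, 1, 4, 2, 5]
    with_two = select_all(distribute_round_robin(rows, 2))
    with_three = select_all(distribute_round_robin(rows, 3))
    print(with_two)      # [3, 4, 5, 1, 2]
    print(with_three)    # [3, 2, 1, 5, 4]
    # Sorting, the equivalent of ORDER BY, gives the same result for both clusters.
    print(sorted(with_two) == sorted(with_three))   # True
```

The same data comes back in a different order with 2 and 3 Datanodes, which is exactly what an expected output file cannot tolerate. Sorting removes the dependency on the cluster layout.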

There are 121 test cases that have to be checked in pg_regress.
Most of them can be fixed by taking the current limitations of Postgres-XC into account (update, delete, case, guc…).
But some of them require more fundamental work (select_having, subselect, returning).
Others currently make the cluster enter a stalled state (errors, constraints).

This is a huge task. But once this is completed,
Postgres-XC will have the base that will make it a great cluster product!

©2010-2013 Michael Paquier. All content is ©Copyright of Otacoo.com 2010-2013.