Continuing with the new features planned for PostgreSQL 9.3, here are some explanations about a new COPY mode called FREEZE. This feature has been introduced by this commit.
commit 8de72b66a2edcf12c812de0a73bd50b6b7d81d62
Author: Simon Riggs
Date: Sat Dec 1 12:54:20 2012 +0000
 
COPY FREEZE and mark committed on fresh tables.
When a relfilenode is created in this subtransaction or
a committed child transaction and it cannot otherwise
be seen by our own process, mark tuples committed ahead
of transaction commit for all COPY commands in same
transaction. If FREEZE specified on COPY
and pre-conditions met then rows will also be frozen.
Both options designed to avoid revisiting rows after commit,
increasing performance of subsequent commands after
data load and upgrade. pg_restore changes later.
 
Simon Riggs, review comments from Heikki Linnakangas, Noah Misch and design
input from Tom Lane, Robert Haas and Kevin Grittner

With an additional tweak for CREATE TABLE.
commit 1f023f929702efc9fd4230267b0f0e8d72ba5067
Author: Simon Riggs
Date: Fri Dec 7 13:26:52 2012 +0000
 
Optimize COPY FREEZE with CREATE TABLE also.
 
Jeff Davis, additional test by me

This feature allows to insert rows already *frozen* during the COPY process, the same thing can be done with a VACUUM FREEZE after doing a normal COPY, but it is always nice to have the possibility to do that during the process, especially if a lot of data is loaded. Such *frozen* rows are already marked as committed at insert, which is good for performance but not that much for visibility, as the rows loaded are viewable from other sessions while being loaded. So this feature is limited to some particular conditions:

  • Table been freshly created or truncated in current subtransaction
  • No older snapshots
  • No open cursors

If one of those conditions is not satisfied, COPY will fail silently and return to a normal process.

Let’s see how this works with a simple set of data like that.
$ cat ~/desktop/data.txt
1,2
3,4
5,6
7,8

In the case of a normal COPY, you would get something like that (replace $HOME by your own local folder).
postgres=# CREATE TABLE aa (a int, b int);
CREATE TABLE
postgres=# COPY aa FROM '$HOME/desktop/data.txt' DELIMITER ',';
COPY 4
postgres=# SELECT xmin,xmax,a,b FROM aa WHERE a = 1;
xmin | xmax | a | b
------+------+---+---
687 | 0 | 1 | 2
(1 row)

Have a look at the xmin value, it has been set to the XID of the transaction used.

A COPY FREEZE needs to be done on a fresh table, so you can use it like this.
postgres=# BEGIN;
BEGIN
postgres=# CREATE TABLE aa (a int, b int);
CREATE TABLE
postgres=# COPY aa FROM '$HOME/desktop/data.txt' DELIMITER ',' FREEZE;
COPY 4
postgres=# COMMIT;
COMMIT
postgres=# SELECT xmin,xmax,a,b FROM aa WHERE a = 1;
xmin | xmax | a | b
------+------+---+---
2 | 0 | 1 | 2
(1 row)

xmin has been aggressively set to 2.

Note that this also works with TRUNCATE.
postgres=# BEGIN;
BEGIN
postgres=# TRUNCATE aa;
TRUNCATE TABLE
postgres=# COPY aa FROM '/home/ioltas/desktop/data.txt' DELIMITER ',' FREEZE;
COPY 4
postgres=# COMMIT;
COMMIT
postgres=# SELECT xmin,xmax,a,b FROM aa WHERE a = 1;
xmin | xmax | a | b
------+------+---+---
2 | 0 | 1 | 2
(1 row)

Let’s see if the conditions to perform the FREEZE are not met:
postgres=# TRUNCATE aa;
TRUNCATE TABLE
postgres=# BEGIN;
BEGIN
postgres=# COPY aa FROM '/home/ioltas/desktop/data.txt' DELIMITER ',' FREEZE;
COPY 4
postgres=# COMMIT;
COMMIT
postgres=# SELECT xmin,xmax,a,b FROM aa WHERE a = 1;
xmin | xmax | a | b
------+------+---+---
692 | 0 | 1 | 2
(1 row)

Note once again that xmin is set normally.

And it looks to be all for this new feature, have fun with it.

A new feature ideal for module makers is appearing in Postgres 9.3. Called “background worker processes”, this feature, which is a set of useful APIs, offers the possibility to create and customize worker processes called bgworkers able to run user-specified code directly plugged in the server. This worker is loaded and managed entirely by the server. So such processes can be considered as envelopped in a wrapper on top of the core code as a plug-in.

This feature has been introduced by this commit.
commit da07a1e856511dca59cbb1357616e26baa64428e
Author: Alvaro Herrera
Date: Thu Dec 6 14:57:52 2012 -0300
 
Background worker processes
 
Background workers are postmaster subprocesses that run arbitrary
user-specified code. They can request shared memory access as well as
backend database connections; or they can just use plain libpq frontend
database connections.
 
Modules listed in shared_preload_libraries can register background
workers in their _PG_init() function; this is early enough that it's not
necessary to provide an extra GUC option, because the necessary extra
resources can be allocated early on. Modules can install more than one
bgworker, if necessary.
 
Care is taken that these extra processes do not interfere with other
postmaster tasks: only one such process is started on each ServerLoop
iteration. This means a large number of them could be waiting to be
started up and postmaster is still able to quickly service external
connection requests. Also, shutdown sequence should not be impacted by
a worker process that's reasonably well behaved (i.e. promptly responds
to termination signals.)
 
The current implementation lets worker processes specify their start
time, i.e. at what point in the server startup process they are to be
started: right after postmaster start (in which case they mustn't ask
for shared memory access), when consistent state has been reached
(useful during recovery in a HOT standby server), or when recovery has
terminated (i.e. when normal backends are allowed).
 
In case of a bgworker crash, actions to take depend on registration
data: if shared memory was requested, then all other connections are
taken down (as well as other bgworkers), just like it were a regular
backend crashing. The bgworker itself is restarted, too, within a
configurable timeframe (which can be configured to be never).
 
More features to add to this framework can be imagined without much
effort, and have been discussed, but this seems good enough as a useful
unit already.
 
An elementary sample module is supplied.

You can imagine many scenarios where your own customized workers would be useful, especially for maintainance tasks, and here are some examples:

  • Disconnect automatically idle connections on the server, with a combo of the type pg_stat_activity/pg_terminate_backend
  • Create customized statistic information
  • Save information related to the database
  • Monitor table indexes and reindex things that have been failing
  • Log extra information regarding the activity of the database

There are of course many other use cases possible…

By the way, in order to show how this feature works, I wrote a really simple example of a customized worker called count_relations that counts every second the number of relations in the database server and that outputs the result in the server logs.

The code written uses as a base the example in contrib/worker_spi/ of Postgres tarball in a really simplified way. It is available for download here.

After playing with this code, I also wanted to share my experience, so here are a couple of points you should take care of when writing your own customized worker.

Initialize and register the worker correctly

Postgres core uses _PG_init as an entry point to register the customized worker, so be sure to initialize your worker(s) in a way close to that:
void
_PG_init(void)
{
 BackgroundWorker worker;
 /* register the worker processes */
 worker.bgw_flags = BGWORKER_SHMEM_ACCESS |
  BGWORKER_BACKEND_DATABASE_CONNECTION;
 worker.bgw_start_time = BgWorkerStart_RecoveryFinished;
 worker.bgw_main = worker_spi_main;
 worker.bgw_sighup = worker_spi_sighup;
 worker.bgw_sigterm = worker_spi_sigterm;
 worker.bgw_name = "count relations";
 worker.bgw_restart_time = BGW_NEVER_RESTART;
 worker.bgw_main_arg = NULL;
 RegisterBackgroundWorker(&worker);
}

Respect the transaction flow

Inside a worker, queries are run through the SPI, so be sure to respect a transaction flow similar to that:
StartTransactionCommand(); // start transaction
SPI_connect(); // start the SPI
PushActiveSnapshot(GetTransactionSnapshot());
[ ... ] // build query
SPI_execute(query, true/false, 0); //true for read-only, false for read-write
[ ... ] // result treatment
SPI_finish(); // finish SPI
PopActiveSnapshot(); // throw snapshot
CommitTransactionCommand(); // commit transaction

Make a simple makefile

The makefile does not need to be that complicated, something like that is sufficient:
MODULES = count_relations
PG_CONFIG = pg_config
PGXS := $(shell $(PG_CONFIG) --pgxs)
include $(PGXS)

In order to install this module correctly, be sure that LD_LIBRARY_PATH is points to the folder where the libraries of postgres are installed, then only do a “make install” and you are done.

Set up server and load your library

Before starting your server, you need to set up shared_preload_libraries in postgresql.conf to the name of your customized libraries to have them uploaded at start-up. In the case of count_relations, it consists in adding that:
shared_preload_libraries = 'count_relations'

Check worker start

All the workers are called “bgworker: $NAME”, depending on the name you chose for your module.
In the case of count_relations:
ps x | grep bgworker
25146 ? Ss 0:00 postgres: bgworker: count relations

Then, for count_relations, you can also have a look at the logs of the server and you will see lines of that type, proving that the bgworker is working correctly.
LOG: Currently 292 relations in database

Rely on the worker_spi example

When writing a new worker, try not to write it from scratch but use as a base the code of worker_spi. This code already implements some methods that can be used generically like signal handling, latch management and database connection. So use it and abuse of it!

As a last word, it is important to understand that the example of bgworker made for this post is really basic, and touches only a tiny portion of what is available in the feature APIs, so be sure to have a look at the documentation for further details.

Postgres-XC, read/write-scalable multi-master symmetric cluster based on PostgreSQL, version 1.0.1 has been released today.

This minor release is based on the latest PostgreSQL 9.1.5+alpha, meaning that all the patches in PostgreSQL 9.1 stable branch have been merged up to commit d10ddf4 (3rd of September 2012).

You can download the source tarball directly from here.
Like PostgreSQL, this tarball contains all the HTML and man documentation.

The documentation of 1.0, including release notes, is available here.

Around 20 bugs have been fixed since 1.0.0, with in particular those fixes:

  • Applications like pgadmin had problems to connect to Postgres-XC server.
  • Drop of sequence was not managed correctly when its database was dropped

You can find all the details in the release notes here.

The project can be followed on Source Forge:
And a couple of GIT repositories are used for development:

  • SourceForge
  • Github
  • Twitter: @PostgresXCBot, bot giving tweets about the commits in Postgres-XC GIT repository

The project members are currently working hard on the next version of Postgres-XC that will include those features:

  • triggers (being implemented)
  • Merge with PostgreSQL 9.2 code (already committed)
  • RETURNING, WHERE CURRENT OF (being implemented)
  • Insure consistency of utilities that cannot run inside transaction block (ex: CREATE DATABASE safely insured in multiple nodes, being implemented)
  • Change table distribution type with ALTER TABLE (already committed)
  • Support for cursors (already committed)
  • Stuff related to node addition and deletion
  • and other things…

The project is under the same license as PostgreSQL, and is managed under a single entity called “Postgres-XC Development Group”.

Have fun with this stable release.

For the last couple of days I have been working on merging the code of Postgres-XC from 9.1 to 9.2. The release 1.0 of XC has been based on 9.1, but it is time to move forward and steal the latest PostgreSQL amazing features :) . The plan was to plug in the code of XC up to the intersection of PostgreSQL master branch and 9.2 stable branch, pretty interesting for two things:

  • Possibility to create stable branches of Postgres-XC based on the 9.2 stable branch of Postgres
  • Preserve the code for merges with future PostgreSQL releases

So the code stays in sync with the latest stable branch of Postgres and the master branch.

However, after this small regression… It has been honestly an interesting experience, allowing me to have a look at the latest evolutions inside PostgreSQL itself between 9.1 and 9.2, and I picked up a couple of items that changed since last year. This list is of course not a complete one, as it results from all the merge conflicts I have seen between Postgres-XC code and PostgreSQL code, so I might not have seen everything on board. More a memo than anything else, perhaps this will help some hackers when upgrading their own code and features not in core.

  • Separation of PGPROC into PGXACT and PGPROC. The new structure PGXACT contains information related to vacuum and snapshot. This really improves the performance for multi-core.
  • Removal of RecentGlobalXmin from snapshot data
  • Addition of clause IF EXISTS in several DDL. You should avoid to use missing_ok in RangeVarGetRelid to false in order to get correct error messages
  • Check if a table column has a dependency with a rule and do not drop it when it is the case
  • Return messages presenting details of tuple data back to client for tuples that failed to satisfy a constraint
  • Removal of inner_plan and outer_plan from deparse_namespace
  • Creation of a new utility command for CREATE TABLE AS, called CreateTableAsStmt. Before CTAS was included in Query with intoClause, this clause has been removed and now both SELECT INTO and CREATE TABLE AS are transformed into this utility at query analysis. Honestly this is cleaner like this
  • Management of cached plans largely modified. In short, the way plans cached have been changed in plancache.c has been really refactored
  • Management of tuple sorting in tuplesort.c. SortSupport is now used instead of ScanKeys
  • Use of PlannerInfo instead of PlannerGlobal when setting references in a plan
  • Setup of parameters in a planner path (see pathnode.c)
  • Modifications for RangeVarGetRelid functions in namespace.h. The calling protocol is extended with no-wait and callback options
  • New functionalities in var.c to pull out Var clauses for a given varno. Some of the APIs were already implemented in XC, but better to use the stuff from core
  • DROP of TRIGGER and RULES is now groupped with DropStmt, before it was under the banner DropPropertyStmt

Of course this list is far from showing all the things done in Postgres for 1 year… This is just the top of the iceberg.

Postgres-XC, read&write-scalable multi-master symmetric cluster based on PostgreSQL, version 1.0.0 is released.
This project is seen as an open-source alternative to costly products such as OracleRAC. Postgres-XC is based on the code of PostgreSQL, so it can naturally use all its technologies, which are enhaunced to have a shared-nothing multi-master PostgreSQL-based database cluster.

This first stable version is based on PostgreSQL 9.1.4. All the patches in PostgreSQL 9.1 stable branch have been merged up to commit 873d1c1 (1st of June 2012).
This includes the security fix related to pg_crypto dated of 30th of May.
You can download the source tarball directly from here
This tarball contains all the HTML and man documentation.

30 bug fixes have been made since release of beta2, with some notable enhancements:

  • Support for EXTENSION is fixed
  • Stabilization of the use of slave nodes in cluster
  • Fix of a bug related to read-only transactions, improving performance by 15%
  • Support of compilation for MacOSX

About the scalability of this release, Postgres-XC 1.0.0 scales to a factor of 3 when compared to a standalone server PostgreSQL 9.1.3 on 5 nodes using a benchmark called DBT-1.

Compared to version Postgres-XC 0.9.7, the following features have been added:

  • Fast query shipping (FQS), quick identification of expressions in a query that can be pushed down to remote nodes
  • SERIAL types
  • TABLESPACE
  • Utility to clean up 2PC transactions in cluster (pgxc_clean)
  • Utility for initialization of GTM (global transaction manager, utility called initgtm)
  • Relation-size functions and locking functions
  • Regression stabilization

The documentation of 1.0, including release notes, is available here.

The project can be followed on SourceForge.
And a couple of GIT repositories are used for development:

The core team is currently working in the addition of new features for the next major release including:

  • Merge with PostgreSQL 9.2
  • Data redistribution functionality, changing table distribution in cluster with a simple ALTER TABLE
  • New functionalities related to online node addition and deletion for a better user experience
  • Triggers
  • Planner improvements
  • Global constraints

The roadmap of the project is located here in section Roadmap.

The project is under the same license as PostgreSQL, now managed under a single entity called “Postgres-XC Development Group”.
In order to keep in touch with the project, whose development follows the same model as PostgreSQL, you can register to the following mailing lists:

  • postgres-xc-general@lists.sourceforge.net, for general questions. Registration can be done here
  • postgres-xc-developers@lists.sourceforge.net. hachers mailing list. Registration can be done here
©2010-2013 Michael Paquier All content is ©Copyright of Otacoo.com 2010-2013. Privacy Policy - Terms of Use