PostgreSQL is already pretty useful for application developers when returning to client error messages by providing a certain level of details with multiple distinct fields like the position of the code where the error occurred. However this was lacking with the database object names, forcing the client application to deparse the error string returned by server, generally with field ‘M’, to get more details about the objects that have been involved in the errors. This functionality has been added in PostgreSQL 9.3 thanks to this commit.
commit 991f3e5ab3f8196d18d5b313c81a5f744f3baaea
Author: Tom Lane
Date: Tue Jan 29 17:06:26 2013 -0500
 
Provide database object names as separate fields in error messages.
 
This patch addresses the problem that applications currently have to
extract object names from possibly-localized textual error messages,
if they want to know for example which index caused a UNIQUE_VIOLATION
failure. It adds new error message fields to the wire protocol, which
can carry the name of a table, table column, data type, or constraint
associated with the error. (Since the protocol spec has always instructed
clients to ignore unrecognized field types, this should not create any
compatibility problem.)
 
Support for providing these new fields has been added to just a limited set
of error reports (mainly, those in the "integrity constraint violation"
SQLSTATE class), but we will doubtless add them to more calls in future.
 
Pavel Stehule, reviewed and extensively revised by Peter Geoghegan, with
additional hacking by Tom Lane.

Thanks to this feature, it is possible to obtain more detailed information about the objects involved in an error by providing additional fields. This has as advantage to avoid having to deparse an error message string that could change, even slightly, between major releases, and to provide a centralized way to report errors. There are five new additional error fields introduced with this commit:

  • ‘s’, schema name
  • ‘t’, table name
  • ‘c’, column name
  • ‘d’, datatype name
  • ‘n’, constraint name

Note that those fields are used only for certain error codes involving those specific objects. You can have a look to the error code appendix for more details.

In order to be able to view the new fields, be sure to set up the error report verbosity to ‘verbose’. With a psql client, you simply need to use this command:
\set VERBOSITY verbose
This is particularly useful as default value in a development environment, so also feel free to set it in ~/.psqlrc if necessary.

So, let’s now have a look at this feature with an extremely simple table using a primary key.
postgres=# CREATE TABLE aa (a int PRIMARY KEY);
CREATE TABLE
postgres=# INSERT INTO aa VALUES (1);
INSERT 0 1

Prior to 9.3, here is what you would get as error message when a primary key constraint is violated (9.2 stable branch code, 9.2.4+alpha):
postgres=# INSERT INTO aa VALUES (1);
ERROR: 23505: duplicate key value violates unique constraint "aa_pkey"
DETAIL: Key (a)=(1) already exists.
LOCATION: _bt_check_unique, nbtinsert.c:396

And Here is what you get with 9.3 (beta1):
postgres=# INSERT INTO aa VALUES (1);
ERROR: 23505: duplicate key value violates unique constraint "aa_pkey"
DETAIL: Key (a)=(1) already exists.
SCHEMA NAME: public
TABLE NAME: aa
CONSTRAINT NAME: aa_pkey
LOCATION: _bt_check_unique, nbtinsert.c:398

Note the presence of the fields ‘SCHEMA NAME’, ‘TABLE NAME’ and ‘CONSTRAINT NAME’ here.

Having such an additional output for psql is always useful in order to catch quickly the object causing the errors, but honestly this is far more useful for backend applications using an interface like libpq that manipulate error fields directly as it removes the necessity to apply some magic on the driver or application-side to deparse manually a given error message.

IF EXISTS and IF NOT EXISTS are clauses allowing to return a notice message instead of an error if a DDL query running on a given object already exists or not depending on the DDL action done. If a given query tries to create an object when IF NOT EXISTS is specified, a notice message is returned to client if the object has already been created and nothing is done on server side. If the object is altered or dropped when IF EXISTS is used, a notice message is returned back to client if the object does not exist and nothing is done.

Here is what simply happens when a table that exists is created:
postgres=# CREATE TABLE IF NOT EXISTS aa (a int);
CREATE TABLE
postgres=# CREATE TABLE IF NOT EXISTS aa (a int);
NOTICE: relation "aa" already exists, skipping
CREATE TABLE

Similarly, when dropping this table based on its existence.
postgres=# DROP TABLE IF EXISTS aa;
DROP TABLE
postgres=# DROP TABLE IF EXISTS aa;
NOTICE: table "aa" does not exist, skipping
DROP TABLE

Prior to 9.3, PostgreSQL already proposed this feature with many objects: tables, index, functions, triggers, language, etc. Such SQL extensions are useful when running several times the same script several times and avoiding errors on environments already installed.

9.3 introduces some new flavors of IF [NOT] EXISTS completing a bit more the set of objects already supported.

  • CREATE SCHEMA [IF NOT EXISTS]
  • ALTER TYPE ADD VALUE [IF NOT EXISTS]
  • Extension of DROP TABLE IF EXISTS such as it succeeds if the specified schema does not exists

Note also that the new materialized views are also supported with IF [NOT] EXISTS for CREATE, ALTER and DROP.

The extension of CREATE SCHEMA with IF NOT EXISTS is pretty simple. Similarly to the other objects, command succeeds if the schema already exists and a notice message about the existence of schema is sent back to client.
postgres=# CREATE SCHEMA foo;
ERROR: schema "foo" already exists
postgres=# CREATE SCHEMA IF NOT EXISTS foo;
NOTICE: schema "foo" already exists, skipping
CREATE SCHEMA

Note that subsequent schema elements cannot be used with this option.
postgres=# CREATE SCHEMA IF NOT EXISTS foo CREATE TABLE aa (a int);
ERROR: CREATE SCHEMA IF NOT EXISTS cannot include schema elements
LINE 1: CREATE SCHEMA IF NOT EXISTS foo CREATE TABLE aa (a int);

The second addition, ALTER TYPE ADD VALUE [IF NOT EXISTS] is useful in the case of enumeration types to condition the addition of new values.
postgres=# CREATE TYPE character_type AS ENUM ('warrior', 'priest', 'sorcerer');
CREATE TYPE
postgres=# ALTER TYPE character_type ADD VALUE IF NOT EXISTS 'magician';
ALTER TYPE
postgres=# ALTER TYPE character_type ADD VALUE IF NOT EXISTS 'magician';
NOTICE: enum label "magician" already exists, skipping
ALTER TYPE

The last improvement is also a nice thing to have. Here is what you could obtain prior to 9.3 when trying to use DROP TABLE IF EXISTS on a table using a schema that did not exist.
postgres=# DROP TABLE IF EXISTS foosch.foo;
ERROR: schema "foosch" does not exist

And here is what you get now:
postgres=# DROP TABLE IF EXISTS foosch.foo;
NOTICE: table "foo" does not exist, skipping
DROP TABLE

Those are definitely nice additions, especially the new extension of IF NOT EXISTS on schemas which was really missing in the existing set.

Before launching make for a raw build, Postgres does some preprocessing with configure to setup the installation based on the environment and the different options given by user. If you don’t do that before trying a make, your build will fail with a critical hit of this type:
$ make
You need to run the 'configure' program first. See the file
'INSTALL' for installation instructions.
make: *** [all] Error 1

It is important to get familiar with this way of doing before actually working or developing Postgres, so be sure to go through the notes of this post to learn one thing or two, or even complete those notes with comments at the bottom.

This process is launched with ./configure, located at the root of the code path (it can even be called outside of code folder in the case of a vpath installation) and generated directly from configure.in using autoconf. A couple of scripts located in config/ containing additional methods not in autoconf are also used when generating ./configure from ./configure.in for many things:

  • Check presence of necessary library dependencies, some of them being optional (documentation) and others mandatory as core process need them (flex, bison)
  • Check availability of some languages (perl.m4, python.m4)
  • Test C-related functionalities (c-compiler.m4, c-library.m4)
  • etc.

Note that those files are included through ./aclocal.m4 like that:
m4_include([config/c-compiler.m4])
./aclocal.m4 is included itself automatically by autoconf as default.

If you are a PostgreSQL developer, you might at some point create a fork of Postgres for a private project. In this case, you should definitely modify your project to be a maximum consistent with Postgres itself in order to facilitate merges with future versions. The preprocessing Postgres uses has few chances to change but it usually includes fixes that may be platform dependent, so be sure to fetch the fixes when they are here. Here are also some advices about how you should manage a fork regarding configure:

  • Modify in priority configure.in and not configure. Generate configure based on your modifications of configure.in.
  • When extra preprocessing is needed, add your own m4 procedures in config/ in separate files, except if they overlap with existing checks, so as if a fix happens on Postgres itself you will be able to at least detect easily if there is a conflict with what your own stuff.
  • Declare additional m4 files in ./aclocal.m4
  • Similarly, do not forget to update Makefile.global.in when adding some new variables, and use them!

Here is a short example of how you could add your own .in file with some code dedicated to configure.in understandable by autoconf.
#Some initialization
AC_INIT([mypgfork], [1.0devel], [joe@example.com])
AC_CONFIG_AUX_DIR(config)
 
# Setup a default prefix
AC_PREFIX_DEFAULT(/usr/local/psql)
 
# Addition of my own customized file
AC_CONFIG_FILES([foo.cfg.in])
 
# Option to enforce optimize to a wanted flag
PGAC_ARG_BOOL(enable, optimize, no,
  [build with optimize symbol (-O2)])
 
# supply -g if --enable-super-debug
if test "$enable_optimize" = yes; then
 CFLAGS="-O2"
fi
 
# Substitute CFLAGS value in .in files
AC_SUBST(CFLAGS)
 
# Generate the output to files
AC_OUTPUT

In this case a file called foo.cfg *is* generated… Feel free to play with this piece of code btw.

For a vanilla Postgres build, two files that are used for code make and installation are generated: src/Makefile.global.in (containing all the variables that are environment-dependent as well as all the configuration parameters that user has set when launching ./configure) and GNUMakefile.in.

When generating Postgres from raw code, there are some options in ./configure you should absolutely know about if you are a developer:

  • –prefix to define the folder where library and binaries will be installed (bin, share, lib, include).
  • –enable-cassert, to enable assertions inside a build (avoid that for a production build, this option is good only or development).
  • –enable-debug, making the code to compile with -g in CFLAGS, the default being -O2 for production builds.
  • –enable-depend, to enable automatic dependency tracking, useful to recompile all the objects affected by a header modification.

There are also some options that you might need if you develop an application based on Postgres:

  • –with-perl, to enable PL/Perl on server side
  • –with-python, to enable PL/Python on server side
  • –with-blocksize, defining the default block size used by table pages, default begin 8kB. Note that using binaries compiled with a given block size is not compatible with a existing server that has been initialized with a different block size.

Do not forget to have a look at all the options available here, configure might have many things that remained hidden to you until now.

psql is the native client of PostgreSQL widely used by application developers and database administrators on a daily-basis for common operations when interacting with a PostgreSQL server. With a full set of integrated functionalities, it is among the most popular (if not the number one) client applications in the Postgres community. If you are a Postgres nerd (highly possible if you are reading this page), you know that the feeling of discovering a new functionality in psql is close to the excitement you can have when opening a christmas present as you know that such a feature went though the strict community review process. The upcoming release 9.3 of Postgres is not an exception and brings a new useful command called \watch. This feature has been introduced by this commit:
commit c6a3fce7dd4dae6e1a005e5b09cdd7c1d7f9c4f4
Author: Tom Lane
Date: Thu Apr 4 19:56:33 2013 -0400
 
Add \watch [SEC] command to psql.
 
This allows convenient re-execution of commands.
 
Will Leinweber, reviewed by Peter Eisentraut, Daniel Farina, and Tom Lane

Close to what \g can do when replaying the last query stored in buffer, \watch offers the possibility to replay the same query with a given interval of time as follows:
\watch [ seconds ]

Compared to a wrapper on psql that would run repetitively the same query, \watch does not need to acquire a new connection each time the query is executed, saving some execution overhead. Also, the watch automatically stops if a failure occurs for the query.

Note that \watch can only be used at the end of the query you want to run.
postgres=# \watch 2 "SELECT 1"
\watch cannot be used with an empty query

This error does not appear if a query is stored in buffer.

If no query is specified it will use the latest query in buffer.
postgres=# \watch 10
Watch every 10s Thu May 2 13:15:14 2013
 
?column?
----------
1
(1 row)

For example, simply check if your server is alive using psql:
postgres=# select 1; \watch 1
?column?
----------
1
(1 row)
Watch every 1s Thu May 2 13:06:53 2013
 
?column?
----------
1
(1 row)
 
server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
The connection to the server was lost. Attempting reset: Succeeded.

Oops, the connection has been closed…

Here is an other example: check every minute the latest query that ran on server.
postgres=# SELECT datname, query, usename FROM pg_stat_activity ORDER BY query_start DESC LIMIT 1; \watch 60
datname | query | usename
----------+-----------------------------------------------------------------------------------------+----------
postgres | SELECT datname, query, usename FROM pg_stat_activity ORDER BY query_start DESC LIMIT 1; | postgres
(1 row)
 
Watch every 60s Thu May 2 13:40:49 2013
 
datname | query | usename
----------+-----------------------------------------------------------------------------------------+----------
postgres | SELECT datname, query, usename FROM pg_stat_activity ORDER BY query_start DESC LIMIT 1; | postgres
(1 row)

A last example: kill periodically backends (here every 60s) that have not been activated for a given period of time (here 30s).
postgres=# SELECT pg_terminate_backend(pid) as status FROM pg_stat_activity
postgres=# WHERE now() - state_change > interval '30 s' AND
postgres=# pid != pg_backend_pid();
status
--------
(0 rows)
 
postgres=# \watch 60
Watch every 60s Thu May 2 13:51:04 2013
 
status
--------
(0 rows)

In short, a lot of things are doable with \watch, you can automatize for example actions easily with a psql client, like the refresh of a materialized view. At least I won’t need anymore to type 50 times the same query when developping an application using Postgres or creating a new feature.

Among one of the many new features implemented in 9.3, pg_dump now offers the possibility to perform parallel dumps. This feature has been introduced by the commit below.
commit 9e257a181cc1dc5e19eb5d770ce09cc98f470f5f
Author: Andrew Dunstan
Date: Sun Mar 24 11:27:20 2013 -0400
 
Add parallel pg_dump option.
 
New infrastructure is added which creates a set number of workers
(threads on Windows, forked processes on Unix). Jobs are then
handed out to these workers by the master process as needed.
pg_restore is adjusted to use this new infrastructure in place of the
old setup which created a new worker for each step on the fly. Parallel
dumps acquire a snapshot clone in order to stay consistent, if
available.
 
The parallel option is selected by the -j / --jobs command line
parameter of pg_dump.
 
Joachim Wieland, lightly editorialized by Andrew Dunstan

This is an extremely nice improvement of pg_dump as it allows accelerating the speed a dump is taken, particularly for machines having multiple cores as the load can be shared among separate threads.

Note that this option only works with the format called directoyy that can be specified with option -Fd or –format=directory, which outputs the database dump as a directory-format archive. A new option -j/–jobs can also be used to define the number of jobs that will run in parallel when performing the dump.

When using parallel pg_dump, it is important to remember that n+1 connections are opened to the server, n being the number of jobs defined, with an extra master connection to control the shared locks taken on the objects dumped. So be sure that max_connections is set up to a number high enough in accordance to the number of jobs that are planned.

Thanks to synchronized snapshots shared among the backends managed by the jobs, the dump is taken consistently ensuring that all the jobs share the same data view. However, as synchronized snapshots are only available since PostgreSQL 9.2, you need to be sure that no external sessions are doing any DML or DDL when performing a dump on servers whose version is lower than 9.2. It is also necessary to specify the option –no-synchronized-snapshots in this case.

Now, using a server having 16 cores, let’s check how this feature performs. For this test, the schema of the database dumped is extremely simple: 16 tables with a constant size of approximately 200MB each (5000000 rows with a single int4 column), for a database having a total size of 3.2GB. Tests are conducted with 1, 2, 4, 8 and 16 jobs, so in the case of 16 jobs one table would be dumped by a unique job running on a single connection. This is of course an unrealistic schema for a production database, but here the point is to give an idea of how this feature can speed up a dump in an ideal case. 5 successive runs are done for each case.

Each test case has been run with the following command:
time pg_dump -Fd -f $DUMP_DIRECTORY -j $NUM_JOBS $DATABASE_NAME

Jobs – Runs(s) 1 2 3 4 5 Avg
1 56.714 54.385 54.242 59.300 57.705 56.47
2 27.023 26.207 27.211 26.112 25.206 26.35
4 12.641 12.797 12.484 12.604 12.486 12.60
8 7.641 7.013 7.913 7.081 6.702 6.27
16 5.086 5.045 5.079 5.216 5.054 5.10

As expected, dump time is halved each time job number is doubled with this ideal database schema. However, due to some I/O disk bottleneck, the time gain is not that important with a high number of jobs. For example, in those series of tests, there is not much difference between 8 and 16 jobs, so be always aware of the I/O your dump disk can manage at most and choose carefully the number of jobs used for dumps based on that.

©2010-2013 Michael Paquier All content is ©Copyright of Otacoo.com 2010-2013. Privacy Policy - Terms of Use