IF EXISTS and IF NOT EXISTS are clauses allowing to return a notice message instead of an error if a DDL query running on a given object already exists or not depending on the DDL action done. If a given query tries to create an object when IF NOT EXISTS is specified, a notice message is returned to client if the object has already been created and nothing is done on server side. If the object is altered or dropped when IF EXISTS is used, a notice message is returned back to client if the object does not exist and nothing is done.

Here is what simply happens when a table that exists is created:
postgres=# CREATE TABLE IF NOT EXISTS aa (a int);
CREATE TABLE
postgres=# CREATE TABLE IF NOT EXISTS aa (a int);
NOTICE: relation "aa" already exists, skipping
CREATE TABLE

Similarly, when dropping this table based on its existence.
postgres=# DROP TABLE IF EXISTS aa;
DROP TABLE
postgres=# DROP TABLE IF EXISTS aa;
NOTICE: table "aa" does not exist, skipping
DROP TABLE

Prior to 9.3, PostgreSQL already proposed this feature with many objects: tables, index, functions, triggers, language, etc. Such SQL extensions are useful when running several times the same script several times and avoiding errors on environments already installed.

9.3 introduces some new flavors of IF [NOT] EXISTS completing a bit more the set of objects already supported.

  • CREATE SCHEMA [IF NOT EXISTS]
  • ALTER TYPE ADD VALUE [IF NOT EXISTS]
  • Extension of DROP TABLE IF NOT EXISTS such as it succeeds if the specified schema does not exists

Note also that the new materialized views are also supported with IF [NOT] EXISTS for CREATE, ALTER and DROP.

The extension of CREATE SCHEMA with IF NOT EXISTS is pretty simple. Similarly to the other objects, command succeeds if the schema already exists and a notice message about the existence of schema is sent back to client.
postgres=# CREATE SCHEMA foo;
ERROR: schema "foo" already exists
postgres=# CREATE SCHEMA IF NOT EXISTS foo;
NOTICE: schema "foo" already exists, skipping
CREATE SCHEMA

Note that subsequent schema elements cannot be used with this option.
postgres=# CREATE SCHEMA IF NOT EXISTS foo CREATE TABLE aa (a int);
ERROR: CREATE SCHEMA IF NOT EXISTS cannot include schema elements
LINE 1: CREATE SCHEMA IF NOT EXISTS foo CREATE TABLE aa (a int);

The second addition, ALTER TYPE ADD VALUE [IF NOT EXISTS] is useful in the case of enumeration types to condition the addition of new values.
postgres=# CREATE TYPE character_type AS ENUM ('warrior', 'priest', 'sorcerer');
CREATE TYPE
postgres=# ALTER TYPE character_type ADD VALUE IF NOT EXISTS 'magician';
ALTER TYPE
postgres=# ALTER TYPE character_type ADD VALUE IF NOT EXISTS 'magician';
NOTICE: enum label "magician" already exists, skipping
ALTER TYPE

The last improvement is also a nice thing to have. Here is what you could obtain prior to 9.3 when trying to use DROP TABLE IF EXISTS on a table using a schema that did not exist.
postgres=# DROP TABLE IF EXISTS foosch.foo;
ERROR: schema "foosch" does not exist

And here is what you get now:
postgres=# DROP TABLE IF EXISTS foosch.foo;
NOTICE: table "foo" does not exist, skipping
DROP TABLE

Those are definitely nice additions, especially the new extension of IF NOT EXISTS on schemas which was really missing in the existing set.

Before launching make for a raw build, Postgres does some preprocessing with configure to setup the installation based on the environment and the different options given by user. If you don’t do that before trying a make, your build will fail with a critical hit of this type:
$ make
You need to run the 'configure' program first. See the file
'INSTALL' for installation instructions.
make: *** [all] Error 1

It is important to get familiar with this way of doing before actually working or developing Postgres, so be sure to go through the notes of this post to learn one thing or two, or even complete those notes with comments at the bottom.

This process is launched with ./configure, located at the root of the code path (it can even be called outside of code folder in the case of a vpath installation) and generated directly from configure.in using autoconf. A couple of scripts located in config/ containing additional methods not in autoconf are also used when generating ./configure from ./configure.in for many things:

  • Check presence of necessary library dependencies, some of them being optional (documentation) and others mandatory as core process need them (flex, bison)
  • Check availability of some languages (perl.m4, python.m4)
  • Test C-related functionalities (c-compiler.m4, c-library.m4)
  • etc.

Note that those files are included through ./aclocal.m4 like that:
m4_include([config/c-compiler.m4])
./aclocal.m4 is included itself automatically by autoconf as default.

If you are a PostgreSQL developer, you might at some point create a fork of Postgres for a private project. In this case, you should definitely modify your project to be a maximum consistent with Postgres itself in order to facilitate merges with future versions. The preprocessing Postgres uses has few chances to change but it usually includes fixes that may be platform dependent, so be sure to fetch the fixes when they are here. Here are also some advices about how you should manage a fork regarding configure:

  • Modify in priority configure.in and not configure. Generate configure based on your modifications of configure.in.
  • When extra preprocessing is needed, add your own m4 procedures in config/ in separate files, except if they overlap with existing checks, so as if a fix happens on Postgres itself you will be able to at least detect easily if there is a conflict with what your own stuff.
  • Declare additional m4 files in ./aclocal.m4
  • Similarly, do not forget to update Makefile.global.in when adding some new variables, and use them!

Here is a short example of how you could add your own .in file with some code dedicated to configure.in understandable by autoconf.
#Some initialization
AC_INIT([mypgfork], [1.0devel], [joe@example.com])
AC_CONFIG_AUX_DIR(config)
 
# Setup a default prefix
AC_PREFIX_DEFAULT(/usr/local/psql)
 
# Addition of my own customized file
AC_CONFIG_FILES([foo.cfg.in])
 
# Option to enforce optimize to a wanted flag
PGAC_ARG_BOOL(enable, optimize, no,
  [build with optimize symbol (-O2)])
 
# supply -g if --enable-super-debug
if test "$enable_optimize" = yes; then
 CFLAGS="-O2"
fi
 
# Substitute CFLAGS value in .in files
AC_SUBST(CFLAGS)
 
# Generate the output to files
AC_OUTPUT

In this case a file called foo.cfg *is* generated… Feel free to play with this piece of code btw.

For a vanilla Postgres build, two files that are used for code make and installation are generated: src/Makefile.global.in (containing all the variables that are environment-dependent as well as all the configuration parameters that user has set when launching ./configure) and GNUMakefile.in.

When generating Postgres from raw code, there are some options in ./configure you should absolutely know about if you are a developer:

  • –prefix to define the folder where library and binaries will be installed (bin, share, lib, include).
  • –enable-cassert, to enable assertions inside a build (avoid that for a production build, this option is good only or development).
  • –enable-debug, making the code to compile with -g in CFLAGS, the default being -O2 for production builds.
  • –enable-depend, to enable automatic dependency tracking, useful to recompile all the objects affected by a header modification.

There are also some options that you might need if you develop an application based on Postgres:

  • –with-perl, to enable PL/Perl on server side
  • –with-python, to enable PL/Python on server side
  • –with-blocksize, defining the default block size used by table pages, default begin 8kB. Note that using binaries compiled with a given block size is not compatible with a existing server that has been initialized with a different block size.

Do not forget to have a look at all the options available here, configure might have many things that remained hidden to you until now.

psql is the native client of PostgreSQL widely used by application developers and database administrators on a daily-basis for common operations when interacting with a PostgreSQL server. With a full set of integrated functionalities, it is among the most popular (if not the number one) client applications in the Postgres community. If you are a Postgres nerd (highly possible if you are reading this page), you know that the feeling of discovering a new functionality in psql is close to the excitement you can have when opening a christmas present as you know that such a feature went though the strict community review process. The upcoming release 9.3 of Postgres is not an exception and brings a new useful command called \watch. This feature has been introduced by this commit:
commit c6a3fce7dd4dae6e1a005e5b09cdd7c1d7f9c4f4
Author: Tom Lane
Date: Thu Apr 4 19:56:33 2013 -0400
 
Add \watch [SEC] command to psql.
 
This allows convenient re-execution of commands.
 
Will Leinweber, reviewed by Peter Eisentraut, Daniel Farina, and Tom Lane

Close to what \g can do when replaying the last query stored in buffer, \watch offers the possibility to replay the same query with a given interval of time as follows:
\watch [ seconds ]

Compared to a wrapper on psql that would run repetitively the same query, \watch does not need to acquire a new connection each time the query is executed, saving some execution overhead. Also, the watch automatically stops if a failure occurs for the query.

Note that \watch can only be used at the end of the query you want to run.
postgres=# \watch 2 "SELECT 1"
\watch cannot be used with an empty query

This error does not appear if a query is stored in buffer.

If no query is specified it will use the latest query in buffer.
postgres=# \watch 10
Watch every 10s Thu May 2 13:15:14 2013
 
?column?
----------
1
(1 row)

For example, simply check if your server is alive using psql:
postgres=# select 1; \watch 1
?column?
----------
1
(1 row)
Watch every 1s Thu May 2 13:06:53 2013
 
?column?
----------
1
(1 row)
 
server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
The connection to the server was lost. Attempting reset: Succeeded.

Oops, the connection has been closed…

Here is an other example: check every minute the latest query that ran on server.
postgres=# SELECT datname, query, usename FROM pg_stat_activity ORDER BY query_start DESC LIMIT 1; \watch 60
datname | query | usename
----------+-----------------------------------------------------------------------------------------+----------
postgres | SELECT datname, query, usename FROM pg_stat_activity ORDER BY query_start DESC LIMIT 1; | postgres
(1 row)
 
Watch every 60s Thu May 2 13:40:49 2013
 
datname | query | usename
----------+-----------------------------------------------------------------------------------------+----------
postgres | SELECT datname, query, usename FROM pg_stat_activity ORDER BY query_start DESC LIMIT 1; | postgres
(1 row)

A last example: kill periodically backends (here every 60s) that have not been activated for a given period of time (here 30s).
postgres=# SELECT pg_terminate_backend(pid) as status FROM pg_stat_activity
postgres=# WHERE now() - state_change > interval '30 s' AND
postgres=# pid != pg_backend_pid();
status
--------
(0 rows)
 
postgres=# \watch 60
Watch every 60s Thu May 2 13:51:04 2013
 
status
--------
(0 rows)

In short, a lot of things are doable with \watch, you can automatize for example actions easily with a psql client, like the refresh of a materialized view. At least I won’t need anymore to type 50 times the same query when developping an application using Postgres or creating a new feature.

Among one of the many new features implemented in 9.3, pg_dump now offers the possibility to perform parallel dumps. This feature has been introduced by the commit below.
commit 9e257a181cc1dc5e19eb5d770ce09cc98f470f5f
Author: Andrew Dunstan
Date: Sun Mar 24 11:27:20 2013 -0400
 
Add parallel pg_dump option.
 
New infrastructure is added which creates a set number of workers
(threads on Windows, forked processes on Unix). Jobs are then
handed out to these workers by the master process as needed.
pg_restore is adjusted to use this new infrastructure in place of the
old setup which created a new worker for each step on the fly. Parallel
dumps acquire a snapshot clone in order to stay consistent, if
available.
 
The parallel option is selected by the -j / --jobs command line
parameter of pg_dump.
 
Joachim Wieland, lightly editorialized by Andrew Dunstan

This is an extremely nice improvement of pg_dump as it allows accelerating the speed a dump is taken, particularly for machines having multiple cores as the load can be shared among separate threads.

Note that this option only works with the format called directoyy that can be specified with option -Fd or –format=directory, which outputs the database dump as a directory-format archive. A new option -j/–jobs can also be used to define the number of jobs that will run in parallel when performing the dump.

When using parallel pg_dump, it is important to remember that n+1 connections are opened to the server, n being the number of jobs defined, with an extra master connection to control the shared locks taken on the objects dumped. So be sure that max_connections is set up to a number high enough in accordance to the number of jobs that are planned.

Thanks to synchronized snapshots shared among the backends managed by the jobs, the dump is taken consistently ensuring that all the jobs share the same data view. However, as synchronized snapshots are only available since PostgreSQL 9.2, you need to be sure that no external sessions are doing any DML or DDL when performing a dump on servers whose version is lower than 9.2. It is also necessary to specify the option –no-synchronized-snapshots in this case.

Now, using a server having 16 cores, let’s check how this feature performs. For this test, the schema of the database dumped is extremely simple: 16 tables with a constant size of approximately 200MB each (5000000 rows with a single int4 column), for a database having a total size of 3.2GB. Tests are conducted with 1, 2, 4, 8 and 16 jobs, so in the case of 16 jobs one table would be dumped by a unique job running on a single connection. This is of course an unrealistic schema for a production database, but here the point is to give an idea of how this feature can speed up a dump in an ideal case. 5 successive runs are done for each case.

Each test case has been run with the following command:
time pg_dump -Fd -f $DUMP_DIRECTORY -j $NUM_JOBS $DATABASE_NAME

Jobs – Runs(s) 1 2 3 4 5 Avg
1 56.714 54.385 54.242 59.300 57.705 56.47
2 27.023 26.207 27.211 26.112 25.206 26.35
4 12.641 12.797 12.484 12.604 12.486 12.60
8 7.641 7.013 7.913 7.081 6.702 6.27
16 5.086 5.045 5.079 5.216 5.054 5.10

As expected, dump time is halved each time job number is doubled with this ideal database schema. However, due to some I/O disk bottleneck, the time gain is not that important with a high number of jobs. For example, in those series of tests, there is not much difference between 8 and 16 jobs, so be always aware of the I/O your dump disk can manage at most and choose carefully the number of jobs used for dumps based on that.

Continuing on the coverage of new JSON features added in Postgres 9.3, and after writing about JSON data generation and JSON operators, let’s now focus on some new functions that can be used for the parsing of JSON data.

The are many new functions introduced:

  • json_each, json_each_text
  • json_extract_path, json_extract_path_text
  • json_object_keys
  • json_populate_record, json_populate_recordset
  • json_array_length
  • json_array_elements

The following set of data is used in all the examples of this post,.
postgres=# CREATE TABLE aa (a int, b json);
CREATE TABLE
postgres=# INSERT INTO aa VALUES (1, '{"f1":1,"f2":true,"f3":"Hi I''m \"Daisy\""}');
INSERT 0 1
postgres=# INSERT INTO aa VALUES (2, '{"f1":2,"f2":false,"f3":"Hi I''m \"Dave\""}');
INSERT 0 1
postgres=# INSERT INTO aa VALUES (3, '{"f1":3,"f2":true,"f3":"Hi I''m \"Popo\""}');
INSERT 0 1
postgres=# INSERT INTO aa VALUES (4, '{"f1":{"f11":11,"f12":12},"f2":2}');
INSERT 0 1
postgres=# INSERT INTO aa VALUES (5, '{"f1":[1,"Robert \"M\""],"f2":[2,"Kevin \"K\"",false]}');
INSERT 0 1

So now let’s begin. The most valuable functions might be json_each and json_each_text which can be used to expand JSON data as key/value records.
postgres=# SELECT * FROM json_each((SELECT b FROM aa WHERE a = 1));
key | value
-----+--------------------
f1 | 1
f2 | true
f3 | "Hi I'm \"Daisy\""
(3 rows)

The difference between json_each and json_each_text is that the former returns values as legal JSON format and the latter returns it as text.
postgres=# SELECT * FROM json_each_text((SELECT b FROM aa WHERE a = 1));
key | value
-----+----------------
f1 | 1
f2 | true
f3 | Hi I'm "Daisy"
(3 rows)

This operation is effective only on the outermost field.
postgres=# SELECT * FROM json_each((SELECT b FROM aa WHERE a = 4)) WHERE key = 'f1';
key | value
-----+---------------------
f1 | {"f11":11,"f12":12}
(1 row)

And you can also apply this operation on some inner fields by selecting directly an inner JSON field or using some WITH mechanism.
SELECT * FROM json_each((SELECT b->'f1' FROM aa WHERE a = 4));
key | value
-----+-------
f11 | 11
f12 | 12
(2 rows)

json_extract_path and json_extract_path_text can be used to extract a field value based on some given keys, or a chain or keys, equivalent to what the operators “->” and “->>” can respectively do.
postgres=# SELECT json_extract_path(b, 'f1') AS f1a, b->'f1' AS f1b FROM aa WHERE a = 4;
f1a | f1b
---------------------+---------------------
{"f11":11,"f12":12} | {"f11":11,"f12":12}
(1 row)
postgres=# SELECT json_extract_path(b, 'f1', 'f12') AS f12a, b->'f1'->'f12' AS f12b FROM aa WHERE a = 4;
f12a | f12b
------+------
12 | 12
(1 row)

json_object_keys retrieves the set of keys of a given JSON object on the outermost object. As it returns the field names of all the tuples scanned, be sure to group the results or to select a limited number of tuples.
postgres=# SELECT json_object_keys(b) FROM aa GROUP BY 1 ORDER BY 1;
json_object_keys
------------------
f1
f2
f3
(3 rows)
postgres=# SELECT json_object_keys(b->'f1') FROM aa WHERE a = 4;
json_object_keys
------------------
f11
f12
(2 rows)

Next, json_populate_record can help in casting a JSON record into a given type.
postgres=# CREATE TYPE aat AS (f1 int, f2 bool, f3 text);
CREATE TYPE
postgres=# SELECT * FROM json_populate_record(null::aat, (SELECT b FROM aa WHERE a = 1)) AS popo;
f1 | f2 | f3
----+----+----------------
1 | t | Hi I'm "Daisy"
(1 row)

This operation can only be used on a single row.
postgres=# SELECT * FROM json_populate_record(null::aat, (SELECT b FROM aa WHERE a = 1 OR a = 2)) AS popo;
ERROR: more than one row returned by a subquery used as an expression

Similarly to json_populate_record, json_populate_recordset can be used on a set of records. It can become particularly powerful when combined with json_agg.
postgres=# SELECT * FROM json_populate_recordset(null::aat, (SELECT json_agg(b) FROM aa WHERE a < 4)) AS popo;
f1 | f2 | f3
----+----+----------------
1 | t | Hi I'm "Daisy"
2 | f | Hi I'm "Dave"
3 | t | Hi I'm "Popo"
(3 rows)

Note that this operation does not work on nested objects, aka when the JSON fields are not strictly the same for each row.
postgres=# SELECT * FROM json_populate_recordset(null::aat, (SELECT json_agg(b) FROM aa WHERE a = 1 OR a = 4), false) AS popo;
ERROR: cannot call json_populate_recordset on a nested object

Finally there are two functions focused on the manipulation and analysis of JSON arrays. The first function is called json_array_length. With this you can get the number of elements in a JSON array.
SELECT json_array_length(b->'f1') FROM aa WHERE a = 5;
json_array_length
-------------------
2
(1 row)
postgres=# SELECT json_array_length(b->'f2') FROM aa WHERE a = 5;
json_array_length
-------------------
3
(1 row)

If used on an object that is not an array, this function complains with a nice error message.
postgres=# SELECT json_array_length(b->'f1') FROM aa WHERE a = 1;
ERROR: cannot get array length of a scalar
postgres=# SELECT json_array_length(b->'f1') FROM aa WHERE a = 4;
ERROR: cannot get array length of a non-array

The second one is json_array_elements which expends a JSON array to a set of elements.
postgres=# SELECT json_array_elements(b->'f1') FROM aa WHERE a = 5;
json_array_elements
---------------------
1
"Robert \"M\""
(2 rows)
postgres=# SELECT json_array_elements(b->'f1') FROM aa WHERE a = 1;
ERROR: cannot call json_array_elements on a scalar
postgres=# SELECT json_array_elements(b->'f1') FROM aa WHERE a = 4;
ERROR: cannot call json_array_elements on a non-array

Combined with the new JSON features for data generation and operators, parsing functions complete the new set of tools implemented in Postgres 9.3 here to leverage the manipulation of JSON data directly on server side. The addition of such features continues the morphing of PostgreSQL from a database software to a database platform, JSON features making it stepping more in the field of NoSQL and document-oriented systems. So now, if you want to create an application which is JSON-oriented, simply use Postgres!

©2010-2013 Michael Paquier All content is ©Copyright of Otacoo.com 2010-2013. Privacy Policy - Terms of Use