IF EXISTS and IF NOT EXISTS are clauses that make a DDL command return a notice instead of an error when the target object is, depending on the action, already present or missing. When a command creates an object with IF NOT EXISTS and the object already exists, a notice is sent to the client and nothing is done on the server side. When a command alters or drops an object with IF EXISTS and the object does not exist, a notice is likewise sent to the client and nothing is done.

Here is what happens when creating a table that already exists:
postgres=# CREATE TABLE IF NOT EXISTS aa (a int);
CREATE TABLE
postgres=# CREATE TABLE IF NOT EXISTS aa (a int);
NOTICE: relation "aa" already exists, skipping
CREATE TABLE

Similarly, here is what happens when dropping a table that no longer exists:
postgres=# DROP TABLE IF EXISTS aa;
DROP TABLE
postgres=# DROP TABLE IF EXISTS aa;
NOTICE: table "aa" does not exist, skipping
DROP TABLE

Prior to 9.3, PostgreSQL already offered this feature for many objects: tables, indexes, functions, triggers, languages, etc. Such SQL extensions are useful when the same script is run several times, as they avoid errors on environments that are already installed.

9.3 introduces some new flavors of IF [NOT] EXISTS, extending the set of objects already supported.

  • CREATE SCHEMA [IF NOT EXISTS]
  • ALTER TYPE ADD VALUE [IF NOT EXISTS]
  • Extension of DROP TABLE IF EXISTS so that it succeeds if the specified schema does not exist

Note also that the new materialized views support IF EXISTS for ALTER and DROP.

The extension of CREATE SCHEMA with IF NOT EXISTS is pretty simple. As with the other objects, the command succeeds if the schema already exists, and a notice about the existing schema is sent back to the client.
postgres=# CREATE SCHEMA foo;
ERROR: schema "foo" already exists
postgres=# CREATE SCHEMA IF NOT EXISTS foo;
NOTICE: schema "foo" already exists, skipping
CREATE SCHEMA

Note that schema elements cannot be included in the command when this option is used.
postgres=# CREATE SCHEMA IF NOT EXISTS foo CREATE TABLE aa (a int);
ERROR: CREATE SCHEMA IF NOT EXISTS cannot include schema elements
LINE 1: CREATE SCHEMA IF NOT EXISTS foo CREATE TABLE aa (a int);

The second addition, ALTER TYPE ADD VALUE [IF NOT EXISTS], is useful with enumeration types to make the addition of a new value conditional.
postgres=# CREATE TYPE character_type AS ENUM ('warrior', 'priest', 'sorcerer');
CREATE TYPE
postgres=# ALTER TYPE character_type ADD VALUE IF NOT EXISTS 'magician';
ALTER TYPE
postgres=# ALTER TYPE character_type ADD VALUE IF NOT EXISTS 'magician';
NOTICE: enum label "magician" already exists, skipping
ALTER TYPE

The last improvement is also a nice thing to have. Here is what you would get prior to 9.3 when using DROP TABLE IF EXISTS on a table qualified with a schema that did not exist.
postgres=# DROP TABLE IF EXISTS foosch.foo;
ERROR: schema "foosch" does not exist

And here is what you get now:
postgres=# DROP TABLE IF EXISTS foosch.foo;
NOTICE: table "foo" does not exist, skipping
DROP TABLE

Those are definitely nice additions, especially the extension of IF NOT EXISTS to schemas, which was really missing from the existing set.
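To make the "run the same script several times" use case concrete, here is a minimal sketch of a replayable setup script wrapped in a shell function. The function name emit_setup_sql and the object names app and app.aa are placeholders invented for this example; only the IF [NOT] EXISTS forms come from the text above.

```shell
# Print a setup script that is safe to replay: each statement is skipped
# with a NOTICE instead of failing when its work is already done.
# "app" and "app.aa" are placeholder names chosen for this sketch.
emit_setup_sql() {
    cat <<'EOF'
CREATE SCHEMA IF NOT EXISTS app;
CREATE TABLE IF NOT EXISTS app.aa (a int);
DROP TABLE IF EXISTS foosch.foo;
EOF
}
```

The output can be piped to psql, for example with emit_setup_sql | psql postgres. On a 9.3 server a second run should only produce notices.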

Among the many new features implemented in 9.3, pg_dump now offers the possibility to perform parallel dumps. This feature was introduced by the commit below.
commit 9e257a181cc1dc5e19eb5d770ce09cc98f470f5f
Author: Andrew Dunstan
Date: Sun Mar 24 11:27:20 2013 -0400
 
Add parallel pg_dump option.
 
New infrastructure is added which creates a set number of workers
(threads on Windows, forked processes on Unix). Jobs are then
handed out to these workers by the master process as needed.
pg_restore is adjusted to use this new infrastructure in place of the
old setup which created a new worker for each step on the fly. Parallel
dumps acquire a snapshot clone in order to stay consistent, if
available.
 
The parallel option is selected by the -j / --jobs command line
parameter of pg_dump.
 
Joachim Wieland, lightly editorialized by Andrew Dunstan

This is an extremely nice improvement to pg_dump, as it reduces the time a dump takes, particularly on machines with multiple cores, since the load can be shared among separate workers.

Note that this option only works with the directory format, which is selected with the option -Fd or --format=directory and outputs the database dump as a directory-format archive. The new option -j/--jobs defines the number of jobs that run in parallel when performing the dump.

When using parallel pg_dump, it is important to remember that n+1 connections are opened to the server, n being the number of jobs defined, with an extra master connection to control the shared locks taken on the objects dumped. So be sure that max_connections is set high enough for the number of jobs planned.
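The n+1 rule can be checked before launching a dump. The following is only a sketch: the function names are made up for this example, and it ignores details such as superuser_reserved_connections.

```shell
# Connections opened by a parallel dump: one per job plus the master.
required_connections() {
    echo $(( $1 + 1 ))
}

# Succeed when the free slots (max_connections minus the sessions
# already open) can hold a dump with the given number of jobs.
# Arguments: jobs, max_connections, connections currently in use.
enough_connections() {
    [ $(( $2 - $3 )) -ge "$(required_connections "$1")" ]
}
```

The two server-side values can be fetched with psql -Atc 'SHOW max_connections' and psql -Atc 'SELECT count(*) FROM pg_stat_activity'.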

Thanks to synchronized snapshots shared among the backends managed by the jobs, the dump is taken consistently, ensuring that all the jobs share the same view of the data. However, as synchronized snapshots are only available since PostgreSQL 9.2, you need to be sure that no external sessions are doing any DML or DDL when performing a dump on servers whose version is lower than 9.2. It is also necessary to specify the option --no-synchronized-snapshots in this case.
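As an illustration of the version rule above, this sketch prints the pg_dump command line to use depending on the server version. The function name dump_command is an assumption of this example, and the arguments are not quoted, so paths containing spaces are not handled.

```shell
# Print the pg_dump command for a parallel directory-format dump.
# Arguments: server_version_num (e.g. 90105), jobs, output directory, database.
dump_command() {
    snapshot_opt=""
    # Synchronized snapshots exist from 9.2 (version number 90200) onwards.
    if [ "$1" -lt 90200 ]; then
        snapshot_opt=" --no-synchronized-snapshots"
    fi
    echo "pg_dump -Fd -j $2${snapshot_opt} -f $3 $4"
}
```

The version number of the server can be obtained with psql -Atc 'SHOW server_version_num'.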

Now, using a server with 16 cores, let’s check how this feature performs. For this test, the schema of the database dumped is extremely simple: 16 tables with a constant size of approximately 200MB each (5000000 rows with a single int4 column), for a database with a total size of 3.2GB. Tests are conducted with 1, 2, 4, 8 and 16 jobs, so in the case of 16 jobs each table is dumped by its own job running on its own connection. This is of course an unrealistic schema for a production database, but the point here is to give an idea of how this feature can speed up a dump in an ideal case. Five successive runs are done for each case.

Each test case has been run with the following command:
time pg_dump -Fd -f $DUMP_DIRECTORY -j $NUM_JOBS $DATABASE_NAME

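 Jobs | Run 1 (s) | Run 2 (s) | Run 3 (s) | Run 4 (s) | Run 5 (s) | Avg (s)
------+-----------+-----------+-----------+-----------+-----------+---------
    1 |    56.714 |    54.385 |    54.242 |    59.300 |    57.705 |   56.47
    2 |    27.023 |    26.207 |    27.211 |    26.112 |    25.206 |   26.35
    4 |    12.641 |    12.797 |    12.484 |    12.604 |    12.486 |   12.60
    8 |     7.641 |     7.013 |     7.913 |     7.081 |     6.702 |    7.27
   16 |     5.086 |     5.045 |     5.079 |     5.216 |     5.054 |    5.10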

As expected, dump time is roughly halved each time the number of jobs is doubled with this ideal database schema. However, due to a disk I/O bottleneck, the gain shrinks with a high number of jobs. For example, in this series of tests there is not much difference between 8 and 16 jobs, so always be aware of the maximum I/O your dump disk can sustain and choose the number of jobs carefully based on that.
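For readers who want to redo the arithmetic on their own timings, here is a small sketch built on awk. The function names avg_runs and speedup are invented for this example.

```shell
# Average a list of timings in seconds, rounded to two decimals.
avg_runs() {
    printf '%s\n' "$@" | awk '{ sum += $1 } END { printf "%.2f\n", sum / NR }'
}

# Speedup of a parallel run relative to the single-job baseline.
# Arguments: baseline time, parallel time.
speedup() {
    awk -v base="$1" -v par="$2" 'BEGIN { printf "%.2f\n", base / par }'
}
```

For example, avg_runs 7.641 7.013 7.913 7.081 6.702 prints 7.27 for the five 8-job timings, and speedup 56.47 26.35 prints 2.14 for the step from 1 to 2 jobs.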

PostgreSQL 9.3 comes with a pretty cool feature called materialized views. It was written and committed by Kevin Grittner not so long ago.
commit 3bf3ab8c563699138be02f9dc305b7b77a724307
Author: Kevin Grittner
Date: Sun Mar 3 18:23:31 2013 -0600
 
Add a materialized view relations.
 
A materialized view has a rule just like a view and a heap and
other physical properties like a table. The rule is only used to
populate the table, references in queries refer to the
materialized data.
 
This is a minimal implementation, but should still be useful in
many cases. Currently data is only populated "on demand" by the
CREATE MATERIALIZED VIEW and REFRESH MATERIALIZED VIEW statements.
It is expected that future releases will add incremental updates
with various timings, and that a more refined concept of defining
what is "fresh" data will be developed. At some point it may even
be possible to have queries use a materialized in place of
references to underlying tables, but that requires the other
above-mentioned features to be working first.
 
Much of the documentation work by Robert Haas.
Review by Noah Misch, Thom Brown, Robert Haas, Marko Tiikkaja
Security review by KaiGai Kohei, with a decision on how best to
implement sepgsql still pending.

What is a materialized view? In short, it is a hybrid of a table and a view. A view is a projection of data in a given relation and has no storage. A table is well… A table…
In between, a materialized view is a projection of table data that has its own storage. It uses a query to fetch its data like a view, but this data is stored like in a common table. The materialized view can also be refreshed with updated data by running once again the query it uses for its projection, or have its data truncated. In the latter case it is left in a non-scannable state. Also, as a materialized view has its own storage, it can use tablespaces and its own indexes. Note that it can also be an unlogged relation.

This feature introduces four new SQL commands:

  • CREATE MATERIALIZED VIEW
  • ALTER MATERIALIZED VIEW
  • DROP MATERIALIZED VIEW
  • REFRESH MATERIALIZED VIEW

CREATE, ALTER and DROP are the usual DDL commands, here used to manipulate the definition of materialized views. What is important is the new command REFRESH (its name was debated at length inside the community). This command updates the materialized view with fresh data by running once again the scanning query. Note that REFRESH can also be used to *truncate* (not really though) the data of the relation by running it with the clause WITH NO DATA.

Materialized views have their own advantages in many scenarios: faster access to data that needs to be brought from a remote source (reading a file on the Postgres server through file_fdw, etc.), using data that needs to be refreshed periodically (cache system), projecting data with an embedded ORDER BY from a large table, running an expensive join periodically in the background, etc.

I can also imagine some nice combinations with data refresh and custom background workers. Who said that automatic data refresh on a materialized view was not possible?

Now let’s have a look at how it works.
postgres=# CREATE TABLE aa AS SELECT generate_series(1,1000000) AS a;
SELECT 1000000
postgres=# CREATE VIEW aav AS SELECT * FROM aa WHERE a <= 500000;
CREATE VIEW
postgres=# CREATE MATERIALIZED VIEW aam AS SELECT * FROM aa WHERE a <= 500000;
SELECT 500000

Here is the size that each relation uses.
postgres=# SELECT pg_relation_size('aa') AS tab_size, pg_relation_size('aav') AS view_size, pg_relation_size('aam') AS matview_size;
tab_size | view_size | matview_size
----------+-----------+--------------
36249600 | 0 | 18137088
(1 row)

A materialized view uses storage (here about 18MB), as much as it needs to store the data fetched from its parent table (about 36MB in size) when running the view query.

The refresh of a materialized view can be controlled really easily.
postgres=# DELETE FROM aa WHERE a <= 500000;
DELETE 500000
postgres=# SELECT count(*) FROM aam;
count
--------
500000
(1 row)
postgres=# REFRESH MATERIALIZED VIEW aam;
REFRESH MATERIALIZED VIEW
postgres=# SELECT count(*) FROM aam;
count
-------
0
(1 row)

The new state of table aa becomes visible in its materialized view aam only once REFRESH has been run. Note that at the time of this post, REFRESH takes an exclusive lock (ugh...).
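Since data is only refreshed on demand, a periodic refresh has to be driven from outside, for example from cron. The sketch below generates the statements for a list of materialized views. The function name refresh_sql is an assumption of this example, and the names are passed through unquoted, so they must be trusted identifiers.

```shell
# Print one REFRESH statement per materialized view name given.
# Names are emitted as-is: no quoting or validation is done here.
refresh_sql() {
    for mv in "$@"; do
        printf 'REFRESH MATERIALIZED VIEW %s;\n' "$mv"
    done
}
```

A cron entry could then run something like refresh_sql aam | psql postgres, keeping in mind the exclusive lock taken during each refresh.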

A materialized view can also be set as not scannable thanks to the clause WITH NO DATA of REFRESH.
postgres=# REFRESH MATERIALIZED VIEW aam WITH NO DATA;
REFRESH MATERIALIZED VIEW
postgres=# SELECT count(*) FROM aam;
ERROR: materialized view "aam" has not been populated
HINT: Use the REFRESH MATERIALIZED VIEW command.

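There is a new system view called pg_matviews to help you find the current state of materialized views. Note that the column is called isscannable at the time of this post; in the released 9.3 it is named ispopulated.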
postgres=# SELECT matviewname, isscannable FROM pg_matviews;
matviewname | isscannable
-------------+-------------
aam | f
(1 row)

It is also not possible to run DML queries on a materialized view. This makes sense as the data it holds might not reflect the current state of its parent relation(s). By contrast, a simple view runs its underlying query each time it is needed, so a parent table can be modified through it (this is what updatable views do).
postgres=# INSERT INTO aam VALUES (1);
ERROR: cannot change materialized view "aam"
postgres=# UPDATE aam SET a = 5;
ERROR: cannot change materialized view "aam"
postgres=# DELETE FROM aam;
ERROR: cannot change materialized view "aam"

Now, a couple of words about the performance improvement and degradation you can get with materialized views, as you can manipulate indexes on those relations. For example, it is easy to speed up queries on a materialized view without caring about the schema of its parent relations.
postgres=# EXPLAIN ANALYZE SELECT * FROM aam WHERE a = 1;
QUERY PLAN
--------------------------------------------------------------------------------------------------
Seq Scan on aam (cost=0.00..8464.00 rows=1 width=4) (actual time=0.060..155.934 rows=1 loops=1)
Filter: (a = 1)
Rows Removed by Filter: 499999
Total runtime: 156.047 ms
(4 rows)
postgres=# CREATE INDEX aam_ind ON aam (a);
CREATE INDEX
postgres=# EXPLAIN ANALYZE SELECT * FROM aam WHERE a = 1;
QUERY PLAN
------------------------------------------------------------------------------------------------------------------
Index Only Scan using aam_ind on aam (cost=0.42..8.44 rows=1 width=4) (actual time=2.096..2.101 rows=1 loops=1)
Index Cond: (a = 1)
Heap Fetches: 1
Total runtime: 2.196 ms
(4 rows)

Take care also that the indexes and constraints of the parent relation are not copied to the materialized view. For example, a fast query scanning some table's primary key might end up as a deadly sequential scan if it is run on a materialized view based on this table. In the example below, bb is a table with a single int column a used as its primary key.
postgres=# INSERT INTO bb VALUES (generate_series(1,100000));
INSERT 0 100000
postgres=# EXPLAIN ANALYZE SELECT * FROM bb WHERE a = 1;
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------
Index Only Scan using bb_pkey on bb (cost=0.29..8.31 rows=1 width=4) (actual time=0.078..0.080 rows=1 loops=1)
Index Cond: (a = 1)
Heap Fetches: 1
Total runtime: 0.159 ms
(4 rows)
postgres=# CREATE MATERIALIZED VIEW bbm AS SELECT * FROM bb;
SELECT 100000
postgres=# EXPLAIN ANALYZE SELECT * FROM bbm WHERE a = 1;
QUERY PLAN
---------------------------------------------------------------------------------------------------
Seq Scan on bbm (cost=0.00..1776.00 rows=533 width=4) (actual time=0.144..41.873 rows=1 loops=1)
Filter: (a = 1)
Rows Removed by Filter: 99999
Total runtime: 41.935 ms
(4 rows)

Such designs are of course not recommended on a production system. Just be aware that bad designs will badly impact your application performance (that's always the case btw).

It is really a nice thing to have particularly for caching applications! So enjoy!

Up to Postgres 9.2, the only foreign data wrapper present in core was file_fdw, allowing you to query files as if they were tables. This has been corrected with the addition of a second foreign data wrapper called postgres_fdw. This one simply lets you query foreign Postgres servers and fetch the results directly on your local server. It was introduced by this commit.
commit d0d75c402217421b691050857eb3d7af82d0c770
Author: Tom Lane
Date: Thu Feb 21 05:26:23 2013 -0500
 
Add postgres_fdw contrib module.
 
There's still a lot of room for improvement, but it basically works,
and we need this to be present before we can do anything much with the
writable-foreign-tables patch. So let's commit it and get on with testing.
 
Shigeru Hanada, reviewed by KaiGai Kohei and Tom Lane

Documentation can be found here for the time being.

In order to install it from source, run the following commands from the root folder of the Postgres source tree.
cd contrib/postgres_fdw
make install

Then connect to your existing Postgres server and finish the installation with CREATE EXTENSION.
postgres=# CREATE EXTENSION postgres_fdw;
CREATE EXTENSION
postgres=# \dx postgres_fdw
List of installed extensions
Name | Version | Schema | Description
--------------+---------+--------+----------------------------------------------------
postgres_fdw | 1.0 | public | foreign-data wrapper for remote PostgreSQL servers
(1 row)

Now let’s test it with the case of a simple cluster with one slave running with port 5532 on the same server as its master. Here is the configuration.
$ psql -p 5532 -c 'select pg_is_in_recovery()'
pg_is_in_recovery
-------------------
t
(1 row)

When using a foreign data wrapper, you first need to create a server.
postgres=# CREATE SERVER postgres_server
postgres-# FOREIGN DATA WRAPPER postgres_fdw OPTIONS (host 'localhost', port '5532', dbname 'postgres');
CREATE SERVER
postgres=# \des
List of foreign servers
Name | Owner | Foreign-data wrapper
-----------------+--------+----------------------
postgres_server | xxxxxx | postgres_fdw
(1 row)

Then let’s move on with a user mapping and a table to query.
postgres=# CREATE USER MAPPING FOR PUBLIC SERVER postgres_server OPTIONS (password '');
CREATE USER MAPPING
postgres=# CREATE TABLE aa AS SELECT 1 AS a, generate_series(1,3) AS b;
SELECT 3

As the foreign server used is the slave of our master, there is no need to create this table on the second node.

What remains is the creation of the foreign table.
postgres=# CREATE FOREIGN TABLE aa_foreign (a int, b int)
postgres-# SERVER postgres_server OPTIONS (table_name 'aa');
CREATE FOREIGN TABLE

Then here is what you get when querying the foreign table.
postgres=# select * from aa_foreign;
a | b
---+---
1 | 1
1 | 2
1 | 3
(3 rows)
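The whole setup above can be gathered into one replayable script. The sketch below generates it from a few parameters. The function name emit_fdw_sql is invented for this example, the column list (a int, b int) is hardcoded to match the table aa used in this post, and the empty password is kept as in the transcript.

```shell
# Print the SQL that exposes one remote table through postgres_fdw.
# Arguments: host, port, dbname, remote table name.
# The column list matches the example table "aa" of this post.
emit_fdw_sql() {
    cat <<EOF
CREATE EXTENSION IF NOT EXISTS postgres_fdw;
CREATE SERVER postgres_server
    FOREIGN DATA WRAPPER postgres_fdw
    OPTIONS (host '$1', port '$2', dbname '$3');
CREATE USER MAPPING FOR PUBLIC SERVER postgres_server OPTIONS (password '');
CREATE FOREIGN TABLE ${4}_foreign (a int, b int)
    SERVER postgres_server OPTIONS (table_name '$4');
EOF
}
```

With the configuration used here, emit_fdw_sql localhost 5532 postgres aa | psql postgres would recreate the same objects as the session above.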

Yeah, done!

This feature still needs more testing, so go ahead and test it yourself. You might be surprised by the things you can do with it.

pg_reorg is a PostgreSQL module developed and maintained by NTT that reorganizes a table without locking it for the duration of the operation.
The code is hosted on pgfoundry here.
However, pgfoundry uses CVS :( , so I am also maintaining a fork on GitHub, in sync with pgfoundry, here.

What pg_reorg can do for you is reorganize a whole table in the same way as CLUSTER or VACUUM FULL, while still allowing write operations on the table being reorganized. No lock is held on the table while it is being rewritten.

Once you have downloaded the code, you just need to install it on your server.
cd $CODE_FOLDER
make install

Then install the extension (for versions 9.1 and later) after connecting to the Postgres server.
CREATE EXTENSION pg_reorg;

Then, it is possible to perform several types of operations.
CLUSTER reorganization on the table $TABLE:
pg_reorg --dbname $DATABASE -t $TABLE
VACUUM FULL reorganization on the table $TABLE:
pg_reorg --dbname $DATABASE -t $TABLE -n
Reorganization of an entire database:
pg_reorg --dbname $DATABASE
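When several tables need the same treatment, the commands above can be generated in a loop. This is only a sketch: the function name reorg_commands and its mode keywords are made up for this example, and it uses nothing but the options shown above.

```shell
# Print one pg_reorg command per table.
# Arguments: mode ("cluster" or "vacuum"), database, then table names.
reorg_commands() {
    mode=$1
    db=$2
    shift 2
    extra=""
    # -n is the switch used above for a VACUUM FULL style reorganization.
    if [ "$mode" = "vacuum" ]; then
        extra=" -n"
    fi
    for tbl in "$@"; do
        echo "pg_reorg --dbname $db -t $tbl$extra"
    done
}
```

The generated lines can be reviewed first and then executed, for example with reorg_commands cluster mydb orders items | sh, where mydb, orders and items are hypothetical names.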

The main limitation of this utility is that the table being reorganized needs to have a primary key or a non-null unique key.

Now, a little bit more about the technique it uses to reorganize the table.
Basically, a temporary copy of the table to be reorganized is created using a CREATE TABLE AS query. The definition of the CTAS query changes depending on the ordering the user wants. For example, if the user wants the table reordered on a different column (option -o), the CTAS is completed with an ORDER BY clause on that column. The indexes of the new table also depend on what the user wants.

Then the following operations are done.

  • Create triggers that record all the DML happening on the original table into an intermediate log table
  • Create indexes on the temporary table based on what the user wants (new column index, VACUUM FULL…)
  • Apply the logs recorded during the index creation and wait for old transactions to finish
  • Swap the names between the freshly-created table and the old table
  • Drop the useless objects: the old table, the old triggers and remaining objects

This functionality is particularly handy when you wish to reorganize a huge table. Performing a VACUUM FULL or CLUSTER on it might take time, and your application might need this table to stay writable for as much of that time as possible. So pretty useful, uh?

©2010-2013 Michael Paquier. All content is ©Copyright of Otacoo.com 2010-2013.