Postgres 9.3 is going to be a great release for JSON data type. After having a look at the new functions for data generation, let’s look at the new JSON features that the commit below brings.
commit a570c98d7fa0841f17bbf51d62d02d9e493c7fcc
Author: Andrew Dunstan
Date: Fri Mar 29 14:12:13 2013 -0400
 
Add new JSON processing functions and parser API.
 
The JSON parser is converted into a recursive descent parser, and
exposed for use by other modules such as extensions. The API provides
hooks for all the significant parser event such as the beginning and end
of objects and arrays, and providing functions to handle these hooks
allows for fairly simple construction of a wide variety of JSON
processing functions. A set of new basic processing functions and
operators is also added, which use this API, including operations to
extract array elements, object fields, get the length of arrays and the
set of keys of a field, deconstruct an object into a set of key/value
pairs, and create records from JSON objects and arrays of objects.
 
Catalog version bumped.
 
Andrew Dunstan, with some documentation assistance from Merlin Moncure.

Based on stored JSON data, this commit introduces a new layer of APIs, operators and functions that can be used to manipulate and process JSON data. There are 4 new operators and 8 new functions (hopefully I counted right), so as there is a lot of content this post is only focused on the new operators.

The following set of data is used for all the examples presented in this post with some subsets of data, arrays and plain variables.
postgres=# CREATE TABLE aa (a int, b json);
CREATE TABLE
postgres=# INSERT INTO aa VALUES (1, '{"f1":1,"f2":true,"f3":"Hi I''m \"Daisy\""}');
INSERT 0 1
postgres=# INSERT INTO aa VALUES (2, '{"f1":{"f11":11,"f12":12},"f2":2}');
INSERT 0 1
postgres=# INSERT INTO aa VALUES (3, '{"f1":[1,"Robert \"M\"",true],"f2":[2,"Kevin \"K\"",false]}');
INSERT 0 1

The first operator is “->”, that can be used to fetch field values directly from JSON data. It can be used with a text value identifying the key of field.
postgres=# SELECT b->'f1' AS f1, b->'f3' AS f3 FROM aa WHERE a = 1;
f1 | f3
----+--------------------
1 | "Hi I'm \"Daisy\""
(1 row)

Multiple keys can also be used in chain to retrieve data or another JSON subset of data.
postgres=# SELECT b->'f1'->'f12' AS f12 FROM aa WHERE a = 2;
f12
-----
12
(1 row)
postgres=# SELECT b->'f1' AS f1 FROM aa WHERE a = 2;
f1
---------------------
{"f11":11,"f12":12}
(1 row)

In a more interesting way, when an integer is used as key, you can fetch data directly in a stored array, like that for example:
postgres=# SELECT b->'f1'->0 as f1_0 FROM aa WHERE a = 3;
f1_0
------
1
(1 row)

The second operator added is “->>”. Contrary to “->” that returns a JSON legal text, “->>” returns plain text.
postgres=# SELECT b->>'f3' AS f1 FROM aa WHERE a = 1;
f1
----------------
Hi I'm "Daisy"
(1 row)
postgres=# SELECT b->'f3' AS f1 FROM aa WHERE a = 1;
f1
--------------------
"Hi I'm \"Daisy\""
(1 row)

Similarly to “->”, it is possible to use either an integer or a text as key. For an integer, the key represents the position of element in an array.
postgres=# SELECT b->'f1'->>1 as f1_0 FROM aa WHERE a = 3;
f1_0
------------
Robert "M"
(1 row)
postgres=# SELECT b->'f1'->1 as f1_0 FROM aa WHERE a = 3;
f1_0
----------------
"Robert \"M\""
(1 row)

Of course, you cannot fetch data from an array using a field name.
postgres=# SELECT b->'f1'->>'1' as f1_0 FROM aa WHERE a = 3;
ERROR: cannot extract field from a non-object

As well as you cannot fetch a field using an element number.
postgres=# SELECT b->1 as f1_0 FROM aa WHERE a = 3;
ERROR: cannot extract array element from a non-array

The last 2 operators added are “#>” and “#>>”. With those ones, it is possible to fetch directly an element in an array without using a combo of the type “column->’$FIELD’->$INT_INDEX. This can make your queries far more readable when manipulating arrays in JSON.
postgres=# SELECT b#>'{f1,1}' as f1_0 FROM aa WHERE a = 3;
f1_0
----------------
"Robert \"M\""
(1 row)
postgres=# SELECT b#>>'{f1,1}' as f1_0 FROM aa WHERE a = 3;
f1_0
------------
Robert "M"
(1 row)

“#>” fetches text data in a legal JSON format, and “#>>” fetches data as plain text.

In short, those operators are good meat for brain, and nice additions for many applications.

Postgres 9.2 has introduced JSON as a server data type. At this point, the data was simply stored on server side with integrated wrappers checking that data had a correct JSON format. It was a good first step in order to store directly JSON data on server side but core features in 9.2 have its limitations in terms of JSON data manipulation and transformation.

Two new sets of JSON features have been added to PostgreSQL 9.3 planned to be released this year: functions related to data generation and a new set of APIs for data processing. The one this post deals with the ability to generate JSON data based on existing data types. The second set of features (operators and new processing functions) will be explained in a future post.

So… Functions for JSON data generation have been added by this commit.
commit 38fb4d978c5bfc377ef979e2595e3472744a3b05
Author: Andrew Dunstan
Date: Sun Mar 10 17:35:36 2013 -0400
 
JSON generation improvements.
 
This adds the following:
 
 json_agg(anyrecord) -> json
 to_json(any) -> json
 hstore_to_json(hstore) -> json (also used as a cast)
 hstore_to_json_loose(hstore) -> json
 
The last provides heuristic treatment of numbers and booleans.
 
Also, in json generation, if any non-builtin type has a cast to json,
that function is used instead of the type's output function.
 
Andrew Dunstan, reviewed by Steve Singer.
Catalog version bumped.

The first function called to_json permits to return a given value as valid JSON.
postgres=# create table aa (a bool, b text);
CREATE TABLE
postgres=# INSERT INTO aa VALUES (true, 'Hello "Darling"');
INSERT 0 1
postgres=# INSERT INTO aa VALUES (false, NULL);
INSERT 0 1
postgres=# SELECT to_json(a) AS bool_json, to_json(b) AS txt_json FROM aa;
bool_json | txt_json
-----------+---------------------
true | "Hello \"Darling\""
false |
(2 rows)

Boolean values are returned as plain true/false, texts are quoted as valid JSON fields.

json_agg is a function that can transform a record into a JSON array.
postgres=# SELECT json_agg(aa) FROM aa;
json_agg
---------------------------------------
[{"a":true,"b":"Hello \"Darling\""}, +
{"a":false,"b":null}]
(1 row)

The other tools for data generation are included in the contrib module hstore. Do you remember? This module can be used to store key/value pairs in a single table column. It is now possible to cast hstore data as json with some native casting or with function hstore_to_json.
postgres=# CREATE TABLE aa (id int, txt hstore);
CREATE TABLE
postgres=# INSERT INTO aa VALUES (1, 'f1=>t, f2=>2, f3=>"Hi", f4=>NULL');
INSERT 0 1
postgres=# SELECT id, txt::json, hstore_to_json(txt) FROM aa;
id | txt | hstore_to_json
----+------------------------------------------------+------------------------------------------------
1 | {"f1": "t", "f2": "2", "f3": "Hi", "f4": null} | {"f1": "t", "f2": "2", "f3": "Hi", "f4": null}
(1 row)

Note that in this case boolean and numerical values are treated as plain text when casted.

hstore_to_json_loose can enforce the conversion of boolean and numerical values to a better format, like that:
postgres=# SELECT id, hstore_to_json_loose(txt) FROM aa;
id | hstore_to_json_loose
----+-----------------------------------------------
1 | {"f1": true, "f2": 2, "f3": "Hi", "f4": null}
(1 row)

And now boolean and integer values inserted previously have a better look, no?

Having such tools natively in Postgres core server is really a nice addition for data manipulation and transformation of values into legal JSON.
However, you need to know that this set of tools is only the top of the iceberg for the JSON features added in 9.3… There are also new operators and APIs, which will be covered in more details with examples in one of my next posts. So… TBC.

Similarly to a normal PostgreSQL child process, a custom background worker should be running using a loop that can be interrupted with signals to update a given status or simply exit the process. Two types of signals are handled by custom background workers in the PostgreSQL architecture: SIGHUP and SIGTERM.

When such signals are received by a bgworker, you should be aware that processing similar to what is done for a normal backend process might be needed. For example, if your custom bgworker uses some GUC parameters, the process needs to take proper action to reload configuration parameters. Even if there are already good examples in the PostgreSQL source code for bgworkers, I noticed that there are no examples focusing only on certain fundamentals of bgworker. As someone who likes simple things, I think that this is essential to get the essence of what this feature can do piece by piece.

So here is in this post an example of bgworker that I wrote for beginners in order to understand only the fundamentals of signal handling. First you need to know that it is necessary to store a static variable to store the status of representing if signal has been activated or not in a way similar to that:
static bool got_sigterm = false;
static bool got_sighup = false;

You also need dedicated functions to set those flags to true if a signal is received by the worker.
static void
hello_sigterm(SIGNAL_ARGS)
{
 got_sigterm = true;
}
 
static void
hello_sighup(SIGNAL_ARGS)
{
 got_sighup = true;
}

A dedicated function that is used as a main loop for processing is essential as well.
static void
hello_main(void *main_arg)
{
 /* We're now ready to receive signals */
 BackgroundWorkerUnblockSignals();
 while (true)
 {
  /* Process signals */
  if (got_sighup)
  {
   got_sighup = false;
   ereport(LOG, (errmsg("hello signal: processed SIGHUP")));
  }
 
  if (got_sigterm)
  {
   /* Simply exit */
   ereport(LOG, (errmsg("hello signal: processed SIGTERM")));
   proc_exit(0);
  }
 }
 proc_exit(0)
}

Note the call to BackgroundWorkerUnblockSignals. This is extremely important in order to allow the reception of signals by the background worker.

Once you have this basic infrastructure in place, you need to register this worker process correctly with something like that:
void
_PG_init(void)
{
 BackgroundWorker worker;
 worker.bgw_flags = 0;
 worker.bgw_start_time = BgWorkerStart_PostmasterStart;
 worker.bgw_main = hello_main;
 worker.bgw_sigterm = hello_sigterm;
 worker.bgw_sighup = hello_sighup;
 worker.bgw_name = "hello_signal";
 /* Wait 10 seconds for restart before crash */
 worker.bgw_restart_time = 10;
 worker.bgw_main_arg = NULL;
 RegisterBackgroundWorker(&worker);
}

Also don’t forget that you need a header similar to that to have this code working properly.
/* Some general headers for custom bgworker facility */
#include "postgres.h"
#include "fmgr.h"
#include "postmaster/bgworker.h"
#include "storage/ipc.h"
 
/* Allow load of this module in shared libs */
PG_MODULE_MAGIC;
 
/* Entry point of library loading */
void _PG_init(void);

Also, be sure that when you create a custom background worker, signal handling is similar to what is done for normal backend process. For example, the configuration file reload should be processed if SIGHUP is received. You can do that properly by calling ProcessConfigFile in a manner similar to that in the main loop.
if (got_sighup)
{
 /* Process config file */
 ProcessConfigFile(PGC_SIGHUP);
 got_sighup = false;
 ereport(LOG, (errmsg("hello signal: processed SIGHUP")));
}

In order to bring more fluidity to you custom worker and not have it use all the CPU of your server by running continuously, don’t forget to define a latch to control some sleep period of your worker. It can be defined with that:
/* The latch used for this worker to manage sleep correctly */
static Latch signalLatch;

Then when entering in the main loop process, initialize the latch with that:
InitializeLatchSupport();
InitLatch(&signalLatch);

Finally you need to set up your main loop to use the latch
while (true)
{
 int rc;
 
 /* Wait 1s */
 rc = WaitLatch(&signalLatch,
   WL_LATCH_SET | WL_TIMEOUT | WL_POSTMASTER_DEATH,
   1000L);
 ResetLatch(&signalLatch);
 
 /* Emergency bailout if postmaster has died */
 if (rc & WL_POSTMASTER_DEATH)
  proc_exit(1);
 
 [... code for signal handling ...]
}

You can also set up the latch such as the sleep will stop immediately if a signal is received. Simply add the following call in hello_sighup and hello_sigterm to do that.
SetLatch(&signalLatch);

This code can be found on Github as repository pg_workers. I created it to group all the bgworker examples I wrote using the facility of PostgreSQL 9.3 and above. You can find the example presented in this post in the folder hello_signal.

PostgreSQL extensibility is awesome. With things like extensions or custom worker backgrounds, there are many ways for a PostgreSQL developer to create modules without having to touch a single line of the core code at all. Among those tools, PostgreSQL contains a set of hooks that can be used to plug customized code at certain points of the server processing. Hooks are not documented at all, so if you want to know more about them you need either to have a look directly at the PostgreSQL code or to read the slides of the presentation about hooks given by Guillaume Lelarge at PGcon 2012.
Personally I recommend the latter, Guillaume’s presentation being really good.

You can do many things with hooks, like creating a custom planner, outputting a custom EXPLAIN, running some personalized versions of utilities or control query processing at execution level. Hooks can be loaded as shared libraries using shared_preload_libraries or with a simple LOAD command for current session. In this post, I am going to show a restriction on DROP DATABASE using the hook for utility processing. In my case a given database “foodb” can only be dropped by a given user “foo”, our godly super-superuser.

When implementing a hook, here is how it should look for the basics.
#include "postgres.h"
#include "miscadmin.h"
#include "tcop/utility.h"
 
PG_MODULE_MAGIC;
 
void _PG_init(void);
void _PG_fini(void);
 
static char *undroppabledb = "foodb";
static char *supersuperuser = "foo";
static ProcessUtility_hook_type prev_utility_hook = NULL;
 
static void dbrestrict_utility(Node *parsetree,
  const char *queryString,
  ParamListInfo params,
  DestReceiver *dest,
  char *completionTag,
  ProcessUtilityContext context);

_PG_init and _PG_fini are respectively executed when the library is loaded and unloaded. PG_MODULE_MAGIC is necessary to define a PostgreSQL module in the case where it is loaded by server. dbrestrict_utility will be the function used instead of standard_ProcessUtility in utility.c.

Here is what you need to do with _PG_init and _PG_fini to install and uninstall correctly the hook.
void
_PG_init(void)
{
 prev_utility_hook = ProcessUtility_hook;
 ProcessUtility_hook = dbrestrict_utility;
}
void
_PG_fini(void)
{
 ProcessUtility_hook = prev_utility_hook;
}

The previous hook pointer is saved in a static variable to avoid any conflicts once the library is unloaded.

Then here is dbrestrict_utility, which performs the block on DROP DATABASE.
static
void dbrestrict_utility(Node *parsetree,
  const char *queryString,
  ParamListInfo params,
  DestReceiver *dest,
  char *completionTag,
  ProcessUtilityContext context)
{
 /* Do our custom process on drop database */
 switch (nodeTag(parsetree))
 {
  case T_DropdbStmt:
  {
   DropdbStmt *stmt = (DropdbStmt *) parsetree;
   char *username = GetUserNameFromId(GetUserId());
 
   /*
    * Check that only the authorized superuser foo can
    * drop the database undroppable_foodb.
    */
   if (strcmp(stmt->dbname, undroppabledb) == 0 &&
     strcmp(username, supersuperuser) != 0)
    ereport(ERROR,
     (errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
      errmsg("Only super-superuser \"%s\" can drop database \"%s\"",
      supersuperuser, undroppabledb)));
   break;
  }
  default:
   break;
 }
 
 /* Fallback to normal process */
 standard_ProcessUtility(parsetree, queryString, params, dest,
  completionTag, context);
}

An important thing you should do as much as possible: always provide a safe exit by calling the function hook replaces at the end or the beginning of the new function to avoid weird behaviors. In the case of my example not calling standard_ProcessUtility would have resulted in blocking all the utilities… Note that this is of course not mandatory, just be sure about what you do as a hook not correctly coded can break easily a server.

Finally define a Makefile like this one and install the library (the file containing source code is called dbrestrict.c).
MODULES = dbrestrict
PG_CONFIG = pg_config
PGXS := $(shell $(PG_CONFIG) --pgxs)
include $(PGXS)

OK, now let’s test this feature with a couple of superusers.
postgres=# CREATE ROLE foo SUPERUSER LOGIN;
CREATE ROLE
postgres=# CREATE ROLE foo2 SUPERUSER LOGIN;
CREATE ROLE
postgres=# \c postgres foo2
You are now connected to database "postgres" as user "foo2".
postgres=# CREATE DATABASE foodb; -- before loading restriction
CREATE DATABASE
postgres=# DROP DATABASE foodb;
DROP DATABASE
postgres=# LOAD 'dbrestrict.so';
LOAD
postgres=# CREATE DATABASE foodb; -- after loading restriction
CREATE DATABASE
postgres=# LOAD 'dbrestrict.so';
LOAD
postgres=# DROP DATABASE foodb;
ERROR: Only super-superuser "foo" can drop database "foodb"

Note that superuser “foo2″ is not able to drop the database “foodb” once restriction has been loaded.

However user “foo” can drop it freely.
postgres=# \c postgres foo
You are now connected to database "postgres" as user "foo".
postgres=# LOAD 'dbrestrict.so';
LOAD
postgres=# DROP DATABASE foodb;
DROP DATABASE

As LOAD command is session-based, I had to reload the restriction library each time a reconnection to server was done, but you can make this change permanent by setting shared_preload_libraries appropriately in postgresql.conf.

Feel free to play with the code, it is attached to this post as dbrestrict.tar.gz.
See ya~

A new set of APIs for foreign data wrappers has been added to allow writable operations on foreign sources. This feature has been committed by Tom Lane a couple of days ago.
commit 21734d2fb896e0ecdddd3251caa72a3576e2d415
Author: Tom Lane
Date: Sun Mar 10 14:14:53 2013 -0400
 
Support writable foreign tables.
 
This patch adds the core-system infrastructure needed to support updates
on foreign tables, and extends contrib/postgres_fdw to allow updates
against remote Postgres servers. There's still a great deal of room for
improvement in optimization of remote updates, but at least there's basic
functionality there now.
 
KaiGai Kohei, reviewed by Alexander Korotkov and Laurenz Albe, and rather
heavily revised by Tom Lane.

Based on the documentation, the implementation is still very basic as nothing is done with clause shippability. Just to give some notions about that: roughly a clause in a SELECT query (LIMIT, OFFSET, GROUP BY, HAVING, ORDER BY, etc.) is shippable if this clause can be entirely evaluated on remote server, making less processing happening on local server, and reducing the tuple selectivity. A direct consequence of clause shippability limitation is that UPDATE and DELETE queries can take quite a long time if they are run on many rows because query is run in two steps:

  • Scan remote table and fetch back to local server the tuples to be manipulated
  • Process UPDATE or DELETE based on the tuples fetched

INSERT does not need such scan as in this case new data is simply sent to the remote table, the tuple values being computed before sending the query (even for immutable functions). Not really performant but it is the safest approach. Postgres-XC has similar and more advanced features for foreign DDL planning and execution in its core (some of them implemented by me), have a look for example at this article I wrote a while ago.

It is possible to test writable foreign tables with postgres_fdw as it has been extended to support this new feature. So let’s give it a try with two postgres servers using ports 5432 and 5433. Server with port 5432 has postgres_fdw installed and will interact with the remote server running under port 5433. In order to get the basics of postgres_fdw, you can refer to this article written a couple of weeks ago.

Now, it is time to test the feature. First let’s create a table on remote server.
$ psql -p 5433 -c "CREATE TABLE aa_remote (a int, b int)" postgres
CREATE TABLE

Then it is necessary to create a foreign table on the local server.
postgres=# CREATE SERVER postgres_server FOREIGN DATA WRAPPER postgres_fdw OPTIONS (host 'localhost', port '5433', dbname 'postgres');
CREATE SERVER
postgres=# CREATE USER MAPPING FOR PUBLIC SERVER postgres_server OPTIONS (password '');
CREATE USER MAPPING
postgres=# CREATE FOREIGN TABLE aa_foreign (a int, b int) SERVER postgres_server OPTIONS (table_name 'aa_remote');
CREATE FOREIGN TABLE

Then let’s test the new feature by performing some DML operations on the foreign table from local server.
postgres=# INSERT into aa_foreign values (1,2);
INSERT 0 1
postgres=# select * from aa_foreign;
a | b
---+---
1 | 2
(1 row)
postgres=# update aa_foreign set b = 3;
UPDATE 1
postgres=# select * from aa_foreign;
a | b
---+---
1 | 3
(1 row)

Everything is going well on local side, and on remote side what happened?
$ psql -p 5433 -c 'SELECT * FROM aa_remote' --dbname postgres
a | b
---+---
1 | 3
(1 row)

So the data of the remote table has been correctly changed from local server.

Just before the tests, I explained that a scan is done for UPDATE and DELETE before actually running the DML, you can get more details about that with EXPLAIN.
postgres=# explain verbose update aa_foreign set b = 3;
QUERY PLAN
-----------------------------------------------------------------------------------
Update on public.aa_foreign (cost=100.00..182.27 rows=2409 width=10)
 Remote SQL: UPDATE public.aa_remote SET b = $2 WHERE ctid = $1
 -> Foreign Scan on public.aa_foreign (cost=100.00..182.27 rows=2409 width=10)
  Output: a, 3, ctid
  Remote SQL: SELECT a, NULL, ctid FROM public.aa_remote FOR UPDATE
(5 rows)

In the case of postgres_fdw, selectivity of tuple is done with ctid of tuple, which ensures tuple uniqueness. Note that if you implement your own foreign data wrapper, you might need to use columns having primary keys for selectivity of tuples.

There are also a couple of things to be aware of when using this feature.

  • There are risk of data incompatibility for data formatted with GUC parameters. This has been mentionned in the community but try for example to manipulate servers with different settings of datesyle…
  • Transactions are open on remote server using repeatable read.
  • UPDATE and DELETE can be costly if scan is done with a good-old-fashioned sequential scan, but well that’s a known thing
  • Things I forgot…
©2010-2013 Michael Paquier All content is ©Copyright of Otacoo.com 2010-2013. Privacy Policy - Terms of Use