Postgres 9.3 is going to be a great release for JSON data type. After having a look at the new functions for data generation, let’s look at the new JSON features that the commit below brings.
commit a570c98d7fa0841f17bbf51d62d02d9e493c7fcc
Author: Andrew Dunstan
Date: Fri Mar 29 14:12:13 2013 -0400
 
Add new JSON processing functions and parser API.
 
The JSON parser is converted into a recursive descent parser, and
exposed for use by other modules such as extensions. The API provides
hooks for all the significant parser event such as the beginning and end
of objects and arrays, and providing functions to handle these hooks
allows for fairly simple construction of a wide variety of JSON
processing functions. A set of new basic processing functions and
operators is also added, which use this API, including operations to
extract array elements, object fields, get the length of arrays and the
set of keys of a field, deconstruct an object into a set of key/value
pairs, and create records from JSON objects and arrays of objects.
 
Catalog version bumped.
 
Andrew Dunstan, with some documentation assistance from Merlin Moncure.

Based on stored JSON data, this commit introduces a new layer of APIs, operators and functions that can be used to manipulate and process JSON data. There are 4 new operators and 8 new functions (hopefully I counted right), so as there is a lot of content this post is only focused on the new operators.

The following set of data is used for all the examples presented in this post with some subsets of data, arrays and plain variables.
postgres=# CREATE TABLE aa (a int, b json);
CREATE TABLE
postgres=# INSERT INTO aa VALUES (1, '{"f1":1,"f2":true,"f3":"Hi I''m \"Daisy\""}');
INSERT 0 1
postgres=# INSERT INTO aa VALUES (2, '{"f1":{"f11":11,"f12":12},"f2":2}');
INSERT 0 1
postgres=# INSERT INTO aa VALUES (3, '{"f1":[1,"Robert \"M\"",true],"f2":[2,"Kevin \"K\"",false]}');
INSERT 0 1

The first operator is “->”, that can be used to fetch field values directly from JSON data. It can be used with a text value identifying the key of field.
postgres=# SELECT b->'f1' AS f1, b->'f3' AS f3 FROM aa WHERE a = 1;
f1 | f3
----+--------------------
1 | "Hi I'm \"Daisy\""
(1 row)

Multiple keys can also be used in chain to retrieve data or another JSON subset of data.
postgres=# SELECT b->'f1'->'f12' AS f12 FROM aa WHERE a = 2;
f12
-----
12
(1 row)
postgres=# SELECT b->'f1' AS f1 FROM aa WHERE a = 2;
f1
---------------------
{"f11":11,"f12":12}
(1 row)

In a more interesting way, when an integer is used as key, you can fetch data directly in a stored array, like that for example:
postgres=# SELECT b->'f1'->0 as f1_0 FROM aa WHERE a = 3;
f1_0
------
1
(1 row)

The second operator added is “->>”. Contrary to “->” that returns a JSON legal text, “->>” returns plain text.
postgres=# SELECT b->>'f3' AS f1 FROM aa WHERE a = 1;
f1
----------------
Hi I'm "Daisy"
(1 row)
postgres=# SELECT b->'f3' AS f1 FROM aa WHERE a = 1;
f1
--------------------
"Hi I'm \"Daisy\""
(1 row)

Similarly to “->”, it is possible to use either an integer or a text as key. For an integer, the key represents the position of element in an array.
postgres=# SELECT b->'f1'->>1 as f1_0 FROM aa WHERE a = 3;
f1_0
------------
Robert "M"
(1 row)
postgres=# SELECT b->'f1'->1 as f1_0 FROM aa WHERE a = 3;
f1_0
----------------
"Robert \"M\""
(1 row)

Of course, you cannot fetch data from an array using a field name.
postgres=# SELECT b->'f1'->>'1' as f1_0 FROM aa WHERE a = 3;
ERROR: cannot extract field from a non-object

As well as you cannot fetch a field using an element number.
postgres=# SELECT b->1 as f1_0 FROM aa WHERE a = 3;
ERROR: cannot extract array element from a non-array

The last 2 operators added are “#>” and “#>>”. With those ones, it is possible to fetch directly an element in an array without using a combo of the type “column->’$FIELD’->$INT_INDEX. This can make your queries far more readable when manipulating arrays in JSON.
postgres=# SELECT b#>'{f1,1}' as f1_0 FROM aa WHERE a = 3;
f1_0
----------------
"Robert \"M\""
(1 row)
postgres=# SELECT b#>>'{f1,1}' as f1_0 FROM aa WHERE a = 3;
f1_0
------------
Robert "M"
(1 row)

“#>” fetches text data in a legal JSON format, and “#>>” fetches data as plain text.

In short, those operators are good meat for brain, and nice additions for many applications.

hstore is a PostgreSQL contrib module in core code for a pretty long time. Its code is located in contrib/hstore in source folder. It is particularly useful to store sets of key/value in a single table column.

Since Postgres 9.1, its installation needs to be done in two phases.
First install the hstore library, here done from the source code. Please note that your postgres package normally already contains it.
So after downloading the source code and installing the core, do the following commands.
cd $PG_SOURCE_ROOT
cd contrib/hstore
make install

At this point all the libraries and files related to hstore are installaed in $INSTALL_FOLDER/share/extension.
ls $INSTALL_FOLDER/share/extension
hstore--1.0--1.1.sql hstore--1.1.sql hstore--unpackaged--1.0.sql hstore.control

Then connect to your Postgres server and finish hstore installation with CREATE EXTENSION command.
postgres=# CREATE EXTENSION hstore;
CREATE EXTENSION
postgres=# \dx hstore
List of installed extensions
Name | Version | Schema | Description
--------+---------+--------+--------------------------------------------------
hstore | 1.1 | public | data type for storing sets of (key, value) pairs
(1 row)

With a psql client, ‘\dx’ allows to check the list of extensions already installed on your server.

A new column type called hstore has been added. This is the column used to store the list of key/value pairs for the table.
Let’s take a table referencing a list of products as an example.
postgres=# create table products (id serial, characs hstore);
CREATE TABLE

The insertion of data can be done with several methods. Here are some of them.
postgres=# -- common insertion
postgres=# INSERT INTO products(characs) VALUES ('author=>Dave, date=>"Dec 2012", price=>"500", currency=>"dollar"');
INSERT 0 1
postgres=# -- array-based insertion
postgres=# INSERT INTO products (characs) VALUES (hstore(array['author','date','stock'],array['Mike','Nov 2012','200']));
INSERT 0 1
postgres=# -- single-pair insertion
postgres=# INSERT INTO products (characs) VALUES (hstore('author','Kim')); -- single-element
INSERT 0 1
postgres=# SELECT * FROM products;
id | characs
----+----------------------------------------------------------------------------
1 | "date"=>"Dec 2012", "price"=>"500", "author"=>"Dave", "currency"=>"dollar"
2 | "date"=>"Nov 2012", "stock"=>"200", "author"=>"Mike"
3 | "author"=>"Kim"
(3 rows)

Based on the existing fields, it is also possible to add or update values for a given key using a concatenate-based method.
Here is the update of a key “author”.
postgres=# UPDATE products SET characs = characs || 'author=>Sarah'::hstore where id = 1;
UPDATE 1
postgres=# select * from products where id = 1;
id | characs
----+-----------------------------------------------------------------------------
1 | "date"=>"Dec 2012", "price"=>"500", "author"=>"Sarah", "currency"=>"dollar"
(1 row)

If the key updated is not present in existing list, it is simply added as a new element.

You can also delete single elements.
postgres=# UPDATE products SET characs = delete(characs,'price') where id = 1;
UPDATE 1
postgres=# select * from products where id = 1;
id | characs
----+-------------------------------------------------------------
1 | "date"=>"Dec 2012", "author"=>"Sarah", "currency"=>"dollar"
(1 row)

SELECT query can also use key-based scan. Here for instance this query looks for the products with less than 300 stocks.
postgres=# SELECT id FROM products WHERE (characs->'stock')::int <= 300;
id
----
2
(1 row)

(OK, not the best way of doing for your application but this is just a scholar example!).

On top of that, hstore also supports gin and gist indexes for the operators @>, ?, ?& and ?, as well as btree and hash indexes for '='.
postgres=# create index products_index on products(characs);
CREATE INDEX

Hope this gives a good introduction to hstore.

With ImageMagick package (image manipulation library) installed on a Linux machine, it is possible to split a huge image file into smaller tiles with such kind of command:
convert -crop $WIDTHx$HEIGHT@ huge_file.png tile_%d.png

With the following parameters.

  • $WIDTH, the width of each tile splitted
  • $HEIGHT, the height of each tile splitted

Here is an example to split a file into tiles of size 16×32 pixels:
convert -crop 16x32@ huge_file.png tile_%d.png

If you came at this page, it means that you got interest in a cluster solution based on PostgreSQL.
Currently developed for version 0.9.7, Postgres-XC has been largely improved with the way cluster is being set.

Just lately, I committed this commit.
Support for dynamic pooler/session connection information cache reload
 
A new system function called pgxc_pool_reload has been added.
If called, this function reloads connection information to remote nodes
in a consistent way with the following process:
1) A lock is taken on pooler forbidding new connection requests
2) Database pools (user and database-dependant pools) are reloaded
depending on the node information located on catalog pgxc_node.
The following rules are followed depending on node connection
information modification:
- node whose node and port value is changed has its connections
dropped and this node pool is deleted from each database pool
- node deleted is deleted from each database pool
- node unchanged is kept as is. However, its index value is changed
depending on the new cluster configuration.
- node created is added to each database pool
3) Lock is released
4) Session that invocated pgxc_pool_reload signals all the other
server sessions to reconnect to pooler to allow each agent to update
with newest connection information and reload session information
related to remote node handles. This has as effect to abort current
transactions and to remove all the temporary and prepared objects
on session. Then a WARNING message is sent back to client to inform
about the cluster configuration modification.
5) Session that invocated pgxc_pool_reload reconnects to pooler by
itself and reloads its session information related to remote
node handles. No WARNING message is sent back to client to inform
about the session reload.
This operation is limited to local Coordinator and returns a boolean
depending on the success of the operation. If pooler data is consistent
with catalog information when pgxc_pool_reload is invocated, nothing is
done but a success message is returned.
 
This has the following simplifications for cluster settings:
- cluster_nodes.sql is deleted.
- a new mandatory option --nodename is used to specify the node name
of the node initialized. This allows to set up pgxc_node catalog
with the node itself. pgxc_node_name in postgresql.conf is also
set automatically.
- CREATE/ALTER/DROP node are launched on local Coordinator only, meaning
that when a cluster is set up, it is necessary to create node information
on each Coordinator and then upload this information to pooler and sessions
by invocaing pgxc_pool_reload.
 
This optimization avoids to have to restart a Coordinator when changing
cluster configuration and solves security problems related to cluster_nodes.sql
that could be edited with all types of SQL even if its first target was only NODE
DDL.

So what is behing this looooong commit text? Well, it is a feature that will simplify your life.
It is strongly related the feature called Node DDL that has been committed at the end of October. Just to recall, node DDL is a feature allowing to manage the cluster nodes with catalog tables such as you don’t have to bother about heavy settings in postgresql.conf. However, even if node DDL have been supported, it does not mean that dropping, creating or altering a node is visible to the connection pooling. You had to restart a node, increasing by that much the downtime of each Coordinators.

This commit, in one word, introduces this => pgxc_pool_reload. It is a new system function used to check whose details are described here used to reload all the server sessions and pooler connection information without having to restart a Coordinator. In other words, it simplifies the way to set up a cluster.

Now let’s enter in the main subject: the cluster setting, what can be done with the following steps:

  • Initialize the nodes with initdb
  • Create a global transaction manager and start it
  • Start up all the nodes
  • Connect to a Coordinator
  • Create all the nodes initialized with node DDL
  • Reload connection data with “select pgxc_node_reload();”

Here are a couple of details:

  • There is a new mandatory option in initdb called –nodename that is used to setup the name of the node being initialized. This is a Postgres-XC specific option. This option is used to define itself in pgxc_node catalog the node being initialized. It also sets automatically pgxc_node_name in postgresql.conf.
  • You can check the consistency of the information cached in pooler and catalogs by calling the system function pgxc_pool_check. It returns a boolean on operation success or failure.
  • The specifications of node DDL is located at those pages: CREATE NODE, DROP NODE and ALTER NODE
  • Invocating pgxc_pool_reload aborts the current transaction, and drops all the prepared and temporary objects in session. This is effective in all the session of the server
  • Node DDL run locally, so you need to launch the same node DDL on all Coordinators of the cluster. This allows more smoothness in case Coordinators view the same Datanode with different IPs.

It is also possible to manipulate cluster nodes even after initialization. It doesn’t matter how many times you change it as long as pgxc_pool_reload is used to update data cached in sessions and connection pool.

Here is also a bonus, a script that you can use to setup easily a cluster with a chosen number of Coordinators and Datanodes on a local machine. Port numbers are fixed, but it helps in trying Postgres-XC.
#!/bin/bash
#Otacoo.com
 
#Build cluster from scratch and run pg_regress
#1) Build the XC cluster: 1GTM with Coordinators (default 1) and Datanodes (default 2) defined
#2) Run pg_regress if wanted
 
#Take and check options
EXPECTED_ARGS=0
FLAG_REGRESS=0
NUM_COORDS=1
NUM_DATANODES=2
 
#Treat options
while getopts 'c:n:r' OPTION
do
  case $OPTION in
  c) #Number of Coordinators
    NUM_COORDS="$OPTARG"
    EXPECTED_ARGS=$(($EXPECTED_ARGS + 2))
    ;;
  n) #Number of Datanodes
    NUM_DATANODES="$OPTARG"
    EXPECTED_ARGS=$(($EXPECTED_ARGS + 2))
    ;;
  r) #Run regressions or not?
    FLAG_REGRESS=1
    EXPECTED_ARGS=$(($EXPECTED_ARGS + 1))
    ;;
  ?) echo "Usage: `basename $0` [-c num_coords] [-n num datanodes] [-r]\n"
    echo "Example: `basename $0` -c 4 -n 4 -r"
    exit 0
    ;;
  esac
done
 
#Check number of arguments
if [ $# -ne $EXPECTED_ARGS ]
then
  echo "Usage: `basename $0` [-c num_coords] [-n num datanodes] [-r]\n"
  echo "Example: `basename $0` -c 4 -n 4 -r"
  exit 1
fi
 
#Setup Default values
#GTM has a unique value
#Coordinator ports are mapped from 5432
#Datanode ports are mapped from 15432
#All the machines run on local host
COORD_PORT_START=5431
DN_PORT_START=15432
COORD_PORTS[1]=$COORD_PORT_START
DN_PORTS[1]=$DN_PORT_START
for i in $(seq 1 $NUM_COORDS)
do
  COORD_PORTS[$i]=$(($COORD_PORT_START + $i))
done
for i in $(seq 1 $NUM_DATANODES)
do
  DN_PORTS[$i]=$(($DN_PORT_START + $i))
done
GTM_PORT=7777
PSQL_FOLDER=$HOME/pgsql
 
#Finish calculating dependencies between folders
PSQL_SHARE=$PSQL_FOLDER/share
PSQL_BIN=$PSQL_FOLDER/bin
GTM_DATA=$PSQL_FOLDER/gtm
LOG_DATA=$PSQL_FOLDER/log
 
#Setup data folders
for i in $(seq 1 $NUM_COORDS)
do
  COORD_DATAS[$i]=$PSQL_FOLDER/coord$i
done
for i in $(seq 1 $NUM_DATANODES)
do
  DN_DATAS[$i]=$PSQL_FOLDER/datanode$i
done
 
#Kill all the processes that may remain
#in the most atrocious way possible as they meritated it
#OK this is not very clean...
echo "Take out Postgres-XC processes"
kill -9 `ps ux | grep "bin/gtm" | cut -d " " -f 2-3`
killall postgres gtm psql
sleep 2
 
#Check if data folders exist or not and create them
echo "Creating data folders"
for folder in $GTM_DATA $LOG_DATA ${COORD_DATAS[@]} ${DN_DATAS[@]}
do
  if [ ! -d $CODE_REPO_GIT ]
  then
    mkdir $folder
  fi
done
 
#Clean up all the data folders
echo "Clean up data folders"
for folder in $GTM_DATA $LOG_DATA ${COORD_DATAS[@]} ${DN_DATAS[@]}
do
  rm -r $folder/*
done
sleep 1
 
#OK, let's begin the show...
 
#make initialization
echo "Initializing PGXC nodes"
for i in $(seq 1 $NUM_DATANODES)
do
  $PSQL_BIN/initdb --locale=POSIX --nodename dn$i -D ${DN_DATAS[$i]}
done
for i in $(seq 1 $NUM_COORDS)
do
  $PSQL_BIN/initdb --locale=POSIX --nodename coord$i -D ${COORD_DATAS[$i]}
done
 
#copy all configuration files to remote machin
echo "Copy of configuration files"
#Create an empty GTM conf file and add host/port data
touch $GTM_DATA/gtm.conf
echo "nodename = 'one'" >> $GTM_DATA/gtm.conf
echo "listen_addresses = '*'" >> $GTM_DATA/gtm.conf
echo "port = 7777" >> $GTM_DATA/gtm.conf
echo "log_file = 'gtm.log'" >> $GTM_DATA/gtm.conf
 
#Node common settings
OPTIONS="logging_collector = on\n"\
"gtm_port = $GTM_PORT\n"\
"datestyle = 'postgres, mdy'\n"\
"timezone = 'PST8PDT'\n"\
"default_text_search_config = 'pg_catalog.english'\n"\
"log_statement = 'all'\n"\
"log_min_messages = debug1\n"\
"log_min_error_statement = debug1\n"\
"max_prepared_transactions = 20\n"
 
#Pooler options
POOLER_BASE_PORT=6667
#Coordinator settings
for i in $(seq 1 $NUM_COORDS)
do
  echo -e $OPTIONS >> ${COORD_DATAS[$i]}/postgresql.conf
  POOLER_NUM=$(($POOLER_BASE_PORT + $i))
  echo -e "pooler_port = $POOLER_NUM\n" >> ${COORD_DATAS[$i]}/postgresql.conf
done
#Datanode settings
for i in $(seq 1 $NUM_DATANODES)
do
  echo -e $OPTIONS >> ${DN_DATAS[$i]}/postgresql.conf
done
 
#launch gtm
echo "launch GTM"
$PSQL_BIN/gtm -x 10000 -D $GTM_DATA &
sleep 1
 
#launch datanodes
echo "launch Datanodes..."
for i in $(seq 1 $NUM_DATANODES)
do
  $PSQL_BIN/postgres -X -i -p ${DN_PORTS[$i]} -D ${DN_DATAS[$i]} > $LOG_DATA/datanode$i.log &
done
sleep 1
 
#launch coordinators
echo "launching Coordinators..."
for i in $(seq 1 $NUM_COORDS)
do
  $PSQL_BIN/postgres -C -i -p ${COORD_PORTS[$i]} -D ${COORD_DATAS[$i]} > $LOG_DATA/coord$i.log &
done
sleep 1
 
#Initialize Coordinators with cluster data
echo "initializing Coordinators..."
for i in $(seq 1 $NUM_COORDS)
do
  #Datanode connection info
  for j in $(seq 1 $NUM_DATANODES)
  do
    NODE_NAME=dn$j
    NODE_PORT=${DN_PORTS[$j]}
    $PSQL_BIN/psql -p ${COORD_PORTS[$i]} -c "CREATE NODE $NODE_NAME WITH (HOSTIP = 'localhost', NODE MASTER, NODEPORT = $NODE_PORT);" postgres
  done
  #Other Coordinator info
  for j in $(seq 1 $NUM_COORDS)
  do
    if [ "$i" -eq "$j" ]
    then
      continue
    fi
    NODE_NAME=coord$j
    NODE_PORT=${COORD_PORTS[$j]}
    $PSQL_BIN/psql -p ${COORD_PORTS[$i]} -c "CREATE NODE $NODE_NAME WITH (HOSTIP = 'localhost', COORDINATOR MASTER, NODEPORT = $NODE_PORT);" postgres
  done
  #reload data
  $PSQL_BIN/psql -p ${COORD_PORTS[$i]} -c "SELECT pgxc_pool_reload();" postgres
done
 
if [ "$FLAG_REGRESS" == 1 ]
then
  echo "running pg_regress"
  pgregress
fi
 
exit `echo $?`

You can also download it from this link.

Arrays can be created easily in PostgreSQL using the additional syntax [] when defining a column of a table.
CREATE TABLE aa (a int primary key, b int[]);
CREATE TABLE bb (a int primary key, b varchar(5)[]);

Arrays follow some special grammar. You can insert array data directly with ‘{data1,data2}’ format or by using things like ARRAY[data1,data2].
postgres=# INSERT INTO aa VALUES (1, '{1,2,3,4}');
INSERT 0 1
postgres=# INSERT INTO aa VALUES (2, ARRAY[1,2,3,4]);
INSERT 0 1
postgres=# select * from aa;
a | b
---+-----------
1 | {1,2,3,4}
2 | {1,2,3,4}
(2 rows)

An array in postgres does not have any dimension restrictions. You can create arrays with multiple dimensions if desired.
postgres=# INSERT INTO aa VALUES (3, '{{1,2},{3,4}}');
INSERT 0 1
postgres=# INSERT INTO aa VALUES (4, ARRAY[ARRAY[1,2],ARRAY[3,4]]);
INSERT 0 1
postgres=# select * from aa;
a | b
---+---------------
1 | {1,2,3,4}
2 | {1,2,3,4}
3 | {{1,2},{3,4}}
4 | {{1,2},{3,4}}
(4 rows)

A special function called array_dims allows to get dimensions of an array.
postgres=# select a, array_dims(b) from aa;
a | array_dims
---+------------
1 | [1:4]
2 | [1:4]
3 | [1:2][1:2]
4 | [1:2][1:2]
(4 rows)

An array length can be obtained by array_length.
postgres=# select array_length(b,1) from aa where a = 1;
array_length
--------------
5
(1 row)

There are another couple of useful functions like:

  • array_append, array_prepend, to add values directly to an array
  • array_cat, to assemble arrays

Here is an example.
postgres=# update aa set b = array_append(b, 5) where a = 1;
UPDATE 1
postgres=# select * from aa where a = 1;
a | b
---+-------------
1 | {1,2,3,4,5}
(1 row)

The contribution module int_array contains additional functions on integer arrays to sort elements.

The last function that looks useful for array manipulation are unnest and array_string. array_string returns data of a array as a string (Oh!) with a given separator.
postgres=# select array_to_string(b,';') from aa where a = 1;
array_to_string
-----------------
1;2;3;4;5
(1 row)

This is particularly useful for array manipulation on application side.

unnest decomposes array into single elements. This can be used to refer to foreign tables in IN clauses for example.
postgres=# select unnest(b) from aa where a = 1;
unnest
--------
1
2
3
4
5
(5 rows)
postgres=# create table cc (a int, b char(2));
CREATE TABLE
postgres=# insert into cc values (1, 'Aa'), (2, 'Bb'), (3, 'Cc'), (4, 'Dd'), (6, 'Ff');
INSERT 0 5
postgres=# select b from cc where a in (select unnest(b) from aa where a = 1);
b
----
Aa
Bb
Cc
Dd
(4 rows)

unnest is implemented internally since postgres 8.4. If you use an older version, you can defined it with that.
CREATE OR REPLACE FUNCTION unnest(anyarray)
RETURNS SETOF anyelement AS
$BODY$
SELECT $1[i] FROM
generate_series(array_lower($1,1),
array_upper($1,1)) i;
$BODY$
LANGUAGE 'sql' IMMUTABLE;

Hope you enjoyed this post.

©2010-2013 Michael Paquier All content is ©Copyright of Otacoo.com 2010-2013. Privacy Policy - Terms of Use