Based on my previous experience using custom background workers (new feature of PostgreSQL 9.3), here is more detailed example of bgworker doing a simple “Hello World” in the server logs.

Well, the example provided in this post does a little bit more than logging a simple “Hello World”, as logging is controlled by a loop running at a given interval of time. Also, the process can be immediately stopped even during its sleep with SIGTERM. So here is in more details how this is done.

Header files

/* Minimum set of headers */
#include "postgres.h"
#include "postmaster/bgworker.h"
#include "storage/ipc.h"
#include "storage/latch.h"
#include "storage/proc.h"
#include "fmgr.h"

This is the minimal set of header files that needs to be provided in order to have the bgworker working correctly. Note that in the case of this example, the sleep time is controlled by a Latch on the process of the background worker, explaining why latch.h and proc.h are included.

Declarations

/* Essential for shared libs! */
PG_MODULE_MAGIC;
 
/* Entry point of library loading */
void _PG_init(void);
 
/* Signal handling */
static bool got_sigterm = false;

PG_MODULE_MAGIC is absolutely necessary for libraries loaded via shared_preload_libraries or server will fail with a FATAL error. Then the only function needed in library is _PG_init, entry point to register the bgworker using the dedicated APIs. Finally a static boolean is used as a flag for SIGTERM activation.

Initialization

void
_PG_init(void)
{
 BackgroundWorker worker;
 
 /* Register the worker processes */
 worker.bgw_flags = BGWORKER_SHMEM_ACCESS;
 worker.bgw_start_time = BgWorkerStart_RecoveryFinished;
 worker.bgw_main = hello_main;
 worker.bgw_sighup = NULL;
 worker.bgw_sigterm = hello_sigterm;
 worker.bgw_name = "hello world";
 worker.bgw_restart_time = BGW_NEVER_RESTART;
 worker.bgw_main_arg = NULL;
 RegisterBackgroundWorker(&worker);
}

This portion of code is used to register the new worker. In this example bgw_start_time is set to start only once the system has reached a stable read-write state. The process is also allowed access to shared memory with the flag BGWORKER_SHMEM_ACCESS (this is used for MyProc, as this is statically included in procarray.c). Finally there are definitiosn for the functions used first as a main loop for logging of “Hello World” and for the function to kick when there is a SIGTERM on background worker. The process is requested not to restart in case of a crash.

Main loop

static void
hello_main(void *main_arg)
{
 /* We're now ready to receive signals */
 BackgroundWorkerUnblockSignals();
 while (!got_sigterm)
 {
  int rc;
  /* Wait 10s */
  rc = WaitLatch(&MyProc->procLatch,
    WL_LATCH_SET | WL_TIMEOUT | WL_POSTMASTER_DEATH,
    10000L);
  ResetLatch(&MyProc->procLatch);
  elog(LOG, "Hello World!"); /* Say Hello to the world */
 }
 proc_exit(0);
}

This is the main loop used for the background process. As long as SIGTERM is not received, the process will continue to loop and log “Hello World” every 10s, time interval being symbolized by 10000L. Note that WaitLatch can be awaken with different events: timeout of the time interval provided, Latch being set or postmaster death.

SIGTERM handler

static void
hello_sigterm(SIGNAL_ARGS)
{
 int save_errno = errno;
 got_sigterm = true;
 if (MyProc)
  SetLatch(&MyProc->procLatch);
 errno = save_errno;
}

When SIGTERM is used on the process, the sleep time previously invoked with WaitLatch is stopped with SetLatch immediately. This is controlled by flag WL_LATCH_SET that awakes the Latch when set properly.

Makefile

MODULES = hello_world
PG_CONFIG = pg_config
PGXS := $(shell $(PG_CONFIG) --pgxs)
include $(PGXS)

This is a simple Makefile. Be sure to name the file containing the code given in this post as hello_world.c. The library generated will be called hello_world.so.

Once this code is run, you will be able to see the process running with a simple ps command.
$ ps x | grep "hello"
13327 ?? Ss 0:00.00 postgres: bgworker: hello world

Be sure to set this variable in postgresql.conf to load the worker correctly.
shared_preload_libraries = 'hello_world'

PostgreSQL 9.2 has introduced a new feature related to JSON with a built-in data type. So you can now store inside your database directly JSON fields without the need of an external format checker as it is now directly inside Postgres core. The feature has been added by this commit.
commit 5384a73f98d9829725186a7b65bf4f8adb3cfaf1
Author: Robert Haas
Date: Tue Jan 31 11:48:23 2012 -0500
 
Built-in JSON data type.
 
Like the XML data type, we simply store JSON data as text, after checking
that it is valid. More complex operations such as canonicalization and
comparison may come later, but this is enough for now.
 
There are a few open issues here, such as whether we should attempt to
detect UTF-8 surrogate pairs represented as \uXXXX\uYYYY, but this gets
the basic framework in place.

A couple of system functions have also been added later to output some row or array data directly as json.
commit 39909d1d39ae57c3a655fc7010e394e26b90fec9
Author: Andrew Dunstan
Date: Fri Feb 3 12:11:16 2012 -0500
 
Add array_to_json and row_to_json functions.
 
Also move the escape_json function from explain.c to json.c where it
seems to belong.
 
Andrew Dunstan, Reviewed by Abhijit Menon-Sen.

What actually Postgres core does with JSON fields is to store them as text fields (so maximum size of 1GB) and top of that a string format check can be performed directly in core. Let’s use that in a practical use case, a table storing a list of shop items for an RPG game. In an RPG game, there are several types of items, each of them having different fields for its statistics. For example a sword will have an attack value, and a shield a defense value, the opposite being unlogical (except if the sword has a magical defense bonus and the shield some fire protection for example…). Well, what I meant is that you do not need to create multiple tables for each item type but you can possibly store this data in a unique item table thanks to the flexibility of JSON. With the format check done now in Postgres core, you do not need either to perform the string format check on application side.

postgres=# CREATE TABLE rpg_items (c1 serial, data json);
CREATE TABLE
postgres=# INSERT INTO rpg_items (data) VALUES
postgres-# ('{"name":"sword","buy":"500","sell":"200","description":"basic sword","attack":"10"}');
INSERT 0 1
postgres=# INSERT INTO rpg_items (data) VALUES
postgres-# ('{"name":"shield","buy":"200","sell":"80","description":"basic shield","defense":"7"}');
INSERT 0 1
postgres=# SELECT * FROM rpg_items;
c1 | data
----+--------------------------------------------------------------------------------------
1 | {"name":"sword","buy":"500","sell":"200","description":"basic sword","attack":"10"}
2 | {"name":"shield","buy":"200","sell":"80","description":"basic shield","defense":"7"}
(2 rows)

In case of a format error you will obtain something similar to this:
postgres=# INSERT INTO rpg_items (data) VALUES ('{"name":"dummy","buy":"200","ppo"}');
ERROR: invalid input syntax for type json
LINE 1: INSERT INTO rpg_items (data) VALUES ('{"name":"dummy","buy":...

Then, you can also manipulate existing tables and output its data to client as JSON.
postgres=# CREATE TABLE rpg_items_defense (c1 serial, buy int, sell int, description text, defense int);
CREATE TABLE
postgres=# INSERT INTO rpg_items_defense (buy, sell,description, defense)
postgres-# VALUES (200, 80, 'basic shield', 7);
INSERT 0 1
postgres=# SELECT row_to_json(row(buy,sell,description,defense)) FROM rpg_items_defense;
row_to_json
-----------------------------------------------
{"f1":200,"f2":80,"f3":"basic shield","f4":7}
(1 row)

The field names have default values generated automatically by Postgres.

Or output array values as JSON.
postgres=# CREATE TABLE rpg_items_attack(int serial, fields int[], description text);
CREATE TABLE
postgres=# INSERT INTO rpg_items_attack (fields, description) VALUES
postgres-# ('{500,200,10}','basic sword');
INSERT 0 1
postgres=# SELECT row_to_json(row(array_to_json(fields), description)) FROM rpg_items_attack;
row_to_json
----------------------------------------
{"f1":[500,200,10],"f2":"basic sword"}
(1 row)

This is of course not the only solution possible. Take care of making the good choice when designing your application.

Last week, I had an interesting discussion in the Postgres hackers mailing list about integrating pg_reorg features (possibility to reorganize a table without locks on it) directly into postgres core. Community strongly suggested that pg_reorg cannot be integrated as-is in the contribution modules of postgres core, and instead postgres should provide native ways to reorganize a table without taking heavy locks. This means that a table could be reindexed or clustered, and at the same time read and writes operations could still happen in parallel. What is particularly useful when an index is broken in a production database, as you could keep your table free of access for the other sessions running while the table is reorganized.

So, the following suggestions have been made:

  • Implementation of CLUSTER CONCURRENTLY
  • Implementation of REINDEX CONCURRENTLY
  • ALTER TABLE CONCURRENTLY
  • Extend autovacuum to perform REINDEX and CLUSTER in parallel automatically

ALTER TABLE, CLUSTER and REINDEX share a common thing: they need high-level locks to be performed. So there is a risk that the table being manipulated by one of those operations could not be accessible for a long time, especially of the table is huge. The locks taken would block read and/or write operations for the other sessions, what is not acceptable for production environment if a critical table is touched.

Working on ALTER TABLE might be a huge piece of work, CLUSTER and REINDEX look more accessible. So I took some week-end spare time while a typhoon was on Tokyo area to write some code and studied the case of REINDEX CONCURRENTLY. And I finished with a patch, yeah!

Here are more details about the feature proposed…
You can rebuild a table or an index concurrently with such commands:
REINDEX INDEX ind CONCURRENTLY;
REINDEX TABLE tab CONCURRENTLY;

REINDEX CONCURRENTLY has the following restrictions:

  • REINDEX [ DATABASE | SYSTEM ] cannot be run concurrently.
  • REINDEX CONCURRENTLY cannot run inside a transaction block.
  • Shared tables cannot be reindexed concurrently
  • indexes for exclusion constraints cannot be reindexed concurrently.
  • toast relations are reindexed non-concurrently when table reindex is done and that this table has toast relations

Here are more details about the algorithm used. Roughly, a secondary index is created in parallel of the first one, it is completed. Then the old and fresh indexes are switched. For a more complete description (the beginning of the process is similar to CREATE INDEX CONCURRENTLY):

  1. creation of a new index based on the same columns and restrictions as the index that is rebuilt (called here old index). This new index has as name $OLDINDEX_cct. So only a suffix _cct is added. It is marked as invalid and not ready
  2. Take session locks on old and new index(es), and the parent table to prevent unfortunate drops
  3. Commit and start a new transaction
  4. Wait until no running transactions could have the table open with the old list of indexes
  5. Build the new indexes. All the new indexes are marked as indisready
  6. Commit and start a new transaction
  7. Wait until no running transactions could have the table open with the old list of indexes
  8. Take a reference snapshot and validate the new indexes
  9. Wait for the old snapshots based on the reference snapshot
  10. mark the new indexes as indisvalid
  11. Commit and start a new transaction. At this point the old and new indexes are both valid
  12. Take a new reference snapshot and wait for the old snapshots to insure that old indexes are not corrupted,
  13. Mark the old indexes as invalid
  14. Swap new and old indexes, consisting here in switching their names.
  15. Old indexes are marked as invalid.
  16. Commit and start a new transaction
  17. Wait for transactions that might use the old indexes
  18. Old indexes are marked as not ready
  19. Commit and start a new transaction
  20. Drop the old indexes

This feature will be normally submitted for review to the PostgreSQL 9.3 commit fest. For the time being patch has been given to community.

Some technical details…

  • A new set of functions has been created in index.c to manage concurrent operations.
  • Code is relying a maximum on existing index creation, building and validation functions for maintainability.
  • Documentation, as well as regression tests have been added in the first version of the patch.
  • Concurrent operations are longer, require additional CPU, IO and memory but they are lock free. The parent relation and indexes cannot be dropped during process.
  • If an error occurs during process, the table will finish with invalid indexes (marked with suffix _cct in their names). It is the responsability of the user to drop them.
  • If you are looking for the patch, have a look here.

Please note that those specification notes are based on the first version of the patch proposed, and are subject to change depending on the community and reviewers’ feedback.

Edit 2012/10/14: A new version of the patch has been submitted with the following enhancements:

  • Support for toast relations to be reindexed concurrently as well as other indexes
  • Correction of drop behavior for constraint indexes
  • Correction of bugs
  • Support for exclusion constraints, looks to work as far as tested

The patch has been submitted to pgsql-hackers in this email.

After obtaining a machine with MacOS Lion in it, here are the applications that look absolutely mandatory.
This list is more a memo than anything, but you could find it useful.

  • ClamXav, a free open-source antivirus. This can be downloaded from the AppleStore. Scans are light and quick.
  • LibreOffice, great free ressource for documentation, and has no Oracle logo included with it.
  • Xcode, a free development kit for Mac applications. It includes a gcc compiler, and can be found in the AppleStore.
  • iterm(2), alternative for terminal knowing that the default terminal in MacOS is a pain to use. iterm is not fully compatible with MacOSX Lion but iterm2 can be a solution.
  • vlc, a video viewer ressource. I’m with it for a couple of years, and it has always been astonishing light and fast.
  • MacPorts, an alternative way to manage packages with command line.
  • Minecraft, no comments on this one…

And that’s all…

©2010-2013 Michael Paquier All content is ©Copyright of Otacoo.com 2010-2013. Privacy Policy - Terms of Use