pg_top is a monitoring tool designed for PostgreSQL in a way similar to top. When using it you can get a grab at the process activity of your server with an output similar to that:
last pid: 425; load avg: 0.07, 0.03, 0.02; up 0+00:52:08 12:22:02
1 processes: 1 sleeping
CPU states: 0.5% user, 0.0% nice, 0.1% system, 99.8% idle, 0.0% iowait
Memory: 581M used, 15G free, 18M buffers, 211M cached
Swap:
 
 PID USERNAME PRI NICE SIZE RES STATE TIME WCPU CPU COMMAND
 426 postgres 20 0 171M 5388K sleep 0:00 0.00% 0.00% postgres: postgres postgres [local] idle

The project has a dedicated GIT repository on postgresql.org as well as a mirror on github. There is also a mailing list on pgfoundry dedicated to the project.

It has many customization options:

  • Connection to a given PostgreSQL server with a certain IP, port, username, etc.
  • Possibility to monitor server from remote
  • An interactive mode with the possibility change the output script or even to kill a list of processes

pg_top supports many OSes (OSX, Linux, FreeBSD, hpux, aix). It is designed with a structure such as there is one set of generic APIs designed to get results for different purposes (process-related information mainly), the same function being written multiple times for each different OS but with the same spec. It is then decided which file to include in compilation at configure step by choosing among files named as machine/m_$OS_NAME.c.

These days I have been working on improving the user experience with pg_top, the first patch I sent consisting in improving a bit the documentation, the help message when an incorrect option is used and add support for long options. This was just based on my first impressions with this module, that looks to be powerful but difficult to apprehend for a newcomer.

However, the plan is not really to stop to that… Based on some optimizations added in the vPostgres version of pg_top, a couple of extra features have been proposed.

1. Database activity

The idea here is to query pg_stat_database and get back raw statistics based on transactions committed, rollbacked, tuples inserted, etc. Then those fields are analyzed and recompiled based on the time diff between two displays to get some TPS values. This is platform-independent as it relies entirely on pg_stat_database and libpq, so a generic function available to all the OS supported would be fine. The output could look like that:
Activity: 1 tps, 0 rollbs/s, 0 buffer r/s, 100 hit%, 42 row r/s, 0 row w/s

2. Disk space of PGDATA

Here, the plan is to get the disk space available for the partition where data folder of server is located, similarly to the output you can get with a plain df command. The result would look like that:
PGDATA disk: 48.6 GB total, 14.2 GB free
This would be OS-dependent, so an additional type of API in the set dedicated to the machine/* files would be necessary, and return an error if this is not supported for a certain OS.

3. Disk I/O

In order to do that, it is necessary to gather statistics from for example the parsing of a file like /proc/diskstats. The output could look like that:
Disk I/O: 0 reads/s, 0 KB/s, 0 writes/s, 32 KB/s
Once again this is OS-dependent, so it would need an extra generic API.

All of those additional outputs could be printed to screen using some dedicated options, making the default the layer the same as the current one.

And you, do you have other ideas about what could be added in pg_top?

On top of bringing the latest PostgreSQL version at hand inside a virtual machine, vFabric Postgres 9.2.4 includes some new scripts to facilitate the integration of replication features of community version of PostgreSQL.

Here is a list of the new functionalities:

  • Separate virtual disk for archives with a default size of 2G mounted on /var/vmware/vpostgres/9.2-archive.
  • New default parameters in postgresql.conf to support replication by default (wal_level, max_wal_senders, archive_command, archive_mode, etc.).
  • New scripts for the management of replication between vFabric Postgres nodes: slave creation, node promotion, replication monitoring. Those scripts are located in folder /opt/vmware/vpostgres/current/share.

The replication configuration that can be achieved thanks to those scripts is really simple and convenient for most of real-world applications.

  • All the slaves use streaming replication to catch up with master in asynchronous mode
  • If a slave is disconnected for a too long time from the cluster and it cannot get necessary WAL files from archives of master, a new base backup is needed. This helps in maintaining the archives at a low size level (size by the way customizable by changing the disk size of archives in the virtual machine settings) by keeping in memory only the WAL files that are needed by slaves to catch up with a master
  • Recovery default is the latest timeline. This is to ensure that a slave trying to reconnect to a node freshly promoted will catch up with the latest changes that occurred in cluster
  • Slave/slave connections are possible with cascading replication

Here are more details about the scripts implemented. They are aimed to be used only with vFabric Postgres nodes whose version is higher than 9.2.4.

run_as_replica

Transform the existing vFabric Postgres server into a slave by getting it in sync with a given root node (either slave or master), or reconnect to an existing node in the cluster. When creating a new slave on a freshly-installed vFabric Postgres 9.2.4, you need to use options -b to take a new base backup and -W to specify password to connect to remote node, recommended user being “postgres”.

A single command based on this script is enough to set up a read-only slave becoming the replica of an existing root node in order to leverage read activity of an application with load balancing through multiple virtual machines of vFabric Postgres.
$ /opt/vmware/vpostgres/current/share/run_as_replica -h $IP_MASTER -b -W -U postgres
This script registers automatically SSH authorization key of slave node on root node for archive transfer between nodes so there is no need to worry in doing additional settings for pg_hba.conf, recovery.conf, postgresql.conf or SSH on root node side. Note that as default value max_wal_senders is set to 3, so you might want to increase this value on root node when connecting setting up more slave on a given root node.
This script can also be used to reconnect an existing slave to a new root node. In the case where slave node is not able to catch up with the root node because of missing WAL archive files or because the slave node was in advance compared to the root node in term of WAL replay, a new base backup is necessary.

A generated recovery.conf looks like that:
standby_mode= 'on'
primary_conninfo = 'host=''$IP_ADDRESS'' user=''postgres'' application_name=''localhost.localdom'' password=''$PASSWORD'''
recovery_target_timeline = 'latest'
restore_command = 'scp $IP_ADDRESS:/var/vmware/vpostgres/9.2-archive/%f %p'

$IP_ADDRESS is the IP used by slave node to connect to a root node (either slave or master). $PASSWORD is the password that has been specified when calling run_as_replica.

show_replication_status

This script can be used to monitor the WAL replay activity of slave nodes connected to the node where script is launched. The following query is used on the server.
with xl as (
select pg_last_xlog_receive_location() as log_receive_position,
pg_last_xlog_replay_location() as log_replay_position)
select xl.log_receive_position as log_receive_position,
xl.log_replay_position as log_replay_position,
pg_xlog_location_diff(xl.log_receive_position,
xl.log_replay_position) as replay_delta
from xl;

Compared to the default columns of pg_stat_replication (catalog table used to report activity of replicated nodes) a new column called replay_delta is used to monitor how a given slave is late compared to a master node when replaying WAL.
When monitoring replication activity, results similar to that are obtained:
postgres@localhost:~> /opt/vmware/vpostgres/current/share/show_replication_status
sync_priority | slave | sync_state | log_receive_position | log_replay_position | receive_delta | replay_delta
---------------+--------------------+------------+----------------------+---------------------+---------------+--------------
0 | localhost.localdom | async | 0/8000000 | 0/8000000 | 0 | 0
(1 row)

The lower the values of receive_delta and replay_delta are, the closer slave node is getting to the master in term of WAL replay (replication state).

promote_replica_to_primary

This script allows promoting a given slave to master using the default settings in recovery.conf.
$ /opt/vmware/vpostgres/current/share/promote_replica_to_primary
server promoting

Other slave nodes can reconnect to a new master using run_as_replica.

archive_command

This script is used as a command for archiving WAL files (set by archive_command in postgresql.conf, kicked by vFabric Postgres server each time a WAL file is ready to be archived, or archive_timeout is reached). This script includes some checks on the existence of WAL file to be archived and some checks on the size of archive disk to ensure that only required WAL files are maintained in archives based on the size of disk available for archives. WAL files are archived if root node is part of a cluster as primary or replica.

create_replication_user

This script can be used to create a new replication user on a master node. It generates a CREATE USER query using the password asked at prompt.

Having those scripts already in place inside the vFabric Postgres appliance and RPMs has several advantages, two of them being:

  • Having unique and consistent scripts used for the management of vFabric Postgres cluster without deploying any additional and home-made tools
  • Minimizing node replication/promotion operations, only a basic understanding of internal mechanics of database server is enough: failover, reconnection and slave/master structure

As a DBA and an operator, such things make the management of clusters and applications really easier in an ESX cluster of hundreds of vFabric Postgres servers for example.

IF EXISTS and IF NOT EXISTS are clauses allowing to return a notice message instead of an error if a DDL query running on a given object already exists or not depending on the DDL action done. If a given query tries to create an object when IF NOT EXISTS is specified, a notice message is returned to client if the object has already been created and nothing is done on server side. If the object is altered or dropped when IF EXISTS is used, a notice message is returned back to client if the object does not exist and nothing is done.

Here is what simply happens when a table that exists is created:
postgres=# CREATE TABLE IF NOT EXISTS aa (a int);
CREATE TABLE
postgres=# CREATE TABLE IF NOT EXISTS aa (a int);
NOTICE: relation "aa" already exists, skipping
CREATE TABLE

Similarly, when dropping this table based on its existence.
postgres=# DROP TABLE IF EXISTS aa;
DROP TABLE
postgres=# DROP TABLE IF EXISTS aa;
NOTICE: table "aa" does not exist, skipping
DROP TABLE

Prior to 9.3, PostgreSQL already proposed this feature with many objects: tables, index, functions, triggers, language, etc. Such SQL extensions are useful when running several times the same script several times and avoiding errors on environments already installed.

9.3 introduces some new flavors of IF [NOT] EXISTS completing a bit more the set of objects already supported.

  • CREATE SCHEMA [IF NOT EXISTS]
  • ALTER TYPE ADD VALUE [IF NOT EXISTS]
  • Extension of DROP TABLE IF EXISTS such as it succeeds if the specified schema does not exists

Note also that the new materialized views are also supported with IF [NOT] EXISTS for CREATE, ALTER and DROP.

The extension of CREATE SCHEMA with IF NOT EXISTS is pretty simple. Similarly to the other objects, command succeeds if the schema already exists and a notice message about the existence of schema is sent back to client.
postgres=# CREATE SCHEMA foo;
ERROR: schema "foo" already exists
postgres=# CREATE SCHEMA IF NOT EXISTS foo;
NOTICE: schema "foo" already exists, skipping
CREATE SCHEMA

Note that subsequent schema elements cannot be used with this option.
postgres=# CREATE SCHEMA IF NOT EXISTS foo CREATE TABLE aa (a int);
ERROR: CREATE SCHEMA IF NOT EXISTS cannot include schema elements
LINE 1: CREATE SCHEMA IF NOT EXISTS foo CREATE TABLE aa (a int);

The second addition, ALTER TYPE ADD VALUE [IF NOT EXISTS] is useful in the case of enumeration types to condition the addition of new values.
postgres=# CREATE TYPE character_type AS ENUM ('warrior', 'priest', 'sorcerer');
CREATE TYPE
postgres=# ALTER TYPE character_type ADD VALUE IF NOT EXISTS 'magician';
ALTER TYPE
postgres=# ALTER TYPE character_type ADD VALUE IF NOT EXISTS 'magician';
NOTICE: enum label "magician" already exists, skipping
ALTER TYPE

The last improvement is also a nice thing to have. Here is what you could obtain prior to 9.3 when trying to use DROP TABLE IF EXISTS on a table using a schema that did not exist.
postgres=# DROP TABLE IF EXISTS foosch.foo;
ERROR: schema "foosch" does not exist

And here is what you get now:
postgres=# DROP TABLE IF EXISTS foosch.foo;
NOTICE: table "foo" does not exist, skipping
DROP TABLE

Those are definitely nice additions, especially the new extension of IF NOT EXISTS on schemas which was really missing in the existing set.

Before launching make for a raw build, Postgres does some preprocessing with configure to setup the installation based on the environment and the different options given by user. If you don’t do that before trying a make, your build will fail with a critical hit of this type:
$ make
You need to run the 'configure' program first. See the file
'INSTALL' for installation instructions.
make: *** [all] Error 1

It is important to get familiar with this way of doing before actually working or developing Postgres, so be sure to go through the notes of this post to learn one thing or two, or even complete those notes with comments at the bottom.

This process is launched with ./configure, located at the root of the code path (it can even be called outside of code folder in the case of a vpath installation) and generated directly from configure.in using autoconf. A couple of scripts located in config/ containing additional methods not in autoconf are also used when generating ./configure from ./configure.in for many things:

  • Check presence of necessary library dependencies, some of them being optional (documentation) and others mandatory as core process need them (flex, bison)
  • Check availability of some languages (perl.m4, python.m4)
  • Test C-related functionalities (c-compiler.m4, c-library.m4)
  • etc.

Note that those files are included through ./aclocal.m4 like that:
m4_include([config/c-compiler.m4])
./aclocal.m4 is included itself automatically by autoconf as default.

If you are a PostgreSQL developer, you might at some point create a fork of Postgres for a private project. In this case, you should definitely modify your project to be a maximum consistent with Postgres itself in order to facilitate merges with future versions. The preprocessing Postgres uses has few chances to change but it usually includes fixes that may be platform dependent, so be sure to fetch the fixes when they are here. Here are also some advices about how you should manage a fork regarding configure:

  • Modify in priority configure.in and not configure. Generate configure based on your modifications of configure.in.
  • When extra preprocessing is needed, add your own m4 procedures in config/ in separate files, except if they overlap with existing checks, so as if a fix happens on Postgres itself you will be able to at least detect easily if there is a conflict with what your own stuff.
  • Declare additional m4 files in ./aclocal.m4
  • Similarly, do not forget to update Makefile.global.in when adding some new variables, and use them!

Here is a short example of how you could add your own .in file with some code dedicated to configure.in understandable by autoconf.
#Some initialization
AC_INIT([mypgfork], [1.0devel], [joe@example.com])
AC_CONFIG_AUX_DIR(config)
 
# Setup a default prefix
AC_PREFIX_DEFAULT(/usr/local/psql)
 
# Addition of my own customized file
AC_CONFIG_FILES([foo.cfg.in])
 
# Option to enforce optimize to a wanted flag
PGAC_ARG_BOOL(enable, optimize, no,
  [build with optimize symbol (-O2)])
 
# supply -g if --enable-super-debug
if test "$enable_optimize" = yes; then
 CFLAGS="-O2"
fi
 
# Substitute CFLAGS value in .in files
AC_SUBST(CFLAGS)
 
# Generate the output to files
AC_OUTPUT

In this case a file called foo.cfg *is* generated… Feel free to play with this piece of code btw.

For a vanilla Postgres build, two files that are used for code make and installation are generated: src/Makefile.global.in (containing all the variables that are environment-dependent as well as all the configuration parameters that user has set when launching ./configure) and GNUMakefile.in.

When generating Postgres from raw code, there are some options in ./configure you should absolutely know about if you are a developer:

  • –prefix to define the folder where library and binaries will be installed (bin, share, lib, include).
  • –enable-cassert, to enable assertions inside a build (avoid that for a production build, this option is good only or development).
  • –enable-debug, making the code to compile with -g in CFLAGS, the default being -O2 for production builds.
  • –enable-depend, to enable automatic dependency tracking, useful to recompile all the objects affected by a header modification.

There are also some options that you might need if you develop an application based on Postgres:

  • –with-perl, to enable PL/Perl on server side
  • –with-python, to enable PL/Python on server side
  • –with-blocksize, defining the default block size used by table pages, default begin 8kB. Note that using binaries compiled with a given block size is not compatible with a existing server that has been initialized with a different block size.

Do not forget to have a look at all the options available here, configure might have many things that remained hidden to you until now.

psql is the native client of PostgreSQL widely used by application developers and database administrators on a daily-basis for common operations when interacting with a PostgreSQL server. With a full set of integrated functionalities, it is among the most popular (if not the number one) client applications in the Postgres community. If you are a Postgres nerd (highly possible if you are reading this page), you know that the feeling of discovering a new functionality in psql is close to the excitement you can have when opening a christmas present as you know that such a feature went though the strict community review process. The upcoming release 9.3 of Postgres is not an exception and brings a new useful command called \watch. This feature has been introduced by this commit:
commit c6a3fce7dd4dae6e1a005e5b09cdd7c1d7f9c4f4
Author: Tom Lane
Date: Thu Apr 4 19:56:33 2013 -0400
 
Add \watch [SEC] command to psql.
 
This allows convenient re-execution of commands.
 
Will Leinweber, reviewed by Peter Eisentraut, Daniel Farina, and Tom Lane

Close to what \g can do when replaying the last query stored in buffer, \watch offers the possibility to replay the same query with a given interval of time as follows:
\watch [ seconds ]

Compared to a wrapper on psql that would run repetitively the same query, \watch does not need to acquire a new connection each time the query is executed, saving some execution overhead. Also, the watch automatically stops if a failure occurs for the query.

Note that \watch can only be used at the end of the query you want to run.
postgres=# \watch 2 "SELECT 1"
\watch cannot be used with an empty query

This error does not appear if a query is stored in buffer.

If no query is specified it will use the latest query in buffer.
postgres=# \watch 10
Watch every 10s Thu May 2 13:15:14 2013
 
?column?
----------
1
(1 row)

For example, simply check if your server is alive using psql:
postgres=# select 1; \watch 1
?column?
----------
1
(1 row)
Watch every 1s Thu May 2 13:06:53 2013
 
?column?
----------
1
(1 row)
 
server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
The connection to the server was lost. Attempting reset: Succeeded.

Oops, the connection has been closed…

Here is an other example: check every minute the latest query that ran on server.
postgres=# SELECT datname, query, usename FROM pg_stat_activity ORDER BY query_start DESC LIMIT 1; \watch 60
datname | query | usename
----------+-----------------------------------------------------------------------------------------+----------
postgres | SELECT datname, query, usename FROM pg_stat_activity ORDER BY query_start DESC LIMIT 1; | postgres
(1 row)
 
Watch every 60s Thu May 2 13:40:49 2013
 
datname | query | usename
----------+-----------------------------------------------------------------------------------------+----------
postgres | SELECT datname, query, usename FROM pg_stat_activity ORDER BY query_start DESC LIMIT 1; | postgres
(1 row)

A last example: kill periodically backends (here every 60s) that have not been activated for a given period of time (here 30s).
postgres=# SELECT pg_terminate_backend(pid) as status FROM pg_stat_activity
postgres=# WHERE now() - state_change > interval '30 s' AND
postgres=# pid != pg_backend_pid();
status
--------
(0 rows)
 
postgres=# \watch 60
Watch every 60s Thu May 2 13:51:04 2013
 
status
--------
(0 rows)

In short, a lot of things are doable with \watch, you can automatize for example actions easily with a psql client, like the refresh of a materialized view. At least I won’t need anymore to type 50 times the same query when developping an application using Postgres or creating a new feature.

©2010-2013 Michael Paquier All content is ©Copyright of Otacoo.com 2010-2013. Privacy Policy - Terms of Use