Based on a 2nd version of the patch implementing MVCC catalog access, here is a second set of results after the first try done last week.

The same tests as last week are done, aka the backend startup time and the CREATE/DROP of multiple objects. The formula used to calculate performance comparison is the same as last week. The vanilla measurement use SnapshotNow, while MVCC uses the MVCC catalogs (Oh surprise!).


Series of DROP/CREATE
CREATE (ms) DROP (ms)
Connections MVCC vanilla % perf MVCC vanilla % perf
250 25685.094 24846.060 3.32 31492.377 30391.713 3.55
500 28557.882 24339.449 15.94 33673.266 30084.741 11.25
1000 32210.093 26007.960 21.30 36754.888 31019.918 16.92
2000 40374.754 26900.324 40.05 43442.528 30741.989 34.24
Backend startup time
Connections MVCC real CPU (s) vanilla real CPU (s) % perf
250 9.067 8.993 0.82
500 9.034 9.004 0.33
1000 9.303 9.072 2.15
2000 9.916 9.257 6.87

Except for the numbers of backend startup case for 500 connections which look to be victim of some noise, performance degradation is acceptable for the backend test, topping at 6% for 2000 connections. During the first battery of tests, this reached 10% of performance degradation.

For the CREATE/DROP test, performance drops up to 40% for 2000 connections which is a little bit worse than the 32% degradation noticed after the 1st battery of tests.

Still, all those results look to be acceptable even for a high number of connections.

A couple of days ago I saw an email about MVCC catalog access on the PostgreSQL hackers mailing list. When evaluating if a given heap tuple of a table is valid or not, PostgreSQL uses a transaction snapshot which is more or less a list of active transactions IDs, but this snapshot can be used in many flavors to check if the heap tuple satisfies a given visibility condition or not. One of the tuple validation methods using snapshots is called SnapshotNow which is not MVCC-safe. Why? Because the validation using SnapshotNow includes the effects of all the committed transactions as of the instant when validation is done, while MVCC validation considers the transactions that were in progress or unstarted when snapshot was taken as uncommitted.

Validation with SnapshotNow is good for performance as it limits the interactions within backends when operations are done on system tables, and consistency is protected by the necessary lock levels taken when performing a DDL operation.

Trying to solve such problems is particularly interesting for operations like concurrent DDLs (and even for other things) that need to be performed with low level locks, meaning that such DDLs can run while other database operations like common write/read process on a table. The first set of features coming to my mind that could be implemented on top of MVCC catalogs are: REINDEX CONCURRENTLY, CLUSTER CONCURRENTLY and ALTER TABLE CONCURRENTLY. Note that those operations can now be performed out-of-the-box up to a certain level with extensions like pg_reorg or pg_repack, but even those operations need to take an exclusive lock on the objects reorganized when performing switch on the old and new objects.

MVCC catalog access is a necessary milestone in making such features bullet-proof from the consistency point of view, but people are concerned about the performance impact it could have. Using the set of tests created and written by Robert Haas on the mail previously linked, I did some measurements with a 32-core machine to try to have results with large snapshots by having open transactions with up to 2000 connections. The results of those measurements are posted on the mailing list, but they are in a raw shape, not really user-friendly. So I thought that having them in a better reshaped on tables with performance comparison would be nice.

In order to test the backup startup, the following test is done.
$ time for s in `seq 1 1000`; do rm -f
bin/pgsql/master/pg_internal.init && psql -c 'SELECT 2+2' >/dev/null;
done

The measurements have the following characteristics.

  • Tests done while 250, 500, 1000 or 2000 remain open with a transaction ID acquired
  • Test 1 is the creation and drop of 100,000 within 1 backend
  • Test 2 is measure the time to open a backend and run a simple query on it with no relcache file
  • Performance difference percentage is evaluated with a simple formula: abs(nonMVCC – VMCC)/avg(nonVMCC, MVCC)

Those measurements do not really take advantage of the multi-core architecture of the mechine used as they run on a single backend, but well… That’s a good machine…

After re-treating the data…


Series of DROP/CREATE
CREATE (ms) DROP (ms)
Connections MVCC Non-MVCC % perf MVCC Non-MVCC % perf
250 26904.755 24554.849 9.133012372 32891.556 29755.146 10.01300914
500 29105.713 25872.886 11.76031059 33674.336 30434.019 10.10887582
1000 33281.246 28178.21 16.60618669 36618.166 31747.451 14.24901936
2000 39987.815 28708.095 32.83956789 43157.006 32510.057 28.14156802
Backend startup time
Connections MVCC real CPU (s) Non-MVCC real CPU (s) % perf
250 9.297 9.124 1.878291081
500 9.592 9.221 3.94408122
1000 9.746 9.302 4.661906762
2000 9.974 9.113 9.021847331

With a really high number of connections, the performance loss is close 30% for the DDL case and 10% for the connection case. Those test cases are really the worst scenarios possible when working with DDL, as you would hardly find someone dropping that many objects at the same time (btw why not?), and users would use a connection pooling facility before reaching such a number of active connections. For a *low* number of open connections, this performance loss is really low for the backend startup case. For the DDL case, a loss of 10% can be seen even with a low number of connections but just by thinking the possible functionality gain if MVCC catalogs are implemented this looks acceptable, no?

Modifying the format name of dump file in a Linux system can be made with sysctl like this.
sysctl -w kernel.core_pattern=core.%e.%p
However, making this modification command-based will not make it effective at next reboot.

In order to make the modification permanent, you need to edit the file /etc/sysctl.conf. Here the core file has the executable name %e and the process ID %p.
kernel.core_pattern = core.%e.%p

Here is a list of the possible keywords usable:

  • %p, PID of dumped process
  • %u, (numeric) real UID of dumped process
  • %g, (numeric) real GID of dumped process
  • %s, number of signal causing dump
  • %t time of dump, expressed as seconds since the Epoch, 1970-01-01 00:00:00 +0000 (UTC)
  • %h, hostname (same as nodename returned by uname(2))
  • %e, executable filename (without path prefix)
  • %c, core file size soft resource limit of crashing process (since Linux 2.6.24)
©2010-2013 Michael Paquier All content is ©Copyright of Otacoo.com 2010-2013. Privacy Policy - Terms of Use