Based on a 2nd version of the patch implementing MVCC catalog access, here is a second set of results after the first try done last week.

The same tests as last week are done, aka the backend startup time and the CREATE/DROP of multiple objects. The formula used to calculate performance comparison is the same as last week. The vanilla measurement use SnapshotNow, while MVCC uses the MVCC catalogs (Oh surprise!).


Series of DROP/CREATE
CREATE (ms) DROP (ms)
Connections MVCC vanilla % perf MVCC vanilla % perf
250 25685.094 24846.060 3.32 31492.377 30391.713 3.55
500 28557.882 24339.449 15.94 33673.266 30084.741 11.25
1000 32210.093 26007.960 21.30 36754.888 31019.918 16.92
2000 40374.754 26900.324 40.05 43442.528 30741.989 34.24
Backend startup time
Connections MVCC real CPU (s) vanilla real CPU (s) % perf
250 9.067 8.993 0.82
500 9.034 9.004 0.33
1000 9.303 9.072 2.15
2000 9.916 9.257 6.87

Except for the numbers of backend startup case for 500 connections which look to be victim of some noise, performance degradation is acceptable for the backend test, topping at 6% for 2000 connections. During the first battery of tests, this reached 10% of performance degradation.

For the CREATE/DROP test, performance drops up to 40% for 2000 connections which is a little bit worse than the 32% degradation noticed after the 1st battery of tests.

Still, all those results look to be acceptable even for a high number of connections.

A couple of days ago I saw an email about MVCC catalog access on the PostgreSQL hackers mailing list. When evaluating if a given heap tuple of a table is valid or not, PostgreSQL uses a transaction snapshot which is more or less a list of active transactions IDs, but this snapshot can be used in many flavors to check if the heap tuple satisfies a given visibility condition or not. One of the tuple validation methods using snapshots is called SnapshotNow which is not MVCC-safe. Why? Because the validation using SnapshotNow includes the effects of all the committed transactions as of the instant when validation is done, while MVCC validation considers the transactions that were in progress or unstarted when snapshot was taken as uncommitted.

Validation with SnapshotNow is good for performance as it limits the interactions within backends when operations are done on system tables, and consistency is protected by the necessary lock levels taken when performing a DDL operation.

Trying to solve such problems is particularly interesting for operations like concurrent DDLs (and even for other things) that need to be performed with low level locks, meaning that such DDLs can run while other database operations like common write/read process on a table. The first set of features coming to my mind that could be implemented on top of MVCC catalogs are: REINDEX CONCURRENTLY, CLUSTER CONCURRENTLY and ALTER TABLE CONCURRENTLY. Note that those operations can now be performed out-of-the-box up to a certain level with extensions like pg_reorg or pg_repack, but even those operations need to take an exclusive lock on the objects reorganized when performing switch on the old and new objects.

MVCC catalog access is a necessary milestone in making such features bullet-proof from the consistency point of view, but people are concerned about the performance impact it could have. Using the set of tests created and written by Robert Haas on the mail previously linked, I did some measurements with a 32-core machine to try to have results with large snapshots by having open transactions with up to 2000 connections. The results of those measurements are posted on the mailing list, but they are in a raw shape, not really user-friendly. So I thought that having them in a better reshaped on tables with performance comparison would be nice.

In order to test the backup startup, the following test is done.
$ time for s in `seq 1 1000`; do rm -f
bin/pgsql/master/pg_internal.init && psql -c 'SELECT 2+2' >/dev/null;
done

The measurements have the following characteristics.

  • Tests done while 250, 500, 1000 or 2000 remain open with a transaction ID acquired
  • Test 1 is the creation and drop of 100,000 within 1 backend
  • Test 2 is measure the time to open a backend and run a simple query on it with no relcache file
  • Performance difference percentage is evaluated with a simple formula: abs(nonMVCC – VMCC)/avg(nonVMCC, MVCC)

Those measurements do not really take advantage of the multi-core architecture of the mechine used as they run on a single backend, but well… That’s a good machine…

After re-treating the data…


Series of DROP/CREATE
CREATE (ms) DROP (ms)
Connections MVCC Non-MVCC % perf MVCC Non-MVCC % perf
250 26904.755 24554.849 9.133012372 32891.556 29755.146 10.01300914
500 29105.713 25872.886 11.76031059 33674.336 30434.019 10.10887582
1000 33281.246 28178.21 16.60618669 36618.166 31747.451 14.24901936
2000 39987.815 28708.095 32.83956789 43157.006 32510.057 28.14156802
Backend startup time
Connections MVCC real CPU (s) Non-MVCC real CPU (s) % perf
250 9.297 9.124 1.878291081
500 9.592 9.221 3.94408122
1000 9.746 9.302 4.661906762
2000 9.974 9.113 9.021847331

With a really high number of connections, the performance loss is close 30% for the DDL case and 10% for the connection case. Those test cases are really the worst scenarios possible when working with DDL, as you would hardly find someone dropping that many objects at the same time (btw why not?), and users would use a connection pooling facility before reaching such a number of active connections. For a *low* number of open connections, this performance loss is really low for the backend startup case. For the DDL case, a loss of 10% can be seen even with a low number of connections but just by thinking the possible functionality gain if MVCC catalogs are implemented this looks acceptable, no?

Prior to PostgreSQL 9.3, there are two levels of locks allowing to control DML operations on a given set of rows for a transaction by using SELECT FOR SHARE and FOR UPDATE. Such locks taken on rows in a transaction block allow blocking INSERT/DELETE/UPDATE on those rows.
There is also a protocol between those lock levels. FOR UPDATE is equivalent to an exclusive lock on the row selected, meaning that no other backend can take a FOR UPDATE lock on the same row and waits until the other other transaction finishes. FOR SHARE means that all the other backends can take a FOR SHARE lock on those rows. No FOR UPDATE locks can be taken on rows already locked with FOR SHARE. It is also possible to use the NOWAIT option, making the server return an error if there is a wait situation.

PostgreSQL 9.3 introduces two new levels of locks: FOR KEY SHARE and FOR NO KEY UPDATE. This feature has been committed thanks to the perseverance of Alvaro Herrera after two years of effort. Really congratulations to Alvaro!
commit 0ac5ad5134f2769ccbaefec73844f8504c4d6182
Author: Alvaro Herrera
Date: Wed Jan 23 12:04:59 2013 -0300
 
Improve concurrency of foreign key locking
 
This patch introduces two additional lock modes for tuples: "SELECT FOR
KEY SHARE" and "SELECT FOR NO KEY UPDATE". These don't block each
other, in contrast with already existing "SELECT FOR SHARE" and "SELECT
FOR UPDATE". UPDATE commands that do not modify the values stored in
the columns that are part of the key of the tuple now grab a SELECT FOR
NO KEY UPDATE lock on the tuple, allowing them to proceed concurrently
with tuple locks of the FOR KEY SHARE variety.
 
Foreign key triggers now use FOR KEY SHARE instead of FOR SHARE; this
means the concurrency improvement applies to them, which is the whole
point of this patch.

The main point of this feature is to reduce lock contention for foreign key triggers, as now those ones use FOR KEY SHARE instead of FOR SHARE. Also, UPDATE commands that do not update columns related to the key of the tuple now take now a FOR NO KEY UPDATE, explaining the name of the lock. With this level of locking, UPDATE queries that do not involve columns of the tuple key can perform concurrently.

Honestly, with now 4 levels of locks, it is becoming complicated to remember which operation blocks or allows the other on the same tuple. So let’s make a couple of tests to determine what blocks what with a simple table with some data:
postgres=# CREATE TABLE aa AS SELECT 1 AS a;
SELECT 1

The test scenario is pretty simple: two client sessions trying to take a lock on the same tuple. Session 1 launches its commands first, then session 2, the goal being to see if session 2 takes the lock or waits for it.
Session 1:
BEGIN;
SELECT * FROM aa FOR $LOCK;

Then session 2 does that
SELECT * FROM aa FOR $LOCK;
$LOCK can be either FOR SHARE, FOR UPDATE, FOR NO KEY UPDATE or FOR KEY SHARE.

Here are the results:

Session 1
Locks UPDATE NO KEY UPDATE SHARE KEY SHARE
Session 2 UPDATE Waits Waits Waits Waits
NO KEY UPDATE Waits Waits Waits OK
SHARE Waits Waits OK OK
KEY SHARE Waits OK OK OK

I hope this table helps. Have fun.

Continuing with the new features planned for PostgreSQL 9.3, here are some explanations about a new COPY mode called FREEZE. This feature has been introduced by this commit.
commit 8de72b66a2edcf12c812de0a73bd50b6b7d81d62
Author: Simon Riggs
Date: Sat Dec 1 12:54:20 2012 +0000
 
COPY FREEZE and mark committed on fresh tables.
When a relfilenode is created in this subtransaction or
a committed child transaction and it cannot otherwise
be seen by our own process, mark tuples committed ahead
of transaction commit for all COPY commands in same
transaction. If FREEZE specified on COPY
and pre-conditions met then rows will also be frozen.
Both options designed to avoid revisiting rows after commit,
increasing performance of subsequent commands after
data load and upgrade. pg_restore changes later.
 
Simon Riggs, review comments from Heikki Linnakangas, Noah Misch and design
input from Tom Lane, Robert Haas and Kevin Grittner

With an additional tweak for CREATE TABLE.
commit 1f023f929702efc9fd4230267b0f0e8d72ba5067
Author: Simon Riggs
Date: Fri Dec 7 13:26:52 2012 +0000
 
Optimize COPY FREEZE with CREATE TABLE also.
 
Jeff Davis, additional test by me

This feature allows to insert rows already *frozen* during the COPY process, the same thing can be done with a VACUUM FREEZE after doing a normal COPY, but it is always nice to have the possibility to do that during the process, especially if a lot of data is loaded. Such *frozen* rows are already marked as committed at insert, which is good for performance but not that much for visibility, as the rows loaded are viewable from other sessions while being loaded. So this feature is limited to some particular conditions:

  • Table been freshly created or truncated in current subtransaction
  • No older snapshots
  • No open cursors

If one of those conditions is not satisfied, COPY will fail silently and return to a normal process.

Let’s see how this works with a simple set of data like that.
$ cat ~/desktop/data.txt
1,2
3,4
5,6
7,8

In the case of a normal COPY, you would get something like that (replace $HOME by your own local folder).
postgres=# CREATE TABLE aa (a int, b int);
CREATE TABLE
postgres=# COPY aa FROM '$HOME/desktop/data.txt' DELIMITER ',';
COPY 4
postgres=# SELECT xmin,xmax,a,b FROM aa WHERE a = 1;
xmin | xmax | a | b
------+------+---+---
687 | 0 | 1 | 2
(1 row)

Have a look at the xmin value, it has been set to the XID of the transaction used.

A COPY FREEZE needs to be done on a fresh table, so you can use it like this.
postgres=# BEGIN;
BEGIN
postgres=# CREATE TABLE aa (a int, b int);
CREATE TABLE
postgres=# COPY aa FROM '$HOME/desktop/data.txt' DELIMITER ',' FREEZE;
COPY 4
postgres=# COMMIT;
COMMIT
postgres=# SELECT xmin,xmax,a,b FROM aa WHERE a = 1;
xmin | xmax | a | b
------+------+---+---
2 | 0 | 1 | 2
(1 row)

xmin has been aggressively set to 2.

Note that this also works with TRUNCATE.
postgres=# BEGIN;
BEGIN
postgres=# TRUNCATE aa;
TRUNCATE TABLE
postgres=# COPY aa FROM '/home/ioltas/desktop/data.txt' DELIMITER ',' FREEZE;
COPY 4
postgres=# COMMIT;
COMMIT
postgres=# SELECT xmin,xmax,a,b FROM aa WHERE a = 1;
xmin | xmax | a | b
------+------+---+---
2 | 0 | 1 | 2
(1 row)

Let’s see if the conditions to perform the FREEZE are not met:
postgres=# TRUNCATE aa;
TRUNCATE TABLE
postgres=# BEGIN;
BEGIN
postgres=# COPY aa FROM '/home/ioltas/desktop/data.txt' DELIMITER ',' FREEZE;
COPY 4
postgres=# COMMIT;
COMMIT
postgres=# SELECT xmin,xmax,a,b FROM aa WHERE a = 1;
xmin | xmax | a | b
------+------+---+---
692 | 0 | 1 | 2
(1 row)

Note once again that xmin is set normally.

And it looks to be all for this new feature, have fun with it.

PostgreSQL 9.2 introduces a new performance feature called Index-Only scans, which was really something missing in core for performance of scan index.
Here is the commit that introduced the feature in core.
commit a2822fb9337a21f98ac4ce850bb4145acf47ca27
Author: Tom Lane
Date: Fri Oct 7 20:13:02 2011 -0400
 
Support index-only scans using the visibility map to avoid heap fetches.
 
When a btree index contains all columns required by the query, and the
visibility map shows that all tuples on a target heap page are
visible-to-all, we don't need to fetch that heap page. This patch depends
on the previous patches that made the visibility map reliable.
 
There's a fair amount left to do here, notably trying to figure out a less
chintzy way of estimating the cost of an index-only scan, but the core
functionality seems ready to commit.
 
Robert Haas and Ibrar Ahmed, with some previous work by Heikki Linnakangas.

In a couple of words, what does this feature do?
Well, when reading data from a tuple in PostgreSQL which is part of an index, you need to perform an operation called an Index Scan.
This scan will return an index to tuples that might be part of the result. Why might? Because at the moment you are running your read query, the indexed tuples might have been modified by other transactions with a DML (INSERT, UPDATE, DELETE). As you are not sure that the data indexed is really the one you can use or not, you need to fetch the page of the table data and check if the wanted tuple row is visible to your transaction or not.

The commit message talks about “visibility map”, which is a feature implemented since PostgreSQL 8.4, which allows to keep tracking of which pages contains only tuples that are visible to all the transactions (no data modified since latest vacuum cleanup for example). What this commit simply does is to check if the page that needs to be consulted is older than the transaction running.
If page is older, it means that the tuple on this page is visible and you do not need to fetch the page and the data, improving your performance due to the data you do not fetch directly from page. This operation of skipping the page scan is called an “Index-only scan”.
If page is newer, well it means that the tuple to be consulted has been modified by another transaction and you need to fetch the latest information to protect data consistency. This is equivalent to a simple “Index Scan”.

First, let’s take a simple example to help understanding it (tests with PostgreSQL 9.2beta2).
postgres=# CREATE TABLE aa (a int, b int, c int);
CREATE TABLE
postgres=# INSERT INTO aa VALUES
postgres=# (1,generate_series(1,1000000),generate_series(1,1000000));
INSERT 0 1000000
postgres=# CREATE INDEX aa_i ON aa (a,b,c);
CREATE INDEX
postgres=# SELECT a,b,c FROM aa WHERE a = 1 order by b;

In the case of the SELECT query on table aa, the index you would instinctively define is on columns a and b. The SELECT query is performing a scan on those columns values, so it is enough to have an index on them and fetch related data directly.
However, you might also consider to define an index directly on columns a, b and c, and then use the Index-only scan feature to avoid having to fetch all the tuples in your table if not necessary. One of the disadvantages is that you create a larger index, so you should consider case by case if your performance gain is worth using this functionality or not.

Just for reference, the EXPLAIN plan changes as follows regarding the cases for the two cases.
postgres=# EXPLAIN SELECT a,b,c FROM aa WHERE a = 1 ORDER BY b;
QUERY PLAN
----------------------------------------------------------------------------------
 Index Only Scan using aa_indexonly on aa (cost=0.00..169.29 rows=5000 width=12)
  Index Cond: (a = 1)
(2 rows)
postgres=# SET enable_indexonlyscan TO false;
SET
postgres=# EXPLAIN SELECT a,b,c FROM aa WHERE a = 1 ORDER BY b;
QUERY PLAN
--------------------------------------------------------------------------
 Index Scan using aa_i on aa (cost=0.00..45416.85 rows=1000000 width=12)
  Index Cond: (a = 1)
(2 rows)

enable_indexonlyscan is a switch that can be used to control this feature. The difference between the two plans is the use of the keyword “Only”.

Then, what about the performance gain with this feature? Let’s use the example above of table aa with 10,000,000 rows inserted. (Note: a scan on so many tuples is not recommended in an application, this example is only used to show the performance of Index-only scans),
postgres=# insert into aa values
postgres=# (1,generate_series(1,10000000),generate_series(1,10000000));
INSERT 0 10000000
postgres=# vacuum;
VACUUM
postgres=# EXPLAIN ANALYZE SELECT a,b,c FROM aa WHERE a = 1 ORDER BY b;
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------
 Sort (cost=61745.60..61870.60 rows=50000 width=12) (actual time=8108.138..9426.384 rows=10000000 loops=1)
  Sort Key: b
  Sort Method: external sort Disk: 215064kB
  -> Bitmap Heap Scan on aa (cost=1175.15..56985.69 rows=50000 width=12) (actual time=1113.548..2937.281 rows=10000000 loops=1)
    Recheck Cond: (a = 1)
    -> Bitmap Index Scan on aa_i (cost=0.00..1162.65 rows=50000 width=0) (actual time=1111.961..1111.961 rows=10000000 loops=1)
      Index Cond: (a = 1)
Total runtime: 9849.555 ms
(8 rows)
postgres=# SET enable_indexonlyscan TO true;
SET
postgres=# EXPLAIN ANALYZE SELECT a,b,c FROM aa WHERE a = 1 ORDER BY b;
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------
 Index Only Scan using aa_i on aa (cost=0.00..329041.35 rows=10000097 width=12) (actual time=0.039..1701.827 rows=10000000 loops=1)
  Index Cond: (a = 1)
  Heap Fetches: 0
Total runtime: 2092.925 ms
(4 rows)

It took 5 times less to perform the scan on the whole table by scanning only index, and no tuples have been fetched in the case of Index-only scan.
Once again and to conclude this post, this feature is a great performance achievement. But never forget to consider the balance between creating larger indexes and the performance Index-Only scans will make you gain.

©2010-2013 Michael Paquier All content is ©Copyright of Otacoo.com 2010-2013. Privacy Policy - Terms of Use