hstore is a PostgreSQL contrib module in core code for a pretty long time. Its code is located in contrib/hstore in source folder. It is particularly useful to store sets of key/value in a single table column.

Since Postgres 9.1, its installation needs to be done in two phases.
First install the hstore library, here done from the source code. Please note that your postgres package normally already contains it.
So after downloading the source code and installing the core, do the following commands.
cd $PG_SOURCE_ROOT
cd contrib/hstore
make install

At this point all the libraries and files related to hstore are installaed in $INSTALL_FOLDER/share/extension.
ls $INSTALL_FOLDER/share/extension
hstore--1.0--1.1.sql hstore--1.1.sql hstore--unpackaged--1.0.sql hstore.control

Then connect to your Postgres server and finish hstore installation with CREATE EXTENSION command.
postgres=# CREATE EXTENSION hstore;
CREATE EXTENSION
postgres=# \dx hstore
List of installed extensions
Name | Version | Schema | Description
--------+---------+--------+--------------------------------------------------
hstore | 1.1 | public | data type for storing sets of (key, value) pairs
(1 row)

With a psql client, ‘\dx’ allows to check the list of extensions already installed on your server.

A new column type called hstore has been added. This is the column used to store the list of key/value pairs for the table.
Let’s take a table referencing a list of products as an example.
postgres=# create table products (id serial, characs hstore);
CREATE TABLE

The insertion of data can be done with several methods. Here are some of them.
postgres=# -- common insertion
postgres=# INSERT INTO products(characs) VALUES ('author=>Dave, date=>"Dec 2012", price=>"500", currency=>"dollar"');
INSERT 0 1
postgres=# -- array-based insertion
postgres=# INSERT INTO products (characs) VALUES (hstore(array['author','date','stock'],array['Mike','Nov 2012','200']));
INSERT 0 1
postgres=# -- single-pair insertion
postgres=# INSERT INTO products (characs) VALUES (hstore('author','Kim')); -- single-element
INSERT 0 1
postgres=# SELECT * FROM products;
id | characs
----+----------------------------------------------------------------------------
1 | "date"=>"Dec 2012", "price"=>"500", "author"=>"Dave", "currency"=>"dollar"
2 | "date"=>"Nov 2012", "stock"=>"200", "author"=>"Mike"
3 | "author"=>"Kim"
(3 rows)

Based on the existing fields, it is also possible to add or update values for a given key using a concatenate-based method.
Here is the update of a key “author”.
postgres=# UPDATE products SET characs = characs || 'author=>Sarah'::hstore where id = 1;
UPDATE 1
postgres=# select * from products where id = 1;
id | characs
----+-----------------------------------------------------------------------------
1 | "date"=>"Dec 2012", "price"=>"500", "author"=>"Sarah", "currency"=>"dollar"
(1 row)

If the key updated is not present in existing list, it is simply added as a new element.

You can also delete single elements.
postgres=# UPDATE products SET characs = delete(characs,'price') where id = 1;
UPDATE 1
postgres=# select * from products where id = 1;
id | characs
----+-------------------------------------------------------------
1 | "date"=>"Dec 2012", "author"=>"Sarah", "currency"=>"dollar"
(1 row)

SELECT query can also use key-based scan. Here for instance this query looks for the products with less than 300 stocks.
postgres=# SELECT id FROM products WHERE (characs->'stock')::int <= 300;
id
----
2
(1 row)

(OK, not the best way of doing for your application but this is just a scholar example!).

On top of that, hstore also supports gin and gist indexes for the operators @>, ?, ?& and ?, as well as btree and hash indexes for '='.
postgres=# create index products_index on products(characs);
CREATE INDEX

Hope this gives a good introduction to hstore.

Last week, I had an interesting discussion in the Postgres hackers mailing list about integrating pg_reorg features (possibility to reorganize a table without locks on it) directly into postgres core. Community strongly suggested that pg_reorg cannot be integrated as-is in the contribution modules of postgres core, and instead postgres should provide native ways to reorganize a table without taking heavy locks. This means that a table could be reindexed or clustered, and at the same time read and writes operations could still happen in parallel. What is particularly useful when an index is broken in a production database, as you could keep your table free of access for the other sessions running while the table is reorganized.

So, the following suggestions have been made:

  • Implementation of CLUSTER CONCURRENTLY
  • Implementation of REINDEX CONCURRENTLY
  • ALTER TABLE CONCURRENTLY
  • Extend autovacuum to perform REINDEX and CLUSTER in parallel automatically

ALTER TABLE, CLUSTER and REINDEX share a common thing: they need high-level locks to be performed. So there is a risk that the table being manipulated by one of those operations could not be accessible for a long time, especially of the table is huge. The locks taken would block read and/or write operations for the other sessions, what is not acceptable for production environment if a critical table is touched.

Working on ALTER TABLE might be a huge piece of work, CLUSTER and REINDEX look more accessible. So I took some week-end spare time while a typhoon was on Tokyo area to write some code and studied the case of REINDEX CONCURRENTLY. And I finished with a patch, yeah!

Here are more details about the feature proposed…
You can rebuild a table or an index concurrently with such commands:
REINDEX INDEX ind CONCURRENTLY;
REINDEX TABLE tab CONCURRENTLY;

REINDEX CONCURRENTLY has the following restrictions:

  • REINDEX [ DATABASE | SYSTEM ] cannot be run concurrently.
  • REINDEX CONCURRENTLY cannot run inside a transaction block.
  • Shared tables cannot be reindexed concurrently
  • indexes for exclusion constraints cannot be reindexed concurrently.
  • toast relations are reindexed non-concurrently when table reindex is done and that this table has toast relations

Here are more details about the algorithm used. Roughly, a secondary index is created in parallel of the first one, it is completed. Then the old and fresh indexes are switched. For a more complete description (the beginning of the process is similar to CREATE INDEX CONCURRENTLY):

  1. creation of a new index based on the same columns and restrictions as the index that is rebuilt (called here old index). This new index has as name $OLDINDEX_cct. So only a suffix _cct is added. It is marked as invalid and not ready
  2. Take session locks on old and new index(es), and the parent table to prevent unfortunate drops
  3. Commit and start a new transaction
  4. Wait until no running transactions could have the table open with the old list of indexes
  5. Build the new indexes. All the new indexes are marked as indisready
  6. Commit and start a new transaction
  7. Wait until no running transactions could have the table open with the old list of indexes
  8. Take a reference snapshot and validate the new indexes
  9. Wait for the old snapshots based on the reference snapshot
  10. mark the new indexes as indisvalid
  11. Commit and start a new transaction. At this point the old and new indexes are both valid
  12. Take a new reference snapshot and wait for the old snapshots to insure that old indexes are not corrupted,
  13. Mark the old indexes as invalid
  14. Swap new and old indexes, consisting here in switching their names.
  15. Old indexes are marked as invalid.
  16. Commit and start a new transaction
  17. Wait for transactions that might use the old indexes
  18. Old indexes are marked as not ready
  19. Commit and start a new transaction
  20. Drop the old indexes

This feature will be normally submitted for review to the PostgreSQL 9.3 commit fest. For the time being patch has been given to community.

Some technical details…

  • A new set of functions has been created in index.c to manage concurrent operations.
  • Code is relying a maximum on existing index creation, building and validation functions for maintainability.
  • Documentation, as well as regression tests have been added in the first version of the patch.
  • Concurrent operations are longer, require additional CPU, IO and memory but they are lock free. The parent relation and indexes cannot be dropped during process.
  • If an error occurs during process, the table will finish with invalid indexes (marked with suffix _cct in their names). It is the responsability of the user to drop them.
  • If you are looking for the patch, have a look here.

Please note that those specification notes are based on the first version of the patch proposed, and are subject to change depending on the community and reviewers’ feedback.

Edit 2012/10/14: A new version of the patch has been submitted with the following enhancements:

  • Support for toast relations to be reindexed concurrently as well as other indexes
  • Correction of drop behavior for constraint indexes
  • Correction of bugs
  • Support for exclusion constraints, looks to work as far as tested

The patch has been submitted to pgsql-hackers in this email.

pg_reorg is a postgresql module developped and maintained by NTT that allows to redistribute a table without taking locks on it.
The code is hosted by pg_foundry here.
However, pgfoundry uses CVS :( , so I am also maintaining a fork in github in sync with pgfoundry here.

What pg_reorg can do for you is to reorganize a whole table in the same fashion way as a CLUSTER or a VACUUM FULL, while allowing write operations on the table being reorganized at the same time. No locks are needed.

Once you have downloaded the code, you just need to install it on your server.
cd $CODE_FOLDER
make install

Then install the EXTENSION module (for version upper than 9.1) after connecting to the postgres server.
CREATE EXTENSION pg_reorg;

Then, it is possible to perform several types of operations.
CLUSTER reorganization on the table $TABLE.
pg_reorg --dbname $DATABASE -t $TABLE
VACUUM FULL reorganization on the table $TABLE.
pg_reorg --dbname $DATABASE -t $TABLE -n
Reorganization of an entire database.
pg_reorg --dbname $DATABASE

The main limitation of this utility is that table being redistributed needs to have a primary key or a non-null unique key.

Then, a little bit more about the technique it uses to reorganize the table.
Basically, a temporary copy of the table to be redistributed is created using a CREATE TABLE AS query. The CTAS query definition is changed depending on the distribution user wants. For example, if user wants a redistribution using a different column (option -o), the CTAS is completed with an ORDER BY clause on the wanted column. The indexes of the new table depend on what the user wants.

Then the following operations are done.

  • creation of triggers to register all the DMLs that occur on the former table to an intermediate log table
  • creation of indexes on the temporary table based on what the user wants (new column index, VACUUM FULL…)
  • Apply the logs registered during the index creation and wait for old transactions to finish
  • Swap the names between the freshly-created table and old table
  • Drop the useless objects: the old table, the old triggers and remaining objects

This functionality is particularly handy when you wish to reorganize a huge table. Performing a VACUUM/CLUSTER on it might take time, and your application might need this table to be accessible in write for a maximum amount of time. So pretty useful, uh?

PostgreSQL 9.2 introduces a new performance feature called Index-Only scans, which was really something missing in core for performance of scan index.
Here is the commit that introduced the feature in core.
commit a2822fb9337a21f98ac4ce850bb4145acf47ca27
Author: Tom Lane
Date: Fri Oct 7 20:13:02 2011 -0400
 
Support index-only scans using the visibility map to avoid heap fetches.
 
When a btree index contains all columns required by the query, and the
visibility map shows that all tuples on a target heap page are
visible-to-all, we don't need to fetch that heap page. This patch depends
on the previous patches that made the visibility map reliable.
 
There's a fair amount left to do here, notably trying to figure out a less
chintzy way of estimating the cost of an index-only scan, but the core
functionality seems ready to commit.
 
Robert Haas and Ibrar Ahmed, with some previous work by Heikki Linnakangas.

In a couple of words, what does this feature do?
Well, when reading data from a tuple in PostgreSQL which is part of an index, you need to perform an operation called an Index Scan.
This scan will return an index to tuples that might be part of the result. Why might? Because at the moment you are running your read query, the indexed tuples might have been modified by other transactions with a DML (INSERT, UPDATE, DELETE). As you are not sure that the data indexed is really the one you can use or not, you need to fetch the page of the table data and check if the wanted tuple row is visible to your transaction or not.

The commit message talks about “visibility map”, which is a feature implemented since PostgreSQL 8.4, which allows to keep tracking of which pages contains only tuples that are visible to all the transactions (no data modified since latest vacuum cleanup for example). What this commit simply does is to check if the page that needs to be consulted is older than the transaction running.
If page is older, it means that the tuple on this page is visible and you do not need to fetch the page and the data, improving your performance due to the data you do not fetch directly from page. This operation of skipping the page scan is called an “Index-only scan”.
If page is newer, well it means that the tuple to be consulted has been modified by another transaction and you need to fetch the latest information to protect data consistency. This is equivalent to a simple “Index Scan”.

First, let’s take a simple example to help understanding it (tests with PostgreSQL 9.2beta2).
postgres=# CREATE TABLE aa (a int, b int, c int);
CREATE TABLE
postgres=# INSERT INTO aa VALUES
postgres=# (1,generate_series(1,1000000),generate_series(1,1000000));
INSERT 0 1000000
postgres=# CREATE INDEX aa_i ON aa (a,b,c);
CREATE INDEX
postgres=# SELECT a,b,c FROM aa WHERE a = 1 order by b;

In the case of the SELECT query on table aa, the index you would instinctively define is on columns a and b. The SELECT query is performing a scan on those columns values, so it is enough to have an index on them and fetch related data directly.
However, you might also consider to define an index directly on columns a, b and c, and then use the Index-only scan feature to avoid having to fetch all the tuples in your table if not necessary. One of the disadvantages is that you create a larger index, so you should consider case by case if your performance gain is worth using this functionality or not.

Just for reference, the EXPLAIN plan changes as follows regarding the cases for the two cases.
postgres=# EXPLAIN SELECT a,b,c FROM aa WHERE a = 1 ORDER BY b;
QUERY PLAN
----------------------------------------------------------------------------------
 Index Only Scan using aa_indexonly on aa (cost=0.00..169.29 rows=5000 width=12)
  Index Cond: (a = 1)
(2 rows)
postgres=# SET enable_indexonlyscan TO false;
SET
postgres=# EXPLAIN SELECT a,b,c FROM aa WHERE a = 1 ORDER BY b;
QUERY PLAN
--------------------------------------------------------------------------
 Index Scan using aa_i on aa (cost=0.00..45416.85 rows=1000000 width=12)
  Index Cond: (a = 1)
(2 rows)

enable_indexonlyscan is a switch that can be used to control this feature. The difference between the two plans is the use of the keyword “Only”.

Then, what about the performance gain with this feature? Let’s use the example above of table aa with 10,000,000 rows inserted. (Note: a scan on so many tuples is not recommended in an application, this example is only used to show the performance of Index-only scans),
postgres=# insert into aa values
postgres=# (1,generate_series(1,10000000),generate_series(1,10000000));
INSERT 0 10000000
postgres=# vacuum;
VACUUM
postgres=# EXPLAIN ANALYZE SELECT a,b,c FROM aa WHERE a = 1 ORDER BY b;
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------
 Sort (cost=61745.60..61870.60 rows=50000 width=12) (actual time=8108.138..9426.384 rows=10000000 loops=1)
  Sort Key: b
  Sort Method: external sort Disk: 215064kB
  -> Bitmap Heap Scan on aa (cost=1175.15..56985.69 rows=50000 width=12) (actual time=1113.548..2937.281 rows=10000000 loops=1)
    Recheck Cond: (a = 1)
    -> Bitmap Index Scan on aa_i (cost=0.00..1162.65 rows=50000 width=0) (actual time=1111.961..1111.961 rows=10000000 loops=1)
      Index Cond: (a = 1)
Total runtime: 9849.555 ms
(8 rows)
postgres=# SET enable_indexonlyscan TO true;
SET
postgres=# EXPLAIN ANALYZE SELECT a,b,c FROM aa WHERE a = 1 ORDER BY b;
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------
 Index Only Scan using aa_i on aa (cost=0.00..329041.35 rows=10000097 width=12) (actual time=0.039..1701.827 rows=10000000 loops=1)
  Index Cond: (a = 1)
  Heap Fetches: 0
Total runtime: 2092.925 ms
(4 rows)

It took 5 times less to perform the scan on the whole table by scanning only index, and no tuples have been fetched in the case of Index-only scan.
Once again and to conclude this post, this feature is a great performance achievement. But never forget to consider the balance between creating larger indexes and the performance Index-Only scans will make you gain.

19th and 20th of May have been days of the PostgreSQL conference.
During those 2 days, I saw some good presentations and stuff like that may help (perhaps) in increasing my own database knowledge related to PostgreSQL.
By the way, for sure, what I heard from this conference will help me not only for my current work but also it could give ideas for future design tasks.

As a lucky one, there were never two presentations happening at the same time even if conference was on 3 tracks. So let’s take chronologically each presentation I had the chance to see. In this post I don’t give my impression about everything seen, but just on the main matters that I think have a relative importance to facilitate your lecture.

On the 19th I saw first a presentation about Sharding for unlimited growth, given by Robert Treat. Sharding is a technique that could bring scale growth to large database systems (millions of operations, users) through an horizontal scaling (scaling in/out, increase the number of database nodes in terms of servers, and not in term of local resources). The idea behind it is to try to bring to a database the possibility to grow without losing its scalability and resolve SPOF (single point of failure) problems within a database system. For this purpose a couple of solutions were proposed based on data mapping or on the division of applications data into various databases located on multiple nodes (for example a website application may have its user data and forum data on separated nodes). All the ideas based their assumptions on making the application taking care of data mapping, so the database does not need to do anything but just deal with data. So I would say that Sharding is an up-layer of a database application that is focused on the optimization of applications running on top of database node(s).

Then there was an interesting presentation about the review of patches by Stephen Frost. The goal of this presentation was to teach the attendance about all the tools and formats used within and for PostgreSQL. Useful stuff such as when you want to send a patch, who you should contact. If you want to help in reviewing a patch, you can contact such or such person. This presentation told also about the formatting used in PostgreSQL: code refactoring, code quality, code duplication. Still, if a project has no such an organization, it for sure can become a mess quickly. So I personally keep a good impression about it.

The first day was full of surprises, there were another 2 presentations that caught my attention: something about Foreign data wrapper and another by Tom Lane, “How to hack the planner”.
Always by being focused on what I do for Postgres-XC, I am not very familiar with the functionalities introduced since 9.0. So it was a pleasure to find a presentation that introduced the foreign data wrapper functionality and some additional stuff a Japanese functionality is developing based on the feature of 9.0. A foreign data wrapper adds functionalities to enable a Postgres server to interact with a remote database or remote data files and show it in a nice way in your PostgreSQL instance. For instance you can show cvs files stored somewhere directly on a psql terminal. By the way, the presentation by Yotaro Nakayama shew a couple of additional features for foreign data wrappers: the capacity to interact with additional database systems and not only Postgres instances. His team has developed some extra features to be able to create foreign tables that can be seen from Oracle or MySQL instances. This consists more or less of taking into account the specificities used in each db softwares and to translate them in a Postgres-way. Fascinating. For the impression I keep, it looked that the development was at a fairly advanced stage but it wasn’t in the plans of Nakayama’s team to release publicly the work done :( .

By the end of the day came a presentation about PostgreSQL planner. For sure the presentation which was the most difficult to access to not only by the level of understanding which is necessary to understand what is dealt about but also by the quantity of information that has been discussed about. So in two terms this presentations can be qualified as: qualitative and quantitative. The planner of PostgreSQL is perhaps the hardest part of the code in terms of complexity, so making a presentation about it is even more complex. The presentation begun with some general explanation about Postgres’ parser/rewriter/planner/executor but after a couple of minutes quickly came the main dish, and the audience became aware of how planner is complicated not by its general way of working, but by all the cases that have to be taken into account in the most generalized way in their implementation to increase dependencies between each case. However, some cases such as the analysis of JOIN planning made the comprehension even easier. Some general explanation about the key structures also came at the good time to light up the basics of planning. The part that personally caught the most my attention was about costs planning of queries, and particularly the fact that sometimes a cost estimation could lead to cost higher than expected (case of LIMIT). However, to conclude on it, there are still areas of improvement of planner and Postgres is in need of people who could work on it.

The second day, one presentation in particular caught my attention. PostgreSQL 9.1 introduces SSI level, serializable snapshot isolation. One result is particular is amazing… Let’s tell more about that. In serializable transactions, you have to take care of cycles of transactions due to their read/write conflicts. For example, let’s imagine that you have a transaction performing a read on a tuple being written (by DML, UPDATE, DELETE, INSERT) by another transaction, you need to check if the transaction performing the write does not perform a read on a tuple being modified by a third transaction… This continuing until you know that there are no transactions trying to perform a read on something that has been modified by the first transaction. If you have a read/write conflict cycle, you need to abort one transaction to break the cycle and save all the other transactions from a deadlock condition. However, in order to check that, you have to go through all the transactions that could enter in the cycle, which is really resource consuming. By the way, the idea that caught my mind was that you do not need to check that all the cycle of transactions. You just need to check if the transaction you are on has not at the same time an in and an out read/write conflict. An in read/write conflict means that your transaction reads something that is being updated. An out read/write conflict means that what is updated by your transaction is not read. In case your transaction has at the same time and in and out read/write conflict, you need to abort something on the cycle it is on. However, if such a check is made on each transaction, doesn’t it increase the number of transaction being aborted, as there could be transactions in a semi cycle not closed, what would not need to be aborted, but would be aborted to satisfy the SSI check?

©2010-2013 Michael Paquier All content is ©Copyright of Otacoo.com 2010-2013. Privacy Policy - Terms of Use