Oct 21, 2016 8:14 pm EDT


This is ZomboDB v3.1. It builds on the improvements brought by v3.0, delivering much better performance under high concurrency and removing the dependency on Postgres' "visibility map".

v3.1 will also work with Citus Community and Citus Enterprise 6.0 when they are released.


If you are upgrading from a previous v3.1.0_BETA, a REINDEX is required in order to resolve possible effects of Issue #152.

Upgrade Notes

A REINDEX of all USING zombodb indexes is required after upgrading.

It's also advised that you VACUUM FREEZE all your tables before you REINDEX any ZomboDB indexes. ZomboDB stores the transaction id of every committed transaction that touched rows in the base table. Freezing the table reduces that list of transaction ids to a single entry, which is a good place to start.

Resolved Issues

  • Issue #35: The "shard_size" can now be set when using zdb_tally()
  • Issue #141: Proximity queries that use an OR list can now be highlighted
  • Issue #145: Elasticsearch index names are now forced to lower-case
  • Issue #148: Array fields that contain NULL elements can now be searched
  • Issue #143: Can now include arbitrary Elasticsearch query JSON in any ZomboDB query
  • Issue #150: Any Elasticsearch aggregation can be used via direct JSON support (zdb_json_aggregate())
  • Issue #152: (auto)vacuum won't occasionally corrupt replica shards anymore

Architectural Changes

When v3.0 needs to resolve row visibility, it crawls the entire heap (with the help of Postgres' "visibility map") to build the set of rows that are invisible to the current transaction. It then sends this list to Elasticsearch as part of aggregate and count-estimation queries so Elasticsearch can exclude those rows.

This turned out to be terrible for performance under heavy UPDATE load, for two major reasons. First, it simply takes a lot of time (and disk I/O) to walk the entire heap, especially for very large tables. Second, Elasticsearch is not particularly efficient at looking up potentially millions of terms just to exclude them from the search results.

v3.1 takes a drastically different approach that allows Elasticsearch to resolve record visibility in a horizontally-scalable manner.

ZomboDB v3.1 carefully controls document _routing in the Elasticsearch indexes such that subsequent UPDATEs to a document are placed on the same shard as the original INSERTed document. It does this by a) placing a BEFORE UPDATE trigger on each table with a ZomboDB index that tracks the ctid of the row being UPDATEd, and b) implementing a custom _bulk endpoint that looks up the _routing value for the previous ctid. The routing value is applied to the new, updated document, and is carried forward as subsequent updates occur.
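The carry-forward described above can be sketched as a small model. This is a hypothetical illustration, not ZomboDB's actual implementation: the `RoutingModel` class is an invention for the example, and the assumption that a freshly INSERTed row is routed by its own ctid is likewise illustrative. In reality the lookup happens inside ZomboDB's custom _bulk endpoint.

```python
# Hypothetical model of the routing carry-forward. A plain dict stands
# in for the Elasticsearch index; in the real system the previous-ctid
# lookup happens inside ZomboDB's custom _bulk endpoint.

class RoutingModel:
    def __init__(self):
        self.routing_by_ctid = {}  # ctid -> _routing value

    def insert(self, ctid):
        # Assume (for illustration) a freshly INSERTed row is routed
        # by its own ctid.
        self.routing_by_ctid[ctid] = ctid
        return ctid

    def update(self, old_ctid, new_ctid):
        # The BEFORE UPDATE trigger records old_ctid; the _bulk endpoint
        # looks up its _routing and applies it to the new document, so
        # every version of the row lands on the same shard.
        routing = self.routing_by_ctid[old_ctid]
        self.routing_by_ctid[new_ctid] = routing
        return routing
```

Two successive UPDATEs of a row inserted at ctid `(0,1)` keep producing the original routing value, which is what pins all versions of the row to one shard.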

As a search-performance optimization, when a document is being UPDATEd, ZomboDB also creates a 1-field document (in a type called state) that simply contains the routing value. This document is also routed to the same shard as the underlying document it represents. ZomboDB tracks this state separately so it can quickly identify document groups (docs with the same _routing value) that have been UPDATEd, which allows it to perform much less work when resolving row visibility.

Additionally, ZomboDB stores the transaction id (as an epoch-encoded 64-bit integer) that INSERTed or UPDATEd the document. This is stored as a field (_xid) on the document itself.
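The source doesn't spell out the encoding, but Postgres itself exposes epoch-extended 64-bit transaction ids (e.g. via txid_current()) by packing the 32-bit wraparound epoch into the high word. A minimal sketch, assuming that layout:

```python
def encode_xid(epoch: int, xid: int) -> int:
    """Pack the 32-bit wraparound epoch and 32-bit transaction id into
    one 64-bit value (high word = epoch, low word = xid)."""
    return (epoch << 32) | (xid & 0xFFFFFFFF)

def decode_xid(xid64: int) -> tuple:
    """Split a 64-bit epoch-encoded xid back into (epoch, xid)."""
    return xid64 >> 32, xid64 & 0xFFFFFFFF
```

The epoch keeps 64-bit _xid values monotonically increasing even as the underlying 32-bit Postgres transaction id wraps around.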

In order for ZomboDB to resolve visibility in Elasticsearch it also needs to know which transactions have actually committed in Postgres. As such, ZomboDB creates a type in each index called committed that contains a 1-field document for each transaction id that has committed. This document is added to each shard, so that committed transaction ids are available to all shards during a search.

When a query requires that visibility be resolved outside of Postgres (generally just aggregates and count estimation), ZomboDB includes with the query the minimum transaction id known to be visible to that transaction (xmin), the maximum transaction id at or above which no rows are visible (xmax), and the list of currently active transaction ids.

Through the use of a custom Lucene Query object, ZomboDB then builds a BitSet of documents that are invisible to the transaction and, in doing so, picks the most recent version of any UPDATEd document that is visible to the current transaction. "Most recent version" is determined by the document's _xid: the document with the highest _xid that has committed, isn't currently active, and is below the xmax threshold is the most recent version visible to the current transaction.
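The shard-local rule just described can be sketched in a few lines. This is a simplified, hypothetical model (the real implementation is a custom Lucene Query building a BitSet of invisible documents, and it can also short-circuit using xmin); for each document group, it keeps the highest-_xid version whose transaction committed, isn't active, and falls below xmax:

```python
def latest_visible(docs, committed, active, xmax):
    """Hypothetical model of shard-local visibility resolution.

    docs: iterable of (routing, ctid, xid) tuples, one per document
    version on this shard. Returns {routing: ctid}, mapping each
    document group to its most recent visible version."""
    best = {}
    for routing, ctid, xid in docs:
        # Invisible if the transaction started at/after the snapshot's
        # upper bound, is still in progress, or never committed.
        if xid >= xmax or xid in active or xid not in committed:
            continue
        # Keep the highest visible _xid per routing group.
        if routing not in best or xid > best[routing][0]:
            best[routing] = (xid, ctid)
    return {routing: ctid for routing, (_, ctid) in best.items()}
```

For example, a group with versions at _xid 100, 110, and 120, where 120 is still in an active transaction, resolves to the version committed at _xid 110.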

This process happens on each shard, so if you have multiple shards (default=5), ZomboDB resolves row visibility in parallel across your cluster.

When a VACUUM occurs (either manually or via autovacuum), the dead rows are removed from the data type in the index, and their pointers in the state type are removed as well. Postgres then marks those ctids as free in the heap.

A Note on DELETE

v3.1 does not track rows that are DELETEd. They remain in the index until a VACUUM (or autovacuum) removes them.

This is a significant change from v3.0 (and the v2.6 series): aggregates and count estimations will continue to include deleted rows in their results until those rows are vacuumed away.

Normal SELECT statements will never return DELETEd rows because Postgres resolves row visibility in this case.

This seems like a fair trade-off for the drastically improved performance v3.1 brings, especially with large tables.

A Note on autovacuum

It's still important that autovacuum be configured to be aggressive. The fewer dead rows any given heap (and as such, ZomboDB indexes) has, the better query performance is going to be.


Big thanks go to TCDI for their continued support and testing, @pickypg for answering all sorts of random questions, and @nz for playing the role of a rubber duck.

Stop waiting.
Start searching.

Get Started Today