This is ZomboDB v3.1. It builds on the improvements brought by v3.0, but provides much better performance under high concurrency along with removing the dependency on Postgres' "visibility map".
v3.1 will also be able to work with Citus Community and Citus Enterprise, version 6.0 when they're released.
REINDEX of all
USING zombodb indexes is required after upgrading.
NULLelements can now be searched
v3.0, when it needs to resolve row visibility, actually crawls through the entire heap (with the help of Postgres' "visibility map") to build a set of rows that are invisible to the current transaction. It then sends this list to Elasticsearch as part of aggregate and count estimation queries so Elasticsearch can exclude those rows.
This turned out to be terrible for performance under high UPDATE load for two major reasons. The first is that it simply takes a lot of time (and disk I/O) to walk the entire heap (especially for very large tables). The other reason is that Elasticsearch isn't necessarily good at looking up potentially millions of terms just to exclude them from the search results.
v3.1 takes a drastically different approach that allows Elasticsearch to resolve record visibility in a horizontally-scalable manner.
ZomboDB v3.1 carefully controls document
_routing in the Elasticsearch indexes such that subsequent UPDATEs to a document are placed on the same shard as the original INSERTed document. It does this by a) placing a BEFORE UPDATE trigger on each table with a ZomboDB index that tracks the
ctid of the row being UPDATEd, and b) implementing a custom
_bulk endpoint that looks up the
_routing value for the previous
ctid. The routing value is applied to the new, updated document, and is carried forward as subsequent updates occur.
As a search-performance optimization, when a document is being UPDATEd, ZomboDB also creates a 1-field document (in a type called
state) that simply contains the routing value. This document is also routed to the same shard as the underlying document it represents. ZomboDB tracks this state separately so it can quickly identify document groups (docs with the same
_routing value) that have been UPDATEd, which allows it to perform much less work when resolving row visibility.
Additionally, ZomboDB stores the transaction id (as an epoch-encoded 64bit integer) that INSERTed or UPDATEd the document. This is a field added to the actual document.
In order for ZomboDB to resolve visibility in Elasticsearch it also needs to know which transactions have actually committed in Postgres. As such, ZomboDB creates a type in each index called
committed that contains a 1-field document for each transaction id that has committed. This document is added to each shard, so that committed transaction ids are available to all shards during a search.
When a query requires that visibility be resolved outside of Postgres (generally just aggregates and count estimation), ZomboDB includes the minimum transaction id that is known-to-be visible to that transaction (
xmin), the maximum transaction id above which no rows are visible (
xmax), and the list of active transactions.
Through the use of a custom Lucene Query object, ZomboDB then builds a BitSet of documents that are invisible to the transaction, and as such, picks the most recent version of any UPDATEd document that is visible to the current transaction. "most recent version" is determined by the
_xid of the document. The document with the highest
_xid, that has been committed, isn't currently active, and is below the
xmin threshold, is the most recent version that is visible to the current transaction.
This process happens on each shard. So if you have multiple shards (default=5), ZomboDB resolves row visibility in parallel across your cluster.
VACUUM occurs (either manually or via autovacuum), the dead rows are removed from the
data type in the index and their pointers in the
state type are also removed. And then Postgres marks those
ctids as free in the heap.
v3.1 does not track rows that are DELETEd. They remain in the index until a
VACUUM (or autovacuum) removes them.
This is a significant change from v3.0 (and the v2.6 series) in that aggregates and count estimations will continue to include deleted rows in their results.
SELECT statements will never return DELETEd rows because Postgres resolves row visibility in this case.
This seems like a fair trade-off for the drastically improved performance v3.1 brings, especially with large tables.
It's still important that autovacuum be configured to be aggressive. The fewer dead rows any given heap (and as such, ZomboDB indexes) has, the better query performance is going to be.
Big thanks go to TCDI for their continued support and testing, @pickypg for answering all sorts of random questions, and @nz for playing the role of a rubber duck.