This is ZomboDB v3.2.0. The primary focus of this release is improving general search and aggregation function performance, especially when the system is under high write load.
This release requires that your existing `USING zombodb` indexes be dropped and re-created, as fundamental changes have been made to the underlying Elasticsearch index mappings and types.
The process for upgrading from a previous release should follow these steps:
1. Make sure you have exclusive access to the databases that use ZomboDB during the entire upgrade process. This is important to ensure no other Postgres sessions are trying to modify your tables.
2. `DROP INDEX` every existing ZomboDB index in all databases. Make sure you have the `CREATE INDEX` statements saved somewhere, or use `pg_get_indexdef('index_name'::regclass)` before dropping them.
3. `VACUUM` every table that had a ZomboDB index. The purpose of this is to remove all dead rows from the table so they're not indexed by ZomboDB.
4. Make sure you've closed any open psql (or other admin tool) sessions that might still be open from above.
5. Re-create (`CREATE INDEX ... USING zombodb`) the indexes you dropped above.
6. Sanity-check your row counts, for example with `zdb_estimate_count()`. Previous versions of ZomboDB would include deleted rows in aggregation results until (auto)vacuum removed them from the table.
7. Use `zdb_index_stats_fast` to show document counts for the new indexes.
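As a sketch, the per-index bookkeeping in the steps above can be expressed as the ordered SQL a DBA would run for each index. The table and index names (`products`, `idx_products_zdb`) and the `upgrade_statements` helper are hypothetical, purely for illustration:

```python
# Sketch of the per-index upgrade bookkeeping described above.
# Index/table names here are hypothetical examples.

def upgrade_statements(table: str, index: str) -> list:
    """Return the ordered SQL for upgrading one ZomboDB index.

    The real CREATE INDEX text would come from running
    pg_get_indexdef('<index>'::regclass) and saving its output
    *before* the drop; here it's carried through as a comment.
    """
    return [
        f"SELECT pg_get_indexdef('{index}'::regclass);",  # save this output!
        f"DROP INDEX {index};",
        f"VACUUM {table};",  # remove dead rows so they aren't re-indexed
        f"-- re-run the saved CREATE INDEX ... USING zombodb statement for {index}",
    ]

for stmt in upgrade_statements("products", "idx_products_zdb"):
    print(stmt)
```

Run once per ZomboDB index, in every affected database.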
Without going into great detail about how Postgres manages MVCC, this section will attempt to explain how ZomboDB maps Postgres' MVCC into Elasticsearch so that concurrent sessions always see the right rows, even when Elasticsearch needs to fully resolve the MVCC visibility rules.
First off, ZomboDB defines three types within each Elasticsearch index: "data", "xmax", and "aborted".

"data" contains the actual row data from the table, along with some tracking columns, most importantly the Postgres system columns `xmin` and `cmin` (named `_xmin` and `_cmin` in the ES index) and a compact, byte-encoded representation of the Postgres heap tuple, named `_zdb_encoded_tuple`. ZomboDB also tracks, as a separate field, the BlockNumber from the Postgres table where the row resides (in a field named `_zdb_blockno`). There is one doc in this type for every row in the Postgres table.
"xmax" contains the Postgres system columns `xmax` and `cmax` (named `_xmax` and `_cmax` in the ES index). Docs for this type are only created when Postgres changes the `xmax` of a row. This happens to the original row during an UPDATE, and to the row itself during a DELETE. Similarly to the "data" type, "xmax" also contains a compact, byte-encoded representation named `_zdb_encoded_tuple`. The changing of `xmax` values is tracked through UPDATE and DELETE triggers automatically installed on every table with a ZomboDB index.
For both types above, the Elasticsearch `_id` field is set to the Postgres heap tuple item pointer, which (for ZomboDB) is in the form `BlockNumber-OffsetNumber`. This is the same value (slightly reformatted) as the Postgres system column `ctid`.
As such, when an "xmax" doc is created, it is routed to the same shard on which its corresponding "data" doc already lives. This is important for resolving visibility through a custom Lucene Query; more on that later.
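To make the routing point concrete, here is a toy sketch. The `zdb_doc_id` and `shard_for` names are made up, and crc32 stands in for Elasticsearch's real murmur3-based routing hash; the point is only that identical `_id` values always land on the same shard:

```python
import zlib

def zdb_doc_id(block_number: int, offset_number: int) -> str:
    """ZomboDB's Elasticsearch _id: the heap item pointer formatted as
    BlockNumber-OffsetNumber. Postgres shows the same tuple as ctid
    '(block,offset)'."""
    return f"{block_number}-{offset_number}"

def shard_for(routing_value: str, num_shards: int) -> int:
    """Toy stand-in for Elasticsearch routing (ES actually uses a
    murmur3 hash; crc32 is used here only because it's deterministic)."""
    return zlib.crc32(routing_value.encode()) % num_shards

# A "data" doc and its later "xmax" doc share the same _id, so they
# always hash to the same shard, whatever the shard count.
doc_id = zdb_doc_id(42, 7)  # ctid (42,7)  ->  "42-7"
assert shard_for(doc_id, 10) == shard_for(doc_id, 10)
```

Because shard selection is a pure function of the routing value, the "xmax" doc is guaranteed to be co-located with its "data" doc.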
"aborted" contains a transient set of (64-bit, epoch-encoded) Postgres TransactionIds, where each is explicitly copied to each shard (10 shards means 10 copies of the doc). A doc is added to this type when an index is first modified (INSERT, UPDATE, DELETE) in a transaction, and if the transaction commits, the doc is then removed. The `_id` field is simply the transaction id, and there's another field named `_zdb_xid` that contains the same value.
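The lifecycle of the "aborted" type can be sketched with a toy tracker; the `AbortedXidTracker` class is hypothetical, not ZomboDB's actual implementation:

```python
# Toy model of the "aborted" type: ZomboDB assumes every transaction is
# aborted until it actually commits. Names here are illustrative.

class AbortedXidTracker:
    def __init__(self, num_shards: int):
        self.num_shards = num_shards
        self.shards = [set() for _ in range(num_shards)]  # one copy per shard

    def on_first_write(self, xid: int):
        for shard in self.shards:   # copied to *every* shard
            shard.add(xid)

    def on_commit(self, xid: int):
        for shard in self.shards:
            shard.discard(xid)      # committed => no longer "aborted"

    def aborted_count(self) -> int:
        # Mirrors the stats views' aborted_count: it sums across shards,
        # so divide by num_shards for the number of distinct xids.
        return sum(len(s) for s in self.shards)

t = AbortedXidTracker(num_shards=10)
t.on_first_write(1001)
assert t.aborted_count() == 10  # one xid, copied to 10 shards
t.on_commit(1001)
assert t.aborted_count() == 0
```

This also shows why, later on, the `aborted_count` stat divides evenly by the shard count.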
When executing a query that requires ZomboDB to resolve MVCC visibility entirely within Elasticsearch (typically aggregation functions, or queries with advanced constructs such as index linking or `#expand()`), ZomboDB's custom Lucene Query executes on each shard to decide which docs in the index are not visible to the current transaction.
First of all, the custom Lucene Query gets a list of all the transaction ids (`_zdb_xid`) from the "aborted" type. In general, this will be a small list.
Then, the custom Query collects all the docs from "xmax". These tell our visibility resolution logic which documents have been modified in some way, and lets it determine if the `_xmax` value has made the document invisible to the current transaction.
Next, the Query collects all the docs from "data" that reside on the same Postgres heap block as any of the docs found above from the "xmax" type. Depending on the number of "xmax" docs relative to the number of "data" docs, ZomboDB may decide it's simply more efficient to collect all the "data" docs.
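That collection step can be sketched as follows. The `docs_to_evaluate` helper and its 0.5 cutoff are made-up illustrations, not ZomboDB's actual heuristic:

```python
# Sketch of "collect only the data docs that share a heap block with an
# xmax doc". Doc shapes are illustrative.

def docs_to_evaluate(data_docs, xmax_docs, full_scan_ratio=0.5):
    """data_docs/xmax_docs: lists of dicts carrying _zdb_blockno.
    If xmax docs touch a large fraction of the index, collecting *all*
    "data" docs can be cheaper than block-targeted lookups (the cutoff
    here is invented for illustration)."""
    if len(xmax_docs) >= full_scan_ratio * len(data_docs):
        return list(data_docs)
    touched_blocks = {d["_zdb_blockno"] for d in xmax_docs}
    return [d for d in data_docs if d["_zdb_blockno"] in touched_blocks]

data = [{"_id": "0-1", "_zdb_blockno": 0},
        {"_id": "0-2", "_zdb_blockno": 0},
        {"_id": "1-1", "_zdb_blockno": 1},
        {"_id": "2-1", "_zdb_blockno": 2}]
xmax = [{"_id": "0-1", "_zdb_blockno": 0}]

# Only block 0 has been touched, so only its two data docs are collected.
assert [d["_id"] for d in docs_to_evaluate(data, xmax)] == ["0-1", "0-2"]
```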
It then merges this information together to represent a full view of the "heap tuple" as it would be in the Postgres table heap at that point in time.
From there, each collected "data" doc, along with its corresponding "xmax" values (if any) and the set of aborted transaction ids, is evaluated using this logic:

```
   (xmin == myXid && cmin < myCommand && (xmax_is_null || (xmax == myXid && cmax >= myCommand)))
|| (xmin_is_committed && (xmax_is_null || (xmax == myXid && cmax >= myCommand) || (xmax != myXid && !xmax_is_committed)))
```

where `myXid` and `myCommand` represent the values of the Postgres Snapshot executing the query.
In all of the above, the `_zdb_encoded_tuple` field is used to efficiently determine the xmin/xmax/cmin/cmax values for each doc (this has proved to be extremely efficient in situations where there are millions of "xmax" documents to evaluate).
If the combined tuple passes the test, it's visible to the transaction. If it doesn't, it isn't visible and is marked as such by the custom Lucene Query.
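A direct Python transcription of that visibility test may make it easier to follow; the field names are illustrative (in ZomboDB the values are decoded from `_zdb_encoded_tuple`):

```python
# Transcription of the visibility predicate above. A tuple is visible if
# either (a) this transaction inserted it earlier in the same transaction
# and hasn't deleted it, or (b) its insert committed and no committed
# delete by another transaction is in the way.

def is_visible(t, my_xid, my_command):
    """t: dict with xmin, cmin, xmax (None means null), cmax,
    plus xmin_is_committed / xmax_is_committed flags."""
    xmax_not_in_the_way = (
        t["xmax"] is None                                   # xmax_is_null
        or (t["xmax"] == my_xid and t["cmax"] >= my_command)
    )
    inserted_by_me = (
        t["xmin"] == my_xid and t["cmin"] < my_command and xmax_not_in_the_way
    )
    insert_committed = t["xmin_is_committed"] and (
        xmax_not_in_the_way
        or (t["xmax"] != my_xid and not t["xmax_is_committed"])
    )
    return inserted_by_me or insert_committed

# A row inserted earlier in my own (still-open) transaction, not deleted:
mine = {"xmin": 100, "cmin": 0, "xmax": None, "cmax": None,
        "xmin_is_committed": False, "xmax_is_committed": False}
assert is_visible(mine, my_xid=100, my_command=5)

# A committed row deleted by some other, committed transaction:
gone = {"xmin": 90, "cmin": 0, "xmax": 95, "cmax": 0,
        "xmin_is_committed": True, "xmax_is_committed": True}
assert not is_visible(gone, my_xid=100, my_command=5)
```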
When a vacuum (or autovacuum) comes along, ZomboDB asks it about each `_id` (`ctid`) that it indexes, and if VACUUM says it's dead, we delete it from both the "data" and "xmax" types. Additionally, when we can prove that an `_xmax` is known-to-be-considered-aborted-by-all-current-and-future-transactions, we can delete every doc in "xmax" with that transaction id, along with the entries from the "aborted" type. This keeps the total number of docs we need to track and evaluate for MVCC visibility in Elasticsearch to the bare minimum.
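That cleanup can be sketched with toy structures; the `vacuum_cleanup` helper and the dict/set shapes are hypothetical stand-ins for the ES types:

```python
# Sketch of the vacuum-time cleanup described above.

def vacuum_cleanup(data, xmax, aborted, dead_ctids, fully_aborted_xids):
    """data/xmax: dicts keyed by _id; aborted: set of xids.
    dead_ctids: _ids that VACUUM reported as dead.
    fully_aborted_xids: xids that every current and future transaction
    will consider aborted."""
    for _id in dead_ctids:           # dead row: purge from both types
        data.pop(_id, None)
        xmax.pop(_id, None)
    for _id in [i for i, d in xmax.items() if d["xid"] in fully_aborted_xids]:
        del xmax[_id]                # the xmax change never took effect
    aborted -= fully_aborted_xids    # and stop tracking those xids

data = {"0-1": {}, "0-2": {}}
xmax = {"0-1": {"xid": 200}, "0-2": {"xid": 300}}
aborted = {300}
vacuum_cleanup(data, xmax, aborted,
               dead_ctids={"0-1"}, fully_aborted_xids={300})
assert data == {"0-2": {}} and xmax == {} and aborted == set()
```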
ZomboDB assumes that all transactions are aborted (until they actually commit), tracks xmax changes in real time via table triggers, and resolves visibility across all shards in parallel. It then cleans itself up when vacuum runs.
Previous versions of ZomboDB wanted autovacuum to be pretty aggressive. The reason for this was that ZomboDB wasn't necessarily very fast at resolving visibility when the table/index contained lots of dead rows. With v3.2.0, the landscape is quite a bit different.
Depending on the complexity of your query, and other factors (of course), v3.2.0 performs in millisecond/tens-of-millisecond times even when there are tens of thousands of dead rows in the index. As such, you might consider dialing up the `autovacuum_vacuum_threshold` settings in `postgresql.conf`. ZomboDB can now stand up to more dead rows than in the past, and it's also capable of using Elasticsearch's query cache in more situations. When queries are answered from the cache, no searching is actually done, so it doesn't really matter how many dead rows are in the table.
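As a rough illustration, a less-aggressive autovacuum configuration might look like this in `postgresql.conf`; the numbers are made-up starting points, not recommendations, and should be tuned against your own write load:

```
autovacuum = on
autovacuum_vacuum_threshold = 5000     # default is 50; higher = autovacuum fires less often
autovacuum_vacuum_scale_factor = 0.2   # the default; fraction of table size added to the threshold
```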
Additionally, lessening the vacuum load on the Postgres server, which also reduces the number of Elasticsearch index refreshes ZomboDB needs to perform, can generally help all around with disk I/O. Note that it's still super important that your tables be vacuumed often for all the reasons that vacuuming is good – v3.2.0 just gives you a little more wiggle room around it.
The `zdb_index_stats` and `zdb_index_stats_fast` views are very useful for watching the state of the backing Elasticsearch index. The `_fast` view gives you an estimate of the number of rows in the Postgres table, so it's generally preferred, as doing a `count(*)` (what the non-`_fast` view does) on a multi-million-row table can be really slow.
Paying attention to the `aborted_count` columns from those views will give you an idea of how much "garbage" is still in the backing Elasticsearch index. Under concurrent load, with autovacuum configured, you should see these values rise during writes and drop when autovacuum runs.
Note that the `aborted_count` column can be divided by the number of shards in your index to get the actual count of aborted transaction ids ZomboDB is tracking. As mentioned above, aborted transaction ids are copied to every shard.
When Postgres is idle (in terms of writes), you should see both of these columns reach zero. Because the "xmax" and "aborted" docs are removed during the "cleanup" phase of vacuum, it may be necessary to execute an explicit `VACUUM` on the table before you'll see the columns actually hit zero.
Yeah, sorry. The new structure v3.2.0 uses to track and resolve MVCC visibility, which is really the central feature of ZomboDB, is solid and hopefully future-proof. The hope and intent is that it'll be quite a long time before ZomboDB asks its users to reindex their data again.
Despite the need to reindex, I highly recommend that everyone upgrade, especially if you're on v3.1.13-3.1.15. v3.1.13 introduced some corruption bugs, and while v3.1.15 fixed them, it made the entire thing much slower.
v3.2.0 gets all this stuff right.