# Aggregate Functions

ZomboDB exposes nearly all of Elasticsearch's aggregates as type-checked SQL functions that return tables and discreet values, as opposed to json blobs.

In all cases, unless explicitly otherwise noted, the results returned from all of the below aggregate functions are MVCC-correct. This means that the functions only operate against records that are considered visible to the current transaction.

## Arbitrary Aggregate Support

```
FUNCTION zdb.arbitrary_agg(
index regclass,
query zdbquery,
agg_json json)
RETURNS json
```

See Elastic's Aggregations documentation.

This function is a direct-path for executing any arbitrary aggregate search request that Elasticsearch supports.

The result is a json blob that can be processed in your application code or otherwise manipulated using Postgres json support functions.

## Single-Value Aggregates

```
FUNCTION zdb.avg(
index regclass,
field text,
query zdbquery)
RETURNS numeric
```

See Elastic's Avg Aggregation documentation.

A single-value metrics aggregation that computes the average of numeric values that are extracted from the aggregated documents. These values can be extracted either from specific numeric fields in the documents.

```
FUNCTION zdb.cardinality(
index regclass,
field text,
query zdbquery)
RETURNS numeric
```

See Elastic's Cardinality Aggregation documentation.

A single-value metrics aggregation that calculates an approximate count of distinct values. Values can be extracted either from specific fields in the document.

```
FUNCTION zdb.count(
index regclass,
query zdbquery)
RETURNS bigint
```

See Elastic's Count API documentation.

Not an aggregate per se, this function is mapped to Elasticsearch's `_count`

endpoint and simply returns the number of documents that match the provided query. The result is MVCC-correct.

```
FUNCTION zdb.raw_count(
index regclass,
query zdbquery)
RETURNS bigint SET zdb.ignore_visibility = true
```

Similar to `zdb.count()`

above, but it ignores MVCC visibility rules, and the result is the actual count of documents matching the query, including deleted documents, documents from aborted transactions, old versions of documents from an UPDATE statement, and new/updated docs from in-flight transactions.

Generally you'll want to use `zdb.count()`

instead.

```
FUNCTION zdb.max(
index regclass,
field text,
query zdbquery)
RETURNS numeric
```

See Elastic's Max Aggregation documentation.

A single-value metrics aggregation that keeps track and returns the maximum value among the numeric values extracted from the aggregated documents. These values can be extracted either from specific numeric fields in the documents.

```
FUNCTION zdb.min(
index regclass,
field text,
query zdbquery)
RETURNS numeric
```

See Elastic's Min Aggregation documentation.

A single-value metrics aggregation that keeps track and returns the minimum value among numeric values extracted from the aggregated documents. These values can be extracted either from specific numeric fields in the documents.

```
FUNCTION zdb.missing(
index regclass,
field text,
query zdbquery)
RETURNS numeric
```

See Elastic's Missing Aggregation documentation.

A field data based single bucket aggregation, that creates a bucket of all documents in the current document set context that are missing a field value (effectively, missing a field or having the configured NULL value set).

```
FUNCTION zdb.sum(
index regclass,
field text,
query zdbquery)
RETURNS numeric
```

See Elastic's Sum Aggregation documentation.

A single-value metrics aggregation that sums up numeric values that are extracted from the aggregated documents. These values can be extracted either from specific numeric fields in the documents.

```
FUNCTION zdb.value_count(
index regclass,
field text,
query zdbquery)
RETURNS numeric
```

See Elastic's Value Count Aggregation documentation.

A single-value metrics aggregation that counts the number of values that are extracted from the aggregated documents. These values can be extracted either from specific fields in the documents.

## Multi-Row/Column Aggregates

The following aggregates transform the results from Elasticsearch into "tables", and should all be queried as such. For example:

```
SELECT * FROM zdb.terms('idxproducts', 'tags', dsl.match_all());
```

```
FUNCTION zdb.adjacency_matrix(
index regclass,
labels text[],
filters zdbquery[])
RETURNS TABLE (
key text,
doc_count bigint)
```

See Elastic's Adjacency Matrix Aggregation documentation.

A bucket aggregation returning a form of adjacency matrix. The request provides a collection of named filter expressions, similar to the filters aggregation request. Each bucket in the response represents a non-empty cell in the matrix of intersecting filters.

The `labels`

and `filters`

arguments must have the same bounds.

```
FUNCTION zdb.adjacency_matrix_2x2(
index regclass,
labels text[],
filters zdbquery[])
RETURNS TABLE (
"-" text,
"1" text,
"2" text)
```

This is a table-based version of `zdb.adjacency_matrix()`

that outputs a 2x2 matrix.

The `labels`

and `filters`

arguments must have the same bounds.

```
FUNCTION zdb.adjacency_matrix_3x3(
index regclass,
labels text[],
filters zdbquery[])
RETURNS TABLE (
"-" text,
"1" text,
"2" text,
"3" text)
```

This is a table-based version of `zdb.adjacency_matrix()`

that outputs a 3x3 matrix.

The `labels`

and `filters`

arguments must have the same bounds.

```
FUNCTION zdb.adjacency_matrix_4x4(
index regclass,
labels text[],
filters zdbquery[])
RETURNS TABLE (
"-" text,
"1" text,
"2" text,
"3" text,
"4" text)
```

This is a table-based version of `zdb.adjacency_matrix()`

that outputs a 4x4 matrix.

The `labels`

and `filters`

arguments must have the same bounds.

```
FUNCTION zdb.adjacency_matrix_5x5(
index regclass,
labels text[],
filters zdbquery[])
RETURNS TABLE (
"-" text,
"1" text,
"2" text,
"3" text,
"4" text,
"5" text)
```

This is a table-based version of `zdb.adjacency_matrix()`

that outputs a 5x5 matrix.

The `labels`

and `filters`

arguments must have the same bounds.

```
FUNCTION zdb.date_histogram(
index regclass,
field text,
query zdbquery,
"interval" text,
format text DEFAULT 'yyyy-MM-dd')
RETURNS TABLE (
key numeric,
key_as_string text,
doc_count bigint)
```

See Elastic's Date Histogram Aggregation documentation.

A multi-bucket aggregation similar to the histogram except it can only be applied on date values. Since dates are represented in Elasticsearch internally as long values, it is possible to use the normal histogram on dates as well, though accuracy will be compromised. The reason for this is in the fact that time based intervals are not fixed (think of leap years and on the number of days in a month). For this reason, we need special support for time based data. From a functionality perspective, this histogram supports the same features as the normal histogram. The main difference is that the interval can be specified by date/time expressions.

```
FUNCTION zdb.date_range(
index regclass,
field text,
query zdbquery,
date_ranges_array json)
RETURNS TABLE (
key text,
"from" numeric,
from_as_string timestamp with time zone,
"to" numeric,
to_as_string timestamp with time zone,
doc_count bigint)
```

See Elastic's Date Range Aggregation documentation.

A range aggregation that is dedicated for date values. The main difference between this aggregation and the normal range aggregation is that the from and to values can be expressed in Date Math expressions, and it is also possible to specify a date format by which the from and to response fields will be returned. Note that this aggregation includes the from value and excludes the to value for each range.

```
FUNCTION zdb.extended_stats(
index regclass,
field text,
query zdbquery,
sigma int DEFAULT 0)
RETURNS TABLE (
count bigint,
min numeric,
max numeric,
avg numeric,
sum numeric,
sum_of_squares numeric,
variance numeric,
stddev numeric,
stddev_upper numeric,
stddev_lower numeric)
```

See Elastic's Extended Stats Aggregation documentation.

A multi-value metrics aggregation that computes stats over numeric values extracted from the aggregated documents. These values can be extracted either from specific numeric fields in the documents.

```
FUNCTION zdb.filters(
index regclass,
labels text[],
filters zdbquery[])
RETURNS TABLE (
label text,
doc_count bigint)
```

See Elastic's Filters Aggregation documentation.

Similar to `zdb.count()`

except multiple queries (filters) are supported.

The `labels`

and `filters`

arguments must have the same bounds.

```
FUNCTION zdb.histogram(
index regclass,
field text,
query zdbquery,
"interval" float8)
RETURNS TABLE (
key numeric,
doc_count bigint)
```

See Elastic's Histogram Aggregation documentation.

A multi-bucket values source based aggregation that can be applied on numeric values extracted from the documents. It dynamically builds fixed size (a.k.a. interval) buckets over the values. For example, if the documents have a field that holds a price (numeric), we can configure this aggregation to dynamically build buckets with interval 5 (in case of price it may represent $5).

```
FUNCTION zdb.ip_range(
index regclass,
field text,
query zdbquery,
ip_ranges_array json)
RETURNS TABLE (
key text,
"from" inet,
"to" inet,
doc_count bigint)
```

See Elastic's IP Range Aggregation documentation.

Just like the dedicated date range aggregation, there is also a dedicated range aggregation for IP typed fields.

```
FUNCTION zdb.matrix_stats(
index regclass,
fields text[],
query zdbquery)
RETURNS TABLE (
name text,
count bigint,
mean numeric,
variance numeric,
skewness numeric,
kurtosis numeric,
covariance json,
correlation json)
```

See Elastic's Matrix Stats Aggregation documentation.

The matrix_stats aggregation is a numeric aggregation that computes various statistics over a set of document fields.

```
FUNCTION zdb.percentile_ranks(
index regclass,
field text,
query zdbquery,
"values" text DEFAULT '')
RETURNS TABLE (
percentile numeric,
value numeric)
```

See Elastic's Percentile Rank Aggregation documentation.

A multi-value metrics aggregation that calculates one or more percentile ranks over numeric values extracted from the aggregated documents. These values can be extracted either from specific numeric fields in the documents.

```
FUNCTION zdb.percentiles(
index regclass,
field text,
query zdbquery,
percents text DEFAULT '')
RETURNS TABLE (
percentile numeric,
value numeric)
```

See Elastic's Percentile Aggregation documentation.

A multi-value metrics aggregation that calculates one or more percentiles over numeric values extracted from the aggregated documents. These values can be extracted either from specific numeric fields in the documents.

```
FUNCTION zdb.range(
index regclass,
field text,
query zdbquery,
ranges_array json)
RETURNS TABLE (
key text,
"from" numeric,
"to" numeric,
doc_count bigint)
```

See Elastic's Range Aggregation documentation.

A multi-bucket value source based aggregation that enables the user to define a set of ranges - each representing a bucket. During the aggregation process, the values extracted from each document will be checked against each bucket range and "bucket" the relevant/matching document. Note that this aggregation includes the from value and excludes the to value for each range.

```
FUNCTION zdb.significant_terms(
index regclass,
field text,
query zdbquery)
RETURNS TABLE (
term text,
doc_count bigint,
score numeric,
bg_count bigint)
```

See Elastic's Significant Terms Aggregation documentation.

An aggregation that returns interesting or unusual occurrences of terms in a set.

```
FUNCTION zdb.significant_terms_two_level(
index regclass,
first_field text,
second_field text,
query zdbquery,
size bigint DEFAULT 0)
RETURNS TABLE (
first_term text,
second_term text,
doc_count bigint,
score numeric,
bg_count bigint,
doc_count_error_upper_bound bigint,
sum_other_doc_count bigint)
```

An adaption of `zdb.significant_terms()`

where it uses `zdb.terms()`

for the terms from `first_field`

and `zdb.significant_terms()`

for the terms for `second_field`

.

```
FUNCTION zdb.significant_text(
index regclass,
field text,
query zdbquery,
sample_size int DEFAULT 0,
filter_duplicate_text boolean DEFAULT true)
RETURNS TABLE (
term text,
doc_count bigint,
score numeric,
bg_count bigint)
```

See Elastic's Significant Text Aggregation documentation.

An aggregation that returns interesting or unusual occurrences of free-text terms in a set. It is like the significant terms aggregation but differs in that:

- It is specifically designed for use on type text fields
- It does not require field data or doc-values
- It re-analyzes text content on-the-fly meaning it can also filter duplicate sections of noisy text that otherwise tend to skew statistics.

This aggregate is only supported by Elasticsearch 6+ clusters.

```
FUNCTION zdb.stats(
index regclass,
field text,
query zdbquery)
RETURNS TABLE (
count bigint,
min numeric,
max numeric,
avg numeric,
sum numeric)
```

See Elastic's Stats Aggregation documentation.

A multi-value metrics aggregation that computes stats over numeric values extracted from the aggregated documents. These values can be extracted either from specific numeric fields in the documents.

```
CREATE TYPE terms_order AS ENUM (
'count',
'term',
'reverse_count',
'reverse_term');
FUNCTION zdb.terms(
index regclass,
field text,
query zdbquery,
size_limit bigint DEFAULT 0,
order_by terms_order DEFAULT 'count')
RETURNS TABLE (
term text,
doc_count bigint)
```

See Elastic's Terms Aggregation documentation.

A multi-bucket value source based aggregation where buckets are dynamically built - one per unique value.

```
FUNCTION zdb.terms_array(
index regclass,
field text,
query zdbquery,
size_limit bigint DEFAULT 0,
order_by terms_order DEFAULT 'count')
RETURNS text[]
```

A version of `zdb.terms()`

that instead returns only the terms as a `text[]`

.

```
FUNCTION zdb.terms_two_level(
index regclass,
first_field text,
second_field text,
query zdbquery,
order_by terms_order DEFAULT 'count',
size bigint DEFAULT 0)
RETURNS TABLE (
first_term text,
second_term text,
doc_count bigint)
```

Similar to `zdb.significant_terms_two_level()`

, this is an adaption of `zdb.terms()`

to provide a two-level nested hierarchy of terms from two different fields.

```
FUNCTION zdb.top_hits(
index regclass,
fields text[],
query zdbquery,
size int)
RETURNS TABLE (
ctid tid,
score float4,
source json)
```

See Elastic's Top Hits Aggregation documentation.

A top_hits metric aggregator keeps track of the most relevant document being aggregated. This aggregator is intended to be used as a sub aggregator, so that the top matching documents can be aggregated per bucket.

```
FUNCTION zdb.top_hits_with_id(
index regclass,
fields text[],
query zdbquery,
size int)
RETURNS TABLE (
_id text,
score float4,
source json)
```

Similar to `zdb.top_hits()`

above, but returns the Elasticsearch document `_id`

value for each hit rather than the corresponding Postgres `ctid`

value.

This function is designed to be used with ZomboDB's low-level-api to aid in searching low-level indices, as normal SELECT statements don't return results from such indices.