Elasticsearch in Java projects – aggregations
Unlike the previous articles in this series, this one is not about search. It presents another powerful feature of Elasticsearch: aggregations, which let users analyze and summarize a set of data.
Introduction
The previous articles were dedicated to search, where a query is given and the task is to find the subset of documents that match it. Aggregations let us look at the data from a different perspective. Instead of finding individual documents, they analyze and summarize the whole set of data. Even though this functionality is quite different from search, aggregations are built on the same data structures, so they typically execute quickly and in near real time, just like search. This article presents the basics of aggregations: the concepts, and their usage in Java projects, shown in practice on the demo application.
Buckets and metrics
Every aggregation is built from two concepts: buckets and metrics. It is a combination of one or more buckets and zero or more metrics.
Buckets
A bucket is a collection of documents that meet specified criteria. For instance, a driver may land in either the active or the inactive bucket. When aggregations are executed, the field values of each document are checked against the criteria of each bucket. If the document matches, it is "placed" inside that bucket. Bucket aggregations can also have sub-aggregations, which gives a hierarchy of buckets where a "parent" bucket contains other buckets.
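The idea of parent and child buckets can be illustrated without Elasticsearch at all. The following is only a plain-Java sketch of the concept, not how Elasticsearch implements it: the `Driver` class, the `BucketSketch` class and the sample rows other than Kubica are made up for this illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical in-memory stand-in for a driver document (not part of the demo application).
final class Driver {
    final String familyName;
    final String nationality;
    final boolean active;

    Driver(String familyName, String nationality, boolean active) {
        this.familyName = familyName;
        this.nationality = nationality;
        this.active = active;
    }
}

final class BucketSketch {
    // Parent buckets are "active" and "inactive"; each one contains child buckets keyed by nationality.
    static Map<String, Map<String, List<String>>> bucketize(List<Driver> drivers) {
        return drivers.stream().collect(Collectors.groupingBy(
                d -> d.active ? "active" : "inactive",
                Collectors.groupingBy(
                        d -> d.nationality,
                        Collectors.mapping(d -> d.familyName, Collectors.toList()))));
    }

    public static void main(String[] args) {
        // "Nowak" and "Smith" are invented sample rows.
        List<Driver> drivers = Arrays.asList(
                new Driver("Kubica", "Polish", false),
                new Driver("Nowak", "Polish", true),
                new Driver("Smith", "British", true));
        System.out.println(bucketize(drivers));
    }
}
```

Each document ends up in exactly one child bucket here, which mirrors how a terms aggregation nested inside another bucket aggregation would partition the documents.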
Metrics
A metric is, in most cases, a mathematical operation such as sum or max that is calculated from the field values of the documents in a bucket. The calculated quantity can be, for instance, the number of drivers from Poland.
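To make the relation between buckets and metrics concrete, here is a small plain-Java sketch that computes metrics per bucket with the Stream API. It only illustrates the concept; the `DriverStats` and `MetricSketch` classes are hypothetical, and every sample row except Kubica's is invented.

```java
import java.util.Arrays;
import java.util.IntSummaryStatistics;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical in-memory stand-in for a driver document with career statistics.
final class DriverStats {
    final String nationality;
    final int races;
    final int wins;

    DriverStats(String nationality, int races, int wins) {
        this.nationality = nationality;
        this.races = races;
        this.wins = wins;
    }
}

final class MetricSketch {
    // One bucket per nationality; the metrics are count, sum, min, max and average of wins.
    static Map<String, IntSummaryStatistics> winsByNationality(List<DriverStats> drivers) {
        return drivers.stream().collect(Collectors.groupingBy(
                d -> d.nationality, Collectors.summarizingInt(d -> d.wins)));
    }

    // One bucket per nationality; the metric is the average number of races.
    static Map<String, Double> avgRacesByNationality(List<DriverStats> drivers) {
        return drivers.stream().collect(Collectors.groupingBy(
                d -> d.nationality, Collectors.averagingInt(d -> d.races)));
    }

    public static void main(String[] args) {
        List<DriverStats> drivers = Arrays.asList(
                new DriverStats("Polish", 96, 1),   // Kubica, values from the example response
                new DriverStats("Polish", 10, 0),   // invented row
                new DriverStats("British", 50, 5)); // invented row
        System.out.println(winsByNationality(drivers));
        System.out.println(avgRacesByNationality(drivers));
    }
}
```

The document count of a bucket is itself a metric: `getCount()` on the Polish entry answers the "number of drivers from Poland" question mentioned above.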
Demo project
Until now, the application API returned a list of drivers enriched with career statistics. The response is now extended with aggregation results: information about the drivers of each country, drivers grouped by the number of titles won, and drivers grouped by whether they are active. The example response below contains, besides the documents that match the query, the additional aggregation results byNationality, byTitles and byActive.
{
  "count": 1,
  "drivers": [
    {
      "driverId": "kubica",
      "code": "KUB",
      "givenName": "Robert",
      "familyName": "Kubica",
      "dateOfBirth": "1984-12-07",
      "nationality": "Polish",
      "active": false,
      "permanentNumber": 88,
      "statistics": {
        "races": 96,
        "wins": 1,
        "titles": 0
      }
    }
  ],
  "byNationality": {
    "Polish": {
      "count": 1,
      "wins": 1,
      "titles": 0,
      "avgRaces": 96.0,
      "drivers": [
        {
          "driverId": "kubica",
          "givenName": "Robert",
          "familyName": "Kubica"
        }
      ]
    }
  },
  "byTitles": {
    "0": {
      "count": 1,
      "drivers": [
        {
          "driverId": "kubica",
          "givenName": "Robert",
          "familyName": "Kubica"
        }
      ]
    }
  },
  "byActive": {
    "inactive": {
      "count": 1,
      "drivers": [
        {
          "driverId": "kubica",
          "givenName": "Robert",
          "familyName": "Kubica"
        }
      ]
    }
  }
}
Java API
Create the aggregations
Elasticsearch provides a full Java API for working with aggregations. Each aggregation is represented by an object that extends the abstract class AggregationBuilder. Aggregation builders can be created with the helper methods of the AggregationBuilders class. The code snippet below creates the terms aggregation 'byNationality' with four sub-aggregations of three kinds:
– two sum aggregations, each calculating the total of the values of the field given by the field parameter across the documents in the bucket
– an avg aggregation that calculates the average of the values of the field given by the field parameter across the documents in the bucket
– a topHits aggregation that returns the hits of the documents that land in each 'byNationality' bucket.
TermsAggregationBuilder byNationality = AggregationBuilders.terms("byNationality").field("nationality")
        .subAggregation(AggregationBuilders.sum("totalTitles").field("statistics.titles"))
        .subAggregation(AggregationBuilders.sum("totalWins").field("statistics.wins"))
        .subAggregation(AggregationBuilders.avg("avgRaces").field("statistics.races"))
        .subAggregation(AggregationBuilders.topHits("byHits").size(MAX_SIZE));
Once the aggregation builder is created, it only needs to be added to the search request so that it is evaluated when the request is executed. A request can carry several aggregations, and the search response then contains the results of all of them at once.
SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder();
for (AggregationBuilder aggregation : aggregations) {
    searchSourceBuilder.aggregation(aggregation);
}
Processing the search response
The SearchResponse object, which is the result of executing the search request, provides the getAggregations() method, which returns the evaluated aggregations. Each aggregation result is accessed by the name it was defined with. For a terms aggregation this gives the list of buckets, which can then be processed further. The code snippet below shows how to read the aggregation results for the first bucket. In practice, the collection of buckets can be processed using the Stream API.
Aggregations aggregations = searchResponse.getAggregations();
// get list of all buckets under the aggregation
List<? extends Terms.Bucket> buckets = aggregations.<Terms>get("byNationality").getBuckets();
// get all of the sub-aggregations
Terms.Bucket bucket = buckets.get(0);
Sum totalTitles = (Sum) bucket.getAggregations().get("totalTitles");
Sum totalWins = (Sum) bucket.getAggregations().get("totalWins");
Avg avgRaces = (Avg) bucket.getAggregations().get("avgRaces");
TopHits byHits = (TopHits) bucket.getAggregations().get("byHits");
// read the values for each of the sub-aggregations
int totalTitlesValue = (int) totalTitles.getValue();
int totalWinsValue = (int) totalWins.getValue();
// keep the average as a double so that the fractional part is not truncated
double avgRacesValue = avgRaces.getValue();
SearchHit[] bucketHits = byHits.getHits().getHits();
// get the number of documents in the bucket
int docCount = (int) bucket.getDocCount();
Once the aggregation results are retrieved, they can be passed on to other parts of the application.
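As a rough sketch of processing the whole collection of buckets with the Stream API, the example below maps bucket values to entries shaped like the byNationality part of the example response. To keep it self-contained it does not use the Elasticsearch classes: `BucketValues` is a hypothetical stand-in for the values read from one Terms.Bucket, and `NationalitySummary` and `BucketMappingSketch` are likewise assumptions made for this illustration, not classes from the demo application.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical stand-in for the values read from one Terms.Bucket and its sub-aggregations.
final class BucketValues {
    final String key;
    final long docCount;
    final double totalWins;
    final double totalTitles;
    final double avgRaces;

    BucketValues(String key, long docCount, double totalWins, double totalTitles, double avgRaces) {
        this.key = key;
        this.docCount = docCount;
        this.totalWins = totalWins;
        this.totalTitles = totalTitles;
        this.avgRaces = avgRaces;
    }
}

// Hypothetical response model mirroring one "byNationality" entry of the example response.
final class NationalitySummary {
    final int count;
    final int wins;
    final int titles;
    final double avgRaces;

    NationalitySummary(int count, int wins, int titles, double avgRaces) {
        this.count = count;
        this.wins = wins;
        this.titles = titles;
        this.avgRaces = avgRaces;
    }
}

final class BucketMappingSketch {
    // Maps every bucket to a summary keyed by the bucket key, preserving the bucket order.
    static Map<String, NationalitySummary> toSummaries(List<BucketValues> buckets) {
        return buckets.stream().collect(Collectors.toMap(
                b -> b.key,
                b -> new NationalitySummary(
                        (int) b.docCount, (int) b.totalWins, (int) b.totalTitles, b.avgRaces),
                (first, second) -> first, // bucket keys are expected to be unique; keep the first if not
                LinkedHashMap::new));
    }

    public static void main(String[] args) {
        List<BucketValues> buckets = Arrays.asList(
                new BucketValues("Polish", 1, 1.0, 0.0, 96.0)); // values from the example response
        NationalitySummary polish = toSummaries(buckets).get("Polish");
        System.out.println(polish.count + " " + polish.wins + " " + polish.titles + " " + polish.avgRaces);
    }
}
```

With the real client, the fields of `BucketValues` would come from calls such as `bucket.getKeyAsString()`, `bucket.getDocCount()` and the `getValue()` methods of the sub-aggregations shown in the snippet above.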
Summary
This article presented aggregations, which let us look at the data from another perspective: instead of searching for documents, they analyze the complete collection of documents that match the query. The demo application shows in practice how aggregations can be created and executed, and how to process the result to get the expected numbers. The application uses only a few of the available aggregation types, but the others are created in the same way, and the choice of which to use is driven by the requirements.
Reference
https://www.elastic.co/guide/en/elasticsearch/client/java-api/7.15/java-aggs.html