What are the benefits of aggregating text fields?

By default, text fields are analysed before indexing, so a value like "red car" can be found by searching for either "red" or "car". However, you may want to perform calculations on the words in a field rather than simply search for them, and this is where aggregation is useful.

Making a text field aggregatable means enabling fielddata on it, which loads the values of that field for all documents into memory. Aggregations can then group documents by those values, a process called bucketing.

What is bucketing?

Bucket aggregations create buckets of documents. In our example of a red car, an aggregation on the field will return a "red" bucket and a "car" bucket. Any document that mentions the word red in this text field will be added to the "red" bucket, and any document that mentions the word car will be added to the "car" bucket. If the text field contains lots of different words across many documents, we will end up with a lot of buckets. Depending on the content of the field, some documents will appear in more than one bucket whilst others will appear in only one.
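To make the idea concrete, here is a minimal sketch in plain Python that imitates bucketing without connecting to Elasticsearch. The documents, the field name "description" and the function bucket_by_term are made up for illustration, and splitting on whitespace is only a rough stand-in for what a real analyser does.

```python
from collections import defaultdict

# Hypothetical documents; "description" stands in for the analysed text field.
docs = [
    {"id": 1, "description": "red car"},
    {"id": 2, "description": "blue car"},
    {"id": 3, "description": "red bicycle"},
]


def bucket_by_term(documents, field):
    """Group document ids by each distinct word found in the given field."""
    buckets = defaultdict(set)
    for doc in documents:
        # Rough stand-in for analysis: lowercase the text and split on whitespace.
        for term in doc[field].lower().split():
            buckets[term].add(doc["id"])
    return buckets


buckets = bucket_by_term(docs, "description")

# Print each bucket with its document count and the ids it contains.
for term in sorted(buckets):
    print(term, len(buckets[term]), sorted(buckets[term]))
```

Document 1 lands in both the "red" bucket and the "car" bucket, which is the overlap described above.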

What are the benefits?

The benefit is that you are now able to perform calculations on the documents. So, for example, if we have another field in our document that contains a country we would be able to calculate the number of red cars by country. Perhaps we have another field that contains a year. We can work out the average number of red cars in a country by year. We can then visualise our calculations using Kibana.
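As a rough illustration of the country example, the sketch below builds the body of a search request that nests one terms aggregation inside another. The field names "description" and "country" and the aggregation names "by_word" and "by_country" are assumptions made for this example; it also assumes that "description" has been made aggregatable and that "country" is mapped as a keyword field.

```python
import json

# Hypothetical field names: "description" is the aggregatable text field,
# "country" is assumed to be a keyword field in the same documents.
request_body = {
    "size": 0,  # return only the aggregation results, not the matching documents
    "aggs": {
        "by_word": {
            # One bucket per word in the text field, e.g. "red" and "car".
            "terms": {"field": "description"},
            "aggs": {
                # Split each word bucket into one sub-bucket per country.
                "by_country": {"terms": {"field": "country"}}
            },
        }
    },
}

# Print the JSON that would be sent as the body of the search request.
print(json.dumps(request_body, indent=2))
```

With a body like this, the "red" bucket in the response would contain a list of countries, each with the number of documents that mention red.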

How can I make a text field aggregatable?

If you need to make a text field aggregatable, please take a look at "How can I make a text field in Elasticsearch and Kibana aggregatable?". That article explains everything you need to do to set this up.

Tip: Please remember that aggregating on a text field using fielddata is very expensive, because the values of that field for all documents are loaded into memory, which can cause performance problems.
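For orientation only, the setting the tip refers to looks roughly like the sketch below, which builds the body of a mapping update that turns fielddata on for a text field. The field name "description" is an assumption made for this example, and the article mentioned above remains the reference for the full steps.

```python
import json

# Hypothetical field name "description". Fielddata is disabled on text fields
# by default, so it has to be switched on explicitly in the mapping.
mapping_body = {
    "properties": {
        "description": {
            "type": "text",
            "fielddata": True,  # loads this field's values into memory
        }
    }
}

# Print the JSON that would be sent as the body of the mapping update.
print(json.dumps(mapping_body, indent=2))
```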
