Elasticsearch Storage Optimization

colonD edited this page Jan 4, 2012 · 7 revisions

Use fewer indices

  • 10:59 < kimchy> first, if I am not mistaken, logstash uses daily rollover indices, with the default number of shards ES assigns to an index, which is 5, this is, most times, too many shards for too little data, so either moving to a longer rolling cycle (weekly rolling with 5 shards) or smaller number of shards (1 shard for a daily rolling index)
  • 11:00 < kimchy> this will help with size, because of the nature of how an inverted index works, and the fact that each shard is its own inverted index (a Lucene index)
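The shard-count change above is applied with an index template. This is a minimal sketch, assuming a cluster at `localhost:9200` and the default daily `logstash-*` naming; the template name `logstash_shards` is made up for illustration.

```shell
# Hypothetical template that gives every new daily logstash index a
# single shard instead of the default five (assumes localhost:9200).
curl -XPUT 'http://localhost:9200/_template/logstash_shards' -d '{
  "template": "logstash-*",
  "settings": {
    "index.number_of_shards": 1
  }
}'
```

Templates only affect indices created after they are installed, so existing daily indices keep their original shard count.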

Optimize Old Indices

  • Example daily cron script: https://gist.github.com/1561224
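The core of such a cron job is a single call to the `_optimize` API, which merges an index's segments down so it takes less space and searches faster. A sketch, assuming a cluster at `localhost:9200` and GNU `date` (the `-d yesterday` flag is not portable to BSD `date`):

```shell
# Optimize (force-merge) yesterday's daily index down to one segment.
# Only do this to indices that are no longer being written to.
curl -XPOST "http://localhost:9200/logstash-$(date -d yesterday +%Y.%m.%d)/_optimize?max_num_segments=1"
```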

Compress _source

  • 11:00 < kimchy> compressing source in the mapping
  • 11:10 < kimchy> _source compression I mentioned, you can easily do that by setting an index template on a mapping named default, which matches all logstash indices
  • 11:11 < kimchy> here is a sample on setting compression on _source for all: https://gist.github.com/1556130
  • 11:11 < drawks> the _source compression stuff barely saved any space for me in practice
  • 11:11 < drawks> something like 15%
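Along the lines of the gist linked above, `_source` compression can be enabled for all logstash indices with a template whose `_default_` mapping sets `compress` on the `_source` field. A minimal sketch (cluster at `localhost:9200` and the template name `compress_source` are assumptions):

```shell
# Hypothetical template enabling _source compression for every
# new logstash index, via the _default_ mapping.
curl -XPUT 'http://localhost:9200/_template/compress_source' -d '{
  "template": "logstash-*",
  "mappings": {
    "_default_": {
      "_source": { "compress": true }
    }
  }
}'
```

As drawks notes, the savings in practice may be modest (roughly 15% in his case), so measure before and after on your own data.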

Disable _all field if feasible

  • 11:12 < kimchy> another optimization option is to disable _all field, the _all field is a special field which basically aggregates all the other field content and make it searchable
  • 11:13 < kimchy> if you don't mind explicitly stating which fields to search on, you can disable that as well
  • 11:13 < kimchy> like @message:something
  • 11:15 < kimchy> here is a template mapping that disables _all: https://gist.github.com/1556146
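In the same spirit as the gist above, disabling `_all` is a one-line change in a template's `_default_` mapping. A sketch under the same assumptions as before (cluster at `localhost:9200`, made-up template name):

```shell
# Hypothetical template disabling the _all field for new logstash
# indices; queries must then name a field, e.g. @message:something.
curl -XPUT 'http://localhost:9200/_template/disable_all' -d '{
  "template": "logstash-*",
  "mappings": {
    "_default_": {
      "_all": { "enabled": false }
    }
  }
}'
```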

Make sure no fields are duplicated

  • 11:14 < kimchy> also, make sure no duplicate fields indexed, like @message and message, each field is inverted on its own
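One way to spot duplicated fields is to inspect an index's mapping and look for pairs like `@message` and `message`; the index name below is hypothetical.

```shell
# List every indexed field in one daily index so duplicates
# (e.g. both @message and message) stand out.
curl -XGET 'http://localhost:9200/logstash-2012.01.03/_mapping?pretty=true'
```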
