Elasticsearch Storage Optimization

colonD edited this page Jan 4, 2012 · 7 revisions

Use fewer indices

  • 10:59 < kimchy> first, if I am not mistaken, logstash uses daily rollover indices, with the default number of shards ES assigns to an index, which is 5, this is, most times, too many shards for too little data, so either moving to a longer rolling cycle (weekly rolling with 5 shards) or smaller number of shards (1 shard for a daily rolling index)
  • 11:00 < kimchy> this will help with size, because of the nature of how an inverted index works, and the fact that each shard is its own inverted index (a Lucene index)
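The shard-count change above is applied with an index template. This is a minimal sketch, assuming a cluster at `localhost:9200` and the default daily `logstash-*` naming; the template name `logstash_shards` is made up for illustration.

```shell
# Hypothetical template that gives every new daily logstash index a
# single shard instead of the default five (assumes localhost:9200).
curl -XPUT 'http://localhost:9200/_template/logstash_shards' -d '{
  "template": "logstash-*",
  "settings": {
    "index.number_of_shards": 1
  }
}'
```

Templates only affect indices created after they are installed, so existing daily indices keep their original shard count.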

Optimize Old Indices

  • Example daily cron script: https://gist.github.com/1561224
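The core of such a cron job is a single call to the `_optimize` API, which merges an index's segments down so it takes less space and searches faster. A sketch, assuming a cluster at `localhost:9200` and GNU `date` (the `-d yesterday` flag is not portable to BSD `date`):

```shell
# Optimize (force-merge) yesterday's daily index down to one segment.
# Only do this to indices that are no longer being written to.
curl -XPOST "http://localhost:9200/logstash-$(date -d yesterday +%Y.%m.%d)/_optimize?max_num_segments=1"
```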

Compress _source

  • 11:00 < kimchy> compressing source in the mapping
  • 11:10 < kimchy> _source compression I mentioned, you can easily do that by setting an index template on a mapping named default, which matches all logstash indices
  • 11:11 < kimchy> here is a sample on setting compression on _source for all: https://gist.github.com/1556130
  • 11:11 < drawks> the _source compression stuff barely saved any space for me in practice
  • 11:11 < drawks> something like 15%
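Along the lines of the gist linked above, `_source` compression can be enabled for all logstash indices with a template whose `_default_` mapping sets `compress` on the `_source` field. A minimal sketch (cluster at `localhost:9200` and the template name `compress_source` are assumptions):

```shell
# Hypothetical template enabling _source compression for every
# new logstash index, via the _default_ mapping.
curl -XPUT 'http://localhost:9200/_template/compress_source' -d '{
  "template": "logstash-*",
  "mappings": {
    "_default_": {
      "_source": { "compress": true }
    }
  }
}'
```

As drawks notes, the savings in practice may be modest (roughly 15% in his case), so measure before and after on your own data.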

Disable _all field if feasible

  • 11:12 < kimchy> another optimization option is to disable _all field, the _all field is a special field which basically aggregates all the other field content and make it searchable
  • 11:13 < kimchy> if you don't mind explicitly stating which fields to search on, you can disable that as well
  • 11:13 < kimchy> like @message:something
  • 11:15 < kimchy> here is a template mapping that disables _all: https://gist.github.com/1556146
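In the same spirit as the gist above, disabling `_all` is a one-line change in a template's `_default_` mapping. A sketch under the same assumptions as before (cluster at `localhost:9200`, made-up template name):

```shell
# Hypothetical template disabling the _all field for new logstash
# indices; queries must then name a field, e.g. @message:something.
curl -XPUT 'http://localhost:9200/_template/disable_all' -d '{
  "template": "logstash-*",
  "mappings": {
    "_default_": {
      "_all": { "enabled": false }
    }
  }
}'
```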

Make sure no fields are duplicated

  • 11:14 < kimchy> also, make sure no duplicate fields indexed, like @message and message, each field is inverted on its own
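One way to spot duplicated fields is to inspect an index's mapping and look for pairs like `@message` and `message`; the index name below is hypothetical.

```shell
# List every indexed field in one daily index so duplicates
# (e.g. both @message and message) stand out.
curl -XGET 'http://localhost:9200/logstash-2012.01.03/_mapping?pretty=true'
```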
