forked from elastic/logstash
Elasticsearch Storage Optimization
colonD edited this page Jan 4, 2012 · 7 revisions
- 10:59 < kimchy> first, if I am not mistaken, logstash uses daily rollover indices, with the default number of shards ES assigns to an index, which is 5, this is, most times, too many shards for too little data, so either moving to a longer rolling cycle (weekly rolling with 5 shards) or smaller number of shards (1 shard for a daily rolling index)
- 11:00 < kimchy> this will help with size, because of the nature of how an inverted index works, and the fact that each shard is its own inverted index (a Lucene index)
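To put rough numbers on the shard advice above, here is a back-of-the-envelope sketch in Python. It counts primary shards only (replicas are ignored), and the 28-day retention window is an arbitrary assumption for illustration, not a figure from the discussion.

```python
import math


def primary_shards(retention_days, rollover_days, shards_per_index):
    """Count the primary shards (Lucene indices) kept for a retention window."""
    # Number of rolling indices needed to cover the window.
    index_count = math.ceil(retention_days / rollover_days)
    return index_count * shards_per_index


# 28 days of logs under the three layouts discussed above.
daily_default = primary_shards(28, rollover_days=1, shards_per_index=5)
weekly_default = primary_shards(28, rollover_days=7, shards_per_index=5)
daily_single = primary_shards(28, rollover_days=1, shards_per_index=1)

print(daily_default, weekly_default, daily_single)  # 140 20 28
```

Each of those shards carries its own inverted index, so the gap between 140 and 20 or 28 is the overhead the suggestion is aimed at.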
- 11:05 < kimchy> also, each index that you are "done" with, optimize it, note, just calling optimize will not do much, you need to specify the number of segments to optimize down to, I suggest using max_num_segments with a value of 2 or 3 (http://www.elasticsearch.org/guide/reference/api/admin-indices-optimize.html)
- 11:06 < kimchy> this optimize process will be IO heavy, so try and do it on "quiet" times if you can
- 11:22 < kimchy> optimize call: https://gist.github.com/1556171
-- Example daily cron script: https://gist.github.com/1561224
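The gists are not reproduced on this page, so the following is only a sketch of what a daily job might compute: the optimize URL for yesterday's index. It assumes indices are named `logstash-YYYY.MM.dd`, that Elasticsearch listens on `localhost:9200`, and that the `_optimize` endpoint accepts `max_num_segments` as described in the reference linked above. `optimize_url` is a hypothetical helper, not part of any library.

```python
from datetime import date, timedelta


def optimize_url(day, host="http://localhost:9200", max_num_segments=2):
    """Build the optimize URL for the daily index of `day`.

    Assumptions: index name pattern logstash-YYYY.MM.dd and a local node.
    """
    index = "logstash-" + day.strftime("%Y.%m.%d")
    return f"{host}/{index}/_optimize?max_num_segments={max_num_segments}"


if __name__ == "__main__":
    # Yesterday's index is "done" once the daily rollover has happened.
    yesterday = date.today() - timedelta(days=1)
    print(optimize_url(yesterday))
```

The printed URL would then be sent as a POST request, for example from a cron entry scheduled during a quiet period.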
- 11:00 < kimchy> compressing source in the mapping
- 11:10 < kimchy> _source compression I mentioned, you can easily do that by setting an index template on a mapping named default, which matches all logstash indices
- 11:11 < kimchy> here is a sample on setting compression on _source for all: https://gist.github.com/1556130
- 11:11 < drawks> the _source compression stuff barely saved any space for me in practice
- 11:11 < drawks> something like 15%
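Since the gist is not reproduced here, this is a guess at the shape of such a template, written as a Python dict and printed as JSON. The `logstash*` pattern, the `_default_` mapping name, and the `compress` flag on `_source` are assumptions based on Elasticsearch releases of that period; check them against the version in use.

```python
import json


def source_compression_template(pattern="logstash*"):
    """Index template body enabling _source compression (assumed field names)."""
    return {
        "template": pattern,  # assumed: applies to every matching index name
        "mappings": {
            "_default_": {  # assumed: default mapping applied to all types
                "_source": {"compress": True},
            }
        },
    }


print(json.dumps(source_compression_template(), indent=2))
```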
- 11:12 < kimchy> another optimization option is to disable _all field, the _all field is a special field which basically aggregates all the other field content and makes it searchable
- 11:13 < kimchy> if you don't mind explicitly stating which fields to search on, you can disable that as well
- 11:13 < kimchy> like @message:something
- 11:15 < kimchy> here is a template mapping that disables _all: https://gist.github.com/1556146
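As with the other gists, the exact template is not shown on this page. A minimal sketch, assuming a template layout of `template`, `mappings`, and `_default_` with an `enabled` flag on `_all`, plus a hypothetical `field_query` helper that shows the explicit-field query form mentioned above:

```python
import json


def all_disabled_template(pattern="logstash*"):
    """Index template body that turns off the _all field (assumed layout)."""
    return {
        "template": pattern,
        "mappings": {"_default_": {"_all": {"enabled": False}}},
    }


def field_query(field, term):
    """Query string that names the field explicitly, e.g. @message:something."""
    return f"{field}:{term}"


print(json.dumps(all_disabled_template(), indent=2))
print(field_query("@message", "something"))
```

With `_all` disabled, a bare term with no field prefix would no longer search across every field, which is why the explicit `field:term` form matters.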
- 11:14 < kimchy> also, make sure no duplicate fields are indexed, like @message and message, each field is inverted on its own