Elasticsearch Knapsack Plugin
=============================

Knapsack is an export/import plugin for `Elasticsearch <http://github.com/elasticsearch/elasticsearch>`_.

It uses archive formats (tar, zip, cpio) and compression algorithms (gzip, bzip2, lzf, xz) for transfer.

A direct copy of indexes or index types, or of any search result with stored fields, is also supported.

Optionally, you can transfer archives to Amazon S3.

Installation
------------

.. image:: https://travis-ci.org/jprante/elasticsearch-knapsack.png

Prerequisites::

   Elasticsearch 0.90+

============= ================= ================= ===========================================================
ES version    Plugin            Release date      Command
------------- ----------------- ----------------- -----------------------------------------------------------
0.90.9        0.90.9.1          Jan 9, 2014       ./bin/plugin --install knapsack --url http://bit.ly/1e81hwh
0.90.9        0.90.9.1 (S3)     Jan 9, 2014       ./bin/plugin --install knapsack --url http://bit.ly/K8QwOJ
============= ================= ================= ===========================================================

The S3 version includes the Amazon AWS API and can optionally transfer archives to S3.

Do not forget to restart the node after installation.

The Maven project site is available at `Github <http://jprante.github.io/elasticsearch-knapsack>`_.

Binaries
--------

Binaries (including older versions) are available at `Bintray <https://bintray.com/pkg/show/general/jprante/elasticsearch-plugins/elasticsearch-knapsack>`_.

Overview
========

.. image:: ../../../elasticsearch-knapsack/raw/master/src/site/resources/knapsack-diagram.png

Example
=======

Let's assume a simple index::

   curl -XDELETE localhost:9200/test
   curl -XPUT localhost:9200/test/test/1 -d '{"key":"value 1"}'
   curl -XPUT localhost:9200/test/test/2 -d '{"key":"value 2"}'

Exporting to archive
--------------------

You can export this Elasticsearch index with::

   curl -XPOST localhost:9200/test/test/_export
   {"running":true,"mode":"export","archive":"tar","path":"file:test_test.tar.gz"}

The result is a file in the Elasticsearch folder::

   -rw-r--r--  1 es  staff  341  8 Jan 22:25 test_test.tar.gz

Check with the tar utility that the settings and the mapping are also exported::

   tar ztvf test_test.tar.gz
   ----------  0 es     0    132  8 Jan 22:25 test/_settings/null/null
   ----------  0 es     0     49  8 Jan 22:25 test/test/_mapping/null
   ----------  0 es     0     17  8 Jan 22:25 test/test/2/_source
   ----------  0 es     0     17  8 Jan 22:25 test/test/1/_source

Also, you can export a whole index with::

   curl -XPOST localhost:9200/test/_export

with the result file test.tar.gz, or even all cluster indices with::

   curl -XPOST 'localhost:9200/_export'

to the file _all.tar.gz.

By default, the archive format is ``tar`` with ``gz`` (gzip) compression.

You can also export to a ``zip`` or ``cpio`` archive, or use another compression scheme.
Available are ``bz2`` (bzip2), ``xz`` (XZ), and ``lzf`` (LZF).
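
As a minimal sketch, assuming the suffix of the ``path`` parameter (described below) selects
the archive format and compression scheme as it did in earlier plugin versions, a
bzip2-compressed export might look like::

   curl -XPOST 'localhost:9200/test/test/_export?path=/tmp/test_test.tar.bz2'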

Export search results
---------------------

You can add a query to the ``_export`` endpoint just like you would do for searching in Elasticsearch::

   curl -XPOST 'localhost:9200/test/test/_export' -d '{
       "query" : {
           "match" : {
               "myfield" : "myvalue"
           }
       },
       "fields" : [ "_parent", "_source" ]
   }'
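
Note that only the fields listed under ``fields`` are written to the archive, so they must be
stored fields (or ``_source``) for the exported documents to carry any data.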

Export to an archive with a given path name
-------------------------------------------

You can configure an archive path with the parameter ``path``::

   curl -XPOST 'localhost:9200/test/_export?path=/tmp/myarchive.zip'

If Elasticsearch cannot write an archive to the path, an error message will appear
and no export will take place.

Renaming indexes and index types
--------------------------------

You can rename indexes and index types by adding a ``map`` parameter that contains a JSON
object with old and new index (and index/type) names::

   curl -XPOST 'localhost:9200/test/type/_export?map=\{"test":"testcopy","test/type":"testcopy/typecopy"\}'

Copy to local or remote cluster
-------------------------------

If you do not need to save data to an archive at all, but only to copy it, Knapsack is your friend.

You can copy an index in the local cluster or to a remote cluster with the ``_export/copy`` endpoint.
Preconditions: both sides must run the same Java JVM version and the same Elasticsearch version.

Example for a local cluster copy of the index ``test``::

   curl -XPOST 'localhost:9200/test/_export/copy?map=\{"test":"testcopy"\}'

Example for a remote cluster copy of the index ``test`` by using the parameters ``cluster``, ``host``, and ``port``::

   curl -XPOST 'localhost:9200/test/_export/copy?cluster=remote&host=127.0.0.1&port=9201'
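
Judging from the complete example below, the copy is fed through bulk indexing, so you may have
to wait for the bulk flush interval before all copied documents become visible on the target.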

This is a complete example that illustrates how to filter an index by timestamp and copy this part to
another index::

   curl -XDELETE 'localhost:9200/test'
   curl -XDELETE 'localhost:9200/testcopy'
   curl -XPUT 'localhost:9200/test/' -d '
   {
       "mappings" : {
           "_default_": {
               "_timestamp" : { "enabled" : true, "store" : true, "path" : "date" }
           }
       }
   }
   '
   curl -XPUT 'localhost:9200/test/doc/1' -d '
   {
       "date" : "2014-01-01T00:00:00",
       "sentence" : "Hi!",
       "value" : 1
   }
   '
   curl -XPUT 'localhost:9200/test/doc/2' -d '
   {
       "date" : "2014-01-02T00:00:00",
       "sentence" : "Hello World!",
       "value" : 2
   }
   '
   curl -XPUT 'localhost:9200/test/doc/3' -d '
   {
       "date" : "2014-01-03T00:00:00",
       "sentence" : "Welcome!",
       "value" : 3
   }
   '
   curl 'localhost:9200/test/_refresh'
   curl -XPOST 'localhost:9200/test/_export/copy?map=\{"test":"testcopy"\}' -d '
   {
       "fields" : [ "_timestamp", "_source" ],
       "query" : {
           "filtered" : {
               "query" : {
                   "match_all" : {
                   }
               },
               "filter" : {
                   "range": {
                       "_timestamp" : {
                           "from" : "2014-01-02"
                       }
                   }
               }
           }
       }
   }
   '
   curl 'localhost:9200/test/_search?fields=_timestamp&pretty'
   # wait for bulk flush interval
   sleep 10
   curl 'localhost:9200/testcopy/_search?fields=_timestamp&pretty'
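
If the copy succeeded, the final search against ``testcopy`` should return only documents 2
and 3, since the range filter selects timestamps from 2014-01-02 onward.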

Import
------

You can import the file with::

   curl -XPOST 'localhost:9200/test/test/_import'

Knapsack does not delete or overwrite data by default.
But you can set the parameter ``createIndex`` to ``false`` to allow indexing into indexes that already exist.
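
A minimal sketch, assuming ``createIndex`` is passed as a URL parameter like the other options::

   curl -XPOST 'localhost:9200/test/test/_import?createIndex=false'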

When importing, you can map your indexes or index/types to your favorite ones::

   curl -XPOST 'localhost:9200/test/_import?map=\{"test":"testcopy"\}'

Modifying settings and mappings
-------------------------------

Transferring archives to Amazon S3
----------------------------------

By using the special plugin releases that include the Amazon AWS S3 API, you can optionally transfer archives
to S3, or fetch one before importing. You can use the endpoints ``_export/s3`` and ``_import/s3`` for that.

Export example::

   curl -XPOST 'localhost:9200/test/_export/s3?uri=s3://accesskey:secretkey@awshostname&bucketName=mybucket&key=mykey'

Import example::

   curl -XPOST 'localhost:9200/test/_import/s3?uri=s3://accesskey:secretkey@awshostname&bucketName=mybucket&key=mykey'

Note that the file name used for downloading from S3 is ``mybucket/mykey``, and the directory will be created
if it does not exist.

Check the state of running import/export
----------------------------------------

While exports or imports are running, you can check the state with::

   curl -XGET 'localhost:9200/_export/state'

or::

   curl -XGET 'localhost:9200/_import/state'

Caution
=======

Knapsack is very simple and works without locks or snapshots. This means that if Elasticsearch
is allowed to write to the data being exported while the export runs, you may lose data in the
export. So it is up to you to organize safe exports and imports with this plugin.

If you want a snapshot/restore feature, please use the standard snapshot/restore in the upcoming
Elasticsearch 1.0 release.

Credits
=======

Knapsack contains derived work from Apache Commons Compress
http://commons.apache.org/proper/commons-compress/

The code in this component has many origins:
the bzip2, tar and zip support came from Avalon's Excalibur, but originally
from Ant, as far as life in Apache goes. The tar package is originally Tim Endres'
public domain package. The bzip2 package is based on the work done by Keiron Liddle as
well as Julian Seward's libbzip2. It has migrated via:
Ant -> Avalon-Excalibur -> Commons-IO -> Commons-Compress.
The cpio package has been contributed by Michael Kuss and the jRPM project.

Thanks to `nicktgr15 <https://github.com/nicktgr15>`_ for extending Knapsack to support Amazon S3.

License
=======