Commit 43a30c5

add elasticsearch-shard tool (elastic#33848)
Relates elastic#31389 (cherry picked from commit a3e8b83)
1 parent 0e658b7 commit 43a30c5

29 files changed: +2326 -723 lines changed
+5
@@ -0,0 +1,5 @@
+#!/bin/bash
+
+ES_MAIN_CLASS=org.elasticsearch.index.shard.ShardToolCli \
+  "`dirname "$0"`"/elasticsearch-cli \
+  "$@"
@@ -0,0 +1,12 @@
+@echo off
+
+setlocal enabledelayedexpansion
+setlocal enableextensions
+
+set ES_MAIN_CLASS=org.elasticsearch.index.shard.ShardToolCli
+call "%~dp0elasticsearch-cli.bat" ^
+  %%* ^
+  || exit /b 1
+
+endlocal
+endlocal

docs/reference/commands/index.asciidoc

+2
@@ -12,6 +12,7 @@ tasks from the command line:
 * <<migrate-tool>>
 * <<saml-metadata>>
 * <<setup-passwords>>
+* <<shard-tool>>
 * <<syskeygen>>
 * <<users-command>>
 
@@ -22,5 +23,6 @@ include::certutil.asciidoc[]
 include::migrate-tool.asciidoc[]
 include::saml-metadata.asciidoc[]
 include::setup-passwords.asciidoc[]
+include::shard-tool.asciidoc[]
 include::syskeygen.asciidoc[]
 include::users-command.asciidoc[]
+107
@@ -0,0 +1,107 @@
+[[shard-tool]]
+== elasticsearch-shard
+
+In some cases the Lucene index or translog of a shard copy can become
+corrupted. The `elasticsearch-shard` command enables you to remove corrupted
+parts of the shard if a good copy of the shard cannot be recovered
+automatically or restored from backup.
+
+[WARNING]
+You will lose the corrupted data when you run `elasticsearch-shard`. This tool
+should only be used as a last resort if there is no way to recover from another
+copy of the shard or restore a snapshot.
+
+When Elasticsearch detects that a shard's data is corrupted, it fails that
+shard copy and refuses to use it. Under normal conditions, the shard is
+automatically recovered from another copy. If no good copy of the shard is
+available and you cannot restore from backup, you can use `elasticsearch-shard`
+to remove the corrupted data and restore access to any remaining data in
+unaffected segments.
+
+[WARNING]
+Stop Elasticsearch before running `elasticsearch-shard`.
+
+To remove corrupted shard data use the `remove-corrupted-data` subcommand.
+
+There are two ways to specify the path:
+
+* Specify the index name and shard name with the `--index` and `--shard-id`
+options.
+* Use the `--dir` option to specify the full path to the corrupted index or
+translog files.
+
+[float]
+=== Removing corrupted data
+
+`elasticsearch-shard` analyses the shard copy and provides an overview of the
+corruption found. To proceed you must then confirm that you want to remove the
+corrupted data.
+
+[WARNING]
+Back up your data before running `elasticsearch-shard`. This is a destructive
+operation that removes corrupted data from the shard.
+
+[source,txt]
+--------------------------------------------------
+$ bin/elasticsearch-shard remove-corrupted-data --index twitter --shard-id 0
+
+
+WARNING: Elasticsearch MUST be stopped before running this tool.
+
+Please make a complete backup of your index before using this tool.
+
+
+Opening Lucene index at /var/lib/elasticsearchdata/nodes/0/indices/P45vf_YQRhqjfwLMUvSqDw/0/index/
+
+>> Lucene index is corrupted at /var/lib/elasticsearchdata/nodes/0/indices/P45vf_YQRhqjfwLMUvSqDw/0/index/
+
+Opening translog at /var/lib/elasticsearchdata/nodes/0/indices/P45vf_YQRhqjfwLMUvSqDw/0/translog/
+
+
+>> Translog is clean at /var/lib/elasticsearchdata/nodes/0/indices/P45vf_YQRhqjfwLMUvSqDw/0/translog/
+
+
+Corrupted Lucene index segments found - 32 documents will be lost.
+
+WARNING: YOU WILL LOSE DATA.
+
+Continue and remove docs from the index ? Y
+
+WARNING: 1 broken segments (containing 32 documents) detected
+Took 0.056 sec total.
+Writing...
+OK
+Wrote new segments file "segments_c"
+Marking index with the new history uuid : 0pIBd9VTSOeMfzYT6p0AsA
+Changing allocation id V8QXk-QXSZinZMT-NvEq4w to tjm9Ve6uTBewVFAlfUMWjA
+
+You should run the following command to allocate this shard:
+
+POST /_cluster/reroute
+{
+  "commands" : [
+    {
+      "allocate_stale_primary" : {
+        "index" : "index42",
+        "shard" : 0,
+        "node" : "II47uXW2QvqzHBnMcl2o_Q",
+        "accept_data_loss" : false
+      }
+    }
+  ]
+}
+
+You must accept the possibility of data loss by changing parameter `accept_data_loss` to `true`.
+
+Deleted corrupt marker corrupted_FzTSBSuxT7i3Tls_TgwEag from /var/lib/elasticsearchdata/nodes/0/indices/P45vf_YQRhqjfwLMUvSqDw/0/index/
+
+--------------------------------------------------
+
+When you use `elasticsearch-shard` to drop the corrupted data, the shard's
+allocation ID changes. After restarting the node, you must use the
+<<cluster-reroute,cluster reroute API>> to tell Elasticsearch to use the new
+ID. The `elasticsearch-shard` command shows the request that
+you need to submit.
+
+You can also use the `-h` option to get a list of all options and parameters
+that the `elasticsearch-shard` tool supports.
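
The documentation above says the tool prints a cluster reroute request with `accept_data_loss` set to `false`, and that the operator must change it to `true` before submitting it. The following is a minimal sketch of assembling that request body in Java; it is not part of this commit. The class and method names are hypothetical, the values are copied from the sample output, and the sketch assumes the index and node names contain no characters that need JSON escaping.

```java
// Hypothetical helper, not part of the commit: builds the body for
// POST /_cluster/reroute as shown in the sample tool output.
class RerouteRequestSketch {

    // Assumes index and node contain no characters that need JSON escaping.
    static String allocateStalePrimaryBody(String index, int shard, String node, boolean acceptDataLoss) {
        return String.join("\n",
            "{",
            "  \"commands\" : [",
            "    {",
            "      \"allocate_stale_primary\" : {",
            "        \"index\" : \"" + index + "\",",
            "        \"shard\" : " + shard + ",",
            "        \"node\" : \"" + node + "\",",
            "        \"accept_data_loss\" : " + acceptDataLoss,
            "      }",
            "    }",
            "  ]",
            "}");
    }

    public static void main(String[] args) {
        // Values taken from the sample output; accept_data_loss flipped to true.
        System.out.println("POST /_cluster/reroute");
        System.out.println(allocateStalePrimaryBody("index42", 0, "II47uXW2QvqzHBnMcl2o_Q", true));
    }
}
```

The sketch only prints the request. Sending it to a cluster is left out because the transport and endpoint depend on the deployment.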

docs/reference/index-modules/translog.asciidoc

+4
@@ -92,6 +92,10 @@ The maximum duration for which translog files will be kept. Defaults to `12h`.
 [[corrupt-translog-truncation]]
 === What to do if the translog becomes corrupted?
 
+[WARNING]
+This tool is deprecated and will be completely removed in 7.0.
+Use the <<shard-tool,elasticsearch-shard tool>> instead of this one.
+
 In some cases (a bad drive, user error) the translog on a shard copy can become
 corrupted. When this corruption is detected by Elasticsearch due to mismatching
 checksums, Elasticsearch will fail that shard copy and refuse to use that copy

libs/cli/src/main/java/org/elasticsearch/cli/Terminal.java

+6 -1
@@ -85,12 +85,17 @@ public final void println(Verbosity verbosity, String msg) {
 
     /** Prints message to the terminal at {@code verbosity} level, without a newline. */
     public final void print(Verbosity verbosity, String msg) {
-        if (this.verbosity.ordinal() >= verbosity.ordinal()) {
+        if (isPrintable(verbosity)) {
             getWriter().print(msg);
             getWriter().flush();
         }
     }
 
+    /** Checks if is enough {@code verbosity} level to be printed */
+    public final boolean isPrintable(Verbosity verbosity) {
+        return this.verbosity.ordinal() >= verbosity.ordinal();
+    }
+
     /**
      * Prompt for a yes or no answer from the user. This method will loop until 'y' or 'n'
      * (or the default empty value) is entered.
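
The hunk above extracts the verbosity comparison into a public `isPrintable` method. One plausible use, sketched below, is to let callers skip building an expensive message when it would not be printed anyway. This is an illustration, not code from the commit: `MiniTerminal`, `expensiveReport` and `printReport` are hypothetical stand-ins, and the `SILENT, NORMAL, VERBOSE` ordering of the enum is an assumption. Only the comparison inside `isPrintable` is copied from the diff.

```java
// Hypothetical stand-in for the real Terminal class; illustration only.
class VerbosityGuardSketch {

    // Assumed ordering: a higher ordinal means more output is allowed.
    enum Verbosity { SILENT, NORMAL, VERBOSE }

    static final class MiniTerminal {
        private final Verbosity verbosity;
        private final StringBuilder out = new StringBuilder();

        MiniTerminal(Verbosity verbosity) {
            this.verbosity = verbosity;
        }

        // Same comparison as the isPrintable method added in the diff.
        boolean isPrintable(Verbosity level) {
            return this.verbosity.ordinal() >= level.ordinal();
        }

        void println(Verbosity level, String msg) {
            if (isPrintable(level)) {
                out.append(msg).append('\n');
            }
        }

        String output() {
            return out.toString();
        }
    }

    // Counts how often the costly message is actually built.
    static int reportsBuilt = 0;

    static String expensiveReport() {
        reportsBuilt++;
        return "segment report";
    }

    // Builds the report only when the terminal would print it.
    static void printReport(MiniTerminal terminal) {
        if (terminal.isPrintable(Verbosity.VERBOSE)) {
            terminal.println(Verbosity.VERBOSE, expensiveReport());
        }
    }

    public static void main(String[] args) {
        MiniTerminal normal = new MiniTerminal(Verbosity.NORMAL);
        printReport(normal);
        System.out.println("reports built at NORMAL: " + reportsBuilt);

        MiniTerminal verbose = new MiniTerminal(Verbosity.VERBOSE);
        printReport(verbose);
        System.out.println("reports built at VERBOSE: " + reportsBuilt);
    }
}
```

Without the guard, `expensiveReport()` would run on every call, even when `println` then discards the result.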

qa/vagrant/src/main/java/org/elasticsearch/packaging/test/ArchiveTestCase.java

+17
@@ -325,4 +325,21 @@ public void test90SecurityCliPackaging() {
         }
     }
 
+    public void test100RepairIndexCliPackaging() {
+        assumeThat(installation, is(notNullValue()));
+
+        final Installation.Executables bin = installation.executables();
+        final Shell sh = new Shell();
+
+        Platforms.PlatformAction action = () -> {
+            final Result result = sh.run(bin.elasticsearchShard + " help");
+            assertThat(result.stdout, containsString("A CLI tool to remove corrupted parts of unrecoverable shards"));
+        };
+
+        if (distribution().equals(Distribution.DEFAULT_TAR) || distribution().equals(Distribution.DEFAULT_ZIP)) {
+            Platforms.onLinux(action);
+            Platforms.onWindows(action);
+        }
+    }
+
 }

qa/vagrant/src/main/java/org/elasticsearch/packaging/util/Archives.java

+1
@@ -186,6 +186,7 @@ private static void verifyOssInstallation(Installation es, Distribution distribu
             "elasticsearch-env",
             "elasticsearch-keystore",
             "elasticsearch-plugin",
+            "elasticsearch-shard",
             "elasticsearch-translog"
         ).forEach(executable -> {

qa/vagrant/src/main/java/org/elasticsearch/packaging/util/Installation.java

+2 -1
@@ -100,8 +100,9 @@ public class Executables {
         public final Path elasticsearch = platformExecutable("elasticsearch");
         public final Path elasticsearchPlugin = platformExecutable("elasticsearch-plugin");
         public final Path elasticsearchKeystore = platformExecutable("elasticsearch-keystore");
-        public final Path elasticsearchTranslog = platformExecutable("elasticsearch-translog");
         public final Path elasticsearchCertutil = platformExecutable("elasticsearch-certutil");
+        public final Path elasticsearchShard = platformExecutable("elasticsearch-shard");
+        public final Path elasticsearchTranslog = platformExecutable("elasticsearch-translog");
 
         private Path platformExecutable(String name) {
             final String platformExecutableName = Platforms.WINDOWS

qa/vagrant/src/main/java/org/elasticsearch/packaging/util/Packages.java

+1
@@ -187,6 +187,7 @@ private static void verifyOssInstallation(Installation es, Distribution distribu
             "elasticsearch",
             "elasticsearch-plugin",
             "elasticsearch-keystore",
+            "elasticsearch-shard",
             "elasticsearch-translog"
         ).forEach(executable -> assertThat(es.bin(executable), file(File, "root", "root", p755)));

qa/vagrant/src/test/resources/packaging/utils/packages.bash

+1
@@ -95,6 +95,7 @@ verify_package_installation() {
     assert_file "$ESHOME/bin" d root root 755
     assert_file "$ESHOME/bin/elasticsearch" f root root 755
     assert_file "$ESHOME/bin/elasticsearch-plugin" f root root 755
+    assert_file "$ESHOME/bin/elasticsearch-shard" f root root 755
     assert_file "$ESHOME/bin/elasticsearch-translog" f root root 755
     assert_file "$ESHOME/lib" d root root 755
     assert_file "$ESCONFIG" d root elasticsearch 2750

qa/vagrant/src/test/resources/packaging/utils/tar.bash

+1
@@ -94,6 +94,7 @@ verify_archive_installation() {
     assert_file "$ESHOME/bin/elasticsearch-env" f elasticsearch elasticsearch 755
     assert_file "$ESHOME/bin/elasticsearch-keystore" f elasticsearch elasticsearch 755
     assert_file "$ESHOME/bin/elasticsearch-plugin" f elasticsearch elasticsearch 755
+    assert_file "$ESHOME/bin/elasticsearch-shard" f elasticsearch elasticsearch 755
     assert_file "$ESHOME/bin/elasticsearch-translog" f elasticsearch elasticsearch 755
     assert_file "$ESCONFIG" d elasticsearch elasticsearch 755
     assert_file "$ESCONFIG/elasticsearch.yml" f elasticsearch elasticsearch 660
