
Commit a4536f1

fix external links (#2364)
1 parent 958ca1c commit a4536f1

5 files changed, +18 −18 lines changed


docs/reference/apache-spark-support.md

+12 −12
@@ -44,7 +44,7 @@ $ ./bin/spark-submit --conf spark.es.resource=index/type ... <1>

#### Writing data to {{es}} [spark-write]

-With elasticsearch-hadoop, any `RDD` can be saved to {{es}} as long as its content can be translated into documents. In practice this means the `RDD` type needs to be a `Map` (whether a Scala or a Java one), a [`JavaBean`](http://docs.oracle.com/javase/tutorial/javabeans/) or a Scala [case class](http://docs.scala-lang.org/tutorials/tour/case-classes.md). When that is not the case, one can easily *transform* the data in Spark or plug-in their own custom [`ValueWriter`](/reference/configuration.md#configuration-serialization).
+With elasticsearch-hadoop, any `RDD` can be saved to {{es}} as long as its content can be translated into documents. In practice this means the `RDD` type needs to be a `Map` (whether a Scala or a Java one), a [`JavaBean`](http://docs.oracle.com/javase/tutorial/javabeans/) or a Scala [case class](http://docs.scala-lang.org/tutorials/tour/case-classes.html). When that is not the case, one can easily *transform* the data in Spark or plug-in their own custom [`ValueWriter`](/reference/configuration.md#configuration-serialization).

##### Scala [spark-write-scala]

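The write path documented in this hunk is the implicit `saveToEs` method that `org.elasticsearch.spark._` adds to any `RDD`; a minimal Scala sketch of that usage (the `SparkConf` setup, index name and sample maps are illustrative, not part of the changed file):

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.elasticsearch.spark._                      // adds saveToEs to RDDs

val conf = new SparkConf()
  .setAppName("es-rdd-write")
  .set("es.nodes", "localhost")                       // assumed local {{es}} node

val sc = new SparkContext(conf)

// each Map is serialized as one document; "spark/docs" is a placeholder resource
val numbers  = Map("one" -> 1, "two" -> 2, "three" -> 3)
val airports = Map("arrival" -> "Otopeni", "SFO" -> "San Fran")

sc.makeRDD(Seq(numbers, airports)).saveToEs("spark/docs")
```
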
@@ -267,7 +267,7 @@ saveToEs(javaRDD, "my-collection-{media_type}/doc"); <1>

#### Handling document metadata [spark-write-meta]

-{{es}} allows each document to have its own [metadata](elasticsearch://reference/elasticsearch/mapping-reference/document-metadata-fields.md). As explained above, through the various [mapping](/reference/configuration.md#cfg-mapping) options one can customize these parameters so that their values are extracted from their belonging document. Further more, one can even include/exclude what parts of the data are sent back to {{es}}. In Spark, elasticsearch-hadoop extends this functionality allowing metadata to be supplied *outside* the document itself through the use of [*pair* `RDD`s](http://spark.apache.org/docs/latest/programming-guide.md#working-with-key-value-pairs). In other words, for `RDD`s containing a key-value tuple, the metadata can be extracted from the key and the value used as the document source.
+{{es}} allows each document to have its own [metadata](elasticsearch://reference/elasticsearch/mapping-reference/document-metadata-fields.md). As explained above, through the various [mapping](/reference/configuration.md#cfg-mapping) options one can customize these parameters so that their values are extracted from their belonging document. Further more, one can even include/exclude what parts of the data are sent back to {{es}}. In Spark, elasticsearch-hadoop extends this functionality allowing metadata to be supplied *outside* the document itself through the use of [*pair* `RDD`s](http://spark.apache.org/docs/latest/programming-guide.html#working-with-key-value-pairs). In other words, for `RDD`s containing a key-value tuple, the metadata can be extracted from the key and the value used as the document source.

The metadata is described through the `Metadata` Java [enum](http://docs.oracle.com/javase/tutorial/java/javaOO/enum.md) within `org.elasticsearch.spark.rdd` package which identifies its type - `id`, `ttl`, `version`, etc…​ Thus an `RDD` keys can be a `Map` containing the `Metadata` for each document and its associated values. If `RDD` key is not of type `Map`, elasticsearch-hadoop will consider the object as representing the document id and use it accordingly. This sounds more complicated than it is, so let us see some examples.

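The pair-`RDD` mechanism described above surfaces as the `saveToEsWithMeta` call; a short Scala sketch using the `Metadata` enum (ids, versions and index name are made up):

```scala
import org.elasticsearch.spark._
import org.elasticsearch.spark.rdd.Metadata._         // ID, VERSION, TTL, ...

val otp = Map("iata" -> "OTP", "name" -> "Otopeni")
val sfo = Map("iata" -> "SFO", "name" -> "San Fran")

// the key carries the metadata, the value is the document source
val airportsRDD = sc.makeRDD(Seq(
  (Map(ID -> 1), otp),
  (Map(ID -> 2, VERSION -> "23"), sfo)))

airportsRDD.saveToEsWithMeta("airports/2015")
```
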
@@ -433,7 +433,7 @@ sc.esRDD("radio/artists", "?q=me*") <1>
1. create an `RDD` streaming all the documents matching `me*` from index `radio/artists`


-The documents from {{es}} are returned, by default, as a `Tuple2` containing as the first element the document id and the second element the actual document represented through Scala [collections](http://docs.scala-lang.org/overviews/collections/overview.md), namely one `Map[String, Any]`where the keys represent the field names and the value their respective values.
+The documents from {{es}} are returned, by default, as a `Tuple2` containing as the first element the document id and the second element the actual document represented through Scala [collections](http://docs.scala-lang.org/overviews/collections/overview.html), namely one `Map[String, Any]`where the keys represent the field names and the value their respective values.


##### Java [spark-read-java]
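
The read path in this hunk is the `esRDD` method; a brief Scala sketch of consuming the `Tuple2` results (query and field name are illustrative):

```scala
import org.elasticsearch.spark._

// RDD[(String, Map[String, AnyRef])]: document id paired with the document as a Map
val artists = sc.esRDD("radio/artists", "?q=me*")

artists.take(5).foreach { case (id, doc) =>
  println(s"$id -> ${doc.getOrElse("name", "<missing>")}")
}
```
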
@@ -575,7 +575,7 @@ Spark Streaming support provides special optimizations to allow for conservation

#### Writing `DStream` to {{es}} [spark-streaming-write]

-Like `RDD`s, any `DStream` can be saved to {{es}} as long as its content can be translated into documents. In practice this means the `DStream` type needs to be a `Map` (either a Scala or a Java one), a [`JavaBean`](http://docs.oracle.com/javase/tutorial/javabeans/) or a Scala [case class](http://docs.scala-lang.org/tutorials/tour/case-classes.md). When that is not the case, one can easily *transform* the data in Spark or plug-in their own custom [`ValueWriter`](/reference/configuration.md#configuration-serialization).
+Like `RDD`s, any `DStream` can be saved to {{es}} as long as its content can be translated into documents. In practice this means the `DStream` type needs to be a `Map` (either a Scala or a Java one), a [`JavaBean`](http://docs.oracle.com/javase/tutorial/javabeans/) or a Scala [case class](http://docs.scala-lang.org/tutorials/tour/case-classes.html). When that is not the case, one can easily *transform* the data in Spark or plug-in their own custom [`ValueWriter`](/reference/configuration.md#configuration-serialization).


##### Scala [spark-streaming-write-scala]
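
The `DStream` write described here mirrors the `RDD` one through `org.elasticsearch.spark.streaming._`; a minimal Scala sketch using a queue-backed stream (micro-batch contents and index name are illustrative):

```scala
import scala.collection.mutable
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.elasticsearch.spark.streaming._            // adds saveToEs to DStreams

val ssc = new StreamingContext(sc, Seconds(1))

val numbers  = Map("one" -> 1, "two" -> 2, "three" -> 3)
val airports = Map("arrival" -> "Otopeni", "SFO" -> "San Fran")

// a one-shot micro-batch, just to have something flowing through the stream
val microbatches = mutable.Queue(sc.makeRDD(Seq(numbers, airports)))

ssc.queueStream(microbatches).saveToEs("spark/docs")
ssc.start()
ssc.awaitTermination()
```
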
@@ -854,7 +854,7 @@ jssc.start();

#### Handling document metadata [spark-streaming-write-meta]

-{{es}} allows each document to have its own [metadata](elasticsearch://reference/elasticsearch/mapping-reference/document-metadata-fields.md). As explained above, through the various [mapping](/reference/configuration.md#cfg-mapping) options one can customize these parameters so that their values are extracted from their belonging document. Further more, one can even include/exclude what parts of the data are sent back to {{es}}. In Spark, elasticsearch-hadoop extends this functionality allowing metadata to be supplied *outside* the document itself through the use of [*pair* `RDD`s](http://spark.apache.org/docs/latest/programming-guide.md#working-with-key-value-pairs).
+{{es}} allows each document to have its own [metadata](elasticsearch://reference/elasticsearch/mapping-reference/document-metadata-fields.md). As explained above, through the various [mapping](/reference/configuration.md#cfg-mapping) options one can customize these parameters so that their values are extracted from their belonging document. Further more, one can even include/exclude what parts of the data are sent back to {{es}}. In Spark, elasticsearch-hadoop extends this functionality allowing metadata to be supplied *outside* the document itself through the use of [*pair* `RDD`s](http://spark.apache.org/docs/latest/programming-guide.html#working-with-key-value-pairs).

This is no different in Spark Streaming. For `DStreams`s containing a key-value tuple, the metadata can be extracted from the key and the value used as the document source.

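For the streaming case the same pair pattern applies via `saveToEsWithMeta`; a sketch where plain (non-`Map`) keys are treated as document ids, reusing the `StreamingContext` from the previous sketch (values and index name are illustrative):

```scala
import org.elasticsearch.spark.streaming._

val otp = Map("iata" -> "OTP", "name" -> "Otopeni")
val sfo = Map("iata" -> "SFO", "name" -> "San Fran")

// a non-Map key is interpreted as the document id
val airportsRDD  = sc.makeRDD(Seq((1, otp), (2, sfo)))
val microbatches = scala.collection.mutable.Queue(airportsRDD)

ssc.queueStream(microbatches).saveToEsWithMeta("airports/2015")
ssc.start()
```
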
@@ -1085,7 +1085,7 @@ Spark SQL works with *structured* data - in other words, all entries are expecte

#### Supported Spark SQL versions [spark-sql-versions]

-Spark SQL while becoming a mature component, is still going through significant changes between releases. Spark SQL became a stable component in version 1.3, however it is [**not** backwards compatible](https://spark.apache.org/docs/latest/sql-programming-guide.md#migration-guide) with the previous releases. Further more Spark 2.0 introduced significant changed which broke backwards compatibility, through the `Dataset` API. elasticsearch-hadoop supports both version Spark SQL 1.3-1.6 and Spark SQL 2.0 through two different jars: `elasticsearch-spark-1.x-<version>.jar` and `elasticsearch-hadoop-<version>.jar` support Spark SQL 1.3-1.6 (or higher) while `elasticsearch-spark-2.0-<version>.jar` supports Spark SQL 2.0. In other words, unless you are using Spark 2.0, use `elasticsearch-spark-1.x-<version>.jar`
+Spark SQL while becoming a mature component, is still going through significant changes between releases. Spark SQL became a stable component in version 1.3, however it is [**not** backwards compatible](https://spark.apache.org/docs/latest/sql-programming-guide.html#migration-guide) with the previous releases. Further more Spark 2.0 introduced significant changed which broke backwards compatibility, through the `Dataset` API. elasticsearch-hadoop supports both version Spark SQL 1.3-1.6 and Spark SQL 2.0 through two different jars: `elasticsearch-spark-1.x-<version>.jar` and `elasticsearch-hadoop-<version>.jar` support Spark SQL 1.3-1.6 (or higher) while `elasticsearch-spark-2.0-<version>.jar` supports Spark SQL 2.0. In other words, unless you are using Spark 2.0, use `elasticsearch-spark-1.x-<version>.jar`

Spark SQL support is available under `org.elasticsearch.spark.sql` package.

@@ -1184,7 +1184,7 @@ For maximum control over the mapping of your `DataFrame` in {{es}}, it is highly

#### Writing existing JSON to {{es}} [spark-sql-json]

-When using Spark SQL, if the input data is in JSON format, simply convert it to a `DataFrame` (in Spark SQL 1.3) or a `Dataset` (for Spark SQL 2.0) (as described in Spark [documentation](https://spark.apache.org/docs/latest/sql-programming-guide.md#json-datasets)) through `SQLContext`/`JavaSQLContext` `jsonFile` methods.
+When using Spark SQL, if the input data is in JSON format, simply convert it to a `DataFrame` (in Spark SQL 1.3) or a `Dataset` (for Spark SQL 2.0) (as described in Spark [documentation](https://spark.apache.org/docs/latest/sql-programming-guide.html#json-datasets)) through `SQLContext`/`JavaSQLContext` `jsonFile` methods.


#### Using pure SQL to read from {{es}} [spark-sql-read-ds]
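
The JSON-to-`DataFrame` flow mentioned in this hunk comes down to reading the JSON and calling the `saveToEs` implicit from `org.elasticsearch.spark.sql._`; a sketch using the newer `read.json` in place of the older `jsonFile`, assuming a `sqlContext` is in scope (path and index are placeholders):

```scala
import org.elasticsearch.spark.sql._                  // adds saveToEs to DataFrames/Datasets

// one JSON document per line; path is hypothetical
val people = sqlContext.read.json("path/to/people.json")
people.saveToEs("spark/people")
```
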
@@ -1194,7 +1194,7 @@ The index and its mapping, have to exist prior to creating the temporary table
::::


-Spark SQL 1.2 [introduced](http://spark.apache.org/releases/spark-release-1-2-0.md) a new [API](https://github.com/apache/spark/pull/2475) for reading from external data sources, which is supported by elasticsearch-hadoop simplifying the SQL configured needed for interacting with {{es}}. Further more, behind the scenes it understands the operations executed by Spark and thus can optimize the data and queries made (such as filtering or pruning), improving performance.
+Spark SQL 1.2 [introduced](http://spark.apache.org/releases/spark-release-1-2-0.html) a new [API](https://github.com/apache/spark/pull/2475) for reading from external data sources, which is supported by elasticsearch-hadoop simplifying the SQL configured needed for interacting with {{es}}. Further more, behind the scenes it understands the operations executed by Spark and thus can optimize the data and queries made (such as filtering or pruning), improving performance.


#### Data Sources in Spark SQL [spark-data-sources]
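
The data source integration referenced above is what backs the pure-SQL route; a sketch of exposing an existing index as a temporary table and querying it, assuming Spark SQL 1.3+ syntax and a `sqlContext` in scope (table, index and field names are placeholders):

```scala
// the index and its mapping must already exist
sqlContext.sql(
  "CREATE TEMPORARY TABLE artists " +
  "USING org.elasticsearch.spark.sql " +
  "OPTIONS (resource 'radio/artists')")

// filters such as this LIKE can be pushed down to {{es}} by the connector
sqlContext.sql("SELECT name FROM artists WHERE name LIKE 'me%'").show()
```
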
@@ -1512,7 +1512,7 @@ When dealing with multi-value/array fields, please see [this](/reference/mapping

elasticsearch-hadoop automatically converts Spark built-in types to {{es}} [types](elasticsearch://reference/elasticsearch/mapping-reference/field-data-types.md) (and back) as shown in the table below:

-While Spark SQL [`DataType`s](https://spark.apache.org/docs/latest/sql-programming-guide.md#data-types) have an equivalent in both Scala and Java and thus the [RDD](#spark-type-conversion) conversion can apply, there are slightly different semantics - in particular with the `java.sql` types due to the way Spark SQL handles them:
+While Spark SQL [`DataType`s](https://spark.apache.org/docs/latest/sql-programming-guide.html#data-types) have an equivalent in both Scala and Java and thus the [RDD](#spark-type-conversion) conversion can apply, there are slightly different semantics - in particular with the `java.sql` types due to the way Spark SQL handles them:

| Spark SQL `DataType` | {{es}} type |
| --- | --- |
@@ -1560,7 +1560,7 @@ Like Spark SQL, Structured Streaming works with *structured* data. All entries a

Spark Structured Streaming is considered *generally available* as of Spark v2.2.0. As such, elasticsearch-hadoop support for Structured Streaming (available in elasticsearch-hadoop 6.0+) is only compatible with Spark versions 2.2.0 and onward. Similar to Spark SQL before it, Structured Streaming may be subject to significant changes between releases before its interfaces are considered *stable*.

-Spark Structured Streaming support is available under the `org.elasticsearch.spark.sql` and `org.elasticsearch.spark.sql.streaming` packages. It shares a unified interface with Spark SQL in the form of the `Dataset[_]` api. Clients can interact with streaming `Dataset`s in almost exactly the same way as regular batch `Dataset`s with only a [few exceptions](http://spark.apache.org/docs/latest/structured-streaming-programming-guide.md#unsupported-operations).
+Spark Structured Streaming support is available under the `org.elasticsearch.spark.sql` and `org.elasticsearch.spark.sql.streaming` packages. It shares a unified interface with Spark SQL in the form of the `Dataset[_]` api. Clients can interact with streaming `Dataset`s in almost exactly the same way as regular batch `Dataset`s with only a [few exceptions](http://spark.apache.org/docs/latest/structured-streaming-programming-guide.html#unsupported-operations).


#### Writing Streaming `Datasets` (Spark SQL 2.0+) to {{es}} [spark-sql-streaming-write]
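
The Structured Streaming support described here is exposed as the `"es"` sink format on `DataStreamWriter`; a Scala sketch of writing a streaming `Dataset` (input path, schema, checkpoint location and index are illustrative):

```scala
import org.apache.spark.sql.SparkSession

case class Person(name: String, surname: String, age: Int)

val spark = SparkSession.builder().appName("es-structured-streaming").getOrCreate()
import spark.implicits._

// hypothetical directory of CSV-ish text files watched as a stream
val people = spark.readStream
  .textFile("/path/to/people/*")
  .map(_.split(","))
  .map(p => Person(p(0), p(1), p(2).trim.toInt))

people.writeStream
  .option("checkpointLocation", "/tmp/es-checkpoint") // location for the sink's commit log
  .format("es")                                       // elasticsearch-hadoop streaming sink
  .start("spark/people")
```
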
@@ -1660,7 +1660,7 @@ people.writeStream()

#### Writing existing JSON to {{es}} [spark-sql-streaming-json]

-When using Spark SQL, if the input data is in JSON format, simply convert it to a `Dataset` (for Spark SQL 2.0) (as described in Spark [documentation](http://spark.apache.org/docs/latest/structured-streaming-programming-guide.md#input-sources)) through the `DataStreamReader’s `json` format.
+When using Spark SQL, if the input data is in JSON format, simply convert it to a `Dataset` (for Spark SQL 2.0) (as described in Spark [documentation](http://spark.apache.org/docs/latest/structured-streaming-programming-guide.html#input-sources)) through the `DataStreamReader’s `json` format.


#### Sink commit log in Spark Structured Streaming [spark-sql-streaming-commit-log]
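
For the streaming JSON case, the `DataStreamReader`'s `json` format mentioned above pairs with the same `"es"` sink; a sketch, noting that a file-based streaming source needs an explicit schema (fields, paths and index are made up):

```scala
import org.apache.spark.sql.types._

val jsonSchema = new StructType()
  .add("name", StringType)
  .add("age", IntegerType)

val stream = spark.readStream
  .schema(jsonSchema)
  .format("json")
  .load("/path/to/json/files")                        // hypothetical directory

stream.writeStream
  .option("checkpointLocation", "/tmp/es-json-checkpoint")
  .format("es")
  .start("spark/json-docs")
```
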
@@ -1718,7 +1718,7 @@ If automatic index creation is used, please review [this](/reference/mapping-typ

elasticsearch-hadoop automatically converts Spark built-in types to {{es}} [types](elasticsearch://reference/elasticsearch/mapping-reference/field-data-types.md) as shown in the table below:

-While Spark SQL [`DataType`s](https://spark.apache.org/docs/latest/sql-programming-guide.md#data-types) have an equivalent in both Scala and Java and thus the [RDD](#spark-type-conversion) conversion can apply, there are slightly different semantics - in particular with the `java.sql` types due to the way Spark SQL handles them:
+While Spark SQL [`DataType`s](https://spark.apache.org/docs/latest/sql-programming-guide.html#data-types) have an equivalent in both Scala and Java and thus the [RDD](#spark-type-conversion) conversion can apply, there are slightly different semantics - in particular with the `java.sql` types due to the way Spark SQL handles them:

| Spark SQL `DataType` | {{es}} type |
| --- | --- |

docs/reference/hadoop-metrics.md

+2 −2
@@ -5,7 +5,7 @@ mapped_pages:

# Hadoop metrics [metrics]

-The Hadoop system records a set of metric counters for each job that it runs. elasticsearch-hadoop extends on that and provides metrics about its activity for each job run by leveraging the Hadoop [Counters](http://hadoop.apache.org/docs/r3.3.1/api/org/apache/hadoop/mapred/Counters.md) infrastructure. During each run, elasticsearch-hadoop sends statistics from each task instance, as it is running, which get aggregated by the Map/Reduce infrastructure and are available through the standard Hadoop APIs.
+The Hadoop system records a set of metric counters for each job that it runs. elasticsearch-hadoop extends on that and provides metrics about its activity for each job run by leveraging the Hadoop [Counters](http://hadoop.apache.org/docs/r3.3.1/api/org/apache/hadoop/mapred/Counters.html) infrastructure. During each run, elasticsearch-hadoop sends statistics from each task instance, as it is running, which get aggregated by the Map/Reduce infrastructure and are available through the standard Hadoop APIs.

elasticsearch-hadoop provides the following counters, available under `org.elasticsearch.hadoop.mr.Counter` enum:

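As a rough illustration of reading those counters programmatically after a job completes, a Scala sketch against the `mapreduce` API (it assumes the `Counter` enum constants carry the same names as the counter table in these docs):

```scala
import org.apache.hadoop.mapreduce.Job
import org.elasticsearch.hadoop.mr.Counter

// call after job.waitForCompletion(true); constant names assumed to match the docs' counter table
def reportEsCounters(job: Job): Unit = {
  val counters  = job.getCounters
  val scrollMs  = counters.findCounter(Counter.SCROLL_TOTAL_TIME_MS).getValue
  val retriesMs = counters.findCounter(Counter.BULK_RETRIES_TOTAL_TIME_MS).getValue
  println(s"scroll: $scrollMs ms, bulk retries: $retriesMs ms")
}
```
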
@@ -33,7 +33,7 @@ elasticsearch-hadoop provides the following counters, available under `org.elast
| BULK_RETRIES_TOTAL_TIME_MS | Time (in ms) spent over the network retrying bulk requests |
| SCROLL_TOTAL_TIME_MS | Time (in ms) spent over the network reading the scroll requests |

-One can use the counters programatically, depending on the API used, through [mapred](http://hadoop.apache.org/docs/r3.3.1/api/index.md?org/apache/hadoop/mapred/Counters.md) or [mapreduce](http://hadoop.apache.org/docs/r3.3.1/api/index.md?org/apache/hadoop/mapreduce/Counter.md). Whatever the choice, elasticsearch-hadoop performs automatic reports without any user intervention. In fact, when using elasticsearch-hadoop one will see the stats reported at the end of the job run, for example:
+One can use the counters programatically, depending on the API used, through [mapred](http://hadoop.apache.org/docs/r3.3.1/api/index.html?org/apache/hadoop/mapred/Counters.md) or [mapreduce](http://hadoop.apache.org/docs/r3.3.1/api/index.html?org/apache/hadoop/mapreduce/Counter.md). Whatever the choice, elasticsearch-hadoop performs automatic reports without any user intervention. In fact, when using elasticsearch-hadoop one will see the stats reported at the end of the job run, for example:

```bash
13:55:08,100 INFO main mapreduce.Job - Job job_local127738678_0013 completed successfully

docs/reference/license.md

+1 −1
@@ -5,7 +5,7 @@ navigation_title: License
---
# {{esh-full}} license [license]

-elasticsearch-hadoop is Open Source, released under [Apache 2 license](http://www.apache.org/licenses/LICENSE-2.0.md):
+elasticsearch-hadoop is Open Source, released under [Apache 2 license](http://www.apache.org/licenses/LICENSE-2.0.html):

```
Licensed to Elasticsearch under one or more contributor license agreements.

docs/reference/logging.md

+1 −1
@@ -30,7 +30,7 @@ log4j.category.org.elasticsearch.hadoop.mr=DEBUG
```

::::{tip}
-See the log4j [javadoc](https://logging.apache.org/log4j/1.2/apidocs/org/apache/log4j/PropertyConfigurator.md#doConfigure%28java.lang.String,%20org.apache.log4j.spi.LoggerRepository%29) for more information.
+See the log4j [javadoc](https://logging.apache.org/log4j/1.2/apidocs/org/apache/log4j/PropertyConfigurator.html#doConfigure%28java.lang.String,%20org.apache.log4j.spi.LoggerRepository%29) for more information.
::::
