Running OpenSearch with etcd cluster state

Within this branch, this README explains how to get started running OpenSearch with cluster state stored in etcd.

Install and launch etcd

See instructions at https://etcd.io/docs/v3.5/install/.

On my Mac, this meant brew install etcd. Then I just ran the etcd executable and left it running in a terminal tab.

On Ubuntu, I needed to run sudo apt install etcd-server etcd-client to get both the etcd and etcdctl commands. Installing the server automatically started the etcd process.
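
On Ubuntu, the installed server runs under systemd. Assuming the etcd-server package registers a unit named etcd (worth verifying with systemctl list-units), you can check or restart the process like this:

# Check whether the etcd service is running (the unit name 'etcd' is an assumption)
% sudo systemctl status etcd

# Restart it if needed
% sudo systemctl restart etcd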

Either installation should also include the etcdctl command-line tool, which you can use to interact with the running local etcd instance as follows:

# Write value 'bar' to key 'foo'
% etcdctl put foo bar
OK

# Read the value from key 'foo'
% etcdctl get foo
foo
bar

# Get all keys whose first byte is between ' ' (the earliest printable character) and '~' (the last)
% etcdctl get ' ' '~'
foo
bar

# Delete the entry for key 'foo'
% etcdctl del foo
1
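
Before moving on, it's worth confirming that the local instance is healthy. The etcdctl endpoint commands report liveness and basic status for the default endpoint on 127.0.0.1:2379:

# Confirm the local etcd endpoint is reachable and healthy
% etcdctl endpoint health

# Show version, DB size, and leader info in a table
% etcdctl endpoint status --write-out=table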

Run three OpenSearch nodes from this branch

To validate that we can form a working distributed system without forming a traditional cluster, we will start three OpenSearch nodes locally. The first will serve as a coordinator, while the other two will be data nodes.

# Clone the repo
% git clone https://github.com/msfroh/OpenSearch.git

# Enter the cloned repo
% cd OpenSearch

# Checkout the correct branch
% git checkout clusterless_datanode

# Run with the cluster-etcd plugin loaded and launch three nodes. We also need to set the clusterless mode feature flag.
% ./gradlew run -PinstalledPlugins="['cluster-etcd']" -PnumNodes=3 -Dtests.opensearch.opensearch.experimental.feature.clusterless.enabled=true

# In another tab, check the local cluster state for each node

# In the examples below, this will be the coordinator node. Note that the node name is runTask-0.
% curl 'http://localhost:9200/_cluster/state?local&pretty'

# In the examples below, this will be the first data node. Note that the node name is runTask-1.
% curl 'http://localhost:9201/_cluster/state?local&pretty'

# In the examples below, this will be the second data node. Note that the node name is runTask-2.
% curl 'http://localhost:9202/_cluster/state?local&pretty'
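
The full cluster state is fairly verbose. You can request just the discovery section with the state API's metric filter; since the nodes have not been told about each other yet, each response should list only the local node:

# Show only the nodes section of the local cluster state
% curl 'http://localhost:9200/_cluster/state/nodes?local&pretty'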

Push some state to etcd to start the data nodes

The cluster-etcd plugin now uses a split metadata approach that separates index configuration into distinct etcd keys:

  • Settings: /indices/{index}/settings - Basic index configuration needed by all nodes
  • Mappings: /indices/{index}/mappings - Field definitions needed primarily by data nodes

This approach reduces etcd storage requirements and simplifies control plane logic by filtering out data plane implementation details.

# Write index settings and mappings separately (new split metadata approach)
# Settings are needed by both data nodes and coordinators
% cat << EOF | etcdctl put runTask/indices/myindex/settings
{
  "index": {
    "number_of_shards": "1",
    "number_of_replicas": "0",
    "uuid": "E8F2-ebqQ1-U4SL6NoPEyw",
    "version": {
      "created": "137227827"
    }
  }
}
EOF

# Mappings are needed by data nodes only (flattened structure)
% cat << EOF | etcdctl put runTask/indices/myindex/mappings
{
  "properties": {
    "title": {
      "type": "text",
      "fields": {
        "keyword": {
          "type": "keyword",
          "ignore_above": 256
        }
      }
    }
  }
}
EOF

# Assign primary for shard 0 of myindex to the node listening on port 9201/9301
% etcdctl put runTask/search-unit/runTask-1/goal-state '{"local_shards":{"myindex":{"0":"PRIMARY"}}}'

# Assign primary for shard 1 of myindex to the node listening on port 9202/9302
% etcdctl put runTask/search-unit/runTask-2/goal-state '{"local_shards":{"myindex":{"1":"PRIMARY"}}}'

# Verify the split metadata was stored correctly
% etcdctl get "runTask/indices/myindex/settings"
% etcdctl get "runTask/indices/myindex/mappings"

# Check all keys to see the new structure
% etcdctl get "" --from-key --keys-only

# Check the local cluster state on each data node
% curl 'http://localhost:9201/_cluster/state?local&pretty'
% curl 'http://localhost:9202/_cluster/state?local&pretty'

# Write a document to each shard. Here we're relying on knowing which shard each doc will land on (from trial and error).
# Note that if you try sending each document to the other data node, it will fail, since the data nodes don't know about
# each other and don't know where to forward the documents.
% curl -X POST -H 'Content-Type: application/json' http://localhost:9201/myindex/_doc/3 -d '{"title":"Hello from shard 0"}'
% curl -X POST -H 'Content-Type: application/json' http://localhost:9202/myindex/_doc/1 -d '{"title":"Hello from shard 1"}'
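
# To see the failure described above, try sending the shard-1 document to the shard-0 node instead; runTask-1 has no
# route to runTask-2, so it can't forward the request (the exact error returned may vary)
% curl -X POST -H 'Content-Type: application/json' http://localhost:9201/myindex/_doc/1 -d '{"title":"Hello from shard 1"}'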

# Search the document on shard 0
% curl 'http://localhost:9201/myindex/_search?pretty'

# Search the document on shard 1
% curl 'http://localhost:9202/myindex/_search?pretty'
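
As a quick sanity check, the _count API should report exactly one document on each data node once the index has refreshed (by default, within about a second of indexing):

# Count documents on each data node
% curl 'http://localhost:9201/myindex/_count?pretty'
% curl 'http://localhost:9202/myindex/_count?pretty'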

Add a coordinator

The coordinator automatically resolves node names to node IDs by reading health data, eliminating the need to manually extract node IDs and ephemeral IDs.

# Tell the coordinator about the data nodes using their node names (not IDs).
% cat << EOF | etcdctl put runTask/search-unit/runTask-0/goal-state
{
  "remote_shards": {
    "indices": {
      "myindex": {
        "uuid" : "E8F2-ebqQ1-U4SL6NoPEyw",
        "shard_routing" : [
          [
            {"node_name": "runTask-1", "primary": true }
          ],
          [
            {"node_name": "runTask-2", "primary": true }
          ]
        ]
      }
    }
  }
}
EOF

# Search via the coordinator node. You'll see both documents added above
% curl 'http://localhost:9200/myindex/_search?pretty'
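
# As a quick check that the 'title' mapping works end to end, a match query through the coordinator should also
# find both documents
% curl -X POST -H 'Content-Type: application/json' 'http://localhost:9200/myindex/_search?pretty' -d '{"query":{"match":{"title":"hello"}}}'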

# Index a batch of documents (surely hitting both shards) via the coordinator node
% curl -X POST -H 'Content-Type: application/json' http://localhost:9200/myindex/_bulk -d '
{ "index": {"_id":"2"}}
{"title": "Document 2"}
{ "index": {"_id":"4"}}
{"title": "Document 4"}
{ "index": {"_id":"5"}}
{"title": "Document 5"}
{ "index": {"_id":"6"}}
{"title": "Document 6"}
{ "index": {"_id":"7"}}
{"title": "Document 7"}
{ "index": {"_id":"8"}}
{"title": "Document 8"}
{ "index": {"_id":"9"}}
{"title": "Document 9"}
{ "index": {"_id":"10"}}
{"title": "Document 10"}
'

# Search via the coordinator node. You'll see 10 documents. If you search each data node you'll see around half.
% curl 'http://localhost:9200/myindex/_search?pretty'
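
To back up the "around half" claim, you can count documents per node. Assuming _count fans out across shards the same way _search does, the coordinator should report all 10, while each data node reports only its own shard's share:

# Count via the coordinator, then on each data node
% curl 'http://localhost:9200/myindex/_count?pretty'
% curl 'http://localhost:9201/myindex/_count?pretty'
% curl 'http://localhost:9202/myindex/_count?pretty'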

Heartbeat and Health Data

Nodes automatically publish health and status information to etcd at the path {cluster_name}/search-unit/{node_name}/actual-state.

# View heartbeat data for all nodes
% etcdctl get "runTask/search-unit/" --prefix

# View a specific node's health data
% etcdctl get "runTask/search-unit/<nodename>/actual-state"

Welcome!

OpenSearch is a community-driven, open source fork of Elasticsearch and Kibana following the license change in early 2021. We're looking to sustain (and evolve!) a search and analytics suite for the multitude of businesses who are dependent on the rights granted by the original, Apache v2.0 License.

Code of Conduct

The project's Code of Conduct outlines our expectations for all participants in our community, based on the OpenSearch Code of Conduct. Please contact [email protected] with any additional questions or comments.

Security

If you discover a potential security issue in this project we ask that you notify OpenSearch Security directly via email to [email protected]. Please do not create a public GitHub issue.

License

This project is licensed under the Apache v2.0 License.

Copyright

Copyright OpenSearch Contributors. See NOTICE for details.

Trademark

OpenSearch is a registered trademark of Amazon Web Services.

OpenSearch includes certain Apache-licensed Elasticsearch code from Elasticsearch B.V. and other source code. Elasticsearch B.V. is not the source of that other source code. ELASTICSEARCH is a registered trademark of Elasticsearch B.V.
