Running OpenSearch with etcd cluster state

Within this branch, this README explains how to get started running OpenSearch with cluster state stored in etcd.

Install and launch etcd

See instructions at https://etcd.io/docs/v3.5/install/.

On my Mac, this meant brew install etcd. Then I just ran the etcd executable and left it running in a terminal tab.

On Ubuntu, I needed to run sudo apt install etcd-server etcd-client to get both the etcd and etcdctl commands. Installing the server automatically started the etcd process.
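
On Ubuntu, the installed server runs under systemd. Assuming the etcd-server package registers a unit named etcd (worth verifying with systemctl list-units), you can check or restart the process like this:

# Check whether the etcd service is running (the unit name 'etcd' is an assumption)
% sudo systemctl status etcd

# Restart it if needed
% sudo systemctl restart etcd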

Either installation should also include the etcdctl command-line tool, which you can use to interact with the running local etcd instance as follows:

# Write value 'bar' to key 'foo'
% etcdctl put foo bar
OK

# Read the value from key 'foo'
% etcdctl get foo
foo
bar

# Get all keys whose first byte is between ' ' (the earliest printable character) and '~' (the last)
% etcdctl get ' ' '~'
foo
bar

# Delete the entry for key 'foo'
% etcdctl del foo
1
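
Before moving on, it's worth confirming that the local instance is healthy. The etcdctl endpoint commands report liveness and basic status for the default endpoint on 127.0.0.1:2379:

# Confirm the local etcd endpoint is reachable and healthy
% etcdctl endpoint health

# Show version, DB size, and leader info in a table
% etcdctl endpoint status --write-out=table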

Run three OpenSearch nodes from this branch

To validate that we can form a working distributed system without forming a traditional cluster, we will start three OpenSearch nodes locally. The first will serve as a coordinator, while the other two will be data nodes.

# Clone the repo
% git clone https://github.com/msfroh/OpenSearch.git

# Enter the cloned repo
% cd OpenSearch

# Checkout the correct branch
% git checkout clusterless_datanode

# Run with the cluster-etcd plugin loaded and launch three nodes. We also need to set the clusterless mode feature flag.
% ./gradlew run -PinstalledPlugins="['cluster-etcd']" -PnumNodes=3 -Dtests.opensearch.opensearch.experimental.feature.clusterless.enabled=true

# In another tab, check the local cluster state for each node

# In the examples below, this will be the coordinator node. Note that the node name is runTask-0.
% curl 'http://localhost:9200/_cluster/state?local&pretty'

# In the examples below, this will be the first data node. Note that the node name is runTask-1.
% curl 'http://localhost:9201/_cluster/state?local&pretty'

# In the examples below, this will be the second data node. Note that the node name is runTask-2.
% curl 'http://localhost:9202/_cluster/state?local&pretty'
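
The full cluster state is fairly verbose. You can request just the discovery section with the state API's metric filter; since the nodes have not been told about each other yet, each response should list only the local node:

# Show only the nodes section of the local cluster state
% curl 'http://localhost:9200/_cluster/state/nodes?local&pretty'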

Push some state to etcd to start the data nodes

The cluster-etcd plugin now uses a split metadata approach that separates index configuration into distinct etcd keys:

  • Settings: /indices/{index}/settings - Basic index configuration needed by all nodes
  • Mappings: /indices/{index}/mappings - Field definitions needed primarily by data nodes

This approach reduces etcd storage requirements and simplifies control plane logic by filtering out data plane implementation details.

# Write index settings and mappings separately (new split metadata approach)
# Settings are needed by both data nodes and coordinators
% cat << EOF | etcdctl put runTask/indices/myindex/settings
{
  "index": {
    "number_of_shards": "1",
    "number_of_replicas": "0",
    "uuid": "E8F2-ebqQ1-U4SL6NoPEyw",
    "version": {
      "created": "137227827"
    }
  }
}
EOF

# Mappings are needed by data nodes only (flattened structure)
% cat << EOF | etcdctl put runTask/indices/myindex/mappings
{
  "properties": {
    "title": {
      "type": "text",
      "fields": {
        "keyword": {
          "type": "keyword",
          "ignore_above": 256
        }
      }
    }
  }
}
EOF

# Assign primary for shard 0 of myindex to the node listening on port 9201/9301
% etcdctl put runTask/search-unit/runTask-1/goal-state '{"local_shards":{"myindex":{"0":"PRIMARY"}}}'

# Assign primary for shard 1 of myindex to the node listening on port 9202/9302
% etcdctl put runTask/search-unit/runTask-2/goal-state '{"local_shards":{"myindex":{"1":"PRIMARY"}}}'

# Verify the split metadata was stored correctly
% etcdctl get "runTask/indices/myindex/settings"
% etcdctl get "runTask/indices/myindex/mappings"

# Check all keys to see the new structure
% etcdctl get "" --from-key --keys-only

# Check the local cluster state on each data node
% curl 'http://localhost:9201/_cluster/state?local&pretty'
% curl 'http://localhost:9202/_cluster/state?local&pretty'

# Write a document to each shard. Here we're relying on knowing which shard each doc will land on (from trial and error).
# Note that if you try sending each document to the other data node, it will fail, since the data nodes don't know about
# each other and don't know where to forward the documents.
% curl -X POST -H 'Content-Type: application/json' http://localhost:9201/myindex/_doc/3 -d '{"title":"Hello from shard 0"}'
% curl -X POST -H 'Content-Type: application/json' http://localhost:9202/myindex/_doc/1 -d '{"title":"Hello from shard 1"}'
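
# To see the failure described above, try sending the shard-1 document to the shard-0 node instead; runTask-1 has no
# route to runTask-2, so it can't forward the request (the exact error returned may vary)
% curl -X POST -H 'Content-Type: application/json' http://localhost:9201/myindex/_doc/1 -d '{"title":"Hello from shard 1"}'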

# Search the document on shard 0
% curl 'http://localhost:9201/myindex/_search?pretty'

# Search the document on shard 1
% curl 'http://localhost:9202/myindex/_search?pretty'
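
As a quick sanity check, the _count API should report exactly one document on each data node once the index has refreshed (by default, within about a second of indexing):

# Count documents on each data node
% curl 'http://localhost:9201/myindex/_count?pretty'
% curl 'http://localhost:9202/myindex/_count?pretty'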

Add a coordinator

The coordinator automatically resolves node names to node IDs by reading health data, eliminating the need to manually extract node IDs and ephemeral IDs.

# Tell the coordinator about the data nodes using their node names (not IDs).
% cat << EOF | etcdctl put runTask/search-unit/runTask-0/goal-state
{
  "remote_shards": {
    "indices": {
      "myindex": {
        "uuid" : "E8F2-ebqQ1-U4SL6NoPEyw",
        "shard_routing" : [
          [
            {"node_name": "runTask-1", "primary": true }
          ],
          [
            {"node_name": "runTask-2", "primary": true }
          ]
        ]
      }
    }
  }
}
EOF

# Search via the coordinator node. You'll see both documents added above
% curl 'http://localhost:9200/myindex/_search?pretty'
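
# As a quick check that the 'title' mapping works end to end, a match query through the coordinator should also
# find both documents
% curl -X POST -H 'Content-Type: application/json' 'http://localhost:9200/myindex/_search?pretty' -d '{"query":{"match":{"title":"hello"}}}'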

# Index a batch of documents (surely hitting both shards) via the coordinator node
% curl -X POST -H 'Content-Type: application/json' http://localhost:9200/myindex/_bulk -d '
{ "index": {"_id":"2"}}
{"title": "Document 2"}
{ "index": {"_id":"4"}}
{"title": "Document 4"}
{ "index": {"_id":"5"}}
{"title": "Document 5"}
{ "index": {"_id":"6"}}
{"title": "Document 6"}
{ "index": {"_id":"7"}}
{"title": "Document 7"}
{ "index": {"_id":"8"}}
{"title": "Document 8"}
{ "index": {"_id":"9"}}
{"title": "Document 9"}
{ "index": {"_id":"10"}}
{"title": "Document 10"}
'

# Search via the coordinator node. You'll see 10 documents. If you search each data node you'll see around half.
% curl 'http://localhost:9200/myindex/_search?pretty'
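
To back up the "around half" claim, you can count documents per node. Assuming _count fans out across shards the same way _search does, the coordinator should report all 10, while each data node reports only its own shard's share:

# Count via the coordinator, then on each data node
% curl 'http://localhost:9200/myindex/_count?pretty'
% curl 'http://localhost:9201/myindex/_count?pretty'
% curl 'http://localhost:9202/myindex/_count?pretty'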

Heartbeat and Health Data

Nodes automatically publish health and status information to etcd at the path {cluster_name}/search-unit/{node_name}/actual-state.

# View heartbeat data for all nodes
% etcdctl get "runTask/search-unit/" --prefix

# View a specific node's health data
% etcdctl get "runTask/search-unit/<nodename>/actual-state"

Welcome!

OpenSearch is a community-driven, open source fork of Elasticsearch and Kibana following the license change in early 2021. We're looking to sustain (and evolve!) a search and analytics suite for the multitude of businesses who are dependent on the rights granted by the original, Apache v2.0 License.

Code of Conduct

The project's Code of Conduct outlines our expectations for all participants in our community, based on the OpenSearch Code of Conduct. Please contact [email protected] with any additional questions or comments.

Security

If you discover a potential security issue in this project we ask that you notify OpenSearch Security directly via email to [email protected]. Please do not create a public GitHub issue.

License

This project is licensed under the Apache v2.0 License.

Copyright

Copyright OpenSearch Contributors. See NOTICE for details.

Trademark

OpenSearch is a registered trademark of Amazon Web Services.

OpenSearch includes certain Apache-licensed Elasticsearch code from Elasticsearch B.V. and other source code. Elasticsearch B.V. is not the source of that other source code. ELASTICSEARCH is a registered trademark of Elasticsearch B.V.
