diff --git a/README.rst b/README.rst
index f806ccae..19eb49b2 100644
--- a/README.rst
+++ b/README.rst
@@ -24,6 +24,8 @@ Dask Cloud Provider
    :alt: Conda Forge
 
-Native Cloud integration for Dask. This library intends to allow people to
-create dask clusters on a given cloud provider with no set up other than having
-credentials.
+Native Cloud integration for Dask.
+
+This library provides tools that help Dask clusters integrate more natively with the cloud.
+It includes cluster managers that create Dask clusters on a given cloud provider using native resources,
+plugins that more closely integrate Dask components with the cloud platform they are running on, and documentation for anyone running Dask on the cloud.
 
diff --git a/doc/source/alternatives.rst b/doc/source/alternatives.rst
new file mode 100644
index 00000000..4f9f809d
--- /dev/null
+++ b/doc/source/alternatives.rst
@@ -0,0 +1,51 @@
+Alternatives
+============
+
+Many tools and services exist today for deploying Dask clusters, many of which are commonly used on the cloud.
+This project aims to provide cloud-native plugins and tools for Dask that can often complement other approaches.
+
+Community tools
+---------------
+
+Dask has a `vibrant ecosystem of community tooling for deploying Dask `_ on various platforms, and many of these tools can be used on public clouds.
+
+Kubernetes
+^^^^^^^^^^
+
+`Kubernetes `_ is an extremely popular project for managing cloud workloads and is part of the broader `Cloud Native Computing Foundation (CNCF) `_ ecosystem.
+
+Dask has many options for `deploying clusters on Kubernetes `_.
+
+HPC on Cloud
+^^^^^^^^^^^^
+
+Many popular HPC scheduling tools are used on the cloud and support features such as elastic scaling.
+If you are already leveraging HPC tools like `SLURM on the cloud `_, then `Dask has great integration with HPC schedulers `_.
+
+Hadoop/Spark/YARN
+^^^^^^^^^^^^^^^^^
+
+Many cloud platforms have popular managed services for running Apache Spark workloads.
+
+If you're already using a managed MapReduce service like `Amazon EMR `_, then check out `dask-yarn `_.
+
+Nebari
+^^^^^^
+
+`Nebari `_ is an open source data science platform that can be run locally or on a cloud platform of your choice.
+It includes a managed Dask service built on `Dask Gateway `_ for managing Dask clusters.
+
+Managed Services
+----------------
+
+Cloud vendors and third-party companies also offer managed Dask clusters as a service.
+
+Coiled
+^^^^^^
+
+`Coiled `_ is a mature managed Dask service that spawns clusters in your cloud account and allows you to manage them via a central control plane.
+
+Saturn Cloud
+^^^^^^^^^^^^
+
+`Saturn Cloud `_ is a managed data science platform with hosted Dask clusters or the option to deploy them in your own AWS account.
diff --git a/doc/source/index.rst b/doc/source/index.rst
index bc7dda10..a87aaf94 100644
--- a/doc/source/index.rst
+++ b/doc/source/index.rst
@@ -3,6 +3,15 @@ Dask Cloud Provider
 
 *Native Cloud integration for Dask.*
 
+This package contains open source tools to help you deploy and operate Dask clusters on the cloud.
+It includes cluster managers that can launch clusters using native cloud resources like VMs or containers,
+tools and plugins for use in *any* cluster running on the cloud, and documentation for Dask cloud deployments.
+
+It is by no means the "complete" or "only" way to run Dask on the cloud; check out the :doc:`alternatives` page for more tools.
+
+Cluster managers
+----------------
+
 This package provides classes for constructing and managing ephemeral Dask
 clusters on various cloud platforms.
 
@@ -52,6 +61,26 @@ this code.
     with Client(cluster) as client:
         # Do some Dask things
 
+Plugins
+-------
+
+Dask components like schedulers and workers can benefit from being cloud-aware.
+This project has plugins and tools that extend these components.
+
+One example is having workers watch for termination warnings when running on ephemeral/spot instances and begin migrating their data to other workers.
+
+For Azure VMs, you can use the :class:`dask_cloudprovider.azure.AzurePreemptibleWorkerPlugin` to do this.
+It can be used on any cluster that has workers running on Azure VMs, not just ones created with :class:`dask_cloudprovider.azure.AzureVMCluster`.
+
+.. code-block:: python
+
+    from distributed import Client
+    client = Client("")
+
+    from dask_cloudprovider.azure import AzurePreemptibleWorkerPlugin
+    client.register_worker_plugin(AzurePreemptibleWorkerPlugin())
+
+
 .. toctree::
    :maxdepth: 2
    :hidden:
@@ -59,6 +88,7 @@ this code.
 
    installation.rst
    config.rst
+   alternatives.rst
 
 .. toctree::
    :maxdepth: 2