.. toctree::
   :maxdepth: 1

   presto_cpp/installation
   presto_cpp/features
   presto_cpp/sidecar
   presto_cpp/limitations

=======================
Presto C++ Installation
=======================

.. contents::
   :local:
   :backlinks: none
   :depth: 1

This guide shows how to install and run a lightweight Presto cluster that uses a PrestoDB Java coordinator and Prestissimo (C++) workers in Docker.


The setup uses Meta's Velox engine for worker-side query execution. You configure the cluster with the built-in TPCH connector and then run a query to verify it.

Introducing Prestissimo (Presto C++ Worker)
-------------------------------------------

Prestissimo is the C++ native implementation of the Presto :ref:`overview/concepts:worker`. It is designed to be a drop-in replacement for the traditional Java worker. Prestissimo is built using Velox, a high-performance, open-source C++ database acceleration library created by Meta.

A C++ execution engine offers significant advantages for data lake analytics:


* **Performance Boost:** Prestissimo increases CPU efficiency and reduces query latency by using native C++ execution, vectorization, and SIMD (Single Instruction, Multiple Data) instructions.

* **Eliminates Java Garbage Collection Issues:** Because the execution engine runs outside the Java Virtual Machine (JVM), worker-side execution avoids the pauses and performance spikes associated with Java garbage collection, which results in more consistent and stable query times.

* **Explicit Memory Control:** The Velox memory management framework provides explicit memory accounting and arbitration, which gives finer control over resource consumption than the JVM does.


Prerequisites
-------------


To follow this tutorial, you need:

* Docker installed, including the Docker Compose plugin (the ``docker compose`` command).

* Basic familiarity with the terminal and shell commands.


Setup Guide
-----------


The recommended directory structure uses ``presto-lab`` as the root directory.

Step 1: Create a Working Directory
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove "Step " from all of these numbered headings.

^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Edit this formatting line to match the number of characters in the revised line above it.


Create a clean root directory to hold all necessary configuration files and the ``docker-compose.yml`` file.

.. code-block:: bash

   mkdir -p ~/presto-lab
   cd ~/presto-lab

Step 2: Configure the Presto Java Coordinator
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The Coordinator requires configuration to define its role, enable the discovery service, and set up a catalog for querying.

A. Create Configuration Directory
"""""""""""""""""""""""""""""""""

.. code-block:: bash

   mkdir -p coordinator/etc/catalog

This command creates the necessary directories for the coordinator and its catalogs.

B. Create ``coordinator/etc/config.properties``
"""""""""""""""""""""""""""""""""""""""""""""""

This file enables coordinator mode and the discovery server, and sets the HTTP port to ``8080``.

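.. code-block:: properties

   # coordinator/etc/config.properties
   coordinator=true
   node-scheduler.include-coordinator=true
   http-server.http.port=8080
   discovery-server.enabled=true
   discovery.uri=http://localhost:8080

* ``coordinator=true``: Enables coordinator mode.
* ``node-scheduler.include-coordinator=true``: Lets the scheduler assign query work to the coordinator in addition to the workers.
* ``discovery-server.enabled=true``: Makes the coordinator host the discovery service that workers register with.
* ``discovery.uri=http://localhost:8080``: Points the coordinator at the discovery service it hosts.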

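If you prefer to create this file from the shell, the following minimal sketch uses a heredoc. It assumes your current directory is ``~/presto-lab``; the quoted ``'EOF'`` delimiter stops the shell from expanding anything in the file body.

.. code-block:: bash

   # Run from ~/presto-lab; creates the directory if Step A was skipped.
   mkdir -p coordinator/etc

   # Quoted 'EOF': the lines below are written to the file verbatim.
   cat > coordinator/etc/config.properties <<'EOF'
   coordinator=true
   node-scheduler.include-coordinator=true
   http-server.http.port=8080
   discovery-server.enabled=true
   discovery.uri=http://localhost:8080
   EOF

   # Print the file to check its contents.
   cat coordinator/etc/config.properties
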
C. Create ``coordinator/etc/jvm.config``
""""""""""""""""""""""""""""""""""""""""

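These are the standard JVM flags for running Presto on Java 17. ``-Xmx1G`` limits the coordinator's maximum heap size to 1 GB, and the ``--add-opens`` flags give Presto reflective access to JDK internal packages that Java 17 encapsulates by default.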

.. code-block:: text

   # coordinator/etc/jvm.config
   -server
   -Xmx1G
   -XX:+UseG1GC
   -XX:G1HeapRegionSize=32M
   -XX:+UseGCOverheadLimit
   -XX:+ExplicitGCInvokesConcurrent
   -XX:+HeapDumpOnOutOfMemoryError
   -XX:+ExitOnOutOfMemoryError
   -Djdk.attach.allowAttachSelf=true
   --add-opens=java.base/java.io=ALL-UNNAMED
   --add-opens=java.base/java.lang=ALL-UNNAMED
   --add-opens=java.base/java.lang.ref=ALL-UNNAMED
   --add-opens=java.base/java.lang.reflect=ALL-UNNAMED
   --add-opens=java.base/java.net=ALL-UNNAMED
   --add-opens=java.base/java.nio=ALL-UNNAMED
   --add-opens=java.base/java.security=ALL-UNNAMED
   --add-opens=java.base/javax.security.auth=ALL-UNNAMED
   --add-opens=java.base/javax.security.auth.login=ALL-UNNAMED
   --add-opens=java.base/java.text=ALL-UNNAMED
   --add-opens=java.base/java.util=ALL-UNNAMED
   --add-opens=java.base/java.util.concurrent=ALL-UNNAMED
   --add-opens=java.base/java.util.concurrent.atomic=ALL-UNNAMED
   --add-opens=java.base/java.util.regex=ALL-UNNAMED
   --add-opens=java.base/jdk.internal.loader=ALL-UNNAMED
   --add-opens=java.base/sun.security.action=ALL-UNNAMED
   --add-opens=java.security.jgss/sun.security.krb5=ALL-UNNAMED

D. Create ``coordinator/etc/node.properties``
"""""""""""""""""""""""""""""""""""""""""""""

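This file sets the node ID, the node environment, and the data directory. ``node.id=${ENV:HOSTNAME}`` takes the node ID from the container's ``HOSTNAME`` environment variable.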

.. code-block:: properties

   # coordinator/etc/node.properties
   node.id=${ENV:HOSTNAME}
   node.environment=test
   node.data-dir=/var/lib/presto/data

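The same heredoc approach works for this file, and here quoting the delimiter matters: with an unquoted ``EOF``, the shell would try to expand ``${ENV:HOSTNAME}`` itself instead of writing the placeholder into the file for Presto to resolve. A minimal sketch, assuming your current directory is ``~/presto-lab``:

.. code-block:: bash

   # Run from ~/presto-lab.
   mkdir -p coordinator/etc

   # Quoted 'EOF' keeps ${ENV:HOSTNAME} literal in the written file.
   cat > coordinator/etc/node.properties <<'EOF'
   node.id=${ENV:HOSTNAME}
   node.environment=test
   node.data-dir=/var/lib/presto/data
   EOF

   cat coordinator/etc/node.properties
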
E. Add TPCH Catalog Configuration
"""""""""""""""""""""""""""""""""

The TPCH catalog lets you run test queries against TPC-H data that the connector generates on the fly, so no external data source is needed.


.. code-block:: properties

   # coordinator/etc/catalog/tpch.properties
   connector.name=tpch

Step 3: Configure the Prestissimo (C++) Worker
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Each worker must be configured to locate the discovery service on the coordinator and to identify itself on the network.


A. Create Worker Configuration Directory
""""""""""""""""""""""""""""""""""""""""

.. code-block:: bash

   mkdir -p worker-1/etc/catalog

B. Create ``worker-1/etc/config.properties``
""""""""""""""""""""""""""""""""""""""""""""

This configuration points the worker to the discovery service running on the coordinator.

.. code-block:: properties

   # worker-1/etc/config.properties
   discovery.uri=http://coordinator:8080
   presto.version=0.288-15f14bb
   http-server.http.port=7777
   shutdown-onset-sec=1
   runtime-metrics-collection-enabled=true

* ``discovery.uri=http://coordinator:8080``: Uses the ``coordinator`` service name defined in the ``docker-compose.yml`` file; Docker's internal network resolves this name to the coordinator container.
* ``http-server.http.port=7777``: Sets the port the worker listens on. The ``docker-compose.yml`` file does not publish this port, so it is reachable only from other containers on the Docker network.


C. Create ``worker-1/etc/node.properties``
""""""""""""""""""""""""""""""""""""""""""

This file identifies the worker and sets the internal address that other nodes use to reach it.

.. code-block:: properties

   # worker-1/etc/node.properties
   node.environment=test
   node.internal-address=worker-1
   node.location=docker
   node.id=worker-1

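* ``node.environment=test``: Must match the coordinator's ``node.environment``, because all nodes in a cluster must use the same environment name.
* ``node.internal-address=worker-1``: Matches the ``worker-1`` service name defined in ``docker-compose.yml``.
* ``node.id=worker-1``: Must be unique for every node in the cluster.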

D. Add TPCH Catalog Configuration
"""""""""""""""""""""""""""""""""

The worker requires the same catalog definition as the coordinator to execute the query stages.

.. code-block:: properties

   # worker-1/etc/catalog/tpch.properties
   connector.name=tpch

Repeat step 3 for each additional worker. The ``docker-compose.yml`` file in step 4 defines a second worker, ``worker-2``, so create ``worker-2/etc`` the same way, replacing ``worker-1`` with ``worker-2`` in the directory name and in the ``node.internal-address`` and ``node.id`` values.

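As a shortcut, the following shell sketch writes the worker files from steps A through D for both ``worker-1`` and ``worker-2`` in one pass. It assumes you run it from ``~/presto-lab`` and reuses the values shown in this step; only ``node.internal-address`` and ``node.id`` change per worker.

.. code-block:: bash

   # Run from ~/presto-lab. Writes the step 3 files for each listed worker.
   for worker in worker-1 worker-2; do
     mkdir -p "${worker}/etc/catalog"

     # Settings shared by all workers; quoted 'EOF' writes them verbatim.
     cat > "${worker}/etc/config.properties" <<'EOF'
   discovery.uri=http://coordinator:8080
   presto.version=0.288-15f14bb
   http-server.http.port=7777
   shutdown-onset-sec=1
   runtime-metrics-collection-enabled=true
   EOF

     # Per-worker identity; unquoted EOF lets the shell substitute ${worker},
     # which must match the service name in docker-compose.yml.
     cat > "${worker}/etc/node.properties" <<EOF
   node.environment=test
   node.internal-address=${worker}
   node.location=docker
   node.id=${worker}
   EOF

     # Same TPCH catalog definition as the coordinator.
     printf 'connector.name=tpch\n' > "${worker}/etc/catalog/tpch.properties"
   done

To add a third worker, extend the list in the ``for`` line and add a matching service to ``docker-compose.yml``.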

Step 4: Create ``docker-compose.yml``
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

This file defines the containers for the Java coordinator and the two C++ workers. Create ``docker-compose.yml`` in your ``~/presto-lab`` directory with the following contents:

.. code-block:: yaml

   # docker-compose.yml
   services:
     coordinator:
       image: public.ecr.aws/oss-presto/presto:latest
       platform: linux/amd64
       container_name: presto-coordinator
       hostname: coordinator
       ports:
         - "8080:8080"
       volumes:
         - ./coordinator/etc:/opt/presto-server/etc:ro
       restart: unless-stopped

     worker-1:
       image: public.ecr.aws/oss-presto/presto-native:latest
       platform: linux/amd64
       container_name: prestissimo-worker-1
       hostname: worker-1
       depends_on:
         - coordinator
       volumes:
         - ./worker-1/etc:/opt/presto-server/etc:ro
       restart: unless-stopped

     worker-2:
       image: public.ecr.aws/oss-presto/presto-native:latest
       platform: linux/amd64
       container_name: prestissimo-worker-2
       hostname: worker-2
       depends_on:
         - coordinator
       volumes:
         - ./worker-2/etc:/opt/presto-server/etc:ro
       restart: unless-stopped

Note the following about this file:

* The ``coordinator`` service uses the standard Java Presto image (``presto:latest``).
* The ``worker-1`` and ``worker-2`` services use the Prestissimo (C++ native) image (``presto-native:latest``).
* The ``platform: linux/amd64`` setting is required on Apple Silicon Macs: it tells Docker to use the ``amd64`` images, which then run under emulation.
* The ``volumes`` sections mount your local configuration directories (``./coordinator/etc``, ``./worker-1/etc``, and ``./worker-2/etc``) read-only into the path each container expects, ``/opt/presto-server/etc``.

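Before starting the cluster, you can check that every directory mounted in ``docker-compose.yml`` contains the files this guide creates. The ``check_presto_lab`` function below is a hypothetical helper, not part of Presto or Docker; it assumes you run it from ``~/presto-lab`` and only checks that the files exist, not that their contents are valid.

.. code-block:: bash

   # Hypothetical helper: reports any configuration file missing from the
   # directories that docker-compose.yml mounts. Returns non-zero if any is missing.
   check_presto_lab() {
     local missing=0 node file
     for node in coordinator worker-1 worker-2; do
       for file in config.properties node.properties catalog/tpch.properties; do
         if [ ! -f "${node}/etc/${file}" ]; then
           echo "missing: ${node}/etc/${file}" >&2
           missing=1
         fi
       done
     done
     # Only the Java coordinator reads jvm.config; the C++ workers have no JVM.
     if [ ! -f coordinator/etc/jvm.config ]; then
       echo "missing: coordinator/etc/jvm.config" >&2
       missing=1
     fi
     return "${missing}"
   }

   if check_presto_lab; then
     echo "All configuration files are present."
   else
     echo "Create the missing files before running docker compose up."
   fi
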
Step 5: Start the Cluster and Verify
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Verify is a good thing to include! Nice work.

^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

A. Start the Cluster
""""""""""""""""""""

Use Docker Compose to start the cluster in detached mode (``-d``).

.. code-block:: bash

   docker compose up -d

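The coordinator can take some time to start. As a sketch, the hypothetical ``wait_for_presto`` function below polls the coordinator's ``/v1/info`` REST endpoint with ``curl`` until it responds or an attempt limit is reached. A response only means the coordinator's HTTP server is up; the workers can take a few more seconds to register.

.. code-block:: bash

   # Hypothetical helper (not part of Presto): polls a URL until it answers.
   # Arguments: [url] [attempts]. Requires curl.
   wait_for_presto() {
     local url="${1:-http://localhost:8080/v1/info}"
     local attempts="${2:-30}"
     local i
     for ((i = 1; i <= attempts; i++)); do
       # -f: treat HTTP error codes as failures; -s: no progress output.
       if curl -fs "${url}" > /dev/null; then
         echo "Presto is responding at ${url}"
         return 0
       fi
       # Wait 2 seconds between attempts, but not after the last one.
       if (( i < attempts )); then
         sleep 2
       fi
     done
     echo "No response from ${url} after ${attempts} attempts" >&2
     return 1
   }

After ``docker compose up -d``, run ``wait_for_presto``. If it times out, ``docker compose logs coordinator`` shows the coordinator's startup output.
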
B. Verify
"""""""""

1. **Check the Web UI:** Open the Presto Web UI at http://localhost:8080.

   * *Verification Result:* The UI should show 3 active workers: the coordinator, which also accepts work because ``node-scheduler.include-coordinator=true``, and the 2 Prestissimo workers.


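2. **Check Detailed Node Status (SQL Query):** From a Presto client connected to ``http://localhost:8080``, such as the Presto CLI, run the following query to list the status and metadata of every node, including the coordinator and both workers:

   .. code-block:: sql

      SELECT * FROM system.runtime.nodes;

   The result should contain one row per node. This confirms that the cluster nodes are registered and active.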