docs(native): Add documentation for Presto C++ Installation #26718
| Original file line number | Diff line number | Diff line change | ||||||||
|---|---|---|---|---|---|---|---|---|---|---|
| @@ -0,0 +1,265 @@ | ||||||||||
| ======================= | ||||||||||
| Presto C++ Installation | ||||||||||
| ======================= | ||||||||||
|
|
||||||||||
| .. contents:: | ||||||||||
| :local: | ||||||||||
| :backlinks: none | ||||||||||
| :depth: 1 | ||||||||||
|
|
||||||||||
| This step-by-step tutorial is a beginner-friendly guide to installing and running a lightweight Presto cluster with a **PrestoDB Java Coordinator** and **Prestissimo (C++) Workers** using Docker. | ||||||||||
|
|
||||||||||
|
|
||||||||||
| The setup uses **Meta's high-performance Velox engine** for worker-side query execution. We will configure a cluster and run a test query with the built-in **TPCH connector**. | ||||||||||
|
Contributor
remove bold |
||||||||||
|
|
||||||||||
| I. Introducing Prestissimo (Presto C++ Worker) | ||||||||||
|
|
||||||||||
| ---------------------------------------------- | ||||||||||
|
|
||||||||||
|
|
||||||||||
| **Prestissimo** is the **C++ native implementation** of the Presto Worker. It is designed to be a **drop-in replacement** for the traditional Java worker. It is built using **Velox**, a high-performance, open-source C++ database acceleration library created by Meta. | ||||||||||
|
Contributor
remove bold
Contributor
Suggested change
Add link to the. |
||||||||||
|
|
||||||||||
| The adoption of a C++ execution engine is a significant performance innovation for Presto, offering key advantages for data lake analytics: | ||||||||||
|
|
||||||||||
|
|
||||||||||
| * **Massive Performance Boost:** Prestissimo achieves dramatic increases in **CPU efficiency** and reduces query latency by leveraging native C++ execution, **vectorization**, and **SIMD** (Single Instruction, Multiple Data) instructions. Production results have shown fleet sizes shrinking to nearly a third. | ||||||||||
|
|
||||||||||
| * **Eliminates Java GC Issues:** By moving the execution engine outside the JVM, this architecture removes the unpredictable performance spikes and pauses often associated with **Java Garbage Collection (GC)**, resulting in more consistent and stable query times. | ||||||||||
|
|
||||||||||
| * **Explicit Memory Control:** The Velox memory management framework offers **explicit memory accounting** and **arbitration**, providing finer control over resource consumption than the JVM. | ||||||||||
|
|
||||||||||
|
|
||||||||||
| II. Prerequisites | ||||||||||
|
|
||||||||||
| ----------------- | ||||||||||
|
|
||||||||||
|
|
||||||||||
| To follow this tutorial, you need: | ||||||||||
|
|
||||||||||
| * **Docker** installed. | ||||||||||
|
|
||||||||||
| * Basic familiarity with the Terminal and shell commands. | ||||||||||
|
|
||||||||||
|
|
||||||||||
| III. Setup Guide | ||||||||||
|
|
||||||||||
| ---------------- | ||||||||||
|
|
||||||||||
|
|
||||||||||
| The recommended directory structure uses ``presto-lab`` as the root directory. | ||||||||||
|
|
||||||||||
| Step 1: Create a Working Directory | ||||||||||
|
Contributor
Remove "Step " from all of these numbered headings. |
||||||||||
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | ||||||||||
|
Contributor
Edit this formatting line to match the number of characters in the revised line above it. |
||||||||||
|
|
||||||||||
| Create a clean root directory to hold all necessary configuration files and the ``docker-compose.yml`` file. | ||||||||||
|
|
||||||||||
| .. code-block:: bash | ||||||||||
|
|
||||||||||
| mkdir -p ~/presto-lab | ||||||||||
| cd ~/presto-lab | ||||||||||
|
|
||||||||||
| Step 2: Configure the Presto Java Coordinator | ||||||||||
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | ||||||||||
|
|
||||||||||
| The Coordinator requires configuration to define its role, enable the discovery service, and set up a catalog for querying. | ||||||||||
|
|
||||||||||
| A. Create Configuration Directory | ||||||||||
| """"""""""""""""""""""""""""""""" | ||||||||||
|
|
||||||||||
| .. code-block:: bash | ||||||||||
|
|
||||||||||
| mkdir -p coordinator/etc/catalog | ||||||||||
|
|
||||||||||
| This command creates the necessary directories for the coordinator and its catalogs. | ||||||||||
|
|
||||||||||
| B. Configure ``coordinator/etc/config.properties`` | ||||||||||
|
Contributor
Should this be "Create (Note: If you edit this heading, please edit the formatting line below it to match the number of characters.) |
||||||||||
| """""""""""""""""""""""""""""""""""""""""""""""""" | ||||||||||
|
|
||||||||||
| This file enables coordinator mode and the discovery server, and sets the HTTP port to ``8080``. | ||||||||||
|
Contributor
Suggested change
This is very good: telling the reader what effect this step accomplishes for them. Nice work! |
||||||||||
|
|
||||||||||
| .. code-block:: properties | ||||||||||
|
|
||||||||||
| # coordinator/etc/config.properties | ||||||||||
| coordinator=true | ||||||||||
| node-scheduler.include-coordinator=true | ||||||||||
| http-server.http.port=8080 | ||||||||||
| discovery-server.enabled=true | ||||||||||
| discovery.uri=http://localhost:8080 | ||||||||||
|
|
||||||||||
| * ``coordinator=true``: Enables the coordinator mode. | ||||||||||
| * ``discovery-server.enabled=true``: Designates the coordinator as the host for the worker discovery service. | ||||||||||
|
|
||||||||||
| C. Configure ``coordinator/etc/jvm.config`` | ||||||||||
|
Contributor
Create As an alternative, you could change the heading to "Configure the JVM" and add a first sentence like Whatever you choose, be consistent here and with line 62 "B. Configure |
||||||||||
| """"""""""""""""""""""""""""""""""""""""""" | ||||||||||
|
|
||||||||||
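| These are the standard JVM flags for running Presto on **Java 17**. They cap the heap at 1 GB, select the G1 garbage collector, and open the JDK packages that Presto needs reflective access to. | ||||||||||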
|
|
||||||||||
| .. code-block:: text | ||||||||||
|
|
||||||||||
| # coordinator/etc/jvm.config | ||||||||||
| -server | ||||||||||
| -Xmx1G | ||||||||||
| -XX:+UseG1GC | ||||||||||
| -XX:G1HeapRegionSize=32M | ||||||||||
| -XX:+UseGCOverheadLimit | ||||||||||
| -XX:+ExplicitGCInvokesConcurrent | ||||||||||
| -XX:+HeapDumpOnOutOfMemoryError | ||||||||||
| -XX:+ExitOnOutOfMemoryError | ||||||||||
| -Djdk.attach.allowAttachSelf=true | ||||||||||
| --add-opens=java.base/java.io=ALL-UNNAMED | ||||||||||
| --add-opens=java.base/java.lang=ALL-UNNAMED | ||||||||||
| --add-opens=java.base/java.lang.ref=ALL-UNNAMED | ||||||||||
| --add-opens=java.base/java.lang.reflect=ALL-UNNAMED | ||||||||||
| --add-opens=java.base/java.net=ALL-UNNAMED | ||||||||||
| --add-opens=java.base/java.nio=ALL-UNNAMED | ||||||||||
| --add-opens=java.base/java.security=ALL-UNNAMED | ||||||||||
| --add-opens=java.base/javax.security.auth=ALL-UNNAMED | ||||||||||
| --add-opens=java.base/javax.security.auth.login=ALL-UNNAMED | ||||||||||
| --add-opens=java.base/java.text=ALL-UNNAMED | ||||||||||
| --add-opens=java.base/java.util=ALL-UNNAMED | ||||||||||
| --add-opens=java.base/java.util.concurrent=ALL-UNNAMED | ||||||||||
| --add-opens=java.base/java.util.concurrent.atomic=ALL-UNNAMED | ||||||||||
| --add-opens=java.base/java.util.regex=ALL-UNNAMED | ||||||||||
| --add-opens=java.base/jdk.internal.loader=ALL-UNNAMED | ||||||||||
| --add-opens=java.base/sun.security.action=ALL-UNNAMED | ||||||||||
| --add-opens=java.security.jgss/sun.security.krb5=ALL-UNNAMED | ||||||||||
|
|
||||||||||
| D. Configure ``coordinator/etc/node.properties`` | ||||||||||
|
Contributor
Create? Again, whatever you change or not, be consistent. |
||||||||||
| """""""""""""""""""""""""""""""""""""""""""""""" | ||||||||||
|
|
||||||||||
| This file sets the node environment and the data directory. | ||||||||||
|
|
||||||||||
| .. code-block:: properties | ||||||||||
|
|
||||||||||
| # coordinator/etc/node.properties | ||||||||||
| node.id=${ENV:HOSTNAME} | ||||||||||
| node.environment=test | ||||||||||
| node.data-dir=/var/lib/presto/data | ||||||||||
|
|
||||||||||
| E. Add TPCH Catalog Configuration | ||||||||||
| """"""""""""""""""""""""""""""""" | ||||||||||
|
|
||||||||||
| The **TPCH catalog** enables running test queries against an in-memory dataset. | ||||||||||
|
|
||||||||||
|
|
||||||||||
| .. code-block:: properties | ||||||||||
|
|
||||||||||
| # coordinator/etc/catalog/tpch.properties | ||||||||||
| connector.name=tpch | ||||||||||
|
|
||||||||||
| Step 3: Configure the Prestissimo (C++) Worker | ||||||||||
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | ||||||||||
|
|
||||||||||
| The Worker must be configured to locate the Coordinator/Discovery service and identify itself within the network. | ||||||||||
|
Contributor
Avoid slashes. See the GitLab documentation style guide recommended word list entry for slashes for discussion and alternative suggestions.
|
||||||||||
|
|
||||||||||
| A. Create Worker Configuration Directory | ||||||||||
| """""""""""""""""""""""""""""""""""""""" | ||||||||||
|
|
||||||||||
| .. code-block:: bash | ||||||||||
|
|
||||||||||
| mkdir -p worker-1/etc/catalog | ||||||||||
|
|
||||||||||
| B. Configure ``worker-1/etc/config.properties`` | ||||||||||
|
Contributor
Create? |
||||||||||
| """"""""""""""""""""""""""""""""""""""""""""""" | ||||||||||
|
|
||||||||||
| This configuration points the worker to the discovery service running on the coordinator. | ||||||||||
|
|
||||||||||
| .. code-block:: properties | ||||||||||
|
|
||||||||||
| # worker-1/etc/config.properties | ||||||||||
| discovery.uri=http://coordinator:8080 | ||||||||||
| presto.version=0.288-15f14bb | ||||||||||
| http-server.http.port=7777 | ||||||||||
| shutdown-onset-sec=1 | ||||||||||
| runtime-metrics-collection-enabled=true | ||||||||||
|
|
||||||||||
| * ``discovery.uri=http://coordinator:8080``: This uses the **coordinator** service name (as defined in the ``docker-compose.yml`` file) for network communication within Docker. | ||||||||||
|
|
||||||||||
|
|
||||||||||
| C. Configure ``worker-1/etc/node.properties`` | ||||||||||
| """"""""""""""""""""""""""""""""""""""""""""" | ||||||||||
|
|
||||||||||
| This defines the worker's internal address for reliable registration. | ||||||||||
|
|
||||||||||
| .. code-block:: properties | ||||||||||
|
|
||||||||||
| # worker-1/etc/node.properties | ||||||||||
| node.environment=test | ||||||||||
| node.internal-address=worker-1 | ||||||||||
| node.location=docker | ||||||||||
| node.id=worker-1 | ||||||||||
|
|
||||||||||
| * ``node.internal-address=worker-1``: This setting matches the service name defined in Docker Compose. | ||||||||||
|
|
||||||||||
| D. Add TPCH Catalog Configuration | ||||||||||
| """"""""""""""""""""""""""""""""" | ||||||||||
|
|
||||||||||
| The worker requires the same catalog definition as the coordinator to execute the query stages. | ||||||||||
|
|
||||||||||
| .. code-block:: properties | ||||||||||
|
|
||||||||||
| # worker-1/etc/catalog/tpch.properties | ||||||||||
| connector.name=tpch | ||||||||||
|
|
||||||||||
| .. note:: | ||||||||||
|
Contributor
Suggested change
This doesn't need to be a note, and can be moved to the beginning of step 3 for better visibility. |
||||||||||
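| Repeat Step 3 for each additional worker. The ``docker-compose.yml`` file in Step 4 defines **worker-2**, so create ``worker-2/etc`` the same way and set ``node.id`` and ``node.internal-address`` to ``worker-2``. | ||||||||||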
|
|
||||||||||
|
|
||||||||||
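Repeating Step 3 by hand for each extra worker is mostly copying files, so it can be scripted. The sketch below is one possible way to do it, assuming the ``worker-1`` directory from Step 3 already exists and that worker names are plain directory names without slashes. ``clone_worker`` is a hypothetical helper name, not part of Presto or Docker.

```shell
#!/bin/sh
# Sketch: derive another worker's configuration from an existing one.
# clone_worker is a hypothetical helper, not a Presto or Docker command.
clone_worker() {
    src="$1"   # existing worker directory, for example worker-1
    dst="$2"   # new worker directory, for example worker-2

    mkdir -p "$dst/etc/catalog"

    # config.properties and the TPCH catalog contain nothing worker-specific.
    cp "$src/etc/config.properties" "$dst/etc/config.properties"
    cp "$src/etc/catalog/tpch.properties" "$dst/etc/catalog/tpch.properties"

    # node.id and node.internal-address carry the worker name, so rewrite it.
    sed "s/$src/$dst/g" "$src/etc/node.properties" > "$dst/etc/node.properties"
}
```

Run from ``~/presto-lab``, ``clone_worker worker-1 worker-2`` would produce a ``worker-2/etc`` tree whose ``node.properties`` names ``worker-2`` instead of ``worker-1``.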
| Step 4: Create ``docker-compose.yml`` | ||||||||||
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | ||||||||||
|
|
||||||||||
| This file orchestrates both the Java Coordinator and the C++ Worker containers. Create the file ``docker-compose.yml`` in your ``~/presto-lab`` directory. | ||||||||||
|
|
||||||||||
| .. code-block:: yaml | ||||||||||
|
|
||||||||||
| # docker-compose.yml | ||||||||||
| services: | ||||||||||
|   coordinator: | ||||||||||
|     image: public.ecr.aws/oss-presto/presto:latest | ||||||||||
|     platform: linux/amd64 | ||||||||||
|     container_name: presto-coordinator | ||||||||||
|     hostname: coordinator | ||||||||||
|     ports: | ||||||||||
|       - "8080:8080" | ||||||||||
|     volumes: | ||||||||||
|       - ./coordinator/etc:/opt/presto-server/etc:ro | ||||||||||
|     restart: unless-stopped | ||||||||||
|
|
||||||||||
|   worker-1: | ||||||||||
|     image: public.ecr.aws/oss-presto/presto-native:latest | ||||||||||
|     platform: linux/amd64 | ||||||||||
|     container_name: prestissimo-worker-1 | ||||||||||
|     hostname: worker-1 | ||||||||||
|     depends_on: | ||||||||||
|       - coordinator | ||||||||||
|     volumes: | ||||||||||
|       - ./worker-1/etc:/opt/presto-server/etc:ro | ||||||||||
|     restart: unless-stopped | ||||||||||
|
|
||||||||||
|   worker-2: | ||||||||||
|     image: public.ecr.aws/oss-presto/presto-native:latest | ||||||||||
|     platform: linux/amd64 | ||||||||||
|     container_name: prestissimo-worker-2 | ||||||||||
|     hostname: worker-2 | ||||||||||
|     depends_on: | ||||||||||
|       - coordinator | ||||||||||
|     volumes: | ||||||||||
|       - ./worker-2/etc:/opt/presto-server/etc:ro | ||||||||||
|     restart: unless-stopped | ||||||||||
|
|
||||||||||
| .. important:: | ||||||||||
|
Contributor
Suggest you remove the Important formatting. Paradoxically, putting important information into a Note makes the text smaller, harder to read, and with several lines of smaller text, easier to overlook. The highlighting of the Important box works better with 1 line of text, not 4. |
||||||||||
| * The **coordinator** service uses the standard **Java Presto image** (presto:latest). | ||||||||||
| * The **worker-1** and **worker-2** services use the **Prestissimo (C++ Native) image** (presto-native:latest). | ||||||||||
| * The setting ``platform: linux/amd64`` is essential for users running on Apple Silicon (M1/M2/M3) Macs. | ||||||||||
|
Contributor
Suggested change
Removing (M1/M2/M3) as it's already out of date: Apple sells M4 and M5 MacBooks as of now, and (M1/M2/M3/M4/M5) is clumsy. It's all Apple Silicon anyway and this way the line isn't out of date when M6 MacBooks ship. |
||||||||||
| * The ``volumes`` section mounts your local configuration directories (``./coordinator/etc``, ``./worker-1/etc``) into the container's expected path (``/opt/presto-server/etc``). | ||||||||||
|
|
||||||||||
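Because the compose file bind-mounts each ``etc`` directory, a missing configuration file only surfaces once a container fails to start. A small pre-flight check can catch that earlier. The sketch below assumes the file layout built in Steps 1 to 4. ``check_lab_layout`` is a hypothetical helper name, and it expects paths without spaces.

```shell
#!/bin/sh
# Sketch: confirm the files from Steps 1-4 exist before starting containers.
# check_lab_layout is a hypothetical helper, not a Presto or Docker command.
# Usage: check_lab_layout ROOT WORKER...
check_lab_layout() {
    root="$1"
    shift
    missing=0

    # Files the coordinator container reads from its mounted etc directory.
    files="docker-compose.yml
coordinator/etc/config.properties
coordinator/etc/jvm.config
coordinator/etc/node.properties
coordinator/etc/catalog/tpch.properties"

    # Each worker named on the command line needs its own three files.
    for w in "$@"; do
        files="$files
$w/etc/config.properties
$w/etc/node.properties
$w/etc/catalog/tpch.properties"
    done

    for f in $files; do
        if [ ! -f "$root/$f" ]; then
            echo "missing: $f" >&2
            missing=$((missing + 1))
        fi
    done

    # Succeed only when nothing is missing.
    [ "$missing" -eq 0 ]
}
```

For the compose file above, ``check_lab_layout ~/presto-lab worker-1 worker-2`` reports any file that is not in place. ``docker compose config --quiet`` can additionally validate the syntax of ``docker-compose.yml`` itself.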
| Step 5: Start the Cluster and Verify | ||||||||||
|
Contributor
Verify is a good thing to include! Nice work. |
||||||||||
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | ||||||||||
|
|
||||||||||
| A. Start the Cluster | ||||||||||
| """""""""""""""""""" | ||||||||||
|
|
||||||||||
| Use Docker Compose to start the cluster in detached mode (``-d``). | ||||||||||
|
|
||||||||||
| .. code-block:: bash | ||||||||||
|
|
||||||||||
| docker compose up -d | ||||||||||
|
|
||||||||||
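``docker compose up -d`` returns as soon as the containers are created, which can be before the coordinator is ready to serve requests. The sketch below is a generic retry loop for waiting on it. ``wait_until`` is a hypothetical helper name, and probing ``http://localhost:8080/v1/info`` with ``curl`` is an assumption about how to test readiness, not something the tutorial specifies.

```shell
#!/bin/sh
# Sketch: retry a command until it succeeds or the attempts run out.
# wait_until is a hypothetical helper, not a Presto or Docker command.
# Usage: wait_until ATTEMPTS DELAY_SECONDS COMMAND [ARG...]
wait_until() {
    attempts="$1"
    delay="$2"
    shift 2

    i=0
    while [ "$i" -lt "$attempts" ]; do
        # Run the probe quietly; success ends the wait.
        if "$@" >/dev/null 2>&1; then
            return 0
        fi
        i=$((i + 1))
        sleep "$delay"
    done
    return 1
}

# Example (assumes curl is installed and port 8080 is published as in Step 4):
#   wait_until 30 2 curl -sf http://localhost:8080/v1/info && echo "coordinator is up"
```

If the wait times out, ``docker compose ps`` and ``docker compose logs coordinator`` are the usual next places to look.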
| B. Verification | ||||||||||
|
Contributor
Suggest change to "Verify" for consistency. |
||||||||||
| """"""""""""""" | ||||||||||
|
|
||||||||||
| 1. **Check the Web UI:** Open the Presto Web UI at http://localhost:8080. | ||||||||||
|
|
||||||||||
| * *Verification Result:* You should see the UI displaying **3 Active Workers** (1 Coordinator + 2 Workers). | ||||||||||
|
|
||||||||||
|
|
||||||||||
| 2. **Check Detailed Node Status (SQL Query):** Run the following query to check the detailed status and metadata about every node (Coordinator and Workers). | ||||||||||
|
|
||||||||||
| .. code-block:: sql | ||||||||||
|
|
||||||||||
| select * from system.runtime.nodes; | ||||||||||
|
|
||||||||||
| This confirms the cluster nodes are registered and active. | ||||||||||
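The tutorial does not say where to run the query. One option, sketched below, is to run it through the Presto CLI inside the coordinator container and count the rows reported as active. It assumes that a ``presto-cli`` executable is available in the coordinator image and that CSV output quotes every field. ``count_active_nodes`` is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: count rows of CSV query output that contain the quoted value "active".
# count_active_nodes is a hypothetical helper, not a Presto command.
count_active_nodes() {
    # grep -c prints the number of matching lines; it exits non-zero when
    # the count is 0, so "|| true" keeps that case from looking like a failure.
    grep -c '"active"' || true
}

# Example (assumes presto-cli exists in the container named in Step 4):
#   docker exec presto-coordinator presto-cli \
#       --output-format CSV \
#       --execute 'SELECT * FROM system.runtime.nodes' | count_active_nodes
```

The printed count should match the number of nodes shown in the Web UI.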
remove bold