diff --git a/FOR_CONTRIBUTORS.md b/FOR_CONTRIBUTORS.md
new file mode 100644
index 0000000..651a60e
--- /dev/null
+++ b/FOR_CONTRIBUTORS.md
@@ -0,0 +1,82 @@
+# New Developer Guide - Data Cloud Custom Code Python SDK
+
+Welcome to the Salesforce Data Cloud Custom Code Python SDK! This guide will help you get started with developing for and contributing to this repository.
+
+## 🚀 Quick Start
+
+### Prerequisites
+
+See the [Prerequisites section in README.md](./README.md#prerequisites) for the complete setup requirements.
+
+### Initial Setup
+
+1. **Clone the repository**
+   ```bash
+   git clone <repository-url>
+   cd datacloud-customcode-python-sdk
+   ```
+
+2. **Set up a virtual environment and install dependencies**
+
+   **Note**: If you need to set a specific Python version, use `pyenv local 3.11.x` in the project directory.
+
+   ```bash
+   python3.11 -m venv .venv
+   source .venv/bin/activate
+   pip install poetry
+   make develop
+   ```
+
+3. **Verify the installation**
+   ```bash
+   datacustomcode version
+   ```
+
+4. **Initialize a project for development work**
+
+   **Note**: To test your changes and develop new features, initialize a sample project:
+
+   ```bash
+   # Create a new directory for your test project
+   mkdir my-test-project
+   cd my-test-project
+
+   # Initialize a new Data Cloud custom code project
+   datacustomcode init .
+
+   # Test your SDK modifications against the sample project
+   datacustomcode run ./payload/entrypoint.py
+   ```
+
+   **Tip**: See the [README.md](./README.md) for additional `datacustomcode` commands (`scan`, `deploy`, `zip`) to exercise specific code paths and validate your SDK changes thoroughly.
+
+## 🔧 Makefile Commands
+
+The project includes a comprehensive Makefile for common development tasks:
+
+```bash
+# Clean build artifacts, caches, and temporary files
+make clean
+
+# Build the package distribution
+make package
+
+# Install main dependencies only
+make install
+
+# Install dependencies for a full development setup
+make develop
+
+# Run code quality checks
+make lint
+
+# Perform static type checking
+make mypy
+
+# Run the complete test suite
+make test
+```
+
+---
+
+**Welcome to the community!** If you have any questions or need help getting started, don't hesitate to create an issue in the repository or reach out to the maintainers through the project's communication channels.
diff --git a/README.md b/README.md
index cc21607..d538d34 100644
--- a/README.md
+++ b/README.md
@@ -8,7 +8,7 @@ Use of this project with Salesforce is subject to the [TERMS OF USE](./TERMS_OF_
 
 ## Prerequisites
 
-- Python 3.11 (If your system version is different, we recommend using [pyenv](https://github.com/pyenv/pyenv) to configure 3.11)
+- **Python 3.11 only** (the currently supported version; if your system version differs, we recommend using [pyenv](https://github.com/pyenv/pyenv) to install 3.11)
 - [Azul Zulu OpenJDK 17.x](https://www.azul.com/downloads/?version=java-17-lts&package=jdk#zulu)
 - Docker support like [Docker Desktop](https://docs.docker.com/desktop/)
 - A Salesforce org with some DLOs or DMOs with data and this feature enabled (it is not GA)
@@ -69,6 +69,13 @@ datacustomcode run ./payload/entrypoint.py
 
 > The example entrypoint.py requires an `Account_Home__dll` DLO to be present. In order to deploy the script (next step), the output DLO (`Account_Home_copy__dll` in the example entrypoint.py) also needs to exist and be in the same dataspace as `Account_Home__dll`.
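+
+For orientation, a stripped-down version of the example `entrypoint.py` might look like the sketch below. The `Client` method names follow the [API](#api) section later in this document; treat the import path and exact signatures as illustrative, not authoritative.
+
+```python
+# Sketch only - confirm the import path and method signatures
+# against the API section of this README
+from datacustomcode import Client
+
+client = Client()
+
+# Read the input DLO into a DataFrame
+df = client.read_dlo("Account_Home__dll")
+
+# ...apply your transformation logic here...
+
+# Write the result to the output DLO, which must already exist
+# in the same dataspace as the input DLO
+client.write_to_dlo("Account_Home_copy__dll", df)
+```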
 
 After modifying the `entrypoint.py` as needed, using any dependencies you add in the `.venv` virtual environment, you can run this script in Data Cloud:
+
+**Adding Dependencies**: To add new dependencies to your script:
+1. Make sure your virtual environment is activated
+2. Add the dependencies to `requirements.txt`
+3. Run `pip install -r requirements.txt`
+4. The SDK automatically packages all dependencies when you run `datacustomcode zip` (see [Dependency Management](#dependency-management) below)
+
 ```zsh
 datacustomcode scan ./payload/entrypoint.py
 datacustomcode deploy --path ./payload --name my_custom_script
 ```
@@ -80,6 +87,18 @@
 
 You can now use the Salesforce Data Cloud UI to find the created Data Transform and use the `Run Now` button to run it. Once the Data Transform run is successful, check the DLO your script is writing to and verify the correct records were added.
 
+## Dependency Management
+
+The SDK automatically handles all dependency packaging for Data Cloud deployment. Here's how it works:
+
+1. **Add dependencies to `requirements.txt`** - list any Python packages your script needs
+2. **Install locally** - use `pip install -r requirements.txt` in your virtual environment
+3. **Automatic packaging** - when you run `datacustomcode zip`, the SDK automatically:
+   - packages all dependencies from `requirements.txt`
+   - uses the correct platform and architecture for Data Cloud
+
+**No need to worry about platform compatibility** - the SDK handles this automatically through the Docker-based packaging process.
+
 ## API
 
 Your entry point script will define logic using the `Client` object which wraps data access layers.
@@ -174,25 +193,40 @@
 
 ## Docker usage
 
-After initializing a project with `datacustomcode init my_package`, you might notice a Dockerfile. This file isn't used for the
-[Quick Start](#quick-start) approach above, which uses virtual environments, until the `zip` or `deploy` commands are used. When using dependencies
-that include [native features](https://spark.apache.org/docs/latest/api/python/user_guide/python_packaging.html#using-pyspark-native-features)
-like C++ or C interop, the platform and architecture may be different between your machine and Data Cloud compute. This is all taken care of
-in the `zip` and `deploy` commands, which utilize the Dockerfile which starts `FROM` an image compatible with Data Cloud. However, you may
-want to build, run, and test your script on your machine using the same platform and architecture as Data Cloud. You can use the sections below
-to test your script in this manner.
+The SDK provides Docker-based development options that let you test your code in an environment closely resembling Data Cloud's execution environment.
+
+### How Docker Works with the SDK
+
+When you initialize a project with `datacustomcode init my_package`, a `Dockerfile` is created automatically. This Dockerfile:
+
+- **Isn't used during local development** with virtual environments
+- **Becomes active during packaging** when you run `datacustomcode zip` or `deploy`
+- **Ensures compatibility** by using the same base image as Data Cloud
+- **Handles dependencies automatically** regardless of platform differences
 
 ### VS Code Dev Containers
 
 Within your `init`ed package, you will find a `.devcontainer` folder which allows you to run a Docker container while developing inside of it. Read more about Dev Containers here: https://code.visualstudio.com/docs/devcontainers/containers.
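+
+For reference, the generated `.devcontainer/devcontainer.json` typically has roughly the shape sketched below (illustrative only; the file `init` generates is authoritative, and the extension list here is an assumption):
+
+```jsonc
+{
+  "name": "datacloud-custom-code",
+  // Build the dev container from the project's own Dockerfile so the
+  // environment matches what `zip`/`deploy` package for Data Cloud
+  "build": {
+    "dockerfile": "../Dockerfile",
+    "context": ".."
+  },
+  // Assumed convenience: preinstall the VS Code Python extension
+  "customizations": {
+    "vscode": {
+      "extensions": ["ms-python.python"]
+    }
+  }
+}
+```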
+
+#### Setup Instructions
 1. Install the VS Code extension "Dev Containers" by Microsoft.
-1. Open your package folder in VS Code, ensuring that the `.devcontainer` folder is at the root of the File Explorer
-1. Bring up the Command Palette (on mac: Cmd + Shift + P), and select "Dev Containers: Rebuild and Reopen in Container"
-1. Allow the docker image to be built, then you're ready to develop
-1. Now if you open a terminal (within the Dev Container window) and `datacustomcode run ./payload/entrypoint.py`, it will run inside a docker container that more closely resembles Data Cloud compute than your machine
+2. Open your package folder in VS Code, ensuring that the `.devcontainer` folder is
+at the root of the File Explorer
+3. Bring up the Command Palette (on Mac: Cmd + Shift + P), and select "Dev
+Containers: Rebuild and Reopen in Container"
+4. Allow the Docker image to be built, then you're ready to develop
+
+#### Development Workflow
+
+Once inside the Dev Container:
+- **Terminal access**: Open a terminal within the container
+- **Run your code**: Execute `datacustomcode run ./payload/entrypoint.py`
+- **Environment consistency**: Your code runs inside a Docker container that more closely resembles Data Cloud compute than your machine does
+
+> [!TIP]
+> **IDE Configuration**: Use `Cmd+Shift+P` (or `Ctrl+Shift+P` on Windows/Linux), then select "Python: Select Interpreter" to configure the correct Python interpreter.
 
 > [!IMPORTANT]
 > Dev Containers get their own tmp file storage, so you'll need to re-run `datacustomcode configure` every time you "Rebuild and Reopen in Container".
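+
+In practice, the first terminal session after a rebuild looks something like this (assuming `configure` prompts you for your connection settings, as on first setup):
+
+```bash
+# Credentials live in the container's tmp storage, which a rebuild wipes,
+# so configure the connection again first
+datacustomcode configure
+
+# Then run the entrypoint inside the Data Cloud-like container
+datacustomcode run ./payload/entrypoint.py
+```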