FOR_CONTRIBUTORS.md (new file, 82 additions)
# New Developer Guide - Data Cloud Custom Code Python SDK

Welcome to the Salesforce Data Cloud Custom Code Python SDK! This guide will help you get started developing and contributing to this repository.

## 🚀 Quick Start

### Prerequisites

See the [Prerequisites section in README.md](./README.md#prerequisites) for complete setup requirements.

### Initial Setup

1. **Clone the repository**
```bash
git clone <repository-url>
cd datacloud-customcode-python-sdk
```

2. **Set up virtual environment and install dependencies**

**Note**: If you need to set a specific Python version, use `pyenv local 3.11.x` in the project directory.

```bash
python3.11 -m venv .venv
source .venv/bin/activate
pip install poetry
make develop
```

3. **Verify installation**
```bash
datacustomcode version
```

4. **Initialize a project for development work**

   **Note**: To test your changes and develop new features, initialize a sample project (a sketch of the generated layout follows these steps):

```bash
# Create a new directory for your test project
mkdir my-test-project
cd my-test-project

# Initialize a new Data Cloud custom code project
datacustomcode init .

# Test your SDK modifications against the sample project with:
datacustomcode run ./payload/entrypoint.py
```
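After `datacustomcode init .`, the project layout looks roughly like this (an abridged sketch; exact generated contents may vary by SDK version, and the files shown are the ones this guide and the README mention):

```
my-test-project/
├── payload/
│   └── entrypoint.py    # transformation logic that `datacustomcode run` executes
├── requirements.txt     # third-party dependencies to package for Data Cloud
├── Dockerfile           # used by the `zip`/`deploy` packaging flow
└── .devcontainer/       # VS Code Dev Container configuration
```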

**Tip**: See the [README.md](./README.md) for additional `datacustomcode` commands (`scan`, `deploy`, `zip`) to test specific code paths and validate your SDK changes thoroughly.
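For a quick end-to-end check of a local SDK change, the README's example sequence looks like this (`my_custom_script` is the README's example name):

```bash
datacustomcode scan ./payload/entrypoint.py                      # scan the entrypoint
datacustomcode run ./payload/entrypoint.py                       # run the script locally
datacustomcode deploy --path ./payload --name my_custom_script   # deploy to Data Cloud
```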

## 🔧 Makefile Commands

The project includes a comprehensive Makefile for common development tasks:

```bash
# Clean build artifacts, caches and temporary files
make clean

# Build package distribution
make package

# Install main dependencies only
make install

# Install dependencies for full development setup
make develop

# Run code quality checks
make lint

# Perform static type checking
make mypy

# Run complete test suite
make test
```
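Before opening a pull request, a reasonable local gate is to run the quality checks in sequence (a suggested habit, not an enforced requirement):

```bash
make lint && make mypy && make test   # lint, type-check, then run the test suite
```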

---

**Welcome to the community!** If you have any questions or need help getting started, don't hesitate to create an issue in the repository or reach out to the maintainers through the project's communication channels.
README.md (46 additions, 12 deletions)

Use of this project with Salesforce is subject to the [TERMS OF USE](./TERMS_OF_USE.md).

## Prerequisites

- **Python 3.11 only** (the currently supported version; if your system version differs, we recommend using [pyenv](https://github.com/pyenv/pyenv) to configure 3.11, as sketched after this list)
- [Azul Zulu OpenJDK 17.x](https://www.azul.com/downloads/?version=java-17-lts&package=jdk#zulu)
- Docker support like [Docker Desktop](https://docs.docker.com/desktop/)
- A Salesforce org with some DLOs or DMOs containing data, and with this feature enabled (it is not GA)
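If your system Python is not 3.11, a typical pyenv setup looks like this (a minimal sketch; recent pyenv versions resolve `3.11` to the latest 3.11.x release):

```bash
pyenv install 3.11   # install a Python 3.11.x interpreter
pyenv local 3.11     # pin this directory to it via a .python-version file
python --version     # should now report 3.11.x
```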
```zsh
datacustomcode run ./payload/entrypoint.py
```
> The example entrypoint.py requires an `Account_Home__dll` DLO to be present. In order to deploy the script (next step), the output DLO (`Account_Home_copy__dll` in the example entrypoint.py) also needs to exist and be in the same dataspace as `Account_Home__dll`.

**Adding Dependencies**: To pull in new packages:
1. Make sure your virtual environment is activated
2. Add the dependencies to `requirements.txt`
3. Run `pip install -r requirements.txt`
4. The SDK automatically packages everything in `requirements.txt` when you run `datacustomcode zip`

After modifying `entrypoint.py` as needed, using any dependencies you added in the `.venv` virtual environment, you can run the script in Data Cloud:

```zsh
datacustomcode scan ./payload/entrypoint.py
datacustomcode deploy --path ./payload --name my_custom_script
```

You can now use the Salesforce Data Cloud UI to find the created Data Transform and use the `Run Now` button to run it.
Once the Data Transform run is successful, check the DLO your script is writing to and verify the correct records were added.

## Dependency Management

The SDK automatically handles all dependency packaging for Data Cloud deployment. Here's how it works:

1. **Add dependencies to `requirements.txt`** - List any Python packages your script needs
2. **Install locally** - Use `pip install -r requirements.txt` in your virtual environment
3. **Automatic packaging** - When you run `datacustomcode zip`, the SDK automatically:
- Packages all dependencies from `requirements.txt`
- Uses the correct platform and architecture for Data Cloud

**No need to worry about platform compatibility** - the SDK handles this automatically through the Docker-based packaging process.
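For example, a project's `requirements.txt` might look like this (the package names and versions are purely illustrative, not packages the SDK requires):

```
pandas==2.2.2
requests==2.32.3
```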

## API

Your entry point script will define logic using the `Client` object which wraps data access layers.
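As a sketch of the shape such a script takes, assuming a `Client` with read/write helpers (the import path and the method names `read_dlo` and `write_to_dlo` are assumptions for illustration; check the `entrypoint.py` that `init` generates for the actual API):

```python
# Hypothetical sketch only; method names are assumed, not confirmed by this README.
from datacustomcode import Client  # import path assumed

client = Client()

# Read the input DLO as a DataFrame, then write the result to the output DLO
df = client.read_dlo("Account_Home__dll")
client.write_to_dlo("Account_Home_copy__dll", df)
```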

## Docker usage

The SDK provides Docker-based development options that let you test your code in an environment closely resembling Data Cloud's execution environment.

### How Docker Works with the SDK

When you initialize a project with `datacustomcode init my_package`, a `Dockerfile` is created automatically (a rough sketch follows this list). This Dockerfile:

- **Isn't used during local development** with virtual environments
- **Becomes active during packaging** when you run `datacustomcode zip` or `deploy`
- **Ensures compatibility** by using the same base image as Data Cloud
- **Handles dependencies automatically** regardless of platform differences
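As a rough illustration of the idea only (this is not the generated file; every line below is a placeholder, so refer to the `Dockerfile` that `init` actually creates):

```dockerfile
# Illustrative skeleton only; `datacustomcode init` generates the real file.
# The base image below is a placeholder for the Data Cloud-compatible image.
FROM data-cloud-compatible-base:latest

# Installing dependencies inside the image means native packages are built
# for Data Cloud's platform and architecture, not your machine's
COPY requirements.txt .
RUN pip install -r requirements.txt

# Copy the transformation code into the image
COPY payload/ ./payload/
```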

### VS Code Dev Containers

Within your `init`ed package, you will find a `.devcontainer` folder, which lets you run a Docker container and develop inside of it.

Read more about Dev Containers here: https://code.visualstudio.com/docs/devcontainers/containers.
#### Setup Instructions

1. Install the VS Code extension "Dev Containers" by Microsoft.
2. Open your package folder in VS Code, ensuring that the `.devcontainer` folder is at the root of the File Explorer
3. Bring up the Command Palette (on Mac: Cmd + Shift + P) and select "Dev Containers: Rebuild and Reopen in Container"
4. Allow the Docker image to be built, then you're ready to develop

#### Development Workflow

Once inside the Dev Container:
- **Terminal access**: Open a terminal within the container
- **Run your code**: Execute `datacustomcode run ./payload/entrypoint.py`
- **Environment consistency**: Your code runs inside a Docker container that more closely resembles Data Cloud compute than your machine does

> [!TIP]
> **IDE Configuration**: Use `CMD+Shift+P` (or `Ctrl+Shift+P` on Windows/Linux), then select "Python: Select Interpreter" to configure the correct Python Interpreter

> [!IMPORTANT]
> Dev Containers get their own tmp file storage, so you'll need to re-run `datacustomcode configure` every time you "Rebuild and Reopen in Container".
Expand Down