Commit 6c69289

Readme updates for new developer onboarding (#37)

1 parent e6680a5 commit 6c69289
File tree

2 files changed: +128 −12 lines changed

FOR_CONTRIBUTORS.md

Lines changed: 82 additions & 0 deletions
@@ -0,0 +1,82 @@

# New Developer Guide - Data Cloud Custom Code Python SDK

Welcome to the Salesforce Data Cloud Custom Code Python SDK! This guide will help you get started with development and contribution to this repository.

## 🚀 Quick Start

### Prerequisites

See the [Prerequisites section in README.md](./README.md#prerequisites) for complete setup requirements.

### Initial Setup

1. **Clone the repository**

   ```bash
   git clone <repository-url>
   cd datacloud-customcode-python-sdk
   ```

2. **Set up virtual environment and install dependencies**

   **Note**: If you need to set a specific Python version, use `pyenv local 3.11.x` in the project directory.
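
   As a minimal sketch, assuming pyenv is already installed (the exact patch version is illustrative):

   ```bash
   pyenv install 3.11.9   # install a Python 3.11 interpreter if you don't have one
   pyenv local 3.11.9     # pin it for this project directory
   ```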

   ```bash
   python3.11 -m venv .venv
   source .venv/bin/activate
   pip install poetry
   make develop
   ```

3. **Verify installation**

   ```bash
   datacustomcode version
   ```

4. **Initialize a project for development work**

   **Note**: To test your changes and develop new features, initialize a sample project:

   ```bash
   # Create a new directory for your test project
   mkdir my-test-project
   cd my-test-project

   # Initialize a new Data Cloud custom code project
   datacustomcode init .

   # Test your SDK modifications against the sample project with:
   datacustomcode run ./payload/entrypoint.py
   ```

**Tip**: See the [README.md](./README.md) for additional `datacustomcode` commands (`scan`, `deploy`, `zip`) to test specific code paths and validate your SDK changes thoroughly.
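
For example, a quick end-to-end pass over those commands might look like this (the flags are taken from the README examples; the script name is illustrative):

```bash
datacustomcode scan ./payload/entrypoint.py
datacustomcode zip
datacustomcode deploy --path ./payload --name my_test_script --compute-type CPU_L
```
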
## 🔧 Makefile Commands

The project includes a comprehensive Makefile for common development tasks:

```bash
# Clean build artifacts, caches and temporary files
make clean

# Build package distribution
make package

# Install main dependencies only
make install

# Install dependencies for full development setup
make develop

# Run code quality checks
make lint

# Perform static type checking
make mypy

# Run complete test suite
make test
```
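
As one possible pre-PR sequence chaining these targets (a convention, not a repository requirement):

```bash
make clean            # start from a clean tree
make develop          # install full development dependencies
make lint mypy test   # run lint, type checks, and the test suite
```
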
---

**Welcome to the community!** If you have any questions or need help getting started, don't hesitate to create an issue in the repository or reach out to the maintainers through the project's communication channels.

README.md

Lines changed: 46 additions & 12 deletions

```diff
@@ -8,7 +8,7 @@ Use of this project with Salesforce is subject to the [TERMS OF USE](./TERMS_OF_
 
 ## Prerequisites
 
-- Python 3.11 (If your system version is different, we recommend using [pyenv](https://github.com/pyenv/pyenv) to configure 3.11)
+- **Python 3.11 only** (currently supported version - if your system version is different, we recommend using [pyenv](https://github.com/pyenv/pyenv) to configure 3.11)
 - [Azul Zulu OpenJDK 17.x](https://www.azul.com/downloads/?version=java-17-lts&package=jdk#zulu)
 - Docker support like [Docker Desktop](https://docs.docker.com/desktop/)
 - A salesforce org with some DLOs or DMOs with data and this feature enabled (it is not GA)
```

````diff
@@ -69,6 +69,13 @@
 > The example entrypoint.py requires a `Account_Home__dll` DLO to be present. And in order to deploy the script (next step), the output DLO (which is `Account_Home_copy__dll` in the example entrypoint.py) also needs to exist and be in the same dataspace as `Account_Home__dll`.
 
 After modifying the `entrypoint.py` as needed, using any dependencies you add in the `.venv` virtual environment, you can run this script in Data Cloud:
+
+**Adding Dependencies**: To add new dependencies:
+1. Make sure your virtual environment is activated
+2. Add dependencies to `requirements.txt`
+3. Run `pip install -r requirements.txt`
+4. The SDK automatically packages all dependencies when you run `datacustomcode zip`
+
 ```zsh
 datacustomcode scan ./payload/entrypoint.py
 datacustomcode deploy --path ./payload --name my_custom_script --compute-type CPU_L
````
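
As a concrete sketch of that dependency workflow (the package and version pin are illustrative, not project requirements):

```bash
# Declare an example dependency
echo "pandas==2.2.2" >> requirements.txt

# Install it into the active virtual environment
pip install -r requirements.txt

# Package the project; dependencies from requirements.txt are bundled automatically
datacustomcode zip
```
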

```diff
@@ -85,6 +92,18 @@ datacustomcode deploy --path ./payload --name my_custom_script --compute-type CP
 You can now use the Salesforce Data Cloud UI to find the created Data Transform and use the `Run Now` button to run it.
 Once the Data Transform run is successful, check the DLO your script is writing to and verify the correct records were added.
 
+## Dependency Management
+
+The SDK automatically handles all dependency packaging for Data Cloud deployment. Here's how it works:
+
+1. **Add dependencies to `requirements.txt`** - List any Python packages your script needs
+2. **Install locally** - Use `pip install -r requirements.txt` in your virtual environment
+3. **Automatic packaging** - When you run `datacustomcode zip`, the SDK automatically:
+   - Packages all dependencies from `requirements.txt`
+   - Uses the correct platform and architecture for Data Cloud
+
+**No need to worry about platform compatibility** - the SDK handles this automatically through the Docker-based packaging process.
+
 ## API
 
 Your entry point script will define logic using the `Client` object which wraps data access layers.
```

```diff
@@ -180,25 +199,40 @@ Options:
 
 ## Docker usage
 
-After initializing a project with `datacustomcode init my_package`, you might notice a Dockerfile. This file isn't used for the
-[Quick Start](#quick-start) approach above, which uses virtual environments, until the `zip` or `deploy` commands are used. When using dependencies
-that include [native features](https://spark.apache.org/docs/latest/api/python/user_guide/python_packaging.html#using-pyspark-native-features)
-like C++ or C interop, the platform and architecture may be different between your machine and Data Cloud compute. This is all taken care of
-in the `zip` and `deploy` commands, which utilize the Dockerfile which starts `FROM` an image compatible with Data Cloud. However, you may
-want to build, run, and test your script on your machine using the same platform and architecture as Data Cloud. You can use the sections below
-to test your script in this manner.
+The SDK provides Docker-based development options that allow you to test your code in an environment that closely resembles Data Cloud's execution environment.
+
+### How Docker Works with the SDK
+
+When you initialize a project with `datacustomcode init my_package`, a `Dockerfile` is created automatically. This Dockerfile:
+
+- **Isn't used during local development** with virtual environments
+- **Becomes active during packaging** when you run `datacustomcode zip` or `deploy`
+- **Ensures compatibility** by using the same base image as Data Cloud
+- **Handles dependencies automatically** regardless of platform differences
 
 ### VS Code Dev Containers
 
 Within your `init`ed package, you will find a `.devcontainer` folder which allows you to run a docker container while developing inside of it.
 
 Read more about Dev Containers here: https://code.visualstudio.com/docs/devcontainers/containers.
+#### Setup Instructions
 
 1. Install the VS Code extension "Dev Containers" by microsoft.com.
-1. Open your package folder in VS Code, ensuring that the `.devcontainer` folder is at the root of the File Explorer
-1. Bring up the Command Palette (on mac: Cmd + Shift + P), and select "Dev Containers: Rebuild and Reopen in Container"
-1. Allow the docker image to be built, then you're ready to develop
-1. Now if you open a terminal (within the Dev Container window) and `datacustomcode run ./payload/entrypoint.py`, it will run inside a docker container that more closely resembles Data Cloud compute than your machine
+2. Open your package folder in VS Code, ensuring that the `.devcontainer` folder is
+   at the root of the File Explorer
+3. Bring up the Command Palette (on mac: Cmd + Shift + P), and select "Dev
+   Containers: Rebuild and Reopen in Container"
+4. Allow the docker image to be built, then you're ready to develop
+
+#### Development Workflow
+
+Once inside the Dev Container:
+- **Terminal access**: Open a terminal within the container
+- **Run your code**: Execute `datacustomcode run ./payload/entrypoint.py`
+- **Environment consistency**: Your code will run inside a docker container that more closely resembles Data Cloud compute than your machine
+
+> [!TIP]
+> **IDE Configuration**: Use `CMD+Shift+P` (or `Ctrl+Shift+P` on Windows/Linux), then select "Python: Select Interpreter" to configure the correct Python Interpreter
 
 > [!IMPORTANT]
 > Dev Containers get their own tmp file storage, so you'll need to re-run `datacustomcode configure` every time you "Rebuild and Reopen in Container".
```
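
For local testing against a Data Cloud-like environment outside VS Code, a hypothetical sketch using plain Docker (the image tag, platform flag, and mount path are assumptions; the generated Dockerfile's actual base image and build targets may differ):

```bash
# Build an image from the generated Dockerfile
docker build --platform linux/amd64 -t customcode-dev .

# Open a shell in the container with the project mounted
docker run --rm -it -v "$PWD:/workspace" -w /workspace customcode-dev bash
```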
