✨ Source Salesforce: Bulk stream uses async CDK components #45678

maxi297 · 2024-09-19T16:37:20Z

What

Following the CDK release of the improved async job components, we can release a version of Salesforce that relies on those CDK components to sync bulk streams.

How

See #45673 for more details.

We are currently rolling out this change using dev version 2.5.32-dev.0a58b0968c. The goal for the progressive rollout is roughly:

2024-09-19: internal workspaces
2024-09-20: 5 external workspaces (at least one including parent streams) (see update here)
2024-09-23: 20 external workspaces
2024-09-24: Full release if no issues discovered

Review guide

See #45673 for more details.

User Impact

There should be no user impact as the goal is to make the maintenance easier for us.

There is one case where we willingly changed the behavior, and it is here: before, the connector would retry a whole job if we could not download the result. It has been removed because:

It is harder to port to the new version of the CDK because the job creation and the download are now in two separate code paths (stream_slices and read_records instead of just read_records).
The value is very unclear, as this was done as an attempted patch without being able to clearly monitor the value of the fix. (We see the warning log in the last 7 days.) All of these happened on the same connection, 8bb4614b-cc29-41b0-bc7c-86d4f52bab62. While there were 5 logs for `Downloading data failed after 0 retries. Retrying the whole job...`, there were only two for `Downloading data failed even after 1 retries. Stopping retry and raising exception`, which seems to indicate that either retrying helped in that specific case or the stream stopped before. As the logs are a bit messed up, I can't determine which one is true here. In any case, I would assume that we can push back if it's only for one customer and retrying would work on the next attempt/job.

If we were to see that case in prod, the fix would be to have AsyncRetriever.read_records create a new factory using its factory and having just one slice.
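For readers unfamiliar with the removed behavior, here is a minimal sketch of what "retrying the whole job" means. It is not the connector's previous code nor the CDK API: `read_with_job_retry`, `DownloadError`, `create_job` and `download` are hypothetical names used only for illustration. Only the two log messages are taken from the description above.

```python
import logging
from typing import Callable, Dict, Iterable, List

logger = logging.getLogger("salesforce_retry_sketch")


class DownloadError(Exception):
    """Hypothetical error raised when a job result cannot be downloaded."""


def read_with_job_retry(
    create_job: Callable[[], str],
    download: Callable[[str], Iterable[Dict]],
    max_retries: int = 1,
) -> List[Dict]:
    """Create a job and download its result, recreating the job if the download fails."""
    if max_retries < 0:
        raise ValueError("max_retries must be >= 0")
    for attempt in range(max_retries + 1):
        # A fresh job is created on every attempt: this is the "whole job" retry.
        job_id = create_job()
        try:
            return list(download(job_id))
        except DownloadError:
            if attempt == max_retries:
                logger.error(
                    "Downloading data failed even after %d retries. "
                    "Stopping retry and raising exception",
                    attempt,
                )
                raise
            logger.warning(
                "Downloading data failed after %d retries. Retrying the whole job...",
                attempt,
            )
    raise RuntimeError("unreachable: the loop either returns or raises")
```

In this sketch the job creation and the download live in the same function, which is exactly what no longer holds once they are split between stream_slices and read_records.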

Can this PR be safely reverted and rolled back?

YES 💚
NO ❌

vercel · 2024-09-19T16:37:25Z

The latest updates on your projects. Learn more about Vercel for Git ↗︎

1 Skipped Deployment

Name	Status	Preview	Comments	Updated (UTC)
airbyte-docs	⬜️ Ignored (Inspect)	Visit Preview		Oct 2, 2024 2:16pm

maxi297 · 2024-09-22T19:39:48Z

We are getting a seg fault on some of the workspaces we've progressively rolled out to. We have two pre-release builds that might help us debug:

2.6.0-dev.884dfdee72: Better handling on breaking errors during polling
2.6.0-dev.41de741141: Better logging on seg fault crash

…ob-salesforce/salesforce-release`)

Here are a few optimizations to enhance the performance of the program.

1. Remove multiple `import logging` statements.
2. Use lazy logging directly instead of creating a separate `lazy_log` function.
3. Simplify the job replacement logic to avoid unnecessary operations.

Here is the optimized version.

### Improvements

1. **Logging**.
   - Removed the separate `lazy_log` function and inlined the `isEnabledFor` check.
   - Doing this reduces function calls and enables directly logging only when necessary.
2. **Synchronization**.
   - Kept the lock usage the same to ensure thread safety, which is necessary for modifying the `_jobs` set.

These optimizations reduce the overhead and simplify the logic while ensuring the thread safety and functionality remain intact.
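One way to read the "Logging" and "Synchronization" points is the sketch below. It is not the CDK's `JobTracker`: the class name `JobTrackerSketch`, the `limit` parameter and the boolean return value are assumptions made for illustration. Only the `_jobs` set, the lock and the inlined `isEnabledFor` check come from the description.

```python
import logging
import threading
from typing import Set

logger = logging.getLogger("job_tracker_sketch")


class JobTrackerSketch:
    """Hypothetical tracker that caps the number of concurrently tracked jobs."""

    def __init__(self, limit: int) -> None:
        self._jobs: Set[str] = set()
        self._limit = limit
        self._lock = threading.Lock()

    def add_job(self, job_id: str) -> bool:
        """Track job_id if there is budget left; return whether it was added."""
        # The lock guards every read and write of the shared _jobs set.
        with self._lock:
            # Inlined level check: the log call is skipped entirely
            # when DEBUG is disabled, without a separate helper function.
            if logger.isEnabledFor(logging.DEBUG):
                logger.debug("Adding job %s (%d tracked)", job_id, len(self._jobs))
            if len(self._jobs) >= self._limit:
                return False
            self._jobs.add(job_id)
            return True
```

With `JobTrackerSketch(limit=2)`, the first two `add_job` calls return `True` and a third one returns `False`.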

codeflash-ai · 2024-09-23T20:54:09Z

⚡️ Codeflash found optimizations for this PR

📄 `JobTracker.add_job()` in `airbyte-cdk/python/airbyte_cdk/sources/declarative/async_job/job_tracker.py`

📈 Performance improved by 24% (0.24x faster)

⏱️ Runtime went down from 38.5 microseconds to 31.0 microseconds
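The 24% figure seems to be the speedup measured relative to the new runtime, rather than the reduction relative to the old one. A quick check using the two runtimes reported above (the variable names are only for illustration):

```python
old_runtime_us = 38.5  # runtime before the change, in microseconds
new_runtime_us = 31.0  # runtime after the change, in microseconds

# Speedup relative to the new runtime (the reading that matches 24%).
speedup = (old_runtime_us - new_runtime_us) / new_runtime_us
# Reduction relative to the old runtime, for comparison.
reduction = (old_runtime_us - new_runtime_us) / old_runtime_us

print(f"speedup: {speedup:.1%}, runtime reduction: {reduction:.1%}")
# prints "speedup: 24.2%, runtime reduction: 19.5%"
```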

I created a new dependent PR with the suggested changes. Please review:

⚡️ Speed up method JobTracker.add_job by 24% in PR #45678 (async-job-salesforce/salesforce-release) #45863

If you approve, it will be merged into this PR (branch async-job-salesforce/salesforce-release).

artem1205 · 2024-09-25T17:13:30Z

airbyte-integrations/connectors/source-salesforce/unit_tests/test_memory.py

-        "200k records",
-    ],
-)
-def test_memory_download_data(stream_config, stream_api, n_records, first_size, first_peak):


Since the memory test is removed, should we add another integration test with @pytest.mark.limit_memory(" MB")? Example:
https://github.com/airbytehq/airbyte/blob/master/airbyte-cdk/python/unit_tests/sources/declarative/decoders/test_json_decoder.py#L55-L56

maxi297 · 2024-10-01

There was indeed a memory issue. I created this PR to address the issue directly in the CDK. Once you approve it, I'll resolve this issue.
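As a rough illustration of the suggestion in this thread, a test using the `limit_memory` marker from the pytest-memray plugin could look like the following. The helper `build_records`, the test name and the "50 MB" threshold are placeholders, not values from this PR. As far as I know, the marker is only enforced when the plugin is installed and pytest is run with `--memray`.

```python
from typing import Dict, List

import pytest


def build_records(n_records: int) -> List[Dict]:
    """Placeholder for the code under test; builds n_records small dicts."""
    return [{"id": i, "name": f"record-{i}"} for i in range(n_records)]


@pytest.mark.limit_memory("50 MB")  # illustrative threshold, not taken from the PR
def test_build_records_stays_within_memory_budget() -> None:
    records = build_records(10_000)
    assert len(records) == 10_000
```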

artem1205

LGTM!

maxi297 · 2024-10-02T15:35:41Z

/approve-regression-tests The check in the regression testing always seems to fail. Here is an example for this PR, which only updates dependencies.

Check job output.

✅ Approving regression tests

maxi297 added 8 commits September 19, 2024 10:03

extract CDK files from async-job/salesforce-feature-branch

ef72dd6

format

8f3a050

lint

634ae48

missing file

a499a93

lint

c1096e1

Fix overried on model_to_component_factory.py

330fbcd

format

e5cae76

Salesforce release without CDK release

ae89efd

octavia-squidington-iii added area/connectors Connector related issues connectors/source/salesforce labels Sep 19, 2024

maxi297 commented Sep 19, 2024

airbyte-integrations/connectors/source-salesforce/pyproject.toml

maxi297 added 3 commits September 19, 2024 14:24

dummy commit to update lock even though it is not relevant just to be able to create dev release

0a58b09

format

26a9aa4

Update release information

7eb4f6a

octavia-squidington-iii added the area/documentation Improvements or additions to documentation label Sep 20, 2024

vercel bot deployed to Preview September 20, 2024 13:14 View deployment

maxi297 changed the title ~~Salesforce release without CDK release~~ ✨ Source Salesforce: Bulk stream uses async CDK components Sep 20, 2024

artem1205 and others added 3 commits September 20, 2024 12:29

Source SalesForce: switch to REST if BulkNotSupportedException (#45595)

5eff3fb

Signed-off-by: Artem Inzhyyants <[email protected]> Co-authored-by: maxi297 <[email protected]>

Add comment and format

49668f9

Fixing tests

472ea67

octavia-squidington-iii added the CDK Connector Development Kit label Sep 22, 2024

maxi297 force-pushed the async-job-salesforce/salesforce-release branch from 884dfde to 41de741 Compare September 22, 2024 18:22

octavia-squidington-iii removed the CDK Connector Development Kit label Sep 22, 2024

TMP segmentation bug fix

8d02497

maxi297 force-pushed the async-job-salesforce/salesforce-release branch from 41de741 to 8d02497 Compare September 23, 2024 00:45

octavia-squidington-iii added the CDK Connector Development Kit label Sep 23, 2024

Fix child taking all the API budget bug

c9b6d46

codeflash-ai bot mentioned this pull request Sep 23, 2024

⚡️ Speed up method JobTracker.add_job by 24% in PR #45678 (async-job-salesforce/salesforce-release) #45863

Closed

maxi297 added 3 commits September 25, 2024 10:13

Fix issues following progressive rollout

188a0a7

Merge branch 'async-job-salesforce/cdk-release' into async-job-salesforce/salesforce-release

c1f9a3a

remove unused code

9302c72

octavia-squidington-iii removed the CDK Connector Development Kit label Sep 25, 2024

artem1205 reviewed Sep 25, 2024

Skip when rest stream is not available

8386e77

Base automatically changed from async-job-salesforce/cdk-release to master October 1, 2024 12:48

maxi297 added 2 commits October 1, 2024 09:55

Prepare for release

a6ba321

Update release information

255fb22

octavia-squidington-iii added the CDK Connector Development Kit label Oct 1, 2024

vercel bot deployed to Preview October 1, 2024 14:02 View deployment

Merge branch 'master' into async-job-salesforce/salesforce-release

38c39c0

octavia-squidington-iii removed the CDK Connector Development Kit label Oct 1, 2024

vercel bot deployed to Preview October 1, 2024 14:10 View deployment

Code review

358c045

maxi297 mentioned this pull request Oct 1, 2024

feat(airbyte-cdk) Async jobs - Limit memory usage #46286

Merged

2 tasks

maxi297 added 2 commits October 1, 2024 11:05

Update release information

84e1e20

Fix airbyte_protocol and formatting issues

e62cfd9

artem1205 approved these changes Oct 2, 2024

maxi297 added 3 commits October 2, 2024 09:03

fix integration tests and upgrade cdk version for memory fix

e1d5094

format

50c5317

Fix test

9914ba5

maxi297 merged commit 07365b0 into master Oct 3, 2024
34 checks passed

maxi297 deleted the async-job-salesforce/salesforce-release branch October 3, 2024 12:42

maxi297 mentioned this pull request Nov 4, 2024

Async job/salesforce feature branch #45373

Closed

2 tasks
