
⚡️ Speed up method StreamAvailabilityStrategy.check_availability by 10% in PR #45673 (async-job-salesforce/cdk-release) #45683

Closed


codeflash-ai[bot] commented Sep 19, 2024

⚡️ This pull request contains optimizations for PR #45673

If you approve this dependent PR, these changes will be merged into the original PR branch async-job-salesforce/cdk-release.

This PR will be automatically closed if the original PR is merged.


📄 StreamAvailabilityStrategy.check_availability() in airbyte-cdk/python/airbyte_cdk/sources/streams/concurrent/adapters.py

📈 Performance improved by 10% (1.10x speedup)

⏱️ Runtime went down from 29.7 microseconds to 27.0 microseconds
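To see how the headline figure relates to the two runtimes: the "10%" equals the time saved divided by the new runtime, which is the same as a 1.10x speedup. Measured against the old runtime, the reduction is about 9.1%. This minimal sketch does the arithmetic using only the two numbers reported above:

```python
# Runtimes reported above, in microseconds.
old_us = 29.7
new_us = 27.0

speedup = old_us / new_us               # how many times faster the new code runs
gain = (old_us - new_us) / new_us       # time saved relative to the new runtime
reduction = (old_us - new_us) / old_us  # time saved relative to the old runtime

print(f"speedup {speedup:.2f}x, gain {gain:.0%}, runtime reduction {reduction:.1%}")
# prints: speedup 1.10x, gain 10%, runtime reduction 9.1%
```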

Explanation and details

To make this Python code faster, the optimization targets two areas. Python is interpreted, and some idioms favor readability over performance, but there are still ways to optimize, such as:

  1. Remove redundant checks: In the StreamAvailabilityStrategy class, the check for whether the stream provides a check_availability method can be moved out of the try-except block to reduce overhead.
  2. Avoid repeated attribute lookups: Look up an attribute once and reuse the result instead of resolving the same attribute again.
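The attribute-lookup caching idea can be shown with a generic, hypothetical example that is not taken from this PR: binding a method to a local name once avoids resolving it on every iteration. The payoff is largest in hot loops. In check_availability the method is called once per invocation, so there the saving would come only from not looking up the same attribute twice.

```python
from typing import Iterable


class Accumulator:
    """Hypothetical class used only to illustrate attribute-lookup caching."""

    def __init__(self) -> None:
        self.total = 0

    def add(self, value: int) -> None:
        self.total += value


def sum_uncached(acc: Accumulator, values: Iterable[int]) -> int:
    for value in values:
        acc.add(value)  # attribute lookup on every iteration
    return acc.total


def sum_cached(acc: Accumulator, values: Iterable[int]) -> int:
    add = acc.add  # bound method looked up once, outside the loop
    for value in values:
        add(value)
    return acc.total
```

Both functions produce the same result; only the number of attribute lookups differs.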

The optimized code is in this PR's diff.

Changes made

  1. Moved the check for check_availability out of the try block: the try block now wraps only the call that can raise, and it is not entered at all when the method is missing.
  2. Used getattr with a default value of None: the method is fetched once into stream_check_avail, and a single "stream_check_avail is None" test replaces a separate existence check, which is also more readable and Pythonic.
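The two changes can be sketched together as follows. This is a hypothetical, self-contained approximation, not the code in the PR's diff: Availability and SketchAvailabilityStrategy are invented stand-ins for the CDK's availability result types and StreamAvailabilityStrategy, and treating a stream without a check_availability method as available is an assumption. The method is fetched once with getattr and a None default before the try block, and only the call itself is guarded.

```python
import logging
from dataclasses import dataclass
from typing import Any, Optional


@dataclass(frozen=True)
class Availability:
    """Hypothetical stand-in for the CDK's availability result types."""

    available: bool
    message: Optional[str] = None


class SketchAvailabilityStrategy:
    """Hypothetical adapter that wraps a stream and its source."""

    def __init__(self, stream: Any, source: Any) -> None:
        self._stream = stream
        self._source = source

    def check_availability(self, logger: logging.Logger) -> Availability:
        # Fetch the method once; getattr returns None instead of raising
        # AttributeError, and this runs before the try block.
        stream_check_avail = getattr(self._stream, "check_availability", None)
        if stream_check_avail is None:
            return Availability(True)  # assumption: no check means available
        try:
            available, message = stream_check_avail(logger, self._source)
        except Exception as exc:
            # Prefer a user-friendly message if the stream offers one.
            display = getattr(self._stream, "get_error_display_message", None)
            friendly = display(exc) if display is not None else None
            return Availability(False, friendly or str(exc))
        return Availability(True) if available else Availability(False, message)


# Example streams mirroring some of the mocks in the generated tests.
class UpStream:
    def check_availability(self, logger, source):
        return True, None


class DownStream:
    def check_availability(self, logger, source):
        return False, "Service is down"


class FriendlyFailingStream:
    def check_availability(self, logger, source):
        raise Exception("Specific error")

    def get_error_display_message(self, exception):
        return "User-friendly error message"


class PlainFailingStream:
    def check_availability(self, logger, source):
        raise Exception("Generic error")


class NoCheckStream:
    pass


if __name__ == "__main__":
    logger = logging.getLogger("availability-sketch")
    streams = (UpStream(), DownStream(), FriendlyFailingStream(), PlainFailingStream(), NoCheckStream())
    for stream in streams:
        result = SketchAvailabilityStrategy(stream, source=None).check_availability(logger)
        print(type(stream).__name__, result)
```

One trade-off to keep in mind: because getattr now runs outside the try block, an exception other than AttributeError raised while resolving the attribute (for example, from a property) would no longer be caught by the fallback.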

These small changes remove redundant operations and make the code easier to read; in the benchmark above, runtime went down from 29.7 to 27.0 microseconds.
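A micro-optimization like this can be checked locally with the standard library's timeit module. The sketch below is hypothetical (the WithMethod class and both helper functions are invented for illustration) and compares a hasattr check followed by a second attribute lookup against a single getattr call with a default. Absolute timings depend on the machine and Python version. On CPython 3.11 and later, entering a try block costs almost nothing when no exception is raised, so any gain there would plausibly come from the avoided lookup rather than from the try block itself.

```python
import timeit


class WithMethod:
    """Hypothetical object exposing the method being probed."""

    def check_availability(self, logger, source):
        return True, None


obj = WithMethod()


def hasattr_then_lookup():
    # Two lookups: hasattr resolves the attribute, then it is resolved again.
    if hasattr(obj, "check_availability"):
        return obj.check_availability
    return None


def getattr_with_default():
    # One lookup; a missing attribute yields None instead of raising.
    return getattr(obj, "check_availability", None)


if __name__ == "__main__":
    for fn in (hasattr_then_lookup, getattr_with_default):
        seconds = timeit.timeit(fn, number=200_000)
        print(f"{fn.__name__}: {seconds:.4f} s for 200,000 calls")
```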

Correctness verification

The new optimized code was tested for correctness. The results are listed below.

✅ 1 Passed − ⚙️ Existing Unit Tests

- sources/streams/concurrent/test_adapters.py

✅ 0 Passed − 🌀 Generated Regression Tests

# imports
import logging
import time
from typing import Any, Iterable, List, Mapping, Optional, Union

import pytest  # used for our unit tests

# function to test
from airbyte_cdk.sources import Source
from airbyte_cdk.sources.streams import Stream
from airbyte_cdk.sources.streams.concurrent.adapters import StreamAvailabilityStrategy
from airbyte_cdk.sources.streams.concurrent.availability_strategy import AbstractAvailabilityStrategy

# unit tests

# Mock classes for testing


class MockStream(Stream):
    """Stubs the members Stream declares abstract so the mocks below can be instantiated."""

    @property
    def primary_key(self) -> Optional[Union[str, List[str], List[List[str]]]]:
        return None

    def read_records(self, *args: Any, **kwargs: Any) -> Iterable[Mapping[str, Any]]:
        return []


class MockStreamAvailable(MockStream):
    def check_availability(self, logger, source):
        return True, None


class MockStreamUnavailable(MockStream):
    def check_availability(self, logger, source):
        return False, "Service is down"


class MockStreamWithoutCheck(MockStream):
    pass


class MockStreamWithException(MockStream):
    def check_availability(self, logger, source):
        raise Exception("Generic error")


class MockStreamWithUserFriendlyException(MockStream):
    def check_availability(self, logger, source):
        raise Exception("Specific error")

    def get_error_display_message(self, exception):
        return "User-friendly error message"


class MockStreamWithEmptyMessage(MockStream):
    def check_availability(self, logger, source):
        return False, ""


class MockStreamWithNoneMessage(MockStream):
    def check_availability(self, logger, source):
        return False, None


class MockStreamWithNonBooleanAvailability(MockStream):
    def check_availability(self, logger, source):
        return "yes", "Non-boolean availability"  # truthy, non-bool flag


class MockStreamWithLatency(MockStream):
    def check_availability(self, logger, source):
        time.sleep(5)  # simulate a slow availability check
        return True, None


class MockStreamWithLogger(MockStream):
    def check_availability(self, logger, source):
        logger.info("Checking availability")
        return True, None


class ComplexStream(MockStream):
    def __init__(self, config):
        super().__init__()
        self.config = config

    def check_availability(self, logger, source):
        # Complex logic here
        return True, None


# Test cases

🔘 (none found) − ⏪ Replay Tests

maxi297 and others added 8 commits September 19, 2024 10:03
… 10% in PR #45673 (`async-job-salesforce/cdk-release`)

codeflash-ai bot added the ⚡️ codeflash (Optimization PR opened by Codeflash AI) label Sep 19, 2024

vercel bot commented Sep 19, 2024

The latest updates on your projects. Learn more about Vercel for Git.

1 Skipped Deployment

| Name | Status | Preview | Comments | Updated (UTC) |
|---|---|---|---|---|
| airbyte-docs | ⬜️ Ignored (Inspect) | Visit Preview | | Sep 19, 2024 5:28pm |

@CLAassistant

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.
You have signed the CLA already but the status is still pending? Let us recheck it.

octavia-squidington-iii added the CDK (Connector Development Kit) and community labels Sep 19, 2024
Base automatically changed from async-job-salesforce/cdk-release to master October 1, 2024 12:48
codeflash-ai bot closed this Oct 1, 2024
codeflash-ai bot (author) commented Oct 1, 2024

This PR has been automatically closed because the original PR #45673 by maxi297 was closed.

codeflash-ai bot deleted the codeflash/optimize-pr45673-2024-09-19T17.28.01 branch October 1, 2024 12:48
Labels
CDK Connector Development Kit ⚡️ codeflash Optimization PR opened by Codeflash AI

4 participants