diff --git a/README.md b/README.md
index 9c11375..cb79602 100644
--- a/README.md
+++ b/README.md
@@ -4,93 +4,136 @@ A streamlined solution for accessing Kaggle computational resources via SSH and
 
 ## Overview
 
-kagglelink allows you to ssh into Kaggle and leverage those kaggle resources, or you can run kaggles notebook remotely using VSCode, with more coding support, and better development environment
+KaggleLink allows you to connect to Kaggle environments via SSH, enabling you to leverage Kaggle's computational resources
 
-![Image](https://github.com/user-attachments/assets/db4454ff-5545-4094-adeb-47b74ab0c33a)
+![](https://github.com/user-attachments/assets/db4454ff-5545-4094-adeb-47b74ab0c33a)
 
-## Requirements
+## Getting Started
 
-1. A Zrok token is required for establishing the tunnel. Create an account at [myZrok.io](https://myzrok.io/) to get your token.
+### Requirements
 
-2. Ensure your account is on the Starter plan to utilize NetFoundry's public Zrok instance.
+To use KaggleLink, you need:
 
-3. You need to upload your public key to a github repository or a public file hosting service
+1. **Zrok Token**: A Zrok token is essential for establishing the secure tunnel. Create an account at [myZrok.io](https://myzrok.io/) to obtain your token. Ensure your account is on the **Starter plan** to utilize NetFoundry's public Zrok instance, which offers 2 environment connections (one for your local machine, one for the Kaggle instance).
+2. **Public SSH Key**: Your public SSH key needs to be accessible via a URL, either from a GitHub repository or another public file hosting service.
 
-## Quick Setup
+### Quick Setup (on Kaggle)
 
-One line command setup?
-
-Paste this into Kaggle cell
+Execute the following one-line command in a Kaggle notebook cell. This script will set up Zrok and SSH on your Kaggle instance.
 
 ```bash
 !curl -sS https://bhdai.github.io/setup | bash -s -- -k -t
 ```
 
 > [!NOTE]
->
-> replace with the URL of your public key file and with your Zrok token.
+> Replace `` with the URL of your public SSH key file and `` with your Zrok token.
+Wait for the setup to complete. You should see output similar to this upon successful configuration:
 
-Wait for the setup to finish, you should see something like this at the end
+![](https://github.com/user-attachments/assets/22f564f3-8622-4c6c-bb82-9c9c63dd322a)
 
-![Image](https://github.com/user-attachments/assets/22f564f3-8622-4c6c-bb82-9c9c63dd322a)
+#### How to set up your public SSH key?
 
-### How to setup public key?
+1. **Generate an SSH key pair** on your local machine (if you haven't already). Use a descriptive filename, for example:
 
-Generate a new SSH key pair on your local machine (if you haven't already):
+   ```bash
+   ssh-keygen -t rsa -b 4096 -C "kaggle_remote_ssh" -f ~/.ssh/kaggle_rsa
+   ```
 
-```bash
-ssh-keygen -t rsa -b 4096 -C "kaggle_remote_ssh" -f ~/.ssh/kaggle_rsa
-```
+2. **Upload your public key** (`~/.ssh/kaggle_rsa.pub`) to a public GitHub repository or a similar public file hosting service.
+3. **Obtain the Raw URL**: Navigate to your uploaded public key file in your repository and click the "Raw" button.
+
+   ![](https://private-user-images.githubusercontent.com/140616004/444039100-ec9a884c-1c97-4be6-bd6d-03ac5dd16de7.png?jwt=eyJ0eXAiOiJKV1QiLCJhbGciOiJIUzI1NiJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3NjU0NjQyMzMsIm5iZiI6MTc2NTQ2MzkzMywicGF0aCI6Ii8xNDA2MTYwMDQvNDQ0MDM5MTAwLWVjOWE4ODRjLTFjOTctNGJlNi1iZDZkLTAzYWM1ZGQxNmRlNy5wbmc_WC1BbXotQWxnb3JpdGhtPUFXUzQtSE1BQy1TSEEyNTYmWC1BbXotQ3JlZGVudGlhbD1BS0lBVkNPRFlMU0E1M1BRSzRaQSUyRjIwMjUxMjExJTJGdXMtZWFzdC0xJTJGczMlMkZhd3M0X3JlcXVlc3QmWC1BbXotRGF0ZT0yMDI1MTIxMVQxNDM4NTNaJlgtQW16LUV4cGlyZXM9MzAwJlgtQW16LVNpZ25hdHVyZT04YjZiY2M1OWRiMDUzYWZiMDUwODUzMjg2NDA4ZTU5NDAxZTM3YWU3ZGJmMDRlMjFiZjA0YmFmOGJlNTJmNzg1JlgtQW16LVNpZ25lZEhlYWRlcnM9aG9zdCJ9.wDGsBk1CyVVAWFLSGh8wRldUbz2hiAOzw6t3Zf39K5A)
+
+   Copy the URL from your browser's address bar. It typically looks like `https://raw.githubusercontent.com///refs/heads/main/`.
+
+#### How to get your Zrok token?
+
+1. If you don't have one, create your Zrok account at [myZrok.io](https://myzrok.io/).
+2. Go to the [billing page](https://myzrok.io/billing) and ensure your plan is set to **Starter**.
+3. Create a new token.
+4. Visit [https://api-v1.zrok.io](https://api-v1.zrok.io/) to retrieve and manage your Zrok tokens.
+
+### Advanced: Environment Variables
+
+For automated pipelines or power users, you can configure KaggleLink using environment variables instead of CLI flags.
+
+| Variable | CLI Equivalent | Description |
+|----------|----------------|-------------|
+| `KAGGLELINK_KEYS_URL` | `-k` | URL to your public SSH key |
+| `KAGGLELINK_TOKEN` | `-t` | Your Zrok token |
+
+> [!NOTE]
+> CLI arguments (`-k`, `-t`) always override environment variables if both are present.
+
+#### Setting Environment Variables in Kaggle
 
-Create a github repository and push the `~/.ssh/kaggle_rsa.pub` file to it. Make sure the repository is public. Once finished, you can get the public key URL by navigating to the file in your repository and clicking on the "Raw" button.
+The most secure way to pass these credentials is using **Kaggle Secrets**.
 
-![Image](https://github.com/user-attachments/assets/ec9a884c-1c97-4be6-bd6d-03ac5dd16de7)
+1. Add your secrets in the Kaggle notebook sidebar (**Add-ons** -> **Secrets**).
+2. Use the following Python snippet in a cell *before* running the setup script:
 
-Copy the URL from your browser's address bar. It usually takes the form like this `https://raw.githubusercontent.com///refs/heads/main/`
+```python
+from kaggle_secrets import UserSecretsClient
+import os
 
-### How to get zrok token?
+user_secrets = UserSecretsClient()
 
-Create your zrok account, if you haven't already, go [here](https://myzrok.io/billing) and change your plan to Starter plan, and then create a new token. Finally visit [https://api-v1.zrok.io](https://api-v1.zrok.io/), you should setup and get your token there
+# Set environment variables from secrets
+# Ensure you have added 'KAGGLELINK_TOKEN' and 'KAGGLELINK_KEYS_URL' (optional) to your secrets
+os.environ['KAGGLELINK_TOKEN'] = user_secrets.get_secret("KAGGLELINK_TOKEN")
 
-## Client Setup
+# You can also set the URL directly if it's public and not stored as a secret
+os.environ['KAGGLELINK_KEYS_URL'] = "https://raw.githubusercontent.com/your/repo/main/key.pub"
+```
+
+Once the environment variables are set, you can run the setup script without arguments:
+
+```bash
+!curl -sS https://bhdai.github.io/setup | bash
+```
+
+## Usage
+
+After completing the Kaggle setup, your Kaggle instance is ready for connection. The script will output a Zrok private token at the end which you'll use to connect from your local machine.
 
-After completing the Kaggle setup, you'll receive a token. Follow these steps on your local machine:
+### Client Setup (on your Local Machine)
 
-1. Install Zrok locally by following the [official installation guide](https://docs.zrok.io/docs/guides/install/).
+1. **Install Zrok locally**: Follow the [official Zrok installation guide](https://docs.zrok.io/docs/guides/install/).
+   For Arch-based distributions, you can use:
 
-   For Arch-based distributions, you can use:
-   ```bash
-   yay -S zrok-bin
-   ```
+   ```bash
+   yay -S zrok-bin
+   ```
 
-2. Enable zrok in your local machine
-   ```bash
-   zrok enable
-   ```
+2. **Enable Zrok**: Enable Zrok on your local machine using your personal Zrok token:
 
-2. Access your Kaggle instance using the token:
-   ```bash
-   zrok access private
-   ```
+   ```bash
+   zrok enable
+   ```
 
-3. This will open a dashboard displaying your connection details, including a local address like `127.0.0.1:9191`.
+3. **Access the private tunnel**: Use the Zrok `private_token` obtained from the Kaggle setup output to establish the connection:
 
-## SSH Connection
+   ```bash
+   zrok access private
+   ```
 
-*For VSCode check out the [old instrunction](https://github.com/bhdai/kagglelink/blob/ngrok/README.md#connect-via-ssh) (will update this eventually)*
+   This command will open a dashboard in your terminal, displaying your connection details, including a local address like `127.0.0.1:9191`.
 
-Connect to your Kaggle instance via SSH:
+### SSH Connection
+
+Connect to your Kaggle instance via SSH using the local address and port provided by Zrok (e.g., `127.0.0.1:9191`).
 
 ```bash
 ssh -o UserKnownHostsFile=/dev/null -o StrictHostKeyChecking=no -i ~/.ssh/kaggle_rsa -p 9191 root@127.0.0.1
 ```
 
-Note: The port (e.g., 9191) generally remains consistent across sessions, so no need to adjust it for each new instance.
+> [!NOTE]
+> The port (e.g., 9191) generally remains consistent across sessions, so you typically won't need to adjust it for each new instance.
 
-### SSH Configuration
+#### SSH Configuration
 
-To simplify future connections, add this configuration to your `~/.ssh/config` file:
+To simplify future connections, add the following configuration to your `~/.ssh/config` file:
 
 ```
 Host Kaggle
@@ -104,21 +147,36 @@ Host Kaggle
 
 With this configuration, you can simply use `ssh Kaggle` to connect.
 
-## File Transfer with Rsync
+### File Transfer with Rsync
 
-Transfer files between your local machine and Kaggle instance:
+Transfer files between your local machine and Kaggle instance using `rsync`:
 
 ```bash
 # From local to remote
-rsync -e "ssh -o UserKnownHostsFile=/dev/null -o StrictHostKeyChecking=no -i ~/.ssh/kaggle_rsa -p 9191" root@127.0.0.1:/kaggle/working
+rsync -e "ssh -o UserKnownHostsFile=/dev/null -o StrictHostKeyChecking=no -i ~/.ssh/kaggle_rsa -p 9191" root@127.0.0.1:
+# or if you have your SSH config set up (see above)
+rsync -avz Kaggle:
 
 # From remote to local
 rsync -e "ssh -o UserKnownHostsFile=/dev/null -o StrictHostKeyChecking=no -i ~/.ssh/kaggle_rsa -p 9191" root@127.0.0.1:
+# or if you have your SSH config set up (see above)
+rsync -avz Kaggle:
 ```
 
-> [!NOTE]
->
-> If you're using the Starter plan, they only offer 2 environment connection on this plan one for you local machine, one for kaggle instance. While the script will automatically release the Kaggle instance when you turn off Kaggle, but it's best to check [https://api-v1.zrok.io/](https://api-v1.zrok.io/) to make sure your local machine is connected and there are no other active connections before running the script again.
+> [!IMPORTANT]
+> The Zrok Starter plan limits you to two environment connections. While the script automatically releases the Kaggle instance's connection upon shutdown, it's good practice to verify your active connections at [https://api-v1.zrok.io/](https://api-v1.zrok.io/) before rerunning the script, ensuring your local machine is the primary active connection.
+
+## Contributing
+
+We welcome contributions to KaggleLink! If you're interested in improving this project, please follow these steps:
+
+1. **Fork the repository**.
+2. **Create a new branch** for your feature or bug fix (`git checkout -b feature/your-feature-name` or `bugfix/issue-description`).
+3. **Make your changes**, adhering to the existing coding style and standards.
+4. **Write and run tests** to ensure your changes work as expected and don't introduce regressions.
+5. **Commit your changes** with clear and concise commit messages.
+6. **Push your branch** to your forked repository.
+7. **Open a Pull Request** to the main branch, providing a detailed description of your changes.
 
 ## License
 
diff --git a/setup.sh b/setup.sh
index 0edf999..5c07748 100755
--- a/setup.sh
+++ b/setup.sh
@@ -45,50 +45,74 @@ usage() {
     echo " -t, --token TOKEN Your zrok token"
     echo " -h, --help Display this help message"
     echo ""
-    echo "Environment Variables:"
+    echo "Environment Variables (fallback when CLI flags not provided):"
+    echo " KAGGLELINK_KEYS_URL URL to your authorized_keys file"
+    echo " KAGGLELINK_TOKEN Your zrok token"
     echo " BRANCH Override default branch (current: ${KAGGLELINK_BRANCH})"
     exit "$exit_code"
 }
 
 # Parse command line arguments
+# Initialize source tracking variables
+AUTH_KEYS_SOURCE=""
+ZROK_TOKEN_SOURCE=""
+
 while [[ $# -gt 0 ]]; do
     case $1 in
-    -k | --keys-url)
-        AUTH_KEYS_URL="$2"
-        shift 2
-        ;;
-    -t | --token)
-        ZROK_TOKEN="$2"
-        shift 2
-        ;;
-    -h | --help)
-        usage 0
-        ;;
-    *)
-        echo "Unknown option: $1"
-        usage
-        ;;
+    -k | --keys-url)
+        AUTH_KEYS_URL="$2"
+        AUTH_KEYS_SOURCE="CLI argument"
+        shift 2
+        ;;
+    -t | --token)
+        ZROK_TOKEN="$2"
+        ZROK_TOKEN_SOURCE="CLI argument"
+        shift 2
+        ;;
+    -h | --help)
+        usage 0
+        ;;
+    *)
+        echo "Unknown option: $1"
+        usage
+        ;;
     esac
 done
 
 # Apply environment variable fallback if CLI args not provided
 if [ -z "$AUTH_KEYS_URL" ] && [ -n "$KAGGLELINK_KEYS_URL" ]; then
     AUTH_KEYS_URL="$KAGGLELINK_KEYS_URL"
+    AUTH_KEYS_SOURCE="KAGGLELINK_KEYS_URL env var"
 fi
 
 if [ -z "$ZROK_TOKEN" ] && [ -n "$KAGGLELINK_TOKEN" ]; then
     ZROK_TOKEN="$KAGGLELINK_TOKEN"
+    ZROK_TOKEN_SOURCE="KAGGLELINK_TOKEN env var"
+fi
+
+# Log configuration source for transparency
+if [ -n "$AUTH_KEYS_URL" ]; then
+    echo "ℹ️ Using keys URL from: $AUTH_KEYS_SOURCE"
+fi
+if [ -n "$ZROK_TOKEN" ]; then
+    echo "ℹ️ Using token from: $ZROK_TOKEN_SOURCE"
 fi
 
 # Check for required parameters
 if [ -z "$AUTH_KEYS_URL" ]; then
-    echo "Error: Public key URL (-k or --keys-url) is required"
-    usage
+    echo "Error: Public key URL is required"
+    echo " Provide via: -k or --keys-url "
+    echo " Or set: KAGGLELINK_KEYS_URL environment variable"
+    echo " Run with --help for more information"
+    exit 1
 fi
 
 if [ -z "$ZROK_TOKEN" ]; then
-    echo "Error: zrok token (-t or --token) is required"
-    usage
+    echo "Error: zrok token is required"
+    echo " Provide via: -t or --token "
+    echo " Or set: KAGGLELINK_TOKEN environment variable"
+    echo " Run with --help for more information"
+    exit 1
 fi
 
 # Validate that AUTH_KEYS_URL uses HTTPS (security requirement)
diff --git a/tests/unit/test_env_fallback.bats b/tests/unit/test_env_fallback.bats
index 04bd8f5..d496224 100755
--- a/tests/unit/test_env_fallback.bats
+++ b/tests/unit/test_env_fallback.bats
@@ -19,11 +19,12 @@ if [[ "$*" == *"clone"* ]]; then
     # Extract the target directory (last argument)
     target="${@: -1}"
     mkdir -p "$target"
-    mkdir -p "$target"
     echo '#!/bin/bash' > "$target/setup_kaggle_zrok.sh"
     echo '#!/bin/bash' > "$target/start_zrok.sh"
     chmod +x "$target/setup_kaggle_zrok.sh" "$target/start_zrok.sh"
+    exit 0
 fi
+# For any other git command, just succeed
 exit 0
 EOF
     chmod +x "$TEST_TEMP_DIR/git"
@@ -74,7 +75,9 @@ teardown() {
 
     run bash "${PROJECT_ROOT}/setup.sh" -k "https://cli.com/keys" -t "cli-token"
     [ "$status" -eq 0 ]
-    # The CLI values should be used (verified by checking they were passed to scripts)
+    # Verify CLI values are actually used by checking source logging
+    [[ "$output" == *"Using keys URL from: CLI argument"* ]]
+    [[ "$output" == *"Using token from: CLI argument"* ]]
 }
 
 @test "P0: should fail when both CLI and env are missing (keys URL)" {
@@ -102,3 +105,72 @@ teardown() {
     # Check for actionable error message
     [[ "$output" == *"Error"* ]] || [[ "$output" == *"required"* ]]
 }
+
+# =============================================================================
+# Configuration Source Logging Tests (AC4)
+# =============================================================================
+
+@test "P0: should log CLI source when -k provided" {
+    run bash "${PROJECT_ROOT}/setup.sh" -k "https://example.com/keys" -t "test-token"
+    [ "$status" -eq 0 ]
+    [[ "$output" == *"Using keys URL from: CLI argument"* ]]
+}
+
+@test "P0: should log CLI source when -t provided" {
+    run bash "${PROJECT_ROOT}/setup.sh" -k "https://example.com/keys" -t "test-token"
+    [ "$status" -eq 0 ]
+    [[ "$output" == *"Using token from: CLI argument"* ]]
+}
+
+@test "P0: should log env var source when KAGGLELINK_KEYS_URL used" {
+    export KAGGLELINK_KEYS_URL="https://example.com/keys"
+    export KAGGLELINK_TOKEN="test-token"
+
+    run bash "${PROJECT_ROOT}/setup.sh"
+    [ "$status" -eq 0 ]
+    [[ "$output" == *"Using keys URL from: KAGGLELINK_KEYS_URL env var"* ]]
+}
+
+@test "P0: should log env var source when KAGGLELINK_TOKEN used" {
+    export KAGGLELINK_KEYS_URL="https://example.com/keys"
+    export KAGGLELINK_TOKEN="test-token"
+
+    run bash "${PROJECT_ROOT}/setup.sh"
+    [ "$status" -eq 0 ]
+    [[ "$output" == *"Using token from: KAGGLELINK_TOKEN env var"* ]]
+}
+
+@test "P0: should log CLI source when both CLI and env var provided" {
+    export KAGGLELINK_KEYS_URL="https://env.com/keys"
+    export KAGGLELINK_TOKEN="env-token"
+
+    run bash "${PROJECT_ROOT}/setup.sh" -k "https://cli.com/keys" -t "cli-token"
+    [ "$status" -eq 0 ]
+    [[ "$output" == *"Using keys URL from: CLI argument"* ]]
+    [[ "$output" == *"Using token from: CLI argument"* ]]
+}
+
+# =============================================================================
+# Improved Error Messages Tests (AC5)
+# =============================================================================
+
+@test "P0: error message mentions both -k flag AND KAGGLELINK_KEYS_URL env var" {
+    run env -u KAGGLELINK_KEYS_URL KAGGLELINK_TOKEN="test-token" PATH="$TEST_TEMP_DIR:$PATH" bash "${PROJECT_ROOT}/setup.sh"
+    [ "$status" -ne 0 ]
+    [[ "$output" == *"-k"* ]] || [[ "$output" == *"--keys-url"* ]]
+    [[ "$output" == *"KAGGLELINK_KEYS_URL"* ]]
+}
+
+@test "P0: error message mentions both -t flag AND KAGGLELINK_TOKEN env var" {
+    run env -u KAGGLELINK_TOKEN KAGGLELINK_KEYS_URL="https://example.com/keys" PATH="$TEST_TEMP_DIR:$PATH" bash "${PROJECT_ROOT}/setup.sh"
+    [ "$status" -ne 0 ]
+    [[ "$output" == *"-t"* ]] || [[ "$output" == *"--token"* ]]
+    [[ "$output" == *"KAGGLELINK_TOKEN"* ]]
+}
+
+@test "P0: usage output includes environment variable documentation" {
+    run bash "${PROJECT_ROOT}/setup.sh" -h
+    [ "$status" -eq 0 ]
+    [[ "$output" == *"KAGGLELINK_KEYS_URL"* ]]
+    [[ "$output" == *"KAGGLELINK_TOKEN"* ]]
+}
diff --git a/tests/unit/test_url_validation.bats b/tests/unit/test_url_validation.bats
index afed019..80be210 100755
--- a/tests/unit/test_url_validation.bats
+++ b/tests/unit/test_url_validation.bats
@@ -21,7 +21,9 @@ if [[ "$*" == *"clone"* ]]; then
     echo '#!/bin/bash' > "$target/setup_kaggle_zrok.sh"
     echo '#!/bin/bash' > "$target/start_zrok.sh"
     chmod +x "$target/setup_kaggle_zrok.sh" "$target/start_zrok.sh"
+    exit 0
 fi
+# For any other git command, just succeed
 exit 0
 EOF
     chmod +x "$TEST_TEMP_DIR/git"
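
The precedence rule exercised by the setup.sh and bats changes in this diff (a CLI flag wins, the environment variable is only a fallback, and the chosen source is reported) can be sketched in isolation. This is a minimal illustration, not code from setup.sh: the helper name `resolve_setting` and its `value|source` output format are assumptions made for the example.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of setup.sh): resolve one setting with
# CLI-over-environment precedence and report which source supplied it.
resolve_setting() {
    local cli_value="$1" env_value="$2" env_name="$3"
    if [ -n "$cli_value" ]; then
        # A value given on the command line always takes priority.
        printf '%s|%s\n' "$cli_value" "CLI argument"
    elif [ -n "$env_value" ]; then
        # The environment variable is consulted only as a fallback.
        printf '%s|%s\n' "$env_value" "$env_name env var"
    else
        # Neither source provided a value; the caller decides how to fail.
        return 1
    fi
}

# Both sources present: the CLI value is selected.
resolve_setting "cli-token" "env-token" "KAGGLELINK_TOKEN"
# Only the environment variable present: it is used as the fallback.
resolve_setting "" "env-token" "KAGGLELINK_TOKEN"
```

Returning a non-zero status when both inputs are empty leaves the error message to the caller, which is roughly how the patched script separates the fallback step from the "required parameter" checks.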