162 changes: 110 additions & 52 deletions README.md
@@ -4,93 +4,136 @@ A streamlined solution for accessing Kaggle computational resources via SSH and

## Overview

KaggleLink lets you SSH into Kaggle and use its computational resources, or run Kaggle notebooks remotely from VSCode with better coding support and a better development environment.
KaggleLink allows you to connect to Kaggle environments via SSH, enabling you to leverage Kaggle's computational resources.

![Image](https://github.com/user-attachments/assets/db4454ff-5545-4094-adeb-47b74ab0c33a)
![](https://github.com/user-attachments/assets/db4454ff-5545-4094-adeb-47b74ab0c33a)

## Requirements
## Getting Started

1. A Zrok token is required for establishing the tunnel. Create an account at [myZrok.io](https://myzrok.io/) to get your token.
### Requirements

2. Ensure your account is on the Starter plan to utilize NetFoundry's public Zrok instance.
To use KaggleLink, you need:

3. You need to upload your public key to a github repository or a public file hosting service
1. **Zrok Token**: A Zrok token is essential for establishing the secure tunnel. Create an account at [myZrok.io](https://myzrok.io/) to obtain your token. Ensure your account is on the **Starter plan** to utilize NetFoundry's public Zrok instance, which offers 2 environment connections (one for your local machine, one for the Kaggle instance).
2. **Public SSH Key**: Your public SSH key needs to be accessible via a URL, either from a GitHub repository or another public file hosting service.

## Quick Setup
### Quick Setup (on Kaggle)

One line command setup?

Paste this into Kaggle cell
Execute the following one-line command in a Kaggle notebook cell. This script will set up Zrok and SSH on your Kaggle instance.

```bash
!curl -sS https://bhdai.github.io/setup | bash -s -- -k <public_key_url> -t <zrok_token>
```

> [!NOTE]
>
> replace <public_key_url> with the URL of your public key file and <zrok_token> with your Zrok token.
> Replace `<public_key_url>` with the URL of your public SSH key file and `<zrok_token>` with your Zrok token.

Wait for the setup to complete. You should see output similar to this upon successful configuration:

Wait for the setup to finish, you should see something like this at the end
![](https://github.com/user-attachments/assets/22f564f3-8622-4c6c-bb82-9c9c63dd322a)

![Image](https://github.com/user-attachments/assets/22f564f3-8622-4c6c-bb82-9c9c63dd322a)
#### How to set up your public SSH key?

### How to setup public key?
1. **Generate an SSH key pair** on your local machine (if you haven't already). Use a descriptive filename, for example:

Generate a new SSH key pair on your local machine (if you haven't already):
```bash
ssh-keygen -t rsa -b 4096 -C "kaggle_remote_ssh" -f ~/.ssh/kaggle_rsa
```

```bash
ssh-keygen -t rsa -b 4096 -C "kaggle_remote_ssh" -f ~/.ssh/kaggle_rsa
```
2. **Upload your public key** (`~/.ssh/kaggle_rsa.pub`) to a public GitHub repository or a similar public file hosting service.
3. **Obtain the Raw URL**: Navigate to your uploaded public key file in your repository and click the "Raw" button.

![](https://private-user-images.githubusercontent.com/140616004/444039100-ec9a884c-1c97-4be6-bd6d-03ac5dd16de7.png?jwt=eyJ0eXAiOiJKV1QiLCJhbGciOiJIUzI1NiJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3NjU0NjQyMzMsIm5iZiI6MTc2NTQ2MzkzMywicGF0aCI6Ii8xNDA2MTYwMDQvNDQ0MDM5MTAwLWVjOWE4ODRjLTFjOTctNGJlNi1iZDZkLTAzYWM1ZGQxNmRlNy5wbmc_WC1BbXotQWxnb3JpdGhtPUFXUzQtSE1BQy1TSEEyNTYmWC1BbXotQ3JlZGVudGlhbD1BS0lBVkNPRFlMU0E1M1BRSzRaQSUyRjIwMjUxMjExJTJGdXMtZWFzdC0xJTJGczMlMkZhd3M0X3JlcXVlc3QmWC1BbXotRGF0ZT0yMDI1MTIxMVQxNDM4NTNaJlgtQW16LUV4cGlyZXM9MzAwJlgtQW16LVNpZ25hdHVyZT04YjZiY2M1OWRiMDUzYWZiMDUwODUzMjg2NDA4ZTU5NDAxZTM3YWU3ZGJmMDRlMjFiZjA0YmFmOGJlNTJmNzg1JlgtQW16LVNpZ25lZEhlYWRlcnM9aG9zdCJ9.wDGsBk1CyVVAWFLSGh8wRldUbz2hiAOzw6t3Zf39K5A)

Copy the URL from your browser's address bar. It typically looks like `https://raw.githubusercontent.com/<username>/<repo_name>/refs/heads/main/<file_path>`.
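
As a rough illustration of the URL shape described above, the following Python sketch assembles a raw URL from its parts and checks that it uses HTTPS (the setup script validates that the keys URL is HTTPS). The helper names `raw_key_url` and `is_https_url` are hypothetical and not part of KaggleLink; the default branch `main` is an assumption taken from the example URL.

```python
from urllib.parse import urlparse

RAW_HOST = "raw.githubusercontent.com"


def raw_key_url(username: str, repo: str, file_path: str, branch: str = "main") -> str:
    """Build a raw URL in the form shown above (hypothetical helper)."""
    # Strip a leading slash so the path is not doubled up in the URL.
    return f"https://{RAW_HOST}/{username}/{repo}/refs/heads/{branch}/{file_path.lstrip('/')}"


def is_https_url(url: str) -> bool:
    """Return True when the URL uses the https scheme and names a host."""
    parts = urlparse(url)
    return parts.scheme == "https" and bool(parts.netloc)
```

For example, `raw_key_url("<username>", "<repo_name>", "kaggle_rsa.pub")` produces a URL you can pass to `-k`, and `is_https_url` gives a quick local check before pasting it into the Kaggle cell.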

#### How to get your Zrok token?

1. If you don't have one, create your Zrok account at [myZrok.io](https://myzrok.io/).
2. Go to the [billing page](https://myzrok.io/billing) and ensure your plan is set to **Starter**.
3. Create a new token.
4. Visit [https://api-v1.zrok.io](https://api-v1.zrok.io/) to retrieve and manage your Zrok tokens.

### Advanced: Environment Variables

For automated pipelines or power users, you can configure KaggleLink using environment variables instead of CLI flags.

| Variable | CLI Equivalent | Description |
|----------|----------------|-------------|
| `KAGGLELINK_KEYS_URL` | `-k` | URL to your public SSH key |
| `KAGGLELINK_TOKEN` | `-t` | Your Zrok token |

> [!NOTE]
> CLI arguments (`-k`, `-t`) always override environment variables if both are present.
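
The precedence rule above can be sketched in Python as follows. This mirrors the fallback logic in `setup.sh` (CLI value first, then the environment variable), but the function `resolve_setting` is a hypothetical illustration, not part of the script itself.

```python
from typing import Mapping, Optional, Tuple


def resolve_setting(
    cli_value: Optional[str],
    env_name: str,
    environ: Mapping[str, str],
) -> Tuple[Optional[str], Optional[str]]:
    """Return (value, source) for one setting, preferring the CLI value."""
    # A non-empty CLI argument always wins.
    if cli_value:
        return cli_value, "CLI argument"
    # Otherwise fall back to the environment variable, if it is non-empty.
    env_value = environ.get(env_name, "")
    if env_value:
        return env_value, f"{env_name} env var"
    # Neither source provided a value; the caller should report an error.
    return None, None
```

Calling `resolve_setting(None, "KAGGLELINK_TOKEN", os.environ)` would return the token from the environment together with a label saying where it came from, which is the same information the script logs.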

#### Setting Environment Variables in Kaggle

Create a github repository and push the `~/.ssh/kaggle_rsa.pub` file to it. Make sure the repository is public. Once finished, you can get the public key URL by navigating to the file in your repository and clicking on the "Raw" button.
The most secure way to pass these credentials is using **Kaggle Secrets**.

![Image](https://github.com/user-attachments/assets/ec9a884c-1c97-4be6-bd6d-03ac5dd16de7)
1. Add your secrets in the Kaggle notebook sidebar (**Add-ons** -> **Secrets**).
2. Use the following Python snippet in a cell *before* running the setup script:

Copy the URL from your browser's address bar. It usually takes the form like this `https://raw.githubusercontent.com/<username>/<repo_name>/refs/heads/main/<file_path>`
```python
from kaggle_secrets import UserSecretsClient
import os

### How to get zrok token?
user_secrets = UserSecretsClient()

Create your Zrok account if you haven't already, then go [here](https://myzrok.io/billing) and change your plan to the Starter plan. Create a new token, then visit [https://api-v1.zrok.io](https://api-v1.zrok.io/) to set up and retrieve your token.
# Set environment variables from secrets
# Ensure you have added 'KAGGLELINK_TOKEN' and 'KAGGLELINK_KEYS_URL' (optional) to your secrets
os.environ['KAGGLELINK_TOKEN'] = user_secrets.get_secret("KAGGLELINK_TOKEN")

## Client Setup
# You can also set the URL directly if it's public and not stored as a secret
os.environ['KAGGLELINK_KEYS_URL'] = "https://raw.githubusercontent.com/your/repo/main/key.pub"
```

Once the environment variables are set, you can run the setup script without arguments:

```bash
!curl -sS https://bhdai.github.io/setup | bash
```
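
Because the script exits with an error when either value is missing, it can help to check the environment in a notebook cell first. The sketch below is one way to do that; `missing_settings` and `REQUIRED_VARS` are hypothetical names introduced here for illustration.

```python
import os
from typing import List, Mapping

# The two variables the setup script falls back to when no CLI flags are given.
REQUIRED_VARS = ("KAGGLELINK_KEYS_URL", "KAGGLELINK_TOKEN")


def missing_settings(environ: Mapping[str, str] = os.environ) -> List[str]:
    """Return the names of required variables that are unset or empty."""
    return [name for name in REQUIRED_VARS if not environ.get(name)]
```

If `missing_settings()` returns an empty list, both variables are set and the argument-free command above should have what it needs.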

## Usage

After completing the Kaggle setup, your Kaggle instance is ready for connection. The script will output a Zrok private token at the end which you'll use to connect from your local machine.

After completing the Kaggle setup, you'll receive a token. Follow these steps on your local machine:
### Client Setup (on your Local Machine)

1. Install Zrok locally by following the [official installation guide](https://docs.zrok.io/docs/guides/install/).
1. **Install Zrok locally**: Follow the [official Zrok installation guide](https://docs.zrok.io/docs/guides/install/).
For Arch-based distributions, you can use:

For Arch-based distributions, you can use:
```bash
yay -S zrok-bin
```
```bash
yay -S zrok-bin
```

2. Enable zrok in your local machine
```bash
zrok enable <zrok-token>
```
2. **Enable Zrok**: Enable Zrok on your local machine using your personal Zrok token:

2. Access your Kaggle instance using the token:
```bash
zrok access private <the_token_from_kaggle>
```
```bash
zrok enable <your_personal_zrok_token>
```

3. This will open a dashboard displaying your connection details, including a local address like `127.0.0.1:9191`.
3. **Access the private tunnel**: Use the Zrok `private_token` obtained from the Kaggle setup output to establish the connection:

## SSH Connection
```bash
zrok access private <the_private_token_from_kaggle_setup>
```

*For VSCode, check out the [old instructions](https://github.com/bhdai/kagglelink/blob/ngrok/README.md#connect-via-ssh) (will update this eventually).*
This command will open a dashboard in your terminal, displaying your connection details, including a local address like `127.0.0.1:9191`.

Connect to your Kaggle instance via SSH:
### SSH Connection

Connect to your Kaggle instance via SSH using the local address and port provided by Zrok (e.g., `127.0.0.1:9191`).

```bash
ssh -o UserKnownHostsFile=/dev/null -o StrictHostKeyChecking=no -i ~/.ssh/kaggle_rsa -p 9191 root@127.0.0.1
```

Note: The port (e.g., 9191) generally remains consistent across sessions, so no need to adjust it for each new instance.
> [!NOTE]
> The port (e.g., 9191) generally remains consistent across sessions, so you typically won't need to adjust it for each new instance.
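
If the local address reported by Zrok ever differs from `127.0.0.1:9191`, the SSH command needs the new host and port. The following Python sketch builds the same command as above from an address string; `parse_local_address` and `build_ssh_command` are hypothetical helpers, and the default key path and `root` user are taken from the example command.

```python
import os
from typing import List, Tuple


def parse_local_address(address: str) -> Tuple[str, int]:
    """Split an address such as '127.0.0.1:9191' into (host, port)."""
    host, _, port = address.rpartition(":")
    if not host or not port.isdigit():
        raise ValueError(f"expected HOST:PORT, got {address!r}")
    return host, int(port)


def build_ssh_command(
    address: str,
    key_path: str = "~/.ssh/kaggle_rsa",
    user: str = "root",
) -> List[str]:
    """Return the ssh argument list for the given local tunnel address."""
    host, port = parse_local_address(address)
    return [
        "ssh",
        "-o", "UserKnownHostsFile=/dev/null",
        "-o", "StrictHostKeyChecking=no",
        # Expand '~' here because no shell does it when passing a list.
        "-i", os.path.expanduser(key_path),
        "-p", str(port),
        f"{user}@{host}",
    ]
```

The returned list can be passed directly to `subprocess.run`, which avoids shell quoting issues.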

### SSH Configuration
#### SSH Configuration

To simplify future connections, add this configuration to your `~/.ssh/config` file:
To simplify future connections, add the following configuration to your `~/.ssh/config` file:

```
Host Kaggle
@@ -104,21 +147,36 @@ Host Kaggle

With this configuration, you can simply use `ssh Kaggle` to connect.

## File Transfer with Rsync
### File Transfer with Rsync

Transfer files between your local machine and Kaggle instance:
Transfer files between your local machine and Kaggle instance using `rsync`:

```bash
# From local to remote
rsync -e "ssh -o UserKnownHostsFile=/dev/null -o StrictHostKeyChecking=no -i ~/.ssh/kaggle_rsa -p 9191" <path_to_local_file> root@127.0.0.1:/kaggle/working
rsync -e "ssh -o UserKnownHostsFile=/dev/null -o StrictHostKeyChecking=no -i ~/.ssh/kaggle_rsa -p 9191" <path_to_local_file> root@127.0.0.1:<remote_destination_path>
# or, if you have your SSH config set up (see above)
rsync -avz <path_to_local_file> Kaggle:<remote_destination_path>

# From remote to local
rsync -e "ssh -o UserKnownHostsFile=/dev/null -o StrictHostKeyChecking=no -i ~/.ssh/kaggle_rsa -p 9191" root@127.0.0.1:<path_to_remote_file> <local_destination_path>
# or, if you have your SSH config set up (see above)
rsync -avz Kaggle:<path_to_remote_file> <local_destination_path>
```

> [!NOTE]
>
> The Starter plan offers only 2 environment connections: one for your local machine and one for the Kaggle instance. Although the script automatically releases the Kaggle instance when you turn off Kaggle, it's best to check [https://api-v1.zrok.io/](https://api-v1.zrok.io/) to make sure your local machine is connected and there are no other active connections before running the script again.
> [!IMPORTANT]
> The Zrok Starter plan limits you to two environment connections. While the script automatically releases the Kaggle instance's connection upon shutdown, it's good practice to check [https://api-v1.zrok.io/](https://api-v1.zrok.io/) before rerunning the script and confirm that your local machine is connected and no other environment is still active.

## Contributing

We welcome contributions to KaggleLink! If you're interested in improving this project, please follow these steps:

1. **Fork the repository**.
2. **Create a new branch** for your feature or bug fix (`git checkout -b feature/your-feature-name` or `bugfix/issue-description`).
3. **Make your changes**, adhering to the existing coding style and standards.
4. **Write and run tests** to ensure your changes work as expected and don't introduce regressions.
5. **Commit your changes** with clear and concise commit messages.
6. **Push your branch** to your forked repository.
7. **Open a Pull Request** to the main branch, providing a detailed description of your changes.

## License

64 changes: 44 additions & 20 deletions setup.sh
@@ -45,50 +45,74 @@ usage() {
echo " -t, --token TOKEN Your zrok token"
echo " -h, --help Display this help message"
echo ""
echo "Environment Variables:"
echo "Environment Variables (fallback when CLI flags not provided):"
echo " KAGGLELINK_KEYS_URL URL to your authorized_keys file"
echo " KAGGLELINK_TOKEN Your zrok token"
echo " BRANCH Override default branch (current: ${KAGGLELINK_BRANCH})"
exit "$exit_code"
}

# Parse command line arguments
# Initialize source tracking variables
AUTH_KEYS_SOURCE=""
ZROK_TOKEN_SOURCE=""

while [[ $# -gt 0 ]]; do
case $1 in
-k | --keys-url)
AUTH_KEYS_URL="$2"
shift 2
;;
-t | --token)
ZROK_TOKEN="$2"
shift 2
;;
-h | --help)
usage 0
;;
*)
echo "Unknown option: $1"
usage
;;
-k | --keys-url)
AUTH_KEYS_URL="$2"
AUTH_KEYS_SOURCE="CLI argument"
shift 2
;;
-t | --token)
ZROK_TOKEN="$2"
ZROK_TOKEN_SOURCE="CLI argument"
shift 2
;;
-h | --help)
usage 0
;;
*)
echo "Unknown option: $1"
usage
;;
esac
done

# Apply environment variable fallback if CLI args not provided
if [ -z "$AUTH_KEYS_URL" ] && [ -n "$KAGGLELINK_KEYS_URL" ]; then
AUTH_KEYS_URL="$KAGGLELINK_KEYS_URL"
AUTH_KEYS_SOURCE="KAGGLELINK_KEYS_URL env var"
fi

if [ -z "$ZROK_TOKEN" ] && [ -n "$KAGGLELINK_TOKEN" ]; then
ZROK_TOKEN="$KAGGLELINK_TOKEN"
ZROK_TOKEN_SOURCE="KAGGLELINK_TOKEN env var"
fi

# Log configuration source for transparency
if [ -n "$AUTH_KEYS_URL" ]; then
echo "ℹ️ Using keys URL from: $AUTH_KEYS_SOURCE"
fi
if [ -n "$ZROK_TOKEN" ]; then
echo "ℹ️ Using token from: $ZROK_TOKEN_SOURCE"
fi

# Check for required parameters
if [ -z "$AUTH_KEYS_URL" ]; then
echo "Error: Public key URL (-k or --keys-url) is required"
usage
echo "Error: Public key URL is required"
echo " Provide via: -k <url> or --keys-url <url>"
echo " Or set: KAGGLELINK_KEYS_URL environment variable"
echo " Run with --help for more information"
exit 1
fi

if [ -z "$ZROK_TOKEN" ]; then
echo "Error: zrok token (-t or --token) is required"
usage
echo "Error: zrok token is required"
echo " Provide via: -t <token> or --token <token>"
echo " Or set: KAGGLELINK_TOKEN environment variable"
echo " Run with --help for more information"
exit 1
fi

# Validate that AUTH_KEYS_URL uses HTTPS (security requirement)