Merged
80 commits
2d4072a
move providers code from lattice to app
deep1401 Nov 21, 2025
f142ecf
first change to create a router for providers with auth
deep1401 Nov 21, 2025
6da196d
changes to make providers.yaml optional and making the router use the…
deep1401 Nov 21, 2025
399ca0b
Merge branch 'fix/add-back-process-envs-in-renderer' of https://githu…
deep1401 Nov 21, 2025
92d63b7
make example load from db first
deep1401 Nov 21, 2025
8410d3a
Merge branch 'main' of https://github.com/transformerlab/transformerl…
deep1401 Nov 21, 2025
d2bd97b
change the logic to import transformerlab.providers
deep1401 Nov 21, 2025
a64cc0f
Merge branch 'main' of https://github.com/transformerlab/transformerl…
deep1401 Nov 21, 2025
2a7a8c6
Add logic for doing all cluster operations - launch, submit, list, ca…
deep1401 Nov 21, 2025
41f0a16
linter
deep1401 Nov 21, 2025
57edfd9
Merge branch 'main' of https://github.com/transformerlab/transformerl…
deep1401 Nov 21, 2025
34ea757
add routes for things providers does for jobs
deep1401 Nov 21, 2025
bc8d699
fix all security except ssh vulnerability
deep1401 Nov 21, 2025
4bc9c1a
fix import
deep1401 Nov 21, 2025
02c954b
replace return formats correctly
deep1401 Nov 21, 2025
cdde01e
ruff
deep1401 Nov 24, 2025
db53b4d
use the tasks sidebar tab for launching with providers
deep1401 Nov 24, 2025
92ee3dc
Make the check status work with providers router
deep1401 Nov 24, 2025
ccfdae8
Merge branch 'main' into add/bridge
deep1401 Nov 24, 2025
918ab21
Make stop button work with providers
deep1401 Nov 24, 2025
2e51b18
Merge branch 'add/bridge' of https://github.com/transformerlab/transf…
deep1401 Nov 24, 2025
5ae13e5
Add support for user-defined env vars
deep1401 Nov 24, 2025
3d2eb57
merge conflict
deep1401 Nov 24, 2025
6fd061c
alembic migration
deep1401 Nov 24, 2025
f90a028
Merge branch 'main' of https://github.com/transformerlab/transformerl…
deep1401 Nov 24, 2025
d631955
format
deep1401 Nov 24, 2025
511bfba
fixes 1
deep1401 Nov 24, 2025
f728b99
fixes 2
deep1401 Nov 24, 2025
6df4c2f
make any script having sdk logic log correctly with fsspec
deep1401 Nov 24, 2025
64056da
Merge branch 'main' of https://github.com/transformerlab/transformerl…
deep1401 Nov 24, 2025
105657a
Merge branch 'main' into add/bridge
dadmobile Nov 25, 2025
3b09c37
Merge branch 'main' of https://github.com/transformerlab/transformerl…
deep1401 Nov 25, 2025
a9c02e4
Merge branch 'main' into add/bridge
dadmobile Nov 25, 2025
bc33bb4
Add Providers list to bottom of Team page
dadmobile Nov 25, 2025
435d4e0
Get providers from API
dadmobile Nov 25, 2025
40b6eda
Render details for providers.
dadmobile Nov 25, 2025
7f9c9fa
Add providers create endpoint
dadmobile Nov 25, 2025
c576bc2
Add modal for creating and editing providers
dadmobile Nov 25, 2025
47b7e80
Add button for creating a new provider
dadmobile Nov 25, 2025
b1766c0
Fix default selected provider type
dadmobile Nov 25, 2025
2e3792d
Fix layout issues with new provider modal.
dadmobile Nov 25, 2025
be4b0e0
Add raw config field for now for creating providers
dadmobile Nov 25, 2025
0ebbd01
Reset fields on successful provider creation
dadmobile Nov 25, 2025
3fe394d
Mutate providers list after update
dadmobile Nov 25, 2025
beae5b2
Merge branch 'main' into add/bridge
dadmobile Nov 26, 2025
bcd5b32
prettier
dadmobile Nov 26, 2025
d754842
Merge branch 'add/bridge' of https://github.com/transformerlab/transf…
dadmobile Nov 26, 2025
14003df
Merge branch 'main' into add/bridge
deep1401 Nov 26, 2025
307917f
Add network icon to providers list
dadmobile Nov 26, 2025
f9cc35a
Add edit and delete buttons to providers
dadmobile Nov 26, 2025
a61e5cb
fix providers list to use new auth
deep1401 Nov 26, 2025
46fbd2d
merge conflict
deep1401 Nov 26, 2025
cf34b12
Add remaining providers API endpoints
dadmobile Nov 26, 2025
55e54a2
show tasks tab only if we have atleast 1 provider
deep1401 Nov 26, 2025
1608af4
Merge branch 'add/bridge' of https://github.com/transformerlab/transf…
deep1401 Nov 26, 2025
2254c6b
tiny change to preserve migration.txt
deep1401 Nov 26, 2025
d44a26b
Add ability to edit existing provider
dadmobile Nov 26, 2025
5af8443
add functionality for file upload and fix bugs
deep1401 Nov 26, 2025
6578791
Merge branch 'add/bridge' of https://github.com/transformerlab/transf…
deep1401 Nov 26, 2025
a8c2731
prettier
deep1401 Nov 26, 2025
cb11a7f
change os to lab.storage in providers/slurm
deep1401 Nov 26, 2025
10332a5
fix skypilot logs return
deep1401 Nov 26, 2025
7f4ff2f
remove print statement
deep1401 Nov 26, 2025
73c75c2
add custom missing host policy
deep1401 Nov 26, 2025
b80e3ef
change to warning policy for checking
deep1401 Nov 26, 2025
7e4ce32
add back custom policy
deep1401 Nov 26, 2025
f702cd9
remove comment
deep1401 Nov 26, 2025
f45eec4
Factor code for updating and creating providers
dadmobile Nov 26, 2025
3ed04eb
add provider form updates
aliasaria Nov 26, 2025
d1dc35d
make delete providers work
deep1401 Nov 26, 2025
a4cc66d
make edit button work
deep1401 Nov 26, 2025
ede16ab
typo
deep1401 Nov 26, 2025
ab234c7
Switch providers to compute providers
deep1401 Nov 26, 2025
93e796a
frontend changes for providers to compute providers
deep1401 Nov 26, 2025
4ce20b7
Switch leftover references
deep1401 Nov 26, 2025
669cfa5
prettier
dadmobile Nov 26, 2025
960581b
put policy under a function
deep1401 Nov 27, 2025
8215df7
suppress huge errors which come up during the initial launch stage
deep1401 Nov 27, 2025
81b52fd
Update providers list on field updates
dadmobile Nov 27, 2025
a337522
Merge branch 'main' into add/bridge
deep1401 Nov 27, 2025
48 changes: 48 additions & 0 deletions api/alembic/versions/63ca6eebc24c_add_team_providers_tables.py
@@ -0,0 +1,48 @@
"""add team_providers_tables

Revision ID: 63ca6eebc24c
Revises: f7661070ec23
Create Date: 2025-11-24 11:35:14.455588

"""
from typing import Sequence, Union

from alembic import op
import sqlalchemy as sa


# revision identifiers, used by Alembic.
revision: str = '63ca6eebc24c'
down_revision: Union[str, Sequence[str], None] = 'f7661070ec23'
branch_labels: Union[str, Sequence[str], None] = None
depends_on: Union[str, Sequence[str], None] = None


def upgrade() -> None:
    """Upgrade schema."""
    # ### commands auto generated by Alembic - please adjust! ###
    op.create_table('team_providers',
    sa.Column('id', sa.String(), nullable=False),
    sa.Column('team_id', sa.String(), nullable=False),
    sa.Column('name', sa.String(), nullable=False),
    sa.Column('type', sa.String(), nullable=False),
    sa.Column('config', sa.JSON(), nullable=False),
    sa.Column('created_by_user_id', sa.String(), nullable=False),
    sa.Column('created_at', sa.DateTime(), server_default=sa.text('(CURRENT_TIMESTAMP)'), nullable=False),
    sa.Column('updated_at', sa.DateTime(), server_default=sa.text('(CURRENT_TIMESTAMP)'), nullable=False),
    sa.PrimaryKeyConstraint('id')
    )
    op.create_index('idx_team_provider_name', 'team_providers', ['team_id', 'name'], unique=False)
    op.create_index(op.f('ix_team_providers_team_id'), 'team_providers', ['team_id'], unique=False)
    op.create_index(op.f('ix_team_providers_type'), 'team_providers', ['type'], unique=False)
    # ### end Alembic commands ###


def downgrade() -> None:
    """Downgrade schema."""
    # ### commands auto generated by Alembic - please adjust! ###
    op.drop_index(op.f('ix_team_providers_type'), table_name='team_providers')
    op.drop_index(op.f('ix_team_providers_team_id'), table_name='team_providers')
    op.drop_index('idx_team_provider_name', table_name='team_providers')
    op.drop_table('team_providers')
    # ### end Alembic commands ###
@@ -0,0 +1,60 @@
"""rename_team_providers_to_compute_providers

Revision ID: be6b6cb9f784
Revises: 63ca6eebc24c
Create Date: 2025-11-26 14:47:16.424026

"""

from typing import Sequence, Union

from alembic import op


# revision identifiers, used by Alembic.
revision: str = "be6b6cb9f784"
down_revision: Union[str, Sequence[str], None] = "63ca6eebc24c"
branch_labels: Union[str, Sequence[str], None] = None
depends_on: Union[str, Sequence[str], None] = None


def upgrade() -> None:
    """Upgrade schema."""
    # Rename the table from team_providers to compute_providers
    op.rename_table("team_providers", "compute_providers")

    # Rename the composite index (drop it under the old name, recreate it under the new one)
    op.drop_index("idx_team_provider_name", table_name="compute_providers")
    op.create_index("idx_compute_provider_name", "compute_providers", ["team_id", "name"], unique=False)

    # Renaming a table does not rename its indexes, so the ix_team_providers_*
    # indexes keep their old names. Drop them if they are still present.
    # Caveat: on backends with transactional DDL (e.g. PostgreSQL) a failed
    # DROP INDEX aborts the surrounding transaction, so swallowing the
    # exception here does not make the step safe there.
    try:
        op.drop_index(op.f("ix_team_providers_team_id"), table_name="compute_providers")
    except Exception:
        pass  # Index might not exist or already dropped
    try:
        op.drop_index(op.f("ix_team_providers_type"), table_name="compute_providers")
    except Exception:
        pass  # Index might not exist or already dropped

    # Create the single-column indexes under names that match the renamed table
    op.create_index(op.f("ix_compute_providers_team_id"), "compute_providers", ["team_id"], unique=False)
    op.create_index(op.f("ix_compute_providers_type"), "compute_providers", ["type"], unique=False)


def downgrade() -> None:
    """Downgrade schema."""
    # Drop new indexes
    op.drop_index(op.f("ix_compute_providers_type"), table_name="compute_providers")
    op.drop_index(op.f("ix_compute_providers_team_id"), table_name="compute_providers")
    op.drop_index("idx_compute_provider_name", table_name="compute_providers")

    # Rename the table back before recreating the old indexes
    op.rename_table("compute_providers", "team_providers")

    # Recreate old indexes on the renamed table
    op.create_index("idx_team_provider_name", "team_providers", ["team_id", "name"], unique=False)
    op.create_index(op.f("ix_team_providers_team_id"), "team_providers", ["team_id"], unique=False)
    op.create_index(op.f("ix_team_providers_type"), "team_providers", ["type"], unique=False)
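
Review note on the rename migration: it drops and recreates every index because renaming a table leaves the index names untouched. The sketch below illustrates that behaviour with the standard-library `sqlite3` module rather than Alembic. The reduced schema and the `index_names` helper are assumptions made for illustration, and other database backends may differ in detail.

```python
import sqlite3


def index_names(conn: sqlite3.Connection, table: str) -> list:
    """Return the names of all indexes attached to `table`, sorted (hypothetical helper)."""
    rows = conn.execute(
        "SELECT name FROM sqlite_master "
        "WHERE type = 'index' AND tbl_name = ? ORDER BY name",
        (table,),
    ).fetchall()
    return [row[0] for row in rows]


conn = sqlite3.connect(":memory:")

# Reduced version of the team_providers table (assumption: only the indexed columns matter here).
conn.execute(
    "CREATE TABLE team_providers (id TEXT NOT NULL, team_id TEXT NOT NULL, type TEXT NOT NULL)"
)
conn.execute("CREATE INDEX ix_team_providers_team_id ON team_providers (team_id)")
conn.execute("CREATE INDEX ix_team_providers_type ON team_providers (type)")

# Step 1: rename the table. The indexes follow the table but keep their old names.
conn.execute("ALTER TABLE team_providers RENAME TO compute_providers")
after_rename = index_names(conn, "compute_providers")

# Step 2: drop the old names and recreate the indexes under the new naming pattern.
conn.execute("DROP INDEX ix_team_providers_team_id")
conn.execute("DROP INDEX ix_team_providers_type")
conn.execute("CREATE INDEX ix_compute_providers_team_id ON compute_providers (team_id)")
conn.execute("CREATE INDEX ix_compute_providers_type ON compute_providers (type)")
after_migration = index_names(conn, "compute_providers")

print(after_rename)
print(after_migration)
```

Since the old indexes are expected to exist right after the rename, an unconditional drop would probably be enough in the upgrade path; the guarded variant only matters if the database has drifted from the previous revision.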
2 changes: 2 additions & 0 deletions api/api.py
@@ -60,6 +60,7 @@
    batched_prompts,
    recipes,
    teams,
    compute_provider,
    auth,
)
from transformerlab.routers.auth import get_user_and_team # noqa: E402
@@ -236,6 +237,7 @@ async def validation_exception_handler(request, exc):
app.include_router(batched_prompts.router, dependencies=[Depends(get_user_and_team)])
app.include_router(fastchat_openai_api.router, dependencies=[Depends(get_user_and_team)])
app.include_router(teams.router, dependencies=[Depends(get_user_and_team)])
app.include_router(compute_provider.router)
app.include_router(auth.router)

controller_process = None
13 changes: 13 additions & 0 deletions api/transformerlab/compute_providers/__init__.py
@@ -0,0 +1,13 @@
"""Compute provider bridge system for abstracting GPU orchestration providers."""

from .base import ComputeProvider
from .router import ComputeProviderRouter, get_provider
from .config import load_compute_providers_config, ComputeProviderConfig

__all__ = [
    "ComputeProvider",
    "ComputeProviderRouter",
    "get_provider",
    "load_compute_providers_config",
    "ComputeProviderConfig",
]
131 changes: 131 additions & 0 deletions api/transformerlab/compute_providers/base.py
@@ -0,0 +1,131 @@
"""Abstract base class for provider implementations."""

from abc import ABC, abstractmethod
from typing import Dict, List, Any, Optional, Union
from .models import (
    ClusterConfig,
    JobConfig,
    ClusterStatus,
    JobInfo,
    ResourceInfo,
)


class ComputeProvider(ABC):
    """Abstract base class for all compute provider implementations."""

    @abstractmethod
    def launch_cluster(self, cluster_name: str, config: ClusterConfig) -> Dict[str, Any]:
        """
        Launch/provision a new cluster.

        Args:
            cluster_name: Name of the cluster to launch
            config: Cluster configuration

        Returns:
            Dictionary with launch result (e.g., request_id, cluster_name)
        """
        raise NotImplementedError

    @abstractmethod
    def stop_cluster(self, cluster_name: str) -> Dict[str, Any]:
        """
        Stop a running cluster (but don't tear it down).

        Args:
            cluster_name: Name of the cluster to stop

        Returns:
            Dictionary with stop result
        """
        raise NotImplementedError

    @abstractmethod
    def get_cluster_status(self, cluster_name: str) -> ClusterStatus:
        """
        Get the status of a cluster.

        Args:
            cluster_name: Name of the cluster

        Returns:
            ClusterStatus object with cluster information
        """
        raise NotImplementedError

    @abstractmethod
    def get_cluster_resources(self, cluster_name: str) -> ResourceInfo:
        """
        Get resource information for a cluster (GPUs, CPUs, memory, etc.).

        Args:
            cluster_name: Name of the cluster

        Returns:
            ResourceInfo object with resource details
        """
        raise NotImplementedError

    @abstractmethod
    def submit_job(self, cluster_name: str, job_config: JobConfig) -> Dict[str, Any]:
        """
        Submit a job to an existing cluster.

        Args:
            cluster_name: Name of the cluster
            job_config: Job configuration

        Returns:
            Dictionary with job submission result (e.g., job_id)
        """
        raise NotImplementedError

    @abstractmethod
    def get_job_logs(
        self,
        cluster_name: str,
        job_id: Union[str, int],
        tail_lines: Optional[int] = None,
        follow: bool = False,
    ) -> Union[str, Any]:
        """
        Get logs for a job.

        Args:
            cluster_name: Name of the cluster
            job_id: Job identifier
            tail_lines: Number of lines to retrieve from the end (None for all)
            follow: Whether to stream/follow logs (returns stream if True)

        Returns:
            Log content as string, or stream object if follow=True
        """
        raise NotImplementedError

    @abstractmethod
    def cancel_job(self, cluster_name: str, job_id: Union[str, int]) -> Dict[str, Any]:
        """
        Cancel a running or queued job.

        Args:
            cluster_name: Name of the cluster
            job_id: Job identifier

        Returns:
            Dictionary with cancellation result
        """
        raise NotImplementedError

    @abstractmethod
    def list_jobs(self, cluster_name: str) -> List[JobInfo]:
        """
        List all jobs for a cluster.

        Args:
            cluster_name: Name of the cluster

        Returns:
            List of JobInfo objects
        """
        raise NotImplementedError
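
Review note on the interface above: a minimal sketch of how a concrete provider might implement it, and of what `@abstractmethod` enforces. To stay self-contained, the block redefines a trimmed `ComputeProvider` with only the three job methods. The `JobConfig` and `JobInfo` dataclasses are hypothetical stand-ins (the real models in `.models` are not shown in this diff, so their fields are assumptions), and `InMemoryProvider` and `IncompleteProvider` are illustrative names, not part of the PR.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Any, Dict, List, Union


@dataclass
class JobConfig:
    """Hypothetical stand-in for models.JobConfig; the field is an assumption."""
    command: str


@dataclass
class JobInfo:
    """Hypothetical stand-in for models.JobInfo; the fields are assumptions."""
    job_id: int
    command: str
    state: str


class ComputeProvider(ABC):
    """Trimmed copy of the interface: job-related methods only."""

    @abstractmethod
    def submit_job(self, cluster_name: str, job_config: JobConfig) -> Dict[str, Any]:
        raise NotImplementedError

    @abstractmethod
    def cancel_job(self, cluster_name: str, job_id: Union[str, int]) -> Dict[str, Any]:
        raise NotImplementedError

    @abstractmethod
    def list_jobs(self, cluster_name: str) -> List[JobInfo]:
        raise NotImplementedError


class InMemoryProvider(ComputeProvider):
    """Illustrative provider that keeps jobs in a dict, e.g. for unit tests."""

    def __init__(self) -> None:
        self._jobs: Dict[str, List[JobInfo]] = {}

    def submit_job(self, cluster_name: str, job_config: JobConfig) -> Dict[str, Any]:
        jobs = self._jobs.setdefault(cluster_name, [])
        info = JobInfo(job_id=len(jobs) + 1, command=job_config.command, state="QUEUED")
        jobs.append(info)
        return {"job_id": info.job_id}

    def cancel_job(self, cluster_name: str, job_id: Union[str, int]) -> Dict[str, Any]:
        # job_id may arrive as str or int, so compare on the string form.
        for info in self._jobs.get(cluster_name, []):
            if str(info.job_id) == str(job_id):
                info.state = "CANCELLED"
                return {"job_id": info.job_id, "cancelled": True}
        return {"job_id": job_id, "cancelled": False}

    def list_jobs(self, cluster_name: str) -> List[JobInfo]:
        return list(self._jobs.get(cluster_name, []))


class IncompleteProvider(ComputeProvider):
    """Overrides only one abstract method, so it cannot be instantiated."""

    def submit_job(self, cluster_name: str, job_config: JobConfig) -> Dict[str, Any]:
        return {}


provider = InMemoryProvider()
submitted = provider.submit_job("demo-cluster", JobConfig(command="echo hi"))
cancelled = provider.cancel_job("demo-cluster", str(submitted["job_id"]))
```

Instantiating `IncompleteProvider()` raises `TypeError` because two abstract methods are still missing. That check happens at construction time, so the `raise NotImplementedError` bodies in the base class only run if a subclass explicitly calls `super()`.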