VoidRunner Development Guidelines

This document defines code style guidelines, review criteria, project-specific rules, and preferred patterns for the VoidRunner distributed task execution platform.

Project Overview

VoidRunner is a distributed task execution platform designed for secure, scalable code execution. The project follows an incremental development approach through well-defined Epic milestones.

Current Implementation Status (Epic 1-2 ✅ Complete)

Backend: Go + Gin framework + PostgreSQL (pgx driver)
API: RESTful API with JWT authentication and comprehensive task management
Database: PostgreSQL with optimized schema and cursor pagination
Container Execution: Docker executor with comprehensive security controls
Queue System: Redis-based task queuing with retry logic and dead letter handling
Worker Management: Embedded worker pool with concurrency controls and health monitoring
Testing: 80%+ code coverage with unit and integration tests
Documentation: OpenAPI/Swagger specs with comprehensive examples

Planned Architecture (Epic 3-4 📋 Roadmap)

Distributed Services: Separate API and worker services for horizontal scaling
Frontend: Svelte + SvelteKit + TypeScript web interface
Infrastructure: Kubernetes (GKE) deployment with microservices
Log Streaming: Real-time log collection and streaming
Monitoring: Real-time metrics, logging, and alerting systems

Go Code Standards

Project Structure

voidrunner/
├── cmd/                    # Application entrypoints
│   ├── api/               # ✅ API server main (implemented)
│   ├── migrate/           # ✅ Database migration tool (implemented)
│   └── scheduler/         # ✅ Scheduler service main (implemented - for future distributed mode)
├── internal/              # Private application code
│   ├── api/              # ✅ API handlers and routes (implemented)
│   ├── auth/             # ✅ Authentication logic (implemented)
│   ├── config/           # ✅ Configuration management (implemented)
│   ├── database/         # ✅ Database layer (implemented)
│   ├── models/           # ✅ Data models (implemented)
│   ├── services/         # ✅ Business logic services (implemented)
│   ├── executor/         # ✅ Task execution engine (implemented)
│   ├── queue/            # ✅ Redis queue integration (implemented)
│   └── worker/           # ✅ Worker management (implemented)
├── pkg/                   # Public libraries
│   ├── logger/           # ✅ Structured logging (implemented)
│   ├── utils/            # ✅ Shared utilities (implemented)
│   └── metrics/          # 📋 Prometheus metrics (planned - Epic 4)
├── api/                   # ✅ API specifications (OpenAPI) (implemented)
├── migrations/           # ✅ Database migrations (implemented)
├── tests/                # ✅ Integration tests (implemented)
├── scripts/              # ✅ Build and deployment scripts (implemented)
├── docs/                 # ✅ Documentation (implemented)
├── deployments/          # 📋 Kubernetes manifests (planned - Epic 3)
└── frontend/             # 📋 Svelte web interface (planned - Epic 3)

Epic Development Roadmap

Epic 1: Core API Infrastructure ✅ Complete

JWT authentication system
Task management CRUD operations
PostgreSQL database with pgx
Comprehensive testing suite
OpenAPI documentation

Epic 2: Container Execution Engine ✅ Complete

Docker client integration with security controls
Task execution workflow and state management
Embedded worker pool with concurrency management
Redis-based queue system with retry logic
Health monitoring and cleanup mechanisms

Epic 3: Frontend Interface 📋 Planned

Svelte project setup and architecture
Authentication UI and user management
Task creation and management interface
Real-time task status updates

Epic 4: Advanced Features 📋 Planned

Distributed services architecture (Issue #46)
Real-time log collection and streaming (Issue #11)
Enhanced error handling mechanisms (Issue #12)
Collaborative features and sharing
Advanced search and filtering
Real-time dashboard and system metrics
Advanced notifications and alerting

GitHub Issues Progress Tracking

Epic 1: Core API Infrastructure ✅ Complete

Issue #3: PostgreSQL Database Setup and Schema Design ✅ Closed
Issue #4: JWT Authentication System Implementation ✅ Closed
Issue #5: Task Management API Endpoints ✅ Closed
Issue #6: API Documentation and Testing Infrastructure ✅ Closed

Epic 2: Container Execution Engine ✅ Complete

Issue #9: Docker Client Integration and Security Configuration ✅ Closed
Issue #10: Task Execution Workflow and State Management ✅ Closed

Epic 2 Enhancements (Non-blocking improvements)

Issue #11: Log Collection and Real-time Streaming 📋 Open (Priority 1)
Issue #12: Error Handling and Cleanup Mechanisms 📋 Open (Priority 2)

Note: Issues #11-12 are enhancements to the completed Epic 2 functionality, not blockers. The core container execution engine with embedded workers is fully operational.

Epic 3: Frontend Interface 📋 Ready to Start

Issue #22: Frontend Interface 📋 Open
Issue #23: Svelte Project Setup and Architecture 📋 Open (Priority 0)
Issue #24: Authentication UI and User Management 📋 Open (Priority 0)
Issue #25: Task Creation and Management Interface 📋 Open (Priority 0)
Issue #26: Real-time Task Status Updates 📋 Open (Priority 0)
Issue #27: Real-time Features 📋 Open

Epic 4: Advanced Features 📋 Future Work

Issue #28: Real-time Dashboard and System Metrics 📋 Open (Priority 0)
Issue #29: Advanced Notifications and Alerting 📋 Open (Priority 1)
Issue #30: Advanced Search and Filtering 📋 Open (Priority 1)
Issue #31: Collaborative Features and Sharing 📋 Open (Priority 2)

Future Enhancements

Issue #46: Separate API and Worker Services for Horizontal Scaling 📋 Open
- This issue tracks the transition from embedded workers to distributed services
- Currently tracked for future implementation when scaling requirements emerge

Current Focus

With Epic 1-2 complete, the project has a fully functional task execution platform with embedded workers. The next logical step is Epic 3 (Frontend Interface) to provide a web-based user interface, followed by Epic 4 advanced features and eventual transition to distributed services (Issue #46).

Coding Standards

1. Naming Conventions

Packages: lowercase, single words when possible (auth, database, executor)
Functions: PascalCase for exported, camelCase for unexported
Constants: MixedCaps for package-level constants, following Go convention (MaxCodeSize rather than MAX_CODE_SIZE)
Interfaces: Add "er" suffix (TaskExecutor, LogStreamer)
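
As a minimal sketch of how these conventions look together, the file below uses made-up identifiers (`Runner`, `echoRunner`, `formatJob`); none of them are taken from the VoidRunner codebase.

```go
package main

import "fmt"

// Runner is a small interface named with the "er" suffix.
type Runner interface {
	Run(name string) string
}

// echoRunner starts with a lower-case letter, so it is private to the package.
type echoRunner struct{}

// Run starts with an upper-case letter, so it is exported.
func (echoRunner) Run(name string) string {
	return formatJob(name)
}

// formatJob is a private helper: lower-case first letter, camelCase after that.
func formatJob(name string) string {
	return "job:" + name
}

func main() {
	var r Runner = echoRunner{}
	fmt.Println(r.Run("build"))
}
```

In Go, visibility is decided purely by the case of the first letter, so the naming rule doubles as the access-control rule.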

2. Error Handling

// PREFERRED: Structured error handling with context
func (s *TaskService) CreateTask(ctx context.Context, req CreateTaskRequest) (*Task, error) {
    if err := s.validateTaskRequest(req); err != nil {
        return nil, fmt.Errorf("validation failed: %w", err)
    }

    task, err := s.repo.CreateTask(ctx, req)
    if err != nil {
        return nil, fmt.Errorf("failed to create task: %w", err)
    }

    return task, nil
}

// AVOID: Generic error messages without context
func (s *TaskService) CreateTask(req CreateTaskRequest) (*Task, error) {
    task, err := s.repo.CreateTask(req)
    if err != nil {
        return nil, err // Too generic
    }
    return task, nil
}
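
One reason to wrap with `%w` rather than return the bare error is that callers can still match the underlying cause after context has been added. The sketch below illustrates this with `errors.Is`; the definition of `ErrTaskNameRequired` and the helper names are assumptions made for the example.

```go
package main

import (
	"errors"
	"fmt"
)

// ErrTaskNameRequired is a sentinel error; its definition here is hypothetical.
var ErrTaskNameRequired = errors.New("task name is required")

// validate returns the sentinel error for an empty name.
func validate(name string) error {
	if name == "" {
		return ErrTaskNameRequired
	}
	return nil
}

// createTask wraps the validation error with %w, adding context while
// keeping the original error in the chain.
func createTask(name string) error {
	if err := validate(name); err != nil {
		return fmt.Errorf("validation failed: %w", err)
	}
	return nil
}

// isNameRequired reports whether err wraps ErrTaskNameRequired.
func isNameRequired(err error) bool {
	return errors.Is(err, ErrTaskNameRequired)
}

func main() {
	err := createTask("")
	fmt.Println(err)                 // validation failed: task name is required
	fmt.Println(isNameRequired(err)) // true
}
```

Had `createTask` used `%v` instead of `%w`, the message would look the same but `errors.Is` would no longer find the sentinel.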

3. Database Interactions

// PREFERRED: Use pgx with parameterized queries and proper error handling
func (r *TaskRepository) GetTaskByID(ctx context.Context, taskID string) (*Task, error) {
    query := `
        SELECT id, name, description, status, created_at, updated_at
        FROM tasks
        WHERE id = $1 AND deleted_at IS NULL
    `

    var task Task
    err := r.pool.QueryRow(ctx, query, taskID).Scan(
        &task.ID, &task.Name, &task.Description,
        &task.Status, &task.CreatedAt, &task.UpdatedAt,
    )

    if err != nil {
        if errors.Is(err, pgx.ErrNoRows) {
            return nil, ErrTaskNotFound
        }
        return nil, fmt.Errorf("failed to get task %s: %w", taskID, err)
    }

    return &task, nil
}
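
Returning a sentinel such as `ErrTaskNotFound` lets the API layer translate repository errors into HTTP status codes without inspecting error strings. The following is a sketch of that mapping using only the standard library; `statusFor`, `errOther`, and the sentinel's definition are assumptions, not code from the project.

```go
package main

import (
	"errors"
	"fmt"
	"net/http"
)

// ErrTaskNotFound mirrors the sentinel returned by the repository above;
// its definition here is an assumption.
var ErrTaskNotFound = errors.New("task not found")

// errOther stands in for any unexpected failure (for example a lost connection).
var errOther = errors.New("connection reset")

// statusFor maps a repository error to an HTTP status code.
func statusFor(err error) int {
	switch {
	case err == nil:
		return http.StatusOK
	case errors.Is(err, ErrTaskNotFound):
		return http.StatusNotFound
	default:
		return http.StatusInternalServerError
	}
}

func main() {
	wrapped := fmt.Errorf("failed to get task %s: %w", "42", ErrTaskNotFound)
	fmt.Println(statusFor(nil), statusFor(wrapped), statusFor(errOther))
}
```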

4. Dependency Injection

// PREFERRED: Constructor pattern with interfaces
type TaskService struct {
    repo     TaskRepository
    executor TaskExecutor
    logger   *slog.Logger
    metrics  *prometheus.Registry
}

func NewTaskService(
    repo TaskRepository,
    executor TaskExecutor,
    logger *slog.Logger,
    metrics *prometheus.Registry,
) *TaskService {
    return &TaskService{
        repo:     repo,
        executor: executor,
        logger:   logger,
        metrics:  metrics,
    }
}
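
The payoff of constructor injection with interfaces is that a test can substitute an in-memory fake for the real dependency. The sketch below shows the idea with a deliberately trimmed-down interface; `TaskStore`, `Service`, and `fakeStore` are hypothetical names, not the project's actual types.

```go
package main

import (
	"context"
	"fmt"
)

// TaskStore is a hypothetical, trimmed-down repository interface.
type TaskStore interface {
	Name(ctx context.Context, id string) (string, error)
}

// Service depends on the interface, not on a concrete database type.
type Service struct {
	store TaskStore
}

// NewService injects the dependency through the constructor.
func NewService(store TaskStore) *Service {
	return &Service{store: store}
}

// Describe looks up a task name and formats it.
func (s *Service) Describe(ctx context.Context, id string) (string, error) {
	name, err := s.store.Name(ctx, id)
	if err != nil {
		return "", fmt.Errorf("failed to describe task %s: %w", id, err)
	}
	return "task " + id + ": " + name, nil
}

// fakeStore is an in-memory stand-in that satisfies TaskStore.
type fakeStore map[string]string

func (f fakeStore) Name(_ context.Context, id string) (string, error) {
	name, ok := f[id]
	if !ok {
		return "", fmt.Errorf("task %s not found", id)
	}
	return name, nil
}

// describeWithFake wires the service to the fake and runs one lookup.
func describeWithFake(id string) (string, error) {
	svc := NewService(fakeStore{"1": "build"})
	return svc.Describe(context.Background(), id)
}

func main() {
	fmt.Println(describeWithFake("1"))
}
```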

5. Context Usage

// PREFERRED: Always pass context as first parameter
func (s *TaskService) ExecuteTask(ctx context.Context, taskID string) error {
    // Check context cancellation
    select {
    case <-ctx.Done():
        return ctx.Err()
    default:
    }

    // Use context in downstream calls
    task, err := s.repo.GetTaskByID(ctx, taskID)
    if err != nil {
        return fmt.Errorf("failed to load task %s: %w", taskID, err)
    }

    return s.executor.Execute(ctx, task)
}
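
Passing the context down the call chain is what allows a caller to bound the whole operation with a deadline. As a sketch under assumed names (`slowStep`, `runWithTimeout`, and `timedOut` are illustrative), the caller side could look like this:

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// slowStep simulates work that takes d but stops early if ctx is cancelled.
func slowStep(ctx context.Context, d time.Duration) error {
	select {
	case <-time.After(d):
		return nil
	case <-ctx.Done():
		return ctx.Err()
	}
}

// runWithTimeout bounds slowStep with a deadline derived from the parent context.
func runWithTimeout(parent context.Context, work, limit time.Duration) error {
	ctx, cancel := context.WithTimeout(parent, limit)
	defer cancel() // always release the timer, even on the success path
	return slowStep(ctx, work)
}

// timedOut reports whether work of workMs milliseconds exceeds a limit of limitMs.
func timedOut(workMs, limitMs int) bool {
	err := runWithTimeout(context.Background(),
		time.Duration(workMs)*time.Millisecond,
		time.Duration(limitMs)*time.Millisecond)
	return errors.Is(err, context.DeadlineExceeded)
}

func main() {
	fmt.Println(timedOut(200, 20)) // work is slower than the limit
	fmt.Println(timedOut(1, 500))  // work finishes well within the limit
}
```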

Security Guidelines

1. Container Execution Security

// REQUIRED: All container executions must use security constraints
func (e *DockerExecutor) Execute(ctx context.Context, task *Task) error {
    containerConfig := &container.Config{
        Image:      e.getExecutorImage(task.Language),
        User:       "1000:1000", // REQUIRED: Non-root execution
        WorkingDir: "/tmp/workspace",
        Env:        e.sanitizeEnvironment(task.Environment),
    }

    hostConfig := &container.HostConfig{
        Resources: container.Resources{
            Memory:    task.MemoryLimit,
            CPUQuota:  task.CPUQuota,
            PidsLimit: ptr(int64(128)), // REQUIRED: Limit processes
        },
        SecurityOpt: []string{
            "no-new-privileges",
            "seccomp=/opt/voidrunner/seccomp-profile.json",
        },
        NetworkMode:    "none", // REQUIRED: No network access
        ReadonlyRootfs: true,   // REQUIRED: Read-only filesystem
        AutoRemove:     true,   // REQUIRED: Automatic cleanup
    }

    return e.executeWithTimeout(ctx, containerConfig, hostConfig, task.Timeout)
}
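
The snippet above calls a `ptr` helper that it does not define. One plausible definition, assuming Go 1.18 or later for generics, is sketched below; the `limits` struct is a hypothetical stand-in for a configuration type with optional pointer fields.

```go
package main

import "fmt"

// ptr returns a pointer to a copy of v, which is convenient for optional
// fields that distinguish "unset" (nil) from a zero value.
func ptr[T any](v T) *T {
	return &v
}

// limits is a hypothetical struct with an optional field.
type limits struct {
	PidsLimit *int64 // nil means "no limit configured"
}

func main() {
	l := limits{PidsLimit: ptr(int64(128))}
	fmt.Println(*l.PidsLimit)
}
```

Because `ptr` takes its argument by value, the returned pointer never aliases the caller's variable.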

2. Input Validation

// REQUIRED: Validate all user inputs
func validateTaskRequest(req CreateTaskRequest) error {
    if strings.TrimSpace(req.Name) == "" {
        return ErrTaskNameRequired
    }

    if len(req.Name) > 255 {
        return ErrTaskNameTooLong
    }

    if !isValidLanguage(req.Language) {
        return ErrUnsupportedLanguage
    }

    if len(req.Code) > MaxCodeSize {
        return ErrCodeTooLarge
    }

    // Reject code that matches known-dangerous patterns
    if containsDangerousPatterns(req.Code) {
        return ErrDangerousCodePattern
    }

    return nil
}
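
The validation function relies on helpers and sentinel errors that are not shown. The sketch below fills in two of them under stated assumptions: the allow-list of languages and the value of `MaxCodeSize` are invented for illustration and may differ from the real configuration.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// MaxCodeSize is an assumed limit of 64 KiB.
const MaxCodeSize = 64 * 1024

var (
	ErrUnsupportedLanguage = errors.New("unsupported language")
	ErrCodeTooLarge        = errors.New("code exceeds maximum size")
)

// supportedLanguages is an assumed allow-list.
var supportedLanguages = map[string]struct{}{
	"python": {}, "javascript": {}, "go": {}, "bash": {},
}

// isValidLanguage checks the allow-list, ignoring case and surrounding spaces.
func isValidLanguage(lang string) bool {
	_, ok := supportedLanguages[strings.ToLower(strings.TrimSpace(lang))]
	return ok
}

// validateCode applies the language and size checks from the snippet above.
func validateCode(lang, code string) error {
	if !isValidLanguage(lang) {
		return ErrUnsupportedLanguage
	}
	if len(code) > MaxCodeSize {
		return ErrCodeTooLarge
	}
	return nil
}

func main() {
	fmt.Println(validateCode("python", "print('hello')"))
	fmt.Println(validateCode("cobol", "DISPLAY 'HI'."))
}
```

An allow-list is used rather than a deny-list so that an unknown language is rejected by default.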

Logging Standards

1. Structured Logging with slog

// PREFERRED: Use structured logging with context
func (s *TaskService) CreateTask(ctx context.Context, req CreateTaskRequest) (*Task, error) {
    logger := s.logger.With(
        "operation", "create_task",
        "user_id", getUserID(ctx),
        "task_name", req.Name,
    )

    logger.Info("creating new task")

    task, err := s.repo.CreateTask(ctx, req)
    if err != nil {
        logger.Error("failed to create task", "error", err)
        return nil, fmt.Errorf("failed to create task: %w", err)
    }

    logger.Info("task created successfully", "task_id", task.ID)
    return task, nil
}

2. Log Levels

DEBUG: Detailed flow information for troubleshooting
INFO: General operational information
WARN: Something unexpected happened but system continues
ERROR: Error condition that needs attention
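
To make the effect of the configured level concrete, the sketch below writes one record at each level through a `slog` text handler and counts how many pass the filter. It assumes Go 1.21 or later; `recordsAt` and the log messages are illustrative only.

```go
package main

import (
	"bytes"
	"fmt"
	"log/slog"
	"strings"
)

// recordsAt logs one record per level through a handler whose minimum level
// is parsed from levelName (for example "debug" or "info") and returns how
// many records were written. It returns -1 if the level name is invalid.
func recordsAt(levelName string) int {
	var minLevel slog.Level
	if err := minLevel.UnmarshalText([]byte(levelName)); err != nil {
		return -1
	}

	var buf bytes.Buffer
	logger := slog.New(slog.NewTextHandler(&buf, &slog.HandlerOptions{Level: minLevel}))

	logger.Debug("cache lookup", "key", "task:123")
	logger.Info("task created", "task_id", "123")
	logger.Warn("retrying after timeout", "attempt", 2)
	logger.Error("task failed", "error", "exit status 1")

	// The text handler writes exactly one line per record.
	return strings.Count(buf.String(), "\n")
}

func main() {
	for _, name := range []string{"debug", "info", "warn", "error"} {
		fmt.Println(name, recordsAt(name))
	}
}
```

With the level set to `info`, the DEBUG record is dropped and the other three are kept.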

Testing Standards

1. Test File Organization

// File: internal/api/task_handler_test.go
package api

import (
    "context"
    "testing"

    "github.com/stretchr/testify/assert"
    "github.com/stretchr/testify/mock"
    "github.com/stretchr/testify/require"
)

func TestTaskHandler_CreateTask(t *testing.T) {
    tests := []struct {
        name           string
        request        CreateTaskRequest
        mockSetup      func(*MockTaskService)
        expectedStatus int
        expectedError  string
    }{
        {
            name: "successful task creation",
            request: CreateTaskRequest{
                Name:        "test-task",
                Language:    "python",
                Code:        "print('hello')",
            },
            mockSetup: func(m *MockTaskService) {
                m.On("CreateTask", mock.Anything, mock.Anything).
                    Return(&Task{ID: "123"}, nil)
            },
            expectedStatus: 201,
        },
        // More test cases...
    }

    for _, tt := range tests {
        t.Run(tt.name, func(t *testing.T) {
            // Test implementation
        })
    }
}

2. Test Classification Guidelines

Unit Tests (Located in internal/package/*_test.go)

Test individual functions and methods in isolation
Mock external dependencies (databases, Redis, HTTP clients)
Test validation logic, business rules, and error handling
Should run fast (< 1 second total)
No external service dependencies

// UNIT TEST: Tests validation logic only
func TestLogConfigValidation(t *testing.T) {
    invalidConfig := &LogConfig{
        BufferSize: -1, // Invalid
    }
    
    err := invalidConfig.Validate()
    assert.Error(t, err)
    assert.Contains(t, err.Error(), "buffer_size must be positive")
}

Integration Tests (Located in tests/integration/*_test.go)

Test interactions between multiple components
Test with real external dependencies (PostgreSQL, Redis, Docker)
Test system behavior under failure conditions
Use build tag //go:build integration
Package declaration: package integration_test

//go:build integration

package integration_test

// INTEGRATION TEST: Tests Redis dependency interaction
func TestLoggingServiceDependencies(t *testing.T) {
    service, err := logging.NewRedisStreamingService(nil, config, logger)
    
    assert.Nil(t, service)
    assert.Error(t, err)
    assert.Contains(t, err.Error(), "redis client is required")
}

Test Organization Rules

Unit tests stay co-located with the package they test
Integration tests go in tests/integration/
Use descriptive test names: TestComponentName_Functionality
Group related tests in the same file
Use build tags to separate unit from integration tests

3. Integration Tests

// REQUIRED: Integration tests for critical paths
func TestTaskExecution_Integration(t *testing.T) {
    if testing.Short() {
        t.Skip("skipping integration test")
    }

    // Setup test database
    db := setupTestDB(t)
    defer cleanupTestDB(t, db)

    // Setup test containers
    executor := setupTestExecutor(t)
    defer cleanupTestExecutor(t, executor)

    // Test execution flow; the constructor also takes a metrics registry
    service := NewTaskService(db, executor, logger, prometheus.NewRegistry())

    task, err := service.CreateTask(context.Background(), CreateTaskRequest{
        Name:     "integration-test",
        Language: "python",
        Code:     "print('integration test')",
    })
    require.NoError(t, err)

    err = service.ExecuteTask(context.Background(), task.ID)
    require.NoError(t, err)

    // Verify execution results
    result, err := service.GetTaskResult(context.Background(), task.ID)
    require.NoError(t, err)
    assert.Equal(t, "completed", result.Status)
}

Kubernetes and Infrastructure Standards

1. Resource Specifications

# REQUIRED: All deployments must specify resource limits
apiVersion: apps/v1
kind: Deployment
metadata:
  name: voidrunner-api
spec:
  template:
    spec:
      containers:
        - name: api
          image: voidrunner/api:latest
          resources:
            requests:
              memory: "256Mi"
              cpu: "100m"
            limits:
              memory: "1Gi"
              cpu: "500m"
          # REQUIRED: Security context
          securityContext:
            allowPrivilegeEscalation: false
            runAsNonRoot: true
            runAsUser: 1000
            readOnlyRootFilesystem: true

2. Health Checks

# REQUIRED: All services must have health checks
livenessProbe:
  httpGet:
    path: /health
    port: 8080
  initialDelaySeconds: 30
  periodSeconds: 10
  timeoutSeconds: 5
  failureThreshold: 3

readinessProbe:
  httpGet:
    path: /ready
    port: 8080
  initialDelaySeconds: 5
  periodSeconds: 5
  timeoutSeconds: 3
  failureThreshold: 2
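
On the service side, the two probes need matching endpoints. The sketch below uses the standard library's `net/http` rather than Gin so that it stays self-contained. How readiness is determined (the `ready` flag) is an assumption of the example.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"sync/atomic"
)

// ready flips to true once dependencies such as the database and Redis are
// reachable. How that is decided is outside the scope of this sketch.
var ready atomic.Bool

// newHealthMux registers the liveness and readiness endpoints.
func newHealthMux() *http.ServeMux {
	mux := http.NewServeMux()
	// Liveness: the process is up and able to serve requests.
	mux.HandleFunc("/health", func(w http.ResponseWriter, _ *http.Request) {
		w.WriteHeader(http.StatusOK)
	})
	// Readiness: report 200 only once dependencies are available.
	mux.HandleFunc("/ready", func(w http.ResponseWriter, _ *http.Request) {
		if !ready.Load() {
			w.WriteHeader(http.StatusServiceUnavailable)
			return
		}
		w.WriteHeader(http.StatusOK)
	})
	return mux
}

// probe issues an in-memory GET against the mux and returns the status code.
func probe(path string) int {
	rec := httptest.NewRecorder()
	newHealthMux().ServeHTTP(rec, httptest.NewRequest(http.MethodGet, path, nil))
	return rec.Code
}

func main() {
	fmt.Println(probe("/health"), probe("/ready"))
	ready.Store(true)
	fmt.Println(probe("/ready"))
}
```

Keeping liveness independent of downstream dependencies avoids restart loops when the database is briefly unavailable; only readiness should reflect dependency state.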

Code Review Criteria

Mandatory Checks

Security: No hardcoded secrets, proper input validation
Error Handling: All errors properly wrapped with context
Testing: Unit tests for new functionality, integration tests for critical paths
Performance: Database queries optimized, no N+1 problems
Logging: Structured logging with appropriate levels
Documentation: Public functions and complex logic documented

Performance Requirements

API response times: < 200ms for 95% of requests
Database queries: < 50ms median response time
Container startup: < 5 seconds for cold starts
Memory usage: < 1GB per API instance
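
The "< 200ms for 95% of requests" target is a statement about the 95th percentile of observed latencies. As a sketch, the nearest-rank percentile can be computed as follows; the sample values are made up, and production monitoring would more likely rely on histogram metrics than on raw samples.

```go
package main

import (
	"fmt"
	"math"
	"sort"
)

// percentile returns the p-th percentile (0 < p <= 100) of samples using the
// nearest-rank method. It returns 0 for an empty input.
func percentile(samples []float64, p float64) float64 {
	if len(samples) == 0 {
		return 0
	}
	sorted := append([]float64(nil), samples...) // leave the caller's slice untouched
	sort.Float64s(sorted)
	rank := int(math.Ceil(p / 100 * float64(len(sorted))))
	if rank < 1 {
		rank = 1
	}
	return sorted[rank-1]
}

// meetsLatencyTarget checks the "< 200ms for 95% of requests" budget.
func meetsLatencyTarget(latenciesMs []float64) bool {
	return percentile(latenciesMs, 95) < 200
}

func main() {
	latencies := []float64{12, 35, 48, 51, 60, 75, 90, 110, 150, 480}
	fmt.Println(percentile(latencies, 95), meetsLatencyTarget(latencies))
}
```

With these ten samples the single 480 ms outlier is the 95th percentile under nearest-rank, so the target is missed even though the median is low.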

Security Requirements

All user inputs validated and sanitized
Container execution with security constraints
Secrets managed through Kubernetes secrets
No privilege escalation in containers
Network policies enforced

Git Workflow and Commit Standards

Branch Naming

feature/issue-number-short-description
bugfix/issue-number-short-description
hotfix/issue-number-short-description

Commit Messages

type(scope): short description

Longer description if needed

Fixes #123

Types: feat, fix, docs, style, refactor, test, chore
Scopes: api, frontend, executor, scheduler, k8s, security
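
A convention like this can be checked mechanically, for example in a commit hook. The sketch below validates the first line of a commit message against the types and scopes listed above. Treating the scope as mandatory is an assumption, and `validCommitHeader` is a hypothetical helper rather than an existing project tool.

```go
package main

import (
	"fmt"
	"regexp"
)

// commitHeader matches "type(scope): short description" using the listed
// types and scopes. Requiring a scope is an assumption of this sketch.
var commitHeader = regexp.MustCompile(
	`^(feat|fix|docs|style|refactor|test|chore)` +
		`\((api|frontend|executor|scheduler|k8s|security)\): \S.*$`)

// validCommitHeader reports whether the first line of a commit message
// follows the convention.
func validCommitHeader(line string) bool {
	return commitHeader.MatchString(line)
}

func main() {
	fmt.Println(validCommitHeader("feat(api): add task cancellation endpoint"))
	fmt.Println(validCommitHeader("added stuff"))
}
```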

Pull Request Requirements

Environment-Specific Configurations

Development

# config/development.yaml
database:
  host: localhost
  port: 5432
  ssl_mode: disable

executor:
  timeout: 30s
  memory_limit: 512Mi

logging:
  level: debug
  format: console

Production

# config/production.yaml
database:
  host: ${DB_HOST}
  port: 5432
  ssl_mode: require

executor:
  timeout: 3600s
  memory_limit: 1Gi

logging:
  level: info
  format: json
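
The production file refers to `${DB_HOST}`, which implies that environment variables are substituted when the configuration is loaded. How VoidRunner's loader actually does this is not shown here. The sketch below is one possible approach using `os.Expand`, with `expandConfigValue` and `fromMap` as hypothetical helpers.

```go
package main

import (
	"fmt"
	"os"
)

// expandConfigValue replaces ${VAR} references using lookup. Unset variables
// are reported as an error instead of silently becoming empty strings.
func expandConfigValue(value string, lookup func(string) (string, bool)) (string, error) {
	var missing []string
	expanded := os.Expand(value, func(name string) string {
		v, ok := lookup(name)
		if !ok {
			missing = append(missing, name)
		}
		return v
	})
	if len(missing) > 0 {
		return "", fmt.Errorf("unset environment variables: %v", missing)
	}
	return expanded, nil
}

// fromMap adapts a map to the lookup signature, which is handy in tests.
func fromMap(env map[string]string) func(string) (string, bool) {
	return func(name string) (string, bool) {
		v, ok := env[name]
		return v, ok
	}
}

func main() {
	// In a real service the lookup would typically be os.LookupEnv.
	host, err := expandConfigValue("${DB_HOST}", os.LookupEnv)
	fmt.Println(host, err)
}
```

Failing fast on a missing variable is a deliberate choice in this sketch: an empty database host in production is harder to diagnose than a startup error.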

Testing

Testing configuration is unified between CI and local environments for consistency:

# Integration test environment variables (used by both CI and local)
TEST_DB_HOST=localhost
TEST_DB_PORT=5432
TEST_DB_USER=testuser
TEST_DB_PASSWORD=testpassword
TEST_DB_NAME=voidrunner_test
TEST_DB_SSLMODE=disable
JWT_SECRET_KEY=test-secret-key-for-integration

Key Principles:

Unified Configuration: Same database and JWT settings for CI and local testing
Environment Detection: CI=true used only for output formats (SARIF, coverage)
Database Independence: Tests automatically skip when database unavailable
Consistent Behavior: Integration tests behave identically in both environments
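
The "skip when the database is unavailable" behaviour can be approximated with a quick TCP probe before the tests run. The sketch below reads the same `TEST_DB_HOST` and `TEST_DB_PORT` variables. The helper names and the 500 ms timeout are assumptions, and the project's own test helpers may work differently.

```go
package main

import (
	"fmt"
	"net"
	"os"
	"time"
)

// dialTimeout bounds the availability probe; the value is an assumption.
const dialTimeout = 500 * time.Millisecond

// envOr returns the environment variable's value, or fallback if it is unset or empty.
func envOr(key, fallback string) string {
	if v, ok := os.LookupEnv(key); ok && v != "" {
		return v
	}
	return fallback
}

// testDBAddr builds host:port from TEST_DB_HOST and TEST_DB_PORT.
func testDBAddr() string {
	return net.JoinHostPort(envOr("TEST_DB_HOST", "localhost"), envOr("TEST_DB_PORT", "5432"))
}

// reachable reports whether a TCP connection to addr succeeds within dialTimeout.
func reachable(addr string) bool {
	conn, err := net.DialTimeout("tcp", addr, dialTimeout)
	if err != nil {
		return false
	}
	conn.Close()
	return true
}

func main() {
	// In a test, a false result would guard a call to t.Skip.
	fmt.Println(testDBAddr(), reachable(testDBAddr()))
}
```

A TCP probe only shows that something is listening on the port; a stricter check would open a real database connection and ping it.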

Common Patterns and Anti-Patterns

✅ Preferred Patterns

// Repository pattern with interfaces
type TaskRepository interface {
    CreateTask(ctx context.Context, task *Task) error
    GetTask(ctx context.Context, id string) (*Task, error)
    UpdateTaskStatus(ctx context.Context, id string, status TaskStatus) error
}

// Service layer with dependency injection
type TaskService struct {
    repo TaskRepository
    exec TaskExecutor
}

// Proper context cancellation handling
func (s *Service) LongRunningOperation(ctx context.Context) error {
    for {
        select {
        case <-ctx.Done():
            return ctx.Err()
        default:
            // Continue processing
        }
    }
}

❌ Anti-Patterns to Avoid

// DON'T: Global variables
var GlobalDB *sql.DB

// DON'T: Panic in library code
func ProcessTask(task *Task) {
    if task == nil {
        panic("task is nil") // Use error returns instead
    }
}

// DON'T: Ignoring errors
result, _ := dangerousOperation() // Always handle errors

// DON'T: Magic numbers
time.Sleep(300 * time.Second) // Use named constants

Monitoring and Observability

Metrics

// REQUIRED: Add metrics for all critical operations
var (
    taskExecutionDuration = prometheus.NewHistogramVec(
        prometheus.HistogramOpts{
            Name: "voidrunner_task_execution_duration_seconds",
            Help: "Time spent executing tasks",
        },
        []string{"task_type", "status"},
    )
)

func (s *TaskService) ExecuteTask(ctx context.Context, task *Task) error {
    start := time.Now()
    defer func() {
        duration := time.Since(start)
        taskExecutionDuration.WithLabelValues(task.Language, task.Status).Observe(duration.Seconds())
    }()

    return s.executor.Execute(ctx, task)
}

Tracing

// REQUIRED: Add tracing for complex operations
func (s *TaskService) ExecuteTask(ctx context.Context, taskID string) error {
    ctx, span := tracer.Start(ctx, "TaskService.ExecuteTask")
    defer span.End()

    span.SetAttributes(attribute.String("task.id", taskID))

    // Implementation...
    return nil
}

CLI Commands and Scripts

Development Commands

# Setup development environment
make setup

# Start development server with auto-reload
make dev

# Run tests
make test              # Unit tests (with coverage in CI)
make test-fast         # Fast unit tests (short mode)
make test-integration  # Integration tests
make test-all          # Both unit and integration tests

# Coverage analysis
make coverage          # Generate coverage report
make coverage-check    # Check coverage meets 80% threshold

# Code quality
make fmt               # Format code
make vet               # Run go vet
make lint              # Run linting (with format check in CI)
make security          # Security scan

# Build and run
make build             # Build API server
make run               # Run API server locally

# Documentation
make docs              # Generate API docs
make docs-serve        # Serve docs locally

# Development tools
make install-tools     # Install development tools
make clean             # Clean build artifacts

# Environment Management (Docker Compose)
make dev-up            # Start development environment (DB + Redis + API)
make dev-down          # Stop development environment  
make dev-logs          # Show development logs
make dev-restart       # Restart development environment
make dev-status        # Show development environment status

make prod-up           # Start production environment
make prod-down         # Stop production environment
make prod-logs         # Show production logs
make prod-restart      # Restart production environment
make prod-status       # Show production environment status

make env-status        # Show all environment status
make docker-clean      # Clean Docker resources

# Test services management (PostgreSQL + Redis)
make services-start    # Start test services (Docker)
make services-stop     # Stop test services
make services-reset    # Reset test services (clean slate)

# Database migrations
make migrate-up        # Run database migrations up
make migrate-down      # Run database migrations down (rollback one)
make migrate-reset     # Reset database (rollback all migrations)
make migration name=X  # Create new migration file

# Dependencies and setup
make deps              # Download and tidy dependencies
make deps-update       # Update dependencies
make setup             # Setup complete development environment

Database Operations

# Test services management (implemented)
make services-start    # Start test services containers (PostgreSQL + Redis)
make services-stop     # Stop test services containers
make services-reset    # Reset test services to clean state

# Migration management (implemented)
make migrate-up        # Apply all pending migrations
make migrate-down      # Rollback last migration
make migrate-reset     # Rollback all migrations
make migration name=add_feature  # Create new migration files

# Legacy scripts (planned for Epic 2)
./scripts/backup-db.sh production    # Database backup utility
./scripts/restore-db.sh backup.sql   # Database restore utility

Documentation Standards

API Documentation

OpenAPI specifications for all endpoints
Include request/response examples
Document error codes and meanings
Rate limiting information

Code Documentation

// TaskExecutor handles the execution of user-submitted code in secure containers.
// It manages the complete lifecycle from container creation to cleanup.
//
// Example usage:
//   executor := NewDockerExecutor(client, logger)
//   result, err := executor.Execute(ctx, task)
//   if err != nil {
//       return fmt.Errorf("execution failed: %w", err)
//   }
type TaskExecutor interface {
    // Execute runs the given task in a secure container environment.
    // It returns the execution result or an error if execution fails.
    Execute(ctx context.Context, task *Task) (*ExecutionResult, error)
}

Release and Deployment

Version Tagging

Use semantic versioning: v1.2.3
Tag format: git tag -a v1.2.3 -m "Release v1.2.3"

Deployment Checklist

Document Version: 1.1
Last Updated: 2025-07-10
Next Review: 2025-08-10

For questions about these guidelines, please reach out to the technical lead or create an issue in the repository.
