VoidRunner Development Guidelines

This document defines code style guidelines, review criteria, project-specific rules, and preferred patterns for the VoidRunner distributed task execution platform.

Project Overview

VoidRunner is a distributed task execution platform designed for secure, scalable code execution. The project follows an incremental development approach through well-defined Epic milestones.

Current Implementation Status (Epic 1-2 ✅ Complete)

  • Backend: Go + Gin framework + PostgreSQL (pgx driver)
  • API: RESTful API with JWT authentication and comprehensive task management
  • Database: PostgreSQL with optimized schema and cursor pagination
  • Container Execution: Docker executor with comprehensive security controls
  • Queue System: Redis-based task queuing with retry logic and dead letter handling
  • Worker Management: Embedded worker pool with concurrency controls and health monitoring
  • Testing: 80%+ code coverage with unit and integration tests
  • Documentation: OpenAPI/Swagger specs with comprehensive examples

Planned Architecture (Epic 3-4 📋 Roadmap)

  • Distributed Services: Separate API and worker services for horizontal scaling
  • Frontend: Svelte + SvelteKit + TypeScript web interface
  • Infrastructure: Kubernetes (GKE) deployment with microservices
  • Log Streaming: Real-time log collection and streaming
  • Monitoring: Real-time metrics, logging, and alerting systems

Go Code Standards

Project Structure

voidrunner/
├── cmd/                    # Application entrypoints
│   ├── api/               # ✅ API server main (implemented)
│   ├── migrate/           # ✅ Database migration tool (implemented)
│   └── scheduler/         # ✅ Scheduler service main (implemented - for future distributed mode)
├── internal/              # Private application code
│   ├── api/              # ✅ API handlers and routes (implemented)
│   ├── auth/             # ✅ Authentication logic (implemented)
│   ├── config/           # ✅ Configuration management (implemented)
│   ├── database/         # ✅ Database layer (implemented)
│   ├── models/           # ✅ Data models (implemented)
│   ├── services/         # ✅ Business logic services (implemented)
│   ├── executor/         # ✅ Task execution engine (implemented)
│   ├── queue/            # ✅ Redis queue integration (implemented)
│   └── worker/           # ✅ Worker management (implemented)
├── pkg/                   # Public libraries
│   ├── logger/           # ✅ Structured logging (implemented)
│   ├── utils/            # ✅ Shared utilities (implemented)
│   └── metrics/          # 📋 Prometheus metrics (planned - Epic 4)
├── api/                   # ✅ API specifications (OpenAPI) (implemented)
├── migrations/           # ✅ Database migrations (implemented)
├── tests/                # ✅ Integration tests (implemented)
├── scripts/              # ✅ Build and deployment scripts (implemented)
├── docs/                 # ✅ Documentation (implemented)
├── deployments/          # 📋 Kubernetes manifests (planned - Epic 3)
└── frontend/             # 📋 Svelte web interface (planned - Epic 3)

Epic Development Roadmap

Epic 1: Core API Infrastructure ✅ Complete

  • JWT authentication system
  • Task management CRUD operations
  • PostgreSQL database with pgx
  • Comprehensive testing suite
  • OpenAPI documentation

Epic 2: Container Execution Engine ✅ Complete

  • Docker client integration with security controls
  • Task execution workflow and state management
  • Embedded worker pool with concurrency management
  • Redis-based queue system with retry logic
  • Health monitoring and cleanup mechanisms

Epic 3: Frontend Interface 📋 Planned

  • Svelte project setup and architecture
  • Authentication UI and user management
  • Task creation and management interface
  • Real-time task status updates

Epic 4: Advanced Features 📋 Planned

  • Distributed services architecture (Issue #46)
  • Real-time log collection and streaming (Issue #11)
  • Enhanced error handling mechanisms (Issue #12)
  • Collaborative features and sharing
  • Advanced search and filtering
  • Real-time dashboard and system metrics
  • Advanced notifications and alerting

GitHub Issues Progress Tracking

Epic 1: Core API Infrastructure ✅ Complete

  • Issue #3: PostgreSQL Database Setup and Schema Design ✅ Closed
  • Issue #4: JWT Authentication System Implementation ✅ Closed
  • Issue #5: Task Management API Endpoints ✅ Closed
  • Issue #6: API Documentation and Testing Infrastructure ✅ Closed

Epic 2: Container Execution Engine ✅ Complete

  • Issue #9: Docker Client Integration and Security Configuration ✅ Closed
  • Issue #10: Task Execution Workflow and State Management ✅ Closed

Epic 2 Enhancements (Non-blocking improvements)

  • Issue #11: Log Collection and Real-time Streaming 📋 Open (Priority 1)
  • Issue #12: Error Handling and Cleanup Mechanisms 📋 Open (Priority 2)

Note: Issues #11-12 are enhancements to the completed Epic 2 functionality, not blockers. The core container execution engine with embedded workers is fully operational.

Epic 3: Frontend Interface 📋 Ready to Start

  • Issue #22: Frontend Interface 📋 Open
  • Issue #23: Svelte Project Setup and Architecture 📋 Open (Priority 0)
  • Issue #24: Authentication UI and User Management 📋 Open (Priority 0)
  • Issue #25: Task Creation and Management Interface 📋 Open (Priority 0)
  • Issue #26: Real-time Task Status Updates 📋 Open (Priority 0)
  • Issue #27: Real-time Features 📋 Open

Epic 4: Advanced Features 📋 Future Work

  • Issue #28: Real-time Dashboard and System Metrics 📋 Open (Priority 0)
  • Issue #29: Advanced Notifications and Alerting 📋 Open (Priority 1)
  • Issue #30: Advanced Search and Filtering 📋 Open (Priority 1)
  • Issue #31: Collaborative Features and Sharing 📋 Open (Priority 2)

Future Enhancements

  • Issue #46: Separate API and Worker Services for Horizontal Scaling 📋 Open
    • This epic will transition from embedded workers to distributed services
    • Currently tracked for future implementation when scaling requirements emerge

Current Focus

With Epic 1-2 complete, the project has a fully functional task execution platform with embedded workers. The next logical step is Epic 3 (Frontend Interface) to provide a web-based user interface, followed by Epic 4 advanced features and eventual transition to distributed services (Issue #46).

Coding Standards

1. Naming Conventions

  • Packages: lowercase, single words when possible (auth, database, executor)
  • Functions: PascalCase for exported, camelCase for unexported
  • Constants: ALL_CAPS for package-level constants
  • Interfaces: Add "er" suffix (TaskExecutor, LogStreamer)
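As a rough illustration of the conventions above, the following sketch applies each rule once. Every identifier in it (MAX_RETRY_COUNT, TaskRunner, noopRunner, normalizeName) is hypothetical and chosen only for the example; none is taken from the VoidRunner codebase.

```go
package main

import (
	"fmt"
	"strings"
)

// MAX_RETRY_COUNT is a package-level constant written in ALL_CAPS,
// following the guideline above. The name and value are hypothetical.
const MAX_RETRY_COUNT = 3

// TaskRunner is an interface named with the "er" suffix.
type TaskRunner interface {
	Run(name string) string
}

// noopRunner is unexported, so its name starts with a lowercase letter.
type noopRunner struct{}

// Run is exported, so its name starts with an uppercase letter.
func (noopRunner) Run(name string) string {
	return "ran " + normalizeName(name)
}

// normalizeName is an unexported helper in camelCase.
func normalizeName(name string) string {
	return strings.ToLower(strings.TrimSpace(name))
}

func main() {
	var r TaskRunner = noopRunner{}
	fmt.Println(r.Run("  Build-Image "), MAX_RETRY_COUNT) // prints "ran build-image 3"
}
```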

2. Error Handling

// PREFERRED: Structured error handling with context
func (s *TaskService) CreateTask(ctx context.Context, req CreateTaskRequest) (*Task, error) {
    if err := s.validateTaskRequest(req); err != nil {
        return nil, fmt.Errorf("validation failed: %w", err)
    }

    task, err := s.repo.CreateTask(ctx, req)
    if err != nil {
        return nil, fmt.Errorf("failed to create task: %w", err)
    }

    return task, nil
}

// AVOID: Generic error messages without context
func (s *TaskService) CreateTask(req CreateTaskRequest) (*Task, error) {
    task, err := s.repo.CreateTask(req)
    if err != nil {
        return nil, err // Too generic
    }
    return task, nil
}

3. Database Interactions

// PREFERRED: Use pgx with prepared statements and proper error handling
func (r *TaskRepository) GetTaskByID(ctx context.Context, taskID string) (*Task, error) {
    query := `
        SELECT id, name, description, status, created_at, updated_at
        FROM tasks
        WHERE id = $1 AND deleted_at IS NULL
    `

    var task Task
    err := r.pool.QueryRow(ctx, query, taskID).Scan(
        &task.ID, &task.Name, &task.Description,
        &task.Status, &task.CreatedAt, &task.UpdatedAt,
    )

    if err != nil {
        if errors.Is(err, pgx.ErrNoRows) {
            return nil, ErrTaskNotFound
        }
        return nil, fmt.Errorf("failed to get task %s: %w", taskID, err)
    }

    return &task, nil
}

4. Dependency Injection

// PREFERRED: Constructor pattern with interfaces
type TaskService struct {
    repo     TaskRepository
    executor TaskExecutor
    logger   *slog.Logger
    metrics  *prometheus.Registry
}

func NewTaskService(
    repo TaskRepository,
    executor TaskExecutor,
    logger *slog.Logger,
    metrics *prometheus.Registry,
) *TaskService {
    return &TaskService{
        repo:     repo,
        executor: executor,
        logger:   logger,
        metrics:  metrics,
    }
}

5. Context Usage

// PREFERRED: Always pass context as first parameter
func (s *TaskService) ExecuteTask(ctx context.Context, taskID string) error {
    // Check context cancellation
    select {
    case <-ctx.Done():
        return ctx.Err()
    default:
    }

    // Use context in downstream calls
    task, err := s.repo.GetTaskByID(ctx, taskID)
    if err != nil {
        return fmt.Errorf("failed to load task %s: %w", taskID, err)
    }

    return s.executor.Execute(ctx, task)
}

Security Guidelines

1. Container Execution Security

// REQUIRED: All container executions must use security constraints
func (e *DockerExecutor) Execute(ctx context.Context, task *Task) error {
    containerConfig := &container.Config{
        Image:      e.getExecutorImage(task.Language),
        User:       "1000:1000", // REQUIRED: Non-root execution
        WorkingDir: "/tmp/workspace",
        Env:        e.sanitizeEnvironment(task.Environment),
    }

    hostConfig := &container.HostConfig{
        Resources: container.Resources{
            Memory:    task.MemoryLimit,
            CPUQuota:  task.CPUQuota,
            PidsLimit: ptr(int64(128)), // REQUIRED: Limit processes
        },
        SecurityOpt: []string{
            "no-new-privileges",
            "seccomp=/opt/voidrunner/seccomp-profile.json",
        },
        NetworkMode:    "none", // REQUIRED: No network access
        ReadonlyRootfs: true,   // REQUIRED: Read-only filesystem
        AutoRemove:     true,   // REQUIRED: Automatic cleanup
    }

    return e.executeWithTimeout(ctx, containerConfig, hostConfig, task.Timeout)
}
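The executor above calls e.sanitizeEnvironment without showing it. The sketch below is one possible shape for such a helper, written as a free function and under several assumptions: that task.Environment is a map[string]string, that variable names must look like shell identifiers, and that the blocked prefixes listed are worth rejecting. The real helper may differ on all three points.

```go
package main

import (
	"fmt"
	"regexp"
	"sort"
	"strings"
)

// envNamePattern accepts only shell-style identifiers (an assumption of this sketch).
var envNamePattern = regexp.MustCompile(`^[A-Za-z_][A-Za-z0-9_]*$`)

// blockedPrefixes is a hypothetical deny list for loader and Docker variables.
var blockedPrefixes = []string{"LD_", "DOCKER_"}

func hasBlockedPrefix(name string) bool {
	for _, p := range blockedPrefixes {
		if strings.HasPrefix(name, p) {
			return true
		}
	}
	return false
}

// sanitizeEnvironment drops suspicious entries and returns KEY=value pairs,
// the form Docker's container.Config.Env expects. Output is sorted so the
// result is deterministic.
func sanitizeEnvironment(env map[string]string) []string {
	out := make([]string, 0, len(env))
	for name, value := range env {
		if !envNamePattern.MatchString(name) || hasBlockedPrefix(name) {
			continue
		}
		if strings.ContainsAny(value, "\x00\n") {
			continue // reject NUL bytes and embedded newlines
		}
		out = append(out, name+"="+value)
	}
	sort.Strings(out)
	return out
}

func main() {
	env := map[string]string{
		"APP_MODE":   "test",
		"LD_PRELOAD": "/tmp/evil.so",
		"bad name":   "1",
	}
	fmt.Println(sanitizeEnvironment(env)) // [APP_MODE=test]
}
```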

2. Input Validation

// REQUIRED: Validate all user inputs
func validateTaskRequest(req CreateTaskRequest) error {
    if strings.TrimSpace(req.Name) == "" {
        return ErrTaskNameRequired
    }

    if len(req.Name) > 255 {
        return ErrTaskNameTooLong
    }

    if !isValidLanguage(req.Language) {
        return ErrUnsupportedLanguage
    }

    if len(req.Code) > MaxCodeSize {
        return ErrCodeTooLarge
    }

    // Sanitize code content
    if containsDangerousPatterns(req.Code) {
        return ErrDangerousCodePattern
    }

    return nil
}

Logging Standards

1. Structured Logging with slog

// PREFERRED: Use structured logging with context
func (s *TaskService) CreateTask(ctx context.Context, req CreateTaskRequest) (*Task, error) {
    logger := s.logger.With(
        "operation", "create_task",
        "user_id", getUserID(ctx),
        "task_name", req.Name,
    )

    logger.Info("creating new task")

    task, err := s.repo.CreateTask(ctx, req)
    if err != nil {
        logger.Error("failed to create task", "error", err)
        return nil, fmt.Errorf("failed to create task: %w", err)
    }

    logger.Info("task created successfully", "task_id", task.ID)
    return task, nil
}

2. Log Levels

  • DEBUG: Detailed flow information for troubleshooting
  • INFO: General operational information
  • WARN: Something unexpected happened but system continues
  • ERROR: Error condition that needs attention

Testing Standards

1. Test File Organization

// File: internal/api/task_handler_test.go
package api

import (
    "context"
    "testing"

    "github.com/stretchr/testify/assert"
    "github.com/stretchr/testify/mock"
    "github.com/stretchr/testify/require"
)

func TestTaskHandler_CreateTask(t *testing.T) {
    tests := []struct {
        name           string
        request        CreateTaskRequest
        mockSetup      func(*MockTaskService)
        expectedStatus int
        expectedError  string
    }{
        {
            name: "successful task creation",
            request: CreateTaskRequest{
                Name:        "test-task",
                Language:    "python",
                Code:        "print('hello')",
            },
            mockSetup: func(m *MockTaskService) {
                m.On("CreateTask", mock.Anything, mock.Anything).
                    Return(&Task{ID: "123"}, nil)
            },
            expectedStatus: 201,
        },
        // More test cases...
    }

    for _, tt := range tests {
        t.Run(tt.name, func(t *testing.T) {
            // Test implementation
        })
    }
}

2. Test Classification Guidelines

Unit Tests (Located in internal/package/*_test.go)

  • Test individual functions and methods in isolation
  • Mock external dependencies (databases, Redis, HTTP clients)
  • Test validation logic, business rules, and error handling
  • Should run fast (< 1 second total)
  • No external service dependencies

// UNIT TEST: Tests validation logic only
func TestLogConfigValidation(t *testing.T) {
    invalidConfig := &LogConfig{
        BufferSize: -1, // Invalid
    }
    
    err := invalidConfig.Validate()
    assert.Error(t, err)
    assert.Contains(t, err.Error(), "buffer_size must be positive")
}

Integration Tests (Located in tests/integration/*_test.go)

  • Test interactions between multiple components
  • Test with real external dependencies (PostgreSQL, Redis, Docker)
  • Test system behavior under failure conditions
  • Use build tag //go:build integration
  • Package declaration: package integration_test

//go:build integration

package integration_test

// INTEGRATION TEST: Tests Redis dependency interaction
func TestLoggingServiceDependencies(t *testing.T) {
    service, err := logging.NewRedisStreamingService(nil, config, logger)
    
    assert.Nil(t, service)
    assert.Error(t, err)
    assert.Contains(t, err.Error(), "redis client is required")
}

Test Organization Rules

  • Unit tests stay co-located with the package they test
  • Integration tests go in tests/integration/
  • Use descriptive test names: TestComponentName_Functionality
  • Group related tests in the same file
  • Use build tags to separate unit from integration tests

3. Integration Tests

// REQUIRED: Integration tests for critical paths
func TestTaskExecution_Integration(t *testing.T) {
    if testing.Short() {
        t.Skip("skipping integration test")
    }

    // Setup test database
    db := setupTestDB(t)
    defer cleanupTestDB(t, db)

    // Setup test containers
    executor := setupTestExecutor(t)
    defer cleanupTestExecutor(t, executor)

    // Test execution flow
    service := NewTaskService(db, executor, logger, prometheus.NewRegistry())

    task, err := service.CreateTask(context.Background(), CreateTaskRequest{
        Name:     "integration-test",
        Language: "python",
        Code:     "print('integration test')",
    })
    require.NoError(t, err)

    err = service.ExecuteTask(context.Background(), task.ID)
    require.NoError(t, err)

    // Verify execution results
    result, err := service.GetTaskResult(context.Background(), task.ID)
    require.NoError(t, err)
    assert.Equal(t, "completed", result.Status)
}

Kubernetes and Infrastructure Standards

1. Resource Specifications

# REQUIRED: All deployments must specify resource limits
apiVersion: apps/v1
kind: Deployment
metadata:
  name: voidrunner-api
spec:
  template:
    spec:
      containers:
        - name: api
          image: voidrunner/api:latest
          resources:
            requests:
              memory: "256Mi"
              cpu: "100m"
            limits:
              memory: "1Gi"
              cpu: "500m"
          # REQUIRED: Security context
          securityContext:
            allowPrivilegeEscalation: false
            runAsNonRoot: true
            runAsUser: 1000
            readOnlyRootFilesystem: true

2. Health Checks

# REQUIRED: All services must have health checks
livenessProbe:
  httpGet:
    path: /health
    port: 8080
  initialDelaySeconds: 30
  periodSeconds: 10
  timeoutSeconds: 5
  failureThreshold: 3

readinessProbe:
  httpGet:
    path: /ready
    port: 8080
  initialDelaySeconds: 5
  periodSeconds: 5
  timeoutSeconds: 3
  failureThreshold: 2

Code Review Criteria

Mandatory Checks

  • Security: No hardcoded secrets, proper input validation
  • Error Handling: All errors properly wrapped with context
  • Testing: Unit tests for new functionality, integration tests for critical paths
  • Performance: Database queries optimized, no N+1 problems
  • Logging: Structured logging with appropriate levels
  • Documentation: Public functions and complex logic documented

Performance Requirements

  • API response times: < 200ms for 95% of requests
  • Database queries: < 50ms median response time
  • Container startup: < 5 seconds for cold starts
  • Memory usage: < 1GB per API instance

Security Requirements

  • All user inputs validated and sanitized
  • Container execution with security constraints
  • Secrets managed through Kubernetes secrets
  • No privilege escalation in containers
  • Network policies enforced

Git Workflow and Commit Standards

Branch Naming

  • feature/issue-number-short-description
  • bugfix/issue-number-short-description
  • hotfix/issue-number-short-description

Commit Messages

type(scope): short description

Longer description if needed

Fixes #123

Types: feat, fix, docs, style, refactor, test, chore
Scopes: api, frontend, executor, scheduler, k8s, security
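The subject-line format can be checked mechanically, for example in a commit hook. The sketch below validates the first line against the types and scopes listed above; treating the scope as mandatory is an assumption of this sketch, and validSubject is a hypothetical helper name.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// subjectPattern encodes "type(scope): short description" with the
// types and scopes listed in this section.
var subjectPattern = regexp.MustCompile(
	`^(feat|fix|docs|style|refactor|test|chore)` +
		`\((api|frontend|executor|scheduler|k8s|security)\): \S.*$`)

// validSubject checks only the first line of the commit message.
func validSubject(message string) bool {
	subject, _, _ := strings.Cut(message, "\n")
	return subjectPattern.MatchString(subject)
}

func main() {
	fmt.Println(validSubject("feat(api): add task cancel endpoint\n\nFixes #123")) // true
	fmt.Println(validSubject("update stuff"))                                      // false
	fmt.Println(validSubject("feat(db): add index"))                               // false
}
```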

Pull Request Requirements

  • All CI checks passing
  • Code coverage remains at or above 80%
  • Security scan passes
  • Documentation updated
  • Breaking changes documented

Environment-Specific Configurations

Development

# config/development.yaml
database:
  host: localhost
  port: 5432
  ssl_mode: disable

executor:
  timeout: 30s
  memory_limit: 512Mi

logging:
  level: debug
  format: console

Production

# config/production.yaml
database:
  host: ${DB_HOST}
  port: 5432
  ssl_mode: require

executor:
  timeout: 3600s
  memory_limit: 1Gi

logging:
  level: info
  format: json
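The production file refers to ${DB_HOST} but does not say how the placeholder is resolved. One plausible approach, sketched here as an assumption rather than a description of VoidRunner's config loader, is to expand placeholders with os.Expand before parsing and to fail when a variable is unset. The helper name expandConfig is hypothetical.

```go
package main

import (
	"fmt"
	"os"
)

// expandConfig replaces ${NAME} and $NAME placeholders using lookup and
// reports an error if any referenced variable is missing.
func expandConfig(raw string, lookup func(string) (string, bool)) (string, error) {
	var missing []string
	out := os.Expand(raw, func(name string) string {
		v, ok := lookup(name)
		if !ok {
			missing = append(missing, name)
		}
		return v
	})
	if len(missing) > 0 {
		return "", fmt.Errorf("unset config variables: %v", missing)
	}
	return out, nil
}

func main() {
	raw := "database:\n  host: ${DB_HOST}\n  port: 5432\n"

	// os.LookupEnv distinguishes "unset" from "set to empty".
	out, err := expandConfig(raw, os.LookupEnv)
	if err != nil {
		fmt.Println("config error:", err)
		return
	}
	fmt.Print(out)
}
```

Note that os.Expand treats every `$` as the start of a placeholder, so literal dollar signs in a config file would need separate handling.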

Testing

Testing configuration is unified between CI and local environments for consistency:

# Integration test environment variables (used by both CI and local)
TEST_DB_HOST=localhost
TEST_DB_PORT=5432
TEST_DB_USER=testuser
TEST_DB_PASSWORD=testpassword
TEST_DB_NAME=voidrunner_test
TEST_DB_SSLMODE=disable
JWT_SECRET_KEY=test-secret-key-for-integration

Key Principles:

  • Unified Configuration: Same database and JWT settings for CI and local testing
  • Environment Detection: CI=true used only for output formats (SARIF, coverage)
  • Database Independence: Tests automatically skip when database unavailable
  • Consistent Behavior: Integration tests behave identically in both environments
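A test helper could read these variables with the listed values as fallbacks and check whether the database is reachable before running. The sketch below illustrates that idea using only the standard library; testDBConfig, loadTestDBConfig, DSN, and available are hypothetical names, and a plain TCP dial stands in for whatever availability check the real test suite performs.

```go
package main

import (
	"fmt"
	"net"
	"net/url"
	"os"
	"time"
)

// testDBConfig holds the integration-test database settings.
type testDBConfig struct {
	Host, Port, User, Password, Name, SSLMode string
}

// loadTestDBConfig reads TEST_DB_* variables, falling back to the defaults
// listed in this section when a variable is empty or unset.
func loadTestDBConfig(getenv func(string) string) testDBConfig {
	get := func(key, fallback string) string {
		if v := getenv(key); v != "" {
			return v
		}
		return fallback
	}
	return testDBConfig{
		Host:     get("TEST_DB_HOST", "localhost"),
		Port:     get("TEST_DB_PORT", "5432"),
		User:     get("TEST_DB_USER", "testuser"),
		Password: get("TEST_DB_PASSWORD", "testpassword"),
		Name:     get("TEST_DB_NAME", "voidrunner_test"),
		SSLMode:  get("TEST_DB_SSLMODE", "disable"),
	}
}

// DSN renders the settings as a postgres:// connection URL.
func (c testDBConfig) DSN() string {
	u := url.URL{
		Scheme:   "postgres",
		User:     url.UserPassword(c.User, c.Password),
		Host:     net.JoinHostPort(c.Host, c.Port),
		Path:     "/" + c.Name,
		RawQuery: "sslmode=" + url.QueryEscape(c.SSLMode),
	}
	return u.String()
}

// available reports whether something accepts TCP connections at the
// configured address; a test could call t.Skip when this returns false.
func (c testDBConfig) available() bool {
	conn, err := net.DialTimeout("tcp", net.JoinHostPort(c.Host, c.Port), time.Second)
	if err != nil {
		return false
	}
	conn.Close()
	return true
}

func main() {
	cfg := loadTestDBConfig(os.Getenv)
	fmt.Println(cfg.DSN())
	fmt.Println("database reachable:", cfg.available())
}
```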

Common Patterns and Anti-Patterns

✅ Preferred Patterns

// Repository pattern with interfaces
type TaskRepository interface {
    CreateTask(ctx context.Context, req CreateTaskRequest) (*Task, error)
    GetTaskByID(ctx context.Context, id string) (*Task, error)
    UpdateTaskStatus(ctx context.Context, id string, status TaskStatus) error
}

// Service layer with dependency injection
type TaskService struct {
    repo TaskRepository
    exec TaskExecutor
}

// Proper context cancellation handling
func (s *Service) LongRunningOperation(ctx context.Context) error {
    for {
        select {
        case <-ctx.Done():
            return ctx.Err()
        default:
            // Continue processing
        }
    }
}

❌ Anti-Patterns to Avoid

// DON'T: Global variables
var GlobalDB *sql.DB

// DON'T: Panic in library code
func ProcessTask(task *Task) {
    if task == nil {
        panic("task is nil") // Use error returns instead
    }
}

// DON'T: Ignoring errors
result, _ := dangerousOperation() // Always handle errors

// DON'T: Magic numbers
time.Sleep(300 * time.Second) // Use named constants

Monitoring and Observability

Metrics

// REQUIRED: Add metrics for all critical operations
var (
    taskExecutionDuration = prometheus.NewHistogramVec(
        prometheus.HistogramOpts{
            Name: "voidrunner_task_execution_duration_seconds",
            Help: "Time spent executing tasks",
        },
        []string{"task_type", "status"},
    )
)

func (s *TaskService) ExecuteTask(ctx context.Context, task *Task) error {
    start := time.Now()
    defer func() {
        duration := time.Since(start)
        taskExecutionDuration.WithLabelValues(task.Language, task.Status).Observe(duration.Seconds())
    }()

    return s.executor.Execute(ctx, task)
}

Tracing

// REQUIRED: Add tracing for complex operations
func (s *TaskService) ExecuteTask(ctx context.Context, taskID string) error {
    ctx, span := tracer.Start(ctx, "TaskService.ExecuteTask")
    defer span.End()

    span.SetAttributes(attribute.String("task.id", taskID))

    // Implementation...
    return nil
}

CLI Commands and Scripts

Development Commands

# Setup development environment
make setup

# Start development server with auto-reload
make dev

# Run tests
make test              # Unit tests (with coverage in CI)
make test-fast         # Fast unit tests (short mode)
make test-integration  # Integration tests
make test-all          # Both unit and integration tests

# Coverage analysis
make coverage          # Generate coverage report
make coverage-check    # Check coverage meets 80% threshold

# Code quality
make fmt               # Format code
make vet               # Run go vet
make lint              # Run linting (with format check in CI)
make security          # Security scan

# Build and run
make build             # Build API server
make run               # Run API server locally

# Documentation
make docs              # Generate API docs
make docs-serve        # Serve docs locally

# Development tools
make install-tools     # Install development tools
make clean             # Clean build artifacts

# Environment Management (Docker Compose)
make dev-up            # Start development environment (DB + Redis + API)
make dev-down          # Stop development environment  
make dev-logs          # Show development logs
make dev-restart       # Restart development environment
make dev-status        # Show development environment status

make prod-up           # Start production environment
make prod-down         # Stop production environment
make prod-logs         # Show production logs
make prod-restart      # Restart production environment
make prod-status       # Show production environment status

make env-status        # Show all environment status
make docker-clean      # Clean Docker resources

# Test services management (PostgreSQL + Redis)
make services-start    # Start test services (Docker)
make services-stop     # Stop test services
make services-reset    # Reset test services (clean slate)

# Database migrations
make migrate-up        # Run database migrations up
make migrate-down      # Run database migrations down (rollback one)
make migrate-reset     # Reset database (rollback all migrations)
make migration name=X  # Create new migration file

# Dependencies and setup
make deps              # Download and tidy dependencies
make deps-update       # Update dependencies
make setup             # Setup complete development environment

Database Operations

# Test services management (implemented)
make services-start    # Start test services containers (PostgreSQL + Redis)
make services-stop     # Stop test services containers
make services-reset    # Reset test services to clean state

# Migration management (implemented)
make migrate-up        # Apply all pending migrations
make migrate-down      # Rollback last migration
make migrate-reset     # Rollback all migrations
make migration name=add_feature  # Create new migration files

# Legacy scripts (planned for Epic 2)
./scripts/backup-db.sh production    # Database backup utility
./scripts/restore-db.sh backup.sql   # Database restore utility

Documentation Standards

API Documentation

  • OpenAPI specifications for all endpoints
  • Include request/response examples
  • Document error codes and meanings
  • Rate limiting information

Code Documentation

// TaskExecutor handles the execution of user-submitted code in secure containers.
// It manages the complete lifecycle from container creation to cleanup.
//
// Example usage:
//   executor := NewDockerExecutor(client, logger)
//   result, err := executor.Execute(ctx, task)
//   if err != nil {
//       return fmt.Errorf("execution failed: %w", err)
//   }
type TaskExecutor interface {
    // Execute runs the given task in a secure container environment.
    // It returns the execution result or an error if execution fails.
    Execute(ctx context.Context, task *Task) (*ExecutionResult, error)
}

Release and Deployment

Version Tagging

  • Use semantic versioning: v1.2.3
  • Tag format: git tag -a v1.2.3 -m "Release v1.2.3"

Deployment Checklist

  • All tests passing
  • Security scan completed
  • Database migrations tested
  • Rollback plan prepared
  • Monitoring dashboards updated
  • Documentation updated

Document Version: 1.1
Last Updated: 2025-07-10
Next Review: 2025-08-10

For questions about these guidelines, please reach out to the technical lead or create an issue in the repository.