
Replace gcsfuse with s3fs-fuse for AWS #25

@UJ2202


User Story

As a CMBCluster user on AWS, I want S3 buckets mounted seamlessly in my research environments so that they can access persistent storage with the same functionality that Cloud Storage provides on GCP.

Description

For AWS deployments, replace the current gcsfuse implementation with s3fs-fuse so that S3 buckets can be mounted in user environments, keeping feature parity with the GCP storage experience.

Current GCS Implementation Analysis

Based on codebase analysis:

  • Setup Script: scripts/setup-cluster.sh enables the GKE GcsFuseCsiDriver add-on
  • Storage Integration: Used to mount Cloud Storage buckets in user environments
  • User Experience: Transparent file system access to cloud storage
  • Configuration: Automated through the Cloud Storage FUSE CSI driver in Kubernetes
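
To make the parity target concrete, the GKE wiring described above can be sketched as two print-only shell functions. This does not reproduce scripts/setup-cluster.sh, which is not shown in this issue; the cluster, location, bucket, and volume names are placeholders. The add-on flag, the `gke-gcsfuse/volumes` annotation, and the `gcsfuse.csi.storage.gke.io` driver name are the public interface of GKE's Cloud Storage FUSE CSI driver, and they are what the AWS path has to replace.

```shell
#!/usr/bin/env bash
# Illustrative sketch of the existing GKE storage wiring; prints, never applies.
# Cluster, location, bucket, and volume names are placeholders, not CMBCluster values.
set -euo pipefail

# Command that enables the Cloud Storage FUSE CSI driver add-on on a cluster.
gke_enable_gcsfuse_cmd() {
  local cluster=$1 location=$2
  echo "gcloud container clusters update ${cluster} --location=${location} --update-addons GcsFuseCsiDriver=ENABLED"
}

# Pod-template fragment: the annotation requests gcsfuse sidecar injection and
# the inline CSI volume mounts the bucket. The AWS path must replace both.
gcsfuse_pod_fragment() {
  local bucket=$1
  cat <<EOF
metadata:
  annotations:
    gke-gcsfuse/volumes: "true"
spec:
  volumes:
    - name: user-storage
      csi:
        driver: gcsfuse.csi.storage.gke.io
        volumeAttributes:
          bucketName: ${bucket}
EOF
}
```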

AWS S3 Equivalent Requirements

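  • S3 Mounting: Replace GcsFuseCsiDriver with an s3fs-fuse integration; because s3fs-fuse is a FUSE filesystem rather than a CSI driver, the integration must provide its own Kubernetes wiring
  • Bucket Mounting: Mount S3 buckets as filesystems in user pods
  • Authentication: Use IRSA (IAM Roles for Service Accounts) instead of GKE Workload Identity
  • Performance: Keep throughput and latency comparable to the current gcsfuse setup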
  • Compatibility: Ensure existing user workflows continue to work
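
The authentication requirement could be met roughly as sketched below, assuming eksctl manages IRSA and one bucket per deployment is enough. The bucket, cluster, namespace, service-account, and policy names are placeholders, and both functions only print output. The cluster also needs an IAM OIDC provider (`eksctl utils associate-iam-oidc-provider --cluster <name> --approve`) before service-account roles work, and the action list is a starting point rather than a verified minimum for every s3fs-fuse option.

```shell
#!/usr/bin/env bash
# Sketch only: prints an IAM policy for one bucket and the eksctl command that
# would bind it to a Kubernetes service account via IRSA. Bucket, cluster,
# namespace, service-account, and policy names are placeholders.
set -euo pipefail

# Read/write access to a single bucket: ListBucket applies to the bucket ARN,
# object actions apply to the objects under it ("/*").
s3_bucket_policy() {
  local bucket=$1
  cat <<EOF
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": ["s3:ListBucket"],
      "Resource": "arn:aws:s3:::${bucket}"
    },
    {
      "Effect": "Allow",
      "Action": ["s3:GetObject", "s3:PutObject", "s3:DeleteObject", "s3:AbortMultipartUpload"],
      "Resource": "arn:aws:s3:::${bucket}/*"
    }
  ]
}
EOF
}

# eksctl creates the IAM role and annotates the service account with
# eks.amazonaws.com/role-arn so pods using it receive IRSA credentials.
irsa_command() {
  local cluster=$1 namespace=$2 sa=$3 policy_arn=$4
  local cmd=(eksctl create iamserviceaccount
    --cluster "$cluster" --namespace "$namespace" --name "$sa"
    --attach-policy-arn "$policy_arn" --approve)
  echo "${cmd[*]}"
}
```

The policy document would be registered with `aws iam create-policy --policy-name <name> --policy-document file://policy.json`, and the resulting ARN passed to `irsa_command`.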

Technical Implementation

  1. CSI Driver Replacement

    • Remove the GcsFuseCsiDriver dependency from the AWS deployment
    • Implement the s3fs-fuse mounting solution
    • Configure appropriate IAM permissions for S3 access
  2. Storage Class Updates

    • Create AWS-specific storage classes
    • Update volume provisioning for S3 integration
    • Maintain compatibility with existing PVC patterns
  3. Pod Configuration

    • Update user environment pod templates for s3fs-fuse
    • Configure S3 credentials and access patterns
    • Ensure proper filesystem permissions and security
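
One possible shape for the pod-side mount is sketched below. It assumes s3fs-fuse runs inside the user container (rather than behind a CSI driver) and that the pod grants access to `/dev/fuse` plus the `SYS_ADMIN` capability FUSE mounts usually need. `S3_BUCKET`, `MOUNT_POINT`, `AWS_REGION`, and `DRY_RUN` are hypothetical variables the pod template would set; `AWS_ROLE_ARN` and `AWS_WEB_IDENTITY_TOKEN_FILE` are the variables EKS injects for IRSA. The script only checks that they are present: whether a given s3fs-fuse build can consume the web-identity token directly has to be verified against the version deployed.

```shell
#!/usr/bin/env bash
# Sketch of a user-container entrypoint that mounts S3 with s3fs-fuse.
# S3_BUCKET, MOUNT_POINT, AWS_REGION, DRY_RUN: hypothetical pod-template vars.
# AWS_ROLE_ARN, AWS_WEB_IDENTITY_TOKEN_FILE: injected by EKS for IRSA.
set -euo pipefail

# Fail fast, naming every missing variable, so misconfigured pods are obvious.
check_env() {
  local var missing=0
  for var in S3_BUCKET MOUNT_POINT AWS_REGION AWS_ROLE_ARN AWS_WEB_IDENTITY_TOKEN_FILE; do
    if [[ -z "${!var:-}" ]]; then
      echo "error: ${var} is not set" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Build the s3fs command as an array so no argument is re-split by the shell.
build_mount_cmd() {
  MOUNT_CMD=(
    s3fs "$S3_BUCKET" "$MOUNT_POINT"
    -o "url=https://s3.${AWS_REGION}.amazonaws.com"
    # endpoint: region used for SigV4 request signing
    -o "endpoint=${AWS_REGION}"
    -o "uid=$(id -u)" -o "gid=$(id -g)"
  )
}

main() {
  check_env || exit 1
  build_mount_cmd
  if [[ "${DRY_RUN:-0}" == "1" ]]; then
    printf '%q ' "${MOUNT_CMD[@]}"
    printf '\n'
    return 0
  fi
  mkdir -p "$MOUNT_POINT"
  # s3fs daemonizes after a successful mount; a failure aborts via set -e.
  "${MOUNT_CMD[@]}"
  # Only now start the user's process (e.g. the notebook server).
  exec "$@"
}
```

An entrypoint would define these functions and end with `main "$@"`, so the user's command starts only after the bucket is mounted.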

Acceptance Criteria

  • s3fs-fuse successfully mounts S3 buckets in user environments
  • User environments can read/write to S3 storage transparently
  • Performance is comparable to the gcsfuse implementation
  • Authentication works correctly with IRSA
  • Storage management APIs work with S3 buckets
  • Existing user workflows remain functional
  • Proper error handling and logging are implemented
  • Documentation covers S3 storage configuration

Key Technical Differences

  • Authentication: IRSA vs Workload Identity
  • Mount Process: s3fs-fuse vs gcsfuse mounting
  • Performance: Different caching and performance characteristics
  • Configuration: S3-specific mount options and parameters
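
The S3-specific mount options could be collected in one place, as in the sketch below. The `S3FS_*` variable names and the default values are illustrative assumptions to be tuned against the gcsfuse baseline, not recommendations; the option names themselves (`uid`, `gid`, `umask`, `allow_other`, `stat_cache_expire`, `multipart_size`, `parallel_count`, `dbglevel`, `use_cache`) are standard s3fs-fuse/FUSE options. Note that `allow_other` requires `user_allow_other` in `/etc/fuse.conf` when the mount is made by a non-root user.

```shell
#!/usr/bin/env bash
# Sketch: assemble the comma-separated s3fs -o string from a few tunables.
# The S3FS_* variable names and defaults are illustrative assumptions, to be
# tuned against the gcsfuse baseline rather than taken as recommendations.
set -euo pipefail

s3fs_mount_options() {
  local opts=()
  # File ownership and permission mask as seen inside the mount.
  opts+=("uid=${S3FS_UID:-1000}" "gid=${S3FS_GID:-1000}" "umask=${S3FS_UMASK:-0022}")
  # Let processes other than the mounting user (e.g. a non-root notebook) use it.
  opts+=("allow_other")
  # Seconds before cached object metadata (stat results) expires.
  opts+=("stat_cache_expire=${S3FS_STAT_CACHE_SECONDS:-60}")
  # Upload part size in MB and number of parallel requests.
  opts+=("multipart_size=${S3FS_MULTIPART_MB:-64}" "parallel_count=${S3FS_PARALLEL:-5}")
  # s3fs log verbosity: crit, err, warn, or info.
  opts+=("dbglevel=${S3FS_LOG_LEVEL:-warn}")
  # Optional local disk cache; only enabled when a cache directory is given.
  if [[ -n "${S3FS_CACHE_DIR:-}" ]]; then
    opts+=("use_cache=${S3FS_CACHE_DIR}")
  fi
  local IFS=,
  echo "${opts[*]}"
}
```

On the command line this becomes `s3fs <bucket> <mount-point> -o "$(s3fs_mount_options)"`.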

Files to Modify

  • AWS deployment scripts for s3fs-fuse setup
  • Helm charts with AWS-specific storage configuration
  • User environment pod templates
  • Storage management backend code for S3 API integration
  • Documentation for AWS storage setup

Testing Requirements

  • Functional testing of S3 bucket mounting
  • Performance comparison with the gcsfuse (GCS) implementation
  • User environment compatibility testing
  • Error handling and edge case validation
  • Integration testing with CMBCluster storage APIs
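
For the functional and error-handling items, a minimal smoke test might look like the sketch below, assuming it runs inside a user environment against the configured mount path. `is_mounted` relies on the util-linux `mountpoint` command; the function names and the probe-file naming are hypothetical. It checks that the path is mounted and that a write, read-back, and delete all succeed, reporting which step failed.

```shell
#!/usr/bin/env bash
# Smoke test for an S3-backed mount inside a user environment (sketch).
# Function names and the probe-file prefix are illustrative, not CMBCluster tooling.
set -euo pipefail

# Success if the directory is a mount point (util-linux `mountpoint`).
is_mounted() {
  mountpoint -q "$1"
}

# Write a uniquely named file, read it back, delete it, then compare contents.
rw_roundtrip() {
  local dir=$1
  local probe="${dir}/.cmb-s3-probe-$$-${RANDOM}"
  local expected="probe $(date -u +%Y-%m-%dT%H:%M:%SZ) ${RANDOM}"
  local actual
  printf '%s' "$expected" > "$probe" || { echo "write failed: $probe" >&2; return 1; }
  actual=$(cat "$probe") || { echo "read failed: $probe" >&2; return 1; }
  rm -f "$probe" || { echo "delete failed: $probe" >&2; return 1; }
  if [[ "$actual" != "$expected" ]]; then
    echo "content mismatch in $probe" >&2
    return 1
  fi
}

smoke_test() {
  local dir=$1
  if ! is_mounted "$dir"; then
    echo "not a mount point: $dir" >&2
    return 1
  fi
  rw_roundtrip "$dir" && echo "ok: $dir is mounted and read/write works"
}
```

Inside a user pod this would be run as `smoke_test "$MOUNT_POINT"`; the same `rw_roundtrip` can be pointed at a gcsfuse mount on GCP for a side-by-side check.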

Related to

Epic #22 - Multi-Cloud Support

Definition of Done

  • S3 storage mounting works reliably in user environments
  • Feature parity with GCS storage is achieved
  • Performance is comparable to the gcsfuse implementation and meets user expectations
  • Integration is seamless for end users
  • Documentation is complete and accurate

Metadata


Assignees

No one assigned

Labels

aws: Amazon Web Services related
infrastructure: Infrastructure and deployment issues

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests
