Skip to content
This repository was archived by the owner on Nov 23, 2020. It is now read-only.

add basic SFTP support#28

Open
mvoelske wants to merge 1 commit intoyuvipanda:masterfrom
mvoelske:sftp
Open

add basic SFTP support#28
mvoelske wants to merge 1 commit intoyuvipanda:masterfrom
mvoelske:sftp

Conversation

@mvoelske
Copy link

@mvoelske mvoelske commented May 2, 2020

See #26

This adds basic support for SFTP by delegating to an OpenSSH sftp-server subsystem running
in the user pod. Kind of work-in-progress: Getting this to work with asyncssh is a little ugly at the moment, and involves monkey-patching a function in the asyncssh code. SFTP file transfers into and out of the user pod work fine in my initial trials so far, but are a bit slow (<10 MB/s over a gigabit line), maybe due to the rather verbose logging. SCP doesn't work, yet -- if you simply set allow_scp=True, the SCP file transfers end up in the kubessh pod instead of the user pod. Also not 100% sure if downloading the sftp-server binary from github during the docker build is the best way to get it in there. Comments welcome :)

add support for SFTP by delegating to an OpenSSH sftp-server subsystem running
in the user pod. Doesn't support SCP, yet.
Copy link
Owner

@yuvipanda yuvipanda left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is beautiful :) I've left a couple comments about approaches...

How do you think we can test this? I think the project is now at a stage where tests are becoming necessary...

Either way, this is amazing work! I'm very excited to land this!

import asyncssh.stream
import asyncssh.sftp

def run_override(sftp_server, reader, writer):
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you think we can turn this into a function we explicitly call somewhere, instead of automatically running on import? I don't have much experience with monkey patching, however - so if you think this is the best way to do this, I'm ok with that.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's fine to have this in a function, probably better to make it explicit

parent=self.parent, namespace=self.namespace,
username=self.username).pod_name
# ensure sftp-server binary is copied into the user pod
self._run_setup_command(
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of running kubectl cp, how about:

  1. We create a small container image that just has sftp-binary
  2. Set up an emptyDir volume by default in our launched pod
  3. Run an initContainer with our sftp-binary image, where we copy the binary to the emptyDir.
  4. When running kubectl exec, we just refer to the binary path in the emptyDir volume, which we will know.

This has a few advantages over kubectl cp:

  1. sftp-binary image is pulled only once per node, and should be pretty small
  2. The initContainer should execute pretty quickly, since the image is already there
  3. We can put other binaries (or whole packages) we want in the emptyDir volume later if we need
  4. SFTP can begin immediately, without us needing to copy a binary in.

What do you think of this suggestion?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great idea! Plus, it removes the need to have tar available in the user image, as would be needed for kubectl cp to work.

config=True
)

def __init__(self, server, reader, writer):
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you think you can write docstrings for each of these methods? Would be very helpful in deocoding what exactly is happening here.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, will do!


def _run_setup_command(self, *kubectl_command):
self.logger.info(f'executing: {kubectl_command}')
cmd = subprocess.Popen(kubectl_command)
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can this be async? I think .wait() is blocking, so this would cause the entire server (not just this connection) to block until the request completes. Also I'm slightly confused about us starting the process and immediately waiting for it to terminate. Can you expand a little on what that does?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're right, this should probably be async. It only executes one-off setup commands such as copying the sftp binary into the pod, and making sure it has execute permissions, that's why the immediate terminate. With the volume-based solution you suggested, none of these steps might be needed anymore, though :)

@mvoelske
Copy link
Author

mvoelske commented May 4, 2020

Regarding testing: I've kept the SFTP-related code in the superclass that doesn't know about kubernetes yet, because I thought at least that part should be fairly straightforward to write an automated test for. Will include that, too!

@yuvipanda
Copy link
Owner

Awesome! I'm really excited by where this can go!

@yuvipanda
Copy link
Owner

Hello! Thank you for your interest :) I'm currently focusing on jupyterhub-ssh instead, so I've marked this project as inactive. This PR was great, and I'm very grateful for your work here. I've SFTP working there, but with much more contraints - https://github.com/yuvipanda/jupyterhub-ssh/tree/main/jupyterhub-sftp. Hope you'll participate there too.

Thank you! <3

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants