Hi, I'm trying to run a simple "pytorch tensor add" on GPU under nsjail on a GCP nvidia-tesla-t4 node and I'm getting the following error.

nsjail_pytorch.cfg:
mount { src: "/home/current_user_ldap/pytorch_env" dst: "/home/current_user_ldap/pytorch_env" is_bind: true }
mount { src: "/dev/nvidia0" dst: "/dev/nvidia0" is_bind: true rw: true }
mount { src: "/dev/nvidiactl" dst: "/dev/nvidiactl" is_bind: true rw: true }
mount { src: "/dev/nvidia-uvm" dst: "/dev/nvidia-uvm" is_bind: true rw: true }
mount { src: "/usr" dst: "/usr" is_bind: true rw: true } # for libs
mount { src: "/lib64" dst: "/lib64" is_bind: true }
mount { src: "/lib" dst: "/lib" is_bind: true rw: true }
cwd: "/home/current_user_ldap/pytorch_env/"
Running simple PyTorch Tensor Add on CPU works:

nsjail -Mo --chroot / --rlimit_nproc 6553 --rlimit_fsize inf --rlimit_as inf -- /usr/bin/python3 -c "import torch; a = torch.tensor([1.0, 2.0], device='cpu') + torch.tensor([3.0, 4.0], device='cpu'); print(a)"

This prints the expected tensor output of [4, 6].
Running simple PyTorch Tensor Add on GPU fails:

nsjail -Mo --config nsjail_pytorch.cfg --chroot / --rlimit_nproc 6553 --rlimit_fsize inf --rlimit_as inf -- /usr/bin/python3 -c "import torch; print(torch.cuda.is_available());"

[I][2024-08-10T02:03:04+0000] Mode: STANDALONE_ONCE
[I][2024-08-10T02:03:04+0000] Jail parameters: hostname:'NSJAIL', chroot:'/', process:'/usr/bin/python3', bind:[::]:0, max_conns:0, max_conns_per_ip:0, time_limit:600, personality:0, daemonize:false, clone_newnet:true, clone_newuser:true, clone_newns:true, clone_newpid:true, clone_newipc:true, clone_newuts:true, clone_newcgroup:true, clone_newtime:false, keep_caps:false, disable_no_new_privs:false, max_cpus:0
[I][2024-08-10T02:03:04+0000] Mount: '/' -> '/' flags:MS_RDONLY|MS_BIND|MS_REC|MS_PRIVATE type:'' options:'' dir:true
[I][2024-08-10T02:03:04+0000] Mount: '/home/current_user_ldap/pytorch_env' -> '/home/current_user_ldap/pytorch_env' flags:MS_RDONLY|MS_BIND|MS_REC|MS_PRIVATE type:'' options:'' dir:true
[I][2024-08-10T02:03:04+0000] Mount: '/dev/nvidia0' -> '/dev/nvidia0' flags:MS_BIND|MS_REC|MS_PRIVATE type:'' options:'' dir:false
[I][2024-08-10T02:03:04+0000] Mount: '/dev/nvidiactl' -> '/dev/nvidiactl' flags:MS_BIND|MS_REC|MS_PRIVATE type:'' options:'' dir:false
[I][2024-08-10T02:03:04+0000] Mount: '/dev/nvidia-uvm' -> '/dev/nvidia-uvm' flags:MS_BIND|MS_REC|MS_PRIVATE type:'' options:'' dir:false
[I][2024-08-10T02:03:04+0000] Mount: '/usr' -> '/usr' flags:MS_BIND|MS_REC|MS_PRIVATE type:'' options:'' dir:true
[I][2024-08-10T02:03:04+0000] Mount: '/lib64' -> '/lib64' flags:MS_RDONLY|MS_BIND|MS_REC|MS_PRIVATE type:'' options:'' dir:true
[I][2024-08-10T02:03:04+0000] Mount: '/lib' -> '/lib' flags:MS_BIND|MS_REC|MS_PRIVATE type:'' options:'' dir:true
[I][2024-08-10T02:03:04+0000] Uid map: inside_uid:1002 outside_uid:1002 count:1 newuidmap:false
[I][2024-08-10T02:03:04+0000] Gid map: inside_gid:1003 outside_gid:1003 count:1 newgidmap:false
[I][2024-08-10T02:03:06+0000] Executing '/usr/bin/python3' for '[STANDALONE MODE]'
/home/current_user_ldap/.local/lib/python3.9/site-packages/torch/cuda/__init__.py:128: UserWarning: CUDA initialization: Unexpected error from cudaGetDeviceCount(). Did you run some cuda functions before calling NumCudaDevices() that might have already set an error? Error 304: OS call failed or operation not supported on this OS (Triggered internally at ../c10/cuda/CUDAFunctions.cpp:108.)
  return torch._C._cuda_getDeviceCount() > 0
False
[I][2024-08-10T02:03:08+0000] pid=28434 ([STANDALONE MODE]) exited with status: 0, (PIDs left: 0)
NVIDIA-SMI runs fine under nsjail:

nsjail -Mo --config nsjail_pytorch.cfg --chroot / --rlimit_nproc 6553 --rlimit_as inf -- /bin/nvidia-smi

The above successfully prints the actual nvidia-smi output.

Notes

This doesn't look like a PyTorch or host issue, given that PyTorch works on the GPU without nsjail. Any help appreciated.
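Since nvidia-smi works but CUDA initialization fails with "OS call failed", it may help to compare how the bind-mounted device nodes look to the process inside and outside the jail. The following is only a diagnostic sketch, not a fix: `describe_device_nodes` is a hypothetical helper name, and the node list is simply copied from nsjail_pytorch.cfg above.

```python
import os
import stat


def describe_device_nodes(paths):
    """Report, for each path, whether it exists, is a character device,
    and is readable/writable by the current (possibly jailed) user."""
    report = []
    for path in paths:
        entry = {"path": path, "exists": False, "char_device": False,
                 "readable": False, "writable": False}
        try:
            st = os.stat(path)
        except OSError:
            # Missing node or no permission to stat it: leave defaults.
            report.append(entry)
            continue
        entry["exists"] = True
        entry["char_device"] = stat.S_ISCHR(st.st_mode)
        entry["readable"] = os.access(path, os.R_OK)
        entry["writable"] = os.access(path, os.W_OK)
        report.append(entry)
    return report


# Device nodes taken from the mounts in nsjail_pytorch.cfg.
NVIDIA_NODES = ["/dev/nvidia0", "/dev/nvidiactl", "/dev/nvidia-uvm"]

if __name__ == "__main__":
    for entry in describe_device_nodes(NVIDIA_NODES):
        print(entry)
    # Identity as seen by this process; may differ inside a user namespace.
    print("uid:", os.getuid(), "gid:", os.getgid(), "groups:", os.getgroups())
```

Saving this under a path that is mounted in the jail and running it once directly and once through the same nsjail command (in place of the `-c "..."` argument) gives two reports to diff. Any difference in access bits or group membership would be a lead worth following, though it is not guaranteed to be the cause.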
Hi!
I was wondering if you ever figured this out? Running into this issue myself.
Can't vouch for whether or not this works with pytorch, as I'm using tensorflow myself, but I was able to get things working by adding

clone_newnet: false
clone_newuser: false
clone_newns: false
clone_newpid: false
clone_newipc: false
clone_newuts: false
clone_newcgroup: false

to my nsjail.cfg file.
Source: #232 (comment)
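Setting all seven switches to false turns off every namespace that the jail log above shows as enabled, so it may be worth narrowing down which one actually matters. A minimal sketch of one way to do that, assuming the all-false config is the known-good baseline: generate one config fragment per namespace with only that namespace re-enabled, then test each in turn. The helper names `config_fragment` and `bisect_variants` are hypothetical; only the `clone_new*` field names come from the comment above.

```python
# Namespace switches listed in the workaround above.
NAMESPACE_FLAGS = [
    "clone_newnet", "clone_newuser", "clone_newns", "clone_newpid",
    "clone_newipc", "clone_newuts", "clone_newcgroup",
]


def config_fragment(enabled):
    """Render one 'flag: true|false' line per namespace switch.

    `enabled` is the set of flags to render as true; all others are false.
    """
    return "\n".join(
        f"{flag}: {'true' if flag in enabled else 'false'}"
        for flag in NAMESPACE_FLAGS
    )


def bisect_variants():
    """Yield (flag, fragment) pairs where only `flag` is re-enabled."""
    for flag in NAMESPACE_FLAGS:
        yield flag, config_fragment({flag})


if __name__ == "__main__":
    for flag, fragment in bisect_variants():
        print(f"# variant: only {flag} enabled")
        print(fragment)
        print()
```

Each printed fragment can be pasted into a copy of the config in place of the seven lines above. If the GPU check starts failing again for exactly one variant, that namespace is the likely culprit and the others could presumably stay enabled.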