Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Files over 128k not of size (N * 128k) create a "file too large for filesystem" error when using rdma #595

Open
consp opened this issue Jul 30, 2023 · 17 comments

Comments

@consp
Copy link

consp commented Jul 30, 2023

It has been a while since I used rdma since I switched to windows 11, trying to get it working again.

Issue

See title, files with offsets other than blocks of 128k will not transfer properly.

Setup

Version of kmod: git master
Version of tools: git master
kernel: 6.4.7-xanmod with patch for mellanox adapter to be recognised as rdma device
Changes from default: 8 rdma connection instead of 2
Adapter: mellanox ConnectX-4 EN 100gbit, winof2 on windows, inbox on linux
Client OS: Windows 11 for workstations
Server OS: Debian bookworm

Expected behaviour

Files copy normally

To reproduce

dd if=/dev/urandom of=127k.bin bs=1024 count=127
dd if=/dev/urandom of=128k.bin bs=1024 count=128
dd if=/dev/urandom of=129k.bin bs=1024 count=129

copy files from ksmbd location (linux, zfs) to local storage (windows, ntfs)

Result

image

Also tried from ext4 partition: same problem to avoid zfs being the issue.

Afterwards I created a 256k file since my testfile of 8gb copied fine:

dd if=/dev/urandom of=256k.bin bs=1024 count=256

Which worked. (not included in logs)

It look like there is some issue with non 128k sized files (offset issue?). The part which stands out:

[48394.612054] ksmbd: smb_direct: RDMA write, len 0x20000, needed credits 0x1
[48394.612572] ksmbd: filename 129k.bin, offset 131072, len 131072
[48394.614189] ksmbd: nbytes 1024, offset 132096 mincount 0
[48394.614585] ksmbd: Failed to process 8 [-22]

Attached are the logs (ksmbd.control -d all), somewhat sanitized

Additional notes:
Sometimes windows acts like it is transfering the file twice.
result.log

Note: verified it's actually RDMA, RSS works fine.

@consp
Copy link
Author

consp commented Jul 30, 2023

Did some digging but I'm now in over my head:

Error -EINVAL originates here:

 static int smb_direct_rdma_xmit(struct smb_direct_transport *t,
                 void *buf, int buf_len,
                 struct smb2_buffer_desc_v1 *desc,
                 unsigned int desc_len,
                 bool is_read)
 {
...
some code
....
for (i = 0; i < desc_len / sizeof(*desc); i++) {
         desc_buf_len = le32_to_cpu(desc[i].length);

         credits_needed += calc_rw_credits(t, desc_buf, desc_buf_len);
         desc_buf += desc_buf_len;
         total_length += desc_buf_len;
         if (desc_buf_len == 0 || total_length > buf_len ||
             total_length > t->max_rdma_rw_size)
         {
             ksmbd_debug(RDMA, "TEST RDMA %08X %08X %08X %08X", desc_buf_len, total_length, buf_len, t->max_rdma_rw_size);
             return -EINVAL;
         }
     }

Logging is my addition.

Is where it errors, seems total_length > buf_len triggers the exit. (hex since I do now know the value representation)
ksmbd: smb_direct: TEST RDMA 00020000 00020000 00000400 01000000

@namjaejeon
Copy link
Member

Thanks for your report! I will try to reproduce it.

@consp
Copy link
Author

consp commented Jul 31, 2023

I tried to understand it a bit and this seems to work. I'm not completely sure it is correct but transfering a large sum of files with many different small filesizes seems to work now, it looks like the read descriptor being set to 128k is the issue: (update to match pr)

diff --git a/smb2pdu.c b/smb2pdu.c
index 591d899..e9749fc 100644
--- a/smb2pdu.c
+++ b/smb2pdu.c
@@ -6792,7 +6792,13 @@ static ssize_t smb2_read_rdma_channel(struct ksmbd_work *work,
                                      size_t length)
 {
        int err;
+    struct smb2_buffer_desc_v1 *desc;

+    /* set descriptor lenght to nbytes if less than request size */
+    if (length < req->Length) {
+        desc = (struct smb2_buffer_desc_v1 *)((char *)req + le16_to_cpu(req->ReadChannelInfoOffset));
+        desc->length = cpu_to_le32(length);
+    }
        err = ksmbd_conn_rdma_write(work->conn, data_buf, length,
                                    (struct smb2_buffer_desc_v1 *)
                                    ((char *)req + le16_to_cpu(req->ReadChannelInfoOffset)),

Tested with (all seems to do fine):

#! /bin/bash
for n in {1..1000}; do
    dd if=/dev/zero of=file$( printf %03d "$n" ).bin bs=1024 count=$(( 128 + RANDOM / 1024 ))
done

Result master branch:

image

Result fix I made later (better place to do it):

image

@namjaejeon
Copy link
Member

@consp Hm.. I can not reproduce it with windows 10 workstation. In my case, multichannel -> RDMA is changed in the middle of copying while a lot of data is being copied. It seems that Windows cannot transfer such small files with RDMA. How did you get RDMA to transfer even for small files?

@namjaejeon
Copy link
Member

@consp I have added the patch for this issue? Can you review and verify it(namjaejeon@e255ebe) ?

git clone https://github.com/namjaejeon/ksmbd --branch=rdma

@namjaejeon
Copy link
Member

@consp ping ?

@nitroxis
Copy link

nitroxis commented Nov 8, 2023

Hi, I have the same issue. I can't clone your branch though, it says it doesn't exist. Did you delete it, or did you merge the fix into the master branch?

@nitroxis
Copy link

nitroxis commented Nov 8, 2023

I don't know if it's related but ksmbd is also spamming these lines into the kernel log non-stop:

...
[27547.422677] ksmbd: Invalid Buffer Size Requested
[27547.429593] ksmbd: Invalid Buffer Size Requested
[27547.435637] ksmbd: Invalid Buffer Size Requested
...

I'm using Windows 10 and Linux 6.6.0 ksmbd module over RDMA (ConnectX-3)

@nitroxis
Copy link

nitroxis commented Nov 8, 2023

I've tried with the master branch of https://github.com/namjaejeon/ksmbd, I still get these errors. Copying it from Windows onto Linux/ksmbd worked, but moving or renaming the file still produces "file too large" errors:
image

@namjaejeon
Copy link
Member

@nitroxis Can you check the below branch if problem is still happening ?

git clone https://github.com/consp/ksmbd --branch=reduce-descriptor-length

@namjaejeon
Copy link
Member

@nitroxis ping?

@nitroxis
Copy link

Hi, sorry for the late response. I'm unable to compile that branch against kernel 6.6.0 due to various errors:

make -C /lib/modules/6.6.0-arch1-1-custom/build M=/home/nitroxis/src/ksmbd2 modules
make[1]: Entering directory '/usr/lib/modules/6.6.0-arch1-1-custom/build'
  CC [M]  /home/nitroxis/src/ksmbd2/vfs.o
/home/nitroxis/src/ksmbd2/vfs.c: In function ‘ksmbd_vfs_fill_dentry_attrs’:
/home/nitroxis/src/ksmbd2/vfs.c:3000:33: warning: passing argument 2 of ‘generic_fillattr’ makes integer from pointer without a cast [-Wint-conversion]
 3000 |         generic_fillattr(idmap, d_inode(dentry), ksmbd_kstat->kstat);
      |                                 ^~~~~~~~~~~~~~~
      |                                 |
      |                                 struct inode *
In file included from /home/nitroxis/src/ksmbd2/vfs.c:8:
./include/linux/fs.h:3016:43: note: expected ‘u32’ {aka ‘unsigned int’} but argument is of type ‘struct inode *’
 3016 | void generic_fillattr(struct mnt_idmap *, u32, struct inode *, struct kstat *);
      |                                           ^~~
/home/nitroxis/src/ksmbd2/vfs.c:3000:61: error: passing argument 3 of ‘generic_fillattr’ from incompatible pointer type [-Werror=incompatible-pointer-types]
 3000 |         generic_fillattr(idmap, d_inode(dentry), ksmbd_kstat->kstat);
      |                                                  ~~~~~~~~~~~^~~~~~~
      |                                                             |
      |                                                             struct kstat *
./include/linux/fs.h:3016:48: note: expected ‘struct inode *’ but argument is of type ‘struct kstat *’
 3016 | void generic_fillattr(struct mnt_idmap *, u32, struct inode *, struct kstat *);
      |                                                ^~~~~~~~~~~~~~
/home/nitroxis/src/ksmbd2/vfs.c:3000:9: error: too few arguments to function ‘generic_fillattr’
 3000 |         generic_fillattr(idmap, d_inode(dentry), ksmbd_kstat->kstat);
      |         ^~~~~~~~~~~~~~~~
./include/linux/fs.h:3016:6: note: declared here
 3016 | void generic_fillattr(struct mnt_idmap *, u32, struct inode *, struct kstat *);
      |      ^~~~~~~~~~~~~~~~
cc1: some warnings being treated as errors
make[3]: *** [scripts/Makefile.build:243: /home/nitroxis/src/ksmbd2/vfs.o] Error 1
make[2]: *** [/usr/lib/modules/6.6.0-arch1-1-custom/build/Makefile:1913: /home/nitroxis/src/ksmbd2] Error 2
make[1]: *** [Makefile:234: __sub-make] Error 2
make[1]: Leaving directory '/usr/lib/modules/6.6.0-arch1-1-custom/build'
make: *** [Makefile:47: all] Error 2

@nitroxis
Copy link

Nevermind, I merged with cifsd-team/ksmbd master, that compiled successfully. However, loading the new module still leads to the same Windows error message as before.

@nitroxis
Copy link

nitroxis commented Nov 16, 2023

I am not sure if it's related to this issue, but opening files with some programs also leads to errors. For example, opening a small 1kB file in HxD gives me the following error:
image
Opening text files with VSCode simply shows an empty file. I guess it too encounters an error somewhere, but doesn't propagate it properly in the UI.
Playing a video file with mpv or opening an image with XnView on the other hand works fine, so I'm guessing it's only a specific function/request that fails, which is only triggered by some programs but not by others.

This also happens on master branch, not just the reduce-descriptor-length branch

@nitroxis
Copy link

Now VSCode actually displayed the following error:
image

@namjaejeon
Copy link
Member

@nitroxis Can you give me wireshark dump(or tcpdump) on problem situation ? Acutally I didn't reproduce it with windows client before. Once, want to check packets that you will give on your reproduction environment.

@namjaejeon
Copy link
Member

@nitroxis please give also me debug prints after enabling RDMA log(sudo ksmbd.control -d rdma).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants