Message ID | 20241016-fuse-uring-for-6-10-rfc4-v4-0-9739c753666e@ddn.com (mailing list archive) |
---|---|
Series | fuse: fuse-over-io-uring |
Please note that this is only a preview to show the current status. V5
should follow soon to separate the headers into their own buffer.
I actually hope that this v4 is the last RFC version.

Thanks,
Bernd
On 2024-10-15 17:05, Bernd Schubert wrote:
[...]
>
> The corresponding libfuse patches are on my uring branch,
> but need cleanup for submission - will happen during the next
> days.
> https://github.com/bsbernd/libfuse/tree/uring
>
> Testing with that libfuse branch is possible by running something
> like:
>
> example/passthrough_hp -o allow_other --debug-fuse --nopassthrough \
> --uring --uring-per-core-queue --uring-fg-depth=1 --uring-bg-depth=1 \
> /scratch/source /scratch/dest
>
> With the --debug-fuse option one should see CQE in the request type,
> if requests are received via io-uring:
>
> cqe unique: 4, opcode: GETATTR (3), nodeid: 1, insize: 16, pid: 7060
> unique: 4, result=104
>
> Without the --uring option "cqe" is replaced by the default "dev"
>
> dev unique: 4, opcode: GETATTR (3), nodeid: 1, insize: 56, pid: 7117
> unique: 4, success, outsize: 120

Hi Bernd, I applied this patchset to io_uring-6.12 branch with some
minor conflicts. I'm running the following command:

$ sudo ./build/example/passthrough_hp -o allow_other --debug-fuse --nopassthrough \
--uring --uring-per-core-queue --uring-fg-depth=1 --uring-bg-depth=1 \
/home/vmuser/scratch/source /home/vmuser/scratch/dest
FUSE library version: 3.17.0
Creating ring per-core-queue=1 sync-depth=1 async-depth=1 arglen=1052672
dev unique: 2, opcode: INIT (26), nodeid: 0, insize: 104, pid: 0
INIT: 7.40
flags=0x73fffffb
max_readahead=0x00020000
INIT: 7.40
flags=0x4041f429
max_readahead=0x00020000
max_write=0x00100000
max_background=0
congestion_threshold=0
time_gran=1
unique: 2, success, outsize: 80

I created the source and dest folders which are both empty.

I see the following in dmesg:

[ 2453.197510] uring is disabled
[ 2453.198525] uring is disabled
[ 2453.198749] uring is disabled
...

If I then try to list the directory /home/vmuser/scratch:

$ ls -l /home/vmuser/scratch
ls: cannot access 'dest': Software caused connection abort

And passthrough_hp terminates.

My kconfig:

CONFIG_FUSE_FS=m
CONFIG_FUSE_PASSTHROUGH=y
CONFIG_FUSE_IO_URING=y

I'll look into it next week but, do you see anything obviously wrong?
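The --debug-fuse output above marks each request with "cqe" when it arrived via io-uring and "dev" when it arrived via /dev/fuse. A minimal sketch of counting requests per transport in a captured log follows. The regular expression is inferred only from the sample lines quoted in this thread, and the function name is hypothetical, so treat both as assumptions rather than a stable libfuse log format.

```python
import re
from collections import Counter

# Pattern inferred from the sample debug lines in this thread only;
# libfuse does not promise this format.
REQUEST_LINE = re.compile(
    r"^(?P<transport>cqe|dev) unique: (?P<unique>\d+), "
    r"opcode: (?P<opcode>\w+) \((?P<code>\d+)\)"
)


def count_transports(log_text):
    """Count request lines per prefix: "cqe" = io-uring, "dev" = /dev/fuse."""
    counts = Counter()
    for line in log_text.splitlines():
        match = REQUEST_LINE.match(line.strip())
        if match:
            counts[match.group("transport")] += 1
    return counts


# Sample lines copied from the messages above.
sample = """\
dev unique: 2, opcode: INIT (26), nodeid: 0, insize: 104, pid: 0
cqe unique: 4, opcode: GETATTR (3), nodeid: 1, insize: 16, pid: 7060
unique: 4, result=104
"""

print(count_transports(sample))
```

In the failing run reported above, only "dev" lines appear, which matches the "uring is disabled" messages in dmesg.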
Hi David,

On 10/21/24 06:06, David Wei wrote:
> On 2024-10-15 17:05, Bernd Schubert wrote:
> [...]
>
> I see the following in dmesg:
>
> [ 2453.197510] uring is disabled
> [ 2453.198525] uring is disabled
> [ 2453.198749] uring is disabled
> ...
>
> If I then try to list the directory /home/vmuser/scratch:
>
> $ ls -l /home/vmuser/scratch
> ls: cannot access 'dest': Software caused connection abort
>
> And passthrough_hp terminates.
> [...]
> I'll look into it next week but, do you see anything obviously wrong?

Thanks for testing it! I just pushed a fix to my libfuse branches to
avoid the abort for -EOPNOTSUPP. It will gracefully fall back to
/dev/fuse IO now.

Could you please use the rfcv4 branch, as the plain uring
branch will soon get incompatible updates for rfc5?

https://github.com/bsbernd/libfuse/tree/uring-for-rfcv4

The short answer to let you enable fuse-io-uring:

echo 1 >/sys/module/fuse/parameters/enable_uring

(With that the "uring is disabled" should be fixed.)

The long answer for Miklos and others:

IOCTL removal introduced a design issue, as now fuse-client (kernel)
does not know if fuse-server/libfuse wants to set up io-uring
communication.
It is not even possible to forbid FUSE_URING_REQ_FETCH after the
FUSE_INIT reply, as io-uring is async. What happens is that fuse-client
(kernel) receives all FUSE_URING_REQ_FETCH commands only after the
FUSE_INIT reply, even though FUSE_URING_REQ_FETCH is sent out from
libfuse *before* replying to FUSE_INIT. I had also added a comment
for that into the code.

And the other issue is that libfuse now does not know if the kernel
supports fuse-io-uring. That has some implications:

- libfuse cannot write at startup time a clear error message like
  "Kernel does not support fuse-over-io-uring, falling back to /dev/fuse IO"
- In the fallback code path one might want to adjust the number of
  libfuse /dev/fuse threads if io-uring is not supported - with
  io-uring typically one thread might be sufficient - to handle
  FUSE_INTERRUPT.

My suggestion is that we introduce the new FUSE_URING_REQ_REGISTER (or
replace FUSE_URING_REQ_FETCH with that) and then wait in fuse-server
for completion of that command before sending out FUSE_URING_REQ_FETCH.

Thanks,
Bernd
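The ordering proposed above (register first, wait for its completion, and only then send FETCH) can be sketched as a small simulation. Everything here is hypothetical: `FakeKernel` and `choose_transport` are illustration-only names, the kernel is modelled as a plain object rather than an io-uring submission, and the error codes for unexpected cases are assumptions. Only the command names and the -EOPNOTSUPP fallback come from the thread.

```python
import errno
import os


class FakeKernel:
    """Hypothetical stand-in for fuse-client (kernel); not a real API."""

    def __init__(self, uring_supported):
        self.uring_supported = uring_supported
        self.registered = False

    def submit(self, command):
        """Return 0 on success or a negative errno, like a CQE result."""
        if command == "FUSE_URING_REQ_REGISTER":
            if not self.uring_supported:
                return -errno.EOPNOTSUPP
            self.registered = True
            return 0
        if command == "FUSE_URING_REQ_FETCH":
            # Assumption: FETCH without a prior REGISTER is rejected.
            return 0 if self.registered else -errno.EINVAL
        return -errno.EINVAL


def choose_transport(kernel):
    """Register, wait for the result, then either fetch or fall back."""
    result = kernel.submit("FUSE_URING_REQ_REGISTER")
    if result == -errno.EOPNOTSUPP:
        # The server now knows at startup that io-uring is unavailable and
        # can print a clear message and size its /dev/fuse threads.
        return "dev"
    if result < 0:
        raise OSError(-result, os.strerror(-result))

    result = kernel.submit("FUSE_URING_REQ_FETCH")
    if result < 0:
        raise OSError(-result, os.strerror(-result))
    return "uring"


print(choose_transport(FakeKernel(uring_supported=True)))
print(choose_transport(FakeKernel(uring_supported=False)))
```

The point of the extra round trip is that both sides learn the outcome before any FETCH is in flight, which addresses the two problems listed above.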
On 2024-10-21 04:47, Bernd Schubert wrote:
[...]
> Could you please use the rfcv4 branch, as the plain uring
> branch will soon get incompatible updates for rfc5?
>
> https://github.com/bsbernd/libfuse/tree/uring-for-rfcv4
>
> The short answer to let you enable fuse-io-uring:
>
> echo 1 >/sys/module/fuse/parameters/enable_uring
>
> (With that the "uring is disabled" should be fixed.)

Thanks, using this branch fixed the issue and now I can see the dest
folder mirroring that of the source folder. There are two issues I
noticed:

[63490.068211] ---[ end trace 0000000000000000 ]---
[64010.242963] BUG: sleeping function called from invalid context at include/linux/sched/mm.h:330
[64010.243531] in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 11057, name: fuse-ring-1
[64010.244092] preempt_count: 1, expected: 0
[64010.244346] RCU nest depth: 0, expected: 0
[64010.244599] 2 locks held by fuse-ring-1/11057:
[64010.244886]  #0: ffff888105db20a8 (&ctx->uring_lock){+.+.}-{3:3}, at: __do_sys_io_uring_enter+0x900/0xd80
[64010.245476]  #1: ffff88810f941818 (&fc->lock){+.+.}-{2:2}, at: fuse_uring_cmd+0x83e/0x1890 [fuse]
[64010.246031] CPU: 1 UID: 0 PID: 11057 Comm: fuse-ring-1 Tainted: G W 6.11.0-10089-g0d2090ccdbbe #2
[64010.246655] Tainted: [W]=WARN
[64010.246853] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org 04/01/2014
[64010.247542] Call Trace:
[64010.247705]  <TASK>
[64010.247860]  dump_stack_lvl+0xb0/0xd0
[64010.248090]  __might_resched+0x2f8/0x510
[64010.248338]  __kmalloc_cache_noprof+0x2aa/0x390
[64010.248614]  ? lockdep_init_map_type+0x2cb/0x7b0
[64010.248923]  ? fuse_uring_cmd+0xcc2/0x1890 [fuse]
[64010.249215]  fuse_uring_cmd+0xcc2/0x1890 [fuse]
[64010.249506]  io_uring_cmd+0x214/0x500
[64010.249745]  io_issue_sqe+0x588/0x1810
[64010.249999]  ? __pfx_io_issue_sqe+0x10/0x10
[64010.250254]  ? io_alloc_async_data+0x88/0x120
[64010.250516]  ? io_alloc_async_data+0x88/0x120
[64010.250811]  ? io_uring_cmd_prep+0x2eb/0x9f0
[64010.251103]  io_submit_sqes+0x796/0x1f80
[64010.251387]  __do_sys_io_uring_enter+0x90a/0xd80
[64010.251696]  ? do_user_addr_fault+0x26f/0xb60
[64010.251991]  ? __pfx___do_sys_io_uring_enter+0x10/0x10
[64010.252333]  ? __up_read+0x3ba/0x750
[64010.252565]  ? __pfx___up_read+0x10/0x10
[64010.252868]  do_syscall_64+0x68/0x140
[64010.253121]  entry_SYSCALL_64_after_hwframe+0x76/0x7e
[64010.253444] RIP: 0033:0x7f03a03fb7af
[64010.253679] Code: 45 0f b6 90 d0 00 00 00 41 8b b8 cc 00 00 00 45 31 c0 41 b9 08 00 00 00 41 83 e2 01 41 c1 e2 04 41 09 c2 b8 aa 01 00 00 0f 05 <c3> a8 02 74 cc f0 48 83 0c 24 00 49 8b 40 20 8b 00 a8 01 74 bc b8
[64010.254801] RSP: 002b:00007f039f3ffd08 EFLAGS: 00000246 ORIG_RAX: 00000000000001aa
[64010.255261] RAX: ffffffffffffffda RBX: 0000561ab7c1ced0 RCX: 00007f03a03fb7af
[64010.255695] RDX: 0000000000000000 RSI: 0000000000000002 RDI: 0000000000000009
[64010.256127] RBP: 0000000000000002 R08: 0000000000000000 R09: 0000000000000008
[64010.256556] R10: 0000000000000000 R11: 0000000000000246 R12: 0000561ab7c1d7a8
[64010.256990] R13: 0000561ab7c1da00 R14: 0000561ab7c1d520 R15: 0000000000000001
[64010.257442]  </TASK>

If I am already in dest when I do the mount using passthrough_hp and
then e.g. ls, it hangs indefinitely even if I kill passthrough_hp.
On 10/21/24 22:57, David Wei wrote:
[...]
> Thanks, using this branch fixed the issue and now I can see the dest
> folder mirroring that of the source folder. There are two issues I
> noticed:
>
> [64010.242963] BUG: sleeping function called from invalid context at include/linux/sched/mm.h:330
> [64010.243531] in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 11057, name: fuse-ring-1
> [...]
> [64010.244599] 2 locks held by fuse-ring-1/11057:
> [64010.244886]  #0: ffff888105db20a8 (&ctx->uring_lock){+.+.}-{3:3}, at: __do_sys_io_uring_enter+0x900/0xd80
> [64010.245476]  #1: ffff88810f941818 (&fc->lock){+.+.}-{2:2}, at: fuse_uring_cmd+0x83e/0x1890 [fuse]
> [...]
> [64010.248090]  __might_resched+0x2f8/0x510
> [64010.248338]  __kmalloc_cache_noprof+0x2aa/0x390
> [64010.249215]  fuse_uring_cmd+0xcc2/0x1890 [fuse]
> [...]

Regarding issue one, does this patch solve it?

diff --git a/fs/fuse/dev_uring.c b/fs/fuse/dev_uring.c
index e518d4379aa1..304919bc12fb 100644
--- a/fs/fuse/dev_uring.c
+++ b/fs/fuse/dev_uring.c
@@ -168,6 +168,12 @@ static struct fuse_ring_queue *fuse_uring_create_queue(struct fuse_ring *ring,
 	queue = kzalloc(sizeof(*queue), GFP_KERNEL_ACCOUNT);
 	if (!queue)
 		return ERR_PTR(-ENOMEM);
+	pq = kcalloc(FUSE_PQ_HASH_SIZE, sizeof(struct list_head), GFP_KERNEL);
+	if (!pq) {
+		kfree(queue);
+		return ERR_PTR(-ENOMEM);
+	}
+
 	spin_lock(&fc->lock);
 	if (ring->queues[qid]) {
 		spin_unlock(&fc->lock);
@@ -186,11 +192,6 @@ static struct fuse_ring_queue *fuse_uring_create_queue(struct fuse_ring *ring,
 	INIT_LIST_HEAD(&queue->ent_in_userspace);
 	INIT_LIST_HEAD(&queue->fuse_req_queue);
 
-	pq = kcalloc(FUSE_PQ_HASH_SIZE, sizeof(struct list_head), GFP_KERNEL);
-	if (!pq) {
-		kfree(queue);
-		return ERR_PTR(-ENOMEM);
-	}
 	queue->fpq.processing = pq;
 	fuse_pqueue_init(&queue->fpq);

I think we don't need GFP_ATOMIC, but can do allocations before taking
the lock. This pq allocation is new in v4 and I forgot to put it into
the right place and it slipped through my very basic testing (I'm
concentrating on the design changes for now - testing will come back
with v6).

>
> If I am already in dest when I do the mount using passthrough_hp and
> then e.g. ls, it hangs indefinitely even if I kill passthrough_hp.

I'm going to check in a bit. I hope it is not a recursion issue.

Thanks,
Bernd
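The fix above follows a general pattern: do every allocation that may sleep before entering the critical section, then only check and publish under the lock. The following is a user-space analogy in Python, not kernel code. Python has no atomic context, so it only illustrates the ordering. `Ring`, `create_queue`, the dictionary layout, the hash size of 256, and returning the existing queue when the slot is already taken are all assumptions made for the sketch.

```python
import threading

HASH_SIZE = 256  # illustrative stand-in for FUSE_PQ_HASH_SIZE (assumed value)


class Ring:
    """Hypothetical analogue of the ring: a lock plus per-qid queue slots."""

    def __init__(self, nr_queues):
        self.lock = threading.Lock()
        self.queues = [None] * nr_queues


def create_queue(ring, qid):
    # Step 1: allocate everything up front, outside the lock. In the kernel
    # this is where GFP_KERNEL allocations are allowed to sleep.
    queue = {
        "qid": qid,
        "processing": [[] for _ in range(HASH_SIZE)],
    }

    # Step 2: take the lock only to check the slot and publish the result.
    with ring.lock:
        existing = ring.queues[qid]
        if existing is not None:
            # Someone else created the queue first; our allocation is
            # dropped (in C it would have to be freed explicitly).
            return existing
        ring.queues[qid] = queue
        return queue


ring = Ring(nr_queues=2)
first = create_queue(ring, 0)
second = create_queue(ring, 0)
print(first is second)
```

The trade-off is an occasional wasted allocation when two callers race, in exchange for never allocating while the spinlock is held.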
On 10/22/24 12:24, Bernd Schubert wrote:
> On 10/21/24 22:57, David Wei wrote:
>> If I am already in dest when I do the mount using passthrough_hp and
>> then e.g. ls, it hangs indefinitely even if I kill passthrough_hp.
>
> I'm going to check in a bit. I hope it is not a recursion issue.
>

Hmm, I cannot reproduce this

bernd@squeeze1 dest>pwd
/scratch/dest

bernd@squeeze1 dest>/home/bernd/src/libfuse/github//build-debian/example/passthrough_hp -o allow_other --nopassthrough --uring --uring-per-core-queue --uring-fg-depth=1 --uring-bg-depth=1 /scratch/source /scratch/dest

bernd@squeeze1 dest>ll
total 6.4G
drwxr-xr-x 2 fusetests fusetests 4.0K Jul 30 17:59 scratch_mnt
drwxr-xr-x 2 fusetests fusetests 4.0K Jul 30 17:59 test_dir
-rw-r--r-- 1 bernd bernd 50G Sep 12 14:20 testfile
-rwxr-xr-x 1 bernd bernd 6.3G Sep 12 14:39 testfile1

Same when running in foreground and doing operations from another console

cqe unique: 4, opcode: GETATTR (3), nodeid: 1, insize: 16, pid: 732
unique: 4, result=104
cqe unique: 6, opcode: STATFS (17), nodeid: 1, insize: 0, pid: 732
unique: 6, result=80

In order to check it is not a recursion issue I also switched my VM to
one core - still no issue. What is your setup?
Also, I'm still on 6.10. I want to send out v5 with separated headers
later this week, and v6 (maybe without RFC) for 6.12 next week.

Thanks,
Bernd
On 2024-10-22 05:46, Bernd Schubert wrote:
[...]
> In order to check it is not a recursion issue I also switched my VM to
> one core - still no issue. What is your setup?

I tried this again and could not repro anymore. I think your latest
libfuse that falls back to /dev/fuse fixed it. Sorry for the noise!
On 2024-10-22 03:24, Bernd Schubert wrote:
[...]
> Regarding issue one, does this patch solve it?
>
> [...]
>
> I think we don't need GFP_ATOMIC, but can do allocations before taking
> the lock. This pq allocation is new in v4 and I forgot to put it into
> the right place and it slipped through my very basic testing (I'm
> concentrating on the design changes for now - testing will come back
> with v6).

Thanks, this patch fixed it for me.
On 2024-10-15 17:05, Bernd Schubert wrote:
> RFCv1 and RFCv2 have been tested with multiple xfstest runs in a VM
> (32 cores) with a kernel that has several debug options
> enabled (like KASAN and MSAN). RFCv3 is not that well tested yet.
> O_DIRECT is currently not working well with /dev/fuse and
> also these patches, a patch has been submitted to fix that (although
> the approach is refused)
> https://www.spinics.net/lists/linux-fsdevel/msg280028.html

Hi Bernd, I applied this patch and the associated libfuse patch at:

https://github.com/bsbernd/libfuse/tree/aligned-writes

I have a simple Python FUSE client that is still returning EINVAL for
write():

with open(sys.argv[1], 'r+b') as f:
    mmapped_file = mmap.mmap(f.fileno(), 0)
    shm = shared_memory.SharedMemory(create=True, size=mmapped_file.size())
    shm.buf[:mmapped_file.size()] = mmapped_file[:]
    fd = os.open("/home/vmuser/scratch/dest/out", O_RDWR|O_CREAT|O_DIRECT)
    with open(fd, 'w+b') as f2:
        f2.write(bytes(shm.buf))
    mmapped_file.close()
    shm.unlink()
    shm.close()

I'll keep looking at this but letting you know in case it's something
obvious again.
Hi David,

On 10/23/24 00:10, David Wei wrote:
[...]
> I have a simple Python FUSE client that is still returning EINVAL for
> write():
>
> [...]
>
> I'll keep looking at this but letting you know in case it's something
> obvious again.

The 'aligned-writes' libfuse branch would need another kernel patch. Please
hold on a little bit, I hope to send out a new version later today or
tomorrow that separates headers from payload - alignment is then guaranteed.

Thanks,
Bernd
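Separately from the server-side alignment work discussed above, O_DIRECT on Linux generally expects the caller's buffer address, length, and file offset to be aligned as well. The sketch below shows one way to build an aligned buffer in Python using an anonymous mmap. It is not a diagnosis of the EINVAL reported in this thread. The 4096-byte alignment is an assumption (the real requirement depends on the filesystem and device), and all function names are hypothetical.

```python
import ctypes
import mmap
import os

ALIGN = 4096  # assumed alignment; check the target filesystem and device


def round_up(n, align=ALIGN):
    """Round n up to the next multiple of align."""
    return (n + align - 1) // align * align


def aligned_buffer(data, align=ALIGN):
    """Copy data into an anonymous mmap: page-aligned and zero-padded."""
    buf = mmap.mmap(-1, round_up(max(len(data), 1), align))
    buf[:len(data)] = data
    return buf


def buffer_address(buf):
    """Return the start address of a writable buffer (for checking only)."""
    return ctypes.addressof(ctypes.c_char.from_buffer(buf))


def write_direct(path, data):
    """Write data at offset 0 with O_DIRECT, then trim the zero padding."""
    buf = aligned_buffer(data)
    fd = os.open(path, os.O_RDWR | os.O_CREAT | os.O_DIRECT, 0o644)
    try:
        written = os.write(fd, buf)   # aligned address and aligned length
        os.ftruncate(fd, len(data))   # drop the padding again
    finally:
        os.close(fd)
        buf.close()
    return written
```

A call such as `write_direct("/home/vmuser/scratch/dest/out", payload)` would exercise the path from the report above. Whether it succeeds still depends on the kernel and libfuse branches in use, and not every filesystem accepts O_DIRECT at all.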
On 11/4/24 09:24, Bernd Schubert wrote:
[...]
> the 'aligned-writes' libfuse branch would need another kernel patch. Please
> hold on a little bit, I hope to send out a new version later today or
> tomorrow that separates headers from payload - alignment is guaranteed.
>

If you are very brave, you could try out this (sorry, still on 6.10)

https://github.com/bsbernd/linux/tree/fuse-uring-for-6.10-rfc5
https://github.com/bsbernd/libfuse/tree/uring

Right now #fuse-uring-for-6.10-rfc5 is rather similar to
fuse-uring-for-6.10-rfc4, with two additional patches to separate
headers from payload. The head commit, which updates fuse-io-uring, is
going to be rebased into the other commits tomorrow.

Also, I just noticed a teardown issue when the daemon is killed while
IO is going on - busy inodes on sb shutdown. Some fuse requests are
probably not correctly released; I guess that is also already present
in rfcv4. Will look into it in the morning.

Thanks,
Bernd