Message ID | 31452.1574721589@warthog.procyon.org.uk (mailing list archive) |
---|---|
State | Mainlined |
Commit | 6a965666b7e7475c2f8c8e724703db58b8a8a445 |
Headers | show |
Series | [GIT,PULL] pipe: Notification queue preparation | expand |
The pull request you sent on Mon, 25 Nov 2019 22:39:49 +0000:
> git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs.git tags/notifications-pipe-prep-20191115
has been merged into torvalds/linux.git:
https://git.kernel.org/torvalds/c/6a965666b7e7475c2f8c8e724703db58b8a8a445
Thank you!
On Mon, Nov 25, 2019 at 10:39:49PM +0000, David Howells wrote: > David Howells (12): > pipe: Reduce #inclusion of pipe_fs_i.h > Remove the nr_exclusive argument from __wake_up_sync_key() > Add wake_up_interruptible_sync_poll_locked() > pipe: Use head and tail pointers for the ring, not cursor and length This commit (8cefc107ca54c8b06438b7dc9) causes hangs of 'btrfs send' commands, that's eg. inside fstests or in btrfs-progs testsuite. The 'send' uses pipe/splice to get data from kernel to userspace. The test that reliably worked for me leaves the process hanging with this stacktrace (no CPU or IO activity): [<0>] pipe_write+0x1be/0x4b0 [<0>] new_sync_write+0x11e/0x1c0 [<0>] vfs_write+0xc1/0x1d0 [<0>] kernel_write+0x2c/0x40 [<0>] send_cmd+0x78/0xf0 [btrfs] [<0>] send_extent_data+0x4b1/0x52c [btrfs] [<0>] process_extent+0xe46/0xe9d [btrfs] [<0>] changed_cb+0xcf5/0xd2f [btrfs] [<0>] send_subvol+0x8ad/0xc0b [btrfs] [<0>] btrfs_ioctl_send+0xe50/0xf30 [btrfs] [<0>] _btrfs_ioctl_send+0x80/0x110 [btrfs] [<0>] btrfs_ioctl+0x18e1/0x3450 [btrfs] [<0>] do_vfs_ioctl+0xa5/0x710 [<0>] ksys_ioctl+0x70/0x80 [<0>] __x64_sys_ioctl+0x16/0x20 [<0>] do_syscall_64+0x56/0x220 [<0>] entry_SYSCALL_64_after_hwframe+0x49/0xbe I've bisected that to the mentioned commit, using test in btrfs-progs testsuite with command "make TEST=016\* test-misc". Normally the test can run up to 10 seconds, in the buggy case it stays there. I can help testing fixes or gathering more information, please let me know. Full bisect log: # bad: [3c0edea9b29f9be6c093f236f762202b30ac9431] pipe: Remove sync on wake_ups # good: [da0c9ea146cbe92b832f1b0f694840ea8eb33cce] Linux 5.4-rc2 git bisect start '3c0edea9b29f9be6c093f236f762202b30ac9431' 'd055b4fb4d165b06d912e7f846610d120c3bb9fb^' # bad: [b667b867344301e24f21d4a4c844675ff61d89e1] pipe: Advance tail pointer inside of wait spinlock in pipe_read() git bisect bad b667b867344301e24f21d4a4c844675ff61d89e1 # good: [f94df9890e98f2090c6a8d70c795134863b70201] Add wake_up_interruptible_sync_poll_locked() git bisect good f94df9890e98f2090c6a8d70c795134863b70201 # bad: [6718b6f855a0b4962d54bd625be2718cb820cec6] pipe: Allow pipes to have kernel-reserved slots git bisect bad 6718b6f855a0b4962d54bd625be2718cb820cec6 # bad: [8cefc107ca54c8b06438b7dc9cc08bc0a11d5b98] pipe: Use head and tail pointers for the ring, not cursor and length git bisect bad 8cefc107ca54c8b06438b7dc9cc08bc0a11d5b98 # first bad commit: [8cefc107ca54c8b06438b7dc9cc08bc0a11d5b98] pipe: Use head and tail pointers for the ring, not cursor and length > pipe: Allow pipes to have kernel-reserved slots > pipe: Advance tail pointer inside of wait spinlock in pipe_read() > pipe: Conditionalise wakeup in pipe_read() > pipe: Rearrange sequence in pipe_write() to preallocate slot > pipe: Remove redundant wakeup from pipe_write() > pipe: Check for ring full inside of the spinlock in pipe_write() > pipe: Increase the writer-wakeup threshold to reduce context-switch count > pipe: Remove sync on wake_ups
David Sterba <dsterba@suse.cz> wrote:
> [<0>] pipe_write+0x1be/0x4b0
Can you get me a line number of that? Assuming you've built with -g, load
vmlinux into gdb and do "i li pipe_write+0x1be".
David
On Thu, Dec 5, 2019 at 5:57 AM David Howells <dhowells@redhat.com> wrote: > > David Sterba <dsterba@suse.cz> wrote: > > > [<0>] pipe_write+0x1be/0x4b0 > > Can you get me a line number of that? Assuming you've built with -g, load > vmlinux into gdb and do "i li pipe_write+0x1be". If the kernel is built with debug info (which you need for the gdb command anyway), it's much better to just use ./scripts/decode_stacktrace.sh which gives all the information for the whole backtrace. It would be interesting to hear if somebody else is waiting on the read side too. Linus
On Thu, Dec 05, 2019 at 01:56:57PM +0000, David Howells wrote: > David Sterba <dsterba@suse.cz> wrote: > > > [<0>] pipe_write+0x1be/0x4b0 > > Can you get me a line number of that? Assuming you've built with -g, load > vmlinux into gdb and do "i li pipe_write+0x1be". I built it with -g (DEBUG_INFO) but there's no output for the command (gdb 8.2): (gdb) i li pipe_write+0x1be Function "pipe_write+0x1be" not defined. But the address can tell something: (gdb) l *(pipe_write+0x1be) 0xffffffff81390b8e is in pipe_write (fs/pipe.c:509). warning: Source file is more recent than executable. 504 kill_fasync(&pipe->fasync_readers, SIGIO, POLL_IN); 505 do_wakeup = 0; 506 } 507 pipe->waiting_writers++; 508 pipe_wait(pipe); 509 pipe->waiting_writers--; 510 } 511 out: 512 __pipe_unlock(pipe); 513 if (do_wakeup) { I rerun the test again (with a different address where it's stuck), there's nothing better I can get from the debug info, it always points to pipe_wait, disassembly points to: 0xffffffff81390b71 <+417>: jne 0xffffffff81390c23 <pipe_write+595> 0xffffffff81390b77 <+423>: test %ecx,%ecx 0xffffffff81390b79 <+425>: jne 0xffffffff81390b95 <pipe_write+453> 0xffffffff81390b7b <+427>: addl $0x1,0x110(%rbx) 0xffffffff81390b82 <+434>: mov %rbx,%rdi 0xffffffff81390b85 <+437>: callq 0xffffffff813908c0 <pipe_wait> 0xffffffff81390b8a <+442>: subl $0x1,0x110(%rbx) (pipe_write+0x1ba == 0xffffffff81390b8a)
On Thu, Dec 5, 2019 at 9:12 AM Linus Torvalds <torvalds@linux-foundation.org> wrote: > > It would be interesting to hear if somebody else is waiting on the > read side too. Looking once more at that commit, I find at least one bug, but I'm pretty sure that's not the cause of this problem. DavidH, watch out for things like this: - for (idx = 0; idx < pipe->nrbufs && rem < len; idx++) + for (idx = tail; idx < head && rem < len; idx++) which is a completely buggy conversion. You can't compare tail and head with an inequality, it gets the wraparound case wrong. But I found only one of those, and it's fuse-specific, plus the overflow would take a long time to trigger, so I'm pretty sure this isn't what DavidS is reporting. Linus
I've just posted a couple of patches - can you check to see if they fix your problem? https://lore.kernel.org/linux-fsdevel/157556649610.20869.8537079649495343567.stgit@warthog.procyon.org.uk/T/#t Thanks, David
On Thu, Dec 5, 2019 at 9:22 AM David Sterba <dsterba@suse.cz> wrote: > > I rerun the test again (with a different address where it's stuck), there's > nothing better I can get from the debug info, it always points to pipe_wait, > disassembly points to. Hah. I see another bug. "pipe_wait()" depends on the fact that all events that wake it up happen with the pipe lock held. But we do some of the "do_wakeup()" handling outside the pipe lock now on the reader side __pipe_unlock(pipe); /* Signal writers asynchronously that there is more room. */ if (do_wakeup) { wake_up_interruptible_poll(&pipe->wait, EPOLLOUT | EPOLLWRNORM); kill_fasync(&pipe->fasync_writers, SIGIO, POLL_OUT); } However, that isn't new to this series _either_, so I don't think that's it. It does wake up things inside the lock _too_ if it ended up emptying a whole buffer. So it could be triggered by timing and behavior changes, but I doubt this pipe_wait() thing is it either. The fact that it bisects to the thing that changes things to use head/tail pointers makes me think there's some other incorrect update or comparison somewhere. That said, "pipe_wait()" is an abomination. It should use a proper wait condition and use wait_event(), but the code predates all of that. I suspect pipe_wait() goes back to the dark ages with the BKL and no actual races between kernel code. Linus
On Thu, Dec 05, 2019 at 05:25:20PM +0000, David Howells wrote: > I've just posted a couple of patches - can you check to see if they fix your > problem? > > https://lore.kernel.org/linux-fsdevel/157556649610.20869.8537079649495343567.stgit@warthog.procyon.org.uk/T/#t Not fixed, the test still hangs with the same call stack.