Message ID | 20200217184613.19668-1-willy@infradead.org (mailing list archive) |
---|---|
Headers | show |
Series | Change readahead API | expand |
On Mon, Feb 17, 2020 at 10:45:41AM -0800, Matthew Wilcox wrote:
> This series adds a readahead address_space operation to eventually
*sigh*. Clearly I forgot to rm -rf an earlier version. Please disregard
any patches labelled n/16. I can send a v7 if this is too much hassle.
On Mon, Feb 17, 2020 at 10:45:41AM -0800, Matthew Wilcox wrote: > From: "Matthew Wilcox (Oracle)" <willy@infradead.org> > > This series adds a readahead address_space operation to eventually > replace the readpages operation. The key difference is that > pages are added to the page cache as they are allocated (and > then looked up by the filesystem) instead of passing them on a > list to the readpages operation and having the filesystem add > them to the page cache. It's a net reduction in code for each > implementation, more efficient than walking a list, and solves > the direct-write vs buffered-read problem reported by yu kuai at > https://lore.kernel.org/linux-fsdevel/20200116063601.39201-1-yukuai3@huawei.com/ > > The only unconverted filesystems are those which use fscache. > Their conversion is pending Dave Howells' rewrite which will make the > conversion substantially easier. Latest version in your git tree: $ ▶ glo -n 5 willy/readahead 4be497096c04 mm: Use memalloc_nofs_save in readahead path ff63497fcb98 iomap: Convert from readpages to readahead 26aee60e89b5 iomap: Restructure iomap_readpages_actor 8115bcca7312 fuse: Convert from readpages to readahead 3db3d10d9ea1 f2fs: Convert from readpages to readahead $ merged into a 5.6-rc2 tree fails at boot on my test vm: [ 2.423116] ------------[ cut here ]------------ [ 2.424957] list_add double add: new=ffffea000efff4c8, prev=ffff8883bfffee60, next=ffffea000efff4c8. [ 2.428259] WARNING: CPU: 4 PID: 1 at lib/list_debug.c:29 __list_add_valid+0x67/0x70 [ 2.430617] CPU: 4 PID: 1 Comm: sh Not tainted 5.6.0-rc2-dgc+ #1800 [ 2.432405] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.12.0-1 04/01/2014 [ 2.434744] RIP: 0010:__list_add_valid+0x67/0x70 [ 2.436107] Code: c6 4c 89 ca 48 c7 c7 10 41 58 82 e8 55 29 89 ff 0f 0b 31 c0 c3 48 89 f2 4c 89 c1 48 89 fe 48 c7 c7 60 41 58 82 e8 3b 29 89 ff <0f> 0b 31 c7 [ 2.441161] RSP: 0000:ffffc900018a3bb0 EFLAGS: 00010082 [ 2.442548] RAX: 0000000000000000 RBX: ffffea000efff4c0 RCX: 0000000000000256 [ 2.444432] RDX: 0000000000000001 RSI: 0000000000000086 RDI: ffffffff8288a8b0 [ 2.446315] RBP: ffffea000efff4c8 R08: ffffc900018a3a65 R09: 0000000000000256 [ 2.448199] R10: 0000000000000008 R11: ffffc900018a3a65 R12: ffffea000efff4c8 [ 2.450072] R13: ffff8883bfffee60 R14: 0000000000000010 R15: 0000000000000001 [ 2.451959] FS: 0000000000000000(0000) GS:ffff8883b9c00000(0000) knlGS:0000000000000000 [ 2.454083] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 2.455604] CR2: 00000000ffffffff CR3: 00000003b9a37002 CR4: 0000000000060ee0 [ 2.457484] Call Trace: [ 2.458171] __pagevec_lru_add_fn+0x15f/0x2c0 [ 2.459376] pagevec_lru_move_fn+0x87/0xd0 [ 2.460500] ? pagevec_move_tail_fn+0x2d0/0x2d0 [ 2.461712] lru_add_drain_cpu+0x8d/0x160 [ 2.462787] lru_add_drain+0x18/0x20 [ 2.463757] shift_arg_pages+0xb8/0x180 [ 2.464789] ? vprintk_emit+0x101/0x1c0 [ 2.465813] ? printk+0x58/0x6f [ 2.466659] setup_arg_pages+0x205/0x240 [ 2.467716] load_elf_binary+0x34a/0x1560 [ 2.468789] ? get_user_pages_remote+0x159/0x280 [ 2.470024] ? selinux_inode_permission+0x10d/0x1e0 [ 2.471323] ? _raw_read_unlock+0xa/0x20 [ 2.472375] ? load_misc_binary+0x2b2/0x410 [ 2.473492] search_binary_handler+0x60/0xe0 [ 2.474634] __do_execve_file.isra.0+0x512/0x850 [ 2.475888] ? rest_init+0xc6/0xc6 [ 2.476801] do_execve+0x21/0x30 [ 2.477671] try_to_run_init_process+0x10/0x34 [ 2.478855] kernel_init+0xe2/0xfa [ 2.479776] ret_from_fork+0x1f/0x30 [ 2.480737] ---[ end trace e77079de9b22dc6a ]--- I just dropped the ext4 conversion from my local tree so I can boot the machine and test XFS. Might have some more info when that crashes and burns... Cheers, Dave.
On Tue, Feb 18, 2020 at 03:56:33PM +1100, Dave Chinner wrote: > Latest version in your git tree: > > $ ▶ glo -n 5 willy/readahead > 4be497096c04 mm: Use memalloc_nofs_save in readahead path > ff63497fcb98 iomap: Convert from readpages to readahead > 26aee60e89b5 iomap: Restructure iomap_readpages_actor > 8115bcca7312 fuse: Convert from readpages to readahead > 3db3d10d9ea1 f2fs: Convert from readpages to readahead > $ > > merged into a 5.6-rc2 tree fails at boot on my test vm: > > [ 2.423116] ------------[ cut here ]------------ > [ 2.424957] list_add double add: new=ffffea000efff4c8, prev=ffff8883bfffee60, next=ffffea000efff4c8. > [ 2.428259] WARNING: CPU: 4 PID: 1 at lib/list_debug.c:29 __list_add_valid+0x67/0x70 > [ 2.457484] Call Trace: > [ 2.458171] __pagevec_lru_add_fn+0x15f/0x2c0 > [ 2.459376] pagevec_lru_move_fn+0x87/0xd0 > [ 2.460500] ? pagevec_move_tail_fn+0x2d0/0x2d0 > [ 2.461712] lru_add_drain_cpu+0x8d/0x160 > [ 2.462787] lru_add_drain+0x18/0x20 Are you sure that was 4be497096c04 ? I ask because there was a version pushed to that git tree that did contain a list double-add (due to a mismerge when shuffling patches). I noticed it and fixed it, and 4be497096c04 doesn't have that problem. I also test with CONFIG_DEBUG_LIST turned on, but this problem you hit is going to be probabilistic because it'll depend on the timing between whatever other list is being used and the page actually being added to the LRU.
On 2/17/20 10:45 AM, Matthew Wilcox wrote: > From: "Matthew Wilcox (Oracle)" <willy@infradead.org> > > This series adds a readahead address_space operation to eventually > replace the readpages operation. The key difference is that > pages are added to the page cache as they are allocated (and > then looked up by the filesystem) instead of passing them on a > list to the readpages operation and having the filesystem add > them to the page cache. It's a net reduction in code for each > implementation, more efficient than walking a list, and solves > the direct-write vs buffered-read problem reported by yu kuai at > https://lore.kernel.org/linux-fsdevel/20200116063601.39201-1-yukuai3@huawei.com/ > > The only unconverted filesystems are those which use fscache. > Their conversion is pending Dave Howells' rewrite which will make the > conversion substantially easier. Hi Matthew, I see that Dave Chinner is reviewing this series, but I'm trying out his recent advice about code reviews [1], and so I'm not going to read his comments first. So you may see some duplication or contradictions this time around. [1] Somewhere in this thread, "[LSF/MM/BPF TOPIC] FS Maintainers Don't Scale": https://lore.kernel.org/r/20200131052520.GC6869@magnolia thanks,
On Tue, Feb 18, 2020 at 05:42:30AM -0800, Matthew Wilcox wrote: > On Tue, Feb 18, 2020 at 03:56:33PM +1100, Dave Chinner wrote: > > Latest version in your git tree: > > > > $ ▶ glo -n 5 willy/readahead > > 4be497096c04 mm: Use memalloc_nofs_save in readahead path > > ff63497fcb98 iomap: Convert from readpages to readahead > > 26aee60e89b5 iomap: Restructure iomap_readpages_actor > > 8115bcca7312 fuse: Convert from readpages to readahead > > 3db3d10d9ea1 f2fs: Convert from readpages to readahead > > $ > > > > merged into a 5.6-rc2 tree fails at boot on my test vm: > > > > [ 2.423116] ------------[ cut here ]------------ > > [ 2.424957] list_add double add: new=ffffea000efff4c8, prev=ffff8883bfffee60, next=ffffea000efff4c8. > > [ 2.428259] WARNING: CPU: 4 PID: 1 at lib/list_debug.c:29 __list_add_valid+0x67/0x70 > > [ 2.457484] Call Trace: > > [ 2.458171] __pagevec_lru_add_fn+0x15f/0x2c0 > > [ 2.459376] pagevec_lru_move_fn+0x87/0xd0 > > [ 2.460500] ? pagevec_move_tail_fn+0x2d0/0x2d0 > > [ 2.461712] lru_add_drain_cpu+0x8d/0x160 > > [ 2.462787] lru_add_drain+0x18/0x20 > > Are you sure that was 4be497096c04 ? I ask because there was a Yes, because it's the only version I've actually merged into my working tree, compiled and tried to run. :P > version pushed to that git tree that did contain a list double-add > (due to a mismerge when shuffling patches). I noticed it and fixed > it, and 4be497096c04 doesn't have that problem. I also test with > CONFIG_DEBUG_LIST turned on, but this problem you hit is going to be > probabilistic because it'll depend on the timing between whatever other > list is being used and the page actually being added to the LRU. I'll see if I can reproduce it. Cheers, Dave.
On Wed, Feb 19, 2020 at 08:26:52AM +1100, Dave Chinner wrote: > On Tue, Feb 18, 2020 at 05:42:30AM -0800, Matthew Wilcox wrote: > > On Tue, Feb 18, 2020 at 03:56:33PM +1100, Dave Chinner wrote: > > > Latest version in your git tree: > > > > > > $ ▶ glo -n 5 willy/readahead > > > 4be497096c04 mm: Use memalloc_nofs_save in readahead path > > > ff63497fcb98 iomap: Convert from readpages to readahead > > > 26aee60e89b5 iomap: Restructure iomap_readpages_actor > > > 8115bcca7312 fuse: Convert from readpages to readahead > > > 3db3d10d9ea1 f2fs: Convert from readpages to readahead > > > $ > > > > > > merged into a 5.6-rc2 tree fails at boot on my test vm: > > > > > > [ 2.423116] ------------[ cut here ]------------ > > > [ 2.424957] list_add double add: new=ffffea000efff4c8, prev=ffff8883bfffee60, next=ffffea000efff4c8. > > > [ 2.428259] WARNING: CPU: 4 PID: 1 at lib/list_debug.c:29 __list_add_valid+0x67/0x70 > > > [ 2.457484] Call Trace: > > > [ 2.458171] __pagevec_lru_add_fn+0x15f/0x2c0 > > > [ 2.459376] pagevec_lru_move_fn+0x87/0xd0 > > > [ 2.460500] ? pagevec_move_tail_fn+0x2d0/0x2d0 > > > [ 2.461712] lru_add_drain_cpu+0x8d/0x160 > > > [ 2.462787] lru_add_drain+0x18/0x20 > > > > Are you sure that was 4be497096c04 ? I ask because there was a > > Yes, because it's the only version I've actually merged into my > working tree, compiled and tried to run. :P > > > version pushed to that git tree that did contain a list double-add > > (due to a mismerge when shuffling patches). I noticed it and fixed > > it, and 4be497096c04 doesn't have that problem. I also test with > > CONFIG_DEBUG_LIST turned on, but this problem you hit is going to be > > probabilistic because it'll depend on the timing between whatever other > > list is being used and the page actually being added to the LRU. > > I'll see if I can reproduce it. Just updated to a current TOT Linus kernel and your latest branch, and so far this is 100% reproducable. Not sure how I'm going to debug it yet, because it's init that is triggering it.... Cheers, Dave.
On Wed, Feb 19, 2020 at 02:45:25PM +1100, Dave Chinner wrote: > On Wed, Feb 19, 2020 at 08:26:52AM +1100, Dave Chinner wrote: > > On Tue, Feb 18, 2020 at 05:42:30AM -0800, Matthew Wilcox wrote: > > > On Tue, Feb 18, 2020 at 03:56:33PM +1100, Dave Chinner wrote: > > > > Latest version in your git tree: > > > > > > > > $ ▶ glo -n 5 willy/readahead > > > > 4be497096c04 mm: Use memalloc_nofs_save in readahead path > > > > ff63497fcb98 iomap: Convert from readpages to readahead > > > > 26aee60e89b5 iomap: Restructure iomap_readpages_actor > > > > 8115bcca7312 fuse: Convert from readpages to readahead > > > > 3db3d10d9ea1 f2fs: Convert from readpages to readahead > > > > $ > > > > > > > > merged into a 5.6-rc2 tree fails at boot on my test vm: > > > > > > > > [ 2.423116] ------------[ cut here ]------------ > > > > [ 2.424957] list_add double add: new=ffffea000efff4c8, prev=ffff8883bfffee60, next=ffffea000efff4c8. > > > > [ 2.428259] WARNING: CPU: 4 PID: 1 at lib/list_debug.c:29 __list_add_valid+0x67/0x70 > > > > [ 2.457484] Call Trace: > > > > [ 2.458171] __pagevec_lru_add_fn+0x15f/0x2c0 > > > > [ 2.459376] pagevec_lru_move_fn+0x87/0xd0 > > > > [ 2.460500] ? pagevec_move_tail_fn+0x2d0/0x2d0 > > > > [ 2.461712] lru_add_drain_cpu+0x8d/0x160 > > > > [ 2.462787] lru_add_drain+0x18/0x20 > > > > > > Are you sure that was 4be497096c04 ? I ask because there was a > > > > Yes, because it's the only version I've actually merged into my > > working tree, compiled and tried to run. :P > > > > > version pushed to that git tree that did contain a list double-add > > > (due to a mismerge when shuffling patches). I noticed it and fixed > > > it, and 4be497096c04 doesn't have that problem. I also test with > > > CONFIG_DEBUG_LIST turned on, but this problem you hit is going to be > > > probabilistic because it'll depend on the timing between whatever other > > > list is being used and the page actually being added to the LRU. > > > > I'll see if I can reproduce it. > > Just updated to a current TOT Linus kernel and your latest branch, > and so far this is 100% reproducable. > > Not sure how I'm going to debug it yet, because it's init that is > triggering it.... Eric found it ... still not sure why I don't see it.
On Tue, Feb 18, 2020 at 07:48:32PM -0800, Matthew Wilcox wrote: > On Wed, Feb 19, 2020 at 02:45:25PM +1100, Dave Chinner wrote: > > On Wed, Feb 19, 2020 at 08:26:52AM +1100, Dave Chinner wrote: > > > On Tue, Feb 18, 2020 at 05:42:30AM -0800, Matthew Wilcox wrote: > > > > On Tue, Feb 18, 2020 at 03:56:33PM +1100, Dave Chinner wrote: > > > > > Latest version in your git tree: > > > > > > > > > > $ ▶ glo -n 5 willy/readahead > > > > > 4be497096c04 mm: Use memalloc_nofs_save in readahead path > > > > > ff63497fcb98 iomap: Convert from readpages to readahead > > > > > 26aee60e89b5 iomap: Restructure iomap_readpages_actor > > > > > 8115bcca7312 fuse: Convert from readpages to readahead > > > > > 3db3d10d9ea1 f2fs: Convert from readpages to readahead > > > > > $ > > > > > > > > > > merged into a 5.6-rc2 tree fails at boot on my test vm: > > > > > > > > > > [ 2.423116] ------------[ cut here ]------------ > > > > > [ 2.424957] list_add double add: new=ffffea000efff4c8, prev=ffff8883bfffee60, next=ffffea000efff4c8. > > > > > [ 2.428259] WARNING: CPU: 4 PID: 1 at lib/list_debug.c:29 __list_add_valid+0x67/0x70 > > > > > [ 2.457484] Call Trace: > > > > > [ 2.458171] __pagevec_lru_add_fn+0x15f/0x2c0 > > > > > [ 2.459376] pagevec_lru_move_fn+0x87/0xd0 > > > > > [ 2.460500] ? pagevec_move_tail_fn+0x2d0/0x2d0 > > > > > [ 2.461712] lru_add_drain_cpu+0x8d/0x160 > > > > > [ 2.462787] lru_add_drain+0x18/0x20 > > > > > > > > Are you sure that was 4be497096c04 ? I ask because there was a > > > > > > Yes, because it's the only version I've actually merged into my > > > working tree, compiled and tried to run. :P > > > > > > > version pushed to that git tree that did contain a list double-add > > > > (due to a mismerge when shuffling patches). I noticed it and fixed > > > > it, and 4be497096c04 doesn't have that problem. I also test with > > > > CONFIG_DEBUG_LIST turned on, but this problem you hit is going to be > > > > probabilistic because it'll depend on the timing between whatever other > > > > list is being used and the page actually being added to the LRU. > > > > > > I'll see if I can reproduce it. > > > > Just updated to a current TOT Linus kernel and your latest branch, > > and so far this is 100% reproducable. > > > > Not sure how I'm going to debug it yet, because it's init that is > > triggering it.... > > Eric found it ... Yeah, just saw that and am applying his patch to test it... > still not sure why I don't see it. No readahead configured on your device? Cheers, Dave.
From: "Matthew Wilcox (Oracle)" <willy@infradead.org> This series adds a readahead address_space operation to eventually replace the readpages operation. The key difference is that pages are added to the page cache as they are allocated (and then looked up by the filesystem) instead of passing them on a list to the readpages operation and having the filesystem add them to the page cache. It's a net reduction in code for each implementation, more efficient than walking a list, and solves the direct-write vs buffered-read problem reported by yu kuai at https://lore.kernel.org/linux-fsdevel/20200116063601.39201-1-yukuai3@huawei.com/ The only unconverted filesystems are those which use fscache. Their conversion is pending Dave Howells' rewrite which will make the conversion substantially easier. v6: - Name the private members of readahead_control with a leading underscore (suggested by Christoph Hellwig) - Fix whitespace in rst file - Remove misleading comment in btrfs patch - Add readahead_next() API and use it in iomap - Add iomap_readahead kerneldoc. - Fix the mpage_readahead kerneldoc - Make various readahead functions return void - Keep readahead_index() and readahead_offset() pointing to the start of this batch through the body. No current user requires this, but it's less surprising. - Add kerneldoc for page_cache_readahead_limit - Make page_idx an unsigned long, and rename it to just 'i' - Get rid of page_offset local variable - Add patch to call memalloc_nofs_save() before allocating pages (suggested by Michal Hocko) - Resplit a lot of patches for more logical progression and easier review (suggested by John Hubbard) - Added sign-offs where received, and I deemed still relevant v5 switched to passing a readahead_control struct (mirroring the writepages_control struct passed to writepages). This has a number of advantages: - It fixes a number of bugs in various implementations, eg forgetting to increment 'start', an off-by-one error in 'nr_pages' or treating 'start' as a byte offset instead of a page offset. - It allows us to change the arguments without changing all the implementations of ->readahead which just call mpage_readahead() or iomap_readahead() - Figuring out which pages haven't been attempted by the implementation is more natural this way. - There's less code in each implementation. Matthew Wilcox (Oracle) (19): mm: Return void from various readahead functions mm: Ignore return value of ->readpages mm: Use readahead_control to pass arguments mm: Rearrange readahead loop mm: Remove 'page_offset' from readahead loop mm: rename readahead loop variable to 'i' mm: Put readahead pages in cache earlier mm: Add readahead address space operation mm: Add page_cache_readahead_limit fs: Convert mpage_readpages to mpage_readahead btrfs: Convert from readpages to readahead erofs: Convert uncompressed files from readpages to readahead erofs: Convert compressed files from readpages to readahead ext4: Convert from readpages to readahead f2fs: Convert from readpages to readahead fuse: Convert from readpages to readahead iomap: Restructure iomap_readpages_actor iomap: Convert from readpages to readahead mm: Use memalloc_nofs_save in readahead path Documentation/filesystems/locking.rst | 6 +- Documentation/filesystems/vfs.rst | 13 ++ drivers/staging/exfat/exfat_super.c | 7 +- fs/block_dev.c | 7 +- fs/btrfs/extent_io.c | 46 ++----- fs/btrfs/extent_io.h | 3 +- fs/btrfs/inode.c | 16 +-- fs/erofs/data.c | 39 ++---- fs/erofs/zdata.c | 29 ++-- fs/ext2/inode.c | 10 +- fs/ext4/ext4.h | 3 +- fs/ext4/inode.c | 23 ++-- fs/ext4/readpage.c | 22 ++- fs/ext4/verity.c | 35 +---- fs/f2fs/data.c | 50 +++---- fs/f2fs/f2fs.h | 5 +- fs/f2fs/verity.c | 35 +---- fs/fat/inode.c | 7 +- fs/fuse/file.c | 46 +++---- fs/gfs2/aops.c | 23 ++-- fs/hpfs/file.c | 7 +- fs/iomap/buffered-io.c | 118 +++++++---------- fs/iomap/trace.h | 2 +- fs/isofs/inode.c | 7 +- fs/jfs/inode.c | 7 +- fs/mpage.c | 38 ++---- fs/nilfs2/inode.c | 15 +-- fs/ocfs2/aops.c | 34 ++--- fs/omfs/file.c | 7 +- fs/qnx6/inode.c | 7 +- fs/reiserfs/inode.c | 8 +- fs/udf/inode.c | 7 +- fs/xfs/xfs_aops.c | 13 +- fs/zonefs/super.c | 7 +- include/linux/fs.h | 2 + include/linux/iomap.h | 3 +- include/linux/mpage.h | 4 +- include/linux/pagemap.h | 90 +++++++++++++ include/trace/events/erofs.h | 6 +- include/trace/events/f2fs.h | 6 +- mm/internal.h | 8 +- mm/migrate.c | 2 +- mm/readahead.c | 184 +++++++++++++++++--------- 43 files changed, 474 insertions(+), 533 deletions(-) base-commit: 11a48a5a18c63fd7621bb050228cebf13566e4d8