Message ID | 20200201151240.24082-1-willy@infradead.org (mailing list archive) |
---|---|
Series | Change readahead API |
On Sat, Feb 01, 2020 at 07:12:28AM -0800, Matthew Wilcox wrote:
> From: "Matthew Wilcox (Oracle)" <willy@infradead.org>
>
> I would particularly value feedback on this from the gfs2 and ocfs2
> maintainers. They have non-trivial changes, and a review on patch 5
> would be greatly appreciated.
>
> This series adds a readahead address_space operation to eventually
> replace the readpages operation. The key difference is that
> pages are added to the page cache as they are allocated (and
> then looked up by the filesystem) instead of passing them on a
> list to the readpages operation and having the filesystem add
> them to the page cache. It's a net reduction in code for each
> implementation, more efficient than walking a list, and solves
> the direct-write vs buffered-read problem reported by yu kuai at
> https://lore.kernel.org/linux-fsdevel/20200116063601.39201-1-yukuai3@huawei.com/
>
> v4:
> - Rebase on current Linus (a62aa6f7f50a ("Merge tag 'gfs2-for-5.6'"))

I've tried to test the patchset but haven't got very far: it crashes at boot
right after VFS mounts the root. The patches are from the mailing list, applied
on current master, but I saw the same crash with the git branch in your
repo (probably v1).

(gdb) l *(ext4_mpage_readpages+0x1da/0xc20)
0xffffffff813753f0 is in ext4_mpage_readpages (fs/ext4/readpage.c:226).
221             return i_size_read(inode);
222     }
223
224     int ext4_mpage_readpages(struct address_space *mapping, pgoff_t start,
225                     struct page *page, unsigned nr_pages, bool is_readahead)
226     {
227             struct bio *bio = NULL;
228             sector_t last_block_in_bio = 0;
229
230             struct inode *inode = mapping->host;

[ 8.008531] BUG: kernel NULL pointer dereference, address: 0000000000000000
[ 8.011482] #PF: supervisor read access in kernel mode
[ 8.014121] #PF: error_code(0x0000) - not-present page
[ 8.016767] PGD 0 P4D 0
[ 8.018352] Oops: 0000 [#1] SMP
[ 8.019716] CPU: 2 PID: 1 Comm: swapper/0 Not tainted 5.5.0-default+ #955
[ 8.021746] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.0-59-gc9ba527-rebuilt.opensuse.org 04/01/2014
[ 8.025244] RIP: 0010:ext4_mpage_readpages+0x1da/0xc20
[ 8.026817] Code: 7c 24 4e 00 0f 85 23 04 00 00 44 29 74 24 3c 83 6c 24 48 01 0f 84 4d 04 00 00 80 7c 24 4e 00 0f 85 fc 05 00 00 48 8b 4c 24 18 <48> 8b 01 f6 c4 20 75 89 4c 8b 69 20 b9 0c 00 00 00 2b 4c 24 38 83
[ 8.031957] RSP: 0000:ffffb34f40013988 EFLAGS: 00010292
[ 8.033691] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
[ 8.035533] RDX: 0000000000000001 RSI: ffffffff960934c0 RDI: ffffffff9681a080
[ 8.036900] RBP: 0000000000000001 R08: ffffb34f40013a68 R09: 0000000000000000
[ 8.038461] R10: 0000000000000038 R11: 0000000000000000 R12: 0000000000000004
[ 8.040698] R13: ffff9668ba4e18e0 R14: 0000000000000001 R15: 0000000000000000
[ 8.042805] FS: 0000000000000000(0000) GS:ffff9668bda00000(0000) knlGS:0000000000000000
[ 8.045396] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 8.047233] CR2: 0000000000000000 CR3: 000000002e011001 CR4: 0000000000160ee0
[ 8.049337] Call Trace:
[ 8.050435] ? __lock_acquire+0xee0/0x1320
[ 8.051833] ? release_pages+0x310/0x380
[ 8.053265] ? mark_held_locks+0x50/0x80
[ 8.054468] ext4_readahead+0x3b/0x50
[ 8.055877] read_pages+0x65/0x1a0
[ 8.057167] ? put_pages_list+0x90/0x90
[ 8.058689] __do_page_cache_readahead+0x24b/0x2a0
[ 8.060394] generic_file_buffered_read+0x7cf/0x9f0
[ 8.062137] ? sched_clock+0x5/0x10
[ 8.063451] ? up_read+0x18/0x240
[ 8.064774] ? ext4_xattr_get+0x97/0x2c0
[ 8.066178] new_sync_read+0x111/0x1a0
[ 8.067423] vfs_read+0xc5/0x180
[ 8.068572] kernel_read+0x2c/0x40
[ 8.069788] prepare_binprm+0x171/0x1b0
[ 8.071311] load_script+0x1c1/0x250
[ 8.072643] search_binary_handler+0x5f/0x210
[ 8.074135] exec_binprm+0xd7/0x290
[ 8.075463] __do_execve_file.isra.0+0x570/0x800
[ 8.077400] ? rest_init+0x2f1/0x2f5
[ 8.078979] do_execve+0x21/0x30
[ 8.080420] kernel_init+0xa4/0x11b
[ 8.081856] ? rest_init+0x2f5/0x2f5
[ 8.083173] ret_from_fork+0x24/0x30
[ 8.084695] Modules linked in:
[ 8.086055] CR2: 0000000000000000
[ 8.087572] ---[ end trace 0890c371a706b34a ]---
[ 8.089417] RIP: 0010:ext4_mpage_readpages+0x1da/0xc20
[ 8.116836] BUG: sleeping function called from invalid context at include/linux/percpu-rwsem.h:38
[ 8.119626] in_atomic(): 0, irqs_disabled(): 1, non_block: 0, pid: 1, name: swapper/0
[ 8.122392] INFO: lockdep is turned off.
[ 8.123694] irq event stamp: 18341344
[ 8.124735] hardirqs last enabled at (18341343): [<ffffffff95230c42>] free_unref_page_list+0x232/0x270
[ 8.127918] hardirqs last disabled at (18341344): [<ffffffff95002b4b>] trace_hardirqs_off_thunk+0x1a/0x1c
[ 8.131145] softirqs last enabled at (18341250): [<ffffffff95a00358>] __do_softirq+0x358/0x52b
[ 8.143060] softirqs last disabled at (18341243): [<ffffffff9508ae3d>] irq_exit+0x9d/0xb0
[ 8.145603] CPU: 2 PID: 1 Comm: swapper/0 Tainted: G D 5.5.0-default+ #955
[ 8.148474] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.0-59-gc9ba527-rebuilt.opensuse.org 04/01/2014
[ 8.152440] Call Trace:
[ 8.153747] dump_stack+0x71/0xa0
[ 8.155238] ___might_sleep.cold+0xa6/0xf9
[ 8.156903] exit_signals+0x31/0x310
[ 8.158431] ? __do_execve_file.isra.0+0x570/0x800
[ 8.160179] do_exit+0xa8/0xd60
[ 8.161632] ? rest_init+0x2f1/0x2f5
[ 8.163204] rewind_stack_do_exit+0x17/0x20
[ 8.164931] Kernel panic - not syncing: Attempted to kill init! exitcode=0x00000009
[ 8.167575] Kernel Offset: 0x14000000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
On Tue, Feb 04, 2020 at 04:32:27PM +0100, David Sterba wrote:
> On Sat, Feb 01, 2020 at 07:12:28AM -0800, Matthew Wilcox wrote:
> > From: "Matthew Wilcox (Oracle)" <willy@infradead.org>
> >
> > I would particularly value feedback on this from the gfs2 and ocfs2
> > maintainers. They have non-trivial changes, and a review on patch 5
> > would be greatly appreciated.
> >
> > This series adds a readahead address_space operation to eventually
> > replace the readpages operation. The key difference is that
> > pages are added to the page cache as they are allocated (and
> > then looked up by the filesystem) instead of passing them on a
> > list to the readpages operation and having the filesystem add
> > them to the page cache. It's a net reduction in code for each
> > implementation, more efficient than walking a list, and solves
> > the direct-write vs buffered-read problem reported by yu kuai at
> > https://lore.kernel.org/linux-fsdevel/20200116063601.39201-1-yukuai3@huawei.com/
> >
> > v4:
> > - Rebase on current Linus (a62aa6f7f50a ("Merge tag 'gfs2-for-5.6'"))
>
> I've tried to test the patchset but haven't got very far: it crashes at boot
> right after VFS mounts the root. The patches are from the mailing list, applied
> on current master, but I saw the same crash with the git branch in your
> repo (probably v1).

Yeah, I wasn't able to test at the time due to what turned out to be the
hpet bug in Linus' tree. Now that's fixed, I've found & fixed a couple
more bugs. There'll be a v5 once I fix the remaining problem (looks like
a missing page unlock somewhere).
From: "Matthew Wilcox (Oracle)" <willy@infradead.org>

I would particularly value feedback on this from the gfs2 and ocfs2
maintainers. They have non-trivial changes, and a review on patch 5
would be greatly appreciated.

This series adds a readahead address_space operation to eventually
replace the readpages operation. The key difference is that
pages are added to the page cache as they are allocated (and
then looked up by the filesystem) instead of passing them on a
list to the readpages operation and having the filesystem add
them to the page cache. It's a net reduction in code for each
implementation, more efficient than walking a list, and solves
the direct-write vs buffered-read problem reported by yu kuai at
https://lore.kernel.org/linux-fsdevel/20200116063601.39201-1-yukuai3@huawei.com/

v4:
 - Rebase on current Linus (a62aa6f7f50a ("Merge tag 'gfs2-for-5.6'"))
 - Add comment to __do_page_cache_readahead() acknowledging we don't
   care _that_ much about setting PageReadahead.
 - Fix the return value check of add_to_page_cache_lru().
 - Add a missing call to put_page() in __do_page_cache_readahead() if
   we fail to insert the page.
 - Improve the documentation of ->readahead (including indentation
   problem identified by Randy).
 - Fix off by one error in read_pages() (Dave Chinner).
 - Fix nr_pages manipulation in btrfs (Dave Chinner).
 - Remove bogus refcount fix in erofs (Gao Xiang, Dave Chinner).
 - Update ext4 patch for Merkle tree readahead.
 - Update f2fs patch for Merkle tree readahead.
 - Reinstate next_page label in f2fs_readpages() now it's used by the
   compression code.
 - Reinstate call to fuse_wait_on_page_writeback (Miklos Szeredi).
 - Remove a double-unlock in the error path in fuse.
 - Remove an odd fly-speck in fuse_readpages().
 - Make nr_pages loop in fuse_readpages less convoluted (Dave Chinner).
Matthew Wilcox (Oracle) (12):
  mm: Fix the return type of __do_page_cache_readahead
  readahead: Ignore return value of ->readpages
  readahead: Put pages in cache earlier
  mm: Add readahead address space operation
  fs: Convert mpage_readpages to mpage_readahead
  btrfs: Convert from readpages to readahead
  erofs: Convert uncompressed files from readpages to readahead
  erofs: Convert compressed files from readpages to readahead
  ext4: Convert from readpages to readahead
  f2fs: Convert from readpages to readahead
  fuse: Convert from readpages to readahead
  iomap: Convert from readpages to readahead

 Documentation/filesystems/locking.rst | 7 ++-
 Documentation/filesystems/vfs.rst | 14 +++++
 drivers/staging/exfat/exfat_super.c | 9 +--
 fs/block_dev.c | 9 +--
 fs/btrfs/extent_io.c | 19 +++---
 fs/btrfs/extent_io.h | 2 +-
 fs/btrfs/inode.c | 18 +++---
 fs/erofs/data.c | 33 ++++------
 fs/erofs/zdata.c | 21 +++----
 fs/ext2/inode.c | 12 ++--
 fs/ext4/ext4.h | 5 +-
 fs/ext4/inode.c | 24 ++++----
 fs/ext4/readpage.c | 20 +++---
 fs/ext4/verity.c | 16 +++--
 fs/f2fs/data.c | 35 +++++------
 fs/f2fs/f2fs.h | 5 +-
 fs/f2fs/verity.c | 16 +++--
 fs/fat/inode.c | 8 +--
 fs/fuse/file.c | 37 +++++------
 fs/gfs2/aops.c | 20 +++---
 fs/hpfs/file.c | 8 +--
 fs/iomap/buffered-io.c | 74 +++++-----------------
 fs/iomap/trace.h | 2 +-
 fs/isofs/inode.c | 9 +--
 fs/jfs/inode.c | 8 +--
 fs/mpage.c | 38 ++++--------
 fs/nilfs2/inode.c | 13 ++--
 fs/ocfs2/aops.c | 32 +++++-----
 fs/omfs/file.c | 8 +--
 fs/qnx6/inode.c | 8 +--
 fs/reiserfs/inode.c | 10 +--
 fs/udf/inode.c | 8 +--
 fs/xfs/xfs_aops.c | 10 +--
 include/linux/fs.h | 2 +
 include/linux/iomap.h | 2 +-
 include/linux/mpage.h | 2 +-
 include/linux/pagemap.h | 12 ++++
 include/trace/events/erofs.h | 6 +-
 include/trace/events/f2fs.h | 6 +-
 mm/internal.h | 2 +-
 mm/migrate.c | 2 +-
 mm/readahead.c | 89 ++++++++++++++++++---------
 42 files changed, 332 insertions(+), 349 deletions(-)
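
To illustrate the difference the cover letter describes between ->readpages
and ->readahead, here is a minimal userspace sketch in C. It is a toy model,
not kernel code and not the code from this series: every toy_* name, the
fixed-size slot array standing in for the page cache, and the simplified
locking and reference counting are assumptions made purely for illustration.
In the old model the filesystem walks a list and inserts each page itself;
in the new model the core inserts locked pages first and the filesystem only
looks them up by index.

```c
/* Userspace toy model, not kernel code: every toy_* name is hypothetical. */
#include <assert.h>
#include <stdlib.h>

#define TOY_SLOTS 16

struct toy_page {
    unsigned long index;
    int refcount;
    int locked;
    int uptodate;
    struct toy_page *next;      /* list linkage, used by the old model only */
};

struct toy_cache { struct toy_page *slots[TOY_SLOTS]; };

struct toy_page *toy_alloc_page(unsigned long index)
{
    struct toy_page *p = calloc(1, sizeof(*p));
    if (p) { p->index = index; p->refcount = 1; }
    return p;
}

void toy_put_page(struct toy_page *p)
{
    if (--p->refcount == 0)
        free(p);
}

/* Insert a page in the locked state; fails if the slot is out of range or occupied. */
int toy_cache_add(struct toy_cache *c, struct toy_page *p)
{
    if (p->index >= TOY_SLOTS || c->slots[p->index])
        return -1;
    p->locked = 1;
    p->refcount++;              /* reference held by the cache */
    c->slots[p->index] = p;
    return 0;
}

/* Old model: the filesystem walks a list and inserts each page itself. */
void toy_fs_readpages(struct toy_cache *c, struct toy_page *list)
{
    while (list) {
        struct toy_page *p = list;
        list = p->next;
        p->next = NULL;
        if (toy_cache_add(c, p) == 0) {
            p->uptodate = 1;    /* stand-in for real I/O */
            p->locked = 0;
        }
        toy_put_page(p);        /* drop the list's reference */
    }
}

/* New model, filesystem side: pages are already cached and locked. */
void toy_fs_readahead(struct toy_cache *c, unsigned long start, unsigned nr)
{
    for (unsigned long i = start; i < start + nr; i++) {
        struct toy_page *p = c->slots[i];
        assert(p && p->locked);
        p->uptodate = 1;        /* stand-in for real I/O */
        p->locked = 0;
    }
}

/* New model, core side: allocate, insert, then hand over contiguous runs. */
void toy_core_readahead(struct toy_cache *c, unsigned long start, unsigned nr)
{
    unsigned long run_start = start;
    unsigned run_len = 0;

    for (unsigned long i = start; i < start + nr && i < TOY_SLOTS; i++) {
        struct toy_page *p = c->slots[i] ? NULL : toy_alloc_page(i);

        if (p && toy_cache_add(c, p) == 0) {
            toy_put_page(p);    /* the cache keeps its own reference */
            run_len++;
            continue;
        }
        if (p)
            toy_put_page(p);    /* insertion failed: do not leak the page */
        if (run_len)            /* page already cached or no memory: flush the run */
            toy_fs_readahead(c, run_start, run_len);
        run_start = i + 1;
        run_len = 0;
    }
    if (run_len)
        toy_fs_readahead(c, run_start, run_len);
}

/* Helper for inspecting the toy cache. */
unsigned toy_count_uptodate(const struct toy_cache *c)
{
    unsigned n = 0;
    for (int i = 0; i < TOY_SLOTS; i++)
        if (c->slots[i] && c->slots[i]->uptodate)
            n++;
    return n;
}
```

In this sketch, calling toy_core_readahead() on a range that contains an
already cached page splits the request into two runs around that page, and
the filesystem side never touches a list or inserts anything itself. The
toy_put_page() call on a failed insertion corresponds loosely to the
"missing call to put_page()" item in the v4 changelog, but the real kernel
paths are considerably more involved.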