Message ID | 20230209102954.528942-4-dhowells@redhat.com (mailing list archive)
---|---
State | New
Series | iov_iter: Improve page extraction (pin or just list)
> The code is loosely based on filemap_read() and might belong in
> mm/filemap.c with that as it needs to use filemap_get_pages().

Yes, I think it should go into filemap.c.

> +	while (spliced < size &&
> +	       !pipe_full(pipe->head, pipe->tail, pipe->max_usage)) {
> +		struct pipe_buffer *buf = &pipe->bufs[pipe->head & (pipe->ring_size - 1)];

Can you please factor this calculation, which is also duplicated in patch
one, into a helper?

static inline struct pipe_buffer *pipe_head_buf(struct pipe_inode_info *pipe)
{
	return &pipe->bufs[pipe->head & (pipe->ring_size - 1)];
}

> +	struct folio_batch fbatch;
> +	size_t total_spliced = 0, used, npages;
> +	loff_t isize, end_offset;
> +	bool writably_mapped;
> +	int i, error = 0;
> +
> +	struct kiocb iocb = {

Why the empty line before this declaration?

> +		.ki_filp = in,
> +		.ki_pos = *ppos,
> +	};

Also why doesn't this use init_sync_kiocb?

>  	if (in->f_flags & O_DIRECT)
>  		return generic_file_direct_splice_read(in, ppos, pipe, len, flags);
> +	return generic_file_buffered_splice_read(in, ppos, pipe, len, flags);

Btw, can we drop the verbose generic_file_ prefix here?
generic_file_buffered_splice_read really should be filemap_splice_read and
be in filemap.c.  generic_file_direct_splice_read I'd just name
direct_splice_read.
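For illustration, with such a helper the loop quoted above would read roughly
as follows (a sketch against this patch, not code from the series):

	while (spliced < size &&
	       !pipe_full(pipe->head, pipe->tail, pipe->max_usage)) {
		/* One helper call replaces the open-coded ring-index arithmetic */
		struct pipe_buffer *buf = pipe_head_buf(pipe);
		size_t part = min_t(size_t, PAGE_SIZE - offset, size - spliced);

		*buf = (struct pipe_buffer) {
			.ops	= &page_cache_pipe_buf_ops,
			.page	= page,
			.offset	= offset,
			.len	= part,
		};
		folio_get(folio);
		pipe->head++;
		page++;
		spliced += part;
		offset = 0;
	}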
Christoph Hellwig <hch@infradead.org> wrote:

> Also why doesn't this use init_sync_kiocb?

I'm not sure I want ki_flags.

> >  	if (in->f_flags & O_DIRECT)
> >  		return generic_file_direct_splice_read(in, ppos, pipe, len, flags);
> > +	return generic_file_buffered_splice_read(in, ppos, pipe, len, flags);
>
> Btw, can we drop the verbose generic_file_ prefix here?

Probably.  Note that at some point cifs, for example, running in "unbuffered"
mode might want to call [generic_file_]direct_splice_read() directly.

David
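If cifs were to do that, the hookup might be no more than pointing its
unbuffered file_operations at the helper, along these lines (purely a sketch,
not from this series; cifs_file_direct_ops is the existing ops table in
fs/cifs/cifsfs.c, and the direct_splice_read name assumes Christoph's
suggested rename):

	const struct file_operations cifs_file_direct_ops = {
		/* ... existing methods unchanged ... */
		.splice_read	= direct_splice_read,	/* instead of generic_file_splice_read */
	};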
On Mon, Feb 13, 2023 at 10:11:01AM +0000, David Howells wrote:
> Christoph Hellwig <hch@infradead.org> wrote:
>
> > Also why doesn't this use init_sync_kiocb?
>
> I'm not sure I want ki_flags.

Why?
Christoph Hellwig <hch@infradead.org> wrote:

> > > Also why doesn't this use init_sync_kiocb?
> >
> > I'm not sure I want ki_flags.
>
> Why?

I should have said: I'm not sure I want ki_flags set from f_iocb_flags.  I'm
not sure how the IOCB_* flags that I import from there will affect the
operation of the synchronous read splice.  IOCB_NOWAIT, for example, or, for
that matter, IOCB_APPEND.

David
On Mon, Feb 13, 2023 at 11:15:37AM +0000, David Howells wrote:
> I should have said: I'm not sure I want ki_flags set from f_iocb_flags.  I'm
> not sure how the IOCB_* flags that I import from there will affect the
> operation of the synchronous read splice.

The same way as they did in the old ITER_PIPE-based generic_file_splice_read()
that uses init_sync_kiocb()?  And if there are any questions about them we
need to do a deep audit.

> IOCB_NOWAIT, for example, or, for

I'd expect a set IOCB_NOWAIT to make the function return -EAGAIN when it has
to block.

> that matter, IOCB_APPEND.

IOCB_APPEND has no effect on reads of any kind.
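Concretely, the behaviour under discussion would look something like this in
the new function (a sketch of the suggestion, not the posted patch; as far as
I recall, init_sync_kiocb() fills ki_flags from the file's f_iocb_flags):

	struct kiocb iocb;

	init_sync_kiocb(&iocb, in);	/* picks up IOCB_NOWAIT etc. from in->f_iocb_flags */
	iocb.ki_pos = *ppos;

	error = filemap_get_pages(&iocb, len, &fbatch, true);
	if (error < 0)			/* expected to be -EAGAIN if IOCB_NOWAIT would have to block */
		break;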
Hi,

On Thu, Feb 09, 2023 at 10:29:45AM +0000, David Howells wrote:
> Provide a function to do splice read from a buffered file, pulling the
> folios out of the pagecache directly by calling filemap_get_pages() to do
> any required reading and then pasting the returned folios into the pipe.
>
> A helper function is provided to do the actual folio pasting and will
> handle multipage folios by splicing as many of the relevant subpages as
> will fit into the pipe.
>
> The ITER_BVEC-based splicing previously added is then only used for
> splicing from O_DIRECT files.
>
> The code is loosely based on filemap_read() and might belong in
> mm/filemap.c with that as it needs to use filemap_get_pages().
>
> With this, ITER_PIPE is no longer used.
>
> Signed-off-by: David Howells <dhowells@redhat.com>

With this patch in the tree, the "collie" and "mps2" qemu emulations crash
for me.  Crash logs are attached.  I also attached the bisect log for
"collie".  Unfortunately I can not revert the patch to confirm because the
revert results in compile failures.

Guenter

---
bisect log

# bad: [09e41676e35ab06e4bce8870ea3bf1f191c3cb90] Add linux-next specific files for 20230213
# good: [4ec5183ec48656cec489c49f989c508b68b518e3] Linux 6.2-rc7
git bisect start 'HEAD' 'v6.2-rc7'
# good: [8b065aee8dfbecc978324b204fc897168c9adcd0] Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/cryptodev-2.6.git
git bisect good 8b065aee8dfbecc978324b204fc897168c9adcd0
# bad: [72655d7bf4966cc46ac85ef74b26eb74e251ae4a] Merge branch 'rcu/next' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu.git
git bisect bad 72655d7bf4966cc46ac85ef74b26eb74e251ae4a
# good: [55461ffd2b7ee0a8fe4a1f98ae6f4a33771e8193] Merge branch 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound.git
git bisect good 55461ffd2b7ee0a8fe4a1f98ae6f4a33771e8193
# bad: [0f1bf464790dad200077e97d35cd8bb9dd7b8341] Merge branch 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/sre/linux-power-supply.git
git bisect bad 0f1bf464790dad200077e97d35cd8bb9dd7b8341
# good: [c72ebd41e0737e1f1d30dc6eb3d167e8d16dcc3a] Merge branch 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input.git
git bisect good c72ebd41e0737e1f1d30dc6eb3d167e8d16dcc3a
# bad: [501053535caca01f20a9323d3c8dec9ecb7a06b1] Merge branch 'for-6.3/iov-extract' into for-next
git bisect bad 501053535caca01f20a9323d3c8dec9ecb7a06b1
# good: [efde918ac66958c568926120841e7692b1e9bd9d] rxrpc: use bvec_set_page to initialize a bvec
git bisect good efde918ac66958c568926120841e7692b1e9bd9d
# good: [6938b812a638d9f02d3eb4fd07c7aab4fd44076d] Merge branch 'for-6.3/io_uring' into for-next
git bisect good 6938b812a638d9f02d3eb4fd07c7aab4fd44076d
# good: [1972d038a5401781377d3ce2d901bf7763a43589] ublk: pass NULL to blk_mq_alloc_disk() as queuedata
git bisect good 1972d038a5401781377d3ce2d901bf7763a43589
# good: [f37bf75ca73d523ebaa7ceb44c45d8ecd05374fe] block, bfq: cleanup 'bfqg->online'
git bisect good f37bf75ca73d523ebaa7ceb44c45d8ecd05374fe
# bad: [34c5b3634708864d5845cbadad03833c30051e0b] iomap: Don't get an reference on ZERO_PAGE for direct I/O block zeroing
git bisect bad 34c5b3634708864d5845cbadad03833c30051e0b
# bad: [d9722a47571104f7fa1eeb5ec59044d3607c6070] splice: Do splice read from a buffered file without using ITER_PIPE
git bisect bad d9722a47571104f7fa1eeb5ec59044d3607c6070
# good: [cd119d2fa647945d63941d3fd64f4acc9f6eec24] mm: Pass info, not iter, into filemap_get_pages() and unstatic it
git bisect good cd119d2fa647945d63941d3fd64f4acc9f6eec24
# first bad commit: [d9722a47571104f7fa1eeb5ec59044d3607c6070] splice: Do splice read from a buffered file without using ITER_PIPE

---
arm:collie crash

8<--- cut here ---
Unable to handle kernel NULL pointer dereference at virtual address 00000000 when execute
[00000000] *pgd=c14b4831c14b4831, *pte=c14b4000, *ppte=e09b5a14
8<--- cut here ---
Unhandled fault: page domain fault (0x01b) at 0x00000000
[00000000] *pgd=c14b4831, *pte=00000000, *ppte=00000000
Internal error: : 1b [#1] ARM
CPU: 0 PID: 58 Comm: cat Not tainted 6.2.0-rc7-next-20230213 #1
Hardware name: Sharp-Collie
PC is at copy_from_kernel_nofault+0x124/0x23c
LR is at 0xe09b5a84
pc : [<c009d894>]    lr : [<e09b5a84>]    psr: 20000193
sp : e09b5a4c  ip : e09b5a84  fp : e09b5a80
r10: 00000214  r9 : 60000113  r8 : 00000004
r7 : c08b91fc  r6 : e09b5a84  r5 : 00000004  r4 : 00000000
r3 : 00000001  r2 : c14a6ca0  r1 : 00000000  r0 : 00000001
Flags: nzCv  IRQs off  FIQs on  Mode SVC_32  ISA ARM  Segment none
Control: 0000717f  Table: c14d4000  DAC: 00000051
Register r0 information: non-paged memory
Register r1 information: NULL pointer
Register r2 information: slab task_struct start c14a6ca0 pointer offset 0 size 3232
Register r3 information: non-paged memory
Register r4 information: NULL pointer
Register r5 information: non-paged memory
Register r6 information: 2-page vmalloc region starting at 0xe09b4000 allocated at kernel_clone+0x78/0x474
Register r7 information: non-slab/vmalloc memory
Register r8 information: non-paged memory
Register r9 information: non-paged memory
Register r10 information: non-paged memory
Register r11 information: 2-page vmalloc region starting at 0xe09b4000 allocated at kernel_clone+0x78/0x474
Register r12 information: 2-page vmalloc region starting at 0xe09b4000 allocated at kernel_clone+0x78/0x474
Process cat (pid: 58, stack limit = 0xfabdb807)
Stack: (0xe09b5a4c to 0xe09b6000)
5a40:                            e09b5a70 e09b5a5c c005a4e4 c087b82c e09b5c14
5a60: e09b5c14 00000000 c08b91fc 80000005 60000113 e09b5a98 e09b5a84 c000cfa8
5a80: c009d77c e09b5ab0 c087b82c e09b5ac0 e09b5a9c c06b8020 c000cf84 c149a840
5aa0: c07df1e0 e09b5c14 c07df1c8 c087f3e4 c08b91fc e09b5b40 e09b5ac4 c000cc28
5ac0: c06b8010 e09b5af0 e09b5ad4 c005e7a4 c005d0b8 e09b5af8 e09b5ae4 c005d0d4
5ae0: c14b4000 e09b5b08 e09b5af4 c06de218 c005e72c e09b5b10 c087b82c e09b5b40
5b00: e09b5b1c c06dcba8 c06de1f4 c07ded54 c087b82c 00000000 80000005 00000000
5b20: c149a840 c07df1e0 c149a8b8 00000004 00000214 e09b5b58 e09b5b44 c000dcec
5b40: c000cb98 80000005 e09b5c14 e09b5b80 e09b5b5c c000dd78 c000dc84 e09b5c14
5b60: e09b5b6c e09b5c14 80000005 00000000 c149a840 e09b5bc0 e09b5b84 c000df6c
5b80: c000dd2c 00000000 c087ba8c c14a6ca0 00010000 c08bacc0 00000005 e09b5c14
5ba0: c087f688 c000e184 00000000 c14a6ca0 c149f158 e09b5bd8 e09b5bc4 c000e22c
5bc0: c000ddc0 00000005 e09b5c14 e09b5c10 e09b5bdc c000e33c c000e190 c14a6ca0
5be0: c00526a4 c14a6ca0 c088607c 60000013 00000000 20000013 ffffffff e09b5c48
5c00: dfb10900 e09b5cb4 e09b5c14 c0008e10 c000e304 c14b3a20 dfb10900 dfb10900
5c20: 60000093 dfb10900 00000000 c14b3a20 60000013 dfb10900 00000000 c149f158
5c40: e09b5cb4 e09b5cb8 e09b5c60 c0093a7c 00000000 20000013 ffffffff 00000051
5c60: c00a5a88 00000cc0 dfb10900 00000000 00000cc0 c149f128 e09b5cb4 e09b5c88
5c80: c009507c c087b82c e09b5c90 e09b5d94 e09b5d68 00000000 c149f128 dfb10900
5ca0: 00000000 c149f158 e09b5d3c e09b5cb8 c0099714 c0093a20 dfff1d14 c14a7258
5cc0: c00e28bc 00000002 e09b5d1c e09b5cd8 00010000 00000001 c14b3af0 c14b3a20
5ce0: c14a6ca0 00000010 00000001 c14b3a20 c149f128 c14b3af0 00000000 00000000
5d00: 00000000 00000000 00000000 c087b82c 80000113 00000000 00000000 e09b5f18
5d20: 00010000 c14b5600 00000000 00010000 e09b5e04 e09b5d40 c013019c c0099324
5d40: 60000113 00000000 c14a6ca0 c14a6ca0 e09b5d84 c14b3a20 c087ba8c c14a6ca0
5d60: 00000000 00000000 c14b3a20 00000000 00000000 00000000 00000000 00000000
5d80: 00000000 00000000 00000000 00000000 fffffffe ffff0000 000001a9 00000000
5da0: c832b34f c088607c 60000013 00000000 e09b5f18 c0a90ec8 01000000 c14b5600
5dc0: e09b5e04 e09b5dd0 c0055788 c0054dec 00000000 c087b82c c0102d90 c14b5600
5de0: 00000000 e09b5f18 e09b5f18 c14b3a20 00000000 00010000 e09b5e94 e09b5e08
5e00: c0130ec0 c01300d0 00000001 00000000 c0102d90 c14b5600 e09b5e9c e09b5e28
5e20: c06ef2a8 c005582c 00000001 00000000 c0102d90 e09b5e40 c0055f60 c0052e4c
5e40: e09b5e5c e09b5e50 c14b5634 00000001 00000001 00000002 c000f22c c149a894
5e60: 00000000 c087b82c c14d1e00 00010000 c14b3a20 c14b5600 e09b5f18 c0130e48
5e80: 01000000 00000001 e09b5ec4 e09b5e98 c012fee0 c0130e54 00000000 c06ef210
5ea0: c0102d90 00000000 c14b5600 c14b3a20 e09b5f18 00000000 e09b5ef4 e09b5ec8
5ec0: c0131d94 c012fe50 00000000 c0052e4c c14b3a20 00000000 00000000 01000000
5ee0: c14b3a20 c0c2aa20 e09b5f5c e09b5ef8 c00f72d8 c0131d28 00000000 c149a8b8
5f00: 00000002 00000255 c14b5600 00000000 c0c2aa20 c00f8f4c 00000000 00000000
5f20: 00000000 00000000 e09b5f74 c087b82c c000df18 00000000 00000000 00000003
5f40: 000000ef c0008420 c14a6ca0 01000000 e09b5fa4 e09b5f60 c00f8f4c c00f7170
5f60: 7fffffff 00000000 e09b5fac e09b5f78 c000e278 c087b82c befeee88 01000000
5f80: 00000000 01000000 000000ef c0008420 c14a6ca0 00000000 00000000 e09b5fa8
5fa0: c0008260 c00f8e38 01000000 00000000 00000001 00000003 00000000 01000000
5fc0: 01000000 00000000 01000000 000000ef 00000001 00000001 00000000 00000000
5fe0: b6e485d0 befedc74 00019764 b6e485dc 60000010 00000001 00000000 00000000
Backtrace:
copy_from_kernel_nofault from is_valid_bugaddr+0x30/0x7c
 r9:60000113 r8:80000005 r7:c08b91fc r6:00000000 r5:e09b5c14 r4:e09b5c14
is_valid_bugaddr from report_bug+0x1c/0x114
report_bug from die+0x9c/0x398
 r7:c08b91fc r6:c087f3e4 r5:c07df1c8 r4:e09b5c14
die from die_kernel_fault+0x74/0xa8
 r10:00000214 r9:00000004 r8:c149a8b8 r7:c07df1e0 r6:c149a840 r5:00000000 r4:80000005
die_kernel_fault from __do_kernel_fault.part.0+0x58/0x94
 r7:e09b5c14 r4:80000005
__do_kernel_fault.part.0 from do_page_fault+0x1b8/0x338
 r7:c149a840 r6:00000000 r5:80000005 r4:e09b5c14
do_page_fault from do_translation_fault+0xa8/0xb0
 r10:c149f158 r9:c14a6ca0 r8:00000000 r7:c000e184 r6:c087f688 r5:e09b5c14 r4:00000005
do_translation_fault from do_PrefetchAbort+0x44/0x98
 r5:e09b5c14 r4:00000005
do_PrefetchAbort from __pabt_svc+0x50/0x80
Exception stack(0xe09b5c14 to 0xe09b5c5c)
5c00:                                              c14b3a20 dfb10900 dfb10900
5c20: 60000093 dfb10900 00000000 c14b3a20 60000013 dfb10900 00000000 c149f158
5c40: e09b5cb4 e09b5cb8 e09b5c60 c0093a7c 00000000 20000013 ffffffff
 r8:dfb10900 r7:e09b5c48 r6:ffffffff r5:20000013 r4:00000000
filemap_read_folio from filemap_get_pages+0x3fc/0x7a4
 r10:c149f158 r9:00000000 r8:dfb10900 r7:c149f128 r6:00000000 r5:e09b5d68 r4:e09b5d94
filemap_get_pages from generic_file_buffered_splice_read.constprop.0+0xd8/0x400
 r10:00010000 r9:00000000 r8:c14b5600 r7:00010000 r6:e09b5f18 r5:00000000 r4:00000000
generic_file_buffered_splice_read.constprop.0 from generic_file_splice_read+0x78/0x310
 r10:00010000 r9:00000000 r8:c14b3a20 r7:e09b5f18 r6:e09b5f18 r5:00000000 r4:c14b5600
generic_file_splice_read from do_splice_to+0x9c/0xbc
 r10:00000001 r9:01000000 r8:c0130e48 r7:e09b5f18 r6:c14b5600 r5:c14b3a20 r4:00010000
do_splice_to from splice_file_to_pipe+0x78/0x80
 r8:00000000 r7:e09b5f18 r6:c14b3a20 r5:c14b5600 r4:00000000
splice_file_to_pipe from do_sendfile+0x174/0x59c
 r9:c0c2aa20 r8:c14b3a20 r7:01000000 r6:00000000 r5:00000000 r4:c14b3a20
do_sendfile from sys_sendfile64+0x120/0x148
 r10:01000000 r9:c14a6ca0 r8:c0008420 r7:000000ef r6:00000003 r5:00000000 r4:00000000
sys_sendfile64 from ret_fast_syscall+0x0/0x44
Exception stack(0xe09b5fa8 to 0xe09b5ff0)
5fa0:                   01000000 00000000 00000001 00000003 00000000 01000000
5fc0: 01000000 00000000 01000000 000000ef 00000001 00000001 00000000 00000000
5fe0: b6e485d0 befedc74 00019764 b6e485dc
 r10:00000000 r9:c14a6ca0 r8:c0008420 r7:000000ef r6:01000000 r5:00000000 r4:01000000
Code: e21e1003 1a000011 e3550003 9a00000f (e4943000)
---[ end trace 0000000000000000 ]---

---
arm:mps2

[ 4.659693]
[ 4.659693] Unhandled exception: IPSR = 00000006 LR = fffffff1
[ 4.659888] CPU: 0 PID: 155 Comm: cat Tainted: G N 6.2.0-rc7-next-20230213 #1
[ 4.660030] Hardware name: Generic DT based system
[ 4.660118] PC is at 0x0
[ 4.660248] LR is at filemap_read_folio+0x17/0x4e
[ 4.660468] pc : [<00000000>]    lr : [<21044c97>]    psr: 0000000b
[ 4.660534] sp : 2185bd10  ip : 2185bcd0  fp : 00080001
[ 4.660591] r10: 21757b40  r9 : 2175eea4  r8 : 2185bdf4
[ 4.660649] r7 : 2175ee88  r6 : 21757b40  r5 : 21fecb40  r4 : 00000000
[ 4.660718] r3 : 00000001  r2 : 00000001  r1 : 21fecb40  r0 : 21757b40
[ 4.660785] xPSR: 0000000b
[ 4.661126] filemap_read_folio from filemap_get_pages+0x127/0x36e
[ 4.661247] filemap_get_pages from generic_file_buffered_splice_read.constprop.5+0x85/0x244
[ 4.661342] generic_file_buffered_splice_read.constprop.5 from generic_file_splice_read+0x1c3/0x1e2
[ 4.661436] generic_file_splice_read from splice_file_to_pipe+0x2f/0x48
[ 4.661509] splice_file_to_pipe from do_sendfile+0x193/0x1b8
[ 4.661573] do_sendfile from sys_sendfile64+0x63/0x70
[ 4.661653] sys_sendfile64 from ret_fast_syscall+0x1/0x4c
[ 4.661740] Exception stack(0x2185bfa8 to 0x2185bff0)
[ 4.661854] bfa0:                   000000ef 00000000 00000001 00000003 00000000 01000000
[ 4.661944] bfc0: 000000ef 00000000 21b56e48 000000ef 00000001 00000001 00000000 21b51770
[ 4.662023] bfe0: 21b4e791 21b56e48 21b174c9 21b0265e
Guenter Roeck <linux@roeck-us.net> wrote:

> [ 4.660118] PC is at 0x0
> [ 4.660248] LR is at filemap_read_folio+0x17/0x4e

Do you know what the filesystem is that's being read from?

I think the problem is that there are a few filesystems/drivers that call
generic_file_splice_read() but don't have a ->read_folio().  Now most of these
can be made to call direct_splice_read() instead, leaving just coda, overlayfs
and shmem.

Coda and overlayfs can be made to pass the request down a layer.  I'm about to
look into shmem.

David
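Until those conversions happen, one way to express that constraint
defensively in generic_file_splice_read() would be a guard along these lines
(purely hypothetical, not part of this series):

	if (in->f_flags & O_DIRECT)
		return generic_file_direct_splice_read(in, ppos, pipe, len, flags);
	/* Hypothetical guard: filemap_get_pages() relies on ->read_folio() to
	 * bring pagecache folios uptodate, so avoid the buffered path for
	 * mappings that don't provide one. */
	if (!in->f_mapping->a_ops->read_folio)
		return generic_file_direct_splice_read(in, ppos, pipe, len, flags);
	return generic_file_buffered_splice_read(in, ppos, pipe, len, flags);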
On 2/13/23 14:43, David Howells wrote:
> Guenter Roeck <linux@roeck-us.net> wrote:
>
>> [ 4.660118] PC is at 0x0
>> [ 4.660248] LR is at filemap_read_folio+0x17/0x4e
>
> Do you know what the filesystem is that's being read from?
>
> I think the problem is that there are a few filesystems/drivers that call
> generic_file_splice_read() but don't have a ->read_folio().  Now most of these
> can be made to call direct_splice_read() instead, leaving just coda, overlayfs
> and shmem.
>
> Coda and overlayfs can be made to pass the request down a layer.  I'm about to
> look into shmem.
>

Both are initrd.

Guenter
Guenter Roeck <linux@roeck-us.net> wrote:
> Both are initrd.
Do you mean rootfs? And, if so, is that tmpfs-based or ramfs-based?
David
On 2/13/23 15:12, David Howells wrote:
> Guenter Roeck <linux@roeck-us.net> wrote:
>
>> Both are initrd.
>
> Do you mean rootfs?  And, if so, is that tmpfs-based or ramfs-based?
>

Both are provided to the kernel using the -initrd qemu option, which usually
means that the address/location is passed to the kernel through either a
register or a data structure.  I have not really paid much attention to what
the kernel is doing with that information.  It is in cpio format, so it must
be decompressed, but I don't know how it is actually handled (nor why this
doesn't fail on other boots from initrd).

Guenter
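For what it's worth, one way to see which one a running guest ended up with
is to ask statfs() about "/" (a standalone check, not something from this
thread; the magic numbers come from linux/magic.h):

#include <stdio.h>
#include <sys/vfs.h>
#include <linux/magic.h>	/* TMPFS_MAGIC, RAMFS_MAGIC */

int main(void)
{
	struct statfs sfs;

	if (statfs("/", &sfs) == -1) {
		perror("statfs");
		return 1;
	}
	if (sfs.f_type == TMPFS_MAGIC)
		printf("rootfs is tmpfs\n");
	else if (sfs.f_type == RAMFS_MAGIC)
		printf("rootfs is ramfs\n");
	else
		printf("rootfs f_type = 0x%lx\n", (long)sfs.f_type);
	return 0;
}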
diff --git a/fs/splice.c b/fs/splice.c
index b4be6fc314a1..963cbf20abc8 100644
--- a/fs/splice.c
+++ b/fs/splice.c
@@ -22,6 +22,7 @@
 #include <linux/fs.h>
 #include <linux/file.h>
 #include <linux/pagemap.h>
+#include <linux/pagevec.h>
 #include <linux/splice.h>
 #include <linux/memcontrol.h>
 #include <linux/mm_inline.h>
@@ -375,6 +376,135 @@ static ssize_t generic_file_direct_splice_read(struct file *in, loff_t *ppos,
 	return ret;
 }
 
+/*
+ * Splice subpages from a folio into a pipe.
+ */
+static size_t splice_folio_into_pipe(struct pipe_inode_info *pipe,
+				     struct folio *folio,
+				     loff_t fpos, size_t size)
+{
+	struct page *page;
+	size_t spliced = 0, offset = offset_in_folio(folio, fpos);
+
+	page = folio_page(folio, offset / PAGE_SIZE);
+	size = min(size, folio_size(folio) - offset);
+	offset %= PAGE_SIZE;
+
+	while (spliced < size &&
+	       !pipe_full(pipe->head, pipe->tail, pipe->max_usage)) {
+		struct pipe_buffer *buf = &pipe->bufs[pipe->head & (pipe->ring_size - 1)];
+		size_t part = min_t(size_t, PAGE_SIZE - offset, size - spliced);
+
+		*buf = (struct pipe_buffer) {
+			.ops	= &page_cache_pipe_buf_ops,
+			.page	= page,
+			.offset	= offset,
+			.len	= part,
+		};
+		folio_get(folio);
+		pipe->head++;
+		page++;
+		spliced += part;
+		offset = 0;
+	}
+
+	return spliced;
+}
+
+/*
+ * Splice folios from the pagecache of a buffered (ie. non-O_DIRECT) file into
+ * a pipe.
+ */
+static ssize_t generic_file_buffered_splice_read(struct file *in, loff_t *ppos,
+						 struct pipe_inode_info *pipe,
+						 size_t len,
+						 unsigned int flags)
+{
+	struct folio_batch fbatch;
+	size_t total_spliced = 0, used, npages;
+	loff_t isize, end_offset;
+	bool writably_mapped;
+	int i, error = 0;
+
+	struct kiocb iocb = {
+		.ki_filp	= in,
+		.ki_pos		= *ppos,
+	};
+
+	/* Work out how much data we can actually add into the pipe */
+	used = pipe_occupancy(pipe->head, pipe->tail);
+	npages = max_t(ssize_t, pipe->max_usage - used, 0);
+	len = min_t(size_t, len, npages * PAGE_SIZE);
+
+	folio_batch_init(&fbatch);
+
+	do {
+		cond_resched();
+
+		if (*ppos >= i_size_read(file_inode(in)))
+			break;
+
+		iocb.ki_pos = *ppos;
+		error = filemap_get_pages(&iocb, len, &fbatch, true);
+		if (error < 0)
+			break;
+
+		/*
+		 * i_size must be checked after we know the pages are Uptodate.
+		 *
+		 * Checking i_size after the check allows us to calculate
+		 * the correct value for "nr", which means the zero-filled
+		 * part of the page is not copied back to userspace (unless
+		 * another truncate extends the file - this is desired though).
+		 */
+		isize = i_size_read(file_inode(in));
+		if (unlikely(*ppos >= isize))
+			break;
+		end_offset = min_t(loff_t, isize, *ppos + len);
+
+		/*
+		 * Once we start copying data, we don't want to be touching any
+		 * cachelines that might be contended:
+		 */
+		writably_mapped = mapping_writably_mapped(in->f_mapping);
+
+		for (i = 0; i < folio_batch_count(&fbatch); i++) {
+			struct folio *folio = fbatch.folios[i];
+			size_t n;
+
+			if (folio_pos(folio) >= end_offset)
+				goto out;
+			folio_mark_accessed(folio);
+
+			/*
+			 * If users can be writing to this folio using arbitrary
+			 * virtual addresses, take care of potential aliasing
+			 * before reading the folio on the kernel side.
+			 */
+			if (writably_mapped)
+				flush_dcache_folio(folio);
+
+			n = splice_folio_into_pipe(pipe, folio, *ppos, len);
+			if (!n)
+				goto out;
+			len -= n;
+			total_spliced += n;
+			*ppos += n;
+			in->f_ra.prev_pos = *ppos;
+			if (pipe_full(pipe->head, pipe->tail, pipe->max_usage))
+				goto out;
+		}
+
+		folio_batch_release(&fbatch);
+	} while (len);
+
+out:
+	folio_batch_release(&fbatch);
+	file_accessed(in);
+
+	return total_spliced ? total_spliced : error;
+}
+
 /**
  * generic_file_splice_read - splice data from file to a pipe
  * @in:		file to splice from
@@ -392,32 +522,13 @@ ssize_t generic_file_splice_read(struct file *in, loff_t *ppos,
 				 struct pipe_inode_info *pipe, size_t len,
 				 unsigned int flags)
 {
-	struct iov_iter to;
-	struct kiocb kiocb;
-	int ret;
-
+	if (unlikely(*ppos >= file_inode(in)->i_sb->s_maxbytes))
+		return 0;
+	if (unlikely(!len))
+		return 0;
 	if (in->f_flags & O_DIRECT)
 		return generic_file_direct_splice_read(in, ppos, pipe, len, flags);
-
-	iov_iter_pipe(&to, ITER_DEST, pipe, len);
-	init_sync_kiocb(&kiocb, in);
-	kiocb.ki_pos = *ppos;
-	ret = call_read_iter(in, &kiocb, &to);
-	if (ret > 0) {
-		*ppos = kiocb.ki_pos;
-		file_accessed(in);
-	} else if (ret < 0) {
-		/* free what was emitted */
-		pipe_discard_from(pipe, to.start_head);
-		/*
-		 * callers of ->splice_read() expect -EAGAIN on
-		 * "can't put anything in there", rather than -EFAULT.
-		 */
-		if (ret == -EFAULT)
-			ret = -EAGAIN;
-	}
-
-	return ret;
+	return generic_file_buffered_splice_read(in, ppos, pipe, len, flags);
 }
 EXPORT_SYMBOL(generic_file_splice_read);
Provide a function to do splice read from a buffered file, pulling the
folios out of the pagecache directly by calling filemap_get_pages() to do
any required reading and then pasting the returned folios into the pipe.

A helper function is provided to do the actual folio pasting and will
handle multipage folios by splicing as many of the relevant subpages as
will fit into the pipe.

The ITER_BVEC-based splicing previously added is then only used for
splicing from O_DIRECT files.

The code is loosely based on filemap_read() and might belong in
mm/filemap.c with that as it needs to use filemap_get_pages().

With this, ITER_PIPE is no longer used.

Signed-off-by: David Howells <dhowells@redhat.com>
cc: Jens Axboe <axboe@kernel.dk>
cc: Christoph Hellwig <hch@lst.de>
cc: Al Viro <viro@zeniv.linux.org.uk>
cc: David Hildenbrand <david@redhat.com>
cc: John Hubbard <jhubbard@nvidia.com>
cc: linux-mm@kvack.org
cc: linux-block@vger.kernel.org
cc: linux-fsdevel@vger.kernel.org
---
 fs/splice.c | 159 ++++++++++++++++++++++++++++++++++++++++++++--------
 1 file changed, 135 insertions(+), 24 deletions(-)
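As a quick way to exercise the new path from userspace, the program below
splices a buffered (non-O_DIRECT) file into a pipe and drains it, which
should reach generic_file_splice_read() much like the sendfile in the crash
logs above (an illustrative test, not part of the patch; pass any regular
file as argv[1]):

#define _GNU_SOURCE
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int main(int argc, char *argv[])
{
	int pfd[2], fd;
	char buf[4096];
	ssize_t n;

	if (argc < 2 || (fd = open(argv[1], O_RDONLY)) == -1 || pipe(pfd) == -1) {
		perror("setup");
		return 1;
	}

	/* splice() from a non-O_DIRECT file takes the buffered splice-read path */
	while ((n = splice(fd, NULL, pfd[1], NULL, sizeof(buf), 0)) > 0) {
		/* drain the pipe so the next splice() doesn't find it full */
		if (read(pfd[0], buf, n) != n)
			break;
	}
	if (n == -1)
		perror("splice");
	return n == -1;
}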