Message ID | 20241024050021.627350-1-hch@lst.de
---|---
State | New
Series | iov_iter: don't require contiguous pages in iov_iter_extract_bvec_pages
On Thu, 24 Oct 2024 07:00:15 +0200, Christoph Hellwig wrote:
> The iov_iter_extract_pages interface allows to return physically
> discontiguous pages, as long as all but the first and last page
> in the array are page aligned and page size. Rewrite
> iov_iter_extract_bvec_pages to take advantage of that instead of only
> returning ranges of physically contiguous pages.
>
> [...]

Applied, thanks!

[1/1] iov_iter: don't require contiguous pages in iov_iter_extract_bvec_pages
      (no commit info)

Best regards,
Hi, On 2024-10-24 07:00, Christoph Hellwig wrote: > From: Ming Lei <ming.lei@redhat.com> > > The iov_iter_extract_pages interface allows to return physically > discontiguous pages, as long as all but the first and last page > in the array are page aligned and page size. Rewrite > iov_iter_extract_bvec_pages to take advantage of that instead of only > returning ranges of physically contiguous pages. > > Signed-off-by: Ming Lei <ming.lei@redhat.com> > [hch: minor cleanups, new commit log] > Signed-off-by: Christoph Hellwig <hch@lst.de> With this patch (e4e535bff2bc82bb49a633775f9834beeaa527db in next-20241030), I'm unable to connect via nvme-tcp with this in the log: nvme nvme1: failed to send request -5 nvme nvme1: Connect command failed: host path error nvme nvme1: failed to connect queue: 0 ret=880 With the patch reverted it works as expected: nvme nvme1: creating 24 I/O queues. nvme nvme1: mapped 24/0/0 default/read/poll queues. nvme nvme1: new ctrl: NQN "nqn.2018-06.eu.kasm.int:freenas:backup:parmesan.int.kasm.eu", addr [2001:0678:0a5c:1204:6245:cbff:fe9c:4f59]:4420, hostnqn: nqn.2018-06.eu.kasm.int:parmesan Please let me know if there's anything else you need. Regards, Klara Modin +CC: linux-nvme > --- > lib/iov_iter.c | 67 +++++++++++++++++++++++++++++++++----------------- > 1 file changed, 45 insertions(+), 22 deletions(-) > > diff --git a/lib/iov_iter.c b/lib/iov_iter.c > index 1abb32c0da50..9fc06f5fb748 100644 > --- a/lib/iov_iter.c > +++ b/lib/iov_iter.c > @@ -1677,8 +1677,8 @@ static ssize_t iov_iter_extract_xarray_pages(struct iov_iter *i, > } > > /* > - * Extract a list of contiguous pages from an ITER_BVEC iterator. This does > - * not get references on the pages, nor does it get a pin on them. > + * Extract a list of virtually contiguous pages from an ITER_BVEC iterator. > + * This does not get references on the pages, nor does it get a pin on them. > */ > static ssize_t iov_iter_extract_bvec_pages(struct iov_iter *i, > struct page ***pages, size_t maxsize, > @@ -1686,35 +1686,58 @@ static ssize_t iov_iter_extract_bvec_pages(struct iov_iter *i, > iov_iter_extraction_t extraction_flags, > size_t *offset0) > { > - struct page **p, *page; > - size_t skip = i->iov_offset, offset, size; > - int k; > + size_t skip = i->iov_offset, size = 0; > + struct bvec_iter bi; > + int k = 0; > > - for (;;) { > - if (i->nr_segs == 0) > - return 0; > - size = min(maxsize, i->bvec->bv_len - skip); > - if (size) > - break; > + if (i->nr_segs == 0) > + return 0; > + > + if (i->iov_offset == i->bvec->bv_len) { > i->iov_offset = 0; > i->nr_segs--; > i->bvec++; > skip = 0; > } > + bi.bi_size = maxsize + skip; > + bi.bi_bvec_done = skip; > + > + maxpages = want_pages_array(pages, maxsize, skip, maxpages); > + > + while (bi.bi_size && bi.bi_idx < i->nr_segs) { > + struct bio_vec bv = bvec_iter_bvec(i->bvec, bi); > + > + /* > + * The iov_iter_extract_pages interface only allows an offset > + * into the first page. Break out of the loop if we see an > + * offset into subsequent pages, the caller will have to call > + * iov_iter_extract_pages again for the reminder. 
> + */ > + if (k) { > + if (bv.bv_offset) > + break; > + } else { > + *offset0 = bv.bv_offset; > + } > > - skip += i->bvec->bv_offset; > - page = i->bvec->bv_page + skip / PAGE_SIZE; > - offset = skip % PAGE_SIZE; > - *offset0 = offset; > + (*pages)[k++] = bv.bv_page; > + size += bv.bv_len; > > - maxpages = want_pages_array(pages, size, offset, maxpages); > - if (!maxpages) > - return -ENOMEM; > - p = *pages; > - for (k = 0; k < maxpages; k++) > - p[k] = page + k; > + if (k >= maxpages) > + break; > + > + /* > + * We are done when the end of the bvec doesn't align to a page > + * boundary as that would create a hole in the returned space. > + * The caller will handle this with another call to > + * iov_iter_extract_pages. > + */ > + if (bv.bv_offset + bv.bv_len != PAGE_SIZE) > + break; > + > + bvec_iter_advance_single(i->bvec, &bi, bv.bv_len); > + } > > - size = min_t(size_t, size, maxpages * PAGE_SIZE - offset); > iov_iter_advance(i, size); > return size; > } # bad: [cadd411a755d40bf717c2514afb90c7c0762aefc] crypto: rsassa-pkcs1 - Migrate to sig_alg backend # good: [e42b1a9a2557aa94fee47f078633677198386a52] Merge tag 'spi-fix-v6.12-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi git bisect start 'next' 'next/stable' # good: [5837b9daa339313b9009011e0173dd874de3f132] Merge branch 'spi-nor/next' of git://git.kernel.org/pub/scm/linux/kernel/git/mtd/linux.git git bisect good 5837b9daa339313b9009011e0173dd874de3f132 # bad: [64f1d5c3ad7542ea8f979988d2af75fd4e18148e] Merge branch 'for-backlight-next' of git://git.kernel.org/pub/scm/linux/kernel/git/lee/backlight.git git bisect bad 64f1d5c3ad7542ea8f979988d2af75fd4e18148e # good: [e7103f8785504dd5c6aad118fbc64fc49eda33af] Merge tag 'amd-drm-next-6.13-2024-10-25' of https://gitlab.freedesktop.org/agd5f/linux into drm-next git bisect good e7103f8785504dd5c6aad118fbc64fc49eda33af # good: [7487abf914ecae6ad2690493c2a3fb998738bd71] Merge branch 'for-next' of https://git.kernel.org/pub/scm/linux/kernel/git/ieee1394/linux1394.git git bisect good 7487abf914ecae6ad2690493c2a3fb998738bd71 # good: [3f743e703c251c9c3f22088bcdc0330e165c8c94] Merge branch 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input.git git bisect good 3f743e703c251c9c3f22088bcdc0330e165c8c94 # bad: [9401ff8e2d60f43ecf343c20a7595b2711bce217] Merge branch 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/libata/linux git bisect bad 9401ff8e2d60f43ecf343c20a7595b2711bce217 # good: [aff750e7094e26eae686965930ef2bec7f4152da] io_uring/rsrc: clear ->buf before mapping pages git bisect good aff750e7094e26eae686965930ef2bec7f4152da # good: [904ebd2527c507752f5ddb358f887d2e0dab96a0] block: remove redundant explicit memory barrier from rq_qos waiter and waker git bisect good 904ebd2527c507752f5ddb358f887d2e0dab96a0 # bad: [d49acf07fd5629a7e96d3f6cb4a28f5cc04a10bf] Merge branch 'for-next' of git://git.kernel.dk/linux-block.git git bisect bad d49acf07fd5629a7e96d3f6cb4a28f5cc04a10bf # good: [f1be1788a32e8fa63416ad4518bbd1a85a825c9d] block: model freeze & enter queue as lock for supporting lockdep git bisect good f1be1788a32e8fa63416ad4518bbd1a85a825c9d # bad: [793c08dfe78b646031fe2aa5910e6fef6e872e4a] Merge branch 'for-6.13/block' into for-next git bisect bad 793c08dfe78b646031fe2aa5910e6fef6e872e4a # bad: [2f5a65ef30a636d5030917eebd283ac447a212af] block: add a bdev_limits helper git bisect bad 2f5a65ef30a636d5030917eebd283ac447a212af # bad: [e4e535bff2bc82bb49a633775f9834beeaa527db] iov_iter: don't require contiguous pages in iov_iter_extract_bvec_pages 
git bisect bad e4e535bff2bc82bb49a633775f9834beeaa527db
# first bad commit: [e4e535bff2bc82bb49a633775f9834beeaa527db] iov_iter: don't require contiguous pages in iov_iter_extract_bvec_pages
On Wed, Oct 30, 2024 at 06:56:48PM +0100, Klara Modin wrote: > Hi, > > On 2024-10-24 07:00, Christoph Hellwig wrote: > > From: Ming Lei <ming.lei@redhat.com> > > > > The iov_iter_extract_pages interface allows to return physically > > discontiguous pages, as long as all but the first and last page > > in the array are page aligned and page size. Rewrite > > iov_iter_extract_bvec_pages to take advantage of that instead of only > > returning ranges of physically contiguous pages. > > > > Signed-off-by: Ming Lei <ming.lei@redhat.com> > > [hch: minor cleanups, new commit log] > > Signed-off-by: Christoph Hellwig <hch@lst.de> > > With this patch (e4e535bff2bc82bb49a633775f9834beeaa527db in next-20241030), > I'm unable to connect via nvme-tcp with this in the log: > > nvme nvme1: failed to send request -5 > nvme nvme1: Connect command failed: host path error > nvme nvme1: failed to connect queue: 0 ret=880 > > With the patch reverted it works as expected: > > nvme nvme1: creating 24 I/O queues. > nvme nvme1: mapped 24/0/0 default/read/poll queues. > nvme nvme1: new ctrl: NQN > "nqn.2018-06.eu.kasm.int:freenas:backup:parmesan.int.kasm.eu", addr > [2001:0678:0a5c:1204:6245:cbff:fe9c:4f59]:4420, hostnqn: > nqn.2018-06.eu.kasm.int:parmesan I can't reproduce it by running blktest 'nvme_trtype=tcp ./check nvme/' on both next tree & for-6.13/block. Can you collect the following bpftrace log by running the script before connecting to nvme-tcp? Please enable the following kernel options for bpftrace: CONFIG_KPROBE_EVENTS_ON_NOTRACE=y CONFIG_NVME_CORE=y CONFIG_NVME_FABRICS=y CONFIG_NVME_TCP=y Btw, bpftrace doesn't work on next tree if nvme is built as module. # cat extract.bt #!/usr/bin/bpftrace kprobe:nvmf_connect_io_queue { @connect[tid]=1; } kretprobe:nvmf_connect_io_queue { @connect[tid]=0; } kprobe:iov_iter_extract_pages /@connect[tid]/ { $i = (struct iov_iter *)arg0; printf("extract pages: iter(cnt %lu off %lu) maxsize %u maxpages %u offset %lu\n", $i->count, $i->iov_offset, arg2, arg3, *((uint32 *)arg4)); printf("\t bvec(off %u len %u)\n", $i->bvec->bv_offset, $i->bvec->bv_len); } kretprobe:iov_iter_extract_pages /@connect[tid]/ { printf("extract pages: ret %d\n", retval); } END { clear(@connect); } Thanks, Ming
On Thu, Oct 31, 2024 at 08:14:49AM +0800, Ming Lei wrote: > On Wed, Oct 30, 2024 at 06:56:48PM +0100, Klara Modin wrote: > > Hi, > > > > On 2024-10-24 07:00, Christoph Hellwig wrote: > > > From: Ming Lei <ming.lei@redhat.com> > > > > > > The iov_iter_extract_pages interface allows to return physically > > > discontiguous pages, as long as all but the first and last page > > > in the array are page aligned and page size. Rewrite > > > iov_iter_extract_bvec_pages to take advantage of that instead of only > > > returning ranges of physically contiguous pages. > > > > > > Signed-off-by: Ming Lei <ming.lei@redhat.com> > > > [hch: minor cleanups, new commit log] > > > Signed-off-by: Christoph Hellwig <hch@lst.de> > > > > With this patch (e4e535bff2bc82bb49a633775f9834beeaa527db in next-20241030), > > I'm unable to connect via nvme-tcp with this in the log: > > > > nvme nvme1: failed to send request -5 > > nvme nvme1: Connect command failed: host path error > > nvme nvme1: failed to connect queue: 0 ret=880 > > > > With the patch reverted it works as expected: > > > > nvme nvme1: creating 24 I/O queues. > > nvme nvme1: mapped 24/0/0 default/read/poll queues. > > nvme nvme1: new ctrl: NQN > > "nqn.2018-06.eu.kasm.int:freenas:backup:parmesan.int.kasm.eu", addr > > [2001:0678:0a5c:1204:6245:cbff:fe9c:4f59]:4420, hostnqn: > > nqn.2018-06.eu.kasm.int:parmesan > > I can't reproduce it by running blktest 'nvme_trtype=tcp ./check nvme/' > on both next tree & for-6.13/block. > > Can you collect the following bpftrace log by running the script before > connecting to nvme-tcp? And please try the following patch: diff --git a/lib/iov_iter.c b/lib/iov_iter.c index 9fc06f5fb748..c761f6db3cb4 100644 --- a/lib/iov_iter.c +++ b/lib/iov_iter.c @@ -1699,6 +1699,7 @@ static ssize_t iov_iter_extract_bvec_pages(struct iov_iter *i, i->bvec++; skip = 0; } + bi.bi_idx = 0; bi.bi_size = maxsize + skip; bi.bi_bvec_done = skip; Thanks, Ming
On 2024-10-31 01:22, Ming Lei wrote: > On Thu, Oct 31, 2024 at 08:14:49AM +0800, Ming Lei wrote: >> On Wed, Oct 30, 2024 at 06:56:48PM +0100, Klara Modin wrote: >>> Hi, >>> >>> On 2024-10-24 07:00, Christoph Hellwig wrote: >>>> From: Ming Lei <ming.lei@redhat.com> >>>> >>>> The iov_iter_extract_pages interface allows to return physically >>>> discontiguous pages, as long as all but the first and last page >>>> in the array are page aligned and page size. Rewrite >>>> iov_iter_extract_bvec_pages to take advantage of that instead of only >>>> returning ranges of physically contiguous pages. >>>> >>>> Signed-off-by: Ming Lei <ming.lei@redhat.com> >>>> [hch: minor cleanups, new commit log] >>>> Signed-off-by: Christoph Hellwig <hch@lst.de> >>> >>> With this patch (e4e535bff2bc82bb49a633775f9834beeaa527db in next-20241030), >>> I'm unable to connect via nvme-tcp with this in the log: >>> >>> nvme nvme1: failed to send request -5 >>> nvme nvme1: Connect command failed: host path error >>> nvme nvme1: failed to connect queue: 0 ret=880 >>> >>> With the patch reverted it works as expected: >>> >>> nvme nvme1: creating 24 I/O queues. >>> nvme nvme1: mapped 24/0/0 default/read/poll queues. >>> nvme nvme1: new ctrl: NQN >>> "nqn.2018-06.eu.kasm.int:freenas:backup:parmesan.int.kasm.eu", addr >>> [2001:0678:0a5c:1204:6245:cbff:fe9c:4f59]:4420, hostnqn: >>> nqn.2018-06.eu.kasm.int:parmesan >> >> I can't reproduce it by running blktest 'nvme_trtype=tcp ./check nvme/' >> on both next tree & for-6.13/block. >> >> Can you collect the following bpftrace log by running the script before >> connecting to nvme-tcp? I didn't seem to get any output from the bpftrace script (I confirmed that I had the config as you requested, but I'm not very familiar with bpftrace so I could have done something wrong). I could, however, reproduce the issue in qemu and added breakpoints on nvmf_connect_io_queue and iov_iter_extract_pages. The breakpoint on iov_iter_extract_pages got hit once when running nvme connect: (gdb) break nvmf_connect_io_queue Breakpoint 1 at 0xffffffff81a5d960: file /home/klara/git/linux/drivers/nvme/host/fabrics.c, line 525. (gdb) break iov_iter_extract_pages Breakpoint 2 at 0xffffffff817633b0: file /home/klara/git/linux/lib/iov_iter.c, line 1900. (gdb) c Continuing. [Switching to Thread 1.1] Thread 1 hit Breakpoint 2, iov_iter_extract_pages (i=i@entry=0xffffc900001ebd68, pages=pages@entry=0xffffc900001ebb08, maxsize=maxsize@entry=72, maxpages=8, extraction_flags=extraction_flags@entry=0, offset0=offset0@entry=0xffffc900001ebb10) at /home/klara/git/linux/lib/iov_iter.c:1900 1900 { (gdb) print i->count $5 = 72 (gdb) print i->iov_offset $6 = 0 (gdb) print i->bvec->bv_offset $7 = 3952 (gdb) print i->bvec->bv_len $8 = 72 (gdb) c Continuing. I didn't hit the breakpoint in nvmf_connect_io_queue, but I instead hit it if I add it to nvmf_connect_admin_queue. I added this function to the bpftrace script but that didn't produce any output either. > > And please try the following patch: > > diff --git a/lib/iov_iter.c b/lib/iov_iter.c > index 9fc06f5fb748..c761f6db3cb4 100644 > --- a/lib/iov_iter.c > +++ b/lib/iov_iter.c > @@ -1699,6 +1699,7 @@ static ssize_t iov_iter_extract_bvec_pages(struct iov_iter *i, > i->bvec++; > skip = 0; > } > + bi.bi_idx = 0; > bi.bi_size = maxsize + skip; > bi.bi_bvec_done = skip; > > Applying this seems to fix the problem. Thanks, Klara Modin > > Thanks, > Ming >
On Thu, Oct 31, 2024 at 09:42:32AM +0100, Klara Modin wrote: > On 2024-10-31 01:22, Ming Lei wrote: > > On Thu, Oct 31, 2024 at 08:14:49AM +0800, Ming Lei wrote: > > > On Wed, Oct 30, 2024 at 06:56:48PM +0100, Klara Modin wrote: > > > > Hi, > > > > > > > > On 2024-10-24 07:00, Christoph Hellwig wrote: > > > > > From: Ming Lei <ming.lei@redhat.com> > > > > > > > > > > The iov_iter_extract_pages interface allows to return physically > > > > > discontiguous pages, as long as all but the first and last page > > > > > in the array are page aligned and page size. Rewrite > > > > > iov_iter_extract_bvec_pages to take advantage of that instead of only > > > > > returning ranges of physically contiguous pages. > > > > > > > > > > Signed-off-by: Ming Lei <ming.lei@redhat.com> > > > > > [hch: minor cleanups, new commit log] > > > > > Signed-off-by: Christoph Hellwig <hch@lst.de> > > > > > > > > With this patch (e4e535bff2bc82bb49a633775f9834beeaa527db in next-20241030), > > > > I'm unable to connect via nvme-tcp with this in the log: > > > > > > > > nvme nvme1: failed to send request -5 > > > > nvme nvme1: Connect command failed: host path error > > > > nvme nvme1: failed to connect queue: 0 ret=880 > > > > > > > > With the patch reverted it works as expected: > > > > > > > > nvme nvme1: creating 24 I/O queues. > > > > nvme nvme1: mapped 24/0/0 default/read/poll queues. > > > > nvme nvme1: new ctrl: NQN > > > > "nqn.2018-06.eu.kasm.int:freenas:backup:parmesan.int.kasm.eu", addr > > > > [2001:0678:0a5c:1204:6245:cbff:fe9c:4f59]:4420, hostnqn: > > > > nqn.2018-06.eu.kasm.int:parmesan > > > > > > I can't reproduce it by running blktest 'nvme_trtype=tcp ./check nvme/' > > > on both next tree & for-6.13/block. > > > > > > Can you collect the following bpftrace log by running the script before > > > connecting to nvme-tcp? > > I didn't seem to get any output from the bpftrace script (I confirmed that I > had the config as you requested, but I'm not very familiar with bpftrace so > I could have done something wrong). I could, however, reproduce the issue in It works for me on Fedora(37, 40). > qemu and added breakpoints on nvmf_connect_io_queue and > iov_iter_extract_pages. The breakpoint on iov_iter_extract_pages got hit > once when running nvme connect: > > (gdb) break nvmf_connect_io_queue > Breakpoint 1 at 0xffffffff81a5d960: file > /home/klara/git/linux/drivers/nvme/host/fabrics.c, line 525. > (gdb) break iov_iter_extract_pages > Breakpoint 2 at 0xffffffff817633b0: file > /home/klara/git/linux/lib/iov_iter.c, line 1900. > (gdb) c > Continuing. > [Switching to Thread 1.1] Wow, debug kernel with gdb, cool! > > Thread 1 hit Breakpoint 2, iov_iter_extract_pages > (i=i@entry=0xffffc900001ebd68, > pages=pages@entry=0xffffc900001ebb08, maxsize=maxsize@entry=72, > maxpages=8, > extraction_flags=extraction_flags@entry=0, > offset0=offset0@entry=0xffffc900001ebb10) > at /home/klara/git/linux/lib/iov_iter.c:1900 > 1900 { > (gdb) print i->count > $5 = 72 > (gdb) print i->iov_offset > $6 = 0 > (gdb) print i->bvec->bv_offset > $7 = 3952 > (gdb) print i->bvec->bv_len > $8 = 72 > (gdb) c > Continuing. > > I didn't hit the breakpoint in nvmf_connect_io_queue, but I instead hit it > if I add it to nvmf_connect_admin_queue. I added this function to the > bpftrace script but that didn't produce any output either. Your kernel config shows all BTF related options are enabled, maybe bpftrace userspace issue? 
> > > > > And please try the following patch: > > > > diff --git a/lib/iov_iter.c b/lib/iov_iter.c > > index 9fc06f5fb748..c761f6db3cb4 100644 > > --- a/lib/iov_iter.c > > +++ b/lib/iov_iter.c > > @@ -1699,6 +1699,7 @@ static ssize_t iov_iter_extract_bvec_pages(struct iov_iter *i, > > i->bvec++; > > skip = 0; > > } > > + bi.bi_idx = 0; > > bi.bi_size = maxsize + skip; > > bi.bi_bvec_done = skip; > > > > > > Applying this seems to fix the problem. Thanks for the test, and the patch is sent out. thanks, Ming
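The thread above comes down to one detail of the rewrite: the extractor now walks the bvec array with an on-stack struct bvec_iter, and the original patch initialized only bi_size and bi_bvec_done, leaving bi_idx holding stack garbage. If that garbage is not below nr_segs, the `bi.bi_size && bi.bi_idx < i->nr_segs` loop can exit at once with nothing extracted, which is consistent with the connect failure Klara reported. The following is a minimal userspace sketch of that walk, not the kernel code; struct toy_iter is a pared-down stand-in for struct bvec_iter, and the single 72-byte segment at offset 3952 mirrors the values from Klara's gdb session.

/*
 * Minimal sketch, not kernel code: a pared-down bvec walk modelled on the
 * loop in iov_iter_extract_bvec_pages().  struct toy_iter stands in for
 * the kernel's struct bvec_iter; only the fields the loop needs are kept.
 */
#include <stdio.h>

struct toy_bvec { unsigned int offset, len; };
struct toy_iter { unsigned int size, idx, done; };	/* ~ bi_size, bi_idx, bi_bvec_done */

static void walk(struct toy_iter *bi, const struct toy_bvec *bv, unsigned int nr_segs)
{
	/* Same shape as: while (bi.bi_size && bi.bi_idx < i->nr_segs) */
	while (bi->size && bi->idx < nr_segs) {
		unsigned int len = bv[bi->idx].len - bi->done;

		if (len > bi->size)
			len = bi->size;
		printf("segment %u: %u bytes at offset %u\n",
		       bi->idx, len, bv[bi->idx].offset + bi->done);
		bi->size -= len;
		bi->done = 0;
		bi->idx++;
	}
}

int main(void)
{
	/* The single bvec Klara saw in gdb: 72 bytes at offset 3952. */
	const struct toy_bvec bv[1] = { { .offset = 3952, .len = 72 } };
	struct toy_iter bi;

	/*
	 * The original patch set only size and done; idx kept whatever was
	 * on the stack, so the walk could terminate immediately.  Ming's
	 * follow-up adds the equivalent of this line:
	 */
	bi.idx = 0;
	bi.size = 72;	/* maxsize + skip */
	bi.done = 0;	/* skip */
	walk(&bi, bv, 1);
	return 0;
}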
On 10/24/24 7:00 AM, Christoph Hellwig wrote: > From: Ming Lei <ming.lei@redhat.com> > > The iov_iter_extract_pages interface allows to return physically > discontiguous pages, as long as all but the first and last page > in the array are page aligned and page size. Rewrite > iov_iter_extract_bvec_pages to take advantage of that instead of only > returning ranges of physically contiguous pages. > > Signed-off-by: Ming Lei <ming.lei@redhat.com> > [hch: minor cleanups, new commit log] > Signed-off-by: Christoph Hellwig <hch@lst.de> > --- > lib/iov_iter.c | 67 +++++++++++++++++++++++++++++++++----------------- > 1 file changed, 45 insertions(+), 22 deletions(-) > > diff --git a/lib/iov_iter.c b/lib/iov_iter.c > index 1abb32c0da50..9fc06f5fb748 100644 > --- a/lib/iov_iter.c > +++ b/lib/iov_iter.c > @@ -1677,8 +1677,8 @@ static ssize_t iov_iter_extract_xarray_pages(struct iov_iter *i, > } > > /* > - * Extract a list of contiguous pages from an ITER_BVEC iterator. This does > - * not get references on the pages, nor does it get a pin on them. > + * Extract a list of virtually contiguous pages from an ITER_BVEC iterator. > + * This does not get references on the pages, nor does it get a pin on them. > */ > static ssize_t iov_iter_extract_bvec_pages(struct iov_iter *i, > struct page ***pages, size_t maxsize, > @@ -1686,35 +1686,58 @@ static ssize_t iov_iter_extract_bvec_pages(struct iov_iter *i, > iov_iter_extraction_t extraction_flags, > size_t *offset0) > { > - struct page **p, *page; > - size_t skip = i->iov_offset, offset, size; > - int k; > + size_t skip = i->iov_offset, size = 0; > + struct bvec_iter bi; > + int k = 0; > > - for (;;) { > - if (i->nr_segs == 0) > - return 0; > - size = min(maxsize, i->bvec->bv_len - skip); > - if (size) > - break; > + if (i->nr_segs == 0) > + return 0; > + > + if (i->iov_offset == i->bvec->bv_len) { > i->iov_offset = 0; > i->nr_segs--; > i->bvec++; > skip = 0; > } > + bi.bi_size = maxsize + skip; > + bi.bi_bvec_done = skip; > + > + maxpages = want_pages_array(pages, maxsize, skip, maxpages); > + > + while (bi.bi_size && bi.bi_idx < i->nr_segs) { > + struct bio_vec bv = bvec_iter_bvec(i->bvec, bi); > + > + /* > + * The iov_iter_extract_pages interface only allows an offset > + * into the first page. Break out of the loop if we see an > + * offset into subsequent pages, the caller will have to call > + * iov_iter_extract_pages again for the reminder. > + */ > + if (k) { > + if (bv.bv_offset) > + break; > + } else { > + *offset0 = bv.bv_offset; > + } > > - skip += i->bvec->bv_offset; > - page = i->bvec->bv_page + skip / PAGE_SIZE; > - offset = skip % PAGE_SIZE; > - *offset0 = offset; > + (*pages)[k++] = bv.bv_page; > + size += bv.bv_len; > > - maxpages = want_pages_array(pages, size, offset, maxpages); > - if (!maxpages) > - return -ENOMEM; > - p = *pages; > - for (k = 0; k < maxpages; k++) > - p[k] = page + k; > + if (k >= maxpages) > + break; > + > + /* > + * We are done when the end of the bvec doesn't align to a page > + * boundary as that would create a hole in the returned space. > + * The caller will handle this with another call to > + * iov_iter_extract_pages. > + */ > + if (bv.bv_offset + bv.bv_len != PAGE_SIZE) > + break; > + > + bvec_iter_advance_single(i->bvec, &bi, bv.bv_len); > + } > > - size = min_t(size_t, size, maxpages * PAGE_SIZE - offset); > iov_iter_advance(i, size); > return size; > } This is causing major network regression in UDP sendfile, found by syzbot. 
I will release the syzbot report and this fix :

diff --git a/lib/iov_iter.c b/lib/iov_iter.c
index 65ec660c2960..e19aab1fccca 100644
--- a/lib/iov_iter.c
+++ b/lib/iov_iter.c
@@ -1728,6 +1728,10 @@ static ssize_t iov_iter_extract_bvec_pages(struct iov_iter *i,
 		(*pages)[k++] = bv.bv_page;
 		size += bv.bv_len;
 
+		if (size > maxsize) {
+			size = maxsize;
+			break;
+		}
 		if (k >= maxpages)
 			break;
On 11/1/24 11:05 AM, Eric Dumazet wrote: > > On 10/24/24 7:00 AM, Christoph Hellwig wrote: >> From: Ming Lei <ming.lei@redhat.com> >> >> The iov_iter_extract_pages interface allows to return physically >> discontiguous pages, as long as all but the first and last page >> in the array are page aligned and page size. Rewrite >> iov_iter_extract_bvec_pages to take advantage of that instead of only >> returning ranges of physically contiguous pages. >> >> Signed-off-by: Ming Lei <ming.lei@redhat.com> >> [hch: minor cleanups, new commit log] >> Signed-off-by: Christoph Hellwig <hch@lst.de> >> --- >> lib/iov_iter.c | 67 +++++++++++++++++++++++++++++++++----------------- >> 1 file changed, 45 insertions(+), 22 deletions(-) >> >> diff --git a/lib/iov_iter.c b/lib/iov_iter.c >> index 1abb32c0da50..9fc06f5fb748 100644 >> --- a/lib/iov_iter.c >> +++ b/lib/iov_iter.c >> @@ -1677,8 +1677,8 @@ static ssize_t iov_iter_extract_xarray_pages(struct iov_iter *i, >> } >> /* >> - * Extract a list of contiguous pages from an ITER_BVEC iterator. This does >> - * not get references on the pages, nor does it get a pin on them. >> + * Extract a list of virtually contiguous pages from an ITER_BVEC iterator. >> + * This does not get references on the pages, nor does it get a pin on them. >> */ >> static ssize_t iov_iter_extract_bvec_pages(struct iov_iter *i, >> struct page ***pages, size_t maxsize, >> @@ -1686,35 +1686,58 @@ static ssize_t iov_iter_extract_bvec_pages(struct iov_iter *i, >> iov_iter_extraction_t extraction_flags, >> size_t *offset0) >> { >> - struct page **p, *page; >> - size_t skip = i->iov_offset, offset, size; >> - int k; >> + size_t skip = i->iov_offset, size = 0; >> + struct bvec_iter bi; >> + int k = 0; >> - for (;;) { >> - if (i->nr_segs == 0) >> - return 0; >> - size = min(maxsize, i->bvec->bv_len - skip); >> - if (size) >> - break; >> + if (i->nr_segs == 0) >> + return 0; >> + >> + if (i->iov_offset == i->bvec->bv_len) { >> i->iov_offset = 0; >> i->nr_segs--; >> i->bvec++; >> skip = 0; >> } >> + bi.bi_size = maxsize + skip; >> + bi.bi_bvec_done = skip; >> + >> + maxpages = want_pages_array(pages, maxsize, skip, maxpages); >> + >> + while (bi.bi_size && bi.bi_idx < i->nr_segs) { >> + struct bio_vec bv = bvec_iter_bvec(i->bvec, bi); >> + >> + /* >> + * The iov_iter_extract_pages interface only allows an offset >> + * into the first page. Break out of the loop if we see an >> + * offset into subsequent pages, the caller will have to call >> + * iov_iter_extract_pages again for the reminder. >> + */ >> + if (k) { >> + if (bv.bv_offset) >> + break; >> + } else { >> + *offset0 = bv.bv_offset; >> + } >> - skip += i->bvec->bv_offset; >> - page = i->bvec->bv_page + skip / PAGE_SIZE; >> - offset = skip % PAGE_SIZE; >> - *offset0 = offset; >> + (*pages)[k++] = bv.bv_page; >> + size += bv.bv_len; >> - maxpages = want_pages_array(pages, size, offset, maxpages); >> - if (!maxpages) >> - return -ENOMEM; >> - p = *pages; >> - for (k = 0; k < maxpages; k++) >> - p[k] = page + k; >> + if (k >= maxpages) >> + break; >> + >> + /* >> + * We are done when the end of the bvec doesn't align to a page >> + * boundary as that would create a hole in the returned space. >> + * The caller will handle this with another call to >> + * iov_iter_extract_pages. 
>> + */ >> + if (bv.bv_offset + bv.bv_len != PAGE_SIZE) >> + break; >> + >> + bvec_iter_advance_single(i->bvec, &bi, bv.bv_len); >> + } >> - size = min_t(size_t, size, maxpages * PAGE_SIZE - offset); >> iov_iter_advance(i, size); >> return size; >> } > > > This is causing major network regression in UDP sendfile, found by syzbot. > > I will release the syzbot report and this fix : > > diff --git a/lib/iov_iter.c b/lib/iov_iter.c > index 65ec660c2960..e19aab1fccca 100644 > --- a/lib/iov_iter.c > +++ b/lib/iov_iter.c > @@ -1728,6 +1728,10 @@ static ssize_t iov_iter_extract_bvec_pages(struct iov_iter *i, > (*pages)[k++] = bv.bv_page; > size += bv.bv_len; > > + if (size > maxsize) { > + size = maxsize; > + break; > + } > if (k >= maxpages) > break; Thanks Eric, I've applied your patch.
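This second regression is independent of the first: the rewritten loop adds each bvec's full bv_len to size, so when a segment runs past the requested maxsize the function reports, and advances the iterator by, more bytes than the caller asked for, which would explain the UDP sendfile breakage syzbot caught. Eric's clamp caps size at maxsize and stops the walk. Below is a minimal userspace sketch of just that accumulation, not the kernel code; the segment lengths and the 6000-byte request are invented for illustration.

/*
 * Minimal sketch, not kernel code: accumulate whole segment lengths the
 * way the rewritten loop does, and clamp at maxsize as in Eric's fix.
 */
#include <stdio.h>

static unsigned long extract_len(const unsigned long *seg_len, int nr_segs,
				 unsigned long maxsize)
{
	unsigned long size = 0;

	for (int k = 0; k < nr_segs; k++) {
		size += seg_len[k];	/* whole segment is always added ... */
		if (size > maxsize) {	/* ... so it must be capped here */
			size = maxsize;
			break;
		}
	}
	return size;
}

int main(void)
{
	const unsigned long seg_len[] = { 4096, 4096 };

	/* Without the clamp this would report 8192 bytes for a 6000-byte
	 * request; with it the caller gets exactly what it asked for. */
	printf("extracted %lu bytes\n", extract_len(seg_len, 2, 6000));
	return 0;
}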
diff --git a/lib/iov_iter.c b/lib/iov_iter.c
index 1abb32c0da50..9fc06f5fb748 100644
--- a/lib/iov_iter.c
+++ b/lib/iov_iter.c
@@ -1677,8 +1677,8 @@ static ssize_t iov_iter_extract_xarray_pages(struct iov_iter *i,
 }
 
 /*
- * Extract a list of contiguous pages from an ITER_BVEC iterator. This does
- * not get references on the pages, nor does it get a pin on them.
+ * Extract a list of virtually contiguous pages from an ITER_BVEC iterator.
+ * This does not get references on the pages, nor does it get a pin on them.
  */
 static ssize_t iov_iter_extract_bvec_pages(struct iov_iter *i,
 					   struct page ***pages, size_t maxsize,
@@ -1686,35 +1686,58 @@ static ssize_t iov_iter_extract_bvec_pages(struct iov_iter *i,
 					   iov_iter_extraction_t extraction_flags,
 					   size_t *offset0)
 {
-	struct page **p, *page;
-	size_t skip = i->iov_offset, offset, size;
-	int k;
+	size_t skip = i->iov_offset, size = 0;
+	struct bvec_iter bi;
+	int k = 0;
 
-	for (;;) {
-		if (i->nr_segs == 0)
-			return 0;
-		size = min(maxsize, i->bvec->bv_len - skip);
-		if (size)
-			break;
+	if (i->nr_segs == 0)
+		return 0;
+
+	if (i->iov_offset == i->bvec->bv_len) {
 		i->iov_offset = 0;
 		i->nr_segs--;
 		i->bvec++;
 		skip = 0;
 	}
+	bi.bi_size = maxsize + skip;
+	bi.bi_bvec_done = skip;
+
+	maxpages = want_pages_array(pages, maxsize, skip, maxpages);
+
+	while (bi.bi_size && bi.bi_idx < i->nr_segs) {
+		struct bio_vec bv = bvec_iter_bvec(i->bvec, bi);
+
+		/*
+		 * The iov_iter_extract_pages interface only allows an offset
+		 * into the first page. Break out of the loop if we see an
+		 * offset into subsequent pages, the caller will have to call
+		 * iov_iter_extract_pages again for the remainder.
+		 */
+		if (k) {
+			if (bv.bv_offset)
+				break;
+		} else {
+			*offset0 = bv.bv_offset;
+		}
 
-	skip += i->bvec->bv_offset;
-	page = i->bvec->bv_page + skip / PAGE_SIZE;
-	offset = skip % PAGE_SIZE;
-	*offset0 = offset;
+		(*pages)[k++] = bv.bv_page;
+		size += bv.bv_len;
 
-	maxpages = want_pages_array(pages, size, offset, maxpages);
-	if (!maxpages)
-		return -ENOMEM;
-	p = *pages;
-	for (k = 0; k < maxpages; k++)
-		p[k] = page + k;
+		if (k >= maxpages)
+			break;
+
+		/*
+		 * We are done when the end of the bvec doesn't align to a page
+		 * boundary as that would create a hole in the returned space.
+		 * The caller will handle this with another call to
+		 * iov_iter_extract_pages.
+		 */
+		if (bv.bv_offset + bv.bv_len != PAGE_SIZE)
+			break;
+
+		bvec_iter_advance_single(i->bvec, &bi, bv.bv_len);
+	}
 
-	size = min_t(size_t, size, maxpages * PAGE_SIZE - offset);
 	iov_iter_advance(i, size);
 	return size;
 }
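To summarize the contract the comments in the patch describe: the pages handed back may be physically discontiguous, but only the first range may start at a non-zero offset and only the last may end short of a page boundary, so the extracted ranges concatenate without holes; the extractor simply stops before any range that would violate this and the caller comes back for the remainder. The checker below is a userspace sketch of that rule only, not kernel code; PAGE_SIZE and the sample segments are assumptions.

/*
 * Minimal sketch, not kernel code: verify that a sequence of per-page
 * (offset, length) ranges satisfies the rule above -- only the first
 * range may start mid-page, only the last may end mid-page.
 */
#include <stdbool.h>
#include <stdio.h>

#define PAGE_SIZE 4096u

struct range { unsigned int offset, len; };	/* within a single page */

static bool contract_ok(const struct range *r, int n)
{
	for (int k = 0; k < n; k++) {
		if (k > 0 && r[k].offset != 0)
			return false;	/* hole before range k */
		if (k < n - 1 && r[k].offset + r[k].len != PAGE_SIZE)
			return false;	/* hole after range k */
	}
	return true;
}

int main(void)
{
	const struct range ok[]  = { { 3952, 144 }, { 0, 4096 }, { 0, 100 } };
	const struct range bad[] = { { 3952, 72 },  { 0, 4096 } };

	printf("ok: %d, bad: %d\n", contract_ok(ok, 3), contract_ok(bad, 2));
	return 0;
}

In the "bad" case the kernel function would return only the first 72 bytes and leave the rest for the next call, rather than hand back a page array with a gap in it.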