diff mbox series

[RFC] Btrfs: only subtract from len_to_oe_boundary when it is tracking an extent

Message ID 20230730190226.4001117-1-clm@fb.com (mailing list archive)
State New, archived

Commit Message

Chris Mason July 30, 2023, 7:02 p.m. UTC
[ This is an RFC because Christoph switched us to almost always set
len_to_oe_boundary in a patch in for-next.  I think we still need this
commit for strange corners, but it's already pretty hard to hit reliably
so I wanted to toss it out for discussion. We should consider either
Christoph's "btrfs: limit write bios to a single ordered extent" or this
commit for 6.4 stable as well ]

bio_ctrl->len_to_oe_boundary is used to make sure we stay inside an
extent as we submit bios.  Every time we add a page to the bio, we
decrement those bytes from len_to_oe_boundary, and then we submit the
bio if we happen to hit zero.

Most of the time, len_to_oe_boundary gets set to U32_MAX.  With
Christoph's incoming ("btrfs: limit write bios to a single ordered
extent") we're almost always setting len_to_oe_boundary, so we might not
need this commit moving forward.  But, there's a corner of a corner in here
where we can still create a massive bio, so talking through it:

submit_extent_page() adds pages into our bio, and the size of the bio
ends up limited by:

- Are we contiguous on disk?
- Does bio_add_page() allow us to stuff more in?
- Is len_to_oe_boundary > 0?

The len_to_oe_boundary math starts with U32_MAX, which isn't page or
sector aligned, and subtracts from it until it hits zero.  In the
non-ordered extent case, the last IO we submit before we hit zero is
going to be unaligned, triggering BUGs and other sadness.
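
As an illustration of the countdown described above, here is a minimal
userspace sketch (not kernel code).  It assumes a fresh counter of U32_MAX,
whole pages of a fixed size being added, and the same "cap len to the
remaining budget" step that submit_extent_page() performs.  The function
name last_len_before_zero() is made up for this example.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Model of the counter: start at UINT32_MAX (the "no ordered extent"
 * default), add one page at a time, and cap each add to whatever budget
 * is left.  Returns the length of the final add, i.e. the one that
 * drives the counter to zero.
 */
static uint32_t last_len_before_zero(uint32_t page_size)
{
	uint32_t remaining = UINT32_MAX;
	uint32_t len = 0;

	assert(page_size != 0);	/* a zero-sized add would never finish */

	while (remaining != 0) {
		len = page_size;
		if (len > remaining)	/* the cap applied before adding */
			len = remaining;
		remaining -= len;
	}
	return len;
}
```

With 4096-byte pages the final add comes out as 4095 bytes, which is
U32_MAX % 4096, so the page it was cut from is left with a single byte for
the next bio.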

This is hard to trigger because bio_add_page() isn't going to make a bio
of U32_MAX size unless you give it a perfect set of pages and fully
contiguous extents on disk.  We can hit it pretty reliably while making
large swapfiles during provisioning because the machine is freshly
booted, mostly idle, and the disk is freshly formatted.

The code has been cleaned up and shifted around a few times, but this flaw
has been lurking since the counter was added.  I think Christoph's
commit ended up exposing the bug, but it's pretty tricky to get bios
big enough to prove if older kernels have the same problem.

The fix used here is to skip doing math on len_to_oe_boundary unless
we've changed it from the default U32_MAX value.  bio_add_page() is the
real limit we want, and there's no reason to do extra math when Jens
is doing it for us.
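
A minimal sketch of the guarded decrement, modelled in userspace C.  The
names struct sim_bio_ctrl and sim_account_len() are hypothetical stand-ins
for btrfs_bio_ctrl and the accounting inside submit_extent_page(); only the
U32_MAX check mirrors the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the one field of btrfs_bio_ctrl that matters. */
struct sim_bio_ctrl {
	uint32_t len_to_oe_boundary;
};

/*
 * Account for len bytes added to the bio.  The counter is only touched
 * when it tracks a real boundary; UINT32_MAX means "no ordered extent".
 * Returns true when the boundary is reached and the bio should be
 * submitted.
 */
static bool sim_account_len(struct sim_bio_ctrl *ctrl, uint32_t len)
{
	if (ctrl->len_to_oe_boundary != UINT32_MAX) {
		/* The caller is expected to have capped len already. */
		assert(len <= ctrl->len_to_oe_boundary);
		ctrl->len_to_oe_boundary -= len;
	}
	return ctrl->len_to_oe_boundary == 0;
}
```

In this model an untracked bio never reports a boundary, so its size is
limited only by whatever the add-page step itself refuses.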

Signed-off-by: Chris Mason <clm@fb.com>
Fixes: 24e6c8082208 ("btrfs: simplify main loop in submit_extent_page")
---
 fs/btrfs/extent_io.c | 12 +++++++++++-
 1 file changed, 11 insertions(+), 1 deletion(-)

Comments

Sweet Tea Dorminy July 30, 2023, 8:27 p.m. UTC | #1
> +		/*
> +		 * len_to_oe_boundary defaults to U32_MAX, which isn't page or
> +		 * sector aligned.  So, we don't really want to do math on
> +		 * len_to_oe_boundary unless it has been intentionally set by
> +		 * alloc_new_bio().  If we decrement here, we'll potentially
> +		 * end up sending down an unaligned bio once we get close to
> +		 * zero.
> +		 */

As I understand it, the important part is: nothing should use 
len_to_oe_boundary unless there's an actual oe boundary, U32_MAX is just 
a placeholder to convey the information that there's no oe boundary.

So maybe:
/*
  * len_to_oe_boundary being U32_MAX indicates that no ordered extent was
  * found by alloc_new_bio(), so there's no boundary.
  */

I think talking about doing math on U32_MAX here obscures the main point.

Otherwise the bug and the fix look good to me.

Qu Wenruo July 31, 2023, 2:27 a.m. UTC | #2
On 2023/7/31 03:02, Chris Mason wrote:
> [ This is an RFC because Christoph switched us to almost always set
> len_to_oe_boundary in a patch in for-next  I think we still need this
> commit for strange corners, but it's already pretty hard to hit reliably
> so I wanted to toss it out for discussion. We should consider either
> Christoph's "btrfs: limit write bios to a single ordered extent" or this
> commit for 6.4 stable as well ]
>
> bio_ctrl->len_to_oe_boundary is used to make sure we stay inside an
> extent as we submit bios.  Every time we add a page to the bio, we
> decrement those bytes from len_to_oe_boundary, and then we submit the
> bio if we happen to hit zero.
>
> Most of the time, len_to_oe_boundary gets set to U32_MAX.  With
> Christoph's incoming ("btrfs: limit write bios to a single ordered
> extent") we're almost always setting len_to_oe_boundary, so we might not
> need this commit moving forward.  But, there's a corner of a corner in here
> where we can still create a massive bio, so talking through it:
>
> submit_extent_page() adds pages into our bio, and the size of the bio
> ends up limited by:
>
> - Are we contiguous on disk?
> - Does bio_add_page() allow us to stuff more in?
> - is len_to_oe_boundary > 0?
>
> The len_to_oe_boundary math starts with U32_MAX, which isn't page or
> sector aligned, and subtracts from it until it hits zero.  In the
> non-ordered extent case, the last IO we submit before we hit zero is
> going to be unaligned, triggering BUGs and other sadness.
>
> This is hard to trigger because bio_add_page() isn't going to make a bio
> of U32_MAX size unless you give it a perfect set of pages and fully
> contiguous extents on disk.  We can hit it pretty reliably while making
> large swapfiles during provisioning because the machine is freshly
> booted, mostly idle, and the disk is freshly formatted.
>
> The code has been cleaned up and shifted around a few times, but this flaw
> has been lurking since the counter was added.  I think Christoph's
> commit ended up exposing the bug, but it's pretty tricky to get bios
> big enough to prove if older kernels have the same problem.
>
> The fix used here is to skip doing math on len_to_oe_boundary unless
> we've changed it from the default U32_MAX value.  bio_add_page() is the
> real limited we want, and there's no reason to do extra math when Jens
> is doing it for us.
>
> Signed-off-by: Chris Mason <clm@fb.com>
> Fixes: 24e6c8082208 ("btrfs: simplify main loop in submit_extent_page")
> ---
>   fs/btrfs/extent_io.c | 12 +++++++++++-
>   1 file changed, 11 insertions(+), 1 deletion(-)
>
> diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
> index 6b40189a1a3e..bb2d2d405d04 100644
> --- a/fs/btrfs/extent_io.c
> +++ b/fs/btrfs/extent_io.c
> @@ -849,7 +849,17 @@ static void submit_extent_page(struct btrfs_bio_ctrl *bio_ctrl,
>   		size -= len;
>   		pg_offset += len;
>   		disk_bytenr += len;
> -		bio_ctrl->len_to_oe_boundary -= len;
> +
> +		/*
> +		 * len_to_oe_boundary defaults to U32_MAX, which isn't page or
> +		 * sector aligned.  So, we don't really want to do math on
> +		 * len_to_oe_boundary unless it has been intentionally set by
> +		 * alloc_new_bio().  If we decrement here, we'll potentially
> +		 * end up sending down an unaligned bio once we get close to
> +		 * zero.
> +		 */
> +		if (bio_ctrl->len_to_oe_boundary != U32_MAX)
> +			bio_ctrl->len_to_oe_boundary -= len;

Personally speaking, I think we'd better move the ordered extent based
split (only for zoned devices) to the btrfs bio layer.

HCH has already done the work to move the stripe boundary checks to the
btrfs bio layer, thus I believe we should also move these checks to the
same layer.
(Although unlike the stripe boundary, the OE boundary may need extra work).


Another concern is, how could we hit a bio which has a size larger than
U32_MAX?

bio->bi_iter.bi_size is only an unsigned int, so it should never exceed U32_MAX.

It would help a lot if you could provide a backtrace of such an unaligned bio.

Thanks,
Qu
>
>   		/* Ordered extent boundary: move on to a new bio. */
>   		if (bio_ctrl->len_to_oe_boundary == 0)

Christoph Hellwig July 31, 2023, 7 a.m. UTC | #3
On Sun, Jul 30, 2023 at 12:02:26PM -0700, Chris Mason wrote:
> [ This is an RFC because Christoph switched us to almost always set
> len_to_oe_boundary in a patch in for-next  I think we still need this
> commit for strange corners, but it's already pretty hard to hit reliably
> so I wanted to toss it out for discussion. We should consider either
> Christoph's "btrfs: limit write bios to a single ordered extent" or this
> commit for 6.4 stable as well ]

I'm torn.  On the one hand "btrfs: limit write bios to a single ordered
extent" is a pretty significant behavior change, on the other hand
stable-only patches with totally different behavior are always a bit
strange.

Note that with my entire pending queue, len_to_oe_boundary goes away
entirely, but with the current speed of patch application it might take
another 6 to 8 months to get there.

> This is hard to trigger because bio_add_page() isn't going to make a bio
> of U32_MAX size unless you give it a perfect set of pages and fully
> contiguous extents on disk.  We can hit it pretty reliably while making
> large swapfiles during provisioning because the machine is freshly
> booted, mostly idle, and the disk is freshly formatted.

It might be useful to create an xfstest for that, even if it only
hits on a freshly booted machine, although we'll need some reordering
in the xfstests sequence to make sure it gets run early.

Christoph Hellwig July 31, 2023, 7:02 a.m. UTC | #4
On Mon, Jul 31, 2023 at 10:27:02AM +0800, Qu Wenruo wrote:
> Personally speaking, I think we'd better moving the ordered extent based
> split (only for zoned devices) to btrfs bio layer.

That goes completely counter to the direction I've been working in.  The
ordered extent is the "container" for writeback, so having a bio
that spans them creates all kinds of problems.  That's the reason
why we now have a bbio->ordered pointer.

> Another concern is, how we could hit a bio which has a size larger than
> U32_MAX?
>
> The bio->bi_iter.size is only unsigned int, it should never exceed U32_MAX.
>
> It would help a lot if you can provide a backtrace of such unaligned bio.

That's indeed a bit weird.

Chris Mason July 31, 2023, 6:10 p.m. UTC | #5
On 7/31/23 3:02 AM, Christoph Hellwig wrote:
> On Mon, Jul 31, 2023 at 10:27:02AM +0800, Qu Wenruo wrote:
>> Personally speaking, I think we'd better moving the ordered extent based
>> split (only for zoned devices) to btrfs bio layer.
> 
> That goes completely counter the direction I've been working to.  The
> ordered extent is the "container" for writeback, so having a bio
> that spawns them creates all kinds of problems.  Thats's the reason
> why we now have a bbio->ordered pointer now.
> 
>> Another concern is, how we could hit a bio which has a size larger than
>> U32_MAX?
>>
>> The bio->bi_iter.size is only unsigned int, it should never exceed U32_MAX.
>>
>> It would help a lot if you can provide a backtrace of such unaligned bio.
> 
> That's indeed a bit weird.
> 

bio_full() is using a slightly different test:

        if (bio->bi_iter.bi_size > UINT_MAX - len)
                return true;

We're doing:

                /* Cap to the current ordered extent boundary if there is one. */
                if (len > bio_ctrl->len_to_oe_boundary) {
                        ASSERT(bio_ctrl->compress_type == BTRFS_COMPRESS_NONE);
                        ASSERT(is_data_inode(&inode->vfs_inode));
                        len = bio_ctrl->len_to_oe_boundary;
                }

The end result of this is that when we get to a 
bio sized U32_MAX - PAGE_SIZE, both our len_to_oe_boundary and the
bio_add_page() check will correctly decide we can only fit 4095 bytes.

The difference is bio_add_page() would just say no, triggering bio
submission.  But submit_extent_page()'s check is first, cutting the
IO down to 4095 bytes instead.
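
To make the two checks concrete, here is a small userspace sketch.  It
assumes 4096-byte pages and takes the bio size to be 2^32 - 4096, the
largest page-aligned value below U32_MAX.  sim_bio_full() copies only the
size test quoted above, and sim_cap_len() copies only the cap; both names
are invented for this example and neither is the kernel function.

```c
#include <stdbool.h>
#include <stdint.h>

/* Size test in the style of bio_full(): refuse if len would not fit. */
static bool sim_bio_full(uint32_t bi_size, uint32_t len)
{
	return bi_size > UINT32_MAX - len;
}

/* Cap in the style of submit_extent_page(): shrink len to the budget. */
static uint32_t sim_cap_len(uint32_t len, uint32_t len_to_oe_boundary)
{
	return len > len_to_oe_boundary ? len_to_oe_boundary : len;
}
```

For a bio of 4294963200 bytes, sim_bio_full() rejects a 4096-byte add but
accepts a 4095-byte one, and the counter left over from U32_MAX is exactly
4095.  Because the cap runs first, the full-page add never reaches the
size test and a 4095-byte add goes through instead.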

(I do have the stack trace, it's just a boring filemap_fdata_write_and_wait() path off of file release)

-chris

Chris Mason July 31, 2023, 6:52 p.m. UTC | #6
On 7/31/23 3:00 AM, Christoph Hellwig wrote:
> On Sun, Jul 30, 2023 at 12:02:26PM -0700, Chris Mason wrote:
>> [ This is an RFC because Christoph switched us to almost always set
>> len_to_oe_boundary in a patch in for-next  I think we still need this
>> commit for strange corners, but it's already pretty hard to hit reliably
>> so I wanted to toss it out for discussion. We should consider either
>> Christoph's "btrfs: limit write bios to a single ordered extent" or this
>> commit for 6.4 stable as well ]
> 
> I'm torn.  On the one hand "btrfs: limit write bios to a single ordered
> extent" is a pretty significant behavior change, on the other hand
> stable-only patches with totally different behavior are always a bit
> strange.

When are we creating bios without bio_ctrl->wbc set?  I think reads will
do this?

> 
> Note that with my entire pending queue, len_to_oe_boundary goes away
> entirely, but with the current speed of patch application it might take
> another 6 to 8 month to get there.
> 
>> This is hard to trigger because bio_add_page() isn't going to make a bio
>> of U32_MAX size unless you give it a perfect set of pages and fully
>> contiguous extents on disk.  We can hit it pretty reliably while making
>> large swapfiles during provisioning because the machine is freshly
>> booted, mostly idle, and the disk is freshly formatted.
> 
> It might be useful to create and xfstests for that, even if it only
> hits on a freshly booted machine, although we'll need some reordering
> in the xfstests sequence to make sure it gets run early..
> 

The test below works for me on 6.4 but not Linus git.  In theory the
swapfile component is entirely unrelated, but I haven't been able to
trigger without it.

It usually takes about 5 or 6 loops on a virtual machine with 32 cpus,
32GB of ram, and an ssd that can push 800MB/s streaming writes on
/dev/vdb.  I'm happy to add an xfstest, but the failure rate is low
enough that I'm not sure it'll catch anything.

As part of testing 6.4, we booted 100 machines and put them through
provisioning.  17 made it out the other end, and the rest hit a message
similar to this one.  Some had a one byte IO, some had a 4095 byte IO,
it was just whichever bio of our split page happened to finish first:

BTRFS error (device nvme0n1p2): partial page write in btrfs with offset
4095 and length 1

followed by a pretty unsurprising:

kernel BUG at mm/filemap.c:1622!

        if (!__folio_end_writeback(folio))
                BUG();

#!/bin/bash

SUBVOL=/btrfs/swapvol
SWAPFILE=$SUBVOL/swapfile
SZMB=8192

mkfs.btrfs -f /dev/vdb
mount /dev/vdb /btrfs

btrfs subvol create $SUBVOL
sync
chattr +C $SUBVOL

while true; do
    swapoff -a
    dd if=/dev/zero of=$SWAPFILE bs=1M count=$SZMB
    mkswap $SWAPFILE
    swapon $SWAPFILE
done

Chris Mason July 31, 2023, 7:22 p.m. UTC | #7
On 7/30/23 4:27 PM, Sweet Tea Dorminy wrote:
> 
>> +        /*
>> +         * len_to_oe_boundary defaults to U32_MAX, which isn't page or
>> +         * sector aligned.  So, we don't really want to do math on
>> +         * len_to_oe_boundary unless it has been intentionally set by
>> +         * alloc_new_bio().  If we decrement here, we'll potentially
>> +         * end up sending down an unaligned bio once we get close to
>> +         * zero.
>> +         */
> 
> As I understand it, the important part is: nothing should use
> len_to_oe_boundary unless there's an actual oe boundary, U32_MAX is just
> a placeholder to convey the information that there's no oe boundary.
> 
> So maybe:
> /*
>  * len_to_oe_boundary being U32_MAX indicates that no ordered extent was
>  * found by alloc_new_bio(), so there's no boundary.
>  */
> 
> I think talking about doing math on U32_MAX here obscures the main point.
> 

Jens wasn't surprised by the idea of a bio almost U32_MAX bytes long,
but I needed a printk to convince myself it was really happening.
Talking about alignment and seeing bios in the wild of these sizes helps
anyone changing the code keep these corner cases in mind.

+/- the part where Christoph is deleting len_to_oe_boundary completely,
and he'll drive this code up to a nice farm in the country where it can
retire in the sunshine.

-chris

Christoph Hellwig July 31, 2023, 7:35 p.m. UTC | #8
On Mon, Jul 31, 2023 at 02:52:23PM -0400, Chris Mason wrote:
> > I'm torn.  On the one hand "btrfs: limit write bios to a single ordered
> > extent" is a pretty significant behavior change, on the other hand
> > stable-only patches with totally different behavior are always a bit
> > strange.
> 
> When are we creating bios without bio_ctrl->wbc set?  I think reads will
> do this?

Yes.  These days the bio_ctrl is only used for data I/O, and
bio_ctrl->wbc is set for all writeback I/O, and clear for all read I/O.

Chris Mason July 31, 2023, 9:05 p.m. UTC | #9
On 7/31/23 3:35 PM, Christoph Hellwig wrote:
> On Mon, Jul 31, 2023 at 02:52:23PM -0400, Chris Mason wrote:
>>> I'm torn.  On the one hand "btrfs: limit write bios to a single ordered
>>> extent" is a pretty significant behavior change, on the other hand
>>> stable-only patches with totally different behavior are always a bit
>>> strange.
>>
>> When are we creating bios without bio_ctrl->wbc set?  I think reads will
>> do this?
> 
> Yes.  These days the bio_ctrl is only used for data I/O, and
> bio_ctrl->wbc is set for all writeback I/O, and clear for all read I/O.

Ok, the script needs updating to set the read_ahead_kb on the correct bdi,
but it triggers reliably for me upstream after 6 or 7 loops.

The trace is different, but we never recover:

[  109.156226] rcu: INFO: rcu_sched self-detected stall on CPU
[  109.157147] rcu:     21-....: (21000 ticks this GP) idle=c25c/1/0x4000000000000000 softirq=250/250 fqs=5249
[  109.158587] rcu:     (t=21003 jiffies g=2425 q=1 ncpus=32)
[  109.159392] Sending NMI from CPU 21 to CPUs 5:
[  109.160119] NMI backtrace for cpu 5
[  109.160131] CPU: 5 PID: 378 Comm: kworker/u64:6 Tainted: G            E      6.5.0-rc3-g57012c57536f #21
[  109.160138] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.0-0-gd239552ce722-prebuilt.qem4
[  109.160140] Workqueue: btrfs-endio btrfs_end_bio_work
[  109.160158] RIP: 0010:btrfs_data_csum_ok+0x37/0x270

#!/bin/bash

SUBVOL=/btrfs/swapvol
SWAPFILE=$SUBVOL/swapfile
SZMB=8192

mkfs.btrfs -f /dev/vdb
mount /dev/vdb /btrfs

btrfs subvol create $SUBVOL
sync
chattr +C $SUBVOL
dd if=/dev/zero of=$SWAPFILE bs=1M count=$SZMB
sync;sync;sync

echo 4 > /proc/sys/vm/drop_caches

# UPDATE ME TO THE RIGHT BDI!
echo 4194304 > /sys/class/bdi/btrfs-2/read_ahead_kb

while true; do
        echo 1 > /proc/sys/vm/drop_caches
        echo 1 > /proc/sys/vm/drop_caches
        dd of=/dev/zero if=$SWAPFILE bs=4096M count=2 iflag=fullblock
done
---------------

If you want to convince yourself the bug is happening as described,
add something like this (+/- whitespace munging from my lazy copy)

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index 79e29b5d3d8d..55716d5feb5e 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -984,6 +985,9 @@ static void submit_extent_page(struct btrfs_bio_ctrl *bio_ctrl,
                        ASSERT(bio_ctrl->compress_type == BTRFS_COMPRESS_NONE);
                        ASSERT(is_data_inode(&inode->vfs_inode));
                        len = bio_ctrl->len_to_oe_boundary;
+                       if (len & 4095) {
+                               printk(KERN_CRIT "new len %u, bio size %u\n", len, bio_ctrl->bbio->bio.bi_iter.bi_size);
+                       }
                }

                if (bio_add_page(&bio_ctrl->bbio->bio, page, len, pg_offset) != len) {

Qu Wenruo Aug. 1, 2023, 12:58 a.m. UTC | #10
On 2023/8/1 02:10, Chris Mason wrote:
> On 7/31/23 3:02 AM, Christoph Hellwig wrote:
>> On Mon, Jul 31, 2023 at 10:27:02AM +0800, Qu Wenruo wrote:
>>> Personally speaking, I think we'd better moving the ordered extent based
>>> split (only for zoned devices) to btrfs bio layer.
>>
>> That goes completely counter the direction I've been working to.  The
>> ordered extent is the "container" for writeback, so having a bio
>> that spawns them creates all kinds of problems.  Thats's the reason
>> why we now have a bbio->ordered pointer now.
>>
>>> Another concern is, how we could hit a bio which has a size larger than
>>> U32_MAX?
>>>
>>> The bio->bi_iter.size is only unsigned int, it should never exceed U32_MAX.
>>>
>>> It would help a lot if you can provide a backtrace of such unaligned bio.
>>
>> That's indeed a bit weird.
>>
>
> bio_full() is using a slightly different test:
>
>          if (bio->bi_iter.bi_size > UINT_MAX - len)
>                  return true;
>
> We're doing:
>
>                  /* Cap to the current ordered extent boundary if there is one. */
>                  if (len > bio_ctrl->len_to_oe_boundary) {
>                          ASSERT(bio_ctrl->compress_type == BTRFS_COMPRESS_NONE);
>                          ASSERT(is_data_inode(&inode->vfs_inode));
>                          len = bio_ctrl->len_to_oe_boundary;
>                  }
>
> The end result of this is that when we get to a
> bio sized U32_MAX - PAGE_SIZE, both our len_to_oe_boundary and the
> bio_add_page() check will correctly decide we can only fit 4095 bytes.

Thanks for the details, now I understand where the problem is.

Would you mind adding the above explanation to the commit message?
Otherwise the fix looks good to me.

Thanks,
Qu
>
> The difference is bio_add_page() would just say no triggering bio
> submission.  But submit_extent_page()'s check is first, cutting the
> IO down to 4095 bytes instead.
>
> (I do have the stack trace, it's just a boring filemap_fdata_write_and_wait() path off of file release)
>
> -chris
>

Sweet Tea Dorminy Aug. 1, 2023, 2:59 a.m. UTC | #11
On 7/31/23 15:22, Chris Mason wrote:
> On 7/30/23 4:27 PM, Sweet Tea Dorminy wrote:
>>
>>> +        /*
>>> +         * len_to_oe_boundary defaults to U32_MAX, which isn't page or
>>> +         * sector aligned.  So, we don't really want to do math on
>>> +         * len_to_oe_boundary unless it has been intentionally set by
>>> +         * alloc_new_bio().  If we decrement here, we'll potentially
>>> +         * end up sending down an unaligned bio once we get close to
>>> +         * zero.
>>> +         */
>>
>> As I understand it, the important part is: nothing should use
>> len_to_oe_boundary unless there's an actual oe boundary, U32_MAX is just
>> a placeholder to convey the information that there's no oe boundary.
>>
>> So maybe:
>> /*
>>   * len_to_oe_boundary being U32_MAX indicates that no ordered extent was
>>   * found by alloc_new_bio(), so there's no boundary.
>>   */
>>
>> I think talking about doing math on U32_MAX here obscures the main point.
>>
> 
> Jens wasn't surprised by the idea of a bio almost U32_MAX bytes long,
> but I needed a printk to convince myself it was really happening.
> Talking about alignment and seeing bios in the wild of these sizes helps
> anyone changing the code keep these corner cases in mind.
> 
> +/- the part where Christoph is deleting len_to_oe_boundary completely,
> and he'll drive this code up to a nice farm in the country where it can
> retire in the sunshine.
> 
> -chris

Sounds good.

Reviewed-by: Sweet Tea Dorminy <sweettea-kernel@dorminy.me>

Patch

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index 6b40189a1a3e..bb2d2d405d04 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -849,7 +849,17 @@  static void submit_extent_page(struct btrfs_bio_ctrl *bio_ctrl,
 		size -= len;
 		pg_offset += len;
 		disk_bytenr += len;
-		bio_ctrl->len_to_oe_boundary -= len;
+
+		/*
+		 * len_to_oe_boundary defaults to U32_MAX, which isn't page or
+		 * sector aligned.  So, we don't really want to do math on
+		 * len_to_oe_boundary unless it has been intentionally set by
+		 * alloc_new_bio().  If we decrement here, we'll potentially
+		 * end up sending down an unaligned bio once we get close to
+		 * zero.
+		 */
+		if (bio_ctrl->len_to_oe_boundary != U32_MAX)
+			bio_ctrl->len_to_oe_boundary -= len;
 
 		/* Ordered extent boundary: move on to a new bio. */
 		if (bio_ctrl->len_to_oe_boundary == 0)