
[v5,3/5] btrfs: return ordered_extent splits from bio extraction

Message ID cf69fdbd608338aaa7916736ac50e2cfdc3d4338.1679512207.git.boris@bur.io (mailing list archive)
State New, archived
Series btrfs: fix corruption caused by partial dio writes

Commit Message

Boris Burkov March 22, 2023, 7:11 p.m. UTC
When extracting a bio from its ordered extent for dio partial writes, we
need the "remainder" ordered extent. It would be possible to look it up
in that case, but since we can grab the ordered_extent from the new
allocation function, we might as well wire it up to be returned to the
caller via an out parameter and save that lookup.

Refactor the clone ordered extent function to return the new ordered
extent, then refactor the split and extract functions to pass back the
new pre- and post-split ordered extents via output parameters.

Signed-off-by: Boris Burkov <boris@bur.io>
---
 fs/btrfs/bio.c          |  2 +-
 fs/btrfs/btrfs_inode.h  |  5 ++++-
 fs/btrfs/inode.c        | 36 +++++++++++++++++++++++++++---------
 fs/btrfs/ordered-data.c | 36 +++++++++++++++++++++++-------------
 fs/btrfs/ordered-data.h |  6 ++++--
 5 files changed, 59 insertions(+), 26 deletions(-)
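
A minimal caller-side sketch of the new convention (hypothetical caller;
the real dio submit-path hookup lives in the later patches of this series
and may differ in detail):

/*
 * Sketch only: shrink dio_data->ordered to the bio and receive the
 * leftover ordered extent back instead of looking it up again later.
 */
static blk_status_t example_submit(struct btrfs_bio *bbio,
				   struct btrfs_dio_data *dio_data)
{
	struct btrfs_ordered_extent *remainder = NULL;
	blk_status_t ret;

	ret = btrfs_extract_ordered_extent_bio(bbio, dio_data->ordered,
					       NULL, &remainder);
	if (ret)
		return ret;
	if (remainder) {
		/*
		 * Keep the leftover OE around so iomap_end can cancel it if
		 * the rest of the write is never submitted (refcounting of
		 * the old pointer is elided in this sketch).
		 */
		dio_data->ordered = remainder;
	}
	return BLK_STS_OK;
}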

Comments

Christoph Hellwig March 23, 2023, 8:47 a.m. UTC | #1
This is a bit of a mess.  And the root cause of that is that
btrfs_extract_ordered_extent the way it is used right now does
the wrong thing in terms of splitting the ordered_extent.  What
we want is to allocate a new one for the beginning of the range,
and leave the rest alone.

I did run into this a while ago during my (not yet submitted) work
to keep an ordered_extent pointer in the btrfs_bio, and I have some
patches to sort it out.

I've rebased your fix on top of those, can you check if this tree
makes sense to you;

   http://git.infradead.org/users/hch/misc.git/shortlog/refs/heads/btrfs-dio-fix-hch

it passes basic testing so far.
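
A rough sketch of that "allocate a new ordered extent for the front, leave
the rest alone" shape (hypothetical helper, not part of this patch; locking,
rb-tree re-insertion, flag filtering and space accounting are omitted):

static struct btrfs_ordered_extent *
split_ordered_front_sketch(struct btrfs_ordered_extent *ordered, u64 len)
{
	struct btrfs_inode *inode = BTRFS_I(ordered->inode);
	struct btrfs_ordered_extent *front;

	/* New OE covering only the first len bytes of the range. */
	front = btrfs_alloc_ordered_extent(inode, ordered->file_offset,
					   len, len, ordered->disk_bytenr,
					   len, 0, ordered->flags,
					   ordered->compress_type);
	if (IS_ERR(front))
		return front;

	/* Shrink the original in place; it keeps describing the tail. */
	ordered->file_offset += len;
	ordered->disk_bytenr += len;
	ordered->num_bytes -= len;
	ordered->disk_num_bytes -= len;
	ordered->bytes_left -= len;
	return front;
}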
Boris Burkov March 23, 2023, 4:15 p.m. UTC | #2
On Thu, Mar 23, 2023 at 01:47:28AM -0700, Christoph Hellwig wrote:
> This is a bit of a mess.  And the root cause of that is that
> btrfs_extract_ordered_extent the way it is used right now does
> the wrong thing in terms of splitting the ordered_extent.  What
> we want is to allocate a new one for the beginning of the range,
> and leave the rest alone.
> 
> I did run into this a while ago during my (not yet submitted) work
> to keep an ordered_extent pointer in the btrfs_bio, and I have some
> patches to sort it out.
> 
> I've rebased your fix on top of those, can you check if this tree
> makes sense to you;
> 
>    http://git.infradead.org/users/hch/misc.git/shortlog/refs/heads/btrfs-dio-fix-hch
> 
> it passes basic testing so far.

Nice, this is great!

I actually also made the same changes as in your branch while working on
my fix, but didn't know enough about the zoned use case to realize that
the simpler "extract from beginning" constraint also applied to the
zoned case. So what happened in my branch was that I implemented the
three-way split as two "split at starts", which ultimately felt too messy,
so I opted for returning the new split objects from the existing model.

If it's true that we can always do a "split from front" then I'm all
aboard and think this is the way forward. Given that I found what I
think is a serious bug in the case where pre>0, I suspect you are right,
and we aren't hitting that case.

I will check that this passes my testing for the various dio cases (I
have one modified xfstests case I haven't sent yet for the meanest
version of the deadlock I have come up with so far) and the other tests
that I saw races/flakiness on, but from a quick look, your branch looks
correct to me. I believe the most non-obvious property my fix relies on
is dio_data->ordered having the leftovers from the partial after
submission so that it can be cancelled, which your branch looks to
maintain.

Assuming the tests pass, I do want to get this in sooner than later,
since downstream is still waiting on a fix. Would you be willing to send
your stack soon for my fix to land atop? I don't mind if you just send a
patch series with my patches mixed in, either. If, OTOH, your patches
are still a while out, or depend on something else that's underway,
maybe we could land mine, then gut them for your improvements. I'm fine
with it either way.

Thanks,
Boris
Boris Burkov March 23, 2023, 5 p.m. UTC | #3
On Thu, Mar 23, 2023 at 09:15:29AM -0700, Boris Burkov wrote:
> On Thu, Mar 23, 2023 at 01:47:28AM -0700, Christoph Hellwig wrote:
> > This is a bit of a mess.  And the root cause of that is that
> > btrfs_extract_ordered_extent the way it is used right now does
> > the wrong thing in terms of splitting the ordered_extent.  What
> > we want is to allocate a new one for the beginning of the range,
> > and leave the rest alone.
> > 
> > I did run into this a while ago during my (not yet submitted) work
> > to keep an ordered_extent pointer in the btrfs_bio, and I have some
> > patches to sort it out.
> > 
> > I've rebased your fix on top of those, can you check if this tree
> > makes sense to you;
> > 
> >    http://git.infradead.org/users/hch/misc.git/shortlog/refs/heads/btrfs-dio-fix-hch
> > 
> > it passes basic testing so far.
> 
> Nice, this is great!
> 
> I actually also made the same changes as in your branch while working on
> my fix, but didn't know enough about the zoned use case to realize that
> the simpler "extract from beginning" constraint also applied to the
> zoned case. So what happened in my branch was that I implemented the
> three-way split as two "split at starts", which ultimately felt too messy,
> so I opted for returning the new split objects from the existing model.
> 
> If it's true that we can always do a "split from front" then I'm all
> aboard and think this is the way forward. Given that I found what I
> think is a serious bug in the case where pre>0, I suspect you are right,
> and we aren't hitting that case.
> 
> I will check that this passes my testing for the various dio cases (I
> have one modified xfstests case I haven't sent yet for the meanest
> version of the deadlock I have come up with so far) and the other tests
> that I saw races/flakiness on, but from a quick look, your branch looks
> correct to me. I believe the most non-obvious property my fix relies on
> is dio_data->ordered having the leftovers from the partial after
> submission so that it can be cancelled, which your branch looks to
> maintain.

Your branch as-is does not pass the existing tests; it's missing a fix
from my V5. We need to avoid splitting partial OEs when doing NOCOW dio
writes, because iomap_begin() does not create a fresh pinned em in that
case, since it reuses the existing extent.

e.g.,

diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 8cb61f4daec0..bbc89a0872e7 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -7719,7 +7719,7 @@ static void btrfs_dio_submit_io(const struct iomap_iter *iter, struct bio *bio,
         * cancelled in iomap_end to avoid a deadlock wherein faulting the
         * remaining pages is blocked on the outstanding ordered extent.
         */
-       if (iter->flags & IOMAP_WRITE) {
+       if (iter->flags & IOMAP_WRITE && !test_bit(BTRFS_ORDERED_NOCOW, &dio_data->ordered->flags)) {
                int err;

                err = btrfs_extract_ordered_extent(bbio, dio_data->ordered);

With that patch, I pass 10x of btrfs/250, so running the full suite next.

> 
> Assuming the tests pass, I do want to get this in sooner than later,
> since downstream is still waiting on a fix. Would you be willing to send
> your stack soon for my fix to land atop? I don't mind if you just send a
> patch series with my patches mixed in, either. If, OTOH, your patches
> are still a while out, or depend on something else that's underway,
> maybe we could land mine, then gut them for your improvements. I'm fine
> with it either way.
> 
> Thanks,
> Boris
Boris Burkov March 23, 2023, 5:45 p.m. UTC | #4
On Thu, Mar 23, 2023 at 10:00:06AM -0700, Boris Burkov wrote:
> On Thu, Mar 23, 2023 at 09:15:29AM -0700, Boris Burkov wrote:
> > On Thu, Mar 23, 2023 at 01:47:28AM -0700, Christoph Hellwig wrote:
> > > This is a bit of a mess.  And the root cause of that is that
> > > btrfs_extract_ordered_extent the way it is used right now does
> > > the wrong thing in terms of splitting the ordered_extent.  What
> > > we want is to allocate a new one for the beginning of the range,
> > > and leave the rest alone.
> > > 
> > > I did run into this a while ago during my (not yet submitted) work
> > > to keep an ordered_extent pointer in the btrfs_bio, and I have some
> > > patches to sort it out.
> > > 
> > > I've rebased your fix on top of those, can you check if this tree
> > > makes sense to you;
> > > 
> > >    http://git.infradead.org/users/hch/misc.git/shortlog/refs/heads/btrfs-dio-fix-hch
> > > 
> > > it passes basic testing so far.
> > 
> > Nice, this is great!
> > 
> > I actually also made the same changes as in your branch while working on
> > my fix, but didn't know enough about the zoned use case to realize that
> > the simpler "extract from beginning" constraint also applied to the
> > zoned case. So what happened in my branch was that I implemented the
> > three-way split as two "split at starts", which ultimately felt too messy,
> > so I opted for returning the new split objects from the existing model.
> > 
> > If it's true that we can always do a "split from front" then I'm all
> > aboard and think this is the way forward. Given that I found what I
> > think is a serious bug in the case where pre>0, I suspect you are right,
> > and we aren't hitting that case.
> > 
> > I will check that this passes my testing for the various dio cases (I
> > have one modified xfstests case I haven't sent yet for the meanest
> > version of the deadlock I have come up with so far) and the other tests
> > that I saw races/flakiness on, but from a quick look, your branch looks
> > correct to me. I believe the most non-obvious property my fix relies on
> > is dio_data->ordered having the leftovers from the partial after
> > submission so that it can be cancelled, which your branch looks to
> > maintain.
> 
> Your branch as-is does not pass the existing tests; it's missing a fix
> from my V5. We need to avoid splitting partial OEs when doing NOCOW dio
> writes, because iomap_begin() does not create a fresh pinned em in that
> case, since it reuses the existing extent.
> 
> e.g.,
> 
> diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
> index 8cb61f4daec0..bbc89a0872e7 100644
> --- a/fs/btrfs/inode.c
> +++ b/fs/btrfs/inode.c
> @@ -7719,7 +7719,7 @@ static void btrfs_dio_submit_io(const struct iomap_iter *iter, struct bio *bio,
>          * cancelled in iomap_end to avoid a deadlock wherein faulting the
>          * remaining pages is blocked on the outstanding ordered extent.
>          */
> -       if (iter->flags & IOMAP_WRITE) {
> +       if (iter->flags & IOMAP_WRITE && !test_bit(BTRFS_ORDERED_NOCOW, &dio_data->ordered->flags)) {
>                 int err;
> 
>                 err = btrfs_extract_ordered_extent(bbio, dio_data->ordered);
> 
> With that patch, I pass 10x of btrfs/250, so running the full suite next.

fstests in general passed on my system, so I am happy with this branch +
my above tweak if Naohiro/Johannes are on board with the simplified
ordered_extent/extent_map splitting model that assumes the bio is at the
start offset.

> 
> > 
> > Assuming the tests pass, I do want to get this in sooner than later,
> > since downstream is still waiting on a fix. Would you be willing to send
> > your stack soon for my fix to land atop? I don't mind if you just send a
> > patch series with my patches mixed in, either. If, OTOH, your patches
> > are still a while out, or depend on something else that's underway,
> > maybe we could land mine, then gut them for your improvements. I'm fine
> > with it either way.
> > 
> > Thanks,
> > Boris
Christoph Hellwig March 23, 2023, 9:29 p.m. UTC | #5
On Thu, Mar 23, 2023 at 10:00:06AM -0700, Boris Burkov wrote:
> Your branch as-is does not pass the existing tests; it's missing a fix
> from my V5. We need to avoid splitting partial OEs when doing NOCOW dio
> writes, because iomap_begin() does not create a fresh pinned em in that
> case, since it reuses the existing extent.

Oops, yes, that got lost.  I can add this as another patch attributed
to you.

That being said, I'm a bit confused about:

 1) if we need this split call at all for the non-zoned case as we don't
    need to record a different physical disk address
 2) how we clean up this on-disk logical to physical mapping at all on
    a write failure

Maybe we should let those dragons sleep for now and just do the minimal
fix, though.

I just woke up on an airplane, so depending on my jetlag I might have
a new series ready with the minimal fix for varying definitions of
"in a few hours".

> 
> e.g.,
> 
> diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
> index 8cb61f4daec0..bbc89a0872e7 100644
> --- a/fs/btrfs/inode.c
> +++ b/fs/btrfs/inode.c
> @@ -7719,7 +7719,7 @@ static void btrfs_dio_submit_io(const struct iomap_iter *iter, struct bio *bio,
>          * cancelled in iomap_end to avoid a deadlock wherein faulting the
>          * remaining pages is blocked on the outstanding ordered extent.
>          */
> -       if (iter->flags & IOMAP_WRITE) {
> +       if (iter->flags & IOMAP_WRITE && !test_bit(BTRFS_ORDERED_NOCOW, &dio_data->ordered->flags)) {
>                 int err;
> 
>                 err = btrfs_extract_ordered_extent(bbio, dio_data->ordered);

I think the BTRFS_ORDERED_NOCOW check should be just around the
split_extent_map call.  That matches your series, and without
that we wouldn't split the ordered_extent for nocow writes and thus
would only fix the original problem for non-nocow writes.
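
For reference, that is the shape the corresponding hunk in the patch at the
bottom of this page already takes:

	ret = btrfs_split_ordered_extent(ordered, pre, post, ret_pre, ret_post);
	if (ret)
		goto out;
	if (!test_bit(BTRFS_ORDERED_NOCOW, &ordered->flags))
		ret = split_em(inode, bbio->file_offset, file_len, pre, post);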
Boris Burkov March 23, 2023, 10:43 p.m. UTC | #6
On Thu, Mar 23, 2023 at 02:29:39PM -0700, Christoph Hellwig wrote:
> On Thu, Mar 23, 2023 at 10:00:06AM -0700, Boris Burkov wrote:
> > Your branch as-is does not pass the existing tests; it's missing a fix
> > from my V5. We need to avoid splitting partial OEs when doing NOCOW dio
> > writes, because iomap_begin() does not create a fresh pinned em in that
> > case, since it reuses the existing extent.
> 
> Oops, yes, that got lost.  I can add this as another patch attributed
> to you.
> 
> That being said, I'm a bit confused about:
> 
>  1) if we need this split call at all for the non-zoned case as we don't
>     need to record a different physical disk address

I think I understand this, but maybe I'm missing exactly what you're
asking.

In finish_ordered_io, we call unpin_extent_cache, which blows up on
em->start != oe->file_offset. I believe the rationale is we are creating
a new em which is PINNED when we allocate the extent in
btrfs_new_extent_direct (via the call to btrfs_reserve_extent), so we
need to unpin it and allow it to be merged, etc... For nocow, we don't
allocate that new extent, so we don't need to split/unpin the existing
extent_map which we are just reusing.
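
Illustrative sketch only (simplified, not the actual kernel code) of the
invariant described above: the pinned extent_map created for a COW
allocation has to line up with the ordered extent being finished.

static void unpin_check_sketch(const struct extent_map *em,
			       const struct btrfs_ordered_extent *oe)
{
	/*
	 * If the OE was split for a partial write but the em was not,
	 * em->start no longer equals oe->file_offset and this fires.
	 * NOCOW writes reuse an existing em, so there is nothing to
	 * split or unpin in that case.
	 */
	WARN_ON_ONCE(em->start != oe->file_offset);
}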

>  2) how we clean up this on-disk logical to physical mapping at all on
>     a write failure

This I haven't thought much about, so I will leave it in the "dragons
sleep for now" category.

> 
> Maybe we should let those dragons sleep for now and just do the minimal
> fix, though.
> 
> I just woke up on an airplane, so depending on my jetlag I might have
> a new series ready with the minimal fix for varying definitions of
> "in a few hours".

Great, that works for me. I just didn't want to wait weeks if you were
blocked on other stuff.

> 
> > 
> > e.g.,
> > 
> > diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
> > index 8cb61f4daec0..bbc89a0872e7 100644
> > --- a/fs/btrfs/inode.c
> > +++ b/fs/btrfs/inode.c
> > @@ -7719,7 +7719,7 @@ static void btrfs_dio_submit_io(const struct iomap_iter *iter, struct bio *bio,
> >          * cancelled in iomap_end to avoid a deadlock wherein faulting the
> >          * remaining pages is blocked on the outstanding ordered extent.
> >          */
> > -       if (iter->flags & IOMAP_WRITE) {
> > +       if (iter->flags & IOMAP_WRITE && !test_bit(BTRFS_ORDERED_NOCOW, &dio_data->ordered->flags)) {
> >                 int err;
> > 
> >                 err = btrfs_extract_ordered_extent(bbio, dio_data->ordered);
> 
> I think the BTRFS_ORDERED_NOCOW check should be just around the
> split_extent_map call.  That matches your series, and without
> that we wouldn't split the ordered_extent for nocow writes and thus
> would only fix the original problem for non-nocow writes.

Oops, my bad. Good catch.
Christoph Hellwig March 24, 2023, 12:24 a.m. UTC | #7
On Thu, Mar 23, 2023 at 03:43:36PM -0700, Boris Burkov wrote:
> In finish_ordered_io, we call unpin_extent_cache, which blows up on
> em->start != oe->file_offset. I believe the rationale is we are creating
> a new em which is PINNED when we allocate the extent in
> btrfs_new_extent_direct (via the call to btrfs_reserve_extent), so we
> need to unpin it and allow it to be merged, etc... For nocow, we don't
> allocate that new extent, so we don't need to split/unpin the existing
> extent_map which we are just reusing.

Yeah, I actually just ran into that when testing my idea :)

Patch

diff --git a/fs/btrfs/bio.c b/fs/btrfs/bio.c
index cf09c6271edb..b849ced40d37 100644
--- a/fs/btrfs/bio.c
+++ b/fs/btrfs/bio.c
@@ -653,7 +653,7 @@  static bool btrfs_submit_chunk(struct btrfs_bio *bbio, int mirror_num)
 		if (use_append) {
 			bio->bi_opf &= ~REQ_OP_WRITE;
 			bio->bi_opf |= REQ_OP_ZONE_APPEND;
-			ret = btrfs_extract_ordered_extent(bbio);
+			ret = btrfs_extract_ordered_extent_bio(bbio, NULL, NULL, NULL);
 			if (ret)
 				goto fail_put_bio;
 		}
diff --git a/fs/btrfs/btrfs_inode.h b/fs/btrfs/btrfs_inode.h
index 9dc21622806e..e92a09559058 100644
--- a/fs/btrfs/btrfs_inode.h
+++ b/fs/btrfs/btrfs_inode.h
@@ -407,7 +407,10 @@  static inline void btrfs_inode_split_flags(u64 inode_item_flags,
 
 int btrfs_check_sector_csum(struct btrfs_fs_info *fs_info, struct page *page,
 			    u32 pgoff, u8 *csum, const u8 * const csum_expected);
-blk_status_t btrfs_extract_ordered_extent(struct btrfs_bio *bbio);
+blk_status_t btrfs_extract_ordered_extent_bio(struct btrfs_bio *bbio,
+					      struct btrfs_ordered_extent *ordered,
+					      struct btrfs_ordered_extent **ret_pre,
+					      struct btrfs_ordered_extent **ret_post);
 bool btrfs_data_csum_ok(struct btrfs_bio *bbio, struct btrfs_device *dev,
 			u32 bio_offset, struct bio_vec *bv);
 noinline int can_nocow_extent(struct inode *inode, u64 offset, u64 *len,
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 5ab486f448eb..e30390051f15 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -2514,10 +2514,14 @@  void btrfs_clear_delalloc_extent(struct btrfs_inode *inode,
 /*
  * Split an extent_map at [start, start + len]
  *
- * This function is intended to be used only for extract_ordered_extent().
+ * This function is intended to be used only for
+ * btrfs_extract_ordered_extent_bio().
+ *
+ * It makes assumptions about the extent map that are only valid in the narrow
+ * situations in which we are extracting a bio from a containing ordered extent,
+ * that are specific to zoned filesystems or partial dio writes.
  */
-static int split_zoned_em(struct btrfs_inode *inode, u64 start, u64 len,
-			  u64 pre, u64 post)
+static int split_em(struct btrfs_inode *inode, u64 start, u64 len, u64 pre, u64 post)
 {
 	struct extent_map_tree *em_tree = &inode->extent_tree;
 	struct extent_map *em;
@@ -2626,22 +2630,36 @@  static int split_zoned_em(struct btrfs_inode *inode, u64 start, u64 len,
 	return ret;
 }
 
-blk_status_t btrfs_extract_ordered_extent(struct btrfs_bio *bbio)
+/*
+ * Extract a bio from an ordered extent to enforce an invariant where the bio
+ * fully matches a single ordered extent.
+ *
+ * @bbio: the bio to extract.
+ * @ordered: the ordered extent the bio is in, will be shrunk to fit. If NULL we
+ *	     will look it up.
+ * @ret_pre: out parameter to return the new oe in front of the bio, if needed.
+ * @ret_post: out parameter to return the new oe past the bio, if needed.
+ */
+blk_status_t btrfs_extract_ordered_extent_bio(struct btrfs_bio *bbio,
+					      struct btrfs_ordered_extent *ordered,
+					      struct btrfs_ordered_extent **ret_pre,
+					      struct btrfs_ordered_extent **ret_post)
 {
 	u64 start = (u64)bbio->bio.bi_iter.bi_sector << SECTOR_SHIFT;
 	u64 len = bbio->bio.bi_iter.bi_size;
 	struct btrfs_inode *inode = bbio->inode;
-	struct btrfs_ordered_extent *ordered;
 	u64 file_len;
 	u64 end = start + len;
 	u64 ordered_end;
 	u64 pre, post;
 	int ret = 0;
 
-	ordered = btrfs_lookup_ordered_extent(inode, bbio->file_offset);
+	if (!ordered)
+		ordered = btrfs_lookup_ordered_extent(inode, bbio->file_offset);
 	if (WARN_ON_ONCE(!ordered))
 		return BLK_STS_IOERR;
 
+	ordered_end = ordered->disk_bytenr + ordered->disk_num_bytes;
 	/* No need to split */
 	if (ordered->disk_num_bytes == len)
 		goto out;
@@ -2658,7 +2676,6 @@  blk_status_t btrfs_extract_ordered_extent(struct btrfs_bio *bbio)
 		goto out;
 	}
 
-	ordered_end = ordered->disk_bytenr + ordered->disk_num_bytes;
 	/* bio must be in one ordered extent */
 	if (WARN_ON_ONCE(start < ordered->disk_bytenr || end > ordered_end)) {
 		ret = -EINVAL;
@@ -2675,10 +2692,11 @@  blk_status_t btrfs_extract_ordered_extent(struct btrfs_bio *bbio)
 	pre = start - ordered->disk_bytenr;
 	post = ordered_end - end;
 
-	ret = btrfs_split_ordered_extent(ordered, pre, post);
+	ret = btrfs_split_ordered_extent(ordered, pre, post, ret_pre, ret_post);
 	if (ret)
 		goto out;
-	ret = split_zoned_em(inode, bbio->file_offset, file_len, pre, post);
+	if (!test_bit(BTRFS_ORDERED_NOCOW, &ordered->flags))
+		ret = split_em(inode, bbio->file_offset, file_len, pre, post);
 
 out:
 	btrfs_put_ordered_extent(ordered);
diff --git a/fs/btrfs/ordered-data.c b/fs/btrfs/ordered-data.c
index 1848d0d1a9c4..4bebebb9b434 100644
--- a/fs/btrfs/ordered-data.c
+++ b/fs/btrfs/ordered-data.c
@@ -1117,8 +1117,8 @@  bool btrfs_try_lock_ordered_range(struct btrfs_inode *inode, u64 start, u64 end,
 }
 
 
-static int clone_ordered_extent(struct btrfs_ordered_extent *ordered, u64 pos,
-				u64 len)
+static struct btrfs_ordered_extent *clone_ordered_extent(struct btrfs_ordered_extent *ordered,
+						  u64 pos, u64 len)
 {
 	struct inode *inode = ordered->inode;
 	struct btrfs_fs_info *fs_info = BTRFS_I(inode)->root->fs_info;
@@ -1133,18 +1133,22 @@  static int clone_ordered_extent(struct btrfs_ordered_extent *ordered, u64 pos,
 	percpu_counter_add_batch(&fs_info->ordered_bytes, -len,
 				 fs_info->delalloc_batch);
 	WARN_ON_ONCE(flags & (1 << BTRFS_ORDERED_COMPRESSED));
-	return btrfs_add_ordered_extent(BTRFS_I(inode), file_offset, len, len,
-					disk_bytenr, len, 0, flags,
-					ordered->compress_type);
+	return btrfs_alloc_ordered_extent(BTRFS_I(inode), file_offset, len, len,
+					  disk_bytenr, len, 0, flags,
+					  ordered->compress_type);
 }
 
-int btrfs_split_ordered_extent(struct btrfs_ordered_extent *ordered, u64 pre,
-				u64 post)
+int btrfs_split_ordered_extent(struct btrfs_ordered_extent *ordered,
+			       u64 pre, u64 post,
+			       struct btrfs_ordered_extent **ret_pre,
+			       struct btrfs_ordered_extent **ret_post)
+
 {
 	struct inode *inode = ordered->inode;
 	struct btrfs_ordered_inode_tree *tree = &BTRFS_I(inode)->ordered_tree;
 	struct rb_node *node;
 	struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb);
+	struct btrfs_ordered_extent *oe;
 	int ret = 0;
 
 	trace_btrfs_ordered_extent_split(BTRFS_I(inode), ordered);
@@ -1172,12 +1176,18 @@  int btrfs_split_ordered_extent(struct btrfs_ordered_extent *ordered, u64 pre,
 
 	spin_unlock_irq(&tree->lock);
 
-	if (pre)
-		ret = clone_ordered_extent(ordered, 0, pre);
-	if (ret == 0 && post)
-		ret = clone_ordered_extent(ordered, pre + ordered->disk_num_bytes,
-					   post);
-
+	if (pre) {
+		oe = clone_ordered_extent(ordered, 0, pre);
+		ret = IS_ERR(oe) ? PTR_ERR(oe) : 0;
+		if (!ret && ret_pre)
+			*ret_pre = oe;
+	}
+	if (!ret && post) {
+		oe = clone_ordered_extent(ordered, pre + ordered->disk_num_bytes, post);
+		ret = IS_ERR(oe) ? PTR_ERR(oe) : 0;
+		if (!ret && ret_post)
+			*ret_post = oe;
+	}
 	return ret;
 }
 
diff --git a/fs/btrfs/ordered-data.h b/fs/btrfs/ordered-data.h
index 18007f9c00ad..933f6f0d8c10 100644
--- a/fs/btrfs/ordered-data.h
+++ b/fs/btrfs/ordered-data.h
@@ -212,8 +212,10 @@  void btrfs_lock_and_flush_ordered_range(struct btrfs_inode *inode, u64 start,
 					struct extent_state **cached_state);
 bool btrfs_try_lock_ordered_range(struct btrfs_inode *inode, u64 start, u64 end,
 				  struct extent_state **cached_state);
-int btrfs_split_ordered_extent(struct btrfs_ordered_extent *ordered, u64 pre,
-			       u64 post);
+int btrfs_split_ordered_extent(struct btrfs_ordered_extent *ordered,
+			       u64 pre, u64 post,
+			       struct btrfs_ordered_extent **ret_pre,
+			       struct btrfs_ordered_extent **ret_post);
 int __init ordered_data_init(void);
 void __cold ordered_data_exit(void);