diff mbox series

[v2,09/10] block: Let replace_child_noperm free children

Message ID 20211111120829.81329-10-hreitz@redhat.com (mailing list archive)
State New, archived
Headers show
Series block: Attempt on fixing 030-reported errors | expand

Commit Message

Hanna Czenczek Nov. 11, 2021, 12:08 p.m. UTC
In most of the block layer, especially when traversing down from other
BlockDriverStates, we assume that BdrvChild.bs can never be NULL.  When
it becomes NULL, it is expected that the corresponding BdrvChild pointer
also becomes NULL and the BdrvChild object is freed.

Therefore, once bdrv_replace_child_noperm() sets the BdrvChild.bs
pointer to NULL, it should also immediately set the corresponding
BdrvChild pointer (like bs->file or bs->backing) to NULL.

In that context, it also makes sense for this function to free the
child.  Sometimes we cannot do so, though, because it is called in a
transactional context where the caller might still want to reinstate the
child in the abort branch (and free it only on commit), so this behavior
has to remain optional.

In bdrv_replace_child_tran()'s abort handler, we now rely on the fact
that the BdrvChild passed to bdrv_replace_child_tran() must have had a
non-NULL .bs pointer initially.  Make a note of that and assert it.

Signed-off-by: Hanna Reitz <hreitz@redhat.com>
---
 block.c | 102 +++++++++++++++++++++++++++++++++++++++++++-------------
 1 file changed, 79 insertions(+), 23 deletions(-)

Comments

Vladimir Sementsov-Ogievskiy Nov. 12, 2021, 4:10 p.m. UTC | #1
11.11.2021 15:08, Hanna Reitz wrote:
> In most of the block layer, especially when traversing down from other
> BlockDriverStates, we assume that BdrvChild.bs can never be NULL.  When
> it becomes NULL, it is expected that the corresponding BdrvChild pointer
> also becomes NULL and the BdrvChild object is freed.
> 
> Therefore, once bdrv_replace_child_noperm() sets the BdrvChild.bs
> pointer to NULL, it should also immediately set the corresponding
> BdrvChild pointer (like bs->file or bs->backing) to NULL.
> 
> In that context, it also makes sense for this function to free the
> child.  Sometimes we cannot do so, though, because it is called in a
> transactional context where the caller might still want to reinstate the
> child in the abort branch (and free it only on commit), so this behavior
> has to remain optional.
> 
> In bdrv_replace_child_tran()'s abort handler, we now rely on the fact
> that the BdrvChild passed to bdrv_replace_child_tran() must have had a
> non-NULL .bs pointer initially.  Make a note of that and assert it.
> 
> Signed-off-by: Hanna Reitz <hreitz@redhat.com>
> ---
>   block.c | 102 +++++++++++++++++++++++++++++++++++++++++++-------------
>   1 file changed, 79 insertions(+), 23 deletions(-)
> 
> diff --git a/block.c b/block.c
> index a40027161c..0ac5b163d2 100644
> --- a/block.c
> +++ b/block.c
> @@ -87,8 +87,10 @@ static BlockDriverState *bdrv_open_inherit(const char *filename,
>   static bool bdrv_recurse_has_child(BlockDriverState *bs,
>                                      BlockDriverState *child);
>   
> +static void bdrv_child_free(BdrvChild *child);
>   static void bdrv_replace_child_noperm(BdrvChild **child,
> -                                      BlockDriverState *new_bs);
> +                                      BlockDriverState *new_bs,
> +                                      bool free_empty_child);
>   static void bdrv_remove_file_or_backing_child(BlockDriverState *bs,
>                                                 BdrvChild *child,
>                                                 Transaction *tran);
> @@ -2256,12 +2258,16 @@ typedef struct BdrvReplaceChildState {
>       BdrvChild *child;
>       BdrvChild **childp;
>       BlockDriverState *old_bs;
> +    bool free_empty_child;
>   } BdrvReplaceChildState;
>   
>   static void bdrv_replace_child_commit(void *opaque)
>   {
>       BdrvReplaceChildState *s = opaque;
>   
> +    if (s->free_empty_child && !s->child->bs) {
> +        bdrv_child_free(s->child);
> +    }
>       bdrv_unref(s->old_bs);
>   }
>   
> @@ -2278,22 +2284,26 @@ static void bdrv_replace_child_abort(void *opaque)
>        *     modify the BdrvChild * pointer we indirectly pass to it, i.e. it
>        *     will not modify s->child.  From that perspective, it does not matter
>        *     whether we pass s->childp or &s->child.
> -     *     (TODO: Right now, bdrv_replace_child_noperm() never modifies that
> -     *     pointer anyway (though it will in the future), so at this point it
> -     *     absolutely does not matter whether we pass s->childp or &s->child.)
>        * (2) If new_bs is not NULL, s->childp will be NULL.  We then cannot use
>        *     it here.
>        * (3) If new_bs is NULL, *s->childp will have been NULLed by
>        *     bdrv_replace_child_tran()'s bdrv_replace_child_noperm() call, and we
>        *     must not pass a NULL *s->childp here.
> -     *     (TODO: In its current state, bdrv_replace_child_noperm() will not
> -     *     have NULLed *s->childp, so this does not apply yet.  It will in the
> -     *     future.)

What I don't like about this patch is that it does two different things: zeroing the pointer and clearing the object. And if we look at the latter in separate, it seems that it's not needed:

Look: bdrv_replace_child_tran(): new parameter is set to true in two places, in both of them we are sure (and do assertion and comment) that new bs is not NULL and nothing will be freed.

Similarly, bdrv_replace_child_noperm() is called with true in two places where we sure that new bs is not NULL.

and only one place where new parameter set to true really do something:

> @@ -2960,8 +3013,7 @@ static void bdrv_detach_child(BdrvChild **childp)
>   {
>       BlockDriverState *old_bs = (*childp)->bs;
>   
> -    bdrv_replace_child_noperm(childp, NULL);
> -    bdrv_child_free(*childp);
> +    bdrv_replace_child_noperm(childp, NULL, true);
>   
>       if (old_bs) {
>           /*

And it doesn't worth the whole complexity of new parameters for two functions.

In this place we can simply do something like

BdrvChild *child = *childp;

bdrv_replace_child_noperm(childp, NULL);

bdrv_child_free(child);


I understand the idea: it seems good and intuitive to do zeroing the pointer and clearing the object in one shot. But this patch itself shows that we just can't do it in 90% of cases. So, I think better is not do it and live with only "zeroing the pointer" part of this patch.





Another idea that come to my mind while reviewing this series: did you consider zeroing bs->file / bs->backing in .detach, like you do with bs->children list at start of the series?  We can argue the same way that file and backing pointers are property of parent, and they should be zeroed in .detach, where element is removed from bs->children.
Hanna Czenczek Nov. 15, 2021, 1:04 p.m. UTC | #2
On 12.11.21 17:10, Vladimir Sementsov-Ogievskiy wrote:
> 11.11.2021 15:08, Hanna Reitz wrote:
>> In most of the block layer, especially when traversing down from other
>> BlockDriverStates, we assume that BdrvChild.bs can never be NULL.  When
>> it becomes NULL, it is expected that the corresponding BdrvChild pointer
>> also becomes NULL and the BdrvChild object is freed.
>>
>> Therefore, once bdrv_replace_child_noperm() sets the BdrvChild.bs
>> pointer to NULL, it should also immediately set the corresponding
>> BdrvChild pointer (like bs->file or bs->backing) to NULL.
>>
>> In that context, it also makes sense for this function to free the
>> child.  Sometimes we cannot do so, though, because it is called in a
>> transactional context where the caller might still want to reinstate the
>> child in the abort branch (and free it only on commit), so this behavior
>> has to remain optional.
>>
>> In bdrv_replace_child_tran()'s abort handler, we now rely on the fact
>> that the BdrvChild passed to bdrv_replace_child_tran() must have had a
>> non-NULL .bs pointer initially.  Make a note of that and assert it.
>>
>> Signed-off-by: Hanna Reitz <hreitz@redhat.com>
>> ---
>>   block.c | 102 +++++++++++++++++++++++++++++++++++++++++++-------------
>>   1 file changed, 79 insertions(+), 23 deletions(-)
>>
>> diff --git a/block.c b/block.c
>> index a40027161c..0ac5b163d2 100644
>> --- a/block.c
>> +++ b/block.c
>> @@ -87,8 +87,10 @@ static BlockDriverState *bdrv_open_inherit(const 
>> char *filename,
>>   static bool bdrv_recurse_has_child(BlockDriverState *bs,
>>                                      BlockDriverState *child);
>>   +static void bdrv_child_free(BdrvChild *child);
>>   static void bdrv_replace_child_noperm(BdrvChild **child,
>> -                                      BlockDriverState *new_bs);
>> +                                      BlockDriverState *new_bs,
>> +                                      bool free_empty_child);
>>   static void bdrv_remove_file_or_backing_child(BlockDriverState *bs,
>>                                                 BdrvChild *child,
>>                                                 Transaction *tran);
>> @@ -2256,12 +2258,16 @@ typedef struct BdrvReplaceChildState {
>>       BdrvChild *child;
>>       BdrvChild **childp;
>>       BlockDriverState *old_bs;
>> +    bool free_empty_child;
>>   } BdrvReplaceChildState;
>>     static void bdrv_replace_child_commit(void *opaque)
>>   {
>>       BdrvReplaceChildState *s = opaque;
>>   +    if (s->free_empty_child && !s->child->bs) {
>> +        bdrv_child_free(s->child);
>> +    }
>>       bdrv_unref(s->old_bs);
>>   }
>>   @@ -2278,22 +2284,26 @@ static void bdrv_replace_child_abort(void 
>> *opaque)
>>        *     modify the BdrvChild * pointer we indirectly pass to it, 
>> i.e. it
>>        *     will not modify s->child.  From that perspective, it 
>> does not matter
>>        *     whether we pass s->childp or &s->child.
>> -     *     (TODO: Right now, bdrv_replace_child_noperm() never 
>> modifies that
>> -     *     pointer anyway (though it will in the future), so at this 
>> point it
>> -     *     absolutely does not matter whether we pass s->childp or 
>> &s->child.)
>>        * (2) If new_bs is not NULL, s->childp will be NULL. We then 
>> cannot use
>>        *     it here.
>>        * (3) If new_bs is NULL, *s->childp will have been NULLed by
>>        *     bdrv_replace_child_tran()'s bdrv_replace_child_noperm() 
>> call, and we
>>        *     must not pass a NULL *s->childp here.
>> -     *     (TODO: In its current state, bdrv_replace_child_noperm() 
>> will not
>> -     *     have NULLed *s->childp, so this does not apply yet.  It 
>> will in the
>> -     *     future.)
>
> What I don't like about this patch is that it does two different 
> things: zeroing the pointer and clearing the object. And if we look at 
> the latter in separate, it seems that it's not needed:
>
> Look: bdrv_replace_child_tran(): new parameter is set to true in two 
> places, in both of them we are sure (and do assertion and comment) 
> that new bs is not NULL and nothing will be freed.
>
> Similarly, bdrv_replace_child_noperm() is called with true in two 
> places where we sure that new bs is not NULL.
>
> and only one place where new parameter set to true really do something:
>
>> @@ -2960,8 +3013,7 @@ static void bdrv_detach_child(BdrvChild **childp)
>>   {
>>       BlockDriverState *old_bs = (*childp)->bs;
>>   -    bdrv_replace_child_noperm(childp, NULL);
>> -    bdrv_child_free(*childp);
>> +    bdrv_replace_child_noperm(childp, NULL, true);
>>         if (old_bs) {
>>           /*
>
> And it doesn't worth the whole complexity of new parameters for two 
> functions.
>
> In this place we can simply do something like
>
> BdrvChild *child = *childp;
>
> bdrv_replace_child_noperm(childp, NULL);
>
> bdrv_child_free(child);
>
>
> I understand the idea: it seems good and intuitive to do zeroing the 
> pointer and clearing the object in one shot. But this patch itself 
> shows that we just can't do it in 90% of cases. So, I think better is 
> not do it and live with only "zeroing the pointer" part of this patch.

I see your point, but I don’t find it too complex.  Passing `true` is 
the default and then calling the function is easy.  Passing `false` 
means there’s a catch, and then the caller is already complex anyway, so 
it doesn’t really make things worse.

I find the condition on when to pass `true` and when to pass `false` 
simple: Always pass true, unless the child cannot be deleted yet.

There are two reasons why I’d rather keep the parameter:
(1) That’s how it’s already merged, and I’m biased against respinning 
given that Kevin will be on PTO beginning tomorrow, and that we’re in 
freeze, so I’d rather not miss an RC.
(2) I really dislike a function that takes a pointer, NULLs it, and then 
doesn’t free the object it belongs to.  I find that a bad interface.  
Unfortunately we sometimes need this behavior, though, hence the 
additional parameter.  And this parameter basically asks the caller 
whether they want the reasonable interface (`true`) or the weird one 
where the pointer is NULLed but the object isn’t freed (`false`).  I 
find this makes the interface palatable to me.

>
> Another idea that come to my mind while reviewing this series: did you 
> consider zeroing bs->file / bs->backing in .detach, like you do with 
> bs->children list at start of the series?  We can argue the same way 
> that file and backing pointers are property of parent, and they should 
> be zeroed in .detach, where element is removed from bs->children.

Yes, I did.  The problem is that to make this right, .attach() would 
symmetrically need to put the child into bs->file or bs->backing (e.g. 
when the removal transaction is aborted).  That would not only be more 
invasive (we’d have to deal with and modify the places where bs->file or 
bs->backing is set), you’re then also facing the problem of giving 
.attach() this information.

Perhaps we could let .detach() clear the pointer and not set it in 
.attach(), but that seemed sufficiently wrong to me that I didn’t 
consider it further.

Hanna
Vladimir Sementsov-Ogievskiy Nov. 16, 2021, 8:16 a.m. UTC | #3
15.11.2021 16:04, Hanna Reitz wrote:
> On 12.11.21 17:10, Vladimir Sementsov-Ogievskiy wrote:
>> 11.11.2021 15:08, Hanna Reitz wrote:
>>> In most of the block layer, especially when traversing down from other
>>> BlockDriverStates, we assume that BdrvChild.bs can never be NULL.  When
>>> it becomes NULL, it is expected that the corresponding BdrvChild pointer
>>> also becomes NULL and the BdrvChild object is freed.
>>>
>>> Therefore, once bdrv_replace_child_noperm() sets the BdrvChild.bs
>>> pointer to NULL, it should also immediately set the corresponding
>>> BdrvChild pointer (like bs->file or bs->backing) to NULL.
>>>
>>> In that context, it also makes sense for this function to free the
>>> child.  Sometimes we cannot do so, though, because it is called in a
>>> transactional context where the caller might still want to reinstate the
>>> child in the abort branch (and free it only on commit), so this behavior
>>> has to remain optional.
>>>
>>> In bdrv_replace_child_tran()'s abort handler, we now rely on the fact
>>> that the BdrvChild passed to bdrv_replace_child_tran() must have had a
>>> non-NULL .bs pointer initially.  Make a note of that and assert it.
>>>
>>> Signed-off-by: Hanna Reitz <hreitz@redhat.com>
>>> ---
>>>   block.c | 102 +++++++++++++++++++++++++++++++++++++++++++-------------
>>>   1 file changed, 79 insertions(+), 23 deletions(-)
>>>
>>> diff --git a/block.c b/block.c
>>> index a40027161c..0ac5b163d2 100644
>>> --- a/block.c
>>> +++ b/block.c
>>> @@ -87,8 +87,10 @@ static BlockDriverState *bdrv_open_inherit(const char *filename,
>>>   static bool bdrv_recurse_has_child(BlockDriverState *bs,
>>>                                      BlockDriverState *child);
>>>   +static void bdrv_child_free(BdrvChild *child);
>>>   static void bdrv_replace_child_noperm(BdrvChild **child,
>>> -                                      BlockDriverState *new_bs);
>>> +                                      BlockDriverState *new_bs,
>>> +                                      bool free_empty_child);
>>>   static void bdrv_remove_file_or_backing_child(BlockDriverState *bs,
>>>                                                 BdrvChild *child,
>>>                                                 Transaction *tran);
>>> @@ -2256,12 +2258,16 @@ typedef struct BdrvReplaceChildState {
>>>       BdrvChild *child;
>>>       BdrvChild **childp;
>>>       BlockDriverState *old_bs;
>>> +    bool free_empty_child;
>>>   } BdrvReplaceChildState;
>>>     static void bdrv_replace_child_commit(void *opaque)
>>>   {
>>>       BdrvReplaceChildState *s = opaque;
>>>   +    if (s->free_empty_child && !s->child->bs) {
>>> +        bdrv_child_free(s->child);
>>> +    }
>>>       bdrv_unref(s->old_bs);
>>>   }
>>>   @@ -2278,22 +2284,26 @@ static void bdrv_replace_child_abort(void *opaque)
>>>        *     modify the BdrvChild * pointer we indirectly pass to it, i.e. it
>>>        *     will not modify s->child.  From that perspective, it does not matter
>>>        *     whether we pass s->childp or &s->child.
>>> -     *     (TODO: Right now, bdrv_replace_child_noperm() never modifies that
>>> -     *     pointer anyway (though it will in the future), so at this point it
>>> -     *     absolutely does not matter whether we pass s->childp or &s->child.)
>>>        * (2) If new_bs is not NULL, s->childp will be NULL. We then cannot use
>>>        *     it here.
>>>        * (3) If new_bs is NULL, *s->childp will have been NULLed by
>>>        *     bdrv_replace_child_tran()'s bdrv_replace_child_noperm() call, and we
>>>        *     must not pass a NULL *s->childp here.
>>> -     *     (TODO: In its current state, bdrv_replace_child_noperm() will not
>>> -     *     have NULLed *s->childp, so this does not apply yet.  It will in the
>>> -     *     future.)
>>
>> What I don't like about this patch is that it does two different things: zeroing the pointer and clearing the object. And if we look at the latter in separate, it seems that it's not needed:
>>
>> Look: bdrv_replace_child_tran(): new parameter is set to true in two places, in both of them we are sure (and do assertion and comment) that new bs is not NULL and nothing will be freed.
>>
>> Similarly, bdrv_replace_child_noperm() is called with true in two places where we sure that new bs is not NULL.
>>
>> and only one place where new parameter set to true really do something:
>>
>>> @@ -2960,8 +3013,7 @@ static void bdrv_detach_child(BdrvChild **childp)
>>>   {
>>>       BlockDriverState *old_bs = (*childp)->bs;
>>>   -    bdrv_replace_child_noperm(childp, NULL);
>>> -    bdrv_child_free(*childp);
>>> +    bdrv_replace_child_noperm(childp, NULL, true);
>>>         if (old_bs) {
>>>           /*
>>
>> And it doesn't worth the whole complexity of new parameters for two functions.
>>
>> In this place we can simply do something like
>>
>> BdrvChild *child = *childp;
>>
>> bdrv_replace_child_noperm(childp, NULL);
>>
>> bdrv_child_free(child);
>>
>>
>> I understand the idea: it seems good and intuitive to do zeroing the pointer and clearing the object in one shot. But this patch itself shows that we just can't do it in 90% of cases. So, I think better is not do it and live with only "zeroing the pointer" part of this patch.
> 
> I see your point, but I don’t find it too complex.  Passing `true` is the default and then calling the function is easy.  Passing `false` means there’s a catch, and then the caller is already complex anyway, so it doesn’t really make things worse.
> 
> I find the condition on when to pass `true` and when to pass `false` simple: Always pass true, unless the child cannot be deleted yet.
> 
> There are two reasons why I’d rather keep the parameter:
> (1) That’s how it’s already merged, and I’m biased against respinning given that Kevin will be on PTO beginning tomorrow, and that we’re in freeze, so I’d rather not miss an RC.

OK, that of course makes sense)

> (2) I really dislike a function that takes a pointer, NULLs it, and then doesn’t free the object it belongs to.  I find that a bad interface. Unfortunately we sometimes need this behavior, though, hence the additional parameter.  And this parameter basically asks the caller whether they want the reasonable interface (`true`) or the weird one where the pointer is NULLed but the object isn’t freed (`false`).  I find this makes the interface palatable to me.
> 
>>
>> Another idea that come to my mind while reviewing this series: did you consider zeroing bs->file / bs->backing in .detach, like you do with bs->children list at start of the series?  We can argue the same way that file and backing pointers are property of parent, and they should be zeroed in .detach, where element is removed from bs->children.
> 
> Yes, I did.  The problem is that to make this right, .attach() would symmetrically need to put the child into bs->file or bs->backing (e.g. when the removal transaction is aborted).  That would not only be more invasive (we’d have to deal with and modify the places where bs->file or bs->backing is set), you’re then also facing the problem of giving .attach() this information.
> 
> Perhaps we could let .detach() clear the pointer and not set it in .attach(), but that seemed sufficiently wrong to me that I didn’t consider it further.
> 

OK, reasonable.

If I have a good idea, I'll come with patches on top of this.
diff mbox series

Patch

diff --git a/block.c b/block.c
index a40027161c..0ac5b163d2 100644
--- a/block.c
+++ b/block.c
@@ -87,8 +87,10 @@  static BlockDriverState *bdrv_open_inherit(const char *filename,
 static bool bdrv_recurse_has_child(BlockDriverState *bs,
                                    BlockDriverState *child);
 
+static void bdrv_child_free(BdrvChild *child);
 static void bdrv_replace_child_noperm(BdrvChild **child,
-                                      BlockDriverState *new_bs);
+                                      BlockDriverState *new_bs,
+                                      bool free_empty_child);
 static void bdrv_remove_file_or_backing_child(BlockDriverState *bs,
                                               BdrvChild *child,
                                               Transaction *tran);
@@ -2256,12 +2258,16 @@  typedef struct BdrvReplaceChildState {
     BdrvChild *child;
     BdrvChild **childp;
     BlockDriverState *old_bs;
+    bool free_empty_child;
 } BdrvReplaceChildState;
 
 static void bdrv_replace_child_commit(void *opaque)
 {
     BdrvReplaceChildState *s = opaque;
 
+    if (s->free_empty_child && !s->child->bs) {
+        bdrv_child_free(s->child);
+    }
     bdrv_unref(s->old_bs);
 }
 
@@ -2278,22 +2284,26 @@  static void bdrv_replace_child_abort(void *opaque)
      *     modify the BdrvChild * pointer we indirectly pass to it, i.e. it
      *     will not modify s->child.  From that perspective, it does not matter
      *     whether we pass s->childp or &s->child.
-     *     (TODO: Right now, bdrv_replace_child_noperm() never modifies that
-     *     pointer anyway (though it will in the future), so at this point it
-     *     absolutely does not matter whether we pass s->childp or &s->child.)
      * (2) If new_bs is not NULL, s->childp will be NULL.  We then cannot use
      *     it here.
      * (3) If new_bs is NULL, *s->childp will have been NULLed by
      *     bdrv_replace_child_tran()'s bdrv_replace_child_noperm() call, and we
      *     must not pass a NULL *s->childp here.
-     *     (TODO: In its current state, bdrv_replace_child_noperm() will not
-     *     have NULLed *s->childp, so this does not apply yet.  It will in the
-     *     future.)
      *
      * So whether new_bs was NULL or not, we cannot pass s->childp here; and in
      * any case, there is no reason to pass it anyway.
      */
-    bdrv_replace_child_noperm(&s->child, s->old_bs);
+    bdrv_replace_child_noperm(&s->child, s->old_bs, true);
+    /*
+     * The child was pre-existing, so s->old_bs must be non-NULL, and
+     * s->child thus must not have been freed
+     */
+    assert(s->child != NULL);
+    if (!new_bs) {
+        /* As described above, *s->childp was cleared, so restore it */
+        assert(s->childp != NULL);
+        *s->childp = s->child;
+    }
     bdrv_unref(new_bs);
 }
 
@@ -2310,30 +2320,44 @@  static TransactionActionDrv bdrv_replace_child_drv = {
  *
  * The function doesn't update permissions, caller is responsible for this.
  *
+ * (*childp)->bs must not be NULL.
+ *
  * Note that if new_bs == NULL, @childp is stored in a state object attached
  * to @tran, so that the old child can be reinstated in the abort handler.
  * Therefore, if @new_bs can be NULL, @childp must stay valid until the
  * transaction is committed or aborted.
  *
- * (TODO: The reinstating does not happen yet, but it will once
- * bdrv_replace_child_noperm() NULLs *childp when new_bs is NULL.)
+ * If @free_empty_child is true and @new_bs is NULL, the BdrvChild is
+ * freed (on commit).  @free_empty_child should only be false if the
+ * caller will free the BDrvChild themselves (which may be important
+ * if this is in turn called in another transactional context).
  */
 static void bdrv_replace_child_tran(BdrvChild **childp,
                                     BlockDriverState *new_bs,
-                                    Transaction *tran)
+                                    Transaction *tran,
+                                    bool free_empty_child)
 {
     BdrvReplaceChildState *s = g_new(BdrvReplaceChildState, 1);
     *s = (BdrvReplaceChildState) {
         .child = *childp,
         .childp = new_bs == NULL ? childp : NULL,
         .old_bs = (*childp)->bs,
+        .free_empty_child = free_empty_child,
     };
     tran_add(tran, &bdrv_replace_child_drv, s);
 
+    /* The abort handler relies on this */
+    assert(s->old_bs != NULL);
+
     if (new_bs) {
         bdrv_ref(new_bs);
     }
-    bdrv_replace_child_noperm(childp, new_bs);
+    /*
+     * Pass free_empty_child=false, we will free the child (if
+     * necessary) in bdrv_replace_child_commit() (if our
+     * @free_empty_child parameter was true).
+     */
+    bdrv_replace_child_noperm(childp, new_bs, false);
     /* old_bs reference is transparently moved from *childp to @s */
 }
 
@@ -2705,8 +2729,22 @@  uint64_t bdrv_qapi_perm_to_blk_perm(BlockPermission qapi_perm)
     return permissions[qapi_perm];
 }
 
+/**
+ * Replace (*childp)->bs by @new_bs.
+ *
+ * If @new_bs is NULL, *childp will be set to NULL, too: BDS parents
+ * generally cannot handle a BdrvChild with .bs == NULL, so clearing
+ * BdrvChild.bs should generally immediately be followed by the
+ * BdrvChild pointer being cleared as well.
+ *
+ * If @free_empty_child is true and @new_bs is NULL, the BdrvChild is
+ * freed.  @free_empty_child should only be false if the caller will
+ * free the BdrvChild themselves (this may be important in a
+ * transactional context, where it may only be freed on commit).
+ */
 static void bdrv_replace_child_noperm(BdrvChild **childp,
-                                      BlockDriverState *new_bs)
+                                      BlockDriverState *new_bs,
+                                      bool free_empty_child)
 {
     BdrvChild *child = *childp;
     BlockDriverState *old_bs = child->bs;
@@ -2743,6 +2781,9 @@  static void bdrv_replace_child_noperm(BdrvChild **childp,
     }
 
     child->bs = new_bs;
+    if (!new_bs) {
+        *childp = NULL;
+    }
 
     if (new_bs) {
         QLIST_INSERT_HEAD(&new_bs->parents, child, next_parent);
@@ -2772,6 +2813,10 @@  static void bdrv_replace_child_noperm(BdrvChild **childp,
         bdrv_parent_drained_end_single(child);
         drain_saldo++;
     }
+
+    if (free_empty_child && !child->bs) {
+        bdrv_child_free(child);
+    }
 }
 
 /**
@@ -2801,7 +2846,14 @@  static void bdrv_attach_child_common_abort(void *opaque)
     BdrvChild *child = *s->child;
     BlockDriverState *bs = child->bs;
 
-    bdrv_replace_child_noperm(s->child, NULL);
+    /*
+     * Pass free_empty_child=false, because we still need the child
+     * for the AioContext operations on the parent below; those
+     * BdrvChildClass methods all work on a BdrvChild object, so we
+     * need to keep it as an empty shell (after this function, it will
+     * not be attached to any parent, and it will not have a .bs).
+     */
+    bdrv_replace_child_noperm(s->child, NULL, false);
 
     if (bdrv_get_aio_context(bs) != s->old_child_ctx) {
         bdrv_try_set_aio_context(bs, s->old_child_ctx, &error_abort);
@@ -2823,7 +2875,6 @@  static void bdrv_attach_child_common_abort(void *opaque)
 
     bdrv_unref(bs);
     bdrv_child_free(child);
-    *s->child = NULL;
 }
 
 static TransactionActionDrv bdrv_attach_child_common_drv = {
@@ -2901,7 +2952,9 @@  static int bdrv_attach_child_common(BlockDriverState *child_bs,
     }
 
     bdrv_ref(child_bs);
-    bdrv_replace_child_noperm(&new_child, child_bs);
+    bdrv_replace_child_noperm(&new_child, child_bs, true);
+    /* child_bs was non-NULL, so new_child must not have been freed */
+    assert(new_child != NULL);
 
     *child = new_child;
 
@@ -2960,8 +3013,7 @@  static void bdrv_detach_child(BdrvChild **childp)
 {
     BlockDriverState *old_bs = (*childp)->bs;
 
-    bdrv_replace_child_noperm(childp, NULL);
-    bdrv_child_free(*childp);
+    bdrv_replace_child_noperm(childp, NULL, true);
 
     if (old_bs) {
         /*
@@ -4951,7 +5003,11 @@  static void bdrv_remove_file_or_backing_child(BlockDriverState *bs,
     }
 
     if (child->bs) {
-        bdrv_replace_child_tran(childp, NULL, tran);
+        /*
+         * Pass free_empty_child=false, we will free the child in
+         * bdrv_remove_filter_or_cow_child_commit()
+         */
+        bdrv_replace_child_tran(childp, NULL, tran, false);
     }
 
     s = g_new(BdrvRemoveFilterOrCowChild, 1);
@@ -4961,8 +5017,6 @@  static void bdrv_remove_file_or_backing_child(BlockDriverState *bs,
         .is_backing = (childp == &bs->backing),
     };
     tran_add(tran, &bdrv_remove_filter_or_cow_child_drv, s);
-
-    *childp = NULL;
 }
 
 /*
@@ -5005,7 +5059,7 @@  static int bdrv_replace_node_noperm(BlockDriverState *from,
          * Passing a pointer to the local variable @c is fine here, because
          * @to is not NULL, and so &c will not be attached to the transaction.
          */
-        bdrv_replace_child_tran(&c, to, tran);
+        bdrv_replace_child_tran(&c, to, tran, true);
     }
 
     return 0;
@@ -5161,7 +5215,9 @@  int bdrv_replace_child_bs(BdrvChild *child, BlockDriverState *new_bs,
     bdrv_drained_begin(old_bs);
     bdrv_drained_begin(new_bs);
 
-    bdrv_replace_child_tran(&child, new_bs, tran);
+    bdrv_replace_child_tran(&child, new_bs, tran, true);
+    /* @new_bs must have been non-NULL, so @child must not have been freed */
+    assert(child != NULL);
 
     found = g_hash_table_new(NULL, NULL);
     refresh_list = bdrv_topological_dfs(refresh_list, found, old_bs);