diff mbox series

[v4,6/7] memory: Do not create circular reference with subregion

Message ID 20240823-san-v4-6-a24c6dfa4ceb@daynix.com (mailing list archive)
State New, archived
Headers show
Series Fix check-qtest-ppc64 sanitizer errors | expand

Commit Message

Akihiko Odaki Aug. 23, 2024, 6:13 a.m. UTC
memory_region_update_container_subregions() used to call
memory_region_ref(), which creates a reference to the owner of the
subregion, on behalf of the owner of the container. This results in a
circular reference if the subregion and container have the same owner.

memory_region_ref() creates a reference to the owner instead of the
memory region to match the lifetime of the owner and memory region. We
do not need such a hack if the subregion and container have the same
owner because the owner will be alive as long as the container is.
Therefore, create a reference to the subregion itself instead ot its
owner in such a case; the reference to the subregion is still necessary
to ensure that the subregion gets finalized after the container.

Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com>
---
 system/memory.c | 8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)

Comments

Peter Xu Aug. 26, 2024, 3:21 p.m. UTC | #1
On Fri, Aug 23, 2024 at 03:13:11PM +0900, Akihiko Odaki wrote:
> memory_region_update_container_subregions() used to call
> memory_region_ref(), which creates a reference to the owner of the
> subregion, on behalf of the owner of the container. This results in a
> circular reference if the subregion and container have the same owner.
> 
> memory_region_ref() creates a reference to the owner instead of the
> memory region to match the lifetime of the owner and memory region. We
> do not need such a hack if the subregion and container have the same
> owner because the owner will be alive as long as the container is.
> Therefore, create a reference to the subregion itself instead ot its
> owner in such a case; the reference to the subregion is still necessary
> to ensure that the subregion gets finalized after the container.
> 
> Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com>
> ---
>  system/memory.c | 8 ++++++--
>  1 file changed, 6 insertions(+), 2 deletions(-)
> 
> diff --git a/system/memory.c b/system/memory.c
> index 5e6eb459d5de..e4d3e9d1f427 100644
> --- a/system/memory.c
> +++ b/system/memory.c
> @@ -2612,7 +2612,9 @@ static void memory_region_update_container_subregions(MemoryRegion *subregion)
>  
>      memory_region_transaction_begin();
>  
> -    memory_region_ref(subregion);
> +    object_ref(mr->owner == subregion->owner ?
> +               OBJECT(subregion) : subregion->owner);

The only place that mr->refcount is used so far is the owner with the
object property attached to the mr, am I right (ignoring name-less MRs)?

I worry this will further complicate refcounting, now we're actively using
two refcounts for MRs..

Continue discussion there:

https://lore.kernel.org/r/067b17a4-cdfc-4f7e-b7e4-28c38e1c10f0@daynix.com

What I don't see is how mr->subregions differs from mr->container, so we
allow subregions to be attached but not the container when finalize()
(which is, afaict, the other way round).

It seems easier to me that we allow both container and subregions to exist
as long as within the owner itself, rather than start heavier use of
mr->refcount.

I tend to agree with you in another thread, where you mentioned it's better
we get rid of one of the refcounts. If not trivial to get, we should still
try to stick with one refcount to make it less chaos.

> +
>      QTAILQ_FOREACH(other, &mr->subregions, subregions_link) {
>          if (subregion->priority >= other->priority) {
>              QTAILQ_INSERT_BEFORE(other, subregion, subregions_link);
> @@ -2670,7 +2672,9 @@ void memory_region_del_subregion(MemoryRegion *mr,
>          assert(alias->mapped_via_alias >= 0);
>      }
>      QTAILQ_REMOVE(&mr->subregions, subregion, subregions_link);
> -    memory_region_unref(subregion);
> +    object_unref(mr->owner == subregion->owner ?
> +                 OBJECT(subregion) : subregion->owner);
> +
>      memory_region_update_pending |= mr->enabled && subregion->enabled;
>      memory_region_transaction_commit();
>  }
> 
> -- 
> 2.46.0
>
Peter Maydell Aug. 26, 2024, 5:10 p.m. UTC | #2
On Mon, 26 Aug 2024 at 16:22, Peter Xu <peterx@redhat.com> wrote:
>
> On Fri, Aug 23, 2024 at 03:13:11PM +0900, Akihiko Odaki wrote:
> > memory_region_update_container_subregions() used to call
> > memory_region_ref(), which creates a reference to the owner of the
> > subregion, on behalf of the owner of the container. This results in a
> > circular reference if the subregion and container have the same owner.
> >
> > memory_region_ref() creates a reference to the owner instead of the
> > memory region to match the lifetime of the owner and memory region. We
> > do not need such a hack if the subregion and container have the same
> > owner because the owner will be alive as long as the container is.
> > Therefore, create a reference to the subregion itself instead ot its
> > owner in such a case; the reference to the subregion is still necessary
> > to ensure that the subregion gets finalized after the container.
> >
> > Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com>
> > ---
> >  system/memory.c | 8 ++++++--
> >  1 file changed, 6 insertions(+), 2 deletions(-)
> >
> > diff --git a/system/memory.c b/system/memory.c
> > index 5e6eb459d5de..e4d3e9d1f427 100644
> > --- a/system/memory.c
> > +++ b/system/memory.c
> > @@ -2612,7 +2612,9 @@ static void memory_region_update_container_subregions(MemoryRegion *subregion)
> >
> >      memory_region_transaction_begin();
> >
> > -    memory_region_ref(subregion);
> > +    object_ref(mr->owner == subregion->owner ?
> > +               OBJECT(subregion) : subregion->owner);
>
> The only place that mr->refcount is used so far is the owner with the
> object property attached to the mr, am I right (ignoring name-less MRs)?
>
> I worry this will further complicate refcounting, now we're actively using
> two refcounts for MRs..
>
> Continue discussion there:
>
> https://lore.kernel.org/r/067b17a4-cdfc-4f7e-b7e4-28c38e1c10f0@daynix.com
>
> What I don't see is how mr->subregions differs from mr->container, so we
> allow subregions to be attached but not the container when finalize()
> (which is, afaict, the other way round).
>
> It seems easier to me that we allow both container and subregions to exist
> as long as within the owner itself, rather than start heavier use of
> mr->refcount.

I don't think just "same owner" necessarily will be workable --
you can have a setup like:
  * device A has a container C_A
  * device A has a child-device B
  * device B has a memory region R_B
  * device A's realize method puts R_B into C_A

R_B's owner is B, and the container's owner is A,
but we still want to be able to get rid of A (in the process
getting rid of B because it gets unparented and unreffed,
and R_B and C_A also).

-- PMM
Peter Xu Aug. 26, 2024, 7:42 p.m. UTC | #3
On Mon, Aug 26, 2024 at 06:10:25PM +0100, Peter Maydell wrote:
> On Mon, 26 Aug 2024 at 16:22, Peter Xu <peterx@redhat.com> wrote:
> >
> > On Fri, Aug 23, 2024 at 03:13:11PM +0900, Akihiko Odaki wrote:
> > > memory_region_update_container_subregions() used to call
> > > memory_region_ref(), which creates a reference to the owner of the
> > > subregion, on behalf of the owner of the container. This results in a
> > > circular reference if the subregion and container have the same owner.
> > >
> > > memory_region_ref() creates a reference to the owner instead of the
> > > memory region to match the lifetime of the owner and memory region. We
> > > do not need such a hack if the subregion and container have the same
> > > owner because the owner will be alive as long as the container is.
> > > Therefore, create a reference to the subregion itself instead ot its
> > > owner in such a case; the reference to the subregion is still necessary
> > > to ensure that the subregion gets finalized after the container.
> > >
> > > Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com>
> > > ---
> > >  system/memory.c | 8 ++++++--
> > >  1 file changed, 6 insertions(+), 2 deletions(-)
> > >
> > > diff --git a/system/memory.c b/system/memory.c
> > > index 5e6eb459d5de..e4d3e9d1f427 100644
> > > --- a/system/memory.c
> > > +++ b/system/memory.c
> > > @@ -2612,7 +2612,9 @@ static void memory_region_update_container_subregions(MemoryRegion *subregion)
> > >
> > >      memory_region_transaction_begin();
> > >
> > > -    memory_region_ref(subregion);
> > > +    object_ref(mr->owner == subregion->owner ?
> > > +               OBJECT(subregion) : subregion->owner);
> >
> > The only place that mr->refcount is used so far is the owner with the
> > object property attached to the mr, am I right (ignoring name-less MRs)?
> >
> > I worry this will further complicate refcounting, now we're actively using
> > two refcounts for MRs..
> >
> > Continue discussion there:
> >
> > https://lore.kernel.org/r/067b17a4-cdfc-4f7e-b7e4-28c38e1c10f0@daynix.com
> >
> > What I don't see is how mr->subregions differs from mr->container, so we
> > allow subregions to be attached but not the container when finalize()
> > (which is, afaict, the other way round).
> >
> > It seems easier to me that we allow both container and subregions to exist
> > as long as within the owner itself, rather than start heavier use of
> > mr->refcount.
> 
> I don't think just "same owner" necessarily will be workable --
> you can have a setup like:
>   * device A has a container C_A
>   * device A has a child-device B
>   * device B has a memory region R_B
>   * device A's realize method puts R_B into C_A
> 
> R_B's owner is B, and the container's owner is A,
> but we still want to be able to get rid of A (in the process
> getting rid of B because it gets unparented and unreffed,
> and R_B and C_A also).

For cross-device references, should we rely on an explicit call to
memory_region_del_subregion(), so as to detach the link between C_A and
R_B?

My understanding so far: logically when MR finalize() it should guarantee
both (1) mr->container==NULL, and (2) mr->subregions empty.  That's before
commit 2e2b8eb70fdb7dfb and could be the ideal world (though at the very
beginning we don't assert on ->container==NULL yet).  It requires all
device emulations to do proper unrealize() to unlink all the MRs.

However what I'm guessing is QEMU probably used to have lots of devices
that are not following the rules and leaking these links.  Hence we have
had 2e2b8eb70fdb7dfb, allowing that to happen as long as it's safe, and
it's justified by comment in 2e2b8eb70fdb7dfb on why it's safe.

What I was thinking is this comment seems to apply too to mr->container, so
that it should be safe too to unlink ->container the same way as its own
subregions.

IIUC that means for device-internal MR links we should be fine leaving
whatever link between MRs owned by such device; the device->refcount
guarantees none of them will be visible in any AS.  But then we need to
always properly unlink the MRs when the link is across >1 device owners,
otherwise it's prone to leak.
Akihiko Odaki Aug. 27, 2024, 4:14 a.m. UTC | #4
On 2024/08/27 4:42, Peter Xu wrote:
> On Mon, Aug 26, 2024 at 06:10:25PM +0100, Peter Maydell wrote:
>> On Mon, 26 Aug 2024 at 16:22, Peter Xu <peterx@redhat.com> wrote:
>>>
>>> On Fri, Aug 23, 2024 at 03:13:11PM +0900, Akihiko Odaki wrote:
>>>> memory_region_update_container_subregions() used to call
>>>> memory_region_ref(), which creates a reference to the owner of the
>>>> subregion, on behalf of the owner of the container. This results in a
>>>> circular reference if the subregion and container have the same owner.
>>>>
>>>> memory_region_ref() creates a reference to the owner instead of the
>>>> memory region to match the lifetime of the owner and memory region. We
>>>> do not need such a hack if the subregion and container have the same
>>>> owner because the owner will be alive as long as the container is.
>>>> Therefore, create a reference to the subregion itself instead ot its
>>>> owner in such a case; the reference to the subregion is still necessary
>>>> to ensure that the subregion gets finalized after the container.
>>>>
>>>> Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com>
>>>> ---
>>>>   system/memory.c | 8 ++++++--
>>>>   1 file changed, 6 insertions(+), 2 deletions(-)
>>>>
>>>> diff --git a/system/memory.c b/system/memory.c
>>>> index 5e6eb459d5de..e4d3e9d1f427 100644
>>>> --- a/system/memory.c
>>>> +++ b/system/memory.c
>>>> @@ -2612,7 +2612,9 @@ static void memory_region_update_container_subregions(MemoryRegion *subregion)
>>>>
>>>>       memory_region_transaction_begin();
>>>>
>>>> -    memory_region_ref(subregion);
>>>> +    object_ref(mr->owner == subregion->owner ?
>>>> +               OBJECT(subregion) : subregion->owner);
>>>
>>> The only place that mr->refcount is used so far is the owner with the
>>> object property attached to the mr, am I right (ignoring name-less MRs)?
>>>
>>> I worry this will further complicate refcounting, now we're actively using
>>> two refcounts for MRs..

The actor of object_ref() is the owner of the memory region also in this 
case. We are calling object_ref() on behalf of mr->owner so we use 
mr->refcount iff mr->owner == subregion->owner. In this sense there is 
only one user of mr->refcount even after this change.

>>>
>>> Continue discussion there:
>>>
>>> https://lore.kernel.org/r/067b17a4-cdfc-4f7e-b7e4-28c38e1c10f0@daynix.com
>>>
>>> What I don't see is how mr->subregions differs from mr->container, so we
>>> allow subregions to be attached but not the container when finalize()
>>> (which is, afaict, the other way round).
>>>
>>> It seems easier to me that we allow both container and subregions to exist
>>> as long as within the owner itself, rather than start heavier use of
>>> mr->refcount.
>>
>> I don't think just "same owner" necessarily will be workable --
>> you can have a setup like:
>>    * device A has a container C_A
>>    * device A has a child-device B
>>    * device B has a memory region R_B
>>    * device A's realize method puts R_B into C_A
>>
>> R_B's owner is B, and the container's owner is A,
>> but we still want to be able to get rid of A (in the process
>> getting rid of B because it gets unparented and unreffed,
>> and R_B and C_A also).
> 
> For cross-device references, should we rely on an explicit call to
> memory_region_del_subregion(), so as to detach the link between C_A and
> R_B?

Yes, I agree.

> 
> My understanding so far: logically when MR finalize() it should guarantee
> both (1) mr->container==NULL, and (2) mr->subregions empty.  That's before
> commit 2e2b8eb70fdb7dfb and could be the ideal world (though at the very
> beginning we don't assert on ->container==NULL yet).  It requires all
> device emulations to do proper unrealize() to unlink all the MRs.
> 
> However what I'm guessing is QEMU probably used to have lots of devices
> that are not following the rules and leaking these links.  Hence we have
> had 2e2b8eb70fdb7dfb, allowing that to happen as long as it's safe, and
> it's justified by comment in 2e2b8eb70fdb7dfb on why it's safe.
> 
> What I was thinking is this comment seems to apply too to mr->container, so
> that it should be safe too to unlink ->container the same way as its own
> subregions. >
> IIUC that means for device-internal MR links we should be fine leaving
> whatever link between MRs owned by such device; the device->refcount
> guarantees none of them will be visible in any AS.  But then we need to
> always properly unlink the MRs when the link is across >1 device owners,
> otherwise it's prone to leak.

There is one principle we must satisfy in general: keep a reference to a 
memory region if it is visible to the guest.

It is safe to call memory_region_del_subregion() and to trigger the 
finalization of subregions when the container is not referenced because 
they are no longer visible. This is not true for the other way around; 
even when subregions are not referenced by anyone else, they are still 
visible to the guest as long as the container is visible to the guest. 
It is not safe to unref and finalize them in such a case.

A memory region and its owner will leak if a memory region kept visible 
for a too long period whether the chain of reference contains a 
container/subregion relationship or not.

Regards,
Akihiko Odaki
Peter Xu Aug. 27, 2024, 4:11 p.m. UTC | #5
On Tue, Aug 27, 2024 at 01:14:51PM +0900, Akihiko Odaki wrote:
> On 2024/08/27 4:42, Peter Xu wrote:
> > On Mon, Aug 26, 2024 at 06:10:25PM +0100, Peter Maydell wrote:
> > > On Mon, 26 Aug 2024 at 16:22, Peter Xu <peterx@redhat.com> wrote:
> > > > 
> > > > On Fri, Aug 23, 2024 at 03:13:11PM +0900, Akihiko Odaki wrote:
> > > > > memory_region_update_container_subregions() used to call
> > > > > memory_region_ref(), which creates a reference to the owner of the
> > > > > subregion, on behalf of the owner of the container. This results in a
> > > > > circular reference if the subregion and container have the same owner.
> > > > > 
> > > > > memory_region_ref() creates a reference to the owner instead of the
> > > > > memory region to match the lifetime of the owner and memory region. We
> > > > > do not need such a hack if the subregion and container have the same
> > > > > owner because the owner will be alive as long as the container is.
> > > > > Therefore, create a reference to the subregion itself instead ot its
> > > > > owner in such a case; the reference to the subregion is still necessary
> > > > > to ensure that the subregion gets finalized after the container.
> > > > > 
> > > > > Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com>
> > > > > ---
> > > > >   system/memory.c | 8 ++++++--
> > > > >   1 file changed, 6 insertions(+), 2 deletions(-)
> > > > > 
> > > > > diff --git a/system/memory.c b/system/memory.c
> > > > > index 5e6eb459d5de..e4d3e9d1f427 100644
> > > > > --- a/system/memory.c
> > > > > +++ b/system/memory.c
> > > > > @@ -2612,7 +2612,9 @@ static void memory_region_update_container_subregions(MemoryRegion *subregion)
> > > > > 
> > > > >       memory_region_transaction_begin();
> > > > > 
> > > > > -    memory_region_ref(subregion);
> > > > > +    object_ref(mr->owner == subregion->owner ?
> > > > > +               OBJECT(subregion) : subregion->owner);
> > > > 
> > > > The only place that mr->refcount is used so far is the owner with the
> > > > object property attached to the mr, am I right (ignoring name-less MRs)?
> > > > 
> > > > I worry this will further complicate refcounting, now we're actively using
> > > > two refcounts for MRs..
> 
> The actor of object_ref() is the owner of the memory region also in this
> case. We are calling object_ref() on behalf of mr->owner so we use
> mr->refcount iff mr->owner == subregion->owner. In this sense there is only
> one user of mr->refcount even after this change.

Yes it's still one user, but it's not that straightforward to see, also
it's still an extension to how we use mr->refcount right now.  Currently
it's about "true / false" just to describe, now it's a real counter.

I wished that counter doesn't even exist if we'd like to stick with device
/ owner's counter.  Adding this can definitely also make further effort
harder if we want to remove mr->refcount.

> 
> > > > 
> > > > Continue discussion there:
> > > > 
> > > > https://lore.kernel.org/r/067b17a4-cdfc-4f7e-b7e4-28c38e1c10f0@daynix.com
> > > > 
> > > > What I don't see is how mr->subregions differs from mr->container, so we
> > > > allow subregions to be attached but not the container when finalize()
> > > > (which is, afaict, the other way round).
> > > > 
> > > > It seems easier to me that we allow both container and subregions to exist
> > > > as long as within the owner itself, rather than start heavier use of
> > > > mr->refcount.
> > > 
> > > I don't think just "same owner" necessarily will be workable --
> > > you can have a setup like:
> > >    * device A has a container C_A
> > >    * device A has a child-device B
> > >    * device B has a memory region R_B
> > >    * device A's realize method puts R_B into C_A
> > > 
> > > R_B's owner is B, and the container's owner is A,
> > > but we still want to be able to get rid of A (in the process
> > > getting rid of B because it gets unparented and unreffed,
> > > and R_B and C_A also).
> > 
> > For cross-device references, should we rely on an explicit call to
> > memory_region_del_subregion(), so as to detach the link between C_A and
> > R_B?
> 
> Yes, I agree.
> 
> > 
> > My understanding so far: logically when MR finalize() it should guarantee
> > both (1) mr->container==NULL, and (2) mr->subregions empty.  That's before
> > commit 2e2b8eb70fdb7dfb and could be the ideal world (though at the very
> > beginning we don't assert on ->container==NULL yet).  It requires all
> > device emulations to do proper unrealize() to unlink all the MRs.
> > 
> > However what I'm guessing is QEMU probably used to have lots of devices
> > that are not following the rules and leaking these links.  Hence we have
> > had 2e2b8eb70fdb7dfb, allowing that to happen as long as it's safe, and
> > it's justified by comment in 2e2b8eb70fdb7dfb on why it's safe.
> > 
> > What I was thinking is this comment seems to apply too to mr->container, so
> > that it should be safe too to unlink ->container the same way as its own
> > subregions. >
> > IIUC that means for device-internal MR links we should be fine leaving
> > whatever link between MRs owned by such device; the device->refcount
> > guarantees none of them will be visible in any AS.  But then we need to
> > always properly unlink the MRs when the link is across >1 device owners,
> > otherwise it's prone to leak.
> 
> There is one principle we must satisfy in general: keep a reference to a
> memory region if it is visible to the guest.
> 
> It is safe to call memory_region_del_subregion() and to trigger the
> finalization of subregions when the container is not referenced because they
> are no longer visible. This is not true for the other way around; even when
> subregions are not referenced by anyone else, they are still visible to the
> guest as long as the container is visible to the guest. It is not safe to
> unref and finalize them in such a case.
> 
> A memory region and its owner will leak if a memory region kept visible for
> a too long period whether the chain of reference contains a
> container/subregion relationship or not.

Could you elaborate why it's still visible to the guest if
owner->refcount==0 && mr->container!=NULL?

Firstly, mr->container != NULL means the MR has an user indeed.  It's the
matter of who's using it.  If that came from outside this device, it should
require memory_region_ref(mr) before hand when adding the subregion, and
that will hold one reference on the owner->refcount.

Here owner->refcount==0 means there's no such reference, so it seems to me
it's guaranteed to not be visible to anything outside of this device / owner.
Then from that POV it's safe to unlink when the owner is finalizing just
like what we do with mr->subregions, no?
Akihiko Odaki Aug. 28, 2024, 5:33 a.m. UTC | #6
On 2024/08/28 1:11, Peter Xu wrote:
> On Tue, Aug 27, 2024 at 01:14:51PM +0900, Akihiko Odaki wrote:
>> On 2024/08/27 4:42, Peter Xu wrote:
>>> On Mon, Aug 26, 2024 at 06:10:25PM +0100, Peter Maydell wrote:
>>>> On Mon, 26 Aug 2024 at 16:22, Peter Xu <peterx@redhat.com> wrote:
>>>>>
>>>>> On Fri, Aug 23, 2024 at 03:13:11PM +0900, Akihiko Odaki wrote:
>>>>>> memory_region_update_container_subregions() used to call
>>>>>> memory_region_ref(), which creates a reference to the owner of the
>>>>>> subregion, on behalf of the owner of the container. This results in a
>>>>>> circular reference if the subregion and container have the same owner.
>>>>>>
>>>>>> memory_region_ref() creates a reference to the owner instead of the
>>>>>> memory region to match the lifetime of the owner and memory region. We
>>>>>> do not need such a hack if the subregion and container have the same
>>>>>> owner because the owner will be alive as long as the container is.
>>>>>> Therefore, create a reference to the subregion itself instead ot its
>>>>>> owner in such a case; the reference to the subregion is still necessary
>>>>>> to ensure that the subregion gets finalized after the container.
>>>>>>
>>>>>> Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com>
>>>>>> ---
>>>>>>    system/memory.c | 8 ++++++--
>>>>>>    1 file changed, 6 insertions(+), 2 deletions(-)
>>>>>>
>>>>>> diff --git a/system/memory.c b/system/memory.c
>>>>>> index 5e6eb459d5de..e4d3e9d1f427 100644
>>>>>> --- a/system/memory.c
>>>>>> +++ b/system/memory.c
>>>>>> @@ -2612,7 +2612,9 @@ static void memory_region_update_container_subregions(MemoryRegion *subregion)
>>>>>>
>>>>>>        memory_region_transaction_begin();
>>>>>>
>>>>>> -    memory_region_ref(subregion);
>>>>>> +    object_ref(mr->owner == subregion->owner ?
>>>>>> +               OBJECT(subregion) : subregion->owner);
>>>>>
>>>>> The only place that mr->refcount is used so far is the owner with the
>>>>> object property attached to the mr, am I right (ignoring name-less MRs)?
>>>>>
>>>>> I worry this will further complicate refcounting, now we're actively using
>>>>> two refcounts for MRs..
>>
>> The actor of object_ref() is the owner of the memory region also in this
>> case. We are calling object_ref() on behalf of mr->owner so we use
>> mr->refcount iff mr->owner == subregion->owner. In this sense there is only
>> one user of mr->refcount even after this change.
> 
> Yes it's still one user, but it's not that straightforward to see, also
> it's still an extension to how we use mr->refcount right now.  Currently
> it's about "true / false" just to describe, now it's a real counter.
> 
> I wished that counter doesn't even exist if we'd like to stick with device
> / owner's counter.  Adding this can definitely also make further effort
> harder if we want to remove mr->refcount.

I don't think it will make removing mr->refcount harder. With this 
change, mr->refcount will count the parent and container. If we remove 
mr->refcount, we need to trigger object_finalize() in a way other than 
checking mr->refcount, which can be achieved by simply evaluating 
OBJECT(mr)->parent && mr->container.

> 
>>
>>>>>
>>>>> Continue discussion there:
>>>>>
>>>>> https://lore.kernel.org/r/067b17a4-cdfc-4f7e-b7e4-28c38e1c10f0@daynix.com
>>>>>
>>>>> What I don't see is how mr->subregions differs from mr->container, so we
>>>>> allow subregions to be attached but not the container when finalize()
>>>>> (which is, afaict, the other way round).
>>>>>
>>>>> It seems easier to me that we allow both container and subregions to exist
>>>>> as long as within the owner itself, rather than start heavier use of
>>>>> mr->refcount.
>>>>
>>>> I don't think just "same owner" necessarily will be workable --
>>>> you can have a setup like:
>>>>     * device A has a container C_A
>>>>     * device A has a child-device B
>>>>     * device B has a memory region R_B
>>>>     * device A's realize method puts R_B into C_A
>>>>
>>>> R_B's owner is B, and the container's owner is A,
>>>> but we still want to be able to get rid of A (in the process
>>>> getting rid of B because it gets unparented and unreffed,
>>>> and R_B and C_A also).
>>>
>>> For cross-device references, should we rely on an explicit call to
>>> memory_region_del_subregion(), so as to detach the link between C_A and
>>> R_B?
>>
>> Yes, I agree.
>>
>>>
>>> My understanding so far: logically when MR finalize() it should guarantee
>>> both (1) mr->container==NULL, and (2) mr->subregions empty.  That's before
>>> commit 2e2b8eb70fdb7dfb and could be the ideal world (though at the very
>>> beginning we don't assert on ->container==NULL yet).  It requires all
>>> device emulations to do proper unrealize() to unlink all the MRs.
>>>
>>> However what I'm guessing is QEMU probably used to have lots of devices
>>> that are not following the rules and leaking these links.  Hence we have
>>> had 2e2b8eb70fdb7dfb, allowing that to happen as long as it's safe, and
>>> it's justified by comment in 2e2b8eb70fdb7dfb on why it's safe.
>>>
>>> What I was thinking is this comment seems to apply too to mr->container, so
>>> that it should be safe too to unlink ->container the same way as its own
>>> subregions. >
>>> IIUC that means for device-internal MR links we should be fine leaving
>>> whatever link between MRs owned by such device; the device->refcount
>>> guarantees none of them will be visible in any AS.  But then we need to
>>> always properly unlink the MRs when the link is across >1 device owners,
>>> otherwise it's prone to leak.
>>
>> There is one principle we must satisfy in general: keep a reference to a
>> memory region if it is visible to the guest.
>>
>> It is safe to call memory_region_del_subregion() and to trigger the
>> finalization of subregions when the container is not referenced because they
>> are no longer visible. This is not true for the other way around; even when
>> subregions are not referenced by anyone else, they are still visible to the
>> guest as long as the container is visible to the guest. It is not safe to
>> unref and finalize them in such a case.
>>
>> A memory region and its owner will leak if a memory region kept visible for
>> a too long period whether the chain of reference contains a
>> container/subregion relationship or not.
> 
> Could you elaborate why it's still visible to the guest if
> owner->refcount==0 && mr->container!=NULL?
> 
> Firstly, mr->container != NULL means the MR has an user indeed.  It's the
> matter of who's using it.  If that came from outside this device, it should
> require memory_region_ref(mr) before hand when adding the subregion, and
> that will hold one reference on the owner->refcount.
> 
> Here owner->refcount==0 means there's no such reference, so it seems to me
> it's guaranteed to not be visible to anything outside of this device / owner.
> Then from that POV it's safe to unlink when the owner is finalizing just
> like what we do with mr->subregions, no?

An object is alive during instance_finalize even though its refcount == 
0. We can't assume all memory regions are dead even if owner->refcount 
== 0 because of that. In particular, docs/devel/memory.rst says you can 
call object_unparent() in the instance_finalize of the owner. This 
assumes a memory region will not vanish during the execution of the 
function unless object_unparent() is already called for the memory region.

Regards,
Akihiko Odaki
Peter Xu Aug. 28, 2024, 1:09 p.m. UTC | #7
On Wed, Aug 28, 2024 at 02:33:59PM +0900, Akihiko Odaki wrote:
> On 2024/08/28 1:11, Peter Xu wrote:
> > On Tue, Aug 27, 2024 at 01:14:51PM +0900, Akihiko Odaki wrote:
> > > On 2024/08/27 4:42, Peter Xu wrote:
> > > > On Mon, Aug 26, 2024 at 06:10:25PM +0100, Peter Maydell wrote:
> > > > > On Mon, 26 Aug 2024 at 16:22, Peter Xu <peterx@redhat.com> wrote:
> > > > > > 
> > > > > > On Fri, Aug 23, 2024 at 03:13:11PM +0900, Akihiko Odaki wrote:
> > > > > > > memory_region_update_container_subregions() used to call
> > > > > > > memory_region_ref(), which creates a reference to the owner of the
> > > > > > > subregion, on behalf of the owner of the container. This results in a
> > > > > > > circular reference if the subregion and container have the same owner.
> > > > > > > 
> > > > > > > memory_region_ref() creates a reference to the owner instead of the
> > > > > > > memory region to match the lifetime of the owner and memory region. We
> > > > > > > do not need such a hack if the subregion and container have the same
> > > > > > > owner because the owner will be alive as long as the container is.
> > > > > > > Therefore, create a reference to the subregion itself instead ot its
> > > > > > > owner in such a case; the reference to the subregion is still necessary
> > > > > > > to ensure that the subregion gets finalized after the container.
> > > > > > > 
> > > > > > > Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com>
> > > > > > > ---
> > > > > > >    system/memory.c | 8 ++++++--
> > > > > > >    1 file changed, 6 insertions(+), 2 deletions(-)
> > > > > > > 
> > > > > > > diff --git a/system/memory.c b/system/memory.c
> > > > > > > index 5e6eb459d5de..e4d3e9d1f427 100644
> > > > > > > --- a/system/memory.c
> > > > > > > +++ b/system/memory.c
> > > > > > > @@ -2612,7 +2612,9 @@ static void memory_region_update_container_subregions(MemoryRegion *subregion)
> > > > > > > 
> > > > > > >        memory_region_transaction_begin();
> > > > > > > 
> > > > > > > -    memory_region_ref(subregion);
> > > > > > > +    object_ref(mr->owner == subregion->owner ?
> > > > > > > +               OBJECT(subregion) : subregion->owner);
> > > > > > 
> > > > > > The only place that mr->refcount is used so far is the owner with the
> > > > > > object property attached to the mr, am I right (ignoring name-less MRs)?
> > > > > > 
> > > > > > I worry this will further complicate refcounting, now we're actively using
> > > > > > two refcounts for MRs..
> > > 
> > > The actor of object_ref() is the owner of the memory region also in this
> > > case. We are calling object_ref() on behalf of mr->owner so we use
> > > mr->refcount iff mr->owner == subregion->owner. In this sense there is only
> > > one user of mr->refcount even after this change.
> > 
> > Yes it's still one user, but it's not that straightforward to see, also
> > it's still an extension to how we use mr->refcount right now.  Currently
> > it's about "true / false" just to describe, now it's a real counter.
> > 
> > I wished that counter doesn't even exist if we'd like to stick with device
> > / owner's counter.  Adding this can definitely also make further effort
> > harder if we want to remove mr->refcount.
> 
> I don't think it will make removing mr->refcount harder. With this change,
> mr->refcount will count the parent and container. If we remove mr->refcount,
> we need to trigger object_finalize() in a way other than checking
> mr->refcount, which can be achieved by simply evaluating OBJECT(mr)->parent
> && mr->container.
> 
> > 
> > > 
> > > > > > 
> > > > > > Continue discussion there:
> > > > > > 
> > > > > > https://lore.kernel.org/r/067b17a4-cdfc-4f7e-b7e4-28c38e1c10f0@daynix.com
> > > > > > 
> > > > > > What I don't see is how mr->subregions differs from mr->container, so we
> > > > > > allow subregions to be attached but not the container when finalize()
> > > > > > (which is, afaict, the other way round).
> > > > > > 
> > > > > > It seems easier to me that we allow both container and subregions to exist
> > > > > > as long as within the owner itself, rather than start heavier use of
> > > > > > mr->refcount.
> > > > > 
> > > > > I don't think just "same owner" necessarily will be workable --
> > > > > you can have a setup like:
> > > > >     * device A has a container C_A
> > > > >     * device A has a child-device B
> > > > >     * device B has a memory region R_B
> > > > >     * device A's realize method puts R_B into C_A
> > > > > 
> > > > > R_B's owner is B, and the container's owner is A,
> > > > > but we still want to be able to get rid of A (in the process
> > > > > getting rid of B because it gets unparented and unreffed,
> > > > > and R_B and C_A also).
> > > > 
> > > > For cross-device references, should we rely on an explicit call to
> > > > memory_region_del_subregion(), so as to detach the link between C_A and
> > > > R_B?
> > > 
> > > Yes, I agree.
> > > 
> > > > 
> > > > My understanding so far: logically when MR finalize() it should guarantee
> > > > both (1) mr->container==NULL, and (2) mr->subregions empty.  That's before
> > > > commit 2e2b8eb70fdb7dfb and could be the ideal world (though at the very
> > > > beginning we don't assert on ->container==NULL yet).  It requires all
> > > > device emulations to do proper unrealize() to unlink all the MRs.
> > > > 
> > > > However what I'm guessing is QEMU probably used to have lots of devices
> > > > that are not following the rules and leaking these links.  Hence we have
> > > > had 2e2b8eb70fdb7dfb, allowing that to happen as long as it's safe, and
> > > > it's justified by comment in 2e2b8eb70fdb7dfb on why it's safe.
> > > > 
> > > > What I was thinking is this comment seems to apply too to mr->container, so
> > > > that it should be safe too to unlink ->container the same way as its own
> > > > subregions. >
> > > > IIUC that means for device-internal MR links we should be fine leaving
> > > > whatever link between MRs owned by such device; the device->refcount
> > > > guarantees none of them will be visible in any AS.  But then we need to
> > > > always properly unlink the MRs when the link is across >1 device owners,
> > > > otherwise it's prone to leak.
> > > 
> > > There is one principle we must satisfy in general: keep a reference to a
> > > memory region if it is visible to the guest.
> > > 
> > > It is safe to call memory_region_del_subregion() and to trigger the
> > > finalization of subregions when the container is not referenced because they
> > > are no longer visible. This is not true for the other way around; even when
> > > subregions are not referenced by anyone else, they are still visible to the
> > > guest as long as the container is visible to the guest. It is not safe to
> > > unref and finalize them in such a case.
> > > 
> > > A memory region and its owner will leak if a memory region kept visible for
> > > a too long period whether the chain of reference contains a
> > > container/subregion relationship or not.
> > 
> > Could you elaborate why it's still visible to the guest if
> > owner->refcount==0 && mr->container!=NULL?
> > 
> > Firstly, mr->container != NULL means the MR has an user indeed.  It's the
> > matter of who's using it.  If that came from outside this device, it should
> > require memory_region_ref(mr) before hand when adding the subregion, and
> > that will hold one reference on the owner->refcount.
> > 
> > Here owner->refcount==0 means there's no such reference, so it seems to me
> > it's guaranteed to not be visible to anything outside of this device / owner.
> > Then from that POV it's safe to unlink when the owner is finalizing just
> > like what we do with mr->subregions, no?
> 
> An object is alive during instance_finalize even though its refcount == 0.
> We can't assume all memory regions are dead even if owner->refcount == 0
> because of that.

When you referred to "an object", do you mean the MR being finalized here?

IIUC when the MR reaches its finalize(), it should mean it's not live
anymore.

We have two forms of MR usages right now: either embeded in another Object
/ Device, or dynamically, like VFIOQuirk.

When used embeded, the MR is only finalized when being removed from the
object's property list, that should only happen when the object / device
triggered finalize().  Since the MR will use the owner->refcount so I
suppose it means the MR is not live anymore.

When used dynamically, object_unparent() is needed but that should only
happen when the object / owner is during finalize(), per document:

        If however the memory region is part of a dynamically allocated
        data structure, you should call object_unparent() to destroy the
        memory region before the data structure is freed.  For an example
        see VFIOMSIXInfo and VFIOQuirk in hw/vfio/pci.c.

Then the MR is also not live.

> In particular, docs/devel/memory.rst says you can call object_unparent()
> in the instance_finalize of the owner. This assumes a memory region will
> not vanish during the execution of the function unless object_unparent()
> is already called for the memory region.

Yes, the MR will not vanish during finalize() of the owner object. However
I don't think it's "live"?  Again, it's based on my definition of
"liveness" as "taking one refcount of its owner", here the owner refcount
is zero.  IOW, I don't expect it to be accessible anywhere from any address
space (e.g. address_space_map()), because they'll all use
memory_region_ref() and that'll ultimately stops the owner from being
finalized.

Thanks,
Akihiko Odaki Aug. 28, 2024, 2:02 p.m. UTC | #8
On 2024/08/28 22:09, Peter Xu wrote:
> On Wed, Aug 28, 2024 at 02:33:59PM +0900, Akihiko Odaki wrote:
>> On 2024/08/28 1:11, Peter Xu wrote:
>>> On Tue, Aug 27, 2024 at 01:14:51PM +0900, Akihiko Odaki wrote:
>>>> On 2024/08/27 4:42, Peter Xu wrote:
>>>>> On Mon, Aug 26, 2024 at 06:10:25PM +0100, Peter Maydell wrote:
>>>>>> On Mon, 26 Aug 2024 at 16:22, Peter Xu <peterx@redhat.com> wrote:
>>>>>>>
>>>>>>> On Fri, Aug 23, 2024 at 03:13:11PM +0900, Akihiko Odaki wrote:
>>>>>>>> memory_region_update_container_subregions() used to call
>>>>>>>> memory_region_ref(), which creates a reference to the owner of the
>>>>>>>> subregion, on behalf of the owner of the container. This results in a
>>>>>>>> circular reference if the subregion and container have the same owner.
>>>>>>>>
>>>>>>>> memory_region_ref() creates a reference to the owner instead of the
>>>>>>>> memory region to match the lifetime of the owner and memory region. We
>>>>>>>> do not need such a hack if the subregion and container have the same
>>>>>>>> owner because the owner will be alive as long as the container is.
>>>>>>>> Therefore, create a reference to the subregion itself instead ot its
>>>>>>>> owner in such a case; the reference to the subregion is still necessary
>>>>>>>> to ensure that the subregion gets finalized after the container.
>>>>>>>>
>>>>>>>> Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com>
>>>>>>>> ---
>>>>>>>>     system/memory.c | 8 ++++++--
>>>>>>>>     1 file changed, 6 insertions(+), 2 deletions(-)
>>>>>>>>
>>>>>>>> diff --git a/system/memory.c b/system/memory.c
>>>>>>>> index 5e6eb459d5de..e4d3e9d1f427 100644
>>>>>>>> --- a/system/memory.c
>>>>>>>> +++ b/system/memory.c
>>>>>>>> @@ -2612,7 +2612,9 @@ static void memory_region_update_container_subregions(MemoryRegion *subregion)
>>>>>>>>
>>>>>>>>         memory_region_transaction_begin();
>>>>>>>>
>>>>>>>> -    memory_region_ref(subregion);
>>>>>>>> +    object_ref(mr->owner == subregion->owner ?
>>>>>>>> +               OBJECT(subregion) : subregion->owner);
>>>>>>>
>>>>>>> The only place that mr->refcount is used so far is the owner with the
>>>>>>> object property attached to the mr, am I right (ignoring name-less MRs)?
>>>>>>>
>>>>>>> I worry this will further complicate refcounting, now we're actively using
>>>>>>> two refcounts for MRs..
>>>>
>>>> The actor of object_ref() is the owner of the memory region also in this
>>>> case. We are calling object_ref() on behalf of mr->owner so we use
>>>> mr->refcount iff mr->owner == subregion->owner. In this sense there is only
>>>> one user of mr->refcount even after this change.
>>>
>>> Yes it's still one user, but it's not that straightforward to see, also
>>> it's still an extension to how we use mr->refcount right now.  Currently
>>> it's about "true / false" just to describe, now it's a real counter.
>>>
>>> I wished that counter doesn't even exist if we'd like to stick with device
>>> / owner's counter.  Adding this can definitely also make further effort
>>> harder if we want to remove mr->refcount.
>>
>> I don't think it will make removing mr->refcount harder. With this change,
>> mr->refcount will count the parent and container. If we remove mr->refcount,
>> we need to trigger object_finalize() in a way other than checking
>> mr->refcount, which can be achieved by simply evaluating OBJECT(mr)->parent
>> && mr->container.
>>
>>>
>>>>
>>>>>>>
>>>>>>> Continue discussion there:
>>>>>>>
>>>>>>> https://lore.kernel.org/r/067b17a4-cdfc-4f7e-b7e4-28c38e1c10f0@daynix.com
>>>>>>>
>>>>>>> What I don't see is how mr->subregions differs from mr->container, so we
>>>>>>> allow subregions to be attached but not the container when finalize()
>>>>>>> (which is, afaict, the other way round).
>>>>>>>
>>>>>>> It seems easier to me that we allow both container and subregions to exist
>>>>>>> as long as within the owner itself, rather than start heavier use of
>>>>>>> mr->refcount.
>>>>>>
>>>>>> I don't think just "same owner" necessarily will be workable --
>>>>>> you can have a setup like:
>>>>>>      * device A has a container C_A
>>>>>>      * device A has a child-device B
>>>>>>      * device B has a memory region R_B
>>>>>>      * device A's realize method puts R_B into C_A
>>>>>>
>>>>>> R_B's owner is B, and the container's owner is A,
>>>>>> but we still want to be able to get rid of A (in the process
>>>>>> getting rid of B because it gets unparented and unreffed,
>>>>>> and R_B and C_A also).
>>>>>
>>>>> For cross-device references, should we rely on an explicit call to
>>>>> memory_region_del_subregion(), so as to detach the link between C_A and
>>>>> R_B?
>>>>
>>>> Yes, I agree.
>>>>
>>>>>
>>>>> My understanding so far: logically when MR finalize() it should guarantee
>>>>> both (1) mr->container==NULL, and (2) mr->subregions empty.  That's before
>>>>> commit 2e2b8eb70fdb7dfb and could be the ideal world (though at the very
>>>>> beginning we don't assert on ->container==NULL yet).  It requires all
>>>>> device emulations to do proper unrealize() to unlink all the MRs.
>>>>>
>>>>> However what I'm guessing is QEMU probably used to have lots of devices
>>>>> that are not following the rules and leaking these links.  Hence we have
>>>>> had 2e2b8eb70fdb7dfb, allowing that to happen as long as it's safe, and
>>>>> it's justified by comment in 2e2b8eb70fdb7dfb on why it's safe.
>>>>>
>>>>> What I was thinking is this comment seems to apply too to mr->container, so
>>>>> that it should be safe too to unlink ->container the same way as its own
>>>>> subregions. >
>>>>> IIUC that means for device-internal MR links we should be fine leaving
>>>>> whatever link between MRs owned by such device; the device->refcount
>>>>> guarantees none of them will be visible in any AS.  But then we need to
>>>>> always properly unlink the MRs when the link is across >1 device owners,
>>>>> otherwise it's prone to leak.
>>>>
>>>> There is one principle we must satisfy in general: keep a reference to a
>>>> memory region if it is visible to the guest.
>>>>
>>>> It is safe to call memory_region_del_subregion() and to trigger the
>>>> finalization of subregions when the container is not referenced because they
>>>> are no longer visible. This is not true for the other way around; even when
>>>> subregions are not referenced by anyone else, they are still visible to the
>>>> guest as long as the container is visible to the guest. It is not safe to
>>>> unref and finalize them in such a case.
>>>>
>>>> A memory region and its owner will leak if a memory region kept visible for
>>>> a too long period whether the chain of reference contains a
>>>> container/subregion relationship or not.
>>>
>>> Could you elaborate why it's still visible to the guest if
>>> owner->refcount==0 && mr->container!=NULL?
>>>
>>> Firstly, mr->container != NULL means the MR has an user indeed.  It's the
>>> matter of who's using it.  If that came from outside this device, it should
>>> require memory_region_ref(mr) before hand when adding the subregion, and
>>> that will hold one reference on the owner->refcount.
>>>
>>> Here owner->refcount==0 means there's no such reference, so it seems to me
>>> it's guaranteed to not be visible to anything outside of this device / owner.
>>> Then from that POV it's safe to unlink when the owner is finalizing just
>>> like what we do with mr->subregions, no?
>>
>> An object is alive during instance_finalize even though its refcount == 0.
>> We can't assume all memory regions are dead even if owner->refcount == 0
>> because of that.
> 
> When you referred to "an object", do you mean the MR being finalized here?
> 
> IIUC when the MR reaches its finalize(), it should mean it's not live
> anymore.
> 
> We have two forms of MR usages right now: either embeded in another Object
> / Device, or dynamically, like VFIOQuirk.
> 
> When used embeded, the MR is only finalized when being removed from the
> object's property list, that should only happen when the object / device
> triggered finalize().  Since the MR will use the owner->refcount so I
> suppose it means the MR is not live anymore.
> 
> When used dynamically, object_unparent() is needed but that should only
> happen when the object / owner is during finalize(), per document:
> 
>          If however the memory region is part of a dynamically allocated
>          data structure, you should call object_unparent() to destroy the
>          memory region before the data structure is freed.  For an example
>          see VFIOMSIXInfo and VFIOQuirk in hw/vfio/pci.c.
> 
> Then the MR is also not live.
> 
>> In particular, docs/devel/memory.rst says you can call object_unparent()
>> in the instance_finalize of the owner. This assumes a memory region will
>> not vanish during the execution of the function unless object_unparent()
>> is already called for the memory region.
> 
> Yes, the MR will not vanish during finalize() of the owner object. However
> I don't think it's "live"?  Again, it's based on my definition of
> "liveness" as "taking one refcount of its owner", here the owner refcount
> is zero.  IOW, I don't expect it to be accessible anywhere from any address
> space (e.g. address_space_map()), because they'll all use
> memory_region_ref() and that'll ultimately stops the owner from being
> finalized.

I am calling the fact that embedded memory regions are accessible in 
instance_finalize() "live". A device can perform operations on its 
memory regions during instance_finalize() and we should be aware of that.

object_unparent() is such an example. instance_finalize() of a device 
can call object_unparent() for a subregion and for its container. If we 
automatically finalize the container when calling object_unparent() for 
the subregion, calling object_unparent() for its container will result 
in the second finalization, which is not good.

Regards,
Akihiko Odaki
Peter Xu Aug. 28, 2024, 3:56 p.m. UTC | #9
On Wed, Aug 28, 2024 at 11:02:06PM +0900, Akihiko Odaki wrote:
> On 2024/08/28 22:09, Peter Xu wrote:
> > On Wed, Aug 28, 2024 at 02:33:59PM +0900, Akihiko Odaki wrote:
> > > On 2024/08/28 1:11, Peter Xu wrote:
> > > > On Tue, Aug 27, 2024 at 01:14:51PM +0900, Akihiko Odaki wrote:
> > > > > On 2024/08/27 4:42, Peter Xu wrote:
> > > > > > On Mon, Aug 26, 2024 at 06:10:25PM +0100, Peter Maydell wrote:
> > > > > > > On Mon, 26 Aug 2024 at 16:22, Peter Xu <peterx@redhat.com> wrote:
> > > > > > > > 
> > > > > > > > On Fri, Aug 23, 2024 at 03:13:11PM +0900, Akihiko Odaki wrote:
> > > > > > > > > memory_region_update_container_subregions() used to call
> > > > > > > > > memory_region_ref(), which creates a reference to the owner of the
> > > > > > > > > subregion, on behalf of the owner of the container. This results in a
> > > > > > > > > circular reference if the subregion and container have the same owner.
> > > > > > > > > 
> > > > > > > > > memory_region_ref() creates a reference to the owner instead of the
> > > > > > > > > memory region to match the lifetime of the owner and memory region. We
> > > > > > > > > do not need such a hack if the subregion and container have the same
> > > > > > > > > owner because the owner will be alive as long as the container is.
> > > > > > > > > Therefore, create a reference to the subregion itself instead ot its
> > > > > > > > > owner in such a case; the reference to the subregion is still necessary
> > > > > > > > > to ensure that the subregion gets finalized after the container.
> > > > > > > > > 
> > > > > > > > > Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com>
> > > > > > > > > ---
> > > > > > > > >     system/memory.c | 8 ++++++--
> > > > > > > > >     1 file changed, 6 insertions(+), 2 deletions(-)
> > > > > > > > > 
> > > > > > > > > diff --git a/system/memory.c b/system/memory.c
> > > > > > > > > index 5e6eb459d5de..e4d3e9d1f427 100644
> > > > > > > > > --- a/system/memory.c
> > > > > > > > > +++ b/system/memory.c
> > > > > > > > > @@ -2612,7 +2612,9 @@ static void memory_region_update_container_subregions(MemoryRegion *subregion)
> > > > > > > > > 
> > > > > > > > >         memory_region_transaction_begin();
> > > > > > > > > 
> > > > > > > > > -    memory_region_ref(subregion);
> > > > > > > > > +    object_ref(mr->owner == subregion->owner ?
> > > > > > > > > +               OBJECT(subregion) : subregion->owner);
> > > > > > > > 
> > > > > > > > The only place that mr->refcount is used so far is the owner with the
> > > > > > > > object property attached to the mr, am I right (ignoring name-less MRs)?
> > > > > > > > 
> > > > > > > > I worry this will further complicate refcounting, now we're actively using
> > > > > > > > two refcounts for MRs..
> > > > > 
> > > > > The actor of object_ref() is the owner of the memory region also in this
> > > > > case. We are calling object_ref() on behalf of mr->owner so we use
> > > > > mr->refcount iff mr->owner == subregion->owner. In this sense there is only
> > > > > one user of mr->refcount even after this change.
> > > > 
> > > > Yes it's still one user, but it's not that straightforward to see, also
> > > > it's still an extension to how we use mr->refcount right now.  Currently
> > > > it's about "true / false" just to describe, now it's a real counter.
> > > > 
> > > > I wished that counter doesn't even exist if we'd like to stick with device
> > > > / owner's counter.  Adding this can definitely also make further effort
> > > > harder if we want to remove mr->refcount.
> > > 
> > > I don't think it will make removing mr->refcount harder. With this change,
> > > mr->refcount will count the parent and container. If we remove mr->refcount,
> > > we need to trigger object_finalize() in a way other than checking
> > > mr->refcount, which can be achieved by simply evaluating OBJECT(mr)->parent
> > > && mr->container.
> > > 
> > > > 
> > > > > 
> > > > > > > > 
> > > > > > > > Continue discussion there:
> > > > > > > > 
> > > > > > > > https://lore.kernel.org/r/067b17a4-cdfc-4f7e-b7e4-28c38e1c10f0@daynix.com
> > > > > > > > 
> > > > > > > > What I don't see is how mr->subregions differs from mr->container, so we
> > > > > > > > allow subregions to be attached but not the container when finalize()
> > > > > > > > (which is, afaict, the other way round).
> > > > > > > > 
> > > > > > > > It seems easier to me that we allow both container and subregions to exist
> > > > > > > > as long as within the owner itself, rather than start heavier use of
> > > > > > > > mr->refcount.
> > > > > > > 
> > > > > > > I don't think just "same owner" necessarily will be workable --
> > > > > > > you can have a setup like:
> > > > > > >      * device A has a container C_A
> > > > > > >      * device A has a child-device B
> > > > > > >      * device B has a memory region R_B
> > > > > > >      * device A's realize method puts R_B into C_A
> > > > > > > 
> > > > > > > R_B's owner is B, and the container's owner is A,
> > > > > > > but we still want to be able to get rid of A (in the process
> > > > > > > getting rid of B because it gets unparented and unreffed,
> > > > > > > and R_B and C_A also).
> > > > > > 
> > > > > > For cross-device references, should we rely on an explicit call to
> > > > > > memory_region_del_subregion(), so as to detach the link between C_A and
> > > > > > R_B?
> > > > > 
> > > > > Yes, I agree.
> > > > > 
> > > > > > 
> > > > > > My understanding so far: logically when MR finalize() it should guarantee
> > > > > > both (1) mr->container==NULL, and (2) mr->subregions empty.  That's before
> > > > > > commit 2e2b8eb70fdb7dfb and could be the ideal world (though at the very
> > > > > > beginning we don't assert on ->container==NULL yet).  It requires all
> > > > > > device emulations to do proper unrealize() to unlink all the MRs.
> > > > > > 
> > > > > > However what I'm guessing is QEMU probably used to have lots of devices
> > > > > > that are not following the rules and leaking these links.  Hence we have
> > > > > > had 2e2b8eb70fdb7dfb, allowing that to happen as long as it's safe, and
> > > > > > it's justified by comment in 2e2b8eb70fdb7dfb on why it's safe.
> > > > > > 
> > > > > > What I was thinking is this comment seems to apply too to mr->container, so
> > > > > > that it should be safe too to unlink ->container the same way as its own
> > > > > > subregions. >
> > > > > > IIUC that means for device-internal MR links we should be fine leaving
> > > > > > whatever link between MRs owned by such device; the device->refcount
> > > > > > guarantees none of them will be visible in any AS.  But then we need to
> > > > > > always properly unlink the MRs when the link is across >1 device owners,
> > > > > > otherwise it's prone to leak.
> > > > > 
> > > > > There is one principle we must satisfy in general: keep a reference to a
> > > > > memory region if it is visible to the guest.
> > > > > 
> > > > > It is safe to call memory_region_del_subregion() and to trigger the
> > > > > finalization of subregions when the container is not referenced because they
> > > > > are no longer visible. This is not true for the other way around; even when
> > > > > subregions are not referenced by anyone else, they are still visible to the
> > > > > guest as long as the container is visible to the guest. It is not safe to
> > > > > unref and finalize them in such a case.
> > > > > 
> > > > > A memory region and its owner will leak if a memory region kept visible for
> > > > > a too long period whether the chain of reference contains a
> > > > > container/subregion relationship or not.
> > > > 
> > > > Could you elaborate why it's still visible to the guest if
> > > > owner->refcount==0 && mr->container!=NULL?
> > > > 
> > > > Firstly, mr->container != NULL means the MR has an user indeed.  It's the
> > > > matter of who's using it.  If that came from outside this device, it should
> > > > require memory_region_ref(mr) before hand when adding the subregion, and
> > > > that will hold one reference on the owner->refcount.
> > > > 
> > > > Here owner->refcount==0 means there's no such reference, so it seems to me
> > > > it's guaranteed to not be visible to anything outside of this device / owner.
> > > > Then from that POV it's safe to unlink when the owner is finalizing just
> > > > like what we do with mr->subregions, no?
> > > 
> > > An object is alive during instance_finalize even though its refcount == 0.
> > > We can't assume all memory regions are dead even if owner->refcount == 0
> > > because of that.
> > 
> > When you referred to "an object", do you mean the MR being finalized here?
> > 
> > IIUC when the MR reaches its finalize(), it should mean it's not live
> > anymore.
> > 
> > We have two forms of MR usages right now: either embeded in another Object
> > / Device, or dynamically, like VFIOQuirk.
> > 
> > When used embeded, the MR is only finalized when being removed from the
> > object's property list, that should only happen when the object / device
> > triggered finalize().  Since the MR will use the owner->refcount so I
> > suppose it means the MR is not live anymore.
> > 
> > When used dynamically, object_unparent() is needed but that should only
> > happen when the object / owner is during finalize(), per document:
> > 
> >          If however the memory region is part of a dynamically allocated
> >          data structure, you should call object_unparent() to destroy the
> >          memory region before the data structure is freed.  For an example
> >          see VFIOMSIXInfo and VFIOQuirk in hw/vfio/pci.c.
> > 
> > Then the MR is also not live.
> > 
> > > In particular, docs/devel/memory.rst says you can call object_unparent()
> > > in the instance_finalize of the owner. This assumes a memory region will
> > > not vanish during the execution of the function unless object_unparent()
> > > is already called for the memory region.
> > 
> > Yes, the MR will not vanish during finalize() of the owner object. However
> > I don't think it's "live"?  Again, it's based on my definition of
> > "liveness" as "taking one refcount of its owner", here the owner refcount
> > is zero.  IOW, I don't expect it to be accessible anywhere from any address
> > space (e.g. address_space_map()), because they'll all use
> > memory_region_ref() and that'll ultimately stops the owner from being
> > finalized.
> 
> I am calling the fact that embedded memory regions are accessible in
> instance_finalize() "live". A device can perform operations on its memory
> regions during instance_finalize() and we should be aware of that.

This part is true.  I suppose we should still suggest device finalize() to
properly detach MRs, and that should normally be done there.

> 
> object_unparent() is such an example. instance_finalize() of a device can
> call object_unparent() for a subregion and for its container. If we
> automatically finalize the container when calling object_unparent() for the
> subregion, calling object_unparent() for its container will result in the
> second finalization, which is not good.

IMHO we don't finalize the container at all - what I suggested was we call
del_subregion() for the case where container != NULL.  Since in this case
both container & mr belong to the same owner, it shouldn't change any
refcount, but only remove the link.

However I think I see what you pointed out.  I wonder why we remove all
properties now before reaching instance_finalze(): shouldn't finalize() be
allowed to access some of the properties?

It goes back to this commit:

commit 76a6e1cc7cc3ad022e7159b37b291b75bc4615bf
Author: Paolo Bonzini <pbonzini@redhat.com>
Date:   Wed Jun 11 11:58:30 2014 +0200

    qom: object: delete properties before calling instance_finalize
    
    This ensures that the children's unparent callback will still
    have a usable parent.
    
    Reviewed-by: Peter Crosthwaite <peter.crosthwaite@xilinx.com>
    Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

From this series (as the 1st patch there):

https://lore.kernel.org/qemu-devel/1406716032-21795-1-git-send-email-pbonzini@redhat.com/

I can't say I fully understand the commit yet so far.. because it seems
this patch was trying to pave way so that MR's unparent() can have a usable
parent.  However... I don't think mr implemented unparent() at all..
while it just sounds more reasonable that properties shouldn't be
auto-removed before calling instance_finalize() from my gut feeling.

I tried to revert 76a6e1cc7c ("qom: object: delete properties before
calling instance_finalize"), "make check" all passes here.  I am a bit
confused on where it applies, and whether we should revert it.

If with 76a6e1cc7cc reverted, I think your concern should go away because
then properties (including MRs) will only be detached after owner's
instance_finalize().  Again, I wished Paolo could chime in as he should
know the best.

Thanks,
Akihiko Odaki Aug. 29, 2024, 4:39 a.m. UTC | #10
On 2024/08/29 0:56, Peter Xu wrote:
> On Wed, Aug 28, 2024 at 11:02:06PM +0900, Akihiko Odaki wrote:
>> On 2024/08/28 22:09, Peter Xu wrote:
>>> On Wed, Aug 28, 2024 at 02:33:59PM +0900, Akihiko Odaki wrote:
>>>> On 2024/08/28 1:11, Peter Xu wrote:
>>>>> On Tue, Aug 27, 2024 at 01:14:51PM +0900, Akihiko Odaki wrote:
>>>>>> On 2024/08/27 4:42, Peter Xu wrote:
>>>>>>> On Mon, Aug 26, 2024 at 06:10:25PM +0100, Peter Maydell wrote:
>>>>>>>> On Mon, 26 Aug 2024 at 16:22, Peter Xu <peterx@redhat.com> wrote:
>>>>>>>>>
>>>>>>>>> On Fri, Aug 23, 2024 at 03:13:11PM +0900, Akihiko Odaki wrote:
>>>>>>>>>> memory_region_update_container_subregions() used to call
>>>>>>>>>> memory_region_ref(), which creates a reference to the owner of the
>>>>>>>>>> subregion, on behalf of the owner of the container. This results in a
>>>>>>>>>> circular reference if the subregion and container have the same owner.
>>>>>>>>>>
>>>>>>>>>> memory_region_ref() creates a reference to the owner instead of the
>>>>>>>>>> memory region to match the lifetime of the owner and memory region. We
>>>>>>>>>> do not need such a hack if the subregion and container have the same
>>>>>>>>>> owner because the owner will be alive as long as the container is.
>>>>>>>>>> Therefore, create a reference to the subregion itself instead ot its
>>>>>>>>>> owner in such a case; the reference to the subregion is still necessary
>>>>>>>>>> to ensure that the subregion gets finalized after the container.
>>>>>>>>>>
>>>>>>>>>> Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com>
>>>>>>>>>> ---
>>>>>>>>>>      system/memory.c | 8 ++++++--
>>>>>>>>>>      1 file changed, 6 insertions(+), 2 deletions(-)
>>>>>>>>>>
>>>>>>>>>> diff --git a/system/memory.c b/system/memory.c
>>>>>>>>>> index 5e6eb459d5de..e4d3e9d1f427 100644
>>>>>>>>>> --- a/system/memory.c
>>>>>>>>>> +++ b/system/memory.c
>>>>>>>>>> @@ -2612,7 +2612,9 @@ static void memory_region_update_container_subregions(MemoryRegion *subregion)
>>>>>>>>>>
>>>>>>>>>>          memory_region_transaction_begin();
>>>>>>>>>>
>>>>>>>>>> -    memory_region_ref(subregion);
>>>>>>>>>> +    object_ref(mr->owner == subregion->owner ?
>>>>>>>>>> +               OBJECT(subregion) : subregion->owner);
>>>>>>>>>
>>>>>>>>> The only place that mr->refcount is used so far is the owner with the
>>>>>>>>> object property attached to the mr, am I right (ignoring name-less MRs)?
>>>>>>>>>
>>>>>>>>> I worry this will further complicate refcounting, now we're actively using
>>>>>>>>> two refcounts for MRs..
>>>>>>
>>>>>> The actor of object_ref() is the owner of the memory region also in this
>>>>>> case. We are calling object_ref() on behalf of mr->owner so we use
>>>>>> mr->refcount iff mr->owner == subregion->owner. In this sense there is only
>>>>>> one user of mr->refcount even after this change.
>>>>>
>>>>> Yes it's still one user, but it's not that straightforward to see, also
>>>>> it's still an extension to how we use mr->refcount right now.  Currently
>>>>> it's about "true / false" just to describe, now it's a real counter.
>>>>>
>>>>> I wished that counter doesn't even exist if we'd like to stick with device
>>>>> / owner's counter.  Adding this can definitely also make further effort
>>>>> harder if we want to remove mr->refcount.
>>>>
>>>> I don't think it will make removing mr->refcount harder. With this change,
>>>> mr->refcount will count the parent and container. If we remove mr->refcount,
>>>> we need to trigger object_finalize() in a way other than checking
>>>> mr->refcount, which can be achieved by simply evaluating OBJECT(mr)->parent
>>>> && mr->container.
>>>>
>>>>>
>>>>>>
>>>>>>>>>
>>>>>>>>> Continue discussion there:
>>>>>>>>>
>>>>>>>>> https://lore.kernel.org/r/067b17a4-cdfc-4f7e-b7e4-28c38e1c10f0@daynix.com
>>>>>>>>>
>>>>>>>>> What I don't see is how mr->subregions differs from mr->container, so we
>>>>>>>>> allow subregions to be attached but not the container when finalize()
>>>>>>>>> (which is, afaict, the other way round).
>>>>>>>>>
>>>>>>>>> It seems easier to me that we allow both container and subregions to exist
>>>>>>>>> as long as within the owner itself, rather than start heavier use of
>>>>>>>>> mr->refcount.
>>>>>>>>
>>>>>>>> I don't think just "same owner" necessarily will be workable --
>>>>>>>> you can have a setup like:
>>>>>>>>       * device A has a container C_A
>>>>>>>>       * device A has a child-device B
>>>>>>>>       * device B has a memory region R_B
>>>>>>>>       * device A's realize method puts R_B into C_A
>>>>>>>>
>>>>>>>> R_B's owner is B, and the container's owner is A,
>>>>>>>> but we still want to be able to get rid of A (in the process
>>>>>>>> getting rid of B because it gets unparented and unreffed,
>>>>>>>> and R_B and C_A also).
>>>>>>>
>>>>>>> For cross-device references, should we rely on an explicit call to
>>>>>>> memory_region_del_subregion(), so as to detach the link between C_A and
>>>>>>> R_B?
>>>>>>
>>>>>> Yes, I agree.
>>>>>>
>>>>>>>
>>>>>>> My understanding so far: logically when MR finalize() it should guarantee
>>>>>>> both (1) mr->container==NULL, and (2) mr->subregions empty.  That's before
>>>>>>> commit 2e2b8eb70fdb7dfb and could be the ideal world (though at the very
>>>>>>> beginning we don't assert on ->container==NULL yet).  It requires all
>>>>>>> device emulations to do proper unrealize() to unlink all the MRs.
>>>>>>>
>>>>>>> However what I'm guessing is QEMU probably used to have lots of devices
>>>>>>> that are not following the rules and leaking these links.  Hence we have
>>>>>>> had 2e2b8eb70fdb7dfb, allowing that to happen as long as it's safe, and
>>>>>>> it's justified by comment in 2e2b8eb70fdb7dfb on why it's safe.
>>>>>>>
>>>>>>> What I was thinking is this comment seems to apply too to mr->container, so
>>>>>>> that it should be safe too to unlink ->container the same way as its own
>>>>>>> subregions. >
>>>>>>> IIUC that means for device-internal MR links we should be fine leaving
>>>>>>> whatever link between MRs owned by such device; the device->refcount
>>>>>>> guarantees none of them will be visible in any AS.  But then we need to
>>>>>>> always properly unlink the MRs when the link is across >1 device owners,
>>>>>>> otherwise it's prone to leak.
>>>>>>
>>>>>> There is one principle we must satisfy in general: keep a reference to a
>>>>>> memory region if it is visible to the guest.
>>>>>>
>>>>>> It is safe to call memory_region_del_subregion() and to trigger the
>>>>>> finalization of subregions when the container is not referenced because they
>>>>>> are no longer visible. This is not true for the other way around; even when
>>>>>> subregions are not referenced by anyone else, they are still visible to the
>>>>>> guest as long as the container is visible to the guest. It is not safe to
>>>>>> unref and finalize them in such a case.
>>>>>>
>>>>>> A memory region and its owner will leak if a memory region kept visible for
>>>>>> a too long period whether the chain of reference contains a
>>>>>> container/subregion relationship or not.
>>>>>
>>>>> Could you elaborate why it's still visible to the guest if
>>>>> owner->refcount==0 && mr->container!=NULL?
>>>>>
>>>>> Firstly, mr->container != NULL means the MR has an user indeed.  It's the
>>>>> matter of who's using it.  If that came from outside this device, it should
>>>>> require memory_region_ref(mr) before hand when adding the subregion, and
>>>>> that will hold one reference on the owner->refcount.
>>>>>
>>>>> Here owner->refcount==0 means there's no such reference, so it seems to me
>>>>> it's guaranteed to not be visible to anything outside of this device / owner.
>>>>> Then from that POV it's safe to unlink when the owner is finalizing just
>>>>> like what we do with mr->subregions, no?
>>>>
>>>> An object is alive during instance_finalize even though its refcount == 0.
>>>> We can't assume all memory regions are dead even if owner->refcount == 0
>>>> because of that.
>>>
>>> When you referred to "an object", do you mean the MR being finalized here?
>>>
>>> IIUC when the MR reaches its finalize(), it should mean it's not live
>>> anymore.
>>>
>>> We have two forms of MR usages right now: either embeded in another Object
>>> / Device, or dynamically, like VFIOQuirk.
>>>
>>> When used embeded, the MR is only finalized when being removed from the
>>> object's property list, that should only happen when the object / device
>>> triggered finalize().  Since the MR will use the owner->refcount so I
>>> suppose it means the MR is not live anymore.
>>>
>>> When used dynamically, object_unparent() is needed but that should only
>>> happen when the object / owner is during finalize(), per document:
>>>
>>>           If however the memory region is part of a dynamically allocated
>>>           data structure, you should call object_unparent() to destroy the
>>>           memory region before the data structure is freed.  For an example
>>>           see VFIOMSIXInfo and VFIOQuirk in hw/vfio/pci.c.
>>>
>>> Then the MR is also not live.
>>>
>>>> In particular, docs/devel/memory.rst says you can call object_unparent()
>>>> in the instance_finalize of the owner. This assumes a memory region will
>>>> not vanish during the execution of the function unless object_unparent()
>>>> is already called for the memory region.
>>>
>>> Yes, the MR will not vanish during finalize() of the owner object. However
>>> I don't think it's "live"?  Again, it's based on my definition of
>>> "liveness" as "taking one refcount of its owner", here the owner refcount
>>> is zero.  IOW, I don't expect it to be accessible anywhere from any address
>>> space (e.g. address_space_map()), because they'll all use
>>> memory_region_ref() and that'll ultimately stops the owner from being
>>> finalized.
>>
>> I am calling the fact that embedded memory regions are accessible in
>> instance_finalize() "live". A device can perform operations on its memory
>> regions during instance_finalize() and we should be aware of that.
> 
> This part is true.  I suppose we should still suggest device finalize() to
> properly detach MRs, and that should normally be done there.

It is better to avoid manual resource deallocation in general because it 
is too error-prone.

I have an impression with QEMU code base that it is failing manual 
resource deallocation so frequently although such deallocation can be 
easily automated by binding resources to objects and free them when 
objects die by providing a function like Linux's devm_kmalloc(). 
Unfortunately I haven't found time to do that though.

So my opinion here is 1) we should automate resource deallocation as 
much as possible but 2) we shouldn't disturb code that performs manual 
resource management.

instance_finalize() is for manual resource management. It is better to 
have less code in instance_finalize() and fortunately MemoryRegion don't 
require any code in instance_finalize() in most cases. If 
instance_finalize() still insists to call object_unparent(), we 
shouldn't prevent that. (I changed my mind regarding this particular 
case of object_unparent() however as I describe below.)

> 
>>
>> object_unparent() is such an example. instance_finalize() of a device can
>> call object_unparent() for a subregion and for its container. If we
>> automatically finalize the container when calling object_unparent() for the
>> subregion, calling object_unparent() for its container will result in the
>> second finalization, which is not good.
> 
> IMHO we don't finalize the container at all - what I suggested was we call
> del_subregion() for the case where container != NULL.  Since in this case
> both container & mr belong to the same owner, it shouldn't change any
> refcount, but only remove the link.
> 
> However I think I see what you pointed out.  I wonder why we remove all
> properties now before reaching instance_finalze(): shouldn't finalize() be
> allowed to access some of the properties?
> 
> It goes back to this commit:
> 
> commit 76a6e1cc7cc3ad022e7159b37b291b75bc4615bf
> Author: Paolo Bonzini <pbonzini@redhat.com>
> Date:   Wed Jun 11 11:58:30 2014 +0200
> 
>      qom: object: delete properties before calling instance_finalize
>      
>      This ensures that the children's unparent callback will still
>      have a usable parent.
>      
>      Reviewed-by: Peter Crosthwaite <peter.crosthwaite@xilinx.com>
>      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> 
>  From this series (as the 1st patch there):
> 
> https://lore.kernel.org/qemu-devel/1406716032-21795-1-git-send-email-pbonzini@redhat.com/
> 
> I can't say I fully understand the commit yet so far.. because it seems
> this patch was trying to pave way so that MR's unparent() can have a usable
> parent.  However... I don't think mr implemented unparent() at all..
> while it just sounds more reasonable that properties shouldn't be
> auto-removed before calling instance_finalize() from my gut feeling.
> 
> I tried to revert 76a6e1cc7c ("qom: object: delete properties before
> calling instance_finalize"), "make check" all passes here.  I am a bit
> confused on where it applies, and whether we should revert it.
> 
> If with 76a6e1cc7cc reverted, I think your concern should go away because
> then properties (including MRs) will only be detached after owner's
> instance_finalize().  Again, I wished Paolo could chime in as he should
> know the best.

I didn't know QOM deletes properties before instance_finalize().

I think it is a bad idea to delete properties before 
instance_finalize(). The intention is that to keep a usable parent 
during unparenting, but it is inevitable that an object is in a 
semi-finalized state during finalization. If the order of finalization 
matters, it should be explicitly described in instance_finalize(). 
Deleting properties before instance_finalize() prevents that.

That said, I think it is too late to revert that change. "make check" 
works as a preliminary check, but ultimately we need manual tree-wide 
review, which is too costly.

Going back to the original topic, I had the (incorrect) assumption that 
QOM deletes properties *after* instance_finalize(). Under such an 
assumption, it would be unsafe to remove a subregion from its container 
when a subregion is gone. As you said, we don't have to 
object_unparent(mr->container) and instead we can just call 
memory_region_del_subregion() instead to keep object_unparent() 
functional. However, strictly speaking, memory_region_del_subregion() is 
also a side effect and may affect operations with the container later. 
Such a side effect is better to be avoided whenever possible.

My point in this discussion is that the device has a control of its 
memory region even during finalization. We shouldn't call 
object_unparent(), memory_region_del_subregion(), or perform whatever 
operation on another memory region; otherwise the device may see an 
unexpected side effect.

Now, let's consider the correct assumption that QOM deletes properties 
*before* instance_finalize(). Under this assumption, the statement in 
docs/devel/memory.rst saying that calling object_unparent() in 
instance_finalize() is fine is wrong because the memory region is 
already finalized and we shouldn't call object_unparent() for a 
finalized object.

This also means a device does not have a control of its memory regions 
during finalization and the device will not see the side effect of 
calling memory_region_del_subregion(). However, if a device really needs 
to control the order of memory region finalization, it can still call 
object_ref() for memory regions to keep them alive until 
instance_finalize(). The effect of memory_region_del_subregion() will be 
  visible to the device if a device does such a trick. While it is very 
unlikely that any device does such a trick, it is still better to avoid 
having any assumption on devices.

With the proposed object_ref(mr->owner == subregion->owner ? 
OBJECT(subregion) : subregion->owner) call, we don't have to make any 
assumption on how finalization works, which is too complicated as this 
discussion shows. It certainly makes the use of two reference counters, 
but it does not deviate from the current policy on their usage and it is 
also trivial to adapt when the reference counter in memory region gets 
removed. Considering all above, I believe my patch is the most 
straightforward solution.

Regards,
Akihiko Odaki
Peter Xu Aug. 29, 2024, 7:48 p.m. UTC | #11
On Thu, Aug 29, 2024 at 01:39:36PM +0900, Akihiko Odaki wrote:
> > > I am calling the fact that embedded memory regions are accessible in
> > > instance_finalize() "live". A device can perform operations on its memory
> > > regions during instance_finalize() and we should be aware of that.
> > 
> > This part is true.  I suppose we should still suggest device finalize() to
> > properly detach MRs, and that should normally be done there.
> 
> It is better to avoid manual resource deallocation in general because it is
> too error-prone.

I had an impression that you mixed up "finalize()" and "free()" in the
context of our discussion.. let us clarify this first before everything
else below, just in case I overlook stuff..

MR is very special as an object, in that it should have no free() hook,
hence by default nothing is going to be freed when mr->refcount==0.  It
means MRs need to be freed manually always.

For example:

(gdb) p system_memory->parent_obj->free
$2 = (ObjectFree *) 0x0

It plays perfect because the majority of QEMU device model is using MR as a
field (rather than a pointer) of another structure, so that's exactly what
we're looking for: we don't want to free() the MR as it's allocated
together with the owner device.  That'll be released when the owner free().

When dynamic allocation gets into the picture for MR, it's more complicated
for sure, because it means the user (like VFIOQuirk) will need to manually
allocate the MRs, then it requires explicit object_unparent() to detach
that from the device / owner when finalize().  NOTE!  object_unparent()
will NOT free the MR yet so far.  The MR still need to be manually freed
with an explicit g_free(), normally.  Again, I'd suggest you refer to the
VFIOQuirk code just as an example.  In that case this part is done with
e.g. vfio_bar_quirk_finalize().

        for (i = 0; i < quirk->nr_mem; i++) {
            object_unparent(OBJECT(&quirk->mem[i]));
        }
        g_free(quirk->mem);

Here quirk->mem is a pointer to an array of MR which can contain one or
more MRs, but the idea is the same.

> 
> I have an impression with QEMU code base that it is failing manual resource
> deallocation so frequently although such deallocation can be easily
> automated by binding resources to objects and free them when objects die by
> providing a function like Linux's devm_kmalloc(). Unfortunately I haven't
> found time to do that though.

AFAICT, the property list is exactly what you're saying.  IIUC normally an
object will be properly finalized()ed + free()ed when the parent object is
finalize()ed.  Here MR is just special as it bypasses all the free() part.

> 
> So my opinion here is 1) we should automate resource deallocation as much as
> possible but 2) we shouldn't disturb code that performs manual resource
> management.

Agree.  However note again that in whatever case cross-device MR links will
still require explicit detachment or it's prone to memory leak.

> 
> instance_finalize() is for manual resource management. It is better to have
> less code in instance_finalize() and fortunately MemoryRegion don't require
> any code in instance_finalize() in most cases. If instance_finalize() still
> insists to call object_unparent(), we shouldn't prevent that. (I changed my
> mind regarding this particular case of object_unparent() however as I
> describe below.)
> 
> > 
> > > 
> > > object_unparent() is such an example. instance_finalize() of a device can
> > > call object_unparent() for a subregion and for its container. If we
> > > automatically finalize the container when calling object_unparent() for the
> > > subregion, calling object_unparent() for its container will result in the
> > > second finalization, which is not good.
> > 
> > IMHO we don't finalize the container at all - what I suggested was we call
> > del_subregion() for the case where container != NULL.  Since in this case
> > both container & mr belong to the same owner, it shouldn't change any
> > refcount, but only remove the link.
> > 
> > However I think I see what you pointed out.  I wonder why we remove all
> > properties now before reaching instance_finalze(): shouldn't finalize() be
> > allowed to access some of the properties?
> > 
> > It goes back to this commit:
> > 
> > commit 76a6e1cc7cc3ad022e7159b37b291b75bc4615bf
> > Author: Paolo Bonzini <pbonzini@redhat.com>
> > Date:   Wed Jun 11 11:58:30 2014 +0200
> > 
> >      qom: object: delete properties before calling instance_finalize
> >      This ensures that the children's unparent callback will still
> >      have a usable parent.
> >      Reviewed-by: Peter Crosthwaite <peter.crosthwaite@xilinx.com>
> >      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> > 
> >  From this series (as the 1st patch there):
> > 
> > https://lore.kernel.org/qemu-devel/1406716032-21795-1-git-send-email-pbonzini@redhat.com/
> > 
> > I can't say I fully understand the commit yet so far.. because it seems
> > this patch was trying to pave way so that MR's unparent() can have a usable
> > parent.  However... I don't think mr implemented unparent() at all..
> > while it just sounds more reasonable that properties shouldn't be
> > auto-removed before calling instance_finalize() from my gut feeling.
> > 
> > I tried to revert 76a6e1cc7c ("qom: object: delete properties before
> > calling instance_finalize"), "make check" all passes here.  I am a bit
> > confused on where it applies, and whether we should revert it.
> > 
> > If with 76a6e1cc7cc reverted, I think your concern should go away because
> > then properties (including MRs) will only be detached after owner's
> > instance_finalize().  Again, I wished Paolo could chime in as he should
> > know the best.
> 
> I didn't know QOM deletes properties before instance_finalize().
> 
> I think it is a bad idea to delete properties before instance_finalize().
> The intention is that to keep a usable parent during unparenting, but it is
> inevitable that an object is in a semi-finalized state during finalization.
> If the order of finalization matters, it should be explicitly described in
> instance_finalize(). Deleting properties before instance_finalize() prevents
> that.
> 
> That said, I think it is too late to revert that change. "make check" works
> as a preliminary check, but ultimately we need manual tree-wide review,
> which is too costly.

I don't think it's too late. :)

IMHO if that statement is true, then QEMU will gradually become not
maintainable anymore with tons of such code nobody understands and nobody
can touch.  To me it can destine its death soon afterwards if so, sooner or
later.

OTOH, it'll be harder to justify or change indeed if some piece of code
stays longer in-tree.  So I agree with you even if we want to change
anything around this we should be with extreme cautions, and I don't think
we should merge anything like this too late of a release, just to give it
time to expose and break things.

> 
> Going back to the original topic, I had the (incorrect) assumption that QOM
> deletes properties *after* instance_finalize(). Under such an assumption, it

Let's stick with this model; so far I still think this is the right thing
to do, and I'm happy to know that you seem to at least agree (irrelevant of
whether we should move on with a change).

> would be unsafe to remove a subregion from its container when a subregion is
> gone. As you said, we don't have to object_unparent(mr->container) and

IMHO we need to do this. Please recheck this after reading above on
finalize() v.s. free().  Here object_unparent() is needed because the MR is
allocated by the owner, it means only the owner knows how to free() it
(again: MR's own free()==NULL).  Then if we want to free it properly, we
need to detach it from the owner object first, hence object_unparent().

> instead we can just call memory_region_del_subregion() instead to keep
> object_unparent() functional. However, strictly speaking,

I think we can do that, or we don't.  It's the same as whether we want to
explicitly detach links for an embeded MR: if it's not explicitly done, we
can rely on the finalize() of the MR to do so.

In all cases, object_unparent() will still be needed because it will do
more things than memory_region_del_subregion(): it removes the property
link itself.

This is also one reason why I keep thinking what I suggested might be good:
it covers not only mr->subregions and also mr->container, it means the
detachment has no ordering constraints on which mr is finalize()ed first
(in this case, either the container mr or the child mr).

> memory_region_del_subregion() is also a side effect and may affect
> operations with the container later. Such a side effect is better to be
> avoided whenever possible.
> 
> My point in this discussion is that the device has a control of its memory
> region even during finalization. We shouldn't call object_unparent(),
> memory_region_del_subregion(), or perform whatever operation on another
> memory region; otherwise the device may see an unexpected side effect.
> 
> Now, let's consider the correct assumption that QOM deletes properties
> *before* instance_finalize(). Under this assumption, the statement in
> docs/devel/memory.rst saying that calling object_unparent() in
> instance_finalize() is fine is wrong because the memory region is already
> finalized and we shouldn't call object_unparent() for a finalized object.
> 
> This also means a device does not have a control of its memory regions
> during finalization and the device will not see the side effect of calling
> memory_region_del_subregion(). However, if a device really needs to control
> the order of memory region finalization, it can still call object_ref() for
> memory regions to keep them alive until instance_finalize(). The effect of
> memory_region_del_subregion() will be  visible to the device if a device
> does such a trick. While it is very unlikely that any device does such a
> trick, it is still better to avoid having any assumption on devices.
> 
> With the proposed object_ref(mr->owner == subregion->owner ?
> OBJECT(subregion) : subregion->owner) call, we don't have to make any
> assumption on how finalization works, which is too complicated as this
> discussion shows. It certainly makes the use of two reference counters, but
> it does not deviate from the current policy on their usage and it is also
> trivial to adapt when the reference counter in memory region gets removed.
> Considering all above, I believe my patch is the most straightforward
> solution.

Let's see how you thinks after you read above first.  I hope we can get
more onto the same page at least on the context.

So far I still prefer using mr->refcount as less as possible.  Now it plays
the only role as "whether this MR is put onto an owner property list", and
for all the rest refcounts on MR it should always routes to the owner.  I
still think we need to be extremely cautious on further complicating this
refcount.  It's already complicated indeed, but hopefully this model is the
best we can trivially have right now, and so far it's clear to me, but it's
always possible I overlooked something.

Thanks,
Akihiko Odaki Aug. 30, 2024, 6:11 a.m. UTC | #12
On 2024/08/30 4:48, Peter Xu wrote:
> On Thu, Aug 29, 2024 at 01:39:36PM +0900, Akihiko Odaki wrote:
>>>> I am calling the fact that embedded memory regions are accessible in
>>>> instance_finalize() "live". A device can perform operations on its memory
>>>> regions during instance_finalize() and we should be aware of that.
>>>
>>> This part is true.  I suppose we should still suggest device finalize() to
>>> properly detach MRs, and that should normally be done there.
>>
>> It is better to avoid manual resource deallocation in general because it is
>> too error-prone.
> 
> I had an impression that you mixed up "finalize()" and "free()" in the
> context of our discussion.. let us clarify this first before everything
> else below, just in case I overlook stuff..
I am aware of that distinction. I dealt with it with patch "virtio-gpu: 
Handle resource blob commands":
https://lore.kernel.org/r/20240822185110.1757429-12-dmitry.osipenko@collabora.com

> 
> MR is very special as an object, in that it should have no free() hook,
> hence by default nothing is going to be freed when mr->refcount==0.  It
> means MRs need to be freed manually always.
> 
> For example:
> 
> (gdb) p system_memory->parent_obj->free
> $2 = (ObjectFree *) 0x0
> 
> It plays perfect because the majority of QEMU device model is using MR as a
> field (rather than a pointer) of another structure, so that's exactly what
> we're looking for: we don't want to free() the MR as it's allocated
> together with the owner device.  That'll be released when the owner free().
> 
> When dynamic allocation gets into the picture for MR, it's more complicated
> for sure, because it means the user (like VFIOQuirk) will need to manually
> allocate the MRs, then it requires explicit object_unparent() to detach
> that from the device / owner when finalize().  NOTE!  object_unparent()
> will NOT free the MR yet so far.  The MR still need to be manually freed
> with an explicit g_free(), normally.  Again, I'd suggest you refer to the
> VFIOQuirk code just as an example.  In that case this part is done with
> e.g. vfio_bar_quirk_finalize().
> 
>          for (i = 0; i < quirk->nr_mem; i++) {
>              object_unparent(OBJECT(&quirk->mem[i]));
>          }
>          g_free(quirk->mem);
> 
> Here quirk->mem is a pointer to an array of MR which can contain one or
> more MRs, but the idea is the same.
> 
>>
>> I have an impression with QEMU code base that it is failing manual resource
>> deallocation so frequently although such deallocation can be easily
>> automated by binding resources to objects and free them when objects die by
>> providing a function like Linux's devm_kmalloc(). Unfortunately I haven't
>> found time to do that though.
> 
> AFAICT, the property list is exactly what you're saying.  IIUC normally an
> object will be properly finalized()ed + free()ed when the parent object is
> finalize()ed.  Here MR is just special as it bypasses all the free() part.
> 
>>
>> So my opinion here is 1) we should automate resource deallocation as much as
>> possible but 2) we shouldn't disturb code that performs manual resource
>> management.
> 
> Agree.  However note again that in whatever case cross-device MR links will
> still require explicit detachment or it's prone to memory leak.

I am not sure what you refer to with cross-device MR links so can you 
clarify it?

> 
>>
>> instance_finalize() is for manual resource management. It is better to have
>> less code in instance_finalize() and fortunately MemoryRegion don't require
>> any code in instance_finalize() in most cases. If instance_finalize() still
>> insists to call object_unparent(), we shouldn't prevent that. (I changed my
>> mind regarding this particular case of object_unparent() however as I
>> describe below.)
>>
>>>
>>>>
>>>> object_unparent() is such an example. instance_finalize() of a device can
>>>> call object_unparent() for a subregion and for its container. If we
>>>> automatically finalize the container when calling object_unparent() for the
>>>> subregion, calling object_unparent() for its container will result in the
>>>> second finalization, which is not good.
>>>
>>> IMHO we don't finalize the container at all - what I suggested was we call
>>> del_subregion() for the case where container != NULL.  Since in this case
>>> both container & mr belong to the same owner, it shouldn't change any
>>> refcount, but only remove the link.
>>>
>>> However I think I see what you pointed out.  I wonder why we remove all
>>> properties now before reaching instance_finalze(): shouldn't finalize() be
>>> allowed to access some of the properties?
>>>
>>> It goes back to this commit:
>>>
>>> commit 76a6e1cc7cc3ad022e7159b37b291b75bc4615bf
>>> Author: Paolo Bonzini <pbonzini@redhat.com>
>>> Date:   Wed Jun 11 11:58:30 2014 +0200
>>>
>>>       qom: object: delete properties before calling instance_finalize
>>>       This ensures that the children's unparent callback will still
>>>       have a usable parent.
>>>       Reviewed-by: Peter Crosthwaite <peter.crosthwaite@xilinx.com>
>>>       Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
>>>
>>>   From this series (as the 1st patch there):
>>>
>>> https://lore.kernel.org/qemu-devel/1406716032-21795-1-git-send-email-pbonzini@redhat.com/
>>>
>>> I can't say I fully understand the commit yet so far.. because it seems
>>> this patch was trying to pave way so that MR's unparent() can have a usable
>>> parent.  However... I don't think mr implemented unparent() at all..
>>> while it just sounds more reasonable that properties shouldn't be
>>> auto-removed before calling instance_finalize() from my gut feeling.
>>>
>>> I tried to revert 76a6e1cc7c ("qom: object: delete properties before
>>> calling instance_finalize"), "make check" all passes here.  I am a bit
>>> confused on where it applies, and whether we should revert it.
>>>
>>> If with 76a6e1cc7cc reverted, I think your concern should go away because
>>> then properties (including MRs) will only be detached after owner's
>>> instance_finalize().  Again, I wished Paolo could chime in as he should
>>> know the best.
>>
>> I didn't know QOM deletes properties before instance_finalize().
>>
>> I think it is a bad idea to delete properties before instance_finalize().
>> The intention is that to keep a usable parent during unparenting, but it is
>> inevitable that an object is in a semi-finalized state during finalization.
>> If the order of finalization matters, it should be explicitly described in
>> instance_finalize(). Deleting properties before instance_finalize() prevents
>> that.
>>
>> That said, I think it is too late to revert that change. "make check" works
>> as a preliminary check, but ultimately we need manual tree-wide review,
>> which is too costly.
> 
> I don't think it's too late. :)
> 
> IMHO if that statement is true, then QEMU will gradually become not
> maintainable anymore with tons of such code nobody understands and nobody
> can touch.  To me it can destine its death soon afterwards if so, sooner or
> later.
> 
> OTOH, it'll be harder to justify or change indeed if some piece of code
> stays longer in-tree.  So I agree with you even if we want to change
> anything around this we should be with extreme cautions, and I don't think
> we should merge anything like this too late of a release, just to give it
> time to expose and break things.
> 
>>
>> Going back to the original topic, I had the (incorrect) assumption that QOM
>> deletes properties *after* instance_finalize(). Under such an assumption, it
> 
> Let's stick with this model; so far I still think this is the right thing
> to do, and I'm happy to know that you seem to at least agree (irrelevant of
> whether we should move on with a change).
> 
>> would be unsafe to remove a subregion from its container when a subregion is
>> gone. As you said, we don't have to object_unparent(mr->container) and
> 
> IMHO we need to do this. Please recheck this after reading above on
> finalize() v.s. free().  Here object_unparent() is needed because the MR is
> allocated by the owner, it means only the owner knows how to free() it
> (again: MR's own free()==NULL).  Then if we want to free it properly, we
> need to detach it from the owner object first, hence object_unparent().
> 
>> instead we can just call memory_region_del_subregion() instead to keep
>> object_unparent() functional. However, strictly speaking,
> 
> I think we can do that, or we don't.  It's the same as whether we want to
> explicitly detach links for an embeded MR: if it's not explicitly done, we
> can rely on the finalize() of the MR to do so. >
> In all cases, object_unparent() will still be needed because it will do
> more things than memory_region_del_subregion(): it removes the property
> link itself.
> 
> This is also one reason why I keep thinking what I suggested might be good:
> it covers not only mr->subregions and also mr->container, it means the
> detachment has no ordering constraints on which mr is finalize()ed first
> (in this case, either the container mr or the child mr).

I referred to the following statement you made by saying we don't have 
to object_unparent(mr->container):
 > IMHO we don't finalize the container at all - what I suggested was we
 > call del_subregion() for the case where container != NULL.  Since in
 > this case both container & mr belong to the same owner, it shouldn't
 > change any refcount, but only remove the link.

I discussed calling del_subregion() is still problematic below.

Regards,
Akihiko Odaki

> 
>> memory_region_del_subregion() is also a side effect and may affect
>> operations with the container later. Such a side effect is better to be
>> avoided whenever possible.
>>
>> My point in this discussion is that the device has a control of its memory
>> region even during finalization. We shouldn't call object_unparent(),
>> memory_region_del_subregion(), or perform whatever operation on another
>> memory region; otherwise the device may see an unexpected side effect.
>>
>> Now, let's consider the correct assumption that QOM deletes properties
>> *before* instance_finalize(). Under this assumption, the statement in
>> docs/devel/memory.rst saying that calling object_unparent() in
>> instance_finalize() is fine is wrong because the memory region is already
>> finalized and we shouldn't call object_unparent() for a finalized object.
>>
>> This also means a device does not have a control of its memory regions
>> during finalization and the device will not see the side effect of calling
>> memory_region_del_subregion(). However, if a device really needs to control
>> the order of memory region finalization, it can still call object_ref() for
>> memory regions to keep them alive until instance_finalize(). The effect of
>> memory_region_del_subregion() will be  visible to the device if a device
>> does such a trick. While it is very unlikely that any device does such a
>> trick, it is still better to avoid having any assumption on devices.
>>
>> With the proposed object_ref(mr->owner == subregion->owner ?
>> OBJECT(subregion) : subregion->owner) call, we don't have to make any
>> assumption on how finalization works, which is too complicated as this
>> discussion shows. It certainly makes the use of two reference counters, but
>> it does not deviate from the current policy on their usage and it is also
>> trivial to adapt when the reference counter in memory region gets removed.
>> Considering all above, I believe my patch is the most straightforward
>> solution.
> 
> Let's see how you thinks after you read above first.  I hope we can get
> more onto the same page at least on the context.
> 
> So far I still prefer using mr->refcount as less as possible.  Now it plays
> the only role as "whether this MR is put onto an owner property list", and
> for all the rest refcounts on MR it should always routes to the owner.  I
> still think we need to be extremely cautious on further complicating this
> refcount.  It's already complicated indeed, but hopefully this model is the
> best we can trivially have right now, and so far it's clear to me, but it's
> always possible I overlooked something.
Peter Xu Aug. 30, 2024, 3:05 p.m. UTC | #13
On Fri, Aug 30, 2024 at 03:11:38PM +0900, Akihiko Odaki wrote:
> On 2024/08/30 4:48, Peter Xu wrote:
> > On Thu, Aug 29, 2024 at 01:39:36PM +0900, Akihiko Odaki wrote:
> > > > > I am calling the fact that embedded memory regions are accessible in
> > > > > instance_finalize() "live". A device can perform operations on its memory
> > > > > regions during instance_finalize() and we should be aware of that.
> > > > 
> > > > This part is true.  I suppose we should still suggest device finalize() to
> > > > properly detach MRs, and that should normally be done there.
> > > 
> > > It is better to avoid manual resource deallocation in general because it is
> > > too error-prone.
> > 
> > I had an impression that you mixed up "finalize()" and "free()" in the
> > context of our discussion.. let us clarify this first before everything
> > else below, just in case I overlook stuff..
> I am aware of that distinction. I dealt with it with patch "virtio-gpu:
> Handle resource blob commands":
> https://lore.kernel.org/r/20240822185110.1757429-12-dmitry.osipenko@collabora.com
> 
> > 
> > MR is very special as an object, in that it should have no free() hook,
> > hence by default nothing is going to be freed when mr->refcount==0.  It
> > means MRs need to be freed manually always.
> > 
> > For example:
> > 
> > (gdb) p system_memory->parent_obj->free
> > $2 = (ObjectFree *) 0x0
> > 
> > It plays perfect because the majority of QEMU device model is using MR as a
> > field (rather than a pointer) of another structure, so that's exactly what
> > we're looking for: we don't want to free() the MR as it's allocated
> > together with the owner device.  That'll be released when the owner free().
> > 
> > When dynamic allocation gets into the picture for MR, it's more complicated
> > for sure, because it means the user (like VFIOQuirk) will need to manually
> > allocate the MRs, then it requires explicit object_unparent() to detach
> > that from the device / owner when finalize().  NOTE!  object_unparent()
> > will NOT free the MR yet so far.  The MR still need to be manually freed
> > with an explicit g_free(), normally.  Again, I'd suggest you refer to the
> > VFIOQuirk code just as an example.  In that case this part is done with
> > e.g. vfio_bar_quirk_finalize().
> > 
> >          for (i = 0; i < quirk->nr_mem; i++) {
> >              object_unparent(OBJECT(&quirk->mem[i]));
> >          }
> >          g_free(quirk->mem);
> > 
> > Here quirk->mem is a pointer to an array of MR which can contain one or
> > more MRs, but the idea is the same.
> > 
> > > 
> > > I have an impression with QEMU code base that it is failing manual resource
> > > deallocation so frequently although such deallocation can be easily
> > > automated by binding resources to objects and free them when objects die by
> > > providing a function like Linux's devm_kmalloc(). Unfortunately I haven't
> > > found time to do that though.
> > 
> > AFAICT, the property list is exactly what you're saying.  IIUC normally an
> > object will be properly finalized()ed + free()ed when the parent object is
> > finalize()ed.  Here MR is just special as it bypasses all the free() part.
> > 
> > > 
> > > So my opinion here is 1) we should automate resource deallocation as much as
> > > possible but 2) we shouldn't disturb code that performs manual resource
> > > management.
> > 
> > Agree.  However note again that in whatever case cross-device MR links will
> > still require explicit detachment or it's prone to memory leak.
> 
> I am not sure what you refer to with cross-device MR links so can you
> clarify it?

I was referring to Peter Maydell's example, where the MR can be used
outside of its owner.  In that case manual operation is a must before
finalize(), as finalize() can only make sense to resolve internal links
automatically.

But now knowing that you're explicitly mentioning "deallocation" rather
than "finalize()", and you're aware of diff between deallocation
v.s. finalize(), I suppose I misunderstood what you meant, and now I'm not
sure I get what you're suggesting.

> 
> > 
> > > 
> > > instance_finalize() is for manual resource management. It is better to have
> > > less code in instance_finalize() and fortunately MemoryRegion don't require
> > > any code in instance_finalize() in most cases. If instance_finalize() still
> > > insists to call object_unparent(), we shouldn't prevent that. (I changed my
> > > mind regarding this particular case of object_unparent() however as I
> > > describe below.)
> > > 
> > > > 
> > > > > 
> > > > > object_unparent() is such an example. instance_finalize() of a device can
> > > > > call object_unparent() for a subregion and for its container. If we
> > > > > automatically finalize the container when calling object_unparent() for the
> > > > > subregion, calling object_unparent() for its container will result in the
> > > > > second finalization, which is not good.
> > > > 
> > > > IMHO we don't finalize the container at all - what I suggested was we call
> > > > del_subregion() for the case where container != NULL.  Since in this case
> > > > both container & mr belong to the same owner, it shouldn't change any
> > > > refcount, but only remove the link.
> > > > 
> > > > However I think I see what you pointed out.  I wonder why we remove all
> > > > properties now before reaching instance_finalze(): shouldn't finalize() be
> > > > allowed to access some of the properties?
> > > > 
> > > > It goes back to this commit:
> > > > 
> > > > commit 76a6e1cc7cc3ad022e7159b37b291b75bc4615bf
> > > > Author: Paolo Bonzini <pbonzini@redhat.com>
> > > > Date:   Wed Jun 11 11:58:30 2014 +0200
> > > > 
> > > >       qom: object: delete properties before calling instance_finalize
> > > >       This ensures that the children's unparent callback will still
> > > >       have a usable parent.
> > > >       Reviewed-by: Peter Crosthwaite <peter.crosthwaite@xilinx.com>
> > > >       Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> > > > 
> > > >   From this series (as the 1st patch there):
> > > > 
> > > > https://lore.kernel.org/qemu-devel/1406716032-21795-1-git-send-email-pbonzini@redhat.com/
> > > > 
> > > > I can't say I fully understand the commit yet so far.. because it seems
> > > > this patch was trying to pave way so that MR's unparent() can have a usable
> > > > parent.  However... I don't think mr implemented unparent() at all..
> > > > while it just sounds more reasonable that properties shouldn't be
> > > > auto-removed before calling instance_finalize() from my gut feeling.
> > > > 
> > > > I tried to revert 76a6e1cc7c ("qom: object: delete properties before
> > > > calling instance_finalize"), "make check" all passes here.  I am a bit
> > > > confused on where it applies, and whether we should revert it.
> > > > 
> > > > If with 76a6e1cc7cc reverted, I think your concern should go away because
> > > > then properties (including MRs) will only be detached after owner's
> > > > instance_finalize().  Again, I wished Paolo could chime in as he should
> > > > know the best.
> > > 
> > > I didn't know QOM deletes properties before instance_finalize().
> > > 
> > > I think it is a bad idea to delete properties before instance_finalize().
> > > The intention is that to keep a usable parent during unparenting, but it is
> > > inevitable that an object is in a semi-finalized state during finalization.
> > > If the order of finalization matters, it should be explicitly described in
> > > instance_finalize(). Deleting properties before instance_finalize() prevents
> > > that.
> > > 
> > > That said, I think it is too late to revert that change. "make check" works
> > > as a preliminary check, but ultimately we need manual tree-wide review,
> > > which is too costly.
> > 
> > I don't think it's too late. :)
> > 
> > IMHO if that statement is true, then QEMU will gradually become not
> > maintainable anymore with tons of such code nobody understands and nobody
> > can touch.  To me it can destine its death soon afterwards if so, sooner or
> > later.
> > 
> > OTOH, it'll be harder to justify or change indeed if some piece of code
> > stays longer in-tree.  So I agree with you even if we want to change
> > anything around this we should be with extreme cautions, and I don't think
> > we should merge anything like this too late of a release, just to give it
> > time to expose and break things.
> > 
> > > 
> > > Going back to the original topic, I had the (incorrect) assumption that QOM
> > > deletes properties *after* instance_finalize(). Under such an assumption, it
> > 
> > Let's stick with this model; so far I still think this is the right thing
> > to do, and I'm happy to know that you seem to at least agree (irrelevant of
> > whether we should move on with a change).
> > 
> > > would be unsafe to remove a subregion from its container when a subregion is
> > > gone. As you said, we don't have to object_unparent(mr->container) and
> > 
> > IMHO we need to do this. Please recheck this after reading above on
> > finalize() v.s. free().  Here object_unparent() is needed because the MR is
> > allocated by the owner, it means only the owner knows how to free() it
> > (again: MR's own free()==NULL).  Then if we want to free it properly, we
> > need to detach it from the owner object first, hence object_unparent().
> > 
> > > instead we can just call memory_region_del_subregion() instead to keep
> > > object_unparent() functional. However, strictly speaking,
> > 
> > I think we can do that, or we don't.  It's the same as whether we want to
> > explicitly detach links for an embeded MR: if it's not explicitly done, we
> > can rely on the finalize() of the MR to do so. >
> > In all cases, object_unparent() will still be needed because it will do
> > more things than memory_region_del_subregion(): it removes the property
> > link itself.
> > 
> > This is also one reason why I keep thinking what I suggested might be good:
> > it covers not only mr->subregions and also mr->container, it means the
> > detachment has no ordering constraints on which mr is finalize()ed first
> > (in this case, either the container mr or the child mr).
> 
> I referred to the following statement you made by saying we don't have to
> object_unparent(mr->container):
> > IMHO we don't finalize the container at all - what I suggested was we
> > call del_subregion() for the case where container != NULL.  Since in
> > this case both container & mr belong to the same owner, it shouldn't
> > change any refcount, but only remove the link.
> 
> I discussed calling del_subregion() is still problematic below.
> 
> Regards,
> Akihiko Odaki
> 
> > 
> > > memory_region_del_subregion() is also a side effect and may affect
> > > operations with the container later. Such a side effect is better to be
> > > avoided whenever possible.
> > > 
> > > My point in this discussion is that the device has a control of its memory
> > > region even during finalization. We shouldn't call object_unparent(),

I assume we're talking about embeded MRs, then I agree until here.

> > > memory_region_del_subregion(), or perform whatever operation on another
> > > memory region; otherwise the device may see an unexpected side effect.

I don't get it here.

We're still taking in the context of calling finalize() before property
list being detached, right?

In that case detachment of the property list should be the last thing we
do.  What unexpected side effect do we worry about?

To be explicit, IMHO the right sequence of doing it is (which remove
property later):

        static void object_finalize(void *data)
        {
                Object *obj = data;
                TypeImpl *ti = obj->class->type;

                object_deinit(obj, ti);
                object_property_del_all(obj); <------------ [1]

                g_assert(obj->ref == 0);
                g_assert(obj->parent == NULL);
                if (obj->free) {
                        obj->free(obj);
                }
        }

We're talking about MR automatically unlink both mr->container and
mr->subregions at [1] (comparing to the old world where we only auto detach
mr->subregions).  After that it calls obj->free() and memory is freed.
I don't see what can be the side effect you mentioned.

The only side effect I can think of is when the container MR finalize() is
called later than the subregion's, then the container will see empty
mr->subregions on its own, but that's exactly what we want to work out
here: we want to make sure mr finalize() work in random order.

> > > 
> > > Now, let's consider the correct assumption that QOM deletes properties
> > > *before* instance_finalize(). Under this assumption, the statement in
> > > docs/devel/memory.rst saying that calling object_unparent() in
> > > instance_finalize() is fine is wrong because the memory region is already
> > > finalized and we shouldn't call object_unparent() for a finalized object.
> > > 
> > > This also means a device does not have a control of its memory regions
> > > during finalization and the device will not see the side effect of calling
> > > memory_region_del_subregion(). However, if a device really needs to control
> > > the order of memory region finalization, it can still call object_ref() for
> > > memory regions to keep them alive until instance_finalize(). The effect of
> > > memory_region_del_subregion() will be  visible to the device if a device
> > > does such a trick. While it is very unlikely that any device does such a
> > > trick, it is still better to avoid having any assumption on devices.
> > > 
> > > With the proposed object_ref(mr->owner == subregion->owner ?
> > > OBJECT(subregion) : subregion->owner) call, we don't have to make any
> > > assumption on how finalization works, which is too complicated as this
> > > discussion shows. It certainly makes the use of two reference counters, but
> > > it does not deviate from the current policy on their usage and it is also
> > > trivial to adapt when the reference counter in memory region gets removed.
> > > Considering all above, I believe my patch is the most straightforward
> > > solution.
> > 
> > Let's see how you thinks after you read above first.  I hope we can get
> > more onto the same page at least on the context.
> > 
> > So far I still prefer using mr->refcount as less as possible.  Now it plays
> > the only role as "whether this MR is put onto an owner property list", and
> > for all the rest refcounts on MR it should always routes to the owner.  I
> > still think we need to be extremely cautious on further complicating this
> > refcount.  It's already complicated indeed, but hopefully this model is the
> > best we can trivially have right now, and so far it's clear to me, but it's
> > always possible I overlooked something.
>
Akihiko Odaki Aug. 31, 2024, 4:03 a.m. UTC | #14
On 2024/08/31 0:05, Peter Xu wrote:
> On Fri, Aug 30, 2024 at 03:11:38PM +0900, Akihiko Odaki wrote:
>> On 2024/08/30 4:48, Peter Xu wrote:
>>> On Thu, Aug 29, 2024 at 01:39:36PM +0900, Akihiko Odaki wrote:
>>>>>> I am calling the fact that embedded memory regions are accessible in
>>>>>> instance_finalize() "live". A device can perform operations on its memory
>>>>>> regions during instance_finalize() and we should be aware of that.
>>>>>
>>>>> This part is true.  I suppose we should still suggest device finalize() to
>>>>> properly detach MRs, and that should normally be done there.
>>>>
>>>> It is better to avoid manual resource deallocation in general because it is
>>>> too error-prone.
>>>
>>> I had an impression that you mixed up "finalize()" and "free()" in the
>>> context of our discussion.. let us clarify this first before everything
>>> else below, just in case I overlook stuff..
>> I am aware of that distinction. I dealt with it with patch "virtio-gpu:
>> Handle resource blob commands":
>> https://lore.kernel.org/r/20240822185110.1757429-12-dmitry.osipenko@collabora.com
>>
>>>
>>> MR is very special as an object, in that it should have no free() hook,
>>> hence by default nothing is going to be freed when mr->refcount==0.  It
>>> means MRs need to be freed manually always.
>>>
>>> For example:
>>>
>>> (gdb) p system_memory->parent_obj->free
>>> $2 = (ObjectFree *) 0x0
>>>
>>> It plays perfect because the majority of QEMU device model is using MR as a
>>> field (rather than a pointer) of another structure, so that's exactly what
>>> we're looking for: we don't want to free() the MR as it's allocated
>>> together with the owner device.  That'll be released when the owner free().
>>>
>>> When dynamic allocation gets into the picture for MR, it's more complicated
>>> for sure, because it means the user (like VFIOQuirk) will need to manually
>>> allocate the MRs, then it requires explicit object_unparent() to detach
>>> that from the device / owner when finalize().  NOTE!  object_unparent()
>>> will NOT free the MR yet so far.  The MR still need to be manually freed
>>> with an explicit g_free(), normally.  Again, I'd suggest you refer to the
>>> VFIOQuirk code just as an example.  In that case this part is done with
>>> e.g. vfio_bar_quirk_finalize().
>>>
>>>           for (i = 0; i < quirk->nr_mem; i++) {
>>>               object_unparent(OBJECT(&quirk->mem[i]));
>>>           }
>>>           g_free(quirk->mem);
>>>
>>> Here quirk->mem is a pointer to an array of MR which can contain one or
>>> more MRs, but the idea is the same.
>>>
>>>>
>>>> I have an impression with QEMU code base that it is failing manual resource
>>>> deallocation so frequently although such deallocation can be easily
>>>> automated by binding resources to objects and free them when objects die by
>>>> providing a function like Linux's devm_kmalloc(). Unfortunately I haven't
>>>> found time to do that though.
>>>
>>> AFAICT, the property list is exactly what you're saying.  IIUC normally an
>>> object will be properly finalized()ed + free()ed when the parent object is
>>> finalize()ed.  Here MR is just special as it bypasses all the free() part.
>>>
>>>>
>>>> So my opinion here is 1) we should automate resource deallocation as much as
>>>> possible but 2) we shouldn't disturb code that performs manual resource
>>>> management.
>>>
>>> Agree.  However note again that in whatever case cross-device MR links will
>>> still require explicit detachment or it's prone to memory leak.
>>
>> I am not sure what you refer to with cross-device MR links so can you
>> clarify it?
> 
> I was referring to Peter Maydell's example, where the MR can be used
> outside of its owner.  In that case manual operation is a must before
> finalize(), as finalize() can only make sense to resolve internal links
> automatically.
> 
> But now knowing that you're explicitly mentioning "deallocation" rather
> than "finalize()", and you're aware of diff between deallocation
> v.s. finalize(), I suppose I misunderstood what you meant, and now I'm not
> sure I get what you're suggesting.

I referred to both of finalize() and free() with "resource 
deallocation"; actually I referred to any resource that requires some 
operation after its use finishes.

Ideally, any resource should be automatically released. It should be 
also possible to manually release resource to enforce ordering.

> 
>>
>>>
>>>>
>>>> instance_finalize() is for manual resource management. It is better to have
>>>> less code in instance_finalize() and fortunately MemoryRegion don't require
>>>> any code in instance_finalize() in most cases. If instance_finalize() still
>>>> insists to call object_unparent(), we shouldn't prevent that. (I changed my
>>>> mind regarding this particular case of object_unparent() however as I
>>>> describe below.)
>>>>
>>>>>
>>>>>>
>>>>>> object_unparent() is such an example. instance_finalize() of a device can
>>>>>> call object_unparent() for a subregion and for its container. If we
>>>>>> automatically finalize the container when calling object_unparent() for the
>>>>>> subregion, calling object_unparent() for its container will result in the
>>>>>> second finalization, which is not good.
>>>>>
>>>>> IMHO we don't finalize the container at all - what I suggested was we call
>>>>> del_subregion() for the case where container != NULL.  Since in this case
>>>>> both container & mr belong to the same owner, it shouldn't change any
>>>>> refcount, but only remove the link.
>>>>>
>>>>> However I think I see what you pointed out.  I wonder why we remove all
>>>>> properties now before reaching instance_finalze(): shouldn't finalize() be
>>>>> allowed to access some of the properties?
>>>>>
>>>>> It goes back to this commit:
>>>>>
>>>>> commit 76a6e1cc7cc3ad022e7159b37b291b75bc4615bf
>>>>> Author: Paolo Bonzini <pbonzini@redhat.com>
>>>>> Date:   Wed Jun 11 11:58:30 2014 +0200
>>>>>
>>>>>        qom: object: delete properties before calling instance_finalize
>>>>>        This ensures that the children's unparent callback will still
>>>>>        have a usable parent.
>>>>>        Reviewed-by: Peter Crosthwaite <peter.crosthwaite@xilinx.com>
>>>>>        Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
>>>>>
>>>>>    From this series (as the 1st patch there):
>>>>>
>>>>> https://lore.kernel.org/qemu-devel/1406716032-21795-1-git-send-email-pbonzini@redhat.com/
>>>>>
>>>>> I can't say I fully understand the commit yet so far.. because it seems
>>>>> this patch was trying to pave way so that MR's unparent() can have a usable
>>>>> parent.  However... I don't think mr implemented unparent() at all..
>>>>> while it just sounds more reasonable that properties shouldn't be
>>>>> auto-removed before calling instance_finalize() from my gut feeling.
>>>>>
>>>>> I tried to revert 76a6e1cc7c ("qom: object: delete properties before
>>>>> calling instance_finalize"), "make check" all passes here.  I am a bit
>>>>> confused on where it applies, and whether we should revert it.
>>>>>
>>>>> If with 76a6e1cc7cc reverted, I think your concern should go away because
>>>>> then properties (including MRs) will only be detached after owner's
>>>>> instance_finalize().  Again, I wished Paolo could chime in as he should
>>>>> know the best.
>>>>
>>>> I didn't know QOM deletes properties before instance_finalize().
>>>>
>>>> I think it is a bad idea to delete properties before instance_finalize().
>>>> The intention is that to keep a usable parent during unparenting, but it is
>>>> inevitable that an object is in a semi-finalized state during finalization.
>>>> If the order of finalization matters, it should be explicitly described in
>>>> instance_finalize(). Deleting properties before instance_finalize() prevents
>>>> that.
>>>>
>>>> That said, I think it is too late to revert that change. "make check" works
>>>> as a preliminary check, but ultimately we need manual tree-wide review,
>>>> which is too costly.
>>>
>>> I don't think it's too late. :)
>>>
>>> IMHO if that statement is true, then QEMU will gradually become not
>>> maintainable anymore with tons of such code nobody understands and nobody
>>> can touch.  To me it can destine its death soon afterwards if so, sooner or
>>> later.
>>>
>>> OTOH, it'll be harder to justify or change indeed if some piece of code
>>> stays longer in-tree.  So I agree with you even if we want to change
>>> anything around this we should be with extreme cautions, and I don't think
>>> we should merge anything like this too late of a release, just to give it
>>> time to expose and break things.
>>>
>>>>
>>>> Going back to the original topic, I had the (incorrect) assumption that QOM
>>>> deletes properties *after* instance_finalize(). Under such an assumption, it
>>>
>>> Let's stick with this model; so far I still think this is the right thing
>>> to do, and I'm happy to know that you seem to at least agree (irrelevant of
>>> whether we should move on with a change).
>>>
>>>> would be unsafe to remove a subregion from its container when a subregion is
>>>> gone. As you said, we don't have to object_unparent(mr->container) and
>>>
>>> IMHO we need to do this. Please recheck this after reading above on
>>> finalize() v.s. free().  Here object_unparent() is needed because the MR is
>>> allocated by the owner, it means only the owner knows how to free() it
>>> (again: MR's own free()==NULL).  Then if we want to free it properly, we
>>> need to detach it from the owner object first, hence object_unparent().
>>>
>>>> instead we can just call memory_region_del_subregion() instead to keep
>>>> object_unparent() functional. However, strictly speaking,
>>>
>>> I think we can do that, or we don't.  It's the same as whether we want to
>>> explicitly detach links for an embeded MR: if it's not explicitly done, we
>>> can rely on the finalize() of the MR to do so. >
>>> In all cases, object_unparent() will still be needed because it will do
>>> more things than memory_region_del_subregion(): it removes the property
>>> link itself.
>>>
>>> This is also one reason why I keep thinking what I suggested might be good:
>>> it covers not only mr->subregions and also mr->container, it means the
>>> detachment has no ordering constraints on which mr is finalize()ed first
>>> (in this case, either the container mr or the child mr).
>>
>> I referred to the following statement you made by saying we don't have to
>> object_unparent(mr->container):
>>> IMHO we don't finalize the container at all - what I suggested was we
>>> call del_subregion() for the case where container != NULL.  Since in
>>> this case both container & mr belong to the same owner, it shouldn't
>>> change any refcount, but only remove the link.
>>
>> I discussed calling del_subregion() is still problematic below.
>>
>> Regards,
>> Akihiko Odaki
>>
>>>
>>>> memory_region_del_subregion() is also a side effect and may affect
>>>> operations with the container later. Such a side effect is better to be
>>>> avoided whenever possible.
>>>>
>>>> My point in this discussion is that the device has a control of its memory
>>>> region even during finalization. We shouldn't call object_unparent(),
> 
> I assume we're talking about embeded MRs, then I agree until here.
> 
>>>> memory_region_del_subregion(), or perform whatever operation on another
>>>> memory region; otherwise the device may see an unexpected side effect.
> 
> I don't get it here.
> 
> We're still taking in the context of calling finalize() before property
> list being detached, right?
> 
> In that case detachment of the property list should be the last thing we
> do.  What unexpected side effect do we worry about?
> 
> To be explicit, IMHO the right sequence of doing it is (which remove
> property later):
> 
>          static void object_finalize(void *data)
>          {
>                  Object *obj = data;
>                  TypeImpl *ti = obj->class->type;
> 
>                  object_deinit(obj, ti);
>                  object_property_del_all(obj); <------------ [1]
> 
>                  g_assert(obj->ref == 0);
>                  g_assert(obj->parent == NULL);
>                  if (obj->free) {
>                          obj->free(obj);
>                  }
>          }
> 
> We're talking about MR automatically unlink both mr->container and
> mr->subregions at [1] (comparing to the old world where we only auto detach
> mr->subregions).  After that it calls obj->free() and memory is freed.
> I don't see what can be the side effect you mentioned.
> 
> The only side effect I can think of is when the container MR finalize() is
> called later than the subregion's, then the container will see empty
> mr->subregions on its own, but that's exactly what we want to work out
> here: we want to make sure mr finalize() work in random order.

We have full control when the scenario where the MR is automatically 
unparented so it is not really a problem. It is safe to automatically 
call memory_region_del_subregion() in such a case.

I am talking about side effects caused by automatically calling 
memory_region_del_subregion() when instance_finalize() of the owner 
calls object_unparent() for the subregion. This mutates the state of the 
container although object_unparent() is called for the subregion. This 
is a side effect a device does not expect.

So there is a tradeoff. There is no such side effect in my patch. My 
patch strictly keeps the direction of reference (a container refers to 
its subregion) so a subregion does not have to mutate the state of its 
container. However it makes use of the reference counter.

What you propose on the other hand does not use the reference counter, 
but it goes against the direction of reference and changes the state of 
container when the subregion gets unparented, which can be surprising 
for the device.

I argue that my patch makes the good tradeoff because the logic is so 
simple that it can be trivially adapted even if the reference counter 
gets removed. On the other hand, having a side effect when finalizing 
subregions makes an assumption on devices, and there are plenty of 
devices in QEMU. Comparing the numbers, it makes more sense to maintain 
one simple logic of reference tracking.

Regards,
Akihiko Odaki
diff mbox series

Patch

diff --git a/system/memory.c b/system/memory.c
index 5e6eb459d5de..e4d3e9d1f427 100644
--- a/system/memory.c
+++ b/system/memory.c
@@ -2612,7 +2612,9 @@  static void memory_region_update_container_subregions(MemoryRegion *subregion)
 
     memory_region_transaction_begin();
 
-    memory_region_ref(subregion);
+    object_ref(mr->owner == subregion->owner ?
+               OBJECT(subregion) : subregion->owner);
+
     QTAILQ_FOREACH(other, &mr->subregions, subregions_link) {
         if (subregion->priority >= other->priority) {
             QTAILQ_INSERT_BEFORE(other, subregion, subregions_link);
@@ -2670,7 +2672,9 @@  void memory_region_del_subregion(MemoryRegion *mr,
         assert(alias->mapped_via_alias >= 0);
     }
     QTAILQ_REMOVE(&mr->subregions, subregion, subregions_link);
-    memory_region_unref(subregion);
+    object_unref(mr->owner == subregion->owner ?
+                 OBJECT(subregion) : subregion->owner);
+
     memory_region_update_pending |= mr->enabled && subregion->enabled;
     memory_region_transaction_commit();
 }