Message ID | 20200309102304.1251-7-paul@xen.org (mailing list archive)
---|---
State | Superseded
Series | remove one more shared xenheap page: shared_info
On 09.03.2020 11:23, paul@xen.org wrote:
> From: Paul Durrant <pdurrant@amazon.com>
>
> Currently shared_info is a shared xenheap page, but shared xenheap pages
> complicate future plans for live-update of Xen, so it is desirable,
> where possible, not to use them [1]. This patch therefore converts
> shared_info into a PGC_extra domheap page. This does entail freeing
> shared_info during domain_relinquish_resources() rather than
> domain_destroy(), so care is needed to avoid de-referencing a NULL
> shared_info pointer; hence some extra checks of 'is_dying' are needed.
>
> Also, because shared_info will no longer be a xenheap page, this patch
> adds an extra dump function to make sure the shared_info MFN is
> included in the output of dump_pageframe_info().

I've voiced my objection to such a model, and hence it'll take
another REST maintainer to approve of this despite my arguments
against it. (And of course, just to re-record this here, the
APIC access page, converted by ea3daabff5f2, will want to get a
dumping function added then, too.)

I wonder whether a domain's "extra" pages couldn't be put in a
separate, singly-linked list, using the union the next_shadow
field is in as the linking field. None of the other union
members look to be applicable to "extra" pages.

> +void dump_shared_info(struct domain *d)
> +{
> +    domain_lock(d);
> +
> +    if ( d->shared_info.virt )
> +        printk("Shared Info: %"PRI_mfn"\n", mfn_x(d->shared_info.mfn));

count_info and type_info should be logged imo, just like
dump_pageframe_info() does. On the whole I think the actual
dumping might better be uniform, and these functions would
then only exist to "know" which page(s) to dump.

> --- a/xen/common/time.c
> +++ b/xen/common/time.c
> @@ -99,6 +99,18 @@ void update_domain_wallclock_time(struct domain *d)
>      uint32_t *wc_version;
>      uint64_t sec;
>
> +    if ( d != current->domain )
> +    {
> +        /*
> +         * We need to check is_dying here as, if it is set, the
> +         * shared_info may have been freed. To do this safely we need
> +         * to hold the domain lock.
> +         */
> +        domain_lock(d);
> +        if ( d->is_dying )
> +            goto unlock;
> +    }

This shouldn't happen very often, but it's a pretty heavy lock.
It's a fundamental aspect of xenheap pages that their disposal
can be delayed until almost the last moment of guest cleanup. I
continue to think it's not really a good idea to have special
purpose allocation (and mapping) accompanied by these pages
getting taken care of by the generic relinquish-resources logic
here (from a more general pov such is of course often nice to
have). Instead of freeing these pages there, couldn't they just
be taken off d->page_list, with the unmapping and freeing left
as it was?

Jan
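A minimal sketch of the singly-linked "extra" list suggested above, which would also make the count_info/type_info dumping uniform. The next_extra member (overlaying the next_shadow slot), the extra_page_list head and the helper names are illustrative assumptions, not code from the Xen tree:

/* Hypothetical singly-linked list of a domain's PGC_extra pages. */
static void add_extra_page(struct domain *d, struct page_info *pg)
{
    ASSERT(pg->count_info & PGC_extra);
    pg->next_extra = d->extra_page_list;  /* reuses the next_shadow slot */
    d->extra_page_list = pg;
}

/* Dump all extra pages uniformly, including count_info and type_info. */
static void dump_extra_pages(const struct domain *d)
{
    const struct page_info *pg;

    for ( pg = d->extra_page_list; pg; pg = pg->next_extra )
        printk("  MFN %"PRI_mfn" c=%#lx t=%#lx\n",
               mfn_x(page_to_mfn(pg)), pg->count_info,
               pg->u.inuse.type_info);
}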
> -----Original Message-----
> From: Jan Beulich <jbeulich@suse.com>
> Sent: 09 March 2020 15:56
> To: paul@xen.org
> Cc: xen-devel@lists.xenproject.org; Paul Durrant <pdurrant@amazon.com>; Stefano Stabellini
> <sstabellini@kernel.org>; Julien Grall <julien@xen.org>; Volodymyr Babchuk
> <Volodymyr_Babchuk@epam.com>; Andrew Cooper <andrew.cooper3@citrix.com>; George Dunlap
> <george.dunlap@citrix.com>; Ian Jackson <ian.jackson@eu.citrix.com>; Konrad Rzeszutek Wilk
> <konrad.wilk@oracle.com>; Wei Liu <wl@xen.org>
> Subject: Re: [PATCH v5 6/6] domain: use PGC_extra domheap page for shared_info
>
> On 09.03.2020 11:23, paul@xen.org wrote:
> > From: Paul Durrant <pdurrant@amazon.com>
> >
> > Currently shared_info is a shared xenheap page, but shared xenheap pages
> > complicate future plans for live-update of Xen, so it is desirable,
> > where possible, not to use them [1]. This patch therefore converts
> > shared_info into a PGC_extra domheap page. This does entail freeing
> > shared_info during domain_relinquish_resources() rather than
> > domain_destroy(), so care is needed to avoid de-referencing a NULL
> > shared_info pointer; hence some extra checks of 'is_dying' are needed.
> >
> > Also, because shared_info will no longer be a xenheap page, this patch
> > adds an extra dump function to make sure the shared_info MFN is
> > included in the output of dump_pageframe_info().
>
> I've voiced my objection to such a model, and hence it'll take
> another REST maintainer to approve of this despite my arguments
> against it. (And of course, just to re-record this here, the
> APIC access page, converted by ea3daabff5f2, will want to get a
> dumping function added then, too.)
>
> I wonder whether a domain's "extra" pages couldn't be put in a
> separate, singly-linked list, using the union the next_shadow
> field is in as the linking field. None of the other union
> members look to be applicable to "extra" pages.
>
> > +void dump_shared_info(struct domain *d)
> > +{
> > +    domain_lock(d);
> > +
> > +    if ( d->shared_info.virt )
> > +        printk("Shared Info: %"PRI_mfn"\n", mfn_x(d->shared_info.mfn));
>
> count_info and type_info should be logged imo, just like
> dump_pageframe_info() does. On the whole I think the actual
> dumping might better be uniform, and these functions would
> then only exist to "know" which page(s) to dump.

I think the extra page list should cover these issues.

> > --- a/xen/common/time.c
> > +++ b/xen/common/time.c
> > @@ -99,6 +99,18 @@ void update_domain_wallclock_time(struct domain *d)
> >      uint32_t *wc_version;
> >      uint64_t sec;
> >
> > +    if ( d != current->domain )
> > +    {
> > +        /*
> > +         * We need to check is_dying here as, if it is set, the
> > +         * shared_info may have been freed. To do this safely we need
> > +         * to hold the domain lock.
> > +         */
> > +        domain_lock(d);
> > +        if ( d->is_dying )
> > +            goto unlock;
> > +    }
>
> This shouldn't happen very often, but it's a pretty heavy lock.
> It's a fundamental aspect of xenheap pages that their disposal
> can be delayed until almost the last moment of guest cleanup. I
> continue to think it's not really a good idea to have special
> purpose allocation (and mapping) accompanied by these pages
> getting taken care of by the generic relinquish-resources logic
> here (from a more general pov such is of course often nice to
> have). Instead of freeing these pages there, couldn't they just
> be taken off d->page_list, with the unmapping and freeing left
> as it was?

I don't think this can be achieved without being able to de-assign
pages, and I don't really want to have to invent new logic to do
that (basically re-implementing what happens to xenheap pages). I
really don't think it is that bad to deal with shared info and
grant table pages as domheap pages. Yes, we have to be careful,
but in this case the lock is only taken in a toolstack update of
the wallclock and, apart from start-of-day access, I think this is
limited to XEN_DOMCTL_settimeoffset and XEN_DOMCTL_setvcpucontext,
neither of which I believe is particularly performance-critical.

  Paul

> Jan
On 10.03.2020 18:33, Paul Durrant wrote:
>> -----Original Message-----
>> From: Jan Beulich <jbeulich@suse.com>
>> Sent: 09 March 2020 15:56
>>
>> On 09.03.2020 11:23, paul@xen.org wrote:
>>> --- a/xen/common/time.c
>>> +++ b/xen/common/time.c
>>> @@ -99,6 +99,18 @@ void update_domain_wallclock_time(struct domain *d)
>>>      uint32_t *wc_version;
>>>      uint64_t sec;
>>>
>>> +    if ( d != current->domain )
>>> +    {
>>> +        /*
>>> +         * We need to check is_dying here as, if it is set, the
>>> +         * shared_info may have been freed. To do this safely we need
>>> +         * to hold the domain lock.
>>> +         */
>>> +        domain_lock(d);
>>> +        if ( d->is_dying )
>>> +            goto unlock;
>>> +    }
>>
>> This shouldn't happen very often, but it's a pretty heavy lock.
>> It's a fundamental aspect of xenheap pages that their disposal
>> can be delayed until almost the last moment of guest cleanup. I
>> continue to think it's not really a good idea to have special
>> purpose allocation (and mapping) accompanied by these pages
>> getting taken care of by the generic relinquish-resources logic
>> here (from a more general pov such is of course often nice to
>> have). Instead of freeing these pages there, couldn't they just
>> be taken off d->page_list, with the unmapping and freeing left
>> as it was?
>
> I don't think this can be achieved without being able to de-assign
> pages, and I don't really want to have to invent new logic to do
> that (basically re-implementing what happens to xenheap pages).

Where's the connection to being able to de-assign pages here?
There'll be one when the same conversion is to be done for
gnttab code, but I don't see it here - the shared info page is
never to be de-assigned. As to gnttab code, I think it was
noted before that we may be better off not "unpopulating"
status pages when switching back from v2 to v1. At which point
the de-assignment need would go away there, too.

> I really don't think it is that bad to deal with shared info
> and grant table pages as domheap pages. Yes, we have to be
> careful, but in this case the lock is only taken in a
> toolstack update of the wallclock and, apart from start-of-day
> access, I think this is limited to XEN_DOMCTL_settimeoffset
> and XEN_DOMCTL_setvcpucontext, neither of which I believe is
> particularly performance-critical.

It's not, I agree (and hence I had started my previous reply
also with "This shouldn't happen very often"). How all of this
is going to look with the new extra_page_list I'll have to see
anyway. But for now I remain unconvinced of the want / need to
de-allocate the shared info page early.

Jan
> -----Original Message-----
> From: Jan Beulich <jbeulich@suse.com>
> Sent: 11 March 2020 09:17
> Subject: Re: [PATCH v5 6/6] domain: use PGC_extra domheap page for shared_info
>
> On 10.03.2020 18:33, Paul Durrant wrote:
> >> -----Original Message-----
> >> From: Jan Beulich <jbeulich@suse.com>
> >> Sent: 09 March 2020 15:56
> >>
> >> On 09.03.2020 11:23, paul@xen.org wrote:
> >>> --- a/xen/common/time.c
> >>> +++ b/xen/common/time.c
> >>> @@ -99,6 +99,18 @@ void update_domain_wallclock_time(struct domain *d)
> >>>      uint32_t *wc_version;
> >>>      uint64_t sec;
> >>>
> >>> +    if ( d != current->domain )
> >>> +    {
> >>> +        /*
> >>> +         * We need to check is_dying here as, if it is set, the
> >>> +         * shared_info may have been freed. To do this safely we need
> >>> +         * to hold the domain lock.
> >>> +         */
> >>> +        domain_lock(d);
> >>> +        if ( d->is_dying )
> >>> +            goto unlock;
> >>> +    }
> >>
> >> This shouldn't happen very often, but it's a pretty heavy lock.
> >> It's a fundamental aspect of xenheap pages that their disposal
> >> can be delayed until almost the last moment of guest cleanup. I
> >> continue to think it's not really a good idea to have special
> >> purpose allocation (and mapping) accompanied by these pages
> >> getting taken care of by the generic relinquish-resources logic
> >> here (from a more general pov such is of course often nice to
> >> have). Instead of freeing these pages there, couldn't they just
> >> be taken off d->page_list, with the unmapping and freeing left
> >> as it was?
> >
> > I don't think this can be achieved without being able to de-assign
> > pages, and I don't really want to have to invent new logic to do
> > that (basically re-implementing what happens to xenheap pages).
>
> Where's the connection to being able to de-assign pages here?
> There'll be one when the same conversion is to be done for
> gnttab code, but I don't see it here - the shared info page is
> never to be de-assigned. As to gnttab code, I think it was
> noted before that we may be better off not "unpopulating"
> status pages when switching back from v2 to v1. At which point
> the de-assignment need would go away there, too.

Ok, maybe I'm misunderstanding something then. We need to call
free_domheap_pages() on all pages assigned to a domain so that the
domain references get dropped. The xenpage ref is dropped when
d->xenheap_pages == 0. The domheap ref is dropped when
domain_adjust_tot_pages() returns zero. (This is what I meant by
de-assigning... but that was probably a poor choice of words.)
So, because domain_adjust_tot_pages() returns d->tot_pages (which
includes the extra_pages count), it won't fall to zero until the
last put_page() on any PGC_extra page. So how is it possible to
free shared_info in domain destroy? We'll never get that far,
because the domheap ref will never get dropped. I guess this could
be fixed by having domain_adjust_tot_pages() return the same value
as domain_tot_pages() (i.e. tot_pages - extra_pages). Is that what
you're suggesting?

> > I really don't think it is that bad to deal with shared info
> > and grant table pages as domheap pages. Yes, we have to be
> > careful, but in this case the lock is only taken in a
> > toolstack update of the wallclock and, apart from start-of-day
> > access, I think this is limited to XEN_DOMCTL_settimeoffset
> > and XEN_DOMCTL_setvcpucontext, neither of which I believe is
> > particularly performance-critical.
>
> It's not, I agree (and hence I had started my previous reply
> also with "This shouldn't happen very often"). How all of this
> is going to look with the new extra_page_list I'll have to see
> anyway. But for now I remain unconvinced of the want / need to
> de-allocate the shared info page early.

Well, hopefully I've explained above why that is currently
necessary if it becomes a domheap page.

  Paul
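Paul's accounting argument, rendered as a simplified sketch (the real domain_adjust_tot_pages() also handles outstanding claims; this reduced version only shows why the final domain reference would never be dropped):

/*
 * Illustrative only: tot_pages still counts PGC_extra pages, so the
 * value returned here cannot reach zero - and free_domheap_pages()
 * cannot drop the final domain reference - until the extra pages
 * themselves have been freed.
 */
unsigned long domain_adjust_tot_pages(struct domain *d, long pages)
{
    ASSERT(spin_is_locked(&d->page_alloc_lock));

    d->tot_pages += pages;

    /*
     * Paul's suggested alternative at the end of the mail above:
     * return d->tot_pages - d->extra_pages; i.e. what
     * domain_tot_pages() reports.
     */
    return d->tot_pages;
}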
On 11.03.2020 16:28, Paul Durrant wrote:
>> -----Original Message-----
>> From: Jan Beulich <jbeulich@suse.com>
>> Sent: 11 March 2020 09:17
>> Subject: Re: [PATCH v5 6/6] domain: use PGC_extra domheap page for shared_info
>>
>> On 10.03.2020 18:33, Paul Durrant wrote:
>>>> -----Original Message-----
>>>> From: Jan Beulich <jbeulich@suse.com>
>>>> Sent: 09 March 2020 15:56
>>>>
>>>> On 09.03.2020 11:23, paul@xen.org wrote:
>>>>> --- a/xen/common/time.c
>>>>> +++ b/xen/common/time.c
>>>>> @@ -99,6 +99,18 @@ void update_domain_wallclock_time(struct domain *d)
>>>>>      uint32_t *wc_version;
>>>>>      uint64_t sec;
>>>>>
>>>>> +    if ( d != current->domain )
>>>>> +    {
>>>>> +        /*
>>>>> +         * We need to check is_dying here as, if it is set, the
>>>>> +         * shared_info may have been freed. To do this safely we need
>>>>> +         * to hold the domain lock.
>>>>> +         */
>>>>> +        domain_lock(d);
>>>>> +        if ( d->is_dying )
>>>>> +            goto unlock;
>>>>> +    }
>>>>
>>>> This shouldn't happen very often, but it's a pretty heavy lock.
>>>> It's a fundamental aspect of xenheap pages that their disposal
>>>> can be delayed until almost the last moment of guest cleanup. I
>>>> continue to think it's not really a good idea to have special
>>>> purpose allocation (and mapping) accompanied by these pages
>>>> getting taken care of by the generic relinquish-resources logic
>>>> here (from a more general pov such is of course often nice to
>>>> have). Instead of freeing these pages there, couldn't they just
>>>> be taken off d->page_list, with the unmapping and freeing left
>>>> as it was?
>>>
>>> I don't think this can be achieved without being able to de-assign
>>> pages, and I don't really want to have to invent new logic to do
>>> that (basically re-implementing what happens to xenheap pages).
>>
>> Where's the connection to being able to de-assign pages here?
>> There'll be one when the same conversion is to be done for
>> gnttab code, but I don't see it here - the shared info page is
>> never to be de-assigned. As to gnttab code, I think it was
>> noted before that we may be better off not "unpopulating"
>> status pages when switching back from v2 to v1. At which point
>> the de-assignment need would go away there, too.
>
> Ok, maybe I'm misunderstanding something then. We need to call
> free_domheap_pages() on all pages assigned to a domain so that the
> domain references get dropped. The xenpage ref is dropped when
> d->xenheap_pages == 0. The domheap ref is dropped when
> domain_adjust_tot_pages() returns zero. (This is what I meant by
> de-assigning... but that was probably a poor choice of words.)
> So, because domain_adjust_tot_pages() returns d->tot_pages (which
> includes the extra_pages count), it won't fall to zero until the
> last put_page() on any PGC_extra page. So how is it possible to
> free shared_info in domain destroy? We'll never get that far,
> because the domheap ref will never get dropped.

Well, now that these pages sit on a separate list, it would look
even less problematic than before to me to also give them special
treatment here: You wouldn't even have to take them off the list
anymore, but just call domain_adjust_tot_pages() with
-d->extra_pages (and suitably deal with the return value; perhaps
for consistency then followed by also zeroing d->extra_pages, so
overall accounting still looks correct). Actually taking these
pages off the list could (for dumping purposes) then be done
alongside their actual freeing. Such a transition would apparently
imply clearing PGC_extra alongside the new
domain_adjust_tot_pages() call.

I realize though that the end result won't be much different from
the current PGC_xen_heap handling (at least as far as domain
cleanup goes, but after all that's what I'm in fact trying to
convince you of), so the question would be whether the whole
transition then is worth it. Without having seen at least a sketch
of the LU code that is affected by the current behavior, it
remains hard for me to tell what issues might remain, despite your
and David's explanations.

> I guess this could be fixed by having domain_adjust_tot_pages()
> return the same value as domain_tot_pages() (i.e.
> tot_pages - extra_pages). Is that what you're suggesting?

That's an option, too, but imo less desirable.

Jan
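A rough rendering of the special treatment Jan outlines above, assuming the per-domain extra_pages count and extra_page_list introduced in v6 of the series; this is a sketch of the idea, not actual tree code:

static void relinquish_extra_pages(struct domain *d)
{
    struct page_info *pg;

    spin_lock(&d->page_alloc_lock);

    /*
     * Take the extra pages out of the tot_pages accounting so the
     * "last page freed" check can fire; the return value would need
     * dealing with as for any other caller.
     */
    domain_adjust_tot_pages(d, -(long)d->extra_pages);
    d->extra_pages = 0;

    /* Clear PGC_extra; the pages stay listed until actually freed. */
    page_list_for_each ( pg, &d->extra_page_list )
        pg->count_info &= ~PGC_extra;

    spin_unlock(&d->page_alloc_lock);
}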
> -----Original Message-----
> From: Jan Beulich <jbeulich@suse.com>
> Sent: 12 March 2020 08:34
> Subject: RE: [EXTERNAL][PATCH v5 6/6] domain: use PGC_extra domheap page for shared_info
>
> On 11.03.2020 16:28, Paul Durrant wrote:
> >> -----Original Message-----
> >> From: Jan Beulich <jbeulich@suse.com>
> >> Sent: 11 March 2020 09:17
> >>
> >> On 10.03.2020 18:33, Paul Durrant wrote:
> >>>> -----Original Message-----
> >>>> From: Jan Beulich <jbeulich@suse.com>
> >>>> Sent: 09 March 2020 15:56
> >>>>
> >>>> On 09.03.2020 11:23, paul@xen.org wrote:
> >>>>> --- a/xen/common/time.c
> >>>>> +++ b/xen/common/time.c
> >>>>> @@ -99,6 +99,18 @@ void update_domain_wallclock_time(struct domain *d)
> >>>>>      uint32_t *wc_version;
> >>>>>      uint64_t sec;
> >>>>>
> >>>>> +    if ( d != current->domain )
> >>>>> +    {
> >>>>> +        /*
> >>>>> +         * We need to check is_dying here as, if it is set, the
> >>>>> +         * shared_info may have been freed. To do this safely we need
> >>>>> +         * to hold the domain lock.
> >>>>> +         */
> >>>>> +        domain_lock(d);
> >>>>> +        if ( d->is_dying )
> >>>>> +            goto unlock;
> >>>>> +    }
> >>>>
> >>>> This shouldn't happen very often, but it's a pretty heavy lock.
> >>>> It's a fundamental aspect of xenheap pages that their disposal
> >>>> can be delayed until almost the last moment of guest cleanup. I
> >>>> continue to think it's not really a good idea to have special
> >>>> purpose allocation (and mapping) accompanied by these pages
> >>>> getting taken care of by the generic relinquish-resources logic
> >>>> here (from a more general pov such is of course often nice to
> >>>> have). Instead of freeing these pages there, couldn't they just
> >>>> be taken off d->page_list, with the unmapping and freeing left
> >>>> as it was?
> >>>
> >>> I don't think this can be achieved without being able to de-assign
> >>> pages, and I don't really want to have to invent new logic to do
> >>> that (basically re-implementing what happens to xenheap pages).
> >>
> >> Where's the connection to being able to de-assign pages here?
> >> There'll be one when the same conversion is to be done for
> >> gnttab code, but I don't see it here - the shared info page is
> >> never to be de-assigned. As to gnttab code, I think it was
> >> noted before that we may be better off not "unpopulating"
> >> status pages when switching back from v2 to v1. At which point
> >> the de-assignment need would go away there, too.
> >
> > Ok, maybe I'm misunderstanding something then. We need to call
> > free_domheap_pages() on all pages assigned to a domain so that the
> > domain references get dropped. The xenpage ref is dropped when
> > d->xenheap_pages == 0. The domheap ref is dropped when
> > domain_adjust_tot_pages() returns zero. (This is what I meant by
> > de-assigning... but that was probably a poor choice of words.)
> > So, because domain_adjust_tot_pages() returns d->tot_pages (which
> > includes the extra_pages count), it won't fall to zero until the
> > last put_page() on any PGC_extra page. So how is it possible to
> > free shared_info in domain destroy? We'll never get that far,
> > because the domheap ref will never get dropped.
>
> Well, now that these pages sit on a separate list, it would look
> even less problematic than before to me to also give them special
> treatment here: You wouldn't even have to take them off the list
> anymore, but just call domain_adjust_tot_pages() with
> -d->extra_pages (and suitably deal with the return value; perhaps
> for consistency then followed by also zeroing d->extra_pages, so
> overall accounting still looks correct). Actually taking these
> pages off the list could (for dumping purposes) then be done
> alongside their actual freeing. Such a transition would apparently
> imply clearing PGC_extra alongside the new
> domain_adjust_tot_pages() call.
>
> I realize though that the end result won't be much different from
> the current PGC_xen_heap handling (at least as far as domain
> cleanup goes, but after all that's what I'm in fact trying to
> convince you of), so the question would be whether the whole
> transition then is worth it.

Yes... with such adjustments the code gets increasingly pointless
from a simplification PoV... which is why I coded this patch as it
is. I am happy to make the concession of using the extra page list
for dumping purposes, and to avoid the need to special-case
PGC_extra pages in a few places, but otherwise I want them to be
treated as 'normal' domheap pages.

So, if you're not willing to ack this patch then I guess I'll have
to re-send the series including the grant table changes and the
removal of the xenpage list.

  Paul
On 12.03.2020 10:09, Paul Durrant wrote:
>> -----Original Message-----
>> From: Jan Beulich <jbeulich@suse.com>
>> Sent: 12 March 2020 08:34
>> Subject: RE: [EXTERNAL][PATCH v5 6/6] domain: use PGC_extra domheap page for shared_info
>>
>> On 11.03.2020 16:28, Paul Durrant wrote:
>>>> -----Original Message-----
>>>> From: Jan Beulich <jbeulich@suse.com>
>>>> Sent: 11 March 2020 09:17
>>>>
>>>> On 10.03.2020 18:33, Paul Durrant wrote:
>>>>>> -----Original Message-----
>>>>>> From: Jan Beulich <jbeulich@suse.com>
>>>>>> Sent: 09 March 2020 15:56
>>>>>>
>>>>>> On 09.03.2020 11:23, paul@xen.org wrote:
>>>>>>> --- a/xen/common/time.c
>>>>>>> +++ b/xen/common/time.c
>>>>>>> @@ -99,6 +99,18 @@ void update_domain_wallclock_time(struct domain *d)
>>>>>>>      uint32_t *wc_version;
>>>>>>>      uint64_t sec;
>>>>>>>
>>>>>>> +    if ( d != current->domain )
>>>>>>> +    {
>>>>>>> +        /*
>>>>>>> +         * We need to check is_dying here as, if it is set, the
>>>>>>> +         * shared_info may have been freed. To do this safely we need
>>>>>>> +         * to hold the domain lock.
>>>>>>> +         */
>>>>>>> +        domain_lock(d);
>>>>>>> +        if ( d->is_dying )
>>>>>>> +            goto unlock;
>>>>>>> +    }
>>>>>>
>>>>>> This shouldn't happen very often, but it's a pretty heavy lock.
>>>>>> It's a fundamental aspect of xenheap pages that their disposal
>>>>>> can be delayed until almost the last moment of guest cleanup. I
>>>>>> continue to think it's not really a good idea to have special
>>>>>> purpose allocation (and mapping) accompanied by these pages
>>>>>> getting taken care of by the generic relinquish-resources logic
>>>>>> here (from a more general pov such is of course often nice to
>>>>>> have). Instead of freeing these pages there, couldn't they just
>>>>>> be taken off d->page_list, with the unmapping and freeing left
>>>>>> as it was?
>>>>>
>>>>> I don't think this can be achieved without being able to de-assign
>>>>> pages, and I don't really want to have to invent new logic to do
>>>>> that (basically re-implementing what happens to xenheap pages).
>>>>
>>>> Where's the connection to being able to de-assign pages here?
>>>> There'll be one when the same conversion is to be done for
>>>> gnttab code, but I don't see it here - the shared info page is
>>>> never to be de-assigned. As to gnttab code, I think it was
>>>> noted before that we may be better off not "unpopulating"
>>>> status pages when switching back from v2 to v1. At which point
>>>> the de-assignment need would go away there, too.
>>>
>>> Ok, maybe I'm misunderstanding something then. We need to call
>>> free_domheap_pages() on all pages assigned to a domain so that the
>>> domain references get dropped. The xenpage ref is dropped when
>>> d->xenheap_pages == 0. The domheap ref is dropped when
>>> domain_adjust_tot_pages() returns zero. (This is what I meant by
>>> de-assigning... but that was probably a poor choice of words.)
>>> So, because domain_adjust_tot_pages() returns d->tot_pages (which
>>> includes the extra_pages count), it won't fall to zero until the
>>> last put_page() on any PGC_extra page. So how is it possible to
>>> free shared_info in domain destroy? We'll never get that far,
>>> because the domheap ref will never get dropped.
>>
>> Well, now that these pages sit on a separate list, it would look
>> even less problematic than before to me to also give them special
>> treatment here: You wouldn't even have to take them off the list
>> anymore, but just call domain_adjust_tot_pages() with
>> -d->extra_pages (and suitably deal with the return value; perhaps
>> for consistency then followed by also zeroing d->extra_pages, so
>> overall accounting still looks correct). Actually taking these
>> pages off the list could (for dumping purposes) then be done
>> alongside their actual freeing. Such a transition would apparently
>> imply clearing PGC_extra alongside the new
>> domain_adjust_tot_pages() call.
>>
>> I realize though that the end result won't be much different from
>> the current PGC_xen_heap handling (at least as far as domain
>> cleanup goes, but after all that's what I'm in fact trying to
>> convince you of), so the question would be whether the whole
>> transition then is worth it.
>
> Yes... with such adjustments the code gets increasingly pointless
> from a simplification PoV... which is why I coded this patch as it
> is. I am happy to make the concession of using the extra page list
> for dumping purposes, and to avoid the need to special-case
> PGC_extra pages in a few places, but otherwise I want them to be
> treated as 'normal' domheap pages.
>
> So, if you're not willing to ack this patch

Well - I'm not the only one in the position to ack this. The
problem is that right now this is just a discussion between the
two of us.

> then I guess I'll have to re-send the series including the
> grant table changes and the removal of the xenpage list.

I'm not sure I see how this would help. My reservations against
the earlier freeing of the extra pages wouldn't go away. But let
me first get around to looking at v6.

Jan
diff --git a/xen/arch/arm/domain.c b/xen/arch/arm/domain.c
index 5298d80bd2..741f6dd444 100644
--- a/xen/arch/arm/domain.c
+++ b/xen/arch/arm/domain.c
@@ -1005,6 +1005,8 @@ int domain_relinquish_resources(struct domain *d)
         BUG();
     }
 
+    free_shared_info(d);
+
     return 0;
 }
 
diff --git a/xen/arch/x86/domain.c b/xen/arch/x86/domain.c
index f6ed25e8ee..60623beba1 100644
--- a/xen/arch/x86/domain.c
+++ b/xen/arch/x86/domain.c
@@ -261,6 +261,8 @@ void dump_pageframe_info(struct domain *d)
                page->count_info, page->u.inuse.type_info);
     }
     spin_unlock(&d->page_alloc_lock);
+
+    dump_shared_info(d);
 }
 
 void update_guest_memory_policy(struct vcpu *v,
@@ -693,7 +695,6 @@ void arch_domain_destroy(struct domain *d)
         pv_domain_destroy(d);
     free_perdomain_mappings(d);
 
-    free_shared_info(d);
     cleanup_domain_irq_mapping(d);
 
     psr_domain_free(d);
@@ -2249,6 +2250,8 @@ int domain_relinquish_resources(struct domain *d)
     if ( is_hvm_domain(d) )
         hvm_domain_relinquish_resources(d);
 
+    free_shared_info(d);
+
     return 0;
 }
 
diff --git a/xen/common/domain.c b/xen/common/domain.c
index ba7a905258..0b1c722708 100644
--- a/xen/common/domain.c
+++ b/xen/common/domain.c
@@ -1650,24 +1650,54 @@ int continue_hypercall_on_cpu(
 
 int alloc_shared_info(struct domain *d, unsigned int memflags)
 {
-    if ( (d->shared_info.virt = alloc_xenheap_pages(0, memflags)) == NULL )
+    struct page_info *pg;
+
+    pg = alloc_domheap_page(d, MEMF_no_refcount | memflags);
+    if ( !pg )
         return -ENOMEM;
 
-    d->shared_info.mfn = virt_to_mfn(d->shared_info.virt);
+    if ( !get_page_and_type(pg, d, PGT_writable_page) )
+    {
+        /*
+         * The domain should not be running at this point so there is
+         * no way we should reach this error path.
+         */
+        ASSERT_UNREACHABLE();
+        return -ENODATA;
+    }
+
+    d->shared_info.mfn = page_to_mfn(pg);
+    d->shared_info.virt = __map_domain_page_global(pg);
 
     clear_page(d->shared_info.virt);
-    share_xen_page_with_guest(mfn_to_page(d->shared_info.mfn), d, SHARE_rw);
 
     return 0;
 }
 
 void free_shared_info(struct domain *d)
 {
+    struct page_info *pg;
+
     if ( !d->shared_info.virt )
         return;
 
-    free_xenheap_page(d->shared_info.virt);
+    unmap_domain_page_global(d->shared_info.virt);
     d->shared_info.virt = NULL;
+
+    pg = mfn_to_page(d->shared_info.mfn);
+
+    put_page_alloc_ref(pg);
+    put_page_and_type(pg);
+}
+
+void dump_shared_info(struct domain *d)
+{
+    domain_lock(d);
+
+    if ( d->shared_info.virt )
+        printk("Shared Info: %"PRI_mfn"\n", mfn_x(d->shared_info.mfn));
+
+    domain_unlock(d);
 }
 
 /*
diff --git a/xen/common/event_channel.c b/xen/common/event_channel.c
index e86e2bfab0..a17422284d 100644
--- a/xen/common/event_channel.c
+++ b/xen/common/event_channel.c
@@ -1325,6 +1325,9 @@ void evtchn_destroy(struct domain *d)
 {
     unsigned int i;
 
+    /* This must be done before shared_info is freed */
+    BUG_ON(!d->shared_info.virt);
+
     /* After this barrier no new event-channel allocations can occur. */
     BUG_ON(!d->is_dying);
     spin_barrier(&d->event_lock);
diff --git a/xen/common/time.c b/xen/common/time.c
index 58fa9abc40..ada02faf07 100644
--- a/xen/common/time.c
+++ b/xen/common/time.c
@@ -99,6 +99,18 @@ void update_domain_wallclock_time(struct domain *d)
     uint32_t *wc_version;
     uint64_t sec;
 
+    if ( d != current->domain )
+    {
+        /*
+         * We need to check is_dying here as, if it is set, the
+         * shared_info may have been freed. To do this safely we need
+         * to hold the domain lock.
+         */
+        domain_lock(d);
+        if ( d->is_dying )
+            goto unlock;
+    }
+
     spin_lock(&wc_lock);
 
     wc_version = &shared_info(d, wc_version);
@@ -121,6 +133,9 @@ void update_domain_wallclock_time(struct domain *d)
 
     *wc_version = version_update_end(*wc_version);
     spin_unlock(&wc_lock);
+ unlock:
+    if ( d != current->domain )
+        domain_unlock(d);
 }
 
 /* Set clock to <secs,usecs> after 00:00:00 UTC, 1 January, 1970. */
diff --git a/xen/include/xen/domain.h b/xen/include/xen/domain.h
index 740e2032ad..740c0537fa 100644
--- a/xen/include/xen/domain.h
+++ b/xen/include/xen/domain.h
@@ -132,5 +132,6 @@ void vnuma_destroy(struct vnuma_info *vnuma);
 
 int alloc_shared_info(struct domain *d, unsigned int memflags);
 void free_shared_info(struct domain *d);
+void dump_shared_info(struct domain *d);
 
 #endif /* __XEN_DOMAIN_H__ */