diff mbox series

[1/2] x86: Perform mem_sharing teardown before paging teardown

Message ID 3af8dbf3134b48f6bbc8f917e5fecaf8daee1c3d.1676351034.git.tamas@tklengyel.com (mailing list archive)
State Superseded
Headers show
Series [1/2] x86: Perform mem_sharing teardown before paging teardown | expand

Commit Message

Tamas K Lengyel Feb. 14, 2023, 5:07 a.m. UTC
An assert failure has been observed at p2m-basic.c:173 when performing vm
forking and then destroying the forked VM. The assert checks whether the
domain's shared pages counter is 0. According to the patch that originally
added the assert (7bedbbb5c31) the p2m_teardown should only happen after
mem_sharing already relinquished all shared pages.

In this patch we flip the order in which relinquish ops are called to avoid
tripping the assert.

Signed-off-by: Tamas K Lengyel <tamas@tklengyel.com>
---
Note: it is unclear why this assert hasn't tripped in the past. It hasn't
been observed with 4.17-rc1 but it is in RELEASE-4.17.0
---
 xen/arch/x86/domain.c | 52 +++++++++++++++++++++----------------------
 1 file changed, 26 insertions(+), 26 deletions(-)

Comments

Jan Beulich Feb. 15, 2023, 10:27 a.m. UTC | #1
On 14.02.2023 06:07, Tamas K Lengyel wrote:
> An assert failure has been observed at p2m-basic.c:173 when performing vm

Please can you at least also name the function here? The line number is
going to change, and when coming back to this change later, it'll be
more troublesome to figure out which precise assertion was meant. Yes,
...

> forking and then destroying the forked VM. The assert checks whether the
> domain's shared pages counter is 0.

... you verbally describe it here, but still.

> According to the patch that originally
> added the assert (7bedbbb5c31) the p2m_teardown should only happen after
> mem_sharing already relinquished all shared pages.
> 
> In this patch we flip the order in which relinquish ops are called to avoid
> tripping the assert.
> 
> Signed-off-by: Tamas K Lengyel <tamas@tklengyel.com>
> ---
> Note: it is unclear why this assert hasn't tripped in the past. It hasn't
> been observed with 4.17-rc1 but it is in RELEASE-4.17.0

That's almost certainly a result of e7aa55c0aab3 ("x86/p2m: free the paging
memory pool preemptively"), which added calls to p2m_teardown() to
hap_teardown(). If you agree, this wants recording in a Fixes: tag, as
- being an XSA fix - that change has been backported to everywhere, and
hence the issue now also exists everywhere.

> --- a/xen/arch/x86/domain.c
> +++ b/xen/arch/x86/domain.c
> @@ -2310,6 +2310,32 @@ int domain_relinquish_resources(struct domain *d)
>          if ( ret )
>              return ret;
>  
> +#ifdef CONFIG_MEM_SHARING
> +    PROGRESS(shared):

If we go with the re-ordering as you suggest, then please also move the
enumerator near the top of the switch() body.

Did you consider the alternative of adjusting the ASSERT() in question
instead? It could reasonably become

#ifdef CONFIG_MEM_SHARING
    ASSERT(!p2m_is_hostp2m(p2m) || !remove_root || !atomic_read(&d->shr_pages));
#endif

now, I think. That would be less intrusive a change (helpful for
backporting), but there may be other (so far unnamed) benefits of pulling
ahead the shared memory teardown.

> +        if ( is_hvm_domain(d) )
> +        {
> +            /* If the domain has shared pages, relinquish them allowing
> +             * for preemption. */

Similarly, if the code is to be moved, please correct style here at this
occasion.

> +            ret = relinquish_shared_pages(d);
> +            if ( ret )
> +                return ret;

While I can easily agree with the movement ahead of this being okay, ...

> +            /*
> +             * If the domain is forked, decrement the parent's pause count
> +             * and release the domain.
> +             */
> +            if ( mem_sharing_is_fork(d) )
> +            {
> +                struct domain *parent = d->parent;
> +
> +                d->parent = NULL;
> +                domain_unpause(parent);
> +                put_domain(parent);
> +            }

... I can only trust you on being sure that moving ahead of this is
okay, too.

Jan
Tamas K Lengyel Feb. 15, 2023, 4:29 p.m. UTC | #2
On Wed, Feb 15, 2023 at 5:27 AM Jan Beulich <jbeulich@suse.com> wrote:
>
> On 14.02.2023 06:07, Tamas K Lengyel wrote:
> > An assert failure has been observed at p2m-basic.c:173 when performing
vm
>
> Please can you at least also name the function here? The line number is
> going to change, and when coming back to this change later, it'll be
> more troublesome to figure out which precise assertion was meant. Yes,
> ...
>
> > forking and then destroying the forked VM. The assert checks whether the
> > domain's shared pages counter is 0.
>
> ... you verbally describe it here, but still.

Sure.

>
> > According to the patch that originally
> > added the assert (7bedbbb5c31) the p2m_teardown should only happen after
> > mem_sharing already relinquished all shared pages.
> >
> > In this patch we flip the order in which relinquish ops are called to
avoid
> > tripping the assert.
> >
> > Signed-off-by: Tamas K Lengyel <tamas@tklengyel.com>
> > ---
> > Note: it is unclear why this assert hasn't tripped in the past. It
hasn't
> > been observed with 4.17-rc1 but it is in RELEASE-4.17.0
>
> That's almost certainly a result of e7aa55c0aab3 ("x86/p2m: free the
paging
> memory pool preemptively"), which added calls to p2m_teardown() to
> hap_teardown(). If you agree, this wants recording in a Fixes: tag, as
> - being an XSA fix - that change has been backported to everywhere, and
> hence the issue now also exists everywhere.

Ough.. In that case we'll need this patch backported too. Will add the
Fixes: tag.

>
> > --- a/xen/arch/x86/domain.c
> > +++ b/xen/arch/x86/domain.c
> > @@ -2310,6 +2310,32 @@ int domain_relinquish_resources(struct domain *d)
> >          if ( ret )
> >              return ret;
> >
> > +#ifdef CONFIG_MEM_SHARING
> > +    PROGRESS(shared):
>
> If we go with the re-ordering as you suggest, then please also move the
> enumerator near the top of the switch() body.

Sure.

>
> Did you consider the alternative of adjusting the ASSERT() in question
> instead? It could reasonably become
>
> #ifdef CONFIG_MEM_SHARING
>     ASSERT(!p2m_is_hostp2m(p2m) || !remove_root ||
!atomic_read(&d->shr_pages));
> #endif
>
> now, I think. That would be less intrusive a change (helpful for
> backporting), but there may be other (so far unnamed) benefits of pulling
> ahead the shared memory teardown.

I have a hard time understanding this proposed ASSERT.

>
> > +        if ( is_hvm_domain(d) )
> > +        {
> > +            /* If the domain has shared pages, relinquish them allowing
> > +             * for preemption. */
>
> Similarly, if the code is to be moved, please correct style here at this
> occasion.

Sure.

>
> > +            ret = relinquish_shared_pages(d);
> > +            if ( ret )
> > +                return ret;
>
> While I can easily agree with the movement ahead of this being okay, ...
>
> > +            /*
> > +             * If the domain is forked, decrement the parent's pause
count
> > +             * and release the domain.
> > +             */
> > +            if ( mem_sharing_is_fork(d) )
> > +            {
> > +                struct domain *parent = d->parent;
> > +
> > +                d->parent = NULL;
> > +                domain_unpause(parent);
> > +                put_domain(parent);
> > +            }
>
> ... I can only trust you on being sure that moving ahead of this is
> okay, too.

That's fine, we are in the teardown of the fork so it doesn't matter where
you are releasing the parent as long as its after the fork is unlinked from
it.

Thanks,
Tamas
Jan Beulich Feb. 16, 2023, 8:31 a.m. UTC | #3
On 15.02.2023 17:29, Tamas K Lengyel wrote:
> On Wed, Feb 15, 2023 at 5:27 AM Jan Beulich <jbeulich@suse.com> wrote:
>> Did you consider the alternative of adjusting the ASSERT() in question
>> instead? It could reasonably become
>>
>> #ifdef CONFIG_MEM_SHARING
>>     ASSERT(!p2m_is_hostp2m(p2m) || !remove_root ||
> !atomic_read(&d->shr_pages));
>> #endif
>>
>> now, I think. That would be less intrusive a change (helpful for
>> backporting), but there may be other (so far unnamed) benefits of pulling
>> ahead the shared memory teardown.
> 
> I have a hard time understanding this proposed ASSERT.

It accounts for the various ways p2m_teardown() can (now) be called,
limiting the actual check for no remaining shared pages to the last
of all these invocations (on the host p2m with remove_root=true).

Maybe

    /* Limit the check to the final invocation. */
    if ( p2m_is_hostp2m(p2m) && remove_root )
        ASSERT(!atomic_read(&d->shr_pages));

would make this easier to follow? Another option might be to move
the assertion to paging_final_teardown(), ahead of that specific call
to p2m_teardown().

Jan
Tamas K Lengyel Feb. 16, 2023, 12:22 p.m. UTC | #4
On Thu, Feb 16, 2023 at 3:31 AM Jan Beulich <jbeulich@suse.com> wrote:
>
> On 15.02.2023 17:29, Tamas K Lengyel wrote:
> > On Wed, Feb 15, 2023 at 5:27 AM Jan Beulich <jbeulich@suse.com> wrote:
> >> Did you consider the alternative of adjusting the ASSERT() in question
> >> instead? It could reasonably become
> >>
> >> #ifdef CONFIG_MEM_SHARING
> >>     ASSERT(!p2m_is_hostp2m(p2m) || !remove_root ||
> > !atomic_read(&d->shr_pages));
> >> #endif
> >>
> >> now, I think. That would be less intrusive a change (helpful for
> >> backporting), but there may be other (so far unnamed) benefits of
pulling
> >> ahead the shared memory teardown.
> >
> > I have a hard time understanding this proposed ASSERT.
>
> It accounts for the various ways p2m_teardown() can (now) be called,
> limiting the actual check for no remaining shared pages to the last
> of all these invocations (on the host p2m with remove_root=true).
>
> Maybe
>
>     /* Limit the check to the final invocation. */
>     if ( p2m_is_hostp2m(p2m) && remove_root )
>         ASSERT(!atomic_read(&d->shr_pages));
>
> would make this easier to follow? Another option might be to move
> the assertion to paging_final_teardown(), ahead of that specific call
> to p2m_teardown().

AFAICT d->shr_pages would still be more then 0 when this is called before
sharing is torn down so the rearrangement is necessary even if we do this
assert only on the final invocation. I did a printk in place of this assert
without the rearrangement and afaict it was always != 0.

Tamas
Jan Beulich Feb. 16, 2023, 2:26 p.m. UTC | #5
On 16.02.2023 13:22, Tamas K Lengyel wrote:
> On Thu, Feb 16, 2023 at 3:31 AM Jan Beulich <jbeulich@suse.com> wrote:
>>
>> On 15.02.2023 17:29, Tamas K Lengyel wrote:
>>> On Wed, Feb 15, 2023 at 5:27 AM Jan Beulich <jbeulich@suse.com> wrote:
>>>> Did you consider the alternative of adjusting the ASSERT() in question
>>>> instead? It could reasonably become
>>>>
>>>> #ifdef CONFIG_MEM_SHARING
>>>>     ASSERT(!p2m_is_hostp2m(p2m) || !remove_root ||
>>> !atomic_read(&d->shr_pages));
>>>> #endif
>>>>
>>>> now, I think. That would be less intrusive a change (helpful for
>>>> backporting), but there may be other (so far unnamed) benefits of
> pulling
>>>> ahead the shared memory teardown.
>>>
>>> I have a hard time understanding this proposed ASSERT.
>>
>> It accounts for the various ways p2m_teardown() can (now) be called,
>> limiting the actual check for no remaining shared pages to the last
>> of all these invocations (on the host p2m with remove_root=true).
>>
>> Maybe
>>
>>     /* Limit the check to the final invocation. */
>>     if ( p2m_is_hostp2m(p2m) && remove_root )
>>         ASSERT(!atomic_read(&d->shr_pages));
>>
>> would make this easier to follow? Another option might be to move
>> the assertion to paging_final_teardown(), ahead of that specific call
>> to p2m_teardown().
> 
> AFAICT d->shr_pages would still be more then 0 when this is called before
> sharing is torn down so the rearrangement is necessary even if we do this
> assert only on the final invocation. I did a printk in place of this assert
> without the rearrangement and afaict it was always != 0.

Was your printk() in an if() as above? paging_final_teardown() runs really
late during cleanup (when we're about to free struct domain), so I would
be somewhat concerned if by that time the count was still non-zero.

Jan
Tamas K Lengyel Feb. 16, 2023, 3:10 p.m. UTC | #6
On Thu, Feb 16, 2023 at 9:27 AM Jan Beulich <jbeulich@suse.com> wrote:
>
> On 16.02.2023 13:22, Tamas K Lengyel wrote:
> > On Thu, Feb 16, 2023 at 3:31 AM Jan Beulich <jbeulich@suse.com> wrote:
> >>
> >> On 15.02.2023 17:29, Tamas K Lengyel wrote:
> >>> On Wed, Feb 15, 2023 at 5:27 AM Jan Beulich <jbeulich@suse.com> wrote:
> >>>> Did you consider the alternative of adjusting the ASSERT() in
question
> >>>> instead? It could reasonably become
> >>>>
> >>>> #ifdef CONFIG_MEM_SHARING
> >>>>     ASSERT(!p2m_is_hostp2m(p2m) || !remove_root ||
> >>> !atomic_read(&d->shr_pages));
> >>>> #endif
> >>>>
> >>>> now, I think. That would be less intrusive a change (helpful for
> >>>> backporting), but there may be other (so far unnamed) benefits of
> > pulling
> >>>> ahead the shared memory teardown.
> >>>
> >>> I have a hard time understanding this proposed ASSERT.
> >>
> >> It accounts for the various ways p2m_teardown() can (now) be called,
> >> limiting the actual check for no remaining shared pages to the last
> >> of all these invocations (on the host p2m with remove_root=true).
> >>
> >> Maybe
> >>
> >>     /* Limit the check to the final invocation. */
> >>     if ( p2m_is_hostp2m(p2m) && remove_root )
> >>         ASSERT(!atomic_read(&d->shr_pages));
> >>
> >> would make this easier to follow? Another option might be to move
> >> the assertion to paging_final_teardown(), ahead of that specific call
> >> to p2m_teardown().
> >
> > AFAICT d->shr_pages would still be more then 0 when this is called
before
> > sharing is torn down so the rearrangement is necessary even if we do
this
> > assert only on the final invocation. I did a printk in place of this
assert
> > without the rearrangement and afaict it was always != 0.
>
> Was your printk() in an if() as above? paging_final_teardown() runs really
> late during cleanup (when we're about to free struct domain), so I would
> be somewhat concerned if by that time the count was still non-zero.

Just replaced the assert with a printk. Without calling relinquish shared
pages I don't find it odd that the count is non-zero, it only gets
decremented when you do call relinquish. Once the order is corrected it is
zero.

Tamas
Jan Beulich Feb. 16, 2023, 3:24 p.m. UTC | #7
On 16.02.2023 16:10, Tamas K Lengyel wrote:
> On Thu, Feb 16, 2023 at 9:27 AM Jan Beulich <jbeulich@suse.com> wrote:
>>
>> On 16.02.2023 13:22, Tamas K Lengyel wrote:
>>> On Thu, Feb 16, 2023 at 3:31 AM Jan Beulich <jbeulich@suse.com> wrote:
>>>>
>>>> On 15.02.2023 17:29, Tamas K Lengyel wrote:
>>>>> On Wed, Feb 15, 2023 at 5:27 AM Jan Beulich <jbeulich@suse.com> wrote:
>>>>>> Did you consider the alternative of adjusting the ASSERT() in
> question
>>>>>> instead? It could reasonably become
>>>>>>
>>>>>> #ifdef CONFIG_MEM_SHARING
>>>>>>     ASSERT(!p2m_is_hostp2m(p2m) || !remove_root ||
>>>>> !atomic_read(&d->shr_pages));
>>>>>> #endif
>>>>>>
>>>>>> now, I think. That would be less intrusive a change (helpful for
>>>>>> backporting), but there may be other (so far unnamed) benefits of
>>> pulling
>>>>>> ahead the shared memory teardown.
>>>>>
>>>>> I have a hard time understanding this proposed ASSERT.
>>>>
>>>> It accounts for the various ways p2m_teardown() can (now) be called,
>>>> limiting the actual check for no remaining shared pages to the last
>>>> of all these invocations (on the host p2m with remove_root=true).
>>>>
>>>> Maybe
>>>>
>>>>     /* Limit the check to the final invocation. */
>>>>     if ( p2m_is_hostp2m(p2m) && remove_root )
>>>>         ASSERT(!atomic_read(&d->shr_pages));
>>>>
>>>> would make this easier to follow? Another option might be to move
>>>> the assertion to paging_final_teardown(), ahead of that specific call
>>>> to p2m_teardown().
>>>
>>> AFAICT d->shr_pages would still be more then 0 when this is called
> before
>>> sharing is torn down so the rearrangement is necessary even if we do
> this
>>> assert only on the final invocation. I did a printk in place of this
> assert
>>> without the rearrangement and afaict it was always != 0.
>>
>> Was your printk() in an if() as above? paging_final_teardown() runs really
>> late during cleanup (when we're about to free struct domain), so I would
>> be somewhat concerned if by that time the count was still non-zero.
> 
> Just replaced the assert with a printk. Without calling relinquish shared
> pages I don't find it odd that the count is non-zero, it only gets
> decremented when you do call relinquish. Once the order is corrected it is
> zero.

I may be getting you wrong, but it feels like you're still talking about
early invocations of p2m_teardown() (from underneath domain_kill()) when
I'm talking about the final one. No matter what ordering inside
domain_relinquish_resources() (called from domain_kill()), the freeing
will have happened by the time that process completes. Which is before
the (typically last) last domain ref would be put (near the end of
domain_kill()), and hence before the domain freeing process might be
invoked (which is where paging_final_teardown() gets involved; see
{,arch_}domain_destroy()).

Jan
Tamas K Lengyel Feb. 16, 2023, 3:35 p.m. UTC | #8
On Thu, Feb 16, 2023 at 10:24 AM Jan Beulich <jbeulich@suse.com> wrote:
>
> On 16.02.2023 16:10, Tamas K Lengyel wrote:
> > On Thu, Feb 16, 2023 at 9:27 AM Jan Beulich <jbeulich@suse.com> wrote:
> >>
> >> On 16.02.2023 13:22, Tamas K Lengyel wrote:
> >>> On Thu, Feb 16, 2023 at 3:31 AM Jan Beulich <jbeulich@suse.com> wrote:
> >>>>
> >>>> On 15.02.2023 17:29, Tamas K Lengyel wrote:
> >>>>> On Wed, Feb 15, 2023 at 5:27 AM Jan Beulich <jbeulich@suse.com>
wrote:
> >>>>>> Did you consider the alternative of adjusting the ASSERT() in
> > question
> >>>>>> instead? It could reasonably become
> >>>>>>
> >>>>>> #ifdef CONFIG_MEM_SHARING
> >>>>>>     ASSERT(!p2m_is_hostp2m(p2m) || !remove_root ||
> >>>>> !atomic_read(&d->shr_pages));
> >>>>>> #endif
> >>>>>>
> >>>>>> now, I think. That would be less intrusive a change (helpful for
> >>>>>> backporting), but there may be other (so far unnamed) benefits of
> >>> pulling
> >>>>>> ahead the shared memory teardown.
> >>>>>
> >>>>> I have a hard time understanding this proposed ASSERT.
> >>>>
> >>>> It accounts for the various ways p2m_teardown() can (now) be called,
> >>>> limiting the actual check for no remaining shared pages to the last
> >>>> of all these invocations (on the host p2m with remove_root=true).
> >>>>
> >>>> Maybe
> >>>>
> >>>>     /* Limit the check to the final invocation. */
> >>>>     if ( p2m_is_hostp2m(p2m) && remove_root )
> >>>>         ASSERT(!atomic_read(&d->shr_pages));
> >>>>
> >>>> would make this easier to follow? Another option might be to move
> >>>> the assertion to paging_final_teardown(), ahead of that specific call
> >>>> to p2m_teardown().
> >>>
> >>> AFAICT d->shr_pages would still be more then 0 when this is called
> > before
> >>> sharing is torn down so the rearrangement is necessary even if we do
> > this
> >>> assert only on the final invocation. I did a printk in place of this
> > assert
> >>> without the rearrangement and afaict it was always != 0.
> >>
> >> Was your printk() in an if() as above? paging_final_teardown() runs
really
> >> late during cleanup (when we're about to free struct domain), so I
would
> >> be somewhat concerned if by that time the count was still non-zero.
> >
> > Just replaced the assert with a printk. Without calling relinquish
shared
> > pages I don't find it odd that the count is non-zero, it only gets
> > decremented when you do call relinquish. Once the order is corrected it
is
> > zero.
>
> I may be getting you wrong, but it feels like you're still talking about
> early invocations of p2m_teardown() (from underneath domain_kill()) when
> I'm talking about the final one. No matter what ordering inside
> domain_relinquish_resources() (called from domain_kill()), the freeing
> will have happened by the time that process completes. Which is before
> the (typically last) last domain ref would be put (near the end of
> domain_kill()), and hence before the domain freeing process might be
> invoked (which is where paging_final_teardown() gets involved; see
> {,arch_}domain_destroy()).

I don't recall seeing a count with 0 before I reordered things but the
output on the serial may also have been truncated due to it printing a ton
of lines very quickly, so it could be the last one was zero just didn't
make it to my screen.

Tamas
diff mbox series

Patch

diff --git a/xen/arch/x86/domain.c b/xen/arch/x86/domain.c
index db3ebf062d..453ec52b6a 100644
--- a/xen/arch/x86/domain.c
+++ b/xen/arch/x86/domain.c
@@ -2310,6 +2310,32 @@  int domain_relinquish_resources(struct domain *d)
         if ( ret )
             return ret;
 
+#ifdef CONFIG_MEM_SHARING
+    PROGRESS(shared):
+
+        if ( is_hvm_domain(d) )
+        {
+            /* If the domain has shared pages, relinquish them allowing
+             * for preemption. */
+            ret = relinquish_shared_pages(d);
+            if ( ret )
+                return ret;
+
+            /*
+             * If the domain is forked, decrement the parent's pause count
+             * and release the domain.
+             */
+            if ( mem_sharing_is_fork(d) )
+            {
+                struct domain *parent = d->parent;
+
+                d->parent = NULL;
+                domain_unpause(parent);
+                put_domain(parent);
+            }
+        }
+#endif
+
     PROGRESS(paging):
 
         /* Tear down paging-assistance stuff. */
@@ -2350,32 +2376,6 @@  int domain_relinquish_resources(struct domain *d)
             d->arch.auto_unmask = 0;
         }
 
-#ifdef CONFIG_MEM_SHARING
-    PROGRESS(shared):
-
-        if ( is_hvm_domain(d) )
-        {
-            /* If the domain has shared pages, relinquish them allowing
-             * for preemption. */
-            ret = relinquish_shared_pages(d);
-            if ( ret )
-                return ret;
-
-            /*
-             * If the domain is forked, decrement the parent's pause count
-             * and release the domain.
-             */
-            if ( mem_sharing_is_fork(d) )
-            {
-                struct domain *parent = d->parent;
-
-                d->parent = NULL;
-                domain_unpause(parent);
-                put_domain(parent);
-            }
-        }
-#endif
-
         spin_lock(&d->page_alloc_lock);
         page_list_splice(&d->arch.relmem_list, &d->page_list);
         INIT_PAGE_LIST_HEAD(&d->arch.relmem_list);