diff mbox series

[v5,06/10] x86/mem-sharing: copy GADDR based shared guest areas

Message ID 20231002151127.71020-7-roger.pau@citrix.com (mailing list archive)
State New, archived
Headers show
Series runstate/time area registration by (guest) physical address | expand

Commit Message

Roger Pau Monné Oct. 2, 2023, 3:11 p.m. UTC
From: Jan Beulich <jbeulich@suse.com>

In preparation of the introduction of new vCPU operations allowing to
register the respective areas (one of the two is x86-specific) by
guest-physical address, add the necessary fork handling (with the
backing function yet to be filled in).

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
---
Changes since v4:
 - Rely on map_guest_area() to populate the child p2m if necessary.
---
 xen/arch/x86/mm/mem_sharing.c | 31 +++++++++++++++++++++++++++++++
 xen/common/domain.c           |  7 +++++++
 2 files changed, 38 insertions(+)

Comments

Roger Pau Monné Oct. 3, 2023, 2:29 p.m. UTC | #1
On Tue, Oct 03, 2023 at 09:53:11AM -0400, Tamas K Lengyel wrote:
> On Mon, Oct 2, 2023 at 11:13 AM Roger Pau Monne <roger.pau@citrix.com> wrote:
> >
> > From: Jan Beulich <jbeulich@suse.com>
> >
> > In preparation of the introduction of new vCPU operations allowing to
> > register the respective areas (one of the two is x86-specific) by
> > guest-physical address, add the necessary fork handling (with the
> > backing function yet to be filled in).
> >
> > Signed-off-by: Jan Beulich <jbeulich@suse.com>
> > Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
> > ---
> > Changes since v4:
> >  - Rely on map_guest_area() to populate the child p2m if necessary.
> > ---
> >  xen/arch/x86/mm/mem_sharing.c | 31 +++++++++++++++++++++++++++++++
> >  xen/common/domain.c           |  7 +++++++
> >  2 files changed, 38 insertions(+)
> >
> > diff --git a/xen/arch/x86/mm/mem_sharing.c b/xen/arch/x86/mm/mem_sharing.c
> > index 5f8f1fb4d871..99cf001fd70f 100644
> > --- a/xen/arch/x86/mm/mem_sharing.c
> > +++ b/xen/arch/x86/mm/mem_sharing.c
> > @@ -1641,6 +1641,24 @@ static void copy_vcpu_nonreg_state(struct vcpu *d_vcpu, struct vcpu *cd_vcpu)
> >      hvm_set_nonreg_state(cd_vcpu, &nrs);
> >  }
> >
> > +static int copy_guest_area(struct guest_area *cd_area,
> > +                           const struct guest_area *d_area,
> > +                           struct vcpu *cd_vcpu,
> > +                           const struct domain *d)
> > +{
> > +    unsigned int offset;
> > +
> > +    /* Check if no area to map, or already mapped. */
> > +    if ( !d_area->pg || cd_area->pg )
> > +        return 0;
> > +
> > +    offset = PAGE_OFFSET(d_area->map);
> > +    return map_guest_area(cd_vcpu, gfn_to_gaddr(
> > +                                       mfn_to_gfn(d, page_to_mfn(d_area->pg))) +
> > +                                   offset,
> > +                          PAGE_SIZE - offset, cd_area, NULL);
> > +}
> > +
> >  static int copy_vpmu(struct vcpu *d_vcpu, struct vcpu *cd_vcpu)
> >  {
> >      struct vpmu_struct *d_vpmu = vcpu_vpmu(d_vcpu);
> > @@ -1709,6 +1727,16 @@ static int copy_vcpu_settings(struct domain *cd, const struct domain *d)
> >                  return ret;
> >          }
> >
> > +        /* Same for the (physically registered) runstate and time info areas. */
> > +        ret = copy_guest_area(&cd_vcpu->runstate_guest_area,
> > +                              &d_vcpu->runstate_guest_area, cd_vcpu, d);
> > +        if ( ret )
> > +            return ret;
> > +        ret = copy_guest_area(&cd_vcpu->arch.time_guest_area,
> > +                              &d_vcpu->arch.time_guest_area, cd_vcpu, d);
> > +        if ( ret )
> > +            return ret;
> > +
> >          ret = copy_vpmu(d_vcpu, cd_vcpu);
> >          if ( ret )
> >              return ret;
> > @@ -1950,7 +1978,10 @@ int mem_sharing_fork_reset(struct domain *d, bool reset_state,
> >
> >   state:
> >      if ( reset_state )
> > +    {
> >          rc = copy_settings(d, pd);
> > +        /* TBD: What to do here with -ERESTART? */
> 
> There is no situation where we get an -ERESTART here currently. Is
> map_guest_area expected to run into situations where it fails with
> that rc?

Yes, there's a spin_trylock() call that will result in
map_guest_area() returning -ERESTART.

> If yes we might need a lock in place so we can block until it
> can succeed.

I'm not sure whether returning -ERESTART can actually happen in
map_guest_area() for the fork case: the child domain is still paused
at this point, so there can't be concurrent guest hypercalls that
would also cause the domain hypercall_deadlock_mutex to be acquired.

The comment was added by Jan, so I cannot be certain about the
intention, neither I would like to misinterpret his words.  My
understanding is that future uses of copy_settings() might indeed need
to report -ERESTART, and that it would need to be propagated for
proper hypercall continuations at some point.

Thanks, Roger.
Julien Grall Oct. 3, 2023, 3:07 p.m. UTC | #2
Hi Roger,

On 03/10/2023 15:29, Roger Pau Monné wrote:
> On Tue, Oct 03, 2023 at 09:53:11AM -0400, Tamas K Lengyel wrote:

Tamas, somehow your e-mails don't show up in my inbox (even if I am 
CCed) or even on lore.kernel.org/xen-devel. It is not even in my SPAM 
folder.

>> On Mon, Oct 2, 2023 at 11:13 AM Roger Pau Monne <roger.pau@citrix.com> wrote:
>>>
>>> From: Jan Beulich <jbeulich@suse.com>
>>>
>>> In preparation of the introduction of new vCPU operations allowing to
>>> register the respective areas (one of the two is x86-specific) by
>>> guest-physical address, add the necessary fork handling (with the
>>> backing function yet to be filled in).
>>>
>>> Signed-off-by: Jan Beulich <jbeulich@suse.com>
>>> Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
>>> ---
>>> Changes since v4:
>>>   - Rely on map_guest_area() to populate the child p2m if necessary.
>>> ---
>>>   xen/arch/x86/mm/mem_sharing.c | 31 +++++++++++++++++++++++++++++++
>>>   xen/common/domain.c           |  7 +++++++
>>>   2 files changed, 38 insertions(+)
>>>
>>> diff --git a/xen/arch/x86/mm/mem_sharing.c b/xen/arch/x86/mm/mem_sharing.c
>>> index 5f8f1fb4d871..99cf001fd70f 100644
>>> --- a/xen/arch/x86/mm/mem_sharing.c
>>> +++ b/xen/arch/x86/mm/mem_sharing.c
>>> @@ -1641,6 +1641,24 @@ static void copy_vcpu_nonreg_state(struct vcpu *d_vcpu, struct vcpu *cd_vcpu)
>>>       hvm_set_nonreg_state(cd_vcpu, &nrs);
>>>   }
>>>
>>> +static int copy_guest_area(struct guest_area *cd_area,
>>> +                           const struct guest_area *d_area,
>>> +                           struct vcpu *cd_vcpu,
>>> +                           const struct domain *d)
>>> +{
>>> +    unsigned int offset;
>>> +
>>> +    /* Check if no area to map, or already mapped. */
>>> +    if ( !d_area->pg || cd_area->pg )
>>> +        return 0;
>>> +
>>> +    offset = PAGE_OFFSET(d_area->map);
>>> +    return map_guest_area(cd_vcpu, gfn_to_gaddr(
>>> +                                       mfn_to_gfn(d, page_to_mfn(d_area->pg))) +
>>> +                                   offset,
>>> +                          PAGE_SIZE - offset, cd_area, NULL);
>>> +}
>>> +
>>>   static int copy_vpmu(struct vcpu *d_vcpu, struct vcpu *cd_vcpu)
>>>   {
>>>       struct vpmu_struct *d_vpmu = vcpu_vpmu(d_vcpu);
>>> @@ -1709,6 +1727,16 @@ static int copy_vcpu_settings(struct domain *cd, const struct domain *d)
>>>                   return ret;
>>>           }
>>>
>>> +        /* Same for the (physically registered) runstate and time info areas. */
>>> +        ret = copy_guest_area(&cd_vcpu->runstate_guest_area,
>>> +                              &d_vcpu->runstate_guest_area, cd_vcpu, d);
>>> +        if ( ret )
>>> +            return ret;
>>> +        ret = copy_guest_area(&cd_vcpu->arch.time_guest_area,
>>> +                              &d_vcpu->arch.time_guest_area, cd_vcpu, d);
>>> +        if ( ret )
>>> +            return ret;
>>> +
>>>           ret = copy_vpmu(d_vcpu, cd_vcpu);
>>>           if ( ret )
>>>               return ret;
>>> @@ -1950,7 +1978,10 @@ int mem_sharing_fork_reset(struct domain *d, bool reset_state,
>>>
>>>    state:
>>>       if ( reset_state )
>>> +    {
>>>           rc = copy_settings(d, pd);
>>> +        /* TBD: What to do here with -ERESTART? */
>>
>> There is no situation where we get an -ERESTART here currently. Is
>> map_guest_area expected to run into situations where it fails with
>> that rc?
> 
> Yes, there's a spin_trylock() call that will result in
> map_guest_area() returning -ERESTART.
> 
>> If yes we might need a lock in place so we can block until it
>> can succeed.
> 
> I'm not sure whether returning -ERESTART can actually happen in
> map_guest_area() for the fork case: the child domain is still paused
> at this point, so there can't be concurrent guest hypercalls that
> would also cause the domain hypercall_deadlock_mutex to be acquired.

hypercall_deadlock_mutex is also acquired by domctls. So, I believe, 
-ERESTART could be returned if the toolstack is also issuing domclt 
right at the same time as forking.

Cheers,
Tamas K Lengyel Oct. 3, 2023, 8:25 p.m. UTC | #3
On Tue, Oct 3, 2023 at 11:07 AM Julien Grall <julien@xen.org> wrote:
>
> Hi Roger,
>
> On 03/10/2023 15:29, Roger Pau Monné wrote:
> > On Tue, Oct 03, 2023 at 09:53:11AM -0400, Tamas K Lengyel wrote:
>
> Tamas, somehow your e-mails don't show up in my inbox (even if I am
> CCed) or even on lore.kernel.org/xen-devel. It is not even in my SPAM
> folder.

Thanks, I've switched mailservers, hopefully that resolves the issue.

>
> >> On Mon, Oct 2, 2023 at 11:13 AM Roger Pau Monne <roger.pau@citrix.com> wrote:
> >>>
> >>> From: Jan Beulich <jbeulich@suse.com>
> >>>
> >>> In preparation of the introduction of new vCPU operations allowing to
> >>> register the respective areas (one of the two is x86-specific) by
> >>> guest-physical address, add the necessary fork handling (with the
> >>> backing function yet to be filled in).
> >>>
> >>> Signed-off-by: Jan Beulich <jbeulich@suse.com>
> >>> Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
> >>> ---
> >>> Changes since v4:
> >>>   - Rely on map_guest_area() to populate the child p2m if necessary.
> >>> ---
> >>>   xen/arch/x86/mm/mem_sharing.c | 31 +++++++++++++++++++++++++++++++
> >>>   xen/common/domain.c           |  7 +++++++
> >>>   2 files changed, 38 insertions(+)
> >>>
> >>> diff --git a/xen/arch/x86/mm/mem_sharing.c b/xen/arch/x86/mm/mem_sharing.c
> >>> index 5f8f1fb4d871..99cf001fd70f 100644
> >>> --- a/xen/arch/x86/mm/mem_sharing.c
> >>> +++ b/xen/arch/x86/mm/mem_sharing.c
> >>> @@ -1641,6 +1641,24 @@ static void copy_vcpu_nonreg_state(struct vcpu *d_vcpu, struct vcpu *cd_vcpu)
> >>>       hvm_set_nonreg_state(cd_vcpu, &nrs);
> >>>   }
> >>>
> >>> +static int copy_guest_area(struct guest_area *cd_area,
> >>> +                           const struct guest_area *d_area,
> >>> +                           struct vcpu *cd_vcpu,
> >>> +                           const struct domain *d)
> >>> +{
> >>> +    unsigned int offset;
> >>> +
> >>> +    /* Check if no area to map, or already mapped. */
> >>> +    if ( !d_area->pg || cd_area->pg )
> >>> +        return 0;
> >>> +
> >>> +    offset = PAGE_OFFSET(d_area->map);
> >>> +    return map_guest_area(cd_vcpu, gfn_to_gaddr(
> >>> +                                       mfn_to_gfn(d, page_to_mfn(d_area->pg))) +
> >>> +                                   offset,
> >>> +                          PAGE_SIZE - offset, cd_area, NULL);
> >>> +}
> >>> +
> >>>   static int copy_vpmu(struct vcpu *d_vcpu, struct vcpu *cd_vcpu)
> >>>   {
> >>>       struct vpmu_struct *d_vpmu = vcpu_vpmu(d_vcpu);
> >>> @@ -1709,6 +1727,16 @@ static int copy_vcpu_settings(struct domain *cd, const struct domain *d)
> >>>                   return ret;
> >>>           }
> >>>
> >>> +        /* Same for the (physically registered) runstate and time info areas. */
> >>> +        ret = copy_guest_area(&cd_vcpu->runstate_guest_area,
> >>> +                              &d_vcpu->runstate_guest_area, cd_vcpu, d);
> >>> +        if ( ret )
> >>> +            return ret;
> >>> +        ret = copy_guest_area(&cd_vcpu->arch.time_guest_area,
> >>> +                              &d_vcpu->arch.time_guest_area, cd_vcpu, d);
> >>> +        if ( ret )
> >>> +            return ret;
> >>> +
> >>>           ret = copy_vpmu(d_vcpu, cd_vcpu);
> >>>           if ( ret )
> >>>               return ret;
> >>> @@ -1950,7 +1978,10 @@ int mem_sharing_fork_reset(struct domain *d, bool reset_state,
> >>>
> >>>    state:
> >>>       if ( reset_state )
> >>> +    {
> >>>           rc = copy_settings(d, pd);
> >>> +        /* TBD: What to do here with -ERESTART? */
> >>
> >> There is no situation where we get an -ERESTART here currently. Is
> >> map_guest_area expected to run into situations where it fails with
> >> that rc?
> >
> > Yes, there's a spin_trylock() call that will result in
> > map_guest_area() returning -ERESTART.
> >
> >> If yes we might need a lock in place so we can block until it
> >> can succeed.
> >
> > I'm not sure whether returning -ERESTART can actually happen in
> > map_guest_area() for the fork case: the child domain is still paused
> > at this point, so there can't be concurrent guest hypercalls that
> > would also cause the domain hypercall_deadlock_mutex to be acquired.

Perhaps turning it into an ASSERT(rc != -ERESTART) is the way to go at
this point. If we run into any cases where it trips we can reason it
out.

> hypercall_deadlock_mutex is also acquired by domctls. So, I believe,
> -ERESTART could be returned if the toolstack is also issuing domclt
> right at the same time as forking.

That's not a concern in this path, only toolstack can start the reset
so we can assume it can coordinate not to have another toolstack
messing with the fork at the same time.

Thanks,
Tamas
Roger Pau Monné Oct. 4, 2023, 8:20 a.m. UTC | #4
On Tue, Oct 03, 2023 at 04:25:58PM -0400, Tamas K Lengyel wrote:
> On Tue, Oct 3, 2023 at 11:07 AM Julien Grall <julien@xen.org> wrote:
> >
> > Hi Roger,
> >
> > On 03/10/2023 15:29, Roger Pau Monné wrote:
> > > On Tue, Oct 03, 2023 at 09:53:11AM -0400, Tamas K Lengyel wrote:
> >
> > Tamas, somehow your e-mails don't show up in my inbox (even if I am
> > CCed) or even on lore.kernel.org/xen-devel. It is not even in my SPAM
> > folder.
> 
> Thanks, I've switched mailservers, hopefully that resolves the issue.
> 
> >
> > >> On Mon, Oct 2, 2023 at 11:13 AM Roger Pau Monne <roger.pau@citrix.com> wrote:
> > >>>
> > >>> From: Jan Beulich <jbeulich@suse.com>
> > >>>
> > >>> In preparation of the introduction of new vCPU operations allowing to
> > >>> register the respective areas (one of the two is x86-specific) by
> > >>> guest-physical address, add the necessary fork handling (with the
> > >>> backing function yet to be filled in).
> > >>>
> > >>> Signed-off-by: Jan Beulich <jbeulich@suse.com>
> > >>> Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
> > >>> ---
> > >>> Changes since v4:
> > >>>   - Rely on map_guest_area() to populate the child p2m if necessary.
> > >>> ---
> > >>>   xen/arch/x86/mm/mem_sharing.c | 31 +++++++++++++++++++++++++++++++
> > >>>   xen/common/domain.c           |  7 +++++++
> > >>>   2 files changed, 38 insertions(+)
> > >>>
> > >>> diff --git a/xen/arch/x86/mm/mem_sharing.c b/xen/arch/x86/mm/mem_sharing.c
> > >>> index 5f8f1fb4d871..99cf001fd70f 100644
> > >>> --- a/xen/arch/x86/mm/mem_sharing.c
> > >>> +++ b/xen/arch/x86/mm/mem_sharing.c
> > >>> @@ -1641,6 +1641,24 @@ static void copy_vcpu_nonreg_state(struct vcpu *d_vcpu, struct vcpu *cd_vcpu)
> > >>>       hvm_set_nonreg_state(cd_vcpu, &nrs);
> > >>>   }
> > >>>
> > >>> +static int copy_guest_area(struct guest_area *cd_area,
> > >>> +                           const struct guest_area *d_area,
> > >>> +                           struct vcpu *cd_vcpu,
> > >>> +                           const struct domain *d)
> > >>> +{
> > >>> +    unsigned int offset;
> > >>> +
> > >>> +    /* Check if no area to map, or already mapped. */
> > >>> +    if ( !d_area->pg || cd_area->pg )
> > >>> +        return 0;
> > >>> +
> > >>> +    offset = PAGE_OFFSET(d_area->map);
> > >>> +    return map_guest_area(cd_vcpu, gfn_to_gaddr(
> > >>> +                                       mfn_to_gfn(d, page_to_mfn(d_area->pg))) +
> > >>> +                                   offset,
> > >>> +                          PAGE_SIZE - offset, cd_area, NULL);
> > >>> +}
> > >>> +
> > >>>   static int copy_vpmu(struct vcpu *d_vcpu, struct vcpu *cd_vcpu)
> > >>>   {
> > >>>       struct vpmu_struct *d_vpmu = vcpu_vpmu(d_vcpu);
> > >>> @@ -1709,6 +1727,16 @@ static int copy_vcpu_settings(struct domain *cd, const struct domain *d)
> > >>>                   return ret;
> > >>>           }
> > >>>
> > >>> +        /* Same for the (physically registered) runstate and time info areas. */
> > >>> +        ret = copy_guest_area(&cd_vcpu->runstate_guest_area,
> > >>> +                              &d_vcpu->runstate_guest_area, cd_vcpu, d);
> > >>> +        if ( ret )
> > >>> +            return ret;
> > >>> +        ret = copy_guest_area(&cd_vcpu->arch.time_guest_area,
> > >>> +                              &d_vcpu->arch.time_guest_area, cd_vcpu, d);
> > >>> +        if ( ret )
> > >>> +            return ret;
> > >>> +
> > >>>           ret = copy_vpmu(d_vcpu, cd_vcpu);
> > >>>           if ( ret )
> > >>>               return ret;
> > >>> @@ -1950,7 +1978,10 @@ int mem_sharing_fork_reset(struct domain *d, bool reset_state,
> > >>>
> > >>>    state:
> > >>>       if ( reset_state )
> > >>> +    {
> > >>>           rc = copy_settings(d, pd);
> > >>> +        /* TBD: What to do here with -ERESTART? */
> > >>
> > >> There is no situation where we get an -ERESTART here currently. Is
> > >> map_guest_area expected to run into situations where it fails with
> > >> that rc?
> > >
> > > Yes, there's a spin_trylock() call that will result in
> > > map_guest_area() returning -ERESTART.
> > >
> > >> If yes we might need a lock in place so we can block until it
> > >> can succeed.
> > >
> > > I'm not sure whether returning -ERESTART can actually happen in
> > > map_guest_area() for the fork case: the child domain is still paused
> > > at this point, so there can't be concurrent guest hypercalls that
> > > would also cause the domain hypercall_deadlock_mutex to be acquired.
> 
> Perhaps turning it into an ASSERT(rc != -ERESTART) is the way to go at
> this point. If we run into any cases where it trips we can reason it
> out.

In order to avoid possibly returning -ERESTART (which should never be
seen by hypercall callers) we might want to convert it to -EBUSY and
let the caller pick the pieces.

Thanks, Roger.
Julien Grall Oct. 4, 2023, 11:01 a.m. UTC | #5
Hi Roger,

On 04/10/2023 09:20, Roger Pau Monné wrote:
> On Tue, Oct 03, 2023 at 04:25:58PM -0400, Tamas K Lengyel wrote:
>> On Tue, Oct 3, 2023 at 11:07 AM Julien Grall <julien@xen.org> wrote:
>>>
>>> Hi Roger,
>>>
>>> On 03/10/2023 15:29, Roger Pau Monné wrote:
>>>> On Tue, Oct 03, 2023 at 09:53:11AM -0400, Tamas K Lengyel wrote:
>>>
>>> Tamas, somehow your e-mails don't show up in my inbox (even if I am
>>> CCed) or even on lore.kernel.org/xen-devel. It is not even in my SPAM
>>> folder.
>>
>> Thanks, I've switched mailservers, hopefully that resolves the issue.

It did. Thanks!

>>
>>>
>>>>> On Mon, Oct 2, 2023 at 11:13 AM Roger Pau Monne <roger.pau@citrix.com> wrote:
>>>>>>
>>>>>> From: Jan Beulich <jbeulich@suse.com>
>>>>>>
>>>>>> In preparation of the introduction of new vCPU operations allowing to
>>>>>> register the respective areas (one of the two is x86-specific) by
>>>>>> guest-physical address, add the necessary fork handling (with the
>>>>>> backing function yet to be filled in).
>>>>>>
>>>>>> Signed-off-by: Jan Beulich <jbeulich@suse.com>
>>>>>> Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
>>>>>> ---
>>>>>> Changes since v4:
>>>>>>    - Rely on map_guest_area() to populate the child p2m if necessary.
>>>>>> ---
>>>>>>    xen/arch/x86/mm/mem_sharing.c | 31 +++++++++++++++++++++++++++++++
>>>>>>    xen/common/domain.c           |  7 +++++++
>>>>>>    2 files changed, 38 insertions(+)
>>>>>>
>>>>>> diff --git a/xen/arch/x86/mm/mem_sharing.c b/xen/arch/x86/mm/mem_sharing.c
>>>>>> index 5f8f1fb4d871..99cf001fd70f 100644
>>>>>> --- a/xen/arch/x86/mm/mem_sharing.c
>>>>>> +++ b/xen/arch/x86/mm/mem_sharing.c
>>>>>> @@ -1641,6 +1641,24 @@ static void copy_vcpu_nonreg_state(struct vcpu *d_vcpu, struct vcpu *cd_vcpu)
>>>>>>        hvm_set_nonreg_state(cd_vcpu, &nrs);
>>>>>>    }
>>>>>>
>>>>>> +static int copy_guest_area(struct guest_area *cd_area,
>>>>>> +                           const struct guest_area *d_area,
>>>>>> +                           struct vcpu *cd_vcpu,
>>>>>> +                           const struct domain *d)
>>>>>> +{
>>>>>> +    unsigned int offset;
>>>>>> +
>>>>>> +    /* Check if no area to map, or already mapped. */
>>>>>> +    if ( !d_area->pg || cd_area->pg )
>>>>>> +        return 0;
>>>>>> +
>>>>>> +    offset = PAGE_OFFSET(d_area->map);
>>>>>> +    return map_guest_area(cd_vcpu, gfn_to_gaddr(
>>>>>> +                                       mfn_to_gfn(d, page_to_mfn(d_area->pg))) +
>>>>>> +                                   offset,
>>>>>> +                          PAGE_SIZE - offset, cd_area, NULL);
>>>>>> +}
>>>>>> +
>>>>>>    static int copy_vpmu(struct vcpu *d_vcpu, struct vcpu *cd_vcpu)
>>>>>>    {
>>>>>>        struct vpmu_struct *d_vpmu = vcpu_vpmu(d_vcpu);
>>>>>> @@ -1709,6 +1727,16 @@ static int copy_vcpu_settings(struct domain *cd, const struct domain *d)
>>>>>>                    return ret;
>>>>>>            }
>>>>>>
>>>>>> +        /* Same for the (physically registered) runstate and time info areas. */
>>>>>> +        ret = copy_guest_area(&cd_vcpu->runstate_guest_area,
>>>>>> +                              &d_vcpu->runstate_guest_area, cd_vcpu, d);
>>>>>> +        if ( ret )
>>>>>> +            return ret;
>>>>>> +        ret = copy_guest_area(&cd_vcpu->arch.time_guest_area,
>>>>>> +                              &d_vcpu->arch.time_guest_area, cd_vcpu, d);
>>>>>> +        if ( ret )
>>>>>> +            return ret;
>>>>>> +
>>>>>>            ret = copy_vpmu(d_vcpu, cd_vcpu);
>>>>>>            if ( ret )
>>>>>>                return ret;
>>>>>> @@ -1950,7 +1978,10 @@ int mem_sharing_fork_reset(struct domain *d, bool reset_state,
>>>>>>
>>>>>>     state:
>>>>>>        if ( reset_state )
>>>>>> +    {
>>>>>>            rc = copy_settings(d, pd);
>>>>>> +        /* TBD: What to do here with -ERESTART? */
>>>>>
>>>>> There is no situation where we get an -ERESTART here currently. Is
>>>>> map_guest_area expected to run into situations where it fails with
>>>>> that rc?
>>>>
>>>> Yes, there's a spin_trylock() call that will result in
>>>> map_guest_area() returning -ERESTART.
>>>>
>>>>> If yes we might need a lock in place so we can block until it
>>>>> can succeed.
>>>>
>>>> I'm not sure whether returning -ERESTART can actually happen in
>>>> map_guest_area() for the fork case: the child domain is still paused
>>>> at this point, so there can't be concurrent guest hypercalls that
>>>> would also cause the domain hypercall_deadlock_mutex to be acquired.
>>
>> Perhaps turning it into an ASSERT(rc != -ERESTART) is the way to go at
>> this point. If we run into any cases where it trips we can reason it
>> out.
> 
> In order to avoid possibly returning -ERESTART (which should never be
> seen by hypercall callers) we might want to convert it to -EBUSY and
> let the caller pick the pieces.

I realize this is a matter of taste. I think EAGAIN is a better 
conversion for ERESTART because we effectively want to caller to try again.

Cheers,
Roger Pau Monné Oct. 4, 2023, 1 p.m. UTC | #6
On Wed, Oct 04, 2023 at 12:01:21PM +0100, Julien Grall wrote:
> Hi Roger,
> 
> On 04/10/2023 09:20, Roger Pau Monné wrote:
> > On Tue, Oct 03, 2023 at 04:25:58PM -0400, Tamas K Lengyel wrote:
> > > On Tue, Oct 3, 2023 at 11:07 AM Julien Grall <julien@xen.org> wrote:
> > > > 
> > > > Hi Roger,
> > > > 
> > > > On 03/10/2023 15:29, Roger Pau Monné wrote:
> > > > > On Tue, Oct 03, 2023 at 09:53:11AM -0400, Tamas K Lengyel wrote:
> > > > 
> > > > Tamas, somehow your e-mails don't show up in my inbox (even if I am
> > > > CCed) or even on lore.kernel.org/xen-devel. It is not even in my SPAM
> > > > folder.
> > > 
> > > Thanks, I've switched mailservers, hopefully that resolves the issue.
> 
> It did. Thanks!
> 
> > > 
> > > > 
> > > > > > On Mon, Oct 2, 2023 at 11:13 AM Roger Pau Monne <roger.pau@citrix.com> wrote:
> > > > > > > 
> > > > > > > From: Jan Beulich <jbeulich@suse.com>
> > > > > > > 
> > > > > > > In preparation of the introduction of new vCPU operations allowing to
> > > > > > > register the respective areas (one of the two is x86-specific) by
> > > > > > > guest-physical address, add the necessary fork handling (with the
> > > > > > > backing function yet to be filled in).
> > > > > > > 
> > > > > > > Signed-off-by: Jan Beulich <jbeulich@suse.com>
> > > > > > > Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
> > > > > > > ---
> > > > > > > Changes since v4:
> > > > > > >    - Rely on map_guest_area() to populate the child p2m if necessary.
> > > > > > > ---
> > > > > > >    xen/arch/x86/mm/mem_sharing.c | 31 +++++++++++++++++++++++++++++++
> > > > > > >    xen/common/domain.c           |  7 +++++++
> > > > > > >    2 files changed, 38 insertions(+)
> > > > > > > 
> > > > > > > diff --git a/xen/arch/x86/mm/mem_sharing.c b/xen/arch/x86/mm/mem_sharing.c
> > > > > > > index 5f8f1fb4d871..99cf001fd70f 100644
> > > > > > > --- a/xen/arch/x86/mm/mem_sharing.c
> > > > > > > +++ b/xen/arch/x86/mm/mem_sharing.c
> > > > > > > @@ -1641,6 +1641,24 @@ static void copy_vcpu_nonreg_state(struct vcpu *d_vcpu, struct vcpu *cd_vcpu)
> > > > > > >        hvm_set_nonreg_state(cd_vcpu, &nrs);
> > > > > > >    }
> > > > > > > 
> > > > > > > +static int copy_guest_area(struct guest_area *cd_area,
> > > > > > > +                           const struct guest_area *d_area,
> > > > > > > +                           struct vcpu *cd_vcpu,
> > > > > > > +                           const struct domain *d)
> > > > > > > +{
> > > > > > > +    unsigned int offset;
> > > > > > > +
> > > > > > > +    /* Check if no area to map, or already mapped. */
> > > > > > > +    if ( !d_area->pg || cd_area->pg )
> > > > > > > +        return 0;
> > > > > > > +
> > > > > > > +    offset = PAGE_OFFSET(d_area->map);
> > > > > > > +    return map_guest_area(cd_vcpu, gfn_to_gaddr(
> > > > > > > +                                       mfn_to_gfn(d, page_to_mfn(d_area->pg))) +
> > > > > > > +                                   offset,
> > > > > > > +                          PAGE_SIZE - offset, cd_area, NULL);
> > > > > > > +}
> > > > > > > +
> > > > > > >    static int copy_vpmu(struct vcpu *d_vcpu, struct vcpu *cd_vcpu)
> > > > > > >    {
> > > > > > >        struct vpmu_struct *d_vpmu = vcpu_vpmu(d_vcpu);
> > > > > > > @@ -1709,6 +1727,16 @@ static int copy_vcpu_settings(struct domain *cd, const struct domain *d)
> > > > > > >                    return ret;
> > > > > > >            }
> > > > > > > 
> > > > > > > +        /* Same for the (physically registered) runstate and time info areas. */
> > > > > > > +        ret = copy_guest_area(&cd_vcpu->runstate_guest_area,
> > > > > > > +                              &d_vcpu->runstate_guest_area, cd_vcpu, d);
> > > > > > > +        if ( ret )
> > > > > > > +            return ret;
> > > > > > > +        ret = copy_guest_area(&cd_vcpu->arch.time_guest_area,
> > > > > > > +                              &d_vcpu->arch.time_guest_area, cd_vcpu, d);
> > > > > > > +        if ( ret )
> > > > > > > +            return ret;
> > > > > > > +
> > > > > > >            ret = copy_vpmu(d_vcpu, cd_vcpu);
> > > > > > >            if ( ret )
> > > > > > >                return ret;
> > > > > > > @@ -1950,7 +1978,10 @@ int mem_sharing_fork_reset(struct domain *d, bool reset_state,
> > > > > > > 
> > > > > > >     state:
> > > > > > >        if ( reset_state )
> > > > > > > +    {
> > > > > > >            rc = copy_settings(d, pd);
> > > > > > > +        /* TBD: What to do here with -ERESTART? */
> > > > > > 
> > > > > > There is no situation where we get an -ERESTART here currently. Is
> > > > > > map_guest_area expected to run into situations where it fails with
> > > > > > that rc?
> > > > > 
> > > > > Yes, there's a spin_trylock() call that will result in
> > > > > map_guest_area() returning -ERESTART.
> > > > > 
> > > > > > If yes we might need a lock in place so we can block until it
> > > > > > can succeed.
> > > > > 
> > > > > I'm not sure whether returning -ERESTART can actually happen in
> > > > > map_guest_area() for the fork case: the child domain is still paused
> > > > > at this point, so there can't be concurrent guest hypercalls that
> > > > > would also cause the domain hypercall_deadlock_mutex to be acquired.
> > > 
> > > Perhaps turning it into an ASSERT(rc != -ERESTART) is the way to go at
> > > this point. If we run into any cases where it trips we can reason it
> > > out.
> > 
> > In order to avoid possibly returning -ERESTART (which should never be
> > seen by hypercall callers) we might want to convert it to -EBUSY and
> > let the caller pick the pieces.
> 
> I realize this is a matter of taste. I think EAGAIN is a better conversion
> for ERESTART because we effectively want to caller to try again.

That's fine with me, but could we leave adding such translation to a
further patch?

I would rather modify Jans code as less as possible.

Thanks, Roger.
Julien Grall Oct. 4, 2023, 1:06 p.m. UTC | #7
On 04/10/2023 14:00, Roger Pau Monné wrote:
> On Wed, Oct 04, 2023 at 12:01:21PM +0100, Julien Grall wrote:
>> Hi Roger,
>>
>> On 04/10/2023 09:20, Roger Pau Monné wrote:
>>> On Tue, Oct 03, 2023 at 04:25:58PM -0400, Tamas K Lengyel wrote:
>>>> On Tue, Oct 3, 2023 at 11:07 AM Julien Grall <julien@xen.org> wrote:
>>>>>
>>>>> Hi Roger,
>>>>>
>>>>> On 03/10/2023 15:29, Roger Pau Monné wrote:
>>>>>> On Tue, Oct 03, 2023 at 09:53:11AM -0400, Tamas K Lengyel wrote:
>>>>>
>>>>> Tamas, somehow your e-mails don't show up in my inbox (even if I am
>>>>> CCed) or even on lore.kernel.org/xen-devel. It is not even in my SPAM
>>>>> folder.
>>>>
>>>> Thanks, I've switched mailservers, hopefully that resolves the issue.
>>
>> It did. Thanks!
>>
>>>>
>>>>>
>>>>>>> On Mon, Oct 2, 2023 at 11:13 AM Roger Pau Monne <roger.pau@citrix.com> wrote:
>>>>>>>>
>>>>>>>> From: Jan Beulich <jbeulich@suse.com>
>>>>>>>>
>>>>>>>> In preparation of the introduction of new vCPU operations allowing to
>>>>>>>> register the respective areas (one of the two is x86-specific) by
>>>>>>>> guest-physical address, add the necessary fork handling (with the
>>>>>>>> backing function yet to be filled in).
>>>>>>>>
>>>>>>>> Signed-off-by: Jan Beulich <jbeulich@suse.com>
>>>>>>>> Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
>>>>>>>> ---
>>>>>>>> Changes since v4:
>>>>>>>>     - Rely on map_guest_area() to populate the child p2m if necessary.
>>>>>>>> ---
>>>>>>>>     xen/arch/x86/mm/mem_sharing.c | 31 +++++++++++++++++++++++++++++++
>>>>>>>>     xen/common/domain.c           |  7 +++++++
>>>>>>>>     2 files changed, 38 insertions(+)
>>>>>>>>
>>>>>>>> diff --git a/xen/arch/x86/mm/mem_sharing.c b/xen/arch/x86/mm/mem_sharing.c
>>>>>>>> index 5f8f1fb4d871..99cf001fd70f 100644
>>>>>>>> --- a/xen/arch/x86/mm/mem_sharing.c
>>>>>>>> +++ b/xen/arch/x86/mm/mem_sharing.c
>>>>>>>> @@ -1641,6 +1641,24 @@ static void copy_vcpu_nonreg_state(struct vcpu *d_vcpu, struct vcpu *cd_vcpu)
>>>>>>>>         hvm_set_nonreg_state(cd_vcpu, &nrs);
>>>>>>>>     }
>>>>>>>>
>>>>>>>> +static int copy_guest_area(struct guest_area *cd_area,
>>>>>>>> +                           const struct guest_area *d_area,
>>>>>>>> +                           struct vcpu *cd_vcpu,
>>>>>>>> +                           const struct domain *d)
>>>>>>>> +{
>>>>>>>> +    unsigned int offset;
>>>>>>>> +
>>>>>>>> +    /* Check if no area to map, or already mapped. */
>>>>>>>> +    if ( !d_area->pg || cd_area->pg )
>>>>>>>> +        return 0;
>>>>>>>> +
>>>>>>>> +    offset = PAGE_OFFSET(d_area->map);
>>>>>>>> +    return map_guest_area(cd_vcpu, gfn_to_gaddr(
>>>>>>>> +                                       mfn_to_gfn(d, page_to_mfn(d_area->pg))) +
>>>>>>>> +                                   offset,
>>>>>>>> +                          PAGE_SIZE - offset, cd_area, NULL);
>>>>>>>> +}
>>>>>>>> +
>>>>>>>>     static int copy_vpmu(struct vcpu *d_vcpu, struct vcpu *cd_vcpu)
>>>>>>>>     {
>>>>>>>>         struct vpmu_struct *d_vpmu = vcpu_vpmu(d_vcpu);
>>>>>>>> @@ -1709,6 +1727,16 @@ static int copy_vcpu_settings(struct domain *cd, const struct domain *d)
>>>>>>>>                     return ret;
>>>>>>>>             }
>>>>>>>>
>>>>>>>> +        /* Same for the (physically registered) runstate and time info areas. */
>>>>>>>> +        ret = copy_guest_area(&cd_vcpu->runstate_guest_area,
>>>>>>>> +                              &d_vcpu->runstate_guest_area, cd_vcpu, d);
>>>>>>>> +        if ( ret )
>>>>>>>> +            return ret;
>>>>>>>> +        ret = copy_guest_area(&cd_vcpu->arch.time_guest_area,
>>>>>>>> +                              &d_vcpu->arch.time_guest_area, cd_vcpu, d);
>>>>>>>> +        if ( ret )
>>>>>>>> +            return ret;
>>>>>>>> +
>>>>>>>>             ret = copy_vpmu(d_vcpu, cd_vcpu);
>>>>>>>>             if ( ret )
>>>>>>>>                 return ret;
>>>>>>>> @@ -1950,7 +1978,10 @@ int mem_sharing_fork_reset(struct domain *d, bool reset_state,
>>>>>>>>
>>>>>>>>      state:
>>>>>>>>         if ( reset_state )
>>>>>>>> +    {
>>>>>>>>             rc = copy_settings(d, pd);
>>>>>>>> +        /* TBD: What to do here with -ERESTART? */
>>>>>>>
>>>>>>> There is no situation where we get an -ERESTART here currently. Is
>>>>>>> map_guest_area expected to run into situations where it fails with
>>>>>>> that rc?
>>>>>>
>>>>>> Yes, there's a spin_trylock() call that will result in
>>>>>> map_guest_area() returning -ERESTART.
>>>>>>
>>>>>>> If yes we might need a lock in place so we can block until it
>>>>>>> can succeed.
>>>>>>
>>>>>> I'm not sure whether returning -ERESTART can actually happen in
>>>>>> map_guest_area() for the fork case: the child domain is still paused
>>>>>> at this point, so there can't be concurrent guest hypercalls that
>>>>>> would also cause the domain hypercall_deadlock_mutex to be acquired.
>>>>
>>>> Perhaps turning it into an ASSERT(rc != -ERESTART) is the way to go at
>>>> this point. If we run into any cases where it trips we can reason it
>>>> out.
>>>
>>> In order to avoid possibly returning -ERESTART (which should never be
>>> seen by hypercall callers) we might want to convert it to -EBUSY and
>>> let the caller pick the pieces.
>>
>> I realize this is a matter of taste. I think EAGAIN is a better conversion
>> for ERESTART because we effectively want to caller to try again.
> 
> That's fine with me, but could we leave adding such translation to a
> further patch?

Wouldn't this mean that -ERESTART could be returned to the caller? If 
yes, then I think this should be handled here. Otherwise, we will be 
exposing a value that is not supposed to be exposed.

Cheers,
diff mbox series

Patch

diff --git a/xen/arch/x86/mm/mem_sharing.c b/xen/arch/x86/mm/mem_sharing.c
index 5f8f1fb4d871..99cf001fd70f 100644
--- a/xen/arch/x86/mm/mem_sharing.c
+++ b/xen/arch/x86/mm/mem_sharing.c
@@ -1641,6 +1641,24 @@  static void copy_vcpu_nonreg_state(struct vcpu *d_vcpu, struct vcpu *cd_vcpu)
     hvm_set_nonreg_state(cd_vcpu, &nrs);
 }
 
+static int copy_guest_area(struct guest_area *cd_area,
+                           const struct guest_area *d_area,
+                           struct vcpu *cd_vcpu,
+                           const struct domain *d)
+{
+    unsigned int offset;
+
+    /* Check if no area to map, or already mapped. */
+    if ( !d_area->pg || cd_area->pg )
+        return 0;
+
+    offset = PAGE_OFFSET(d_area->map);
+    return map_guest_area(cd_vcpu, gfn_to_gaddr(
+                                       mfn_to_gfn(d, page_to_mfn(d_area->pg))) +
+                                   offset,
+                          PAGE_SIZE - offset, cd_area, NULL);
+}
+
 static int copy_vpmu(struct vcpu *d_vcpu, struct vcpu *cd_vcpu)
 {
     struct vpmu_struct *d_vpmu = vcpu_vpmu(d_vcpu);
@@ -1709,6 +1727,16 @@  static int copy_vcpu_settings(struct domain *cd, const struct domain *d)
                 return ret;
         }
 
+        /* Same for the (physically registered) runstate and time info areas. */
+        ret = copy_guest_area(&cd_vcpu->runstate_guest_area,
+                              &d_vcpu->runstate_guest_area, cd_vcpu, d);
+        if ( ret )
+            return ret;
+        ret = copy_guest_area(&cd_vcpu->arch.time_guest_area,
+                              &d_vcpu->arch.time_guest_area, cd_vcpu, d);
+        if ( ret )
+            return ret;
+
         ret = copy_vpmu(d_vcpu, cd_vcpu);
         if ( ret )
             return ret;
@@ -1950,7 +1978,10 @@  int mem_sharing_fork_reset(struct domain *d, bool reset_state,
 
  state:
     if ( reset_state )
+    {
         rc = copy_settings(d, pd);
+        /* TBD: What to do here with -ERESTART? */
+    }
 
     domain_unpause(d);
 
diff --git a/xen/common/domain.c b/xen/common/domain.c
index d4958ec5e149..47fc90271901 100644
--- a/xen/common/domain.c
+++ b/xen/common/domain.c
@@ -1601,6 +1601,13 @@  void unmap_vcpu_info(struct vcpu *v)
     put_page_and_type(mfn_to_page(mfn));
 }
 
+int map_guest_area(struct vcpu *v, paddr_t gaddr, unsigned int size,
+                   struct guest_area *area,
+                   void (*populate)(void *dst, struct vcpu *v))
+{
+    return -EOPNOTSUPP;
+}
+
 /*
  * This is only intended to be used for domain cleanup (or more generally only
  * with at least the respective vCPU, if it's not the current one, reliably