Message ID | 20231002151127.71020-7-roger.pau@citrix.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | runstate/time area registration by (guest) physical address | expand |
On Tue, Oct 03, 2023 at 09:53:11AM -0400, Tamas K Lengyel wrote: > On Mon, Oct 2, 2023 at 11:13 AM Roger Pau Monne <roger.pau@citrix.com> wrote: > > > > From: Jan Beulich <jbeulich@suse.com> > > > > In preparation of the introduction of new vCPU operations allowing to > > register the respective areas (one of the two is x86-specific) by > > guest-physical address, add the necessary fork handling (with the > > backing function yet to be filled in). > > > > Signed-off-by: Jan Beulich <jbeulich@suse.com> > > Signed-off-by: Roger Pau Monné <roger.pau@citrix.com> > > --- > > Changes since v4: > > - Rely on map_guest_area() to populate the child p2m if necessary. > > --- > > xen/arch/x86/mm/mem_sharing.c | 31 +++++++++++++++++++++++++++++++ > > xen/common/domain.c | 7 +++++++ > > 2 files changed, 38 insertions(+) > > > > diff --git a/xen/arch/x86/mm/mem_sharing.c b/xen/arch/x86/mm/mem_sharing.c > > index 5f8f1fb4d871..99cf001fd70f 100644 > > --- a/xen/arch/x86/mm/mem_sharing.c > > +++ b/xen/arch/x86/mm/mem_sharing.c > > @@ -1641,6 +1641,24 @@ static void copy_vcpu_nonreg_state(struct vcpu *d_vcpu, struct vcpu *cd_vcpu) > > hvm_set_nonreg_state(cd_vcpu, &nrs); > > } > > > > +static int copy_guest_area(struct guest_area *cd_area, > > + const struct guest_area *d_area, > > + struct vcpu *cd_vcpu, > > + const struct domain *d) > > +{ > > + unsigned int offset; > > + > > + /* Check if no area to map, or already mapped. */ > > + if ( !d_area->pg || cd_area->pg ) > > + return 0; > > + > > + offset = PAGE_OFFSET(d_area->map); > > + return map_guest_area(cd_vcpu, gfn_to_gaddr( > > + mfn_to_gfn(d, page_to_mfn(d_area->pg))) + > > + offset, > > + PAGE_SIZE - offset, cd_area, NULL); > > +} > > + > > static int copy_vpmu(struct vcpu *d_vcpu, struct vcpu *cd_vcpu) > > { > > struct vpmu_struct *d_vpmu = vcpu_vpmu(d_vcpu); > > @@ -1709,6 +1727,16 @@ static int copy_vcpu_settings(struct domain *cd, const struct domain *d) > > return ret; > > } > > > > + /* Same for the (physically registered) runstate and time info areas. */ > > + ret = copy_guest_area(&cd_vcpu->runstate_guest_area, > > + &d_vcpu->runstate_guest_area, cd_vcpu, d); > > + if ( ret ) > > + return ret; > > + ret = copy_guest_area(&cd_vcpu->arch.time_guest_area, > > + &d_vcpu->arch.time_guest_area, cd_vcpu, d); > > + if ( ret ) > > + return ret; > > + > > ret = copy_vpmu(d_vcpu, cd_vcpu); > > if ( ret ) > > return ret; > > @@ -1950,7 +1978,10 @@ int mem_sharing_fork_reset(struct domain *d, bool reset_state, > > > > state: > > if ( reset_state ) > > + { > > rc = copy_settings(d, pd); > > + /* TBD: What to do here with -ERESTART? */ > > There is no situation where we get an -ERESTART here currently. Is > map_guest_area expected to run into situations where it fails with > that rc? Yes, there's a spin_trylock() call that will result in map_guest_area() returning -ERESTART. > If yes we might need a lock in place so we can block until it > can succeed. I'm not sure whether returning -ERESTART can actually happen in map_guest_area() for the fork case: the child domain is still paused at this point, so there can't be concurrent guest hypercalls that would also cause the domain hypercall_deadlock_mutex to be acquired. The comment was added by Jan, so I cannot be certain about the intention, neither I would like to misinterpret his words. My understanding is that future uses of copy_settings() might indeed need to report -ERESTART, and that it would need to be propagated for proper hypercall continuations at some point. Thanks, Roger.
Hi Roger, On 03/10/2023 15:29, Roger Pau Monné wrote: > On Tue, Oct 03, 2023 at 09:53:11AM -0400, Tamas K Lengyel wrote: Tamas, somehow your e-mails don't show up in my inbox (even if I am CCed) or even on lore.kernel.org/xen-devel. It is not even in my SPAM folder. >> On Mon, Oct 2, 2023 at 11:13 AM Roger Pau Monne <roger.pau@citrix.com> wrote: >>> >>> From: Jan Beulich <jbeulich@suse.com> >>> >>> In preparation of the introduction of new vCPU operations allowing to >>> register the respective areas (one of the two is x86-specific) by >>> guest-physical address, add the necessary fork handling (with the >>> backing function yet to be filled in). >>> >>> Signed-off-by: Jan Beulich <jbeulich@suse.com> >>> Signed-off-by: Roger Pau Monné <roger.pau@citrix.com> >>> --- >>> Changes since v4: >>> - Rely on map_guest_area() to populate the child p2m if necessary. >>> --- >>> xen/arch/x86/mm/mem_sharing.c | 31 +++++++++++++++++++++++++++++++ >>> xen/common/domain.c | 7 +++++++ >>> 2 files changed, 38 insertions(+) >>> >>> diff --git a/xen/arch/x86/mm/mem_sharing.c b/xen/arch/x86/mm/mem_sharing.c >>> index 5f8f1fb4d871..99cf001fd70f 100644 >>> --- a/xen/arch/x86/mm/mem_sharing.c >>> +++ b/xen/arch/x86/mm/mem_sharing.c >>> @@ -1641,6 +1641,24 @@ static void copy_vcpu_nonreg_state(struct vcpu *d_vcpu, struct vcpu *cd_vcpu) >>> hvm_set_nonreg_state(cd_vcpu, &nrs); >>> } >>> >>> +static int copy_guest_area(struct guest_area *cd_area, >>> + const struct guest_area *d_area, >>> + struct vcpu *cd_vcpu, >>> + const struct domain *d) >>> +{ >>> + unsigned int offset; >>> + >>> + /* Check if no area to map, or already mapped. */ >>> + if ( !d_area->pg || cd_area->pg ) >>> + return 0; >>> + >>> + offset = PAGE_OFFSET(d_area->map); >>> + return map_guest_area(cd_vcpu, gfn_to_gaddr( >>> + mfn_to_gfn(d, page_to_mfn(d_area->pg))) + >>> + offset, >>> + PAGE_SIZE - offset, cd_area, NULL); >>> +} >>> + >>> static int copy_vpmu(struct vcpu *d_vcpu, struct vcpu *cd_vcpu) >>> { >>> struct vpmu_struct *d_vpmu = vcpu_vpmu(d_vcpu); >>> @@ -1709,6 +1727,16 @@ static int copy_vcpu_settings(struct domain *cd, const struct domain *d) >>> return ret; >>> } >>> >>> + /* Same for the (physically registered) runstate and time info areas. */ >>> + ret = copy_guest_area(&cd_vcpu->runstate_guest_area, >>> + &d_vcpu->runstate_guest_area, cd_vcpu, d); >>> + if ( ret ) >>> + return ret; >>> + ret = copy_guest_area(&cd_vcpu->arch.time_guest_area, >>> + &d_vcpu->arch.time_guest_area, cd_vcpu, d); >>> + if ( ret ) >>> + return ret; >>> + >>> ret = copy_vpmu(d_vcpu, cd_vcpu); >>> if ( ret ) >>> return ret; >>> @@ -1950,7 +1978,10 @@ int mem_sharing_fork_reset(struct domain *d, bool reset_state, >>> >>> state: >>> if ( reset_state ) >>> + { >>> rc = copy_settings(d, pd); >>> + /* TBD: What to do here with -ERESTART? */ >> >> There is no situation where we get an -ERESTART here currently. Is >> map_guest_area expected to run into situations where it fails with >> that rc? > > Yes, there's a spin_trylock() call that will result in > map_guest_area() returning -ERESTART. > >> If yes we might need a lock in place so we can block until it >> can succeed. > > I'm not sure whether returning -ERESTART can actually happen in > map_guest_area() for the fork case: the child domain is still paused > at this point, so there can't be concurrent guest hypercalls that > would also cause the domain hypercall_deadlock_mutex to be acquired. hypercall_deadlock_mutex is also acquired by domctls. So, I believe, -ERESTART could be returned if the toolstack is also issuing domclt right at the same time as forking. Cheers,
On Tue, Oct 3, 2023 at 11:07 AM Julien Grall <julien@xen.org> wrote: > > Hi Roger, > > On 03/10/2023 15:29, Roger Pau Monné wrote: > > On Tue, Oct 03, 2023 at 09:53:11AM -0400, Tamas K Lengyel wrote: > > Tamas, somehow your e-mails don't show up in my inbox (even if I am > CCed) or even on lore.kernel.org/xen-devel. It is not even in my SPAM > folder. Thanks, I've switched mailservers, hopefully that resolves the issue. > > >> On Mon, Oct 2, 2023 at 11:13 AM Roger Pau Monne <roger.pau@citrix.com> wrote: > >>> > >>> From: Jan Beulich <jbeulich@suse.com> > >>> > >>> In preparation of the introduction of new vCPU operations allowing to > >>> register the respective areas (one of the two is x86-specific) by > >>> guest-physical address, add the necessary fork handling (with the > >>> backing function yet to be filled in). > >>> > >>> Signed-off-by: Jan Beulich <jbeulich@suse.com> > >>> Signed-off-by: Roger Pau Monné <roger.pau@citrix.com> > >>> --- > >>> Changes since v4: > >>> - Rely on map_guest_area() to populate the child p2m if necessary. > >>> --- > >>> xen/arch/x86/mm/mem_sharing.c | 31 +++++++++++++++++++++++++++++++ > >>> xen/common/domain.c | 7 +++++++ > >>> 2 files changed, 38 insertions(+) > >>> > >>> diff --git a/xen/arch/x86/mm/mem_sharing.c b/xen/arch/x86/mm/mem_sharing.c > >>> index 5f8f1fb4d871..99cf001fd70f 100644 > >>> --- a/xen/arch/x86/mm/mem_sharing.c > >>> +++ b/xen/arch/x86/mm/mem_sharing.c > >>> @@ -1641,6 +1641,24 @@ static void copy_vcpu_nonreg_state(struct vcpu *d_vcpu, struct vcpu *cd_vcpu) > >>> hvm_set_nonreg_state(cd_vcpu, &nrs); > >>> } > >>> > >>> +static int copy_guest_area(struct guest_area *cd_area, > >>> + const struct guest_area *d_area, > >>> + struct vcpu *cd_vcpu, > >>> + const struct domain *d) > >>> +{ > >>> + unsigned int offset; > >>> + > >>> + /* Check if no area to map, or already mapped. */ > >>> + if ( !d_area->pg || cd_area->pg ) > >>> + return 0; > >>> + > >>> + offset = PAGE_OFFSET(d_area->map); > >>> + return map_guest_area(cd_vcpu, gfn_to_gaddr( > >>> + mfn_to_gfn(d, page_to_mfn(d_area->pg))) + > >>> + offset, > >>> + PAGE_SIZE - offset, cd_area, NULL); > >>> +} > >>> + > >>> static int copy_vpmu(struct vcpu *d_vcpu, struct vcpu *cd_vcpu) > >>> { > >>> struct vpmu_struct *d_vpmu = vcpu_vpmu(d_vcpu); > >>> @@ -1709,6 +1727,16 @@ static int copy_vcpu_settings(struct domain *cd, const struct domain *d) > >>> return ret; > >>> } > >>> > >>> + /* Same for the (physically registered) runstate and time info areas. */ > >>> + ret = copy_guest_area(&cd_vcpu->runstate_guest_area, > >>> + &d_vcpu->runstate_guest_area, cd_vcpu, d); > >>> + if ( ret ) > >>> + return ret; > >>> + ret = copy_guest_area(&cd_vcpu->arch.time_guest_area, > >>> + &d_vcpu->arch.time_guest_area, cd_vcpu, d); > >>> + if ( ret ) > >>> + return ret; > >>> + > >>> ret = copy_vpmu(d_vcpu, cd_vcpu); > >>> if ( ret ) > >>> return ret; > >>> @@ -1950,7 +1978,10 @@ int mem_sharing_fork_reset(struct domain *d, bool reset_state, > >>> > >>> state: > >>> if ( reset_state ) > >>> + { > >>> rc = copy_settings(d, pd); > >>> + /* TBD: What to do here with -ERESTART? */ > >> > >> There is no situation where we get an -ERESTART here currently. Is > >> map_guest_area expected to run into situations where it fails with > >> that rc? > > > > Yes, there's a spin_trylock() call that will result in > > map_guest_area() returning -ERESTART. > > > >> If yes we might need a lock in place so we can block until it > >> can succeed. > > > > I'm not sure whether returning -ERESTART can actually happen in > > map_guest_area() for the fork case: the child domain is still paused > > at this point, so there can't be concurrent guest hypercalls that > > would also cause the domain hypercall_deadlock_mutex to be acquired. Perhaps turning it into an ASSERT(rc != -ERESTART) is the way to go at this point. If we run into any cases where it trips we can reason it out. > hypercall_deadlock_mutex is also acquired by domctls. So, I believe, > -ERESTART could be returned if the toolstack is also issuing domclt > right at the same time as forking. That's not a concern in this path, only toolstack can start the reset so we can assume it can coordinate not to have another toolstack messing with the fork at the same time. Thanks, Tamas
On Tue, Oct 03, 2023 at 04:25:58PM -0400, Tamas K Lengyel wrote: > On Tue, Oct 3, 2023 at 11:07 AM Julien Grall <julien@xen.org> wrote: > > > > Hi Roger, > > > > On 03/10/2023 15:29, Roger Pau Monné wrote: > > > On Tue, Oct 03, 2023 at 09:53:11AM -0400, Tamas K Lengyel wrote: > > > > Tamas, somehow your e-mails don't show up in my inbox (even if I am > > CCed) or even on lore.kernel.org/xen-devel. It is not even in my SPAM > > folder. > > Thanks, I've switched mailservers, hopefully that resolves the issue. > > > > > >> On Mon, Oct 2, 2023 at 11:13 AM Roger Pau Monne <roger.pau@citrix.com> wrote: > > >>> > > >>> From: Jan Beulich <jbeulich@suse.com> > > >>> > > >>> In preparation of the introduction of new vCPU operations allowing to > > >>> register the respective areas (one of the two is x86-specific) by > > >>> guest-physical address, add the necessary fork handling (with the > > >>> backing function yet to be filled in). > > >>> > > >>> Signed-off-by: Jan Beulich <jbeulich@suse.com> > > >>> Signed-off-by: Roger Pau Monné <roger.pau@citrix.com> > > >>> --- > > >>> Changes since v4: > > >>> - Rely on map_guest_area() to populate the child p2m if necessary. > > >>> --- > > >>> xen/arch/x86/mm/mem_sharing.c | 31 +++++++++++++++++++++++++++++++ > > >>> xen/common/domain.c | 7 +++++++ > > >>> 2 files changed, 38 insertions(+) > > >>> > > >>> diff --git a/xen/arch/x86/mm/mem_sharing.c b/xen/arch/x86/mm/mem_sharing.c > > >>> index 5f8f1fb4d871..99cf001fd70f 100644 > > >>> --- a/xen/arch/x86/mm/mem_sharing.c > > >>> +++ b/xen/arch/x86/mm/mem_sharing.c > > >>> @@ -1641,6 +1641,24 @@ static void copy_vcpu_nonreg_state(struct vcpu *d_vcpu, struct vcpu *cd_vcpu) > > >>> hvm_set_nonreg_state(cd_vcpu, &nrs); > > >>> } > > >>> > > >>> +static int copy_guest_area(struct guest_area *cd_area, > > >>> + const struct guest_area *d_area, > > >>> + struct vcpu *cd_vcpu, > > >>> + const struct domain *d) > > >>> +{ > > >>> + unsigned int offset; > > >>> + > > >>> + /* Check if no area to map, or already mapped. */ > > >>> + if ( !d_area->pg || cd_area->pg ) > > >>> + return 0; > > >>> + > > >>> + offset = PAGE_OFFSET(d_area->map); > > >>> + return map_guest_area(cd_vcpu, gfn_to_gaddr( > > >>> + mfn_to_gfn(d, page_to_mfn(d_area->pg))) + > > >>> + offset, > > >>> + PAGE_SIZE - offset, cd_area, NULL); > > >>> +} > > >>> + > > >>> static int copy_vpmu(struct vcpu *d_vcpu, struct vcpu *cd_vcpu) > > >>> { > > >>> struct vpmu_struct *d_vpmu = vcpu_vpmu(d_vcpu); > > >>> @@ -1709,6 +1727,16 @@ static int copy_vcpu_settings(struct domain *cd, const struct domain *d) > > >>> return ret; > > >>> } > > >>> > > >>> + /* Same for the (physically registered) runstate and time info areas. */ > > >>> + ret = copy_guest_area(&cd_vcpu->runstate_guest_area, > > >>> + &d_vcpu->runstate_guest_area, cd_vcpu, d); > > >>> + if ( ret ) > > >>> + return ret; > > >>> + ret = copy_guest_area(&cd_vcpu->arch.time_guest_area, > > >>> + &d_vcpu->arch.time_guest_area, cd_vcpu, d); > > >>> + if ( ret ) > > >>> + return ret; > > >>> + > > >>> ret = copy_vpmu(d_vcpu, cd_vcpu); > > >>> if ( ret ) > > >>> return ret; > > >>> @@ -1950,7 +1978,10 @@ int mem_sharing_fork_reset(struct domain *d, bool reset_state, > > >>> > > >>> state: > > >>> if ( reset_state ) > > >>> + { > > >>> rc = copy_settings(d, pd); > > >>> + /* TBD: What to do here with -ERESTART? */ > > >> > > >> There is no situation where we get an -ERESTART here currently. Is > > >> map_guest_area expected to run into situations where it fails with > > >> that rc? > > > > > > Yes, there's a spin_trylock() call that will result in > > > map_guest_area() returning -ERESTART. > > > > > >> If yes we might need a lock in place so we can block until it > > >> can succeed. > > > > > > I'm not sure whether returning -ERESTART can actually happen in > > > map_guest_area() for the fork case: the child domain is still paused > > > at this point, so there can't be concurrent guest hypercalls that > > > would also cause the domain hypercall_deadlock_mutex to be acquired. > > Perhaps turning it into an ASSERT(rc != -ERESTART) is the way to go at > this point. If we run into any cases where it trips we can reason it > out. In order to avoid possibly returning -ERESTART (which should never be seen by hypercall callers) we might want to convert it to -EBUSY and let the caller pick the pieces. Thanks, Roger.
Hi Roger, On 04/10/2023 09:20, Roger Pau Monné wrote: > On Tue, Oct 03, 2023 at 04:25:58PM -0400, Tamas K Lengyel wrote: >> On Tue, Oct 3, 2023 at 11:07 AM Julien Grall <julien@xen.org> wrote: >>> >>> Hi Roger, >>> >>> On 03/10/2023 15:29, Roger Pau Monné wrote: >>>> On Tue, Oct 03, 2023 at 09:53:11AM -0400, Tamas K Lengyel wrote: >>> >>> Tamas, somehow your e-mails don't show up in my inbox (even if I am >>> CCed) or even on lore.kernel.org/xen-devel. It is not even in my SPAM >>> folder. >> >> Thanks, I've switched mailservers, hopefully that resolves the issue. It did. Thanks! >> >>> >>>>> On Mon, Oct 2, 2023 at 11:13 AM Roger Pau Monne <roger.pau@citrix.com> wrote: >>>>>> >>>>>> From: Jan Beulich <jbeulich@suse.com> >>>>>> >>>>>> In preparation of the introduction of new vCPU operations allowing to >>>>>> register the respective areas (one of the two is x86-specific) by >>>>>> guest-physical address, add the necessary fork handling (with the >>>>>> backing function yet to be filled in). >>>>>> >>>>>> Signed-off-by: Jan Beulich <jbeulich@suse.com> >>>>>> Signed-off-by: Roger Pau Monné <roger.pau@citrix.com> >>>>>> --- >>>>>> Changes since v4: >>>>>> - Rely on map_guest_area() to populate the child p2m if necessary. >>>>>> --- >>>>>> xen/arch/x86/mm/mem_sharing.c | 31 +++++++++++++++++++++++++++++++ >>>>>> xen/common/domain.c | 7 +++++++ >>>>>> 2 files changed, 38 insertions(+) >>>>>> >>>>>> diff --git a/xen/arch/x86/mm/mem_sharing.c b/xen/arch/x86/mm/mem_sharing.c >>>>>> index 5f8f1fb4d871..99cf001fd70f 100644 >>>>>> --- a/xen/arch/x86/mm/mem_sharing.c >>>>>> +++ b/xen/arch/x86/mm/mem_sharing.c >>>>>> @@ -1641,6 +1641,24 @@ static void copy_vcpu_nonreg_state(struct vcpu *d_vcpu, struct vcpu *cd_vcpu) >>>>>> hvm_set_nonreg_state(cd_vcpu, &nrs); >>>>>> } >>>>>> >>>>>> +static int copy_guest_area(struct guest_area *cd_area, >>>>>> + const struct guest_area *d_area, >>>>>> + struct vcpu *cd_vcpu, >>>>>> + const struct domain *d) >>>>>> +{ >>>>>> + unsigned int offset; >>>>>> + >>>>>> + /* Check if no area to map, or already mapped. */ >>>>>> + if ( !d_area->pg || cd_area->pg ) >>>>>> + return 0; >>>>>> + >>>>>> + offset = PAGE_OFFSET(d_area->map); >>>>>> + return map_guest_area(cd_vcpu, gfn_to_gaddr( >>>>>> + mfn_to_gfn(d, page_to_mfn(d_area->pg))) + >>>>>> + offset, >>>>>> + PAGE_SIZE - offset, cd_area, NULL); >>>>>> +} >>>>>> + >>>>>> static int copy_vpmu(struct vcpu *d_vcpu, struct vcpu *cd_vcpu) >>>>>> { >>>>>> struct vpmu_struct *d_vpmu = vcpu_vpmu(d_vcpu); >>>>>> @@ -1709,6 +1727,16 @@ static int copy_vcpu_settings(struct domain *cd, const struct domain *d) >>>>>> return ret; >>>>>> } >>>>>> >>>>>> + /* Same for the (physically registered) runstate and time info areas. */ >>>>>> + ret = copy_guest_area(&cd_vcpu->runstate_guest_area, >>>>>> + &d_vcpu->runstate_guest_area, cd_vcpu, d); >>>>>> + if ( ret ) >>>>>> + return ret; >>>>>> + ret = copy_guest_area(&cd_vcpu->arch.time_guest_area, >>>>>> + &d_vcpu->arch.time_guest_area, cd_vcpu, d); >>>>>> + if ( ret ) >>>>>> + return ret; >>>>>> + >>>>>> ret = copy_vpmu(d_vcpu, cd_vcpu); >>>>>> if ( ret ) >>>>>> return ret; >>>>>> @@ -1950,7 +1978,10 @@ int mem_sharing_fork_reset(struct domain *d, bool reset_state, >>>>>> >>>>>> state: >>>>>> if ( reset_state ) >>>>>> + { >>>>>> rc = copy_settings(d, pd); >>>>>> + /* TBD: What to do here with -ERESTART? */ >>>>> >>>>> There is no situation where we get an -ERESTART here currently. Is >>>>> map_guest_area expected to run into situations where it fails with >>>>> that rc? >>>> >>>> Yes, there's a spin_trylock() call that will result in >>>> map_guest_area() returning -ERESTART. >>>> >>>>> If yes we might need a lock in place so we can block until it >>>>> can succeed. >>>> >>>> I'm not sure whether returning -ERESTART can actually happen in >>>> map_guest_area() for the fork case: the child domain is still paused >>>> at this point, so there can't be concurrent guest hypercalls that >>>> would also cause the domain hypercall_deadlock_mutex to be acquired. >> >> Perhaps turning it into an ASSERT(rc != -ERESTART) is the way to go at >> this point. If we run into any cases where it trips we can reason it >> out. > > In order to avoid possibly returning -ERESTART (which should never be > seen by hypercall callers) we might want to convert it to -EBUSY and > let the caller pick the pieces. I realize this is a matter of taste. I think EAGAIN is a better conversion for ERESTART because we effectively want to caller to try again. Cheers,
On Wed, Oct 04, 2023 at 12:01:21PM +0100, Julien Grall wrote: > Hi Roger, > > On 04/10/2023 09:20, Roger Pau Monné wrote: > > On Tue, Oct 03, 2023 at 04:25:58PM -0400, Tamas K Lengyel wrote: > > > On Tue, Oct 3, 2023 at 11:07 AM Julien Grall <julien@xen.org> wrote: > > > > > > > > Hi Roger, > > > > > > > > On 03/10/2023 15:29, Roger Pau Monné wrote: > > > > > On Tue, Oct 03, 2023 at 09:53:11AM -0400, Tamas K Lengyel wrote: > > > > > > > > Tamas, somehow your e-mails don't show up in my inbox (even if I am > > > > CCed) or even on lore.kernel.org/xen-devel. It is not even in my SPAM > > > > folder. > > > > > > Thanks, I've switched mailservers, hopefully that resolves the issue. > > It did. Thanks! > > > > > > > > > > > > > > On Mon, Oct 2, 2023 at 11:13 AM Roger Pau Monne <roger.pau@citrix.com> wrote: > > > > > > > > > > > > > > From: Jan Beulich <jbeulich@suse.com> > > > > > > > > > > > > > > In preparation of the introduction of new vCPU operations allowing to > > > > > > > register the respective areas (one of the two is x86-specific) by > > > > > > > guest-physical address, add the necessary fork handling (with the > > > > > > > backing function yet to be filled in). > > > > > > > > > > > > > > Signed-off-by: Jan Beulich <jbeulich@suse.com> > > > > > > > Signed-off-by: Roger Pau Monné <roger.pau@citrix.com> > > > > > > > --- > > > > > > > Changes since v4: > > > > > > > - Rely on map_guest_area() to populate the child p2m if necessary. > > > > > > > --- > > > > > > > xen/arch/x86/mm/mem_sharing.c | 31 +++++++++++++++++++++++++++++++ > > > > > > > xen/common/domain.c | 7 +++++++ > > > > > > > 2 files changed, 38 insertions(+) > > > > > > > > > > > > > > diff --git a/xen/arch/x86/mm/mem_sharing.c b/xen/arch/x86/mm/mem_sharing.c > > > > > > > index 5f8f1fb4d871..99cf001fd70f 100644 > > > > > > > --- a/xen/arch/x86/mm/mem_sharing.c > > > > > > > +++ b/xen/arch/x86/mm/mem_sharing.c > > > > > > > @@ -1641,6 +1641,24 @@ static void copy_vcpu_nonreg_state(struct vcpu *d_vcpu, struct vcpu *cd_vcpu) > > > > > > > hvm_set_nonreg_state(cd_vcpu, &nrs); > > > > > > > } > > > > > > > > > > > > > > +static int copy_guest_area(struct guest_area *cd_area, > > > > > > > + const struct guest_area *d_area, > > > > > > > + struct vcpu *cd_vcpu, > > > > > > > + const struct domain *d) > > > > > > > +{ > > > > > > > + unsigned int offset; > > > > > > > + > > > > > > > + /* Check if no area to map, or already mapped. */ > > > > > > > + if ( !d_area->pg || cd_area->pg ) > > > > > > > + return 0; > > > > > > > + > > > > > > > + offset = PAGE_OFFSET(d_area->map); > > > > > > > + return map_guest_area(cd_vcpu, gfn_to_gaddr( > > > > > > > + mfn_to_gfn(d, page_to_mfn(d_area->pg))) + > > > > > > > + offset, > > > > > > > + PAGE_SIZE - offset, cd_area, NULL); > > > > > > > +} > > > > > > > + > > > > > > > static int copy_vpmu(struct vcpu *d_vcpu, struct vcpu *cd_vcpu) > > > > > > > { > > > > > > > struct vpmu_struct *d_vpmu = vcpu_vpmu(d_vcpu); > > > > > > > @@ -1709,6 +1727,16 @@ static int copy_vcpu_settings(struct domain *cd, const struct domain *d) > > > > > > > return ret; > > > > > > > } > > > > > > > > > > > > > > + /* Same for the (physically registered) runstate and time info areas. */ > > > > > > > + ret = copy_guest_area(&cd_vcpu->runstate_guest_area, > > > > > > > + &d_vcpu->runstate_guest_area, cd_vcpu, d); > > > > > > > + if ( ret ) > > > > > > > + return ret; > > > > > > > + ret = copy_guest_area(&cd_vcpu->arch.time_guest_area, > > > > > > > + &d_vcpu->arch.time_guest_area, cd_vcpu, d); > > > > > > > + if ( ret ) > > > > > > > + return ret; > > > > > > > + > > > > > > > ret = copy_vpmu(d_vcpu, cd_vcpu); > > > > > > > if ( ret ) > > > > > > > return ret; > > > > > > > @@ -1950,7 +1978,10 @@ int mem_sharing_fork_reset(struct domain *d, bool reset_state, > > > > > > > > > > > > > > state: > > > > > > > if ( reset_state ) > > > > > > > + { > > > > > > > rc = copy_settings(d, pd); > > > > > > > + /* TBD: What to do here with -ERESTART? */ > > > > > > > > > > > > There is no situation where we get an -ERESTART here currently. Is > > > > > > map_guest_area expected to run into situations where it fails with > > > > > > that rc? > > > > > > > > > > Yes, there's a spin_trylock() call that will result in > > > > > map_guest_area() returning -ERESTART. > > > > > > > > > > > If yes we might need a lock in place so we can block until it > > > > > > can succeed. > > > > > > > > > > I'm not sure whether returning -ERESTART can actually happen in > > > > > map_guest_area() for the fork case: the child domain is still paused > > > > > at this point, so there can't be concurrent guest hypercalls that > > > > > would also cause the domain hypercall_deadlock_mutex to be acquired. > > > > > > Perhaps turning it into an ASSERT(rc != -ERESTART) is the way to go at > > > this point. If we run into any cases where it trips we can reason it > > > out. > > > > In order to avoid possibly returning -ERESTART (which should never be > > seen by hypercall callers) we might want to convert it to -EBUSY and > > let the caller pick the pieces. > > I realize this is a matter of taste. I think EAGAIN is a better conversion > for ERESTART because we effectively want to caller to try again. That's fine with me, but could we leave adding such translation to a further patch? I would rather modify Jans code as less as possible. Thanks, Roger.
On 04/10/2023 14:00, Roger Pau Monné wrote: > On Wed, Oct 04, 2023 at 12:01:21PM +0100, Julien Grall wrote: >> Hi Roger, >> >> On 04/10/2023 09:20, Roger Pau Monné wrote: >>> On Tue, Oct 03, 2023 at 04:25:58PM -0400, Tamas K Lengyel wrote: >>>> On Tue, Oct 3, 2023 at 11:07 AM Julien Grall <julien@xen.org> wrote: >>>>> >>>>> Hi Roger, >>>>> >>>>> On 03/10/2023 15:29, Roger Pau Monné wrote: >>>>>> On Tue, Oct 03, 2023 at 09:53:11AM -0400, Tamas K Lengyel wrote: >>>>> >>>>> Tamas, somehow your e-mails don't show up in my inbox (even if I am >>>>> CCed) or even on lore.kernel.org/xen-devel. It is not even in my SPAM >>>>> folder. >>>> >>>> Thanks, I've switched mailservers, hopefully that resolves the issue. >> >> It did. Thanks! >> >>>> >>>>> >>>>>>> On Mon, Oct 2, 2023 at 11:13 AM Roger Pau Monne <roger.pau@citrix.com> wrote: >>>>>>>> >>>>>>>> From: Jan Beulich <jbeulich@suse.com> >>>>>>>> >>>>>>>> In preparation of the introduction of new vCPU operations allowing to >>>>>>>> register the respective areas (one of the two is x86-specific) by >>>>>>>> guest-physical address, add the necessary fork handling (with the >>>>>>>> backing function yet to be filled in). >>>>>>>> >>>>>>>> Signed-off-by: Jan Beulich <jbeulich@suse.com> >>>>>>>> Signed-off-by: Roger Pau Monné <roger.pau@citrix.com> >>>>>>>> --- >>>>>>>> Changes since v4: >>>>>>>> - Rely on map_guest_area() to populate the child p2m if necessary. >>>>>>>> --- >>>>>>>> xen/arch/x86/mm/mem_sharing.c | 31 +++++++++++++++++++++++++++++++ >>>>>>>> xen/common/domain.c | 7 +++++++ >>>>>>>> 2 files changed, 38 insertions(+) >>>>>>>> >>>>>>>> diff --git a/xen/arch/x86/mm/mem_sharing.c b/xen/arch/x86/mm/mem_sharing.c >>>>>>>> index 5f8f1fb4d871..99cf001fd70f 100644 >>>>>>>> --- a/xen/arch/x86/mm/mem_sharing.c >>>>>>>> +++ b/xen/arch/x86/mm/mem_sharing.c >>>>>>>> @@ -1641,6 +1641,24 @@ static void copy_vcpu_nonreg_state(struct vcpu *d_vcpu, struct vcpu *cd_vcpu) >>>>>>>> hvm_set_nonreg_state(cd_vcpu, &nrs); >>>>>>>> } >>>>>>>> >>>>>>>> +static int copy_guest_area(struct guest_area *cd_area, >>>>>>>> + const struct guest_area *d_area, >>>>>>>> + struct vcpu *cd_vcpu, >>>>>>>> + const struct domain *d) >>>>>>>> +{ >>>>>>>> + unsigned int offset; >>>>>>>> + >>>>>>>> + /* Check if no area to map, or already mapped. */ >>>>>>>> + if ( !d_area->pg || cd_area->pg ) >>>>>>>> + return 0; >>>>>>>> + >>>>>>>> + offset = PAGE_OFFSET(d_area->map); >>>>>>>> + return map_guest_area(cd_vcpu, gfn_to_gaddr( >>>>>>>> + mfn_to_gfn(d, page_to_mfn(d_area->pg))) + >>>>>>>> + offset, >>>>>>>> + PAGE_SIZE - offset, cd_area, NULL); >>>>>>>> +} >>>>>>>> + >>>>>>>> static int copy_vpmu(struct vcpu *d_vcpu, struct vcpu *cd_vcpu) >>>>>>>> { >>>>>>>> struct vpmu_struct *d_vpmu = vcpu_vpmu(d_vcpu); >>>>>>>> @@ -1709,6 +1727,16 @@ static int copy_vcpu_settings(struct domain *cd, const struct domain *d) >>>>>>>> return ret; >>>>>>>> } >>>>>>>> >>>>>>>> + /* Same for the (physically registered) runstate and time info areas. */ >>>>>>>> + ret = copy_guest_area(&cd_vcpu->runstate_guest_area, >>>>>>>> + &d_vcpu->runstate_guest_area, cd_vcpu, d); >>>>>>>> + if ( ret ) >>>>>>>> + return ret; >>>>>>>> + ret = copy_guest_area(&cd_vcpu->arch.time_guest_area, >>>>>>>> + &d_vcpu->arch.time_guest_area, cd_vcpu, d); >>>>>>>> + if ( ret ) >>>>>>>> + return ret; >>>>>>>> + >>>>>>>> ret = copy_vpmu(d_vcpu, cd_vcpu); >>>>>>>> if ( ret ) >>>>>>>> return ret; >>>>>>>> @@ -1950,7 +1978,10 @@ int mem_sharing_fork_reset(struct domain *d, bool reset_state, >>>>>>>> >>>>>>>> state: >>>>>>>> if ( reset_state ) >>>>>>>> + { >>>>>>>> rc = copy_settings(d, pd); >>>>>>>> + /* TBD: What to do here with -ERESTART? */ >>>>>>> >>>>>>> There is no situation where we get an -ERESTART here currently. Is >>>>>>> map_guest_area expected to run into situations where it fails with >>>>>>> that rc? >>>>>> >>>>>> Yes, there's a spin_trylock() call that will result in >>>>>> map_guest_area() returning -ERESTART. >>>>>> >>>>>>> If yes we might need a lock in place so we can block until it >>>>>>> can succeed. >>>>>> >>>>>> I'm not sure whether returning -ERESTART can actually happen in >>>>>> map_guest_area() for the fork case: the child domain is still paused >>>>>> at this point, so there can't be concurrent guest hypercalls that >>>>>> would also cause the domain hypercall_deadlock_mutex to be acquired. >>>> >>>> Perhaps turning it into an ASSERT(rc != -ERESTART) is the way to go at >>>> this point. If we run into any cases where it trips we can reason it >>>> out. >>> >>> In order to avoid possibly returning -ERESTART (which should never be >>> seen by hypercall callers) we might want to convert it to -EBUSY and >>> let the caller pick the pieces. >> >> I realize this is a matter of taste. I think EAGAIN is a better conversion >> for ERESTART because we effectively want to caller to try again. > > That's fine with me, but could we leave adding such translation to a > further patch? Wouldn't this mean that -ERESTART could be returned to the caller? If yes, then I think this should be handled here. Otherwise, we will be exposing a value that is not supposed to be exposed. Cheers,
diff --git a/xen/arch/x86/mm/mem_sharing.c b/xen/arch/x86/mm/mem_sharing.c index 5f8f1fb4d871..99cf001fd70f 100644 --- a/xen/arch/x86/mm/mem_sharing.c +++ b/xen/arch/x86/mm/mem_sharing.c @@ -1641,6 +1641,24 @@ static void copy_vcpu_nonreg_state(struct vcpu *d_vcpu, struct vcpu *cd_vcpu) hvm_set_nonreg_state(cd_vcpu, &nrs); } +static int copy_guest_area(struct guest_area *cd_area, + const struct guest_area *d_area, + struct vcpu *cd_vcpu, + const struct domain *d) +{ + unsigned int offset; + + /* Check if no area to map, or already mapped. */ + if ( !d_area->pg || cd_area->pg ) + return 0; + + offset = PAGE_OFFSET(d_area->map); + return map_guest_area(cd_vcpu, gfn_to_gaddr( + mfn_to_gfn(d, page_to_mfn(d_area->pg))) + + offset, + PAGE_SIZE - offset, cd_area, NULL); +} + static int copy_vpmu(struct vcpu *d_vcpu, struct vcpu *cd_vcpu) { struct vpmu_struct *d_vpmu = vcpu_vpmu(d_vcpu); @@ -1709,6 +1727,16 @@ static int copy_vcpu_settings(struct domain *cd, const struct domain *d) return ret; } + /* Same for the (physically registered) runstate and time info areas. */ + ret = copy_guest_area(&cd_vcpu->runstate_guest_area, + &d_vcpu->runstate_guest_area, cd_vcpu, d); + if ( ret ) + return ret; + ret = copy_guest_area(&cd_vcpu->arch.time_guest_area, + &d_vcpu->arch.time_guest_area, cd_vcpu, d); + if ( ret ) + return ret; + ret = copy_vpmu(d_vcpu, cd_vcpu); if ( ret ) return ret; @@ -1950,7 +1978,10 @@ int mem_sharing_fork_reset(struct domain *d, bool reset_state, state: if ( reset_state ) + { rc = copy_settings(d, pd); + /* TBD: What to do here with -ERESTART? */ + } domain_unpause(d); diff --git a/xen/common/domain.c b/xen/common/domain.c index d4958ec5e149..47fc90271901 100644 --- a/xen/common/domain.c +++ b/xen/common/domain.c @@ -1601,6 +1601,13 @@ void unmap_vcpu_info(struct vcpu *v) put_page_and_type(mfn_to_page(mfn)); } +int map_guest_area(struct vcpu *v, paddr_t gaddr, unsigned int size, + struct guest_area *area, + void (*populate)(void *dst, struct vcpu *v)) +{ + return -EOPNOTSUPP; +} + /* * This is only intended to be used for domain cleanup (or more generally only * with at least the respective vCPU, if it's not the current one, reliably