Message ID | 1456822933-25041-3-git-send-email-jgross@suse.com (mailing list archive) |
---|---|
State | New, archived |
>>> On 01.03.16 at 10:02, <JGross@suse.com> wrote:
> @@ -752,14 +766,20 @@ static int vcpu_set_affinity(
>      struct vcpu *v, const cpumask_t *affinity, cpumask_t *which)
>  {
>      spinlock_t *lock;
> +    int ret = 0;
>
>      lock = vcpu_schedule_lock_irq(v);
>
> -    cpumask_copy(which, affinity);
> +    if ( v->affinity_broken )
> +        ret = -EBUSY;
> +    else
> +    {
> +        cpumask_copy(which, affinity);
>
> -    /* Always ask the scheduler to re-evaluate placement
> -     * when changing the affinity */
> -    set_bit(_VPF_migrating, &v->pause_flags);
> +        /* Always ask the scheduler to re-evaluate placement
> +         * when changing the affinity */
> +        set_bit(_VPF_migrating, &v->pause_flags);

When you touch code like this, would it be possible to at once fix
the coding style issues it (the comment in this case) has?

> @@ -978,6 +998,51 @@ void watchdog_domain_destroy(struct domain *d)
>          kill_timer(&d->watchdog_timer[i]);
>  }
>
> +static long do_pin_temp(int cpu)

As expressed before, throughout this patch I dislike the "temp"
naming, when the temporary nature of this operation isn't being
enforced by anything.

Apart from that I (vaguely) recall there having been previous
suggestions in the direction of (temporary), which have got
rejected.

On both points I think we need to have input from the scheduler
maintainers.

> +{
> +    struct vcpu *v = current;
> +    spinlock_t *lock;
> +    long ret = -EINVAL;

"int" seems completely sufficient for both the variable and the
function return type.

> @@ -1087,6 +1152,23 @@ ret_t do_sched_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
>          break;
>      }
>
> +    case SCHEDOP_pin_temp:
> +    {
> +        struct sched_pin_temp sched_pin_temp;
> +
> +        ret = -EFAULT;
> +        if ( copy_from_guest(&sched_pin_temp, arg, 1) )
> +            break;
> +
> +        ret = -EPERM;
> +        if ( !is_hardware_domain(current->domain) )
> +            break;

I'd generally suggest swapping these two.

> --- a/xen/include/public/sched.h
> +++ b/xen/include/public/sched.h
> @@ -118,6 +118,17 @@
>   * With id != 0 and timeout != 0, poke watchdog timer and set new timeout.
>   */
>  #define SCHEDOP_watchdog    6
> +
> +/*
> + * Temporarily pin the current vcpu to one physical cpu or undo that pinning.
> + * @arg == pointer to sched_pin_temp_t structure.
> + *
> + * Setting pcpu to -1 will undo a previous temporary pinning and restore the
> + * previous cpu affinity. The temporary aspect of the pinning isn't enforced
> + * by the hypervisor.

This comment is now out of sync with the code, since you now
accept any negative CPU number as "undo" request.

Jan
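Jan's suggestion to swap the two checks means the privilege check runs before any guest memory is read. A minimal sketch of that ordering follows. It uses stand-ins rather than the real Xen helpers: `copy_from_guest_model()`, `sched_op_pin_model()`, `struct sched_pin_arg_model` and the `guest_copies` counter are all hypothetical names invented for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical argument structure: only the physical cpu number. */
struct sched_pin_arg_model {
    int32_t pcpu;
};

/* Counts how often guest memory was touched (illustration only). */
static unsigned int guest_copies;

/* Stand-in for copy_from_guest(): returns non-zero on failure. */
static int copy_from_guest_model(struct sched_pin_arg_model *dst,
                                 const struct sched_pin_arg_model *guest_src)
{
    guest_copies++;
    if ( guest_src == NULL )
        return 1;
    memcpy(dst, guest_src, sizeof(*dst));
    return 0;
}

/* Privilege check first, guest access second. */
static int sched_op_pin_model(bool caller_is_hwdom,
                              const struct sched_pin_arg_model *guest_arg,
                              int32_t *pcpu_out)
{
    struct sched_pin_arg_model arg;

    assert(pcpu_out != NULL);

    if ( !caller_is_hwdom )
        return -EPERM;          /* unprivileged callers never reach the copy */

    if ( copy_from_guest_model(&arg, guest_arg) )
        return -EFAULT;

    *pcpu_out = arg.pcpu;
    return 0;
}
```

With this ordering an unprivileged caller always sees -EPERM, regardless of whether the pointer it passed is valid.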
On 01/03/16 09:02, Juergen Gross wrote: > Some hardware (e.g. Dell studio 1555 laptops) require SMIs to be > called on physical cpu 0 only. Linux drivers like dcdbas or i8k try > to achieve this by pinning the running thread to cpu 0, but in Dom0 > this is not enough: the vcpu must be pinned to physical cpu 0 via > Xen, too. > > Add a stable hypercall option SCHEDOP_pin_temp to the sched_op > hypercall to achieve this. It is taking a physical cpu number as > parameter. If pinning is possible (the calling domain has the > privilege to make the call and the cpu is available in the domain's > cpupool) the calling vcpu is pinned to the specified cpu. The old > cpu affinity is saved. To undo the temporary pinning a cpu -1 is > specified. This will restore the original cpu affinity for the vcpu. I suggest SCHEDOP_pin_override as a name. David
On 01/03/16 12:27, Jan Beulich wrote: >>>> On 01.03.16 at 10:02, <JGross@suse.com> wrote: >> @@ -752,14 +766,20 @@ static int vcpu_set_affinity( >> struct vcpu *v, const cpumask_t *affinity, cpumask_t *which) >> { >> spinlock_t *lock; >> + int ret = 0; >> >> lock = vcpu_schedule_lock_irq(v); >> >> - cpumask_copy(which, affinity); >> + if ( v->affinity_broken ) >> + ret = -EBUSY; >> + else >> + { >> + cpumask_copy(which, affinity); >> >> - /* Always ask the scheduler to re-evaluate placement >> - * when changing the affinity */ >> - set_bit(_VPF_migrating, &v->pause_flags); >> + /* Always ask the scheduler to re-evaluate placement >> + * when changing the affinity */ >> + set_bit(_VPF_migrating, &v->pause_flags); > > When you touch code like this, would it be possible to at once fix > the coding style issues it (the comment in this case) has? Sure, NP. > >> @@ -978,6 +998,51 @@ void watchdog_domain_destroy(struct domain *d) >> kill_timer(&d->watchdog_timer[i]); >> } >> >> +static long do_pin_temp(int cpu) > > As expressed before, throughout this patch I dislike the "temp" > naming, when the temporary nature of this operation isn't being > enforced by anything. > > Apart from that I (vaguely) recall there having been previous > suggestions in the direction of (temporary), which have got > rejected. > > On both points I think we need to have input from the scheduler > maintainers. Okay. I don't mind changing the name. We should just agree on one. > >> +{ >> + struct vcpu *v = current; >> + spinlock_t *lock; >> + long ret = -EINVAL; > > "int" seems completely sufficient for both the variable and the > function return type. Hmm, yes. 
> >> @@ -1087,6 +1152,23 @@ ret_t do_sched_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg) >> break; >> } >> >> + case SCHEDOP_pin_temp: >> + { >> + struct sched_pin_temp sched_pin_temp; >> + >> + ret = -EFAULT; >> + if ( copy_from_guest(&sched_pin_temp, arg, 1) ) >> + break; >> + >> + ret = -EPERM; >> + if ( !is_hardware_domain(current->domain) ) >> + break; > > I'd generally suggest swapping these two. Will do. > >> --- a/xen/include/public/sched.h >> +++ b/xen/include/public/sched.h >> @@ -118,6 +118,17 @@ >> * With id != 0 and timeout != 0, poke watchdog timer and set new timeout. >> */ >> #define SCHEDOP_watchdog 6 >> + >> +/* >> + * Temporarily pin the current vcpu to one physical cpu or undo that pinning. >> + * @arg == pointer to sched_pin_temp_t structure. >> + * >> + * Setting pcpu to -1 will undo a previous temporary pinning and restore the >> + * previous cpu affinity. The temporary aspect of the pinning isn't enforced >> + * by the hypervisor. > > This comment is now out of sync with the code, since you now > accept any negative CPU number as "undo" request. Will change it. Juergen
On 01/03/16 12:55, David Vrabel wrote: > On 01/03/16 09:02, Juergen Gross wrote: >> Some hardware (e.g. Dell studio 1555 laptops) require SMIs to be >> called on physical cpu 0 only. Linux drivers like dcdbas or i8k try >> to achieve this by pinning the running thread to cpu 0, but in Dom0 >> this is not enough: the vcpu must be pinned to physical cpu 0 via >> Xen, too. >> >> Add a stable hypercall option SCHEDOP_pin_temp to the sched_op >> hypercall to achieve this. It is taking a physical cpu number as >> parameter. If pinning is possible (the calling domain has the >> privilege to make the call and the cpu is available in the domain's >> cpupool) the calling vcpu is pinned to the specified cpu. The old >> cpu affinity is saved. To undo the temporary pinning a cpu -1 is >> specified. This will restore the original cpu affinity for the vcpu. > > I suggest SCHEDOP_pin_override as a name. I'm fine with that. Any objections? Juergen
On Tue, 2016-03-01 at 12:58 +0100, Juergen Gross wrote: > On 01/03/16 12:55, David Vrabel wrote: > > > > On 01/03/16 09:02, Juergen Gross wrote: > > > > > > Some hardware (e.g. Dell studio 1555 laptops) require SMIs to be > > > called on physical cpu 0 only. Linux drivers like dcdbas or i8k > > > try > > > to achieve this by pinning the running thread to cpu 0, but in > > > Dom0 > > > this is not enough: the vcpu must be pinned to physical cpu 0 via > > > Xen, too. > > > > > > Add a stable hypercall option SCHEDOP_pin_temp to the sched_op > > > hypercall to achieve this. It is taking a physical cpu number as > > > parameter. If pinning is possible (the calling domain has the > > > privilege to make the call and the cpu is available in the > > > domain's > > > cpupool) the calling vcpu is pinned to the specified cpu. The old > > > cpu affinity is saved. To undo the temporary pinning a cpu -1 is > > > specified. This will restore the original cpu affinity for the > > > vcpu. > > I suggest SCHEDOP_pin_override as a name. > > I'm fine with that. Any objections? > Not at all. I actually like it a lot. Thanks and Regards, Dario
On 01/03/16 12:15, Dario Faggioli wrote: > On Tue, 2016-03-01 at 12:58 +0100, Juergen Gross wrote: >> On 01/03/16 12:55, David Vrabel wrote: >>> >>> On 01/03/16 09:02, Juergen Gross wrote: >>>> >>>> Some hardware (e.g. Dell studio 1555 laptops) require SMIs to be >>>> called on physical cpu 0 only. Linux drivers like dcdbas or i8k >>>> try >>>> to achieve this by pinning the running thread to cpu 0, but in >>>> Dom0 >>>> this is not enough: the vcpu must be pinned to physical cpu 0 via >>>> Xen, too. >>>> >>>> Add a stable hypercall option SCHEDOP_pin_temp to the sched_op >>>> hypercall to achieve this. It is taking a physical cpu number as >>>> parameter. If pinning is possible (the calling domain has the >>>> privilege to make the call and the cpu is available in the >>>> domain's >>>> cpupool) the calling vcpu is pinned to the specified cpu. The old >>>> cpu affinity is saved. To undo the temporary pinning a cpu -1 is >>>> specified. This will restore the original cpu affinity for the >>>> vcpu. >>> I suggest SCHEDOP_pin_override as a name. >> >> I'm fine with that. Any objections? >> > Not at all. I actually like it a lot. +1 to the name. -George
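For readers following the thread, the semantics described in the patch description (pin to one cpu, save the old affinity, undo with a negative cpu number, refuse ordinary affinity changes with -EBUSY while the override is active) can be modelled in a few lines. This is a toy sketch, not the patch: `mask_t`, `struct vcpu_model`, `set_affinity_model()` and `pin_override_model()` are hypothetical names, and the error codes marked as assumptions are guesses rather than something the thread states.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy cpumask: bit n set means physical cpu n is allowed (max 64 cpus). */
typedef uint64_t mask_t;

/* Hypothetical stand-in for the relevant parts of a vcpu. */
struct vcpu_model {
    mask_t hard_affinity;    /* affinity currently in effect */
    mask_t saved_affinity;   /* only meaningful while affinity_broken */
    bool affinity_broken;    /* set while a pin override is active */
};

/* Regular affinity change: refused while an override is active. */
static int set_affinity_model(struct vcpu_model *v, mask_t affinity)
{
    assert(v != NULL);
    if ( v->affinity_broken )
        return -EBUSY;
    v->hard_affinity = affinity;
    return 0;
}

/*
 * Pin override: cpu >= 0 pins to that cpu, any negative cpu undoes it.
 * pool_cpus models the cpus of the domain's cpupool.
 */
static int pin_override_model(struct vcpu_model *v, mask_t pool_cpus, int cpu)
{
    assert(v != NULL);

    if ( cpu < 0 )
    {
        if ( !v->affinity_broken )
            return -EINVAL;          /* assumption: nothing to undo */
        v->hard_affinity = v->saved_affinity;
        v->affinity_broken = false;
        return 0;
    }

    if ( cpu >= 64 || !(pool_cpus & ((mask_t)1 << cpu)) )
        return -EINVAL;              /* cpu not available to this domain */

    if ( v->affinity_broken )
        return -EBUSY;               /* assumption: no nested overrides */

    v->saved_affinity = v->hard_affinity;
    v->hard_affinity = (mask_t)1 << cpu;
    v->affinity_broken = true;
    return 0;
}
```

A driver-like caller would pin with `pin_override_model(&v, pool, 0)`, issue its SMI, and then undo with any negative cpu value.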
On 01/03/16 09:02, Juergen Gross wrote: > Some hardware (e.g. Dell studio 1555 laptops) require SMIs to be > called on physical cpu 0 only. Linux drivers like dcdbas or i8k try > to achieve this by pinning the running thread to cpu 0, but in Dom0 > this is not enough: the vcpu must be pinned to physical cpu 0 via > Xen, too. > > Add a stable hypercall option SCHEDOP_pin_temp to the sched_op > hypercall to achieve this. It is taking a physical cpu number as > parameter. If pinning is possible (the calling domain has the > privilege to make the call and the cpu is available in the domain's > cpupool) the calling vcpu is pinned to the specified cpu. The old > cpu affinity is saved. To undo the temporary pinning a cpu -1 is > specified. This will restore the original cpu affinity for the vcpu. > > Signed-off-by: Juergen Gross <jgross@suse.com> > --- > V2: - limit operation to hardware domain as suggested by Jan Beulich > - some style issues corrected as requested by Jan Beulich > - use fixed width types in interface as requested by Jan Beulich > - add compat layer checking as requested by Jan Beulich > --- > xen/common/compat/schedule.c | 4 ++ > xen/common/schedule.c | 92 +++++++++++++++++++++++++++++++++++++++++--- > xen/include/public/sched.h | 17 ++++++++ > xen/include/xlat.lst | 1 + > 4 files changed, 109 insertions(+), 5 deletions(-) > > diff --git a/xen/common/compat/schedule.c b/xen/common/compat/schedule.c > index 812c550..73b0f01 100644 > --- a/xen/common/compat/schedule.c > +++ b/xen/common/compat/schedule.c > @@ -10,6 +10,10 @@ > > #define do_sched_op compat_sched_op > > +#define xen_sched_pin_temp sched_pin_temp > +CHECK_sched_pin_temp; > +#undef xen_sched_pin_temp > + > #define xen_sched_shutdown sched_shutdown > CHECK_sched_shutdown; > #undef xen_sched_shutdown > diff --git a/xen/common/schedule.c b/xen/common/schedule.c > index b0d4b18..653f852 100644 > --- a/xen/common/schedule.c > +++ b/xen/common/schedule.c > @@ -271,6 +271,12 @@ int 
sched_move_domain(struct domain *d, struct cpupool *c) > struct scheduler *old_ops; > void *old_domdata; > > + for_each_vcpu ( d, v ) > + { > + if ( v->affinity_broken ) > + return -EBUSY; > + } > + > domdata = SCHED_OP(c->sched, alloc_domdata, d); > if ( domdata == NULL ) > return -ENOMEM; > @@ -669,6 +675,14 @@ int cpu_disable_scheduler(unsigned int cpu) > if ( cpumask_empty(&online_affinity) && > cpumask_test_cpu(cpu, v->cpu_hard_affinity) ) > { > + if ( v->affinity_broken ) > + { > + /* The vcpu is temporarily pinned, can't move it. */ > + vcpu_schedule_unlock_irqrestore(lock, flags, v); > + ret = -EBUSY; > + break; > + } Does this mean that if the user closes the laptop lid while one of these drivers has vcpu0 pinned, that Xen will crash (see xen/arch/x86/smpboot.c:__cpu_disable())? Or is it the OS's job to make sure that all temporary pins are removed before suspending? Also -- have you actually tested the "cpupool move while pinned" functionality to make sure it actually works? There's a weird bit in cpupool_unassign_cpu_helper() where after calling cpu_disable_scheduler(cpu), it unconditionally sets the cpu bit in the cpupool_free_cpus mask, even if it returns an error. That can't be right, even for the existing -EAGAIN case, can it? I see that you have a loop to retry this call several times in the next patch; but what if it fails every time -- what state is the system in? And, in general, what happens if the device driver gets mixed up and forgets to unpin the vcpu? Is the only recourse to reboot your host (or deal with the fact that you can't reconfigure your cpupools)? -George
On 01/03/16 15:52, George Dunlap wrote: > On 01/03/16 09:02, Juergen Gross wrote: >> Some hardware (e.g. Dell studio 1555 laptops) require SMIs to be >> called on physical cpu 0 only. Linux drivers like dcdbas or i8k try >> to achieve this by pinning the running thread to cpu 0, but in Dom0 >> this is not enough: the vcpu must be pinned to physical cpu 0 via >> Xen, too. >> >> Add a stable hypercall option SCHEDOP_pin_temp to the sched_op >> hypercall to achieve this. It is taking a physical cpu number as >> parameter. If pinning is possible (the calling domain has the >> privilege to make the call and the cpu is available in the domain's >> cpupool) the calling vcpu is pinned to the specified cpu. The old >> cpu affinity is saved. To undo the temporary pinning a cpu -1 is >> specified. This will restore the original cpu affinity for the vcpu. >> >> Signed-off-by: Juergen Gross <jgross@suse.com> >> --- >> V2: - limit operation to hardware domain as suggested by Jan Beulich >> - some style issues corrected as requested by Jan Beulich >> - use fixed width types in interface as requested by Jan Beulich >> - add compat layer checking as requested by Jan Beulich >> --- >> xen/common/compat/schedule.c | 4 ++ >> xen/common/schedule.c | 92 +++++++++++++++++++++++++++++++++++++++++--- >> xen/include/public/sched.h | 17 ++++++++ >> xen/include/xlat.lst | 1 + >> 4 files changed, 109 insertions(+), 5 deletions(-) >> >> diff --git a/xen/common/compat/schedule.c b/xen/common/compat/schedule.c >> index 812c550..73b0f01 100644 >> --- a/xen/common/compat/schedule.c >> +++ b/xen/common/compat/schedule.c >> @@ -10,6 +10,10 @@ >> >> #define do_sched_op compat_sched_op >> >> +#define xen_sched_pin_temp sched_pin_temp >> +CHECK_sched_pin_temp; >> +#undef xen_sched_pin_temp >> + >> #define xen_sched_shutdown sched_shutdown >> CHECK_sched_shutdown; >> #undef xen_sched_shutdown >> diff --git a/xen/common/schedule.c b/xen/common/schedule.c >> index b0d4b18..653f852 100644 >> --- 
a/xen/common/schedule.c >> +++ b/xen/common/schedule.c >> @@ -271,6 +271,12 @@ int sched_move_domain(struct domain *d, struct cpupool *c) >> struct scheduler *old_ops; >> void *old_domdata; >> >> + for_each_vcpu ( d, v ) >> + { >> + if ( v->affinity_broken ) >> + return -EBUSY; >> + } >> + >> domdata = SCHED_OP(c->sched, alloc_domdata, d); >> if ( domdata == NULL ) >> return -ENOMEM; >> @@ -669,6 +675,14 @@ int cpu_disable_scheduler(unsigned int cpu) >> if ( cpumask_empty(&online_affinity) && >> cpumask_test_cpu(cpu, v->cpu_hard_affinity) ) >> { >> + if ( v->affinity_broken ) >> + { >> + /* The vcpu is temporarily pinned, can't move it. */ >> + vcpu_schedule_unlock_irqrestore(lock, flags, v); >> + ret = -EBUSY; >> + break; >> + } > > Does this mean that if the user closes the laptop lid while one of these > drivers has vcpu0 pinned, that Xen will crash (see > xen/arch/x86/smpboot.c:__cpu_disable())? Or is it the OS's job to make > sure that all temporary pins are removed before suspending? > > Also -- have you actually tested the "cpupool move while pinned" > functionality to make sure it actually works? There's a weird bit in > cpupool_unassign_cpu_helper() where after calling > cpu_disable_scheduler(cpu), it unconditionally sets the cpu bit in the > cpupool_free_cpus mask, even if it returns an error. That can't be > right, even for the existing -EAGAIN case, can it? > > I see that you have a loop to retry this call several times in the next > patch; but what if it fails every time -- what state is the system in? > > And, in general, what happens if the device driver gets mixed up and > forgets to unpin the vcpu? Is the only recourse to reboot your host (or > deal with the fact that you can't reconfigure your cpupools)? (I should say, I think this probably is the best solution to this problem; I just want to make sure we think about the error cases carefully.) -George
>>> On 01.03.16 at 16:55, <george.dunlap@citrix.com> wrote: > On 01/03/16 15:52, George Dunlap wrote: >> On 01/03/16 09:02, Juergen Gross wrote: >>> --- a/xen/common/schedule.c >>> +++ b/xen/common/schedule.c >>> @@ -271,6 +271,12 @@ int sched_move_domain(struct domain *d, struct cpupool *c) >>> struct scheduler *old_ops; >>> void *old_domdata; >>> >>> + for_each_vcpu ( d, v ) >>> + { >>> + if ( v->affinity_broken ) >>> + return -EBUSY; >>> + } >>> + >>> domdata = SCHED_OP(c->sched, alloc_domdata, d); >>> if ( domdata == NULL ) >>> return -ENOMEM; >>> @@ -669,6 +675,14 @@ int cpu_disable_scheduler(unsigned int cpu) >>> if ( cpumask_empty(&online_affinity) && >>> cpumask_test_cpu(cpu, v->cpu_hard_affinity) ) >>> { >>> + if ( v->affinity_broken ) >>> + { >>> + /* The vcpu is temporarily pinned, can't move it. */ >>> + vcpu_schedule_unlock_irqrestore(lock, flags, v); >>> + ret = -EBUSY; >>> + break; >>> + } >> >> Does this mean that if the user closes the laptop lid while one of these >> drivers has vcpu0 pinned, that Xen will crash (see >> xen/arch/x86/smpboot.c:__cpu_disable())? Or is it the OS's job to make >> sure that all temporary pins are removed before suspending? >> >> Also -- have you actually tested the "cpupool move while pinned" >> functionality to make sure it actually works? There's a weird bit in >> cpupool_unassign_cpu_helper() where after calling >> cpu_disable_scheduler(cpu), it unconditionally sets the cpu bit in the >> cpupool_free_cpus mask, even if it returns an error. That can't be >> right, even for the existing -EAGAIN case, can it? >> >> I see that you have a loop to retry this call several times in the next >> patch; but what if it fails every time -- what state is the system in? >> >> And, in general, what happens if the device driver gets mixed up and >> forgets to unpin the vcpu? Is the only recourse to reboot your host (or >> deal with the fact that you can't reconfigure your cpupools)? 
> > (I should say, I think this probably is the best solution to this > problem; I just want to make sure we think about the error cases carefully.) I guess in the worst case there could be a utility or xl command doing the missing unpin in such an emergency? Jan
On 01/03/16 16:52, George Dunlap wrote:
> On 01/03/16 09:02, Juergen Gross wrote:
>> Some hardware (e.g. Dell studio 1555 laptops) require SMIs to be
>> called on physical cpu 0 only. Linux drivers like dcdbas or i8k try
>> to achieve this by pinning the running thread to cpu 0, but in Dom0
>> this is not enough: the vcpu must be pinned to physical cpu 0 via
>> Xen, too.
>>
>> Add a stable hypercall option SCHEDOP_pin_temp to the sched_op
>> hypercall to achieve this. It is taking a physical cpu number as
>> parameter. If pinning is possible (the calling domain has the
>> privilege to make the call and the cpu is available in the domain's
>> cpupool) the calling vcpu is pinned to the specified cpu. The old
>> cpu affinity is saved. To undo the temporary pinning a cpu -1 is
>> specified. This will restore the original cpu affinity for the vcpu.
>>
>> Signed-off-by: Juergen Gross <jgross@suse.com>
>> ---
>> V2: - limit operation to hardware domain as suggested by Jan Beulich
>>     - some style issues corrected as requested by Jan Beulich
>>     - use fixed width types in interface as requested by Jan Beulich
>>     - add compat layer checking as requested by Jan Beulich
>> ---
>>  xen/common/compat/schedule.c |  4 ++
>>  xen/common/schedule.c        | 92 +++++++++++++++++++++++++++++++++++++++++---
>>  xen/include/public/sched.h   | 17 ++++++++
>>  xen/include/xlat.lst         |  1 +
>>  4 files changed, 109 insertions(+), 5 deletions(-)
>>
>> diff --git a/xen/common/compat/schedule.c b/xen/common/compat/schedule.c
>> index 812c550..73b0f01 100644
>> --- a/xen/common/compat/schedule.c
>> +++ b/xen/common/compat/schedule.c
>> @@ -10,6 +10,10 @@
>>
>>  #define do_sched_op compat_sched_op
>>
>> +#define xen_sched_pin_temp sched_pin_temp
>> +CHECK_sched_pin_temp;
>> +#undef xen_sched_pin_temp
>> +
>>  #define xen_sched_shutdown sched_shutdown
>>  CHECK_sched_shutdown;
>>  #undef xen_sched_shutdown
>> diff --git a/xen/common/schedule.c b/xen/common/schedule.c
>> index b0d4b18..653f852 100644
>> --- a/xen/common/schedule.c
>> +++ b/xen/common/schedule.c
>> @@ -271,6 +271,12 @@ int sched_move_domain(struct domain *d, struct cpupool *c)
>>      struct scheduler *old_ops;
>>      void *old_domdata;
>>
>> +    for_each_vcpu ( d, v )
>> +    {
>> +        if ( v->affinity_broken )
>> +            return -EBUSY;
>> +    }
>> +
>>      domdata = SCHED_OP(c->sched, alloc_domdata, d);
>>      if ( domdata == NULL )
>>          return -ENOMEM;
>> @@ -669,6 +675,14 @@ int cpu_disable_scheduler(unsigned int cpu)
>>              if ( cpumask_empty(&online_affinity) &&
>>                   cpumask_test_cpu(cpu, v->cpu_hard_affinity) )
>>              {
>> +                if ( v->affinity_broken )
>> +                {
>> +                    /* The vcpu is temporarily pinned, can't move it. */
>> +                    vcpu_schedule_unlock_irqrestore(lock, flags, v);
>> +                    ret = -EBUSY;
>> +                    break;
>> +                }
>
> Does this mean that if the user closes the laptop lid while one of these
> drivers has vcpu0 pinned, that Xen will crash (see
> xen/arch/x86/smpboot.c:__cpu_disable())? Or is it the OS's job to make
> sure that all temporary pins are removed before suspending?

Yes, this must be ensured by the OS.

> Also -- have you actually tested the "cpupool move while pinned"
> functionality to make sure it actually works? There's a weird bit in
> cpupool_unassign_cpu_helper() where after calling
> cpu_disable_scheduler(cpu), it unconditionally sets the cpu bit in the
> cpupool_free_cpus mask, even if it returns an error. That can't be
> right, even for the existing -EAGAIN case, can it?

That should be no problem. Such a failure can be repaired easily by
adding the cpu to the cpupool again. Adding a comment seems to be a
good idea. :-)

What is wrong, and even worse: schedule_cpu_switch() returning an error
will leak domlist_read_lock. I'll write another patch to correct this
issue.

> I see that you have a loop to retry this call several times in the next
> patch; but what if it fails every time -- what state is the system in?

The cpu can be added to the original cpupool via "xl cpupool-add" again.

> And, in general, what happens if the device driver gets mixed up and
> forgets to unpin the vcpu? Is the only recourse to reboot your host (or
> deal with the fact that you can't reconfigure your cpupools)?

Unless we add a "forced" option to "xl vcpu-pin", yes.

Thanks for the thorough review,

Juergen
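The lock leak Juergen mentions is the classic "early return while holding a lock" pattern. The sketch below only illustrates that general pattern and one common way to avoid it. It is not the actual cpupool code: the lock is modelled as a plain counter, and `read_lock_model()`, `cpu_switch_model()`, `unassign_leaky()` and `unassign_fixed()` are hypothetical names.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Lock modelled as a nesting counter: 0 means "not held". */
static int lock_depth;

static void read_lock_model(void)   { lock_depth++; }
static void read_unlock_model(void) { assert(lock_depth > 0); lock_depth--; }

/* Stand-in for an operation that may fail while the lock is held. */
static int cpu_switch_model(bool fail)
{
    return fail ? -EBUSY : 0;
}

/* Leaky variant: the error path returns with the lock still held. */
static int unassign_leaky(bool fail)
{
    int ret;

    read_lock_model();
    ret = cpu_switch_model(fail);
    if ( ret )
        return ret;             /* bug: read_unlock_model() is skipped */
    read_unlock_model();
    return 0;
}

/* Fixed variant: the lock is dropped on both the success and error path. */
static int unassign_fixed(bool fail)
{
    int ret;

    read_lock_model();
    ret = cpu_switch_model(fail);
    read_unlock_model();
    return ret;
}
```

Calling `unassign_leaky(true)` leaves `lock_depth` at 1, while `unassign_fixed(true)` returns the same error with the counter back at 0.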
On Wed, 2016-03-02 at 08:14 +0100, Juergen Gross wrote:
> On 01/03/16 16:52, George Dunlap wrote:
> >
> >
> > Also -- have you actually tested the "cpupool move while pinned"
> > functionality to make sure it actually works? There's a weird bit in
> > cpupool_unassign_cpu_helper() where after calling
> > cpu_disable_scheduler(cpu), it unconditionally sets the cpu bit in
> > the cpupool_free_cpus mask, even if it returns an error. That can't
> > be right, even for the existing -EAGAIN case, can it?
> That should be no problem. Such a failure can be repaired easily by
> adding the cpu to the cpupool again.
>
And there's not much else one can do, I would say. When we are in
cpu_disable_scheduler(), coming from
cpupool_unassign_cpu()-->cpupool_unassign_cpu_helper(), we're already
halfway through removing the cpu from the pool (e.g., we already cleared
the relevant bit from the cpupool's cpu_valid mask).

And we don't actually want to revert that, as doing so would allow the
scheduler to start again moving vcpus to that cpu (and the following
attempts will risk failing with EAGAIN again :-D).

FWIW, I've also found that part rather weird for quite some time... But
it does indeed make sense, IMO.

> Adding a comment seems to be a
> good idea. :-)
>
Yep. Should we also add an error message for the user to be able to see
it, even if she can't read the comment in the source code? (Not
necessarily right there, if that would make it trigger too much... just
in a place where it can be seen in the case the user actually needs to
do something).

> What is wrong and even worse, schedule_cpu_switch() returning an
> error will leak domlist_read_lock.
>
Indeed, good catch. :-)

> > And, in general, what happens if the device driver gets mixed up
> > and forgets to unpin the vcpu? Is the only recourse to reboot your
> > host (or deal with the fact that you can't reconfigure your
> > cpupools)?
> Unless we add a "forced" option to "xl vcpu-pin", yes.
>
Which would be fine to have, IMO. I'm not sure whether it would better
be an `xl vcpu-pin' flag, or a separate utility (as Jan is also saying).

A separate utility would fit better the "emergency nature" of the
thing, avoiding having to clobber xl for that (as this will be the
only, pretty uncommon, case where such a flag would be needed).

However, an xl flag is easier to add, easier to document and easier and
more natural to find, from the point of view of a user that really
needs it. And perhaps it could turn out useful for other situations in
future. So, I guess I'd say:
 - yes, let's add that
 - let's do it as a "force flag" of `xl vcpu-pin'.

Regards,
Dario
On 02/03/16 10:27, Dario Faggioli wrote: > On Wed, 2016-03-02 at 08:14 +0100, Juergen Gross wrote: >> On 01/03/16 16:52, George Dunlap wrote: >>> >>> >>> Also -- have you actually tested the "cpupool move while pinned" >>> functionality to make sure it actually works? There's a weird bit >>> in >>> cpupool_unassign_cpu_helper() where after calling >>> cpu_disable_scheduler(cpu), it unconditionally sets the cpu bit in >>> the >>> cpupool_free_cpus mask, even if it returns an error. That can't be >>> right, even for the existing -EAGAIN case, can it? >> That should be no problem. Such a failure can be repaired easily by >> adding the cpu to the cpupool again. >> > And there's not much else one can do, I would say. When we are in > cpu_disable_scheduler(), coming from > cpupool_unassign_cpu()-->cpupool_unassign_cpu() we're already halfway > through removing the cpu from the pool (e.g., we already cleared the > relevant bit from the cpupool's cpu_valid mask). > > And we don't actually want to revert that, as doing so would allow the > scheduler to start again moving vcpus to that cpu (and the following > attempts will risk failing with EAGAIN again :-D). Yep. > > FWIW, I've also found that part rather weird for quite some time... But > it does indeed makes sense, IMO. > >> Adding a comment seems to be a >> good idea. :-) >> > Yep. Should we also add an error message for the user to be able to see > it, even if she can't read the comment in the source code? (Not > necessarily right there, if that would make it trigger too much... just > in a place where it can be seen in the case the user actually need to > do something). I'd rather add the error message to xl. That's where the user will see it and where he can react at once. The message can even tell the user the correct command, which would be a very strange thing to do in the hypervisor. Another patch, I guess. 
:-) > >> What is wrong and even worse, schedule_cpu_switch() returning an >> error >> will leak domlist_read_lock. >> > Indeed, good catch. :-) > >>> And, in general, what happens if the device driver gets mixed up >>> and >>> forgets to unpin the vcpu? Is the only recourse to reboot your >>> host (or >>> deal with the fact that you can't reconfigure your cpupools)? >> Unless we add a "forced" option to "xl vcpu-pin", yes. >> > Which would be fine to have, IMO. I'm not sure if it would better be an > `xl vcpu-pin' flag, or a separate utility (as Jan is also saying). > > A separate utility would fit better the "emergency nature" of the > thing, avoiding having to clobber xl for that (as this will be the > only, pretty uncommon, case where such flag would be needed). > > However, an xl flag is easier to add, easier to document and easier and > more natural to find, from the point of view of an user that really > needs it. And perhaps it could turn out useful for other situations in > future. So, I guess I'd say: > - yes, let's add that > - let's do it as a "force flag" of `xl vcpu-pin'. Okay, patch will follow... Juergen
On Wed, 2016-03-02 at 12:19 +0100, Juergen Gross wrote: > On 02/03/16 10:27, Dario Faggioli wrote: > > > > Yep. Should we also add an error message for the user to be able to > > see > > it, even if she can't read the comment in the source code? (Not > > necessarily right there, if that would make it trigger too much... > > just > > in a place where it can be seen in the case the user actually need > > to > > do something). > I'd rather add the error message to xl. That's where the user will > see > it and where he can react at once. The message can even tell the user > the correct command, which would be a very strange thing to do in the > hypervisor. > Sure, wherever it's most useful. > Another patch, I guess. :-) > Yeah, sorry. :-) Regards, Dario
On 02/03/16 12:49, Dario Faggioli wrote: > On Wed, 2016-03-02 at 12:19 +0100, Juergen Gross wrote: >> On 02/03/16 10:27, Dario Faggioli wrote: >>> >>> Yep. Should we also add an error message for the user to be able to >>> see >>> it, even if she can't read the comment in the source code? (Not >>> necessarily right there, if that would make it trigger too much... >>> just >>> in a place where it can be seen in the case the user actually need >>> to >>> do something). >> I'd rather add the error message to xl. That's where the user will >> see >> it and where he can react at once. The message can even tell the user >> the correct command, which would be a very strange thing to do in the >> hypervisor. >> > Sure, wherever it's most useful. > >> Another patch, I guess. :-) >> > Yeah, sorry. :-) Sarcio ergo sum! :-) Juergen
On 02/03/16 10:27, Dario Faggioli wrote:
> On Wed, 2016-03-02 at 08:14 +0100, Juergen Gross wrote:
>> On 01/03/16 16:52, George Dunlap wrote:
>>>
>>>
>>> Also -- have you actually tested the "cpupool move while pinned"
>>> functionality to make sure it actually works? There's a weird bit in
>>> cpupool_unassign_cpu_helper() where after calling
>>> cpu_disable_scheduler(cpu), it unconditionally sets the cpu bit in
>>> the cpupool_free_cpus mask, even if it returns an error. That can't
>>> be right, even for the existing -EAGAIN case, can it?
>> That should be no problem. Such a failure can be repaired easily by
>> adding the cpu to the cpupool again.
>>
> And there's not much else one can do, I would say. When we are in
> cpu_disable_scheduler(), coming from
> cpupool_unassign_cpu()-->cpupool_unassign_cpu_helper(), we're already
> halfway through removing the cpu from the pool (e.g., we already cleared
> the relevant bit from the cpupool's cpu_valid mask).
>
> And we don't actually want to revert that, as doing so would allow the
> scheduler to start again moving vcpus to that cpu (and the following
> attempts will risk failing with EAGAIN again :-D).
>
> FWIW, I've also found that part rather weird for quite some time... But
> it does indeed make sense, IMO.
>
>> Adding a comment seems to be a
>> good idea. :-)
>>
> Yep. Should we also add an error message for the user to be able to see
> it, even if she can't read the comment in the source code? (Not
> necessarily right there, if that would make it trigger too much... just
> in a place where it can be seen in the case the user actually needs to
> do something).
>
>> What is wrong and even worse, schedule_cpu_switch() returning an
>> error will leak domlist_read_lock.
>>
> Indeed, good catch. :-)
>
>>> And, in general, what happens if the device driver gets mixed up
>>> and forgets to unpin the vcpu? Is the only recourse to reboot your
>>> host (or deal with the fact that you can't reconfigure your
>>> cpupools)?
>> Unless we add a "forced" option to "xl vcpu-pin", yes.
>>
> Which would be fine to have, IMO. I'm not sure whether it would better
> be an `xl vcpu-pin' flag, or a separate utility (as Jan is also saying).
>
> A separate utility would fit better the "emergency nature" of the
> thing, avoiding having to clobber xl for that (as this will be the
> only, pretty uncommon, case where such a flag would be needed).
>
> However, an xl flag is easier to add, easier to document and easier and
> more natural to find, from the point of view of a user that really
> needs it. And perhaps it could turn out useful for other situations in
> future. So, I guess I'd say:
>  - yes, let's add that
>  - let's do it as a "force flag" of `xl vcpu-pin'.

Which raises the question: how to do that on the libxl level?

a) expand libxl_set_vcpuaffinity() with another parameter (is this even
   possible? I could do some ifdeffery, but the API would change...)

b) add a libxl_set_vcpuaffinity_force() variant

c) imply the force flag by specifying both hard and soft maps as NULL
   (it _is_ basically just that: keep both affinity sets), implying that
   it makes no sense to specify any affinities with the -f flag (which
   renders the "force" meaning rather strange, would be more a "restore"
   now).

Juergen

>
> Regards,
> Dario
>
On Wed, 2016-03-02 at 16:34 +0100, Juergen Gross wrote:
> On 02/03/16 10:27, Dario Faggioli wrote:
> >
> > However, an xl flag is easier to add, easier to document and easier
> > and more natural to find, from the point of view of a user that
> > really needs it. And perhaps it could turn out useful for other
> > situations in future. So, I guess I'd say:
> >  - yes, let's add that
> >  - let's do it as a "force flag" of `xl vcpu-pin'.
> Which raises the question: how to do that on the libxl level?
>
Ah, right.

> a) expand libxl_set_vcpuaffinity() with another parameter (is this
>    even possible? I could do some ifdeffery, but the API would
>    change...)
>
> b) add a libxl_set_vcpuaffinity_force() variant
>
> c) imply the force flag by specifying both hard and soft maps as NULL
>    (it _is_ basically just that: keep both affinity sets), implying
>    that it makes no sense to specify any affinities with the -f flag
>    (which renders the "force" meaning rather strange, would be more a
>    "restore" now).
>
Eheh, tools' maintainers' call. My preference would be b).

I don't like a), mostly because that would mean everyone will need to
specify a parameter that is really only necessary in special cases.

I could live with c), but it indeed makes the semantics too convoluted
for my taste.

I guess, however, that even if going for b), we need to decide whether
to require a cpumask or not, and what to do if one passes NULL. Maybe
we can have a cpumask parameter and,
 - if it is not NULL, force affinity to that,
 - if it is NULL, just 'restore';
what do you think?

Actually, at Xen level, the override only acts on hard affinity...
should libxl take only one cpumask (for hard affinity only), or both
hard and soft?

I'd say just one for hard is enough, unless we want to make space for a
potential future situation where we will want to break and restore soft
affinity as well...

Dario
-- 
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)
On 02/03/16 17:03, Dario Faggioli wrote:
> On Wed, 2016-03-02 at 16:34 +0100, Juergen Gross wrote:
>> On 02/03/16 10:27, Dario Faggioli wrote:
>>>
>>> However, an xl flag is easier to add, easier to document and easier
>>> and more natural to find, from the point of view of a user that
>>> really needs it. And perhaps it could turn out useful for other
>>> situations in future. So, I guess I'd say:
>>>  - yes, let's add that
>>>  - let's do it as a "force flag" of `xl vcpu-pin'.
>> Which raises the question: how to do that on the libxl level?
>>
> Ah, right.
>
>> a) expand libxl_set_vcpuaffinity() with another parameter (is this
>>    even possible? I could do some ifdeffery, but the API would
>>    change...)
>>
>> b) add a libxl_set_vcpuaffinity_force() variant
>>
>> c) imply the force flag by specifying both hard and soft maps as NULL
>>    (it _is_ basically just that: keep both affinity sets), implying
>>    that it makes no sense to specify any affinities with the -f flag
>>    (which renders the "force" meaning rather strange, would be more a
>>    "restore" now).
>>
> Eheh, tools' maintainers' call. My preference would be b).
>
> I don't like a), mostly because that would mean everyone will need to
> specify a parameter that is really only necessary in special cases.
>
> I could live with c), but it indeed makes the semantics too convoluted
> for my taste.
>
> I guess, however, that even if going for b), we need to decide whether
> to require a cpumask or not, and what to do if one passes NULL. Maybe
> we can have a cpumask parameter and,
>  - if it is not NULL, force affinity to that,
>  - if it is NULL, just 'restore';
> what do you think?

I would just let the force flag restore the old setting (thus clearing
the affinity_broken flag) and then apply the normal affinity settings.

> Actually, at Xen level, the override only acts on hard affinity...
> should libxl take only one cpumask (for hard affinity only), or both
> hard and soft?

Just as the user is specifying: 0, 1 or 2.

> I'd say just one for hard is enough, unless we want to make space for a
> potential future situation where we will want to break and restore soft
> affinity as well...

The force flag would be just an add-on. That's rather easy in the
hypervisor and in the tools.

Juergen
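The "restore first, then apply whatever was given" semantics proposed above can be sketched in plain C. This is only a user-space model under stated assumptions, not libxl or hypervisor code: `struct vcpu_model`, `set_affinity_model()`, `set_affinity_force_model()` and `force_demo()` are hypothetical names, and a 64-bit mask stands in for a cpumask. The normal path refuses changes with -EBUSY while the override is active, as vcpu_set_affinity() does in the patch; the forced path first restores the saved hard affinity and then applies zero, one or two masks.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the affinity state of a vcpu. */
struct vcpu_model {
    uint64_t hard_affinity;        /* one bit per physical cpu */
    uint64_t soft_affinity;
    uint64_t hard_affinity_saved;  /* hard affinity before the override */
    bool affinity_broken;          /* true while the override is active */
};

/* Normal path: refused while the override is active. NULL keeps a mask. */
static int set_affinity_model(struct vcpu_model *v,
                              const uint64_t *hard, const uint64_t *soft)
{
    if ( v->affinity_broken )
        return -EBUSY;

    if ( hard != NULL )
        v->hard_affinity = *hard;
    if ( soft != NULL )
        v->soft_affinity = *soft;

    return 0;
}

/*
 * Forced path (assumed semantics): undo the override by restoring the
 * saved hard affinity, then behave exactly like the normal path.
 */
static int set_affinity_force_model(struct vcpu_model *v,
                                    const uint64_t *hard,
                                    const uint64_t *soft)
{
    if ( v->affinity_broken )
    {
        v->hard_affinity = v->hard_affinity_saved;
        v->affinity_broken = false;
    }

    return set_affinity_model(v, hard, soft);
}

/* Usage example: a vcpu left pinned to cpu 0 by a confused driver. */
static void force_demo(void)
{
    struct vcpu_model v = {
        .hard_affinity = 0x1, .soft_affinity = 0xF,
        .hard_affinity_saved = 0xF, .affinity_broken = true,
    };
    uint64_t new_hard = 0x6;

    assert(set_affinity_model(&v, &new_hard, NULL) == -EBUSY);
    assert(set_affinity_force_model(&v, NULL, NULL) == 0);  /* pure restore */
    assert(v.hard_affinity == 0xF && !v.affinity_broken);
    assert(set_affinity_force_model(&v, &new_hard, NULL) == 0);
    assert(v.hard_affinity == 0x6 && v.soft_affinity == 0xF);
}
```

With no masks given, the forced call degenerates into the plain "restore" case discussed earlier in the thread, so a separate NULL convention would not be needed in this model.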
Hi,

-----Original Message-----
From: Xen-devel [mailto:xen-devel-bounces@lists.xen.org] On Behalf Of George Dunlap
Sent: 01 March 2016 15:53
To: Juergen Gross <jgross@suse.com>; xen-devel@lists.xen.org
Cc: Wei Liu <wei.liu2@citrix.com>; Stefano Stabellini <Stefano.Stabellini@citrix.com>; George Dunlap <George.Dunlap@citrix.com>; Andrew Cooper <Andrew.Cooper3@citrix.com>; Dario Faggioli <dario.faggioli@citrix.com>; Ian Jackson <Ian.Jackson@citrix.com>; David Vrabel <david.vrabel@citrix.com>; jbeulich@suse.com
Subject: Re: [Xen-devel] [PATCH v2 2/3] xen: add hypercall option to temporarily pin a vcpu

On 01/03/16 09:02, Juergen Gross wrote:
> Some hardware (e.g. Dell studio 1555 laptops) requires SMIs to be
> called on physical cpu 0 only. Linux drivers like dcdbas or i8k try to
> achieve this by pinning the running thread to cpu 0, but in Dom0 this
> is not enough: the vcpu must be pinned to physical cpu 0 via Xen, too.
>
> Add a stable hypercall option SCHEDOP_pin_temp to the sched_op
> hypercall to achieve this. It takes a physical cpu number as
> parameter. If pinning is possible (the calling domain has the
> privilege to make the call and the cpu is available in the domain's
> cpupool) the calling vcpu is pinned to the specified cpu. The old cpu
> affinity is saved. To undo the temporary pinning a cpu -1 is
> specified. This will restore the original cpu affinity for the vcpu.
>
> Signed-off-by: Juergen Gross <jgross@suse.com>
> ---
> V2: - limit operation to hardware domain as suggested by Jan Beulich
>     - some style issues corrected as requested by Jan Beulich
>     - use fixed width types in interface as requested by Jan Beulich
>     - add compat layer checking as requested by Jan Beulich
> ---
>  xen/common/compat/schedule.c |  4 ++
>  xen/common/schedule.c        | 92 +++++++++++++++++++++++++++++++++++++++++---
>  xen/include/public/sched.h   | 17 ++++++++
>  xen/include/xlat.lst         |  1 +
>  4 files changed, 109 insertions(+), 5 deletions(-)
>
> diff --git a/xen/common/compat/schedule.c b/xen/common/compat/schedule.c
> index 812c550..73b0f01 100644
> --- a/xen/common/compat/schedule.c
> +++ b/xen/common/compat/schedule.c
> @@ -10,6 +10,10 @@
>
>  #define do_sched_op compat_sched_op
>
> +#define xen_sched_pin_temp sched_pin_temp
> +CHECK_sched_pin_temp;
> +#undef xen_sched_pin_temp
> +
>  #define xen_sched_shutdown sched_shutdown
>  CHECK_sched_shutdown;
>  #undef xen_sched_shutdown
> diff --git a/xen/common/schedule.c b/xen/common/schedule.c
> index b0d4b18..653f852 100644
> --- a/xen/common/schedule.c
> +++ b/xen/common/schedule.c
> @@ -271,6 +271,12 @@ int sched_move_domain(struct domain *d, struct cpupool *c)
>      struct scheduler *old_ops;
>      void *old_domdata;
>
> +    for_each_vcpu ( d, v )
> +    {
> +        if ( v->affinity_broken )
> +            return -EBUSY;
> +    }
> +
>      domdata = SCHED_OP(c->sched, alloc_domdata, d);
>      if ( domdata == NULL )
>          return -ENOMEM;
> @@ -669,6 +675,14 @@ int cpu_disable_scheduler(unsigned int cpu)
>              if ( cpumask_empty(&online_affinity) &&
>                   cpumask_test_cpu(cpu, v->cpu_hard_affinity) )
>              {
> +                if ( v->affinity_broken )
> +                {
> +                    /* The vcpu is temporarily pinned, can't move it. */
> +                    vcpu_schedule_unlock_irqrestore(lock, flags, v);
> +                    ret = -EBUSY;
> +                    break;
> +                }

Does this mean that if the user closes the laptop lid while one of these
drivers has vcpu0 pinned, that Xen will crash (see
xen/arch/x86/smpboot.c:__cpu_disable())? Or is it the OS's job to make
sure that all temporary pins are removed before suspending?

Also -- have you actually tested the "cpupool move while pinned"
functionality to make sure it actually works? There's a weird bit in
cpupool_unassign_cpu_helper() where after calling
cpu_disable_scheduler(cpu), it unconditionally sets the cpu bit in the
cpupool_free_cpus mask, even if it returns an error. That can't be
right, even for the existing -EAGAIN case, can it?

I see that you have a loop to retry this call several times in the next
patch; but what if it fails every time -- what state is the system in?

And, in general, what happens if the device driver gets mixed up and
forgets to unpin the vcpu? Is the only recourse to reboot your host (or
deal with the fact that you can't reconfigure your cpupools)?

 -George

Sorry, lost the original thread so replying at the top of mail chain.

+static XSM_INLINE int xsm_schedop_pin_temp(XSM_DEFAULT_VOID)
+{
+    XSM_ASSERT_ACTION(XSM_PRIV);
+    return xsm_default_action(action, current->domain, NULL);
+}

Is the intention to restrict the hypercall usage to dom0 only?

Anshul Makkar

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel
On 02/03/16 18:21, Anshul Makkar wrote:
> Hi,
>
>
> -----Original Message-----
> From: Xen-devel [mailto:xen-devel-bounces@lists.xen.org] On Behalf Of George Dunlap
> Sent: 01 March 2016 15:53
> To: Juergen Gross <jgross@suse.com>; xen-devel@lists.xen.org
> Cc: Wei Liu <wei.liu2@citrix.com>; Stefano Stabellini <Stefano.Stabellini@citrix.com>; George Dunlap <George.Dunlap@citrix.com>; Andrew Cooper <Andrew.Cooper3@citrix.com>; Dario Faggioli <dario.faggioli@citrix.com>; Ian Jackson <Ian.Jackson@citrix.com>; David Vrabel <david.vrabel@citrix.com>; jbeulich@suse.com
> Subject: Re: [Xen-devel] [PATCH v2 2/3] xen: add hypercall option to temporarily pin a vcpu
>
> On 01/03/16 09:02, Juergen Gross wrote:
>> Some hardware (e.g. Dell studio 1555 laptops) requires SMIs to be
>> called on physical cpu 0 only. Linux drivers like dcdbas or i8k try to
>> achieve this by pinning the running thread to cpu 0, but in Dom0 this
>> is not enough: the vcpu must be pinned to physical cpu 0 via Xen, too.
>>
>> Add a stable hypercall option SCHEDOP_pin_temp to the sched_op
>> hypercall to achieve this. It takes a physical cpu number as
>> parameter. If pinning is possible (the calling domain has the
>> privilege to make the call and the cpu is available in the domain's
>> cpupool) the calling vcpu is pinned to the specified cpu. The old cpu
>> affinity is saved. To undo the temporary pinning a cpu -1 is
>> specified. This will restore the original cpu affinity for the vcpu.
>>
>> Signed-off-by: Juergen Gross <jgross@suse.com>
>> ---
>> V2: - limit operation to hardware domain as suggested by Jan Beulich
>>     - some style issues corrected as requested by Jan Beulich
>>     - use fixed width types in interface as requested by Jan Beulich
>>     - add compat layer checking as requested by Jan Beulich
>> ---
>>  xen/common/compat/schedule.c |  4 ++
>>  xen/common/schedule.c        | 92 +++++++++++++++++++++++++++++++++++++++++---
>>  xen/include/public/sched.h   | 17 ++++++++
>>  xen/include/xlat.lst         |  1 +
>>  4 files changed, 109 insertions(+), 5 deletions(-)
>>
>> diff --git a/xen/common/compat/schedule.c b/xen/common/compat/schedule.c
>> index 812c550..73b0f01 100644
>> --- a/xen/common/compat/schedule.c
>> +++ b/xen/common/compat/schedule.c
>> @@ -10,6 +10,10 @@
>>
>>  #define do_sched_op compat_sched_op
>>
>> +#define xen_sched_pin_temp sched_pin_temp
>> +CHECK_sched_pin_temp;
>> +#undef xen_sched_pin_temp
>> +
>>  #define xen_sched_shutdown sched_shutdown
>>  CHECK_sched_shutdown;
>>  #undef xen_sched_shutdown
>> diff --git a/xen/common/schedule.c b/xen/common/schedule.c
>> index b0d4b18..653f852 100644
>> --- a/xen/common/schedule.c
>> +++ b/xen/common/schedule.c
>> @@ -271,6 +271,12 @@ int sched_move_domain(struct domain *d, struct cpupool *c)
>>      struct scheduler *old_ops;
>>      void *old_domdata;
>>
>> +    for_each_vcpu ( d, v )
>> +    {
>> +        if ( v->affinity_broken )
>> +            return -EBUSY;
>> +    }
>> +
>>      domdata = SCHED_OP(c->sched, alloc_domdata, d);
>>      if ( domdata == NULL )
>>          return -ENOMEM;
>> @@ -669,6 +675,14 @@ int cpu_disable_scheduler(unsigned int cpu)
>>              if ( cpumask_empty(&online_affinity) &&
>>                   cpumask_test_cpu(cpu, v->cpu_hard_affinity) )
>>              {
>> +                if ( v->affinity_broken )
>> +                {
>> +                    /* The vcpu is temporarily pinned, can't move it. */
>> +                    vcpu_schedule_unlock_irqrestore(lock, flags, v);
>> +                    ret = -EBUSY;
>> +                    break;
>> +                }
>
> Does this mean that if the user closes the laptop lid while one of
> these drivers has vcpu0 pinned, that Xen will crash (see
> xen/arch/x86/smpboot.c:__cpu_disable())? Or is it the OS's job to make
> sure that all temporary pins are removed before suspending?
>
> Also -- have you actually tested the "cpupool move while pinned"
> functionality to make sure it actually works? There's a weird bit in
> cpupool_unassign_cpu_helper() where after calling
> cpu_disable_scheduler(cpu), it unconditionally sets the cpu bit in the
> cpupool_free_cpus mask, even if it returns an error. That can't be
> right, even for the existing -EAGAIN case, can it?
>
> I see that you have a loop to retry this call several times in the
> next patch; but what if it fails every time -- what state is the
> system in?
>
> And, in general, what happens if the device driver gets mixed up and
> forgets to unpin the vcpu? Is the only recourse to reboot your host
> (or deal with the fact that you can't reconfigure your cpupools)?
>
>  -George
>
> Sorry, lost the original thread so replying at the top of mail chain.
>
> +static XSM_INLINE int xsm_schedop_pin_temp(XSM_DEFAULT_VOID)
> +{
> +    XSM_ASSERT_ACTION(XSM_PRIV);
> +    return xsm_default_action(action, current->domain, NULL);
> +}
>
> Is the intention to restrict the hypercall usage to dom0 only?

To be more precise: to the hardware domain (the patch snippet you are
referencing was part of V1 of the series; it no longer exists in V2).

Juergen
diff --git a/xen/common/compat/schedule.c b/xen/common/compat/schedule.c
index 812c550..73b0f01 100644
--- a/xen/common/compat/schedule.c
+++ b/xen/common/compat/schedule.c
@@ -10,6 +10,10 @@
 
 #define do_sched_op compat_sched_op
 
+#define xen_sched_pin_temp sched_pin_temp
+CHECK_sched_pin_temp;
+#undef xen_sched_pin_temp
+
 #define xen_sched_shutdown sched_shutdown
 CHECK_sched_shutdown;
 #undef xen_sched_shutdown
diff --git a/xen/common/schedule.c b/xen/common/schedule.c
index b0d4b18..653f852 100644
--- a/xen/common/schedule.c
+++ b/xen/common/schedule.c
@@ -271,6 +271,12 @@ int sched_move_domain(struct domain *d, struct cpupool *c)
     struct scheduler *old_ops;
     void *old_domdata;
 
+    for_each_vcpu ( d, v )
+    {
+        if ( v->affinity_broken )
+            return -EBUSY;
+    }
+
     domdata = SCHED_OP(c->sched, alloc_domdata, d);
     if ( domdata == NULL )
         return -ENOMEM;
@@ -669,6 +675,14 @@ int cpu_disable_scheduler(unsigned int cpu)
             if ( cpumask_empty(&online_affinity) &&
                  cpumask_test_cpu(cpu, v->cpu_hard_affinity) )
             {
+                if ( v->affinity_broken )
+                {
+                    /* The vcpu is temporarily pinned, can't move it. */
+                    vcpu_schedule_unlock_irqrestore(lock, flags, v);
+                    ret = -EBUSY;
+                    break;
+                }
+
                 if (system_state == SYS_STATE_suspend)
                 {
                     cpumask_copy(v->cpu_hard_affinity_saved,
@@ -752,14 +766,20 @@ static int vcpu_set_affinity(
     struct vcpu *v, const cpumask_t *affinity, cpumask_t *which)
 {
     spinlock_t *lock;
+    int ret = 0;
 
     lock = vcpu_schedule_lock_irq(v);
 
-    cpumask_copy(which, affinity);
+    if ( v->affinity_broken )
+        ret = -EBUSY;
+    else
+    {
+        cpumask_copy(which, affinity);
 
-    /* Always ask the scheduler to re-evaluate placement
-     * when changing the affinity */
-    set_bit(_VPF_migrating, &v->pause_flags);
+        /* Always ask the scheduler to re-evaluate placement
+         * when changing the affinity */
+        set_bit(_VPF_migrating, &v->pause_flags);
+    }
 
     vcpu_schedule_unlock_irq(lock, v);
 
@@ -771,7 +791,7 @@ static int vcpu_set_affinity(
         vcpu_migrate(v);
     }
 
-    return 0;
+    return ret;
 }
 
 int vcpu_set_hard_affinity(struct vcpu *v, const cpumask_t *affinity)
@@ -978,6 +998,51 @@ void watchdog_domain_destroy(struct domain *d)
         kill_timer(&d->watchdog_timer[i]);
 }
 
+static long do_pin_temp(int cpu)
+{
+    struct vcpu *v = current;
+    spinlock_t *lock;
+    long ret = -EINVAL;
+
+    lock = vcpu_schedule_lock_irq(v);
+
+    if ( cpu < 0 )
+    {
+        if ( v->affinity_broken )
+        {
+            cpumask_copy(v->cpu_hard_affinity, v->cpu_hard_affinity_saved);
+            v->affinity_broken = 0;
+            set_bit(_VPF_migrating, &v->pause_flags);
+            ret = 0;
+        }
+    }
+    else if ( cpu < nr_cpu_ids )
+    {
+        if ( v->affinity_broken )
+            ret = -EBUSY;
+        else if ( cpumask_test_cpu(cpu, VCPU2ONLINE(v)) )
+        {
+            cpumask_copy(v->cpu_hard_affinity_saved, v->cpu_hard_affinity);
+            v->affinity_broken = 1;
+            cpumask_copy(v->cpu_hard_affinity, cpumask_of(cpu));
+            set_bit(_VPF_migrating, &v->pause_flags);
+            ret = 0;
+        }
+    }
+
+    vcpu_schedule_unlock_irq(lock, v);
+
+    domain_update_node_affinity(v->domain);
+
+    if ( v->pause_flags & VPF_migrating )
+    {
+        vcpu_sleep_nosync(v);
+        vcpu_migrate(v);
+    }
+
+    return ret;
+}
+
 typedef long ret_t;
 
 #endif /* !COMPAT */
@@ -1087,6 +1152,23 @@ ret_t do_sched_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
         break;
     }
 
+    case SCHEDOP_pin_temp:
+    {
+        struct sched_pin_temp sched_pin_temp;
+
+        ret = -EFAULT;
+        if ( copy_from_guest(&sched_pin_temp, arg, 1) )
+            break;
+
+        ret = -EPERM;
+        if ( !is_hardware_domain(current->domain) )
+            break;
+
+        ret = do_pin_temp(sched_pin_temp.pcpu);
+
+        break;
+    }
+
     default:
         ret = -ENOSYS;
     }
diff --git a/xen/include/public/sched.h b/xen/include/public/sched.h
index 2219696..a0ce5a6 100644
--- a/xen/include/public/sched.h
+++ b/xen/include/public/sched.h
@@ -118,6 +118,17 @@
  * With id != 0 and timeout != 0, poke watchdog timer and set new timeout.
  */
 #define SCHEDOP_watchdog    6
+
+/*
+ * Temporarily pin the current vcpu to one physical cpu or undo that pinning.
+ * @arg == pointer to sched_pin_temp_t structure.
+ *
+ * Setting pcpu to -1 will undo a previous temporary pinning and restore the
+ * previous cpu affinity. The temporary aspect of the pinning isn't enforced
+ * by the hypervisor.
+ * This call is allowed for the hardware domain only.
+ */
+#define SCHEDOP_pin_temp    7
 /* ` } */
 
 struct sched_shutdown {
@@ -148,6 +159,12 @@ struct sched_watchdog {
 typedef struct sched_watchdog sched_watchdog_t;
 DEFINE_XEN_GUEST_HANDLE(sched_watchdog_t);
 
+struct sched_pin_temp {
+    int32_t pcpu;
+};
+typedef struct sched_pin_temp sched_pin_temp_t;
+DEFINE_XEN_GUEST_HANDLE(sched_pin_temp_t);
+
 /*
  * Reason codes for SCHEDOP_shutdown. These may be interpreted by control
  * software to determine the appropriate action. For the most part, Xen does
diff --git a/xen/include/xlat.lst b/xen/include/xlat.lst
index fda1137..52c7233 100644
--- a/xen/include/xlat.lst
+++ b/xen/include/xlat.lst
@@ -104,6 +104,7 @@
 ?  pmu_data                pmu.h
 ?  pmu_params              pmu.h
 !  sched_poll              sched.h
+?  sched_pin_temp          sched.h
 ?  sched_remote_shutdown   sched.h
 ?  sched_shutdown          sched.h
 ?  tmem_oid                tmem.h
Some hardware (e.g. Dell studio 1555 laptops) requires SMIs to be
called on physical cpu 0 only. Linux drivers like dcdbas or i8k try
to achieve this by pinning the running thread to cpu 0, but in Dom0
this is not enough: the vcpu must be pinned to physical cpu 0 via
Xen, too.

Add a stable hypercall option SCHEDOP_pin_temp to the sched_op
hypercall to achieve this. It takes a physical cpu number as
parameter. If pinning is possible (the calling domain has the
privilege to make the call and the cpu is available in the domain's
cpupool) the calling vcpu is pinned to the specified cpu. The old
cpu affinity is saved. To undo the temporary pinning a cpu -1 is
specified. This will restore the original cpu affinity for the vcpu.

Signed-off-by: Juergen Gross <jgross@suse.com>
---
V2: - limit operation to hardware domain as suggested by Jan Beulich
    - some style issues corrected as requested by Jan Beulich
    - use fixed width types in interface as requested by Jan Beulich
    - add compat layer checking as requested by Jan Beulich
---
 xen/common/compat/schedule.c |  4 ++
 xen/common/schedule.c        | 92 +++++++++++++++++++++++++++++++++++++++++---
 xen/include/public/sched.h   | 17 ++++++++
 xen/include/xlat.lst         |  1 +
 4 files changed, 109 insertions(+), 5 deletions(-)
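
The pin and undo behaviour described in this commit message can be modelled in plain C. The following is a minimal user-space sketch, not hypervisor code: `struct vcpu_model`, `pin_temp_model()` and `pin_temp_demo()` are hypothetical names, a 64-bit mask stands in for cpumask_t, and the scheduler lock and the migration step are left out. Like do_pin_temp() in the patch, it treats any negative cpu number as an undo request.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the affinity fields of struct vcpu. */
struct vcpu_model {
    uint64_t hard_affinity;        /* one bit per physical cpu */
    uint64_t hard_affinity_saved;  /* affinity in effect before the pin */
    bool affinity_broken;          /* true while a temporary pin is active */
};

/*
 * Model of the pin/undo decision logic. 'online' stands in for the cpus
 * of the vcpu's cpupool and 'nr_cpus' for nr_cpu_ids (at most 64 here).
 */
static int pin_temp_model(struct vcpu_model *v, int cpu, int nr_cpus,
                          uint64_t online)
{
    int ret = -EINVAL;

    if ( cpu < 0 )
    {
        /* Undo: only valid while a pin is active. */
        if ( v->affinity_broken )
        {
            v->hard_affinity = v->hard_affinity_saved;
            v->affinity_broken = false;
            ret = 0;
        }
    }
    else if ( cpu < nr_cpus )
    {
        uint64_t bit = UINT64_C(1) << cpu;

        if ( v->affinity_broken )
            ret = -EBUSY;              /* no nested pins */
        else if ( online & bit )
        {
            v->hard_affinity_saved = v->hard_affinity;
            v->affinity_broken = true;
            v->hard_affinity = bit;
            ret = 0;
        }
    }

    return ret;
}

/* Usage example: pin to cpu 0, try to pin again, then undo twice. */
static void pin_temp_demo(void)
{
    struct vcpu_model v = { .hard_affinity = 0xF };

    assert(pin_temp_model(&v, 0, 4, 0xF) == 0);
    assert(v.hard_affinity == 0x1 && v.affinity_broken);
    assert(pin_temp_model(&v, 1, 4, 0xF) == -EBUSY);
    assert(pin_temp_model(&v, -1, 4, 0xF) == 0);
    assert(v.hard_affinity == 0xF && !v.affinity_broken);
    assert(pin_temp_model(&v, -1, 4, 0xF) == -EINVAL);
}
```

In this model a second pin request fails with -EBUSY until the first one is undone, and an undo without an active pin returns -EINVAL, which is the state the review comments about a forgotten unpin are concerned with.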