| Message ID | 1457433240.3102.161.camel@citrix.com (mailing list archive) |
| --- | --- |
| State | New, archived |
>>> On 08.03.16 at 11:34, <dario.faggioli@citrix.com> wrote:
> On Tue, 2016-03-08 at 02:10 -0700, Jan Beulich wrote:
>> > > > On 07.03.16 at 18:53, <dario.faggioli@citrix.com> wrote:
>> > On Mon, 2016-03-07 at 09:40 -0700, Jan Beulich wrote:
>> > >
>> > IIRC, I was looking at how XEN_SYSCTL_pcitopoinfo is handled, for
>> > reference, and that has some guest_handle_is_null()==>EINVAL sanity
>> > checking (in xen/common/sysctl.c), which, when I thought about it,
>> > made sense to me.
>> >
>> > My reasoning was, sort of:
>> >  1. if the handle is NULL, there is no point getting into the
>> >     somewhat complicated logic of the while loop;
>> >  2. more accurate error reporting: being passed a NULL handle looked
>> >     like something we could identify and call invalid right away,
>> >     rather than waiting for the copy to fault.
>>
>> I think XEN_SYSCTL_pcitopoinfo was misguided in this respect: it
>> cloned logic which is not applicable here, from a few other operations
>> which return the number of needed (array) elements when passed a null
>> handle.
>
> Sorry, I'm not sure I'm getting it: are you saying that, for _these_
> domctls, we should consider a NULL handle to be the caller's way of
> asking for the size of the array?

No, I've tried to point out that _when_ such behavior is intended, the
special casing of a null handle is warranted. But not (normally) in
other cases.

>> > About the structure of the code, as said above, I do like how
>> > XEN_SYSCTL_pcitopoinfo ended up being handled. I think it is a great
>> > fit for this specific case and, comparing this version with the
>> > previous ones, I do think this one is (bugs apart) looking better.
>> >
>> > I'm sure I said this --long ago-- when discussing v4 (and maybe even
>> > previous versions), as well as more recently, when reviewing v5, and
>> > that's why Chong (finally! :-D) did it.
>> >
>> > So, with the comment in place (and with bugs fixed :-)), are you
>> > (Jan) ok with this being done this way?
>>
>> Well, this _might_ be acceptable for "get" (since the caller
>> abandoning the sequence of calls prematurely is no problem), but for
>> "set" it looks less suitable, as similar abandoning would leave the
>> guest in some inconsistent / unintended state.
>
> Are you referring to the fact that, with this interface, the caller
> has the chance to leave intentionally, or to the possibility that not
> all vcpus get updated because of some bug (still in the caller)?
>
> Well, if it's intentional, or even if the caller is buggy in the sense
> that the code is written in a way that misses updating some vcpus (and
> if the interface and the behavior are well documented, as you request),
> then one gets what one "wants" (and, in the latter case, it wouldn't be
> too hard to debug and figure out the issue, I think).

Intentional or buggy doesn't matter much - if we can avoid making
ourselves dependent upon user mode code behaving well, I think we
should.

> If it's about bugs (still in the caller) like copy_from_guest_offset()
> faulting because the array is too small, that can happen when using
> continuations too, can't it? And it would still leave the guest in a
> similarly inconsistent or unintended state, IMO...

True, albeit that's what error indications are for.

> One last point. Of course, since we are talking about bugs, the final
> status is not the one desired, but it's not inconsistent in the sense
> that the guest can't continue running, or crashes, or anything like
> that. It's something like:
>  - you want all of guest A's 32 vcpus to have these new parameters;
>  - due to a bug, you're (for instance) passing me an array with only
>    16 vcpus' parameters;
>  - result: only 16 vcpus will have the new parameters.

That was my understanding, yes.

>> The issue with XEN_SYSCTL_pcitopoinfo was, iirc, the lack of a good
>> way of encoding the continuation information, and while that would
>> seem applicable here too, I'm not sure now whether doing it the way
>> it was done was the best choice.
>
> As far as I can remember and see, it was being done by means of an
> additional dedicated parameter in the handle (called ti->first_dev in
> that case). Then at some point, you said:
>
> http://lists.xenproject.org/archives/html/xen-devel/2015-03/msg02623.html
> "Considering this is a tools only interface, enforcing a not too high
> limit on num_devs would seem better than this not really clean
> continuation mechanism. The (tool stack) caller(s) can be made
> iterate."
>
> With which I did agree (and I still do :-)), and I also agree that we
> are basically in the same situation here.
>
> Chong tried doing things with continuations for a few rounds, including
> v5, which is here:
> http://lists.xenproject.org/archives/html/xen-devel/2016-02/msg00817.html
> and he also used an additional field (vcpu_index).
>
> So, all this being said, my preference stays with the way the code
> looks in this version (with all the due commenting added). Of course,
> it's your preference that really matters here, me not being the
> maintainer of this code. :-)
>
> So, how do you prefer Chong to continue doing this?

Well, modeling this after the other sysctl makes it something I can't
reasonably reject. Hence what I've been saying so far were merely
suggestions, as - unlike you - I'm no longer convinced the model used
there was a good one. All the more so since here - afaict - no extra
fields would be needed: the continuation encoding would simply consist
of op->u.v.nr_vcpus getting suitably decreased and op->u.v.vcpus
getting suitably bumped.

>> Clearly stating (in the public interface header) that certain
>> normally input-only fields are volatile would allow the continuation
>> to be handled without tool stack assistance afaict.
>
> Which (sorry, I'm not getting it again) I guess is something
> different/more than what was done in v5 (the relevant hunks of which
> I'm pasting at the bottom of this email, for convenience)?

The hunks you had included didn't cover the public interface header
changes. My point here is: generally, we demand that fields in public
interface structures which are not documented as OUT not be modified by
a hypercall. Since I'm suggesting to use both fields to encode
continuation information, that rule would get violated, which therefore
needs spelling out in a comment in the header.

Jan
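As an illustration of Jan's closing point, a comment along the following
lines in the public header could spell out that both fields get modified
for continuation purposes. This is only a sketch: the struct layout and
the u.v / xen_domctl_schedparam_vcpu_t names follow the v5 hunks pasted
below, not any committed interface.

struct xen_domctl_scheduler_op {
    uint32_t sched_id;  /* XEN_SCHEDULER_* */
    uint32_t cmd;       /* XEN_DOMCTL_SCHEDOP_* */
    union {
        /* ... per-scheduler parameter structs ... */
        struct {
            /*
             * IN/OUT: although such fields are normally input-only,
             * both are updated by the hypervisor to encode continuation
             * state: nr_vcpus is decreased and vcpus is advanced as
             * elements get processed, so callers must not rely on them
             * being preserved across the hypercall.
             */
            uint32_t nr_vcpus;
            XEN_GUEST_HANDLE_64(xen_domctl_schedparam_vcpu_t) vcpus;
        } v;
    } u;
};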
diff --git a/xen/common/domctl.c b/xen/common/domctl.c
index 46b967e..b294221 100644
--- a/xen/common/domctl.c
+++ b/xen/common/domctl.c
@@ -847,9 +847,14 @@ long do_domctl(XEN_GUEST_HANDLE_PARAM(xen_domctl_t) u_domctl)
     }
 
     case XEN_DOMCTL_scheduler_op:
+    {
         ret = sched_adjust(d, &op->u.scheduler_op);
+        if ( ret == -ERESTART )
+            ret = hypercall_create_continuation(
+                __HYPERVISOR_domctl, "h", u_domctl);
         copyback = 1;
         break;
+    }
 
     case XEN_DOMCTL_getdomaininfo:
     {
diff --git a/xen/common/sched_rt.c b/xen/common/sched_rt.c
index 3f1d047..34ae48d 100644
--- a/xen/common/sched_rt.c
+++ b/xen/common/sched_rt.c
@@ -1163,6 +1168,94 @@ rt_dom_cntl(
         }
         spin_unlock_irqrestore(&prv->lock, flags);
         break;
+    case XEN_DOMCTL_SCHEDOP_getvcpuinfo:
+        for ( index = op->u.v.vcpu_index; index < op->u.v.nr_vcpus; index++ )
+        {
+            spin_lock_irqsave(&prv->lock, flags);
+            if ( copy_from_guest_offset(&local_sched,
+                                        op->u.v.vcpus, index, 1) )
+            {
+                rc = -EFAULT;
+                spin_unlock_irqrestore(&prv->lock, flags);
+                break;
+            }
+            if ( local_sched.vcpuid >= d->max_vcpus ||
+                 d->vcpu[local_sched.vcpuid] == NULL )
+            {
+                rc = -EINVAL;
+                spin_unlock_irqrestore(&prv->lock, flags);
+                break;
+            }
+            svc = rt_vcpu(d->vcpu[local_sched.vcpuid]);
+
+            local_sched.s.rtds.budget = svc->budget / MICROSECS(1);
+            local_sched.s.rtds.period = svc->period / MICROSECS(1);
+
+            if ( __copy_to_guest_offset(op->u.v.vcpus, index,
+                                        &local_sched, 1) )
+            {
+                rc = -EFAULT;
+                spin_unlock_irqrestore(&prv->lock, flags);
+                break;
+            }
+            spin_unlock_irqrestore(&prv->lock, flags);
+            if ( hypercall_preempt_check() )
+            {
+                op->u.v.vcpu_index = index + 1;
+                /* hypercall (after preemption) will continue at vcpu_index */
+                rc = -ERESTART;
+                break;
+            }
+        }
+        break;
+    case XEN_DOMCTL_SCHEDOP_putvcpuinfo:
+        for ( index = op->u.v.vcpu_index; index < op->u.v.nr_vcpus; index++ )
+        {
+            spin_lock_irqsave(&prv->lock, flags);
+            if ( copy_from_guest_offset(&local_sched,
+                                        op->u.v.vcpus, index, 1) )
+            {
+                rc = -EFAULT;
+                spin_unlock_irqrestore(&prv->lock, flags);
+                break;
+            }
+            if ( local_sched.vcpuid >= d->max_vcpus ||
+                 d->vcpu[local_sched.vcpuid] == NULL )
+            {
+                rc = -EINVAL;
+                spin_unlock_irqrestore(&prv->lock, flags);
+                break;
+            }
+            svc = rt_vcpu(d->vcpu[local_sched.vcpuid]);
+            period = MICROSECS(local_sched.s.rtds.period);
+            budget = MICROSECS(local_sched.s.rtds.budget);
+            if ( period > RTDS_MAX_PERIOD || budget < RTDS_MIN_BUDGET ||
+                 budget > period )
+            {
+                rc = -EINVAL;
+                spin_unlock_irqrestore(&prv->lock, flags);
+                break;
+            }
+
+            /*
+             * We accept period/budget less than 100 us, but will warn
+             * users about the large scheduling overhead due to it
+             */
+            if ( period < MICROSECS(100) || budget < MICROSECS(100) )
+                printk("Warning: period/budget less than 100 micro-secs "
+                       "results in large scheduling overhead.\n");
+
+            svc->period = period;
+            svc->budget = budget;
+            spin_unlock_irqrestore(&prv->lock, flags);
+            if ( hypercall_preempt_check() )
+            {
+                op->u.v.vcpu_index = index + 1;
+                rc = -ERESTART;
+                break;
+            }
+        }
+        break;
     }
 
     return rc;
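For contrast with the vcpu_index mechanism in the hunks above, a minimal
sketch of the encoding Jan suggests is below: elements are consumed from
the front of the array, with op->u.v.nr_vcpus decreased and op->u.v.vcpus
advanced in place (guest_handle_add_offset() is the standard helper for
the latter), so no extra field is needed. The per-element processing is
elided, and the u.v field names again follow the v5 patch rather than any
committed interface.

    case XEN_DOMCTL_SCHEDOP_getvcpuinfo:
        while ( op->u.v.nr_vcpus )
        {
            spin_lock_irqsave(&prv->lock, flags);
            /* ... process op->u.v.vcpus[0], as in the hunks above ... */
            spin_unlock_irqrestore(&prv->lock, flags);

            /* Consume one element: advance the handle, shrink the count. */
            guest_handle_add_offset(op->u.v.vcpus, 1);
            op->u.v.nr_vcpus--;

            if ( op->u.v.nr_vcpus && hypercall_preempt_check() )
            {
                /*
                 * do_domctl() copies the updated op back to the guest and
                 * creates the continuation, which then restarts here with
                 * only the remaining elements.
                 */
                rc = -ERESTART;
                break;
            }
        }
        break;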