Message-ID: <1457433240.3102.161.camel@citrix.com>
From: Dario Faggioli
To: Jan Beulich, Chong Li
Date: Tue, 8 Mar 2016 11:34:00 +0100
In-Reply-To: <56DEA50402000078000DA42D@prv-mh.provo.novell.com>
References: <1457286958-5427-1-git-send-email-lichong659@gmail.com>
 <1457286958-5427-2-git-send-email-lichong659@gmail.com>
 <56DD894802000078000D9F84@prv-mh.provo.novell.com>
 <56DDBCFF02000078000DA1A0@prv-mh.provo.novell.com>
 <1457373181.3102.74.camel@citrix.com>
 <56DEA50402000078000DA42D@prv-mh.provo.novell.com>
Organization: Citrix Inc.
Cc: Chong Li, Sisu Xi, George Dunlap, xen-devel, Meng Xu, Dagaen Golomb
Subject: Re: [Xen-devel] [PATCH v6 for Xen 4.7 1/4] xen: enable per-VCPU parameter settings for RTDS scheduler

On Tue, 2016-03-08 at 02:10 -0700, Jan Beulich wrote:
> > > > On 07.03.16 at 18:53, wrote:
> > On Mon, 2016-03-07 at 09:40 -0700, Jan Beulich wrote:
> > > 
> > IIRC, I was looking at how XEN_SYSCTL_pcitopoinfo is handled, for
> > reference, and that has some guest_handle_is_null()==>EINVAL sanity
> > checking (in xen/common/sysctl.c), which, when I thought about it,
> > made sense to me.
> > 
> > My reasoning was, sort of:
> >  1. if the handle is NULL, there is no point getting into the
> >     somewhat complicated logic of the while loop,
> >  2. more accurate error reporting: being passed a NULL handle
> >     looked like something we could identify and call invalid,
> >     rather than waiting for the copy to fault.
> 
> I think the XEN_SYSCTL_pcitopoinfo handling was misguided in this
> respect, cloning logic which does not apply here: returning the
> number of needed (array) elements in such a case is what a few other
> operations do.
> 
Sorry, I'm not sure I am getting it: are you saying that, for _these_
domctls, we should consider a NULL handle as a way for the caller to
ask for the size of the array?

*If* yes, well, that is "just" the number of vcpus of the guest but,
nevertheless, that, FWIW, looks fine to me.
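Just to make sure we are talking about the same thing, the kind of
check I have in mind would be something like the snippet below, in
rt_dom_cntl(). This is purely illustrative, it is not code from the
series (and whether d->max_vcpus is the right value to report is part
of the question):

    /* Illustrative sketch only, not part of the patch. */
    if ( guest_handle_is_null(op->u.v.vcpus) )
    {
        /* NULL handle == "tell me how many elements the array needs". */
        op->u.v.nr_vcpus = d->max_vcpus;
        rc = 0;
        break;
    }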
> > About the structure of the code, as said above, I do like how
> > XEN_SYSCTL_pcitopoinfo ended up being handled. I think it is a
> > great fit for this specific case and, comparing this and the
> > previous version, I do think this one is (bugs apart) looking
> > better.
> > 
> > I'm sure I said this --long ago-- when discussing v4 (and maybe
> > even previous versions), as well as more recently, when reviewing
> > v5, and that's why Chong (finally! :-D) did it.
> > 
> > So, with the comment in place (and with bugs fixed :-)), are you
> > (Jan) ok with this being done this way?
> 
> Well, this _might_ be acceptable for "get" (since the caller
> abandoning the sequence of calls prematurely is no problem),
> but for "set" it looks less suitable, as similar abandoning would
> leave the guest in some inconsistent / unintended state.
> 
Are you referring to the fact that, with this interface, the caller
has the chance to leave intentionally, or to the fact that not all
vcpus may end up being updated because of some bug (still in the
caller)?

Well, if it's intentional, or even if the caller is buggy in the sense
that its code misses updating some vcpus (and provided the interface
and the behavior are well documented, as you request), then one gets
what one "wants" (and, in the latter case, it wouldn't be too hard to
debug and figure out the issue, I think).

If it's about bugs (still in the caller) like copy_from_guest_offset()
faulting because the array is too small, that can happen when using
continuations too, can't it? And it would still leave the guest in a
similarly inconsistent or unintended state, IMO...

One last point. Of course, since we are talking about bugs, the final
state is not the one desired, but it's not inconsistent in the sense
that the guest can't continue running, or crashes, or anything like
that. It's something like:
 - you want all the 32 vcpus of guest A to have these new parameters,
 - due to a bug, you're (for instance) passing me an array with only
   16 vcpus' parameters,
 - result: only 16 vcpus will have the new parameters.

> The issue with XEN_SYSCTL_pcitopoinfo was, iirc, the lack of a
> good way of encoding the continuation information, and while
> that would seem applicable here too I'm not sure now whether
> doing it the way it was done was the best choice.
> 
As far as I can remember and see, it was being done by means of an
additional dedicated parameter in the handle (called ti->first_dev in
that case). Then, at some point, you said:
http://lists.xenproject.org/archives/html/xen-devel/2015-03/msg02623.html

"Considering this is a tools only interface, enforcing a not too high
 limit on num_devs would seem better than this not really clean
 continuation mechanism. The (tool stack) caller(s) can be made
 iterate."

With which I did agree (and I still do :-)), and I also agree that we
basically are in the same situation here.

Chong tried doing things with continuations for a few rounds,
including v5, which is here:
http://lists.xenproject.org/archives/html/xen-devel/2016-02/msg00817.html
and he also used an additional field (vcpu_index).

So, all this being said, my preference stays with the way the code
looks in this version (with all the due commenting added). Of course,
it's your preference that really matters here, me not being the
maintainer of this code. :-)

So, how do you prefer Chong to continue doing this?

> Clearly stating (in the public interface header) that certain
> normally input-only fields are volatile would allow the continuation
> to be handled without tool stack assistance afaict.
> 
Which (sorry, I'm not getting it again) I guess is something
different/more than what was done in v5 (the relevant hunks of which
I'm pasting at the bottom of this email, for convenience)?

> > BTW, Chong, I'm not sure this has to do with what Jan is saying,
> > but looking again at XEN_SYSCTL_pcitopoinfo, it looks to me you're
> > missing copying nr_vcpus back up to the guest (which is actually
> > what makes libxc know whether all vcpus have been processed or
> > not).
> 
> Indeed that is why the conditional makes sense there, but not here.
> And the copying back is already being taken care of by the caller of
> sched_adjust().
> 
Indeed. So your original point was: we're always copying back, so it's
fine to always update the field, with the only exception of errors
having occurred. I do get it now.
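Oh, and going back for a second to the "caller can be made iterate"
option I said I prefer: just to show concretely what I mean, the tool
stack side could look something like the sketch below. This is only an
illustration of the idea: apart from XEN_DOMCTL_scheduler_op and the
vcpu_index/nr_vcpus fields from the v5 hunks, all the names (the
issuing helper, the function itself) are made up, it is not actual
libxc code:

    /*
     * Hypothetical tool stack loop (made-up names): keep re-issuing
     * the domctl until the hypervisor has gone through all the vcpus,
     * using the copied-back vcpu_index as the resume point.
     */
    static int rtds_set_all_vcpus(uint32_t domid, void *vcpus,
                                  unsigned int nr_vcpus)
    {
        struct xen_domctl domctl = { 0 };
        int rc;

        domctl.cmd = XEN_DOMCTL_scheduler_op;
        domctl.domain = domid;
        /* ... fill in sched_id, the put-vcpuinfo sub-command, the
         * guest handle for the vcpus array, and nr_vcpus ... */

        do {
            rc = my_issue_domctl(&domctl);   /* made-up helper */
        } while ( rc == 0 &&
                  domctl.u.scheduler_op.u.v.vcpu_index < nr_vcpus );

        return rc;
    }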
Regards,
Dario

diff --git a/xen/common/domctl.c b/xen/common/domctl.c
index 46b967e..b294221 100644
--- a/xen/common/domctl.c
+++ b/xen/common/domctl.c
@@ -847,9 +847,14 @@ long do_domctl(XEN_GUEST_HANDLE_PARAM(xen_domctl_t) u_domctl)
     }
 
     case XEN_DOMCTL_scheduler_op:
+    {
         ret = sched_adjust(d, &op->u.scheduler_op);
+        if ( ret == -ERESTART )
+            ret = hypercall_create_continuation(
+                __HYPERVISOR_domctl, "h", u_domctl);
         copyback = 1;
         break;
+    }
 
     case XEN_DOMCTL_getdomaininfo:
     {
diff --git a/xen/common/sched_rt.c b/xen/common/sched_rt.c
index 3f1d047..34ae48d 100644
--- a/xen/common/sched_rt.c
+++ b/xen/common/sched_rt.c
@@ -1163,6 +1168,94 @@ rt_dom_cntl(
         }
         spin_unlock_irqrestore(&prv->lock, flags);
         break;
+    case XEN_DOMCTL_SCHEDOP_getvcpuinfo:
+        for ( index = op->u.v.vcpu_index; index < op->u.v.nr_vcpus; index++ )
+        {
+            spin_lock_irqsave(&prv->lock, flags);
+            if ( copy_from_guest_offset(&local_sched,
+                          op->u.v.vcpus, index, 1) )
+            {
+                rc = -EFAULT;
+                spin_unlock_irqrestore(&prv->lock, flags);
+                break;
+            }
+            if ( local_sched.vcpuid >= d->max_vcpus ||
+                          d->vcpu[local_sched.vcpuid] == NULL )
+            {
+                rc = -EINVAL;
+                spin_unlock_irqrestore(&prv->lock, flags);
+                break;
+            }
+            svc = rt_vcpu(d->vcpu[local_sched.vcpuid]);
+
+            local_sched.s.rtds.budget = svc->budget / MICROSECS(1);
+            local_sched.s.rtds.period = svc->period / MICROSECS(1);
+
+            if ( __copy_to_guest_offset(op->u.v.vcpus, index,
+                    &local_sched, 1) )
+            {
+                rc = -EFAULT;
+                spin_unlock_irqrestore(&prv->lock, flags);
+                break;
+            }
+            spin_unlock_irqrestore(&prv->lock, flags);
+            if ( hypercall_preempt_check() )
+            {
+                op->u.v.vcpu_index = index + 1;
+                /* hypercall (after preemption) will continue at vcpu_index */
+                rc = -ERESTART;
+                break;
+            }
+        }
+        break;
+    case XEN_DOMCTL_SCHEDOP_putvcpuinfo:
+        for ( index = op->u.v.vcpu_index; index < op->u.v.nr_vcpus; index++ )
+        {
+            spin_lock_irqsave(&prv->lock, flags);
+            if ( copy_from_guest_offset(&local_sched,
+                          op->u.v.vcpus, index, 1) )
+            {
+                rc = -EFAULT;
+                spin_unlock_irqrestore(&prv->lock, flags);
+                break;
+            }
+            if ( local_sched.vcpuid >= d->max_vcpus ||
+                          d->vcpu[local_sched.vcpuid] == NULL )
+            {
+                rc = -EINVAL;
+                spin_unlock_irqrestore(&prv->lock, flags);
+                break;
+            }
+            svc = rt_vcpu(d->vcpu[local_sched.vcpuid]);
+            period = MICROSECS(local_sched.s.rtds.period);
+            budget = MICROSECS(local_sched.s.rtds.budget);
+            if ( period > RTDS_MAX_PERIOD || budget < RTDS_MIN_BUDGET ||
+                          budget > period )
+            {
+                rc = -EINVAL;
+                spin_unlock_irqrestore(&prv->lock, flags);
+                break;
+            }
+
+            /*
+             * We accept period/budget less than 100 us, but will warn users about
+             * the large scheduling overhead due to it
+             */
+            if ( period < MICROSECS(100) || budget < MICROSECS(100) )
+                printk("Warning: period/budget less than 100 micro-secs "
+                       "results in large scheduling overhead.\n");
+
+            svc->period = period;
+            svc->budget = budget;
+            spin_unlock_irqrestore(&prv->lock, flags);
+            if ( hypercall_preempt_check() )
+            {
+                op->u.v.vcpu_index = index + 1;
+                rc = -ERESTART;
+                break;
+            }
+        }
+        break;
     }
 
     return rc;