[RFC,v1,3/6] sched, credit2: improve scheduler fairness

Message ID	20200612002205.174295-4-volodymyr_babchuk@epam.com (mailing list archive)
State	New, archived
Headers	show Return-Path: <SRS0=NwDn=7Z=lists.xenproject.org=xen-devel-bounces@kernel.org> DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 2AE4820835 From: Volodymyr Babchuk <Volodymyr_Babchuk@epam.com> To: "xen-devel@lists.xenproject.org" <xen-devel@lists.xenproject.org> Subject: [RFC PATCH v1 3/6] sched, credit2: improve scheduler fairness Thread-Topic: [RFC PATCH v1 3/6] sched, credit2: improve scheduler fairness Thread-Index: AQHWQE+TBfX1MPdkGkC6vJ7ur/uY5A== Date: Fri, 12 Jun 2020 00:22:36 +0000 Message-ID: <20200612002205.174295-4-volodymyr_babchuk@epam.com> References: <20200612002205.174295-1-volodymyr_babchuk@epam.com> In-Reply-To: <20200612002205.174295-1-volodymyr_babchuk@epam.com> Accept-Language: en-US Content-Language: en-US Content-Type: text/plain; charset="iso-8859-1" Content-Transfer-Encoding: quoted-printable MIME-Version: 1.0 Precedence: list Cc: Volodymyr Babchuk <Volodymyr_Babchuk@epam.com>, George Dunlap <george.dunlap@citrix.com>, Dario Faggioli <dfaggioli@suse.com> Errors-To: xen-devel-bounces@lists.xenproject.org Sender: "Xen-devel" <xen-devel-bounces@lists.xenproject.org>
Series	Fair scheduling \| expand [RFC,v1,0/6] Fair scheduling [RFC,v1,1/6] sched: track time spent in IRQ handler [RFC,v1,2/6] sched: track time spent in hypervisor tasks [RFC,v1,3/6] sched, credit2: improve scheduler fairness [RFC,v1,4/6] xentop: collect IRQ and HYP time statistics. [RFC,v1,5/6] tools: xentop: show time spent in IRQ and HYP states. [RFC,v1,6/6] trace: add fair scheduling trace events

Message ID

20200612002205.174295-4-volodymyr_babchuk@epam.com (mailing list archive)

State

New, archived

Headers

DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 2AE4820835
From: Volodymyr Babchuk <Volodymyr_Babchuk@epam.com>
To: "xen-devel@lists.xenproject.org" <xen-devel@lists.xenproject.org>
Subject: [RFC PATCH v1 3/6] sched, credit2: improve scheduler fairness
Thread-Topic: [RFC PATCH v1 3/6] sched, credit2: improve scheduler fairness
Thread-Index: AQHWQE+TBfX1MPdkGkC6vJ7ur/uY5A==
Date: Fri, 12 Jun 2020 00:22:36 +0000
Message-ID: <20200612002205.174295-4-volodymyr_babchuk@epam.com>
References: <20200612002205.174295-1-volodymyr_babchuk@epam.com>
In-Reply-To: <20200612002205.174295-1-volodymyr_babchuk@epam.com>
Accept-Language: en-US
Content-Language: en-US
Content-Type: text/plain; charset="iso-8859-1"
Content-Transfer-Encoding: quoted-printable
MIME-Version: 1.0
X-MS-Exchange-CrossTenant-Network-Message-Id: 
 de4db258-52dd-4a6d-14f8-08d80e66b5e6
X-MS-Exchange-CrossTenant-originalarrivaltime: 12 Jun 2020 00:22:36.9255 (UTC)
X-MS-Exchange-CrossTenant-fromentityheader: Hosted
X-MS-Exchange-CrossTenant-id: b41b72d0-4e9f-4c26-8a69-f949f367c91d
X-MS-Exchange-CrossTenant-mailboxtype: HOSTED
X-MS-Exchange-CrossTenant-userprincipalname: 
 GA0TCH7Ztw0IguOTKlSCcOSNU3bchUtqDPJEDwkZedpLtOh/BDZlWtFmD6uF3QJrq+Dw7I5O9/SDdZgIxeKg0t75I5qvP2A58UeyLYNPkg0=
X-MS-Exchange-Transport-CrossTenantHeadersStamped: VI1PR0302MB2654
X-BeenThere: xen-devel@lists.xenproject.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Xen developer discussion <xen-devel.lists.xenproject.org>
List-Unsubscribe: <https://lists.xenproject.org/mailman/options/xen-devel>,
 <mailto:xen-devel-request@lists.xenproject.org?subject=unsubscribe>
List-Post: <mailto:xen-devel@lists.xenproject.org>
List-Help: <mailto:xen-devel-request@lists.xenproject.org?subject=help>
List-Subscribe: <https://lists.xenproject.org/mailman/listinfo/xen-devel>,
 <mailto:xen-devel-request@lists.xenproject.org?subject=subscribe>
Cc: Volodymyr Babchuk <Volodymyr_Babchuk@epam.com>,
 George Dunlap <george.dunlap@citrix.com>, Dario Faggioli <dfaggioli@suse.com>
Errors-To: xen-devel-bounces@lists.xenproject.org
Sender: "Xen-devel" <xen-devel-bounces@lists.xenproject.org>

Series

Fair scheduling | expand

Commit Message

Volodymyr Babchuk June 12, 2020, 12:22 a.m. UTC

Now we can make corrections for scheduling unit run time, based on
data gathered in previous patches. This includes time spent in IRQ
handlers and time spent for hypervisor housekeeping tasks. Those time
spans needs to be deduced from a total run time.

This patch adds sched_get_time_correction() function which returns
time correction value. This value should be subtracted by a scheduler
implementation from a total vCPU/shced_unit run time.

TODO: Make the corresponding changes to all other schedulers.

Signed-off-by: Volodymyr Babchuk <volodymyr_babchuk@epam.com>
---
 xen/common/sched/core.c    | 23 +++++++++++++++++++++++
 xen/common/sched/credit2.c |  2 +-
 xen/common/sched/private.h | 10 ++++++++++
 3 files changed, 34 insertions(+), 1 deletion(-)

Comments

Jürgen Groß June 12, 2020, 4:51 a.m. UTC | #1

On 12.06.20 02:22, Volodymyr Babchuk wrote:
> Now we can make corrections for scheduling unit run time, based on
> data gathered in previous patches. This includes time spent in IRQ
> handlers and time spent for hypervisor housekeeping tasks. Those time
> spans needs to be deduced from a total run time.
> 
> This patch adds sched_get_time_correction() function which returns
> time correction value. This value should be subtracted by a scheduler
> implementation from a total vCPU/shced_unit run time.
> 
> TODO: Make the corresponding changes to all other schedulers.
> 
> Signed-off-by: Volodymyr Babchuk <volodymyr_babchuk@epam.com>
> ---
>   xen/common/sched/core.c    | 23 +++++++++++++++++++++++
>   xen/common/sched/credit2.c |  2 +-
>   xen/common/sched/private.h | 10 ++++++++++
>   3 files changed, 34 insertions(+), 1 deletion(-)
> 
> diff --git a/xen/common/sched/core.c b/xen/common/sched/core.c
> index d597811fef..a7294ff5c3 100644
> --- a/xen/common/sched/core.c
> +++ b/xen/common/sched/core.c
> @@ -974,6 +974,29 @@ void vcpu_end_hyp_task(struct vcpu *v)
>   #ifndef NDEBUG
>       v->in_hyp_task = false;
>   #endif
> +
> +s_time_t sched_get_time_correction(struct sched_unit *u)
> +{
> +    unsigned long flags;
> +    int irq, hyp;

Using "irq" for a time value is misleading IMO.

> +
> +    while ( true )
> +    {
> +        irq = atomic_read(&u->irq_time);
> +        if ( likely( irq == atomic_cmpxchg(&u->irq_time, irq, 0)) )
> +            break;
> +    }

Just use atomic_xchg()?

> +
> +    while ( true )
> +    {
> +        hyp = atomic_read(&u->hyp_time);
> +        if ( likely( hyp == atomic_cmpxchg(&u->hyp_time, hyp, 0)) )
> +            break;
> +    }
> +
> +    return irq + hyp;

Ah, I didn't look into this patch until now.

You can replace my comments about overflow of an int for patches 1 and 2
with:

   Please modify the comment about not overflowing hinting to the value
   being reset when making scheduling decisions.

And this (of course) needs to be handled in all other schedulers, too.


Juergen

Volodymyr Babchuk June 12, 2020, 11:38 a.m. UTC | #2

On Fri, 2020-06-12 at 06:51 +0200, Jürgen Groß wrote:
> On 12.06.20 02:22, Volodymyr Babchuk wrote:
> > Now we can make corrections for scheduling unit run time, based on
> > data gathered in previous patches. This includes time spent in IRQ
> > handlers and time spent for hypervisor housekeeping tasks. Those time
> > spans needs to be deduced from a total run time.
> > 
> > This patch adds sched_get_time_correction() function which returns
> > time correction value. This value should be subtracted by a scheduler
> > implementation from a total vCPU/shced_unit run time.
> > 
> > TODO: Make the corresponding changes to all other schedulers.
> > 
> > Signed-off-by: Volodymyr Babchuk <volodymyr_babchuk@epam.com>
> > ---
> >   xen/common/sched/core.c    | 23 +++++++++++++++++++++++
> >   xen/common/sched/credit2.c |  2 +-
> >   xen/common/sched/private.h | 10 ++++++++++
> >   3 files changed, 34 insertions(+), 1 deletion(-)
> > 
> > diff --git a/xen/common/sched/core.c b/xen/common/sched/core.c
> > index d597811fef..a7294ff5c3 100644
> > --- a/xen/common/sched/core.c
> > +++ b/xen/common/sched/core.c
> > @@ -974,6 +974,29 @@ void vcpu_end_hyp_task(struct vcpu *v)
> >   #ifndef NDEBUG
> >       v->in_hyp_task = false;
> >   #endif
> > +
> > +s_time_t sched_get_time_correction(struct sched_unit *u)
> > +{
> > +    unsigned long flags;
> > +    int irq, hyp;
> 
> Using "irq" for a time value is misleading IMO.

Yes, you are right. I'll rename this variables to irq_time and
hyp_time. 

> > +
> > +    while ( true )
> > +    {
> > +        irq = atomic_read(&u->irq_time);
> > +        if ( likely( irq == atomic_cmpxchg(&u->irq_time, irq, 0)) )
> > +            break;
> > +    }
> 
> Just use atomic_xchg()?

Thanks. I somehow missed this macro.

> > +
> > +    while ( true )
> > +    {
> > +        hyp = atomic_read(&u->hyp_time);
> > +        if ( likely( hyp == atomic_cmpxchg(&u->hyp_time, hyp, 0)) )
> > +            break;
> > +    }
> > +
> > +    return irq + hyp;
> 
> Ah, I didn't look into this patch until now.
> 
> You can replace my comments about overflow of an int for patches 1 and 2
> with:
> 
>    Please modify the comment about not overflowing hinting to the value
>    being reset when making scheduling decisions.

Will do.

> And this (of course) needs to be handled in all other schedulers, too.
> 

Yes, the plan is to call this function in all schedulers. I skipped
this in RFC, because I wanted to discuss the general approch. I'll add
support for all other schedulers in the next version.

diff --git a/xen/common/sched/core.c b/xen/common/sched/core.c
index d597811fef..a7294ff5c3 100644
--- a/xen/common/sched/core.c
+++ b/xen/common/sched/core.c
@@ -974,6 +974,29 @@  void vcpu_end_hyp_task(struct vcpu *v)
 #ifndef NDEBUG
     v->in_hyp_task = false;
 #endif
+
+s_time_t sched_get_time_correction(struct sched_unit *u)
+{
+    unsigned long flags;
+    int irq, hyp;
+
+    while ( true )
+    {
+        irq = atomic_read(&u->irq_time);
+        if ( likely( irq == atomic_cmpxchg(&u->irq_time, irq, 0)) )
+            break;
+    }
+
+    while ( true )
+    {
+        hyp = atomic_read(&u->hyp_time);
+        if ( likely( hyp == atomic_cmpxchg(&u->hyp_time, hyp, 0)) )
+            break;
+    }
+
+    return irq + hyp;
+}
+
 }
 
 /*
diff --git a/xen/common/sched/credit2.c b/xen/common/sched/credit2.c
index 34f05c3e2a..7a0aca078b 100644
--- a/xen/common/sched/credit2.c
+++ b/xen/common/sched/credit2.c
@@ -1722,7 +1722,7 @@  void burn_credits(struct csched2_runqueue_data *rqd,
         return;
     }
 
-    delta = now - svc->start_time;
+    delta = now - svc->start_time - sched_get_time_correction(svc->unit);
 
     if ( unlikely(delta <= 0) )
     {
diff --git a/xen/common/sched/private.h b/xen/common/sched/private.h
index b9a5b4c01c..3f4859ce23 100644
--- a/xen/common/sched/private.h
+++ b/xen/common/sched/private.h
@@ -604,4 +604,14 @@  void cpupool_put(struct cpupool *pool);
 int cpupool_add_domain(struct domain *d, int poolid);
 void cpupool_rm_domain(struct domain *d);
 
+/*
+ * Get amount of time spent doing non-guest related work on
+ * current scheduling unit. This includes time spent in soft IRQs
+ * and in hardware interrupt handlers.
+ *
+ * Call to this function resets the counters, so it is supposed to
+ * be called when scheduler calculates time used by the scheduling
+ * unit.
+ */
+s_time_t sched_get_time_correction(struct sched_unit *u);
 #endif /* __XEN_SCHED_IF_H__ */

[RFC,v1,3/6] sched, credit2: improve scheduler fairness

Commit Message

Comments

Patch