diff mbox

[v2] features: declare the Credit2 scheduler as Supported.

Message ID 147808213853.22655.18440819543503331735.stgit@Solace.fritz.box (mailing list archive)
State New, archived
Headers show

Commit Message

Dario Faggioli Nov. 2, 2016, 10:22 a.m. UTC
Credit2 is available in tree as an "Experimental" scheduler since
a few years. Recently, effort started for making it production ready
and, eventually, the new Xen's default scheduler. As a consequence of
that, it has undergone a greatd deal of development, testing and
benchmarking.

In fact, Credit2's much more modern (wrt Credit1) design and cleaner
code makes it a lot easier to understand what the scheduler is doing,
fix scheduling issues that may come up, and implement new and more
advanced features, in future.

In some more details:

 - key features that were missing (pinning and context switching
   rate-limiting) have now been implemented, and more (soft affinity,
   caps and reservations) are about to come. The gap wrt Credit1 is
   therefore closing. In particular, with pinning and rate-limiting
   available, the scheduler can be considered usable.

 - Credit2 is tested by OSSTest since long time. Furthermore, as a
   part of recent efforts, stress tests and benchmarks have been run
   and shown no bugs or stability issues.

 - A number of different benchmarks have been run, most of them
   comparing Credit2 with Credit1. Some of the results were posted on
   xen-devel, some others have been illustrated during a talk at 2016
   edition of Xen-Project Developer Summit. In general, performance
   look promising --if not better than Credit1 already, in some of
   the cases.

It therefore appears that we are ready to mark the Credit2 scheduler
as a 'Supported' feature, and ask users to look at it and try it, if
they think it suits their needs.

Of course, declaring something 'Supported' has security implications.
So here it is how the situation looks like from a security standpoint:

1) Is guest->host privilege escalation possible?

The only interfaces exposed to unprivileged guests are the SCHEDOP
hypercalls, and timers. None of those hypercalls contain any pointers,
and they don't look to contain any privilege escalation path. Also,
they're not specific to Credit2, as they're "used" by all schedulers
(ingluding the current default, Credit1), so anything about these
interfaces would be a security concern already.


2) Is guest user->guest kernel escalation possible?

The guest kernel is not really relying on anything from the scheduler
to protect itself or any data in any way.


3) Is there any information leakage?

The only information which the scheduler exposes to unprivileged
guests is the timing information.  This may be able to be used for
side-channel attacks to probabilistically infer things about other
vcpus running on the same system; but this has not traditionally
been considered within the security boundary. And, again, this is
possible with all schedulers.

The control domain can issue DOMCTL_SCHEDOP and SYSCTL_SCHEDOP
hypercalls. Auditing such code, nothing that looks like a security
risk has been found (E.g., there's no risk of leaking content of
the hypervisor stack, as no buffer/local variables is returned).


4) Can a Denial-of-Service be triggered?

This is a risk, with schedulers, and one that's hard to foresee.
For instance, it _did_ happen on Credit1, in the past (a vcpu
could "game the system" by sleeping at particular times to gain
BOOST priority and monopolize 95% of the cpu). In that case, it
was possible because of the probabilistic nature of accounting
in Credit1 (which was then fixed). Well, Credit2:
 - already do accurate, rather than probabilistic, accounting;
 - does not have any BOOST or, in general, any way for a vcpu to
   become 'more important' than the others: they're all subjected
   to the same crediting algorithm.

Also note that, the accounting and the crediting algorithm are a lot
simpler than in Credit1, and hence a lot easier to understand, debug
and audit.

Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
---
Cc: George Dunlap <George.Dunlap@eu.citrix.com>
Cc: Anshul Makkar <anshul.makkar@citrix.com>
Cc: Wei Liu <wei.liu2@citrix.com>
Cc: Lars Kurth <lars.kurth@citrix.com>
Cc: Stefano Stabellini <sstabellini@kernel.org>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>
Cc: Ian Jackson <ian.jackson@eu.citrix.com>
Cc: Jan Beulich <jbeulich@suse.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Tim Deegan <tim@xen.org>
Cc: security@xenproject.org
---
 docs/features/sched_credit2.pandoc |    2 +-
 xen/common/sched_credit2.c         |    2 --
 2 files changed, 1 insertion(+), 3 deletions(-)

Comments

Andrew Cooper Nov. 2, 2016, 10:37 a.m. UTC | #1
On 02/11/16 10:22, Dario Faggioli wrote:

> ---
>  docs/features/sched_credit2.pandoc |    2 +-
>  xen/common/sched_credit2.c         |    2 --
>  2 files changed, 1 insertion(+), 3 deletions(-)

There is also text in Kconfig which should be adjusted.

We should probably have a checklist of items to check/consider/alter
when trying to declare something as supported.

~Andrew
Dario Faggioli Nov. 2, 2016, 10:50 a.m. UTC | #2
On Wed, 2016-11-02 at 10:37 +0000, Andrew Cooper wrote:
> On 02/11/16 10:22, Dario Faggioli wrote:
> 
> > 
> > ---
> >  docs/features/sched_credit2.pandoc |    2 +-
> >  xen/common/sched_credit2.c         |    2 --
> >  2 files changed, 1 insertion(+), 3 deletions(-)
> 
> There is also text in Kconfig which should be adjusted.
> 
Damn! I thought about it, but then forgot to do before sending.

Well, I certainly can send v3 with that done... Lemme see if there's
other feedback that I'll need to incorporate.

> We should probably have a checklist of items to check/consider/alter
> when trying to declare something as supported.
> 
Yes, if we document the process of how something becomes supported,
that could well be included.

Thanks and Regards,
Dario
Jan Beulich Nov. 2, 2016, 10:59 a.m. UTC | #3
>>> On 02.11.16 at 11:22, <dario.faggioli@citrix.com> wrote:
> 3) Is there any information leakage?
> 
> The only information which the scheduler exposes to unprivileged
> guests is the timing information.  This may be able to be used for
> side-channel attacks to probabilistically infer things about other
> vcpus running on the same system; but this has not traditionally
> been considered within the security boundary. And, again, this is
> possible with all schedulers.
> 
> The control domain can issue DOMCTL_SCHEDOP and SYSCTL_SCHEDOP
> hypercalls. Auditing such code, nothing that looks like a security
> risk has been found (E.g., there's no risk of leaking content of
> the hypervisor stack, as no buffer/local variables is returned).

There certainly are buffers being returned here. Namely in the
credit2 case there's also a 32-bit padding field in the domctl
interface structure (and uniformly for all schedulers there's one
in the sysctl structure), which provides the fundamental means
to leak stack data. However, none of this is a problem, both
because iirc leaking stack data to Dom0 is not really considered
a security issue, and because of the way the structures get
dealt with. Nevertheless I think the above paragraph should be
re-worded.

Jan
Dario Faggioli Nov. 2, 2016, 11:22 a.m. UTC | #4
On Wed, 2016-11-02 at 04:59 -0600, Jan Beulich wrote:
> > > > On 02.11.16 at 11:22, <dario.faggioli@citrix.com> wrote:
> > The control domain can issue DOMCTL_SCHEDOP and SYSCTL_SCHEDOP
> > hypercalls. Auditing such code, nothing that looks like a security
> > risk has been found (E.g., there's no risk of leaking content of
> > the hypervisor stack, as no buffer/local variables is returned).
> 
> There certainly are buffers being returned here. Namely in the
> credit2 case there's also a 32-bit padding field in the domctl
> interface structure (and uniformly for all schedulers there's one
> in the sysctl structure), which provides the fundamental means
> to leak stack data. However, none of this is a problem, both
> because iirc leaking stack data to Dom0 is not really considered
> a security issue, and because of the way the structures get
> dealt with. 
>
Right, what I meant is really "none of this is a problem [...] because
of the way the structures get dealt with".

I.e., there is nothing like what made e0e3b8f64730f3ee necessary.

> Nevertheless I think the above paragraph should be
> re-worded.
> 
Yep, I certainly could have said it better. But if leaking to Dom0 is
not worth being considered, I guess I can just remove the paragraph
entirely, can't I?

Thanks and Regards,
Dario
Jan Beulich Nov. 2, 2016, 11:29 a.m. UTC | #5
>>> On 02.11.16 at 12:22, <dario.faggioli@citrix.com> wrote:
> On Wed, 2016-11-02 at 04:59 -0600, Jan Beulich wrote:
>> > > > On 02.11.16 at 11:22, <dario.faggioli@citrix.com> wrote:
>> > The control domain can issue DOMCTL_SCHEDOP and SYSCTL_SCHEDOP
>> > hypercalls. Auditing such code, nothing that looks like a security
>> > risk has been found (E.g., there's no risk of leaking content of
>> > the hypervisor stack, as no buffer/local variables is returned).
>> 
>> There certainly are buffers being returned here. Namely in the
>> credit2 case there's also a 32-bit padding field in the domctl
>> interface structure (and uniformly for all schedulers there's one
>> in the sysctl structure), which provides the fundamental means
>> to leak stack data. However, none of this is a problem, both
>> because iirc leaking stack data to Dom0 is not really considered
>> a security issue, and because of the way the structures get
>> dealt with. 
>>
> Right, what I meant is really "none of this is a problem [...] because
> of the way the structures get dealt with".
> 
> I.e., there is nothing like what made e0e3b8f64730f3ee necessary.
> 
>> Nevertheless I think the above paragraph should be
>> re-worded.
>> 
> Yep, I certainly could have said it better. But if leaking to Dom0 is
> not worth being considered, I guess I can just remove the paragraph
> entirely, can't I?

Well, stating this fact explicitly is imo better than omitting it just
for someone to later ask "And what about ..." Arguably this could
be said in one short sentence, though.

Jan
diff mbox

Patch

diff --git a/docs/features/sched_credit2.pandoc b/docs/features/sched_credit2.pandoc
index 8609d9c..9c8e15b 100644
--- a/docs/features/sched_credit2.pandoc
+++ b/docs/features/sched_credit2.pandoc
@@ -5,7 +5,7 @@ 
 
 # Basics
 ---------------- ----------------------------------------------------
-         Status: **Experimental**
+         Status: **Supported**
 
       Component: Hypervisor
 ---------------- ----------------------------------------------------
diff --git a/xen/common/sched_credit2.c b/xen/common/sched_credit2.c
index fe46e80..1f26553 100644
--- a/xen/common/sched_credit2.c
+++ b/xen/common/sched_credit2.c
@@ -2954,8 +2954,6 @@  csched2_init(struct scheduler *ops)
     struct csched2_private *prv;
 
     printk("Initializing Credit2 scheduler\n");
-    printk(" WARNING: This is experimental software in development.\n" \
-           " Use at your own risk.\n");
 
     printk(XENLOG_INFO " load_precision_shift: %d\n"
            XENLOG_INFO " load_window_shift: %d\n"