
xen/hvm: fix hypervisor crash with hvm_save_one()

Message ID 1493731539-31798-1-git-send-email-rcojocaru@bitdefender.com (mailing list archive)
State New, archived

Commit Message

Razvan Cojocaru May 2, 2017, 1:25 p.m. UTC
hvm_save_cpu_ctxt() does a memset(&ctxt, 0, sizeof(ctxt)), which
can lead to ctxt.cur being 0. This can then crash the hypervisor
(with FATAL PAGE FAULT) in hvm_save_one() via the
"off < (ctxt.cur - sizeof(*desc))" for() test. This has happened
in practice with a Linux VM queried around shutdown:

(XEN) hvm.c:1595:d3v0 All CPUs offline -- powering off.
(XEN) ----[ Xen-4.9-rc  x86_64  debug=y   Not tainted ]----
(XEN) CPU:    5
(XEN) RIP:    e008:[<ffff82d0802496d2>] hvm_save_one+0x145/0x1fd
(XEN) RFLAGS: 0000000000010286   CONTEXT: hypervisor (d0v2)
(XEN) rax: ffff830492cbb445   rbx: 0000000000000000   rcx: ffff83039343b400
(XEN) rdx: 00000000ff88004d   rsi: fffffffffffffff8   rdi: 0000000000000000
(XEN) rbp: ffff8304103e7c88   rsp: ffff8304103e7c48   r8:  0000000000000001
(XEN) r9:  deadbeefdeadf00d   r10: 0000000000000000   r11: 0000000000000282
(XEN) r12: 00007f43a3b14004   r13: 00000000fffffffe   r14: 0000000000000000
(XEN) r15: ffff830400c41000   cr0: 0000000080050033   cr4: 00000000001526e0
(XEN) cr3: 0000000402e13000   cr2: ffff830492cbb447
(XEN) ds: 0000   es: 0000   fs: 0000   gs: 0000   ss: e010   cs: e008
(XEN) Xen code around <ffff82d0802496d2> (hvm_save_one+0x145/0x1fd):
(XEN)  00 00 48 01 c8 83 c2 08 <66> 39 58 02 75 64 eb 08 48 89 c8 ba 08 00 00 00
(XEN) Xen stack trace from rsp=ffff8304103e7c48:
(XEN)    0000041000000000 ffff83039343b400 ffff8304103e7c70 ffff8304103e7da8
(XEN)    ffff830400c41000 00007f43a3b13004 ffff8304103b7000 ffffffffffffffea
(XEN)    ffff8304103e7d48 ffff82d0802683d4 ffff8300d19fd000 ffff82d0802320d8
(XEN)    ffff830400c41000 0000000000000000 ffff8304103e7cd8 ffff82d08026ff3d
(XEN)    0000000000000000 ffff8300d19fd000 ffff8304103e7cf8 ffff82d080232142
(XEN)    0000000000000000 ffff8300d19fd000 ffff8304103e7d28 ffff82d080207051
(XEN)    ffff8304103e7d18 ffff830400c41000 0000000000000202 ffff830400c41000
(XEN)    0000000000000000 00007f43a3b13004 0000000000000000 deadbeefdeadf00d
(XEN)    ffff8304103e7e68 ffff82d080206c47 0700000000000000 ffff830410375bd0
(XEN)    0000000000000296 ffff830410375c78 ffff830410375c80 0000000000000003
(XEN)    ffff8304103e7e68 ffff8304103b67c0 ffff8304103b7000 ffff8304103b67c0
(XEN)    0000000d00000037 0000000000000003 0000000000000002 00007f43a3b14004
(XEN)    00007ffd5d925590 0000000000000000 0000000100000000 0000000000000000
(XEN)    00000000ea8f8000 0000000000000000 00007ffd00000000 0000000000000000
(XEN)    00007f43a276f557 0000000000000000 00000000ea8f8000 0000000000000000
(XEN)    00007ffd5d9255e0 00007f43a23280b2 00007ffd5d926058 ffff8304103e7f18
(XEN)    ffff8300d19fe000 0000000000000024 ffff82d0802053e5 deadbeefdeadf00d
(XEN)    ffff8304103e7f08 ffff82d080351565 010000003fffffff 00007f43a3b13004
(XEN)    deadbeefdeadf00d deadbeefdeadf00d deadbeefdeadf00d deadbeefdeadf00d
(XEN)    ffff8800781425c0 ffff88007ce94300 ffff8304103e7ed8 ffff82d0802719ec
(XEN) Xen call trace:
(XEN)    [<ffff82d0802496d2>] hvm_save_one+0x145/0x1fd
(XEN)    [<ffff82d0802683d4>] arch_do_domctl+0xa7a/0x259f
(XEN)    [<ffff82d080206c47>] do_domctl+0x1862/0x1b7b
(XEN)    [<ffff82d080351565>] pv_hypercall+0x1ef/0x42c
(XEN)    [<ffff82d080355106>] entry.o#test_all_events+0/0x30
(XEN)
(XEN) Pagetable walk from ffff830492cbb447:
(XEN)  L4[0x106] = 00000000dbc36063 ffffffffffffffff
(XEN)  L3[0x012] = 0000000000000000 ffffffffffffffff
(XEN)
(XEN) ****************************************
(XEN) Panic on CPU 5:
(XEN) FATAL PAGE FAULT
(XEN) [error_code=0000]
(XEN) Faulting linear address: ffff830492cbb447
(XEN) ****************************************

Reported-by: Razvan Cojocaru <rcojocaru@bitdefender.com>
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Signed-off-by: Razvan Cojocaru <rcojocaru@bitdefender.com>
Tested-by: Razvan Cojocaru <rcojocaru@bitdefender.com>
---
 xen/common/hvm/save.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Comments

Andrew Cooper May 2, 2017, 1:39 p.m. UTC | #1
On 02/05/17 14:25, Razvan Cojocaru wrote:
> hvm_save_cpu_ctxt() does a memset(&ctxt, 0, sizeof(ctxt)), which
> can lead to ctxt.cur being 0.

Unfortunately, different objects both named ctxt.

>  This can then crash the hypervisor
> (with FATAL PAGE FAULT) in hvm_save_one() via the
> "off < (ctxt.cur - sizeof(*desc))" for() test. This has happened
> in practice with a Linux VM queried around shutdown:

The problem is that hvm_save_cpu_ctxt() returns success without writing
any data into hvm_domain_context_t in the case that all CPUs are offline.

This then causes an underflow in the range calculation in hvm_save_one()
which causes Xen to wander over uninitialised data and off the end of
the buffer.

It is probably worth saying that this has been broken since Xen 4.4
(c/s e019c606f59 specifically).

>
> [crash log snipped]
>
> Reported-by: Razvan Cojocaru <rcojocaru@bitdefender.com>
> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
> Signed-off-by: Razvan Cojocaru <rcojocaru@bitdefender.com>
> Tested-by: Razvan Cojocaru <rcojocaru@bitdefender.com>

CC'ing Julien for inclusion into 4.9

~Andrew

> ---
>  xen/common/hvm/save.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/xen/common/hvm/save.c b/xen/common/hvm/save.c
> index 78706f5..ea2e251 100644
> --- a/xen/common/hvm/save.c
> +++ b/xen/common/hvm/save.c
> @@ -113,7 +113,7 @@ int hvm_save_one(struct domain *d, uint16_t typecode, uint16_t instance,
>          const struct hvm_save_descriptor *desc;
>  
>          rv = -ENOENT;
> -        for ( off = 0; off < (ctxt.cur - sizeof(*desc)); off += desc->length )
> +        for ( off = 0; (off + sizeof(*desc)) < ctxt.cur; off += desc->length )
>          {
>              desc = (void *)(ctxt.data + off);
>              /* Move past header */
Tim Deegan May 2, 2017, 1:41 p.m. UTC | #2
Hi,

At 16:25 +0300 on 02 May (1493742339), Razvan Cojocaru wrote:
> hvm_save_cpu_ctxt() does a memset(&ctxt, 0, sizeof(ctxt)), which
> can lead to ctxt.cur being 0. This can then crash the hypervisor
> (with FATAL PAGE FAULT) in hvm_save_one() via the
> "off < (ctxt.cur - sizeof(*desc))" for() test. This has happened
> in practice with a Linux VM queried around shutdown:

Good fix, thanks!  But I think that memset is innocent -- it's
clearing a local "struct hvm_hw_cpu ctxt", not the caller's
"hvm_domain_context_t ctxt".  AFAICS the underflow happens when the
per-type handler returns no data at all (because all VCPUs are
offline).

With the commit message fixed,

Reviewed-by: Tim Deegan <tim@xen.org>
Razvan Cojocaru May 2, 2017, 1:42 p.m. UTC | #3
On 05/02/17 16:41, Tim Deegan wrote:
> Hi,
> 
> At 16:25 +0300 on 02 May (1493742339), Razvan Cojocaru wrote:
>> hvm_save_cpu_ctxt() does a memset(&ctxt, 0, sizeof(ctxt)), which
>> can lead to ctxt.cur being 0. This can then crash the hypervisor
>> (with FATAL PAGE FAULT) in hvm_save_one() via the
>> "off < (ctxt.cur - sizeof(*desc))" for() test. This has happened
>> in practice with a Linux VM queried around shutdown:
> 
> Good fix, thanks!  But I think that memset is innocent -- it's
> clearing a local "struct hvm_hw_cpu ctxt", not the caller's
> "hvm_domain_context_t ctxt".  AFAICS the underflow happens when the
> per-type handler returns no data at all (because all VCPUs are
> offline).
> 
> With the commit message fixed,
> 
> Reviewed-by: Tim Deegan <tim@xen.org>

Indeed, sorry about the misunderstanding. I'll fix the commit message
and resend V2.


Thanks,
Razvan
Jan Beulich May 2, 2017, 1:48 p.m. UTC | #4
>>> On 02.05.17 at 15:25, <rcojocaru@bitdefender.com> wrote:
> hvm_save_cpu_ctxt() does a memset(&ctxt, 0, sizeof(ctxt)), which
> can lead to ctxt.cur being 0. This can then crash the hypervisor
> (with FATAL PAGE FAULT) in hvm_save_one() via the
> "off < (ctxt.cur - sizeof(*desc))" for() test. This has happened
> in practice with a Linux VM queried around shutdown:

While a fix is indeed needed here, I can't connect your description
with the actual crash: The memset() you refer to affects a local
variable only, which happens to have the same name as a different
variable in the caller. You also don't make clear at all why this would
be an issue after guest shutdown was initiated already. Aiui the
problem is that hvm_save_cpu_ctxt() might find all vCPU-s having
VPF_down set, in which case it wouldn't put anything into the passed
in buffer.

> --- a/xen/common/hvm/save.c
> +++ b/xen/common/hvm/save.c
> @@ -113,7 +113,7 @@ int hvm_save_one(struct domain *d, uint16_t typecode, uint16_t instance,
>          const struct hvm_save_descriptor *desc;
>  
>          rv = -ENOENT;
> -        for ( off = 0; off < (ctxt.cur - sizeof(*desc)); off += desc->length )
> +        for ( off = 0; (off + sizeof(*desc)) < ctxt.cur; off += desc->length )
>          {
>              desc = (void *)(ctxt.data + off);
>              /* Move past header */

I don't think this is an appropriate fix. Instead I think the function
should check whether it got back any data at all, prior to entering
the loop. Furthermore it might be worth considering to (also)
refuse doing anything here if the domain's is_dying marker has
already been set.

Jan
Razvan Cojocaru May 2, 2017, 1:54 p.m. UTC | #5
On 05/02/17 16:48, Jan Beulich wrote:
>>>> On 02.05.17 at 15:25, <rcojocaru@bitdefender.com> wrote:
>> hvm_save_cpu_ctxt() does a memset(&ctxt, 0, sizeof(ctxt)), which
>> can lead to ctxt.cur being 0. This can then crash the hypervisor
>> (with FATAL PAGE FAULT) in hvm_save_one() via the
>> "off < (ctxt.cur - sizeof(*desc))" for() test. This has happened
>> in practice with a Linux VM queried around shutdown:
> 
> While a fix is indeed needed here, I can't connect your description
> with the actual crash: The memset() you refer to affects a local
> variable only, which happens to have the same name as a different
> variable in the caller. You also don't make clear at all why this would
> be an issue after guest shutdown was initiated already. Aiui the
> problem is that hvm_save_cpu_ctxt() might find all vCPU-s having
> VPF_down set, in which case it wouldn't put anything into the passed
> in buffer.

That is indeed the case, I've misunderstood the cause of the issue.

>> --- a/xen/common/hvm/save.c
>> +++ b/xen/common/hvm/save.c
>> @@ -113,7 +113,7 @@ int hvm_save_one(struct domain *d, uint16_t typecode, uint16_t instance,
>>          const struct hvm_save_descriptor *desc;
>>  
>>          rv = -ENOENT;
>> -        for ( off = 0; off < (ctxt.cur - sizeof(*desc)); off += desc->length )
>> +        for ( off = 0; (off + sizeof(*desc)) < ctxt.cur; off += desc->length )
>>          {
>>              desc = (void *)(ctxt.data + off);
>>              /* Move past header */
> 
> I don't think this is an appropriate fix. Instead I think the function
> should check whether it got back any data at all, prior to entering
> the loop. Furthermore it might be worth considering to (also)
> refuse doing anything here if the domain's is_dying marker has
> already been set.

hvm_save_one() already checks is_dying:

 77 /* Extract a single instance of a save record, by marshalling all
 78  * records of that type and copying out the one we need. */
 79 int hvm_save_one(struct domain *d, uint16_t typecode, uint16_t instance,
 80                  XEN_GUEST_HANDLE_64(uint8) handle)
 81 {
 82     int rv = 0;
 83     size_t sz = 0;
 84     struct vcpu *v;
 85     hvm_domain_context_t ctxt = { 0, };
 86
 87     if ( d->is_dying
 88          || typecode > HVM_SAVE_CODE_MAX
 89          || hvm_sr_handlers[typecode].size < sizeof(struct hvm_save_descriptor)
 90          || hvm_sr_handlers[typecode].save == NULL )
 91         return -EINVAL;

As for checking whether the handler wrote any data, I believe that
Andrew has checked and none of the handlers report when no data is being
passed on.


Thanks,
Razvan
Andrew Cooper May 2, 2017, 1:55 p.m. UTC | #6
On 02/05/17 14:48, Jan Beulich wrote:
>>>> On 02.05.17 at 15:25, <rcojocaru@bitdefender.com> wrote:
>> hvm_save_cpu_ctxt() does a memset(&ctxt, 0, sizeof(ctxt)), which
>> can lead to ctxt.cur being 0. This can then crash the hypervisor
>> (with FATAL PAGE FAULT) in hvm_save_one() via the
>> "off < (ctxt.cur - sizeof(*desc))" for() test. This has happened
>> in practice with a Linux VM queried around shutdown:
> While a fix is indeed needed here, I can't connect your description
> with the actual crash: The memset() you refer to affects a local
> variable only, which happens to have the same name as a different
> variable in the caller. You also don't make clear at all why this would
> be an issue after guest shutdown was initiated already. Aiui the
> problem is that hvm_save_cpu_ctxt() might find all vCPU-s having
> VPF_down set, in which case it wouldn't put anything into the passed
> in buffer.
>
>> --- a/xen/common/hvm/save.c
>> +++ b/xen/common/hvm/save.c
>> @@ -113,7 +113,7 @@ int hvm_save_one(struct domain *d, uint16_t typecode, uint16_t instance,
>>          const struct hvm_save_descriptor *desc;
>>  
>>          rv = -ENOENT;
>> -        for ( off = 0; off < (ctxt.cur - sizeof(*desc)); off += desc->length )
>> +        for ( off = 0; (off + sizeof(*desc)) < ctxt.cur; off += desc->length )
>>          {
>>              desc = (void *)(ctxt.data + off);
>>              /* Move past header */
> I don't think this is an appropriate fix. Instead I think the function
> should check whether it got back any data at all, prior to entering
> the loop. Furthermore it might be worth considering to (also)
> refuse doing anything here if the domain's is_dying marker has
> already been set.

This is only one example where no data would be returned.  There are
other examples (e.g. xsave) which can manifest zero data during normal
runtime.

Furthermore, is_dying shouldn't be checked, because it will break the
use of tools such as xen-hvmctx on domains which have been preserved on
crash.  (Although this use case highlights another rat's nest of problems,
when using such tools at the same moment that the toolstack decides to
finally clean up a domain.)

~Andrew
Jan Beulich May 2, 2017, 2:09 p.m. UTC | #7
>>> On 02.05.17 at 15:54, <rcojocaru@bitdefender.com> wrote:
> On 05/02/17 16:48, Jan Beulich wrote:
>>>>> On 02.05.17 at 15:25, <rcojocaru@bitdefender.com> wrote:
>>> --- a/xen/common/hvm/save.c
>>> +++ b/xen/common/hvm/save.c
>>> @@ -113,7 +113,7 @@ int hvm_save_one(struct domain *d, uint16_t typecode, uint16_t instance,
>>>          const struct hvm_save_descriptor *desc;
>>>  
>>>          rv = -ENOENT;
>>> -        for ( off = 0; off < (ctxt.cur - sizeof(*desc)); off += desc->length )
>>> +        for ( off = 0; (off + sizeof(*desc)) < ctxt.cur; off += desc->length )
>>>          {
>>>              desc = (void *)(ctxt.data + off);
>>>              /* Move past header */
>> 
>> I don't think this is an appropriate fix. Instead I think the function
>> should check whether it got back any data at all, prior to entering
>> the loop. Furthermore it might be worth considering to (also)
>> refuse doing anything here if the domain's is_dying marker has
>> already been set.
> 
> hvm_save_one() already checks is_dying:
> 
>  77 /* Extract a single instance of a save record, by marshalling all
>  78  * records of that type and copying out the one we need. */
>  79 int hvm_save_one(struct domain *d, uint16_t typecode, uint16_t
> instance,
>  80                  XEN_GUEST_HANDLE_64(uint8) handle)
>  81 {
>  82     int rv = 0;
>  83     size_t sz = 0;
>  84     struct vcpu *v;
>  85     hvm_domain_context_t ctxt = { 0, };
>  86
>  87     if ( d->is_dying
>  88          || typecode > HVM_SAVE_CODE_MAX
>  89          || hvm_sr_handlers[typecode].size < sizeof(struct
> hvm_save_descriptor)
>  90          || hvm_sr_handlers[typecode].save == NULL )
>  91         return -EINVAL;

Hmm, interesting. The timing window to see is_dying clear here,
but no vCPU-s left there should be pretty small, so I wonder how
you've managed to hit it. But anyway ...

> As for checking whether the handler wrote any data, I believe that
> Andrew has checked and none of the handlers report when no data is being
> passed on.

... that's not what I've read out of his replies. I don't think the
handlers need to report anything special. It is the caller which
should check whether, despite having got back "success" there's
no data in the buffer.

Jan

Patch

diff --git a/xen/common/hvm/save.c b/xen/common/hvm/save.c
index 78706f5..ea2e251 100644
--- a/xen/common/hvm/save.c
+++ b/xen/common/hvm/save.c
@@ -113,7 +113,7 @@  int hvm_save_one(struct domain *d, uint16_t typecode, uint16_t instance,
         const struct hvm_save_descriptor *desc;
 
         rv = -ENOENT;
-        for ( off = 0; off < (ctxt.cur - sizeof(*desc)); off += desc->length )
+        for ( off = 0; (off + sizeof(*desc)) < ctxt.cur; off += desc->length )
         {
             desc = (void *)(ctxt.data + off);
             /* Move past header */