| Message ID | 1493731539-31798-1-git-send-email-rcojocaru@bitdefender.com (mailing list archive) |
|---|---|
| State | New, archived |
On 02/05/17 14:25, Razvan Cojocaru wrote:
> hvm_save_cpu_ctxt() does a memset(&ctxt, 0, sizeof(ctxt)), which
> can lead to ctxt.cur being 0.

Unfortunately, different objects both named ctxt.

> This can then crash the hypervisor
> (with FATAL PAGE FAULT) in hvm_save_one() via the
> "off < (ctxt.cur - sizeof(*desc))" for() test. This has happened
> in practice with a Linux VM queried around shutdown:

The problem is that hvm_save_cpu_ctxt() returns success without writing
any data into hvm_domain_context_t in the case that all CPUs are
offline. This then causes an underflow in the range calculation in
hvm_save_one() which causes Xen to wander over uninitialised data and
off the end of the buffer.

It is probably worth saying that this has been broken since Xen 4.4
(c/s e019c606f59 specifically)

>
> (XEN) hvm.c:1595:d3v0 All CPUs offline -- powering off.
> (XEN) ----[ Xen-4.9-rc x86_64 debug=y Not tainted ]----
> (XEN) CPU: 5
> (XEN) RIP: e008:[<ffff82d0802496d2>] hvm_save_one+0x145/0x1fd
> (XEN) RFLAGS: 0000000000010286 CONTEXT: hypervisor (d0v2)
> (XEN) rax: ffff830492cbb445 rbx: 0000000000000000 rcx: ffff83039343b400
> (XEN) rdx: 00000000ff88004d rsi: fffffffffffffff8 rdi: 0000000000000000
> (XEN) rbp: ffff8304103e7c88 rsp: ffff8304103e7c48 r8: 0000000000000001
> (XEN) r9: deadbeefdeadf00d r10: 0000000000000000 r11: 0000000000000282
> (XEN) r12: 00007f43a3b14004 r13: 00000000fffffffe r14: 0000000000000000
> (XEN) r15: ffff830400c41000 cr0: 0000000080050033 cr4: 00000000001526e0
> (XEN) cr3: 0000000402e13000 cr2: ffff830492cbb447
> (XEN) ds: 0000 es: 0000 fs: 0000 gs: 0000 ss: e010 cs: e008
> (XEN) Xen code around <ffff82d0802496d2> (hvm_save_one+0x145/0x1fd):
> (XEN) 00 00 48 01 c8 83 c2 08 <66> 39 58 02 75 64 eb 08 48 89 c8 ba 08 00 00 00
> (XEN) Xen stack trace from rsp=ffff8304103e7c48:
> (XEN) 0000041000000000 ffff83039343b400 ffff8304103e7c70 ffff8304103e7da8
> (XEN) ffff830400c41000 00007f43a3b13004 ffff8304103b7000 ffffffffffffffea
> (XEN) ffff8304103e7d48 ffff82d0802683d4 ffff8300d19fd000 ffff82d0802320d8
> (XEN) ffff830400c41000 0000000000000000 ffff8304103e7cd8 ffff82d08026ff3d
> (XEN) 0000000000000000 ffff8300d19fd000 ffff8304103e7cf8 ffff82d080232142
> (XEN) 0000000000000000 ffff8300d19fd000 ffff8304103e7d28 ffff82d080207051
> (XEN) ffff8304103e7d18 ffff830400c41000 0000000000000202 ffff830400c41000
> (XEN) 0000000000000000 00007f43a3b13004 0000000000000000 deadbeefdeadf00d
> (XEN) ffff8304103e7e68 ffff82d080206c47 0700000000000000 ffff830410375bd0
> (XEN) 0000000000000296 ffff830410375c78 ffff830410375c80 0000000000000003
> (XEN) ffff8304103e7e68 ffff8304103b67c0 ffff8304103b7000 ffff8304103b67c0
> (XEN) 0000000d00000037 0000000000000003 0000000000000002 00007f43a3b14004
> (XEN) 00007ffd5d925590 0000000000000000 0000000100000000 0000000000000000
> (XEN) 00000000ea8f8000 0000000000000000 00007ffd00000000 0000000000000000
> (XEN) 00007f43a276f557 0000000000000000 00000000ea8f8000 0000000000000000
> (XEN) 00007ffd5d9255e0 00007f43a23280b2 00007ffd5d926058 ffff8304103e7f18
> (XEN) ffff8300d19fe000 0000000000000024 ffff82d0802053e5 deadbeefdeadf00d
> (XEN) ffff8304103e7f08 ffff82d080351565 010000003fffffff 00007f43a3b13004
> (XEN) deadbeefdeadf00d deadbeefdeadf00d deadbeefdeadf00d deadbeefdeadf00d
> (XEN) ffff8800781425c0 ffff88007ce94300 ffff8304103e7ed8 ffff82d0802719ec
> (XEN) Xen call trace:
> (XEN) [<ffff82d0802496d2>] hvm_save_one+0x145/0x1fd
> (XEN) [<ffff82d0802683d4>] arch_do_domctl+0xa7a/0x259f
> (XEN) [<ffff82d080206c47>] do_domctl+0x1862/0x1b7b
> (XEN) [<ffff82d080351565>] pv_hypercall+0x1ef/0x42c
> (XEN) [<ffff82d080355106>] entry.o#test_all_events+0/0x30
> (XEN)
> (XEN) Pagetable walk from ffff830492cbb447:
> (XEN) L4[0x106] = 00000000dbc36063 ffffffffffffffff
> (XEN) L3[0x012] = 0000000000000000 ffffffffffffffff
> (XEN)
> (XEN) ****************************************
> (XEN) Panic on CPU 5:
> (XEN) FATAL PAGE FAULT
> (XEN) [error_code=0000]
> (XEN) Faulting linear address: ffff830492cbb447
> (XEN) ****************************************
>
> Reported-by: Razvan Cojocaru <rcojocaru@bitdefender.com>
> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
> Signed-off-by: Razvan Cojocaru <rcojocaru@bitdefender.com>
> Tested-by: Razvan Cojocaru <rcojocaru@bitdefender.com>

CC'ing Julien for inclusion into 4.9

~Andrew

> ---
>  xen/common/hvm/save.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/xen/common/hvm/save.c b/xen/common/hvm/save.c
> index 78706f5..ea2e251 100644
> --- a/xen/common/hvm/save.c
> +++ b/xen/common/hvm/save.c
> @@ -113,7 +113,7 @@ int hvm_save_one(struct domain *d, uint16_t typecode, uint16_t instance,
>      const struct hvm_save_descriptor *desc;
>
>      rv = -ENOENT;
> -    for ( off = 0; off < (ctxt.cur - sizeof(*desc)); off += desc->length )
> +    for ( off = 0; (off + sizeof(*desc)) < ctxt.cur; off += desc->length )
>      {
>          desc = (void *)(ctxt.data + off);
>          /* Move past header */
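To make the underflow Andrew describes concrete, here is a minimal standalone sketch (ordinary user-space C, not Xen code; treating ctxt.cur as a uint32_t and the descriptor header as 8 bytes are assumptions for illustration):

    #include <stdio.h>
    #include <stdint.h>
    #include <stddef.h>

    int main(void)
    {
        uint32_t cur = 0;      /* handler returned success but wrote no data */
        size_t desc_size = 8;  /* stand-in for sizeof(struct hvm_save_descriptor) */
        size_t off = 0;

        /* In "off < (cur - desc_size)" the uint32_t is promoted to size_t,
         * so 0 - 8 wraps around to SIZE_MAX - 7 instead of going negative. */
        size_t bound = cur - desc_size;

        printf("loop bound  = %zu\n", bound);
        printf("loop entered: %s\n", off < bound ? "yes" : "no");
        return 0;
    }

With cur == 0 the bound is effectively unlimited, so the loop reads descriptors from uninitialised memory and walks off the end of the buffer, which is how Xen ends up taking the fatal page fault shown in the trace above.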
Hi,

At 16:25 +0300 on 02 May (1493742339), Razvan Cojocaru wrote:
> hvm_save_cpu_ctxt() does a memset(&ctxt, 0, sizeof(ctxt)), which
> can lead to ctxt.cur being 0. This can then crash the hypervisor
> (with FATAL PAGE FAULT) in hvm_save_one() via the
> "off < (ctxt.cur - sizeof(*desc))" for() test. This has happened
> in practice with a Linux VM queried around shutdown:

Good fix, thanks! But I think that memset is innocent -- it's
clearing a local "struct hvm_hw_cpu ctxt", not the caller's
"hvm_domain_context_t ctxt". AFAICS the underflow happens when the
per-type handler returns no data at all (because all VCPUs are
offline).

With the commit message fixed,

Reviewed-by: Tim Deegan <tim@xen.org>
On 05/02/17 16:41, Tim Deegan wrote:
> Hi,
>
> At 16:25 +0300 on 02 May (1493742339), Razvan Cojocaru wrote:
>> hvm_save_cpu_ctxt() does a memset(&ctxt, 0, sizeof(ctxt)), which
>> can lead to ctxt.cur being 0. This can then crash the hypervisor
>> (with FATAL PAGE FAULT) in hvm_save_one() via the
>> "off < (ctxt.cur - sizeof(*desc))" for() test. This has happened
>> in practice with a Linux VM queried around shutdown:
>
> Good fix, thanks! But I think that memset is innocent -- it's
> clearing a local "struct hvm_hw_cpu ctxt", not the caller's
> "hvm_domain_context_t ctxt". AFAICS the underflow happens when the
> per-type handler returns no data at all (because all VCPUs are
> offline).
>
> With the commit message fixed,
>
> Reviewed-by: Tim Deegan <tim@xen.org>

Indeed, sorry about the misunderstanding. I'll fix the commit message
and resend V2.

Thanks,
Razvan
>>> On 02.05.17 at 15:25, <rcojocaru@bitdefender.com> wrote:
> hvm_save_cpu_ctxt() does a memset(&ctxt, 0, sizeof(ctxt)), which
> can lead to ctxt.cur being 0. This can then crash the hypervisor
> (with FATAL PAGE FAULT) in hvm_save_one() via the
> "off < (ctxt.cur - sizeof(*desc))" for() test. This has happened
> in practice with a Linux VM queried around shutdown:

While a fix is indeed needed here, I can't connect your description
with the actual crash: The memset() you refer to affects a local
variable only, which happens to have the same name as a different
variable in the caller. You also don't make clear at all why this would
be an issue after guest shutdown was initiated already. Aiui the
problem is that hvm_save_cpu_ctxt() might find all vCPU-s having
VPF_down set, in which case it wouldn't put anything into the passed
in buffer.

> --- a/xen/common/hvm/save.c
> +++ b/xen/common/hvm/save.c
> @@ -113,7 +113,7 @@ int hvm_save_one(struct domain *d, uint16_t typecode, uint16_t instance,
>      const struct hvm_save_descriptor *desc;
>
>      rv = -ENOENT;
> -    for ( off = 0; off < (ctxt.cur - sizeof(*desc)); off += desc->length )
> +    for ( off = 0; (off + sizeof(*desc)) < ctxt.cur; off += desc->length )
>      {
>          desc = (void *)(ctxt.data + off);
>          /* Move past header */

I don't think this is an appropriate fix. Instead I think the function
should check whether it got back any data at all, prior to entering
the loop. Furthermore it might be worth considering to (also)
refuse doing anything here if the domain's is_dying marker has
already been set.

Jan
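A sketch of the kind of pre-loop check Jan is suggesting, reduced to a standalone user-space function (this is not the actual Xen code or the patch under review; the descriptor layout mirrors the public save-format headers as an assumption, and the record walking is simplified compared with hvm_save_one()):

    #include <stdint.h>
    #include <stddef.h>

    struct hvm_save_descriptor {
        uint16_t typecode;
        uint16_t instance;
        uint32_t length;
    };

    /* data/cur describe what the per-type save handler marshalled. */
    static int find_record(const uint8_t *data, uint32_t cur, uint16_t instance)
    {
        uint32_t off;
        const struct hvm_save_descriptor *desc;

        /* The suggested guard: if the handler wrote nothing (e.g. every vCPU
         * has VPF_down set), there are no records, so never enter the loop. */
        if ( cur < sizeof(*desc) )
            return -1;

        for ( off = 0; off + sizeof(*desc) <= cur; off += sizeof(*desc) + desc->length )
        {
            desc = (const struct hvm_save_descriptor *)(data + off);
            if ( desc->instance == instance )
                return (int)off;    /* record header found at this offset */
        }

        return -1;
    }

With the emptiness check in front, either loop bound works, since with at least one full header present the subtraction in the original test can no longer underflow; whether to additionally refuse the request for dying domains is the separate question raised at the end of Jan's mail.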
On 05/02/17 16:48, Jan Beulich wrote:
>>>> On 02.05.17 at 15:25, <rcojocaru@bitdefender.com> wrote:
>> hvm_save_cpu_ctxt() does a memset(&ctxt, 0, sizeof(ctxt)), which
>> can lead to ctxt.cur being 0. This can then crash the hypervisor
>> (with FATAL PAGE FAULT) in hvm_save_one() via the
>> "off < (ctxt.cur - sizeof(*desc))" for() test. This has happened
>> in practice with a Linux VM queried around shutdown:
>
> While a fix is indeed needed here, I can't connect your description
> with the actual crash: The memset() you refer to affects a local
> variable only, which happens to have the same name as a different
> variable in the caller. You also don't make clear at all why this would
> be an issue after guest shutdown was initiated already. Aiui the
> problem is that hvm_save_cpu_ctxt() might find all vCPU-s having
> VPF_down set, in which case it wouldn't put anything into the passed
> in buffer.

That is indeed the case, I've misunderstood the cause of the issue.

>> --- a/xen/common/hvm/save.c
>> +++ b/xen/common/hvm/save.c
>> @@ -113,7 +113,7 @@ int hvm_save_one(struct domain *d, uint16_t typecode, uint16_t instance,
>>      const struct hvm_save_descriptor *desc;
>>
>>      rv = -ENOENT;
>> -    for ( off = 0; off < (ctxt.cur - sizeof(*desc)); off += desc->length )
>> +    for ( off = 0; (off + sizeof(*desc)) < ctxt.cur; off += desc->length )
>>      {
>>          desc = (void *)(ctxt.data + off);
>>          /* Move past header */
>
> I don't think this is an appropriate fix. Instead I think the function
> should check whether it got back any data at all, prior to entering
> the loop. Furthermore it might be worth considering to (also)
> refuse doing anything here if the domain's is_dying marker has
> already been set.

hvm_save_one() already checks is_dying:

 77 /* Extract a single instance of a save record, by marshalling all
 78  * records of that type and copying out the one we need. */
 79 int hvm_save_one(struct domain *d, uint16_t typecode, uint16_t instance,
 80                  XEN_GUEST_HANDLE_64(uint8) handle)
 81 {
 82     int rv = 0;
 83     size_t sz = 0;
 84     struct vcpu *v;
 85     hvm_domain_context_t ctxt = { 0, };
 86
 87     if ( d->is_dying
 88          || typecode > HVM_SAVE_CODE_MAX
 89          || hvm_sr_handlers[typecode].size < sizeof(struct hvm_save_descriptor)
 90          || hvm_sr_handlers[typecode].save == NULL )
 91         return -EINVAL;

As for checking whether the handler wrote any data, I believe that
Andrew has checked and none of the handlers report when no data is
being passed on.

Thanks,
Razvan
On 02/05/17 14:48, Jan Beulich wrote:
>>>> On 02.05.17 at 15:25, <rcojocaru@bitdefender.com> wrote:
>> hvm_save_cpu_ctxt() does a memset(&ctxt, 0, sizeof(ctxt)), which
>> can lead to ctxt.cur being 0. This can then crash the hypervisor
>> (with FATAL PAGE FAULT) in hvm_save_one() via the
>> "off < (ctxt.cur - sizeof(*desc))" for() test. This has happened
>> in practice with a Linux VM queried around shutdown:
> While a fix is indeed needed here, I can't connect your description
> with the actual crash: The memset() you refer to affects a local
> variable only, which happens to have the same name as a different
> variable in the caller. You also don't make clear at all why this would
> be an issue after guest shutdown was initiated already. Aiui the
> problem is that hvm_save_cpu_ctxt() might find all vCPU-s having
> VPF_down set, in which case it wouldn't put anything into the passed
> in buffer.
>
>> --- a/xen/common/hvm/save.c
>> +++ b/xen/common/hvm/save.c
>> @@ -113,7 +113,7 @@ int hvm_save_one(struct domain *d, uint16_t typecode, uint16_t instance,
>>      const struct hvm_save_descriptor *desc;
>>
>>      rv = -ENOENT;
>> -    for ( off = 0; off < (ctxt.cur - sizeof(*desc)); off += desc->length )
>> +    for ( off = 0; (off + sizeof(*desc)) < ctxt.cur; off += desc->length )
>>      {
>>          desc = (void *)(ctxt.data + off);
>>          /* Move past header */
> I don't think this is an appropriate fix. Instead I think the function
> should check whether it got back any data at all, prior to entering
> the loop. Furthermore it might be worth considering to (also)
> refuse doing anything here if the domain's is_dying marker has
> already been set.

This is only one example where no data would be returned. There are
other examples (e.g. xsave) which can manifest zero data during normal
runtime.

Furthermore, is_dying shouldn't be checked, because it will break the
use of tools such as xen-hvmctx on domains which have been preserved on
crash. (Although this usecase highlights another rat's nest of
problems, when using such tools at the same moment that the toolstack
decides to finally clean up a domain.)

~Andrew
>>> On 02.05.17 at 15:54, <rcojocaru@bitdefender.com> wrote:
> On 05/02/17 16:48, Jan Beulich wrote:
>>>>> On 02.05.17 at 15:25, <rcojocaru@bitdefender.com> wrote:
>>> --- a/xen/common/hvm/save.c
>>> +++ b/xen/common/hvm/save.c
>>> @@ -113,7 +113,7 @@ int hvm_save_one(struct domain *d, uint16_t typecode, uint16_t instance,
>>>      const struct hvm_save_descriptor *desc;
>>>
>>>      rv = -ENOENT;
>>> -    for ( off = 0; off < (ctxt.cur - sizeof(*desc)); off += desc->length )
>>> +    for ( off = 0; (off + sizeof(*desc)) < ctxt.cur; off += desc->length )
>>>      {
>>>          desc = (void *)(ctxt.data + off);
>>>          /* Move past header */
>>
>> I don't think this is an appropriate fix. Instead I think the function
>> should check whether it got back any data at all, prior to entering
>> the loop. Furthermore it might be worth considering to (also)
>> refuse doing anything here if the domain's is_dying marker has
>> already been set.
>
> hvm_save_one() already checks is_dying:
>
>  77 /* Extract a single instance of a save record, by marshalling all
>  78  * records of that type and copying out the one we need. */
>  79 int hvm_save_one(struct domain *d, uint16_t typecode, uint16_t instance,
>  80                  XEN_GUEST_HANDLE_64(uint8) handle)
>  81 {
>  82     int rv = 0;
>  83     size_t sz = 0;
>  84     struct vcpu *v;
>  85     hvm_domain_context_t ctxt = { 0, };
>  86
>  87     if ( d->is_dying
>  88          || typecode > HVM_SAVE_CODE_MAX
>  89          || hvm_sr_handlers[typecode].size < sizeof(struct hvm_save_descriptor)
>  90          || hvm_sr_handlers[typecode].save == NULL )
>  91         return -EINVAL;

Hmm, interesting. The timing window to see is_dying clear here, but no
vCPU-s left there, should be pretty small, so I wonder how you've
managed to hit it. But anyway ...

> As for checking whether the handler wrote any data, I believe that
> Andrew has checked and none of the handlers report when no data is being
> passed on.

... that's not what I've read out of his replies. I don't think the
handlers need to report anything special. It is the caller which should
check whether, despite having got back "success", there's no data in the
buffer.

Jan
diff --git a/xen/common/hvm/save.c b/xen/common/hvm/save.c
index 78706f5..ea2e251 100644
--- a/xen/common/hvm/save.c
+++ b/xen/common/hvm/save.c
@@ -113,7 +113,7 @@ int hvm_save_one(struct domain *d, uint16_t typecode, uint16_t instance,
     const struct hvm_save_descriptor *desc;
 
     rv = -ENOENT;
-    for ( off = 0; off < (ctxt.cur - sizeof(*desc)); off += desc->length )
+    for ( off = 0; (off + sizeof(*desc)) < ctxt.cur; off += desc->length )
     {
         desc = (void *)(ctxt.data + off);
         /* Move past header */
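For a quick sanity check of the reworked predicate, the following standalone snippet (not part of the patch; the 8-byte descriptor size is an assumption) compares the old and new loop tests at the boundary values, with off at 0:

    #include <stdio.h>
    #include <stdint.h>
    #include <stddef.h>

    int main(void)
    {
        const size_t desc_size = 8;                   /* assumed sizeof(*desc) */
        const uint32_t cases[] = { 0, 4, 8, 9, 24 };  /* candidate ctxt.cur values */
        size_t off = 0;

        for ( unsigned int i = 0; i < sizeof(cases) / sizeof(cases[0]); i++ )
        {
            uint32_t cur = cases[i];
            int old_enters = off < (cur - desc_size);  /* may underflow */
            int new_enters = (off + desc_size) < cur;  /* patched test */

            printf("cur=%2u  old: %s  new: %s\n", (unsigned int)cur,
                   old_enters ? "enter" : "skip",
                   new_enters ? "enter" : "skip");
        }
        return 0;
    }

The old test wrongly enters the loop for cur values of 0 and 4 because the subtraction wraps, while the new test only enters once a full descriptor header plus at least one byte of payload is present; for well-formed buffers the two agree.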