Message ID | 1454349419-18430-1-git-send-email-andrew.cooper3@citrix.com (mailing list archive)
---|---
State | New, archived
On 2/1/2016 7:56 PM, Andrew Cooper wrote:
> c/s 428607a "x86: shrink 'struct domain', was already PAGE_SIZE" introduced a
> use-after-free error during domain destruction, because of the order in which
> timers are torn down.
>
> (XEN) Xen call trace:
> (XEN)    [<ffff82d08013344e>] spinlock.c#check_lock+0x1e/0x40
> (XEN)    [<ffff82d08013349b>] _spin_lock+0x11/0x52
> (XEN)    [<ffff82d0801e8076>] vpt.c#pt_lock+0x24/0x40
> (XEN)    [<ffff82d0801e88f4>] destroy_periodic_time+0x18/0x81
> (XEN)    [<ffff82d0801e1089>] rtc_deinit+0x53/0x78
> (XEN)    [<ffff82d0801d1e5a>] hvm_domain_destroy+0x52/0x69
> (XEN)    [<ffff82d08016a758>] arch_domain_destroy+0x1a/0x98
> (XEN)    [<ffff82d080107cd5>] domain.c#complete_domain_destroy+0x6f/0x182
> (XEN)    [<ffff82d080126a19>] rcupdate.c#rcu_process_callbacks+0x144/0x1a6
> (XEN)    [<ffff82d080132c52>] softirq.c#__do_softirq+0x82/0x8d
> (XEN)    [<ffff82d080132caa>] do_softirq+0x13/0x15
> (XEN)    [<ffff82d080248ae1>] entry.o#process_softirqs+0x21/0x30
> (XEN)
> (XEN)
> (XEN) ****************************************
> (XEN) Panic on CPU 3:
> (XEN) GENERAL PROTECTION FAULT
> (XEN) [error_code=0000]
> (XEN) ****************************************
>
> Defer the freeing of d->arch.hvm_domain.pl_time until all timers have been
> destroyed.
>
> For safety, NULL out the pointers after freeing them, in an attempt to make
> mistakes more obvious in the future.
>
> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
> ---
> CC: Jan Beulich <JBeulich@suse.com>
> CC: Corneliu ZUZU <czuzu@bitdefender.com>
> ---
>  xen/arch/x86/hvm/hvm.c | 7 ++++++-
>  1 file changed, 6 insertions(+), 1 deletion(-)
>
> diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
> index f24400d..38c65b3 100644
> --- a/xen/arch/x86/hvm/hvm.c
> +++ b/xen/arch/x86/hvm/hvm.c
> @@ -1674,8 +1674,10 @@ void hvm_domain_relinquish_resources(struct domain *d)
>  void hvm_domain_destroy(struct domain *d)
>  {
>      xfree(d->arch.hvm_domain.io_handler);
> +    d->arch.hvm_domain.io_handler = NULL;
> +
>      xfree(d->arch.hvm_domain.params);
> -    xfree(d->arch.hvm_domain.pl_time);
> +    d->arch.hvm_domain.params = NULL;
>
>      hvm_destroy_cacheattr_region_list(d);
>
> @@ -1686,6 +1688,9 @@ void hvm_domain_destroy(struct domain *d)
>      rtc_deinit(d);
>      stdvga_deinit(d);
>      vioapic_deinit(d);
> +
> +    xfree(d->arch.hvm_domain.pl_time);
> +    d->arch.hvm_domain.pl_time = NULL;
>  }
>
>  static int hvm_save_tsc_adjust(struct domain *d, hvm_domain_context_t *h)

Oops, really sorry, ashamed to say I've made the mistake of not actually
testing this on a machine (working on ARM here). It won't happen again;
if I'm to send another patch I'll be sure to actually set up an x86 Dom0
and at least do some basic tests.

Regarding the "set to NULL after free" practice, wouldn't it be wise to
define an xfreeandnull(void**) macro and use the "unsafe" xfree only
when it makes sense to (probably most of the time it won't)?

Corneliu.
On 02/02/16 08:00, Corneliu ZUZU wrote:
> On 2/1/2016 7:56 PM, Andrew Cooper wrote:
>> c/s 428607a "x86: shrink 'struct domain', was already PAGE_SIZE"
>> introduced a use-after-free error during domain destruction, because
>> of the order in which timers are torn down.
>>
>> (XEN) Xen call trace:
>> (XEN)    [<ffff82d08013344e>] spinlock.c#check_lock+0x1e/0x40
>> (XEN)    [<ffff82d08013349b>] _spin_lock+0x11/0x52
>> (XEN)    [<ffff82d0801e8076>] vpt.c#pt_lock+0x24/0x40
>> (XEN)    [<ffff82d0801e88f4>] destroy_periodic_time+0x18/0x81
>> (XEN)    [<ffff82d0801e1089>] rtc_deinit+0x53/0x78
>> (XEN)    [<ffff82d0801d1e5a>] hvm_domain_destroy+0x52/0x69
>> (XEN)    [<ffff82d08016a758>] arch_domain_destroy+0x1a/0x98
>> (XEN)    [<ffff82d080107cd5>] domain.c#complete_domain_destroy+0x6f/0x182
>> (XEN)    [<ffff82d080126a19>] rcupdate.c#rcu_process_callbacks+0x144/0x1a6
>> (XEN)    [<ffff82d080132c52>] softirq.c#__do_softirq+0x82/0x8d
>> (XEN)    [<ffff82d080132caa>] do_softirq+0x13/0x15
>> (XEN)    [<ffff82d080248ae1>] entry.o#process_softirqs+0x21/0x30
>> (XEN)
>> (XEN)
>> (XEN) ****************************************
>> (XEN) Panic on CPU 3:
>> (XEN) GENERAL PROTECTION FAULT
>> (XEN) [error_code=0000]
>> (XEN) ****************************************
>>
>> Defer the freeing of d->arch.hvm_domain.pl_time until all timers have
>> been destroyed.
>>
>> For safety, NULL out the pointers after freeing them, in an attempt
>> to make mistakes more obvious in the future.
>>
>> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
>> ---
>> CC: Jan Beulich <JBeulich@suse.com>
>> CC: Corneliu ZUZU <czuzu@bitdefender.com>
>> ---
>>  xen/arch/x86/hvm/hvm.c | 7 ++++++-
>>  1 file changed, 6 insertions(+), 1 deletion(-)
>>
>> diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
>> index f24400d..38c65b3 100644
>> --- a/xen/arch/x86/hvm/hvm.c
>> +++ b/xen/arch/x86/hvm/hvm.c
>> @@ -1674,8 +1674,10 @@ void hvm_domain_relinquish_resources(struct domain *d)
>>  void hvm_domain_destroy(struct domain *d)
>>  {
>>      xfree(d->arch.hvm_domain.io_handler);
>> +    d->arch.hvm_domain.io_handler = NULL;
>> +
>>      xfree(d->arch.hvm_domain.params);
>> -    xfree(d->arch.hvm_domain.pl_time);
>> +    d->arch.hvm_domain.params = NULL;
>>
>>      hvm_destroy_cacheattr_region_list(d);
>>
>> @@ -1686,6 +1688,9 @@ void hvm_domain_destroy(struct domain *d)
>>      rtc_deinit(d);
>>      stdvga_deinit(d);
>>      vioapic_deinit(d);
>> +
>> +    xfree(d->arch.hvm_domain.pl_time);
>> +    d->arch.hvm_domain.pl_time = NULL;
>>  }
>>
>>  static int hvm_save_tsc_adjust(struct domain *d, hvm_domain_context_t *h)
>
> Oops, really sorry, ashamed to say I've made the mistake of not
> actually testing this on a machine (working on ARM here). It won't
> happen again; if I'm to send another patch I'll be sure to actually
> set up an x86 Dom0 and at least do some basic tests.

At least a boot test would certainly be nice.

In this case, however, the build passed all but a single one of the
XenServer sanity tests, and even that failure was just chance: the freed
region happened to be reused in a way which would be noticed. This is,
after all, precisely the reason why development branches are not
generally known to be stable.

> Regarding the "set to NULL after free" practice, wouldn't it be wise
> to define an xfreeandnull(void**) macro and use the "unsafe" xfree
> only when it makes sense to (probably most of the time it won't)?

This idea has been suggested in the past. I can't remember what became
of it.

~Andrew
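For concreteness, such a helper might look like the sketch below. This is
a hypothetical illustration of the thread's suggestion -- the name
xfreeandnull is taken from Corneliu's mail, and Xen did not gain this
interface in the exchange above. Writing it as a macro rather than the
void** function from the quoted mail also sidesteps C's lack of an
implicit conversion from T ** to void **:

    /* Hypothetical helper: free the object and NULL the (non-local)
     * pointer through which it was reached, so that a stale dereference
     * faults deterministically instead of touching freed memory. */
    #define xfreeandnull(pp)    \
        do {                    \
            xfree(*(pp));       \
            *(pp) = NULL;       \
        } while ( 0 )

With it, each two-line free-then-NULL pattern from the patch would
collapse to a single call, e.g. xfreeandnull(&d->arch.hvm_domain.pl_time);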
>>> On 01.02.16 at 18:56, <andrew.cooper3@citrix.com> wrote:
> For safety, NULL out the pointers after freeing them, in an attempt to make
> mistakes more obvious in the future.

Except that NULLing isn't really adding that much safety, and we'd be
better off poisoning such pointers. Nevertheless ...

> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>

Reviewed-by: Jan Beulich <jbeulich@suse.com>
On 02/02/16 10:43, Jan Beulich wrote:
>>>> On 01.02.16 at 18:56, <andrew.cooper3@citrix.com> wrote:
>> For safety, NULL out the pointers after freeing them, in an attempt to make
>> mistakes more obvious in the future.
> Except that NULLing isn't really adding that much safety, and we'd
> be better off poisoning such pointers. Nevertheless ...

NULLing the pointers would cause things like rtc_deinit() to always blow
up when it followed the NULL pointer.

IMO, we should unconditionally NULL pointers when freeing a pointer
which isn't in local scope. It would make issues such as these
completely obvious.

~Andrew
>>> On 02.02.16 at 11:48, <andrew.cooper3@citrix.com> wrote:
> On 02/02/16 10:43, Jan Beulich wrote:
>>>>> On 01.02.16 at 18:56, <andrew.cooper3@citrix.com> wrote:
>>> For safety, NULL out the pointers after freeing them, in an attempt to make
>>> mistakes more obvious in the future.
>> Except that NULLing isn't really adding that much safety, and we'd
>> be better off poisoning such pointers. Nevertheless ...
>
> NULLing the pointers would cause things like rtc_deinit() to always blow
> up when it followed the NULL pointer.
>
> IMO, we should unconditionally NULL pointers when freeing a pointer
> which isn't in local scope. It would make issues such as these
> completely obvious.

As would poisoning the pointers, yet poisoning has the advantage of not
allowing PV guests to control what the hypervisor might access when
erroneously de-referencing such a pointer.

Jan
On 02/02/16 10:52, Jan Beulich wrote:
>>>> On 02.02.16 at 11:48, <andrew.cooper3@citrix.com> wrote:
>> On 02/02/16 10:43, Jan Beulich wrote:
>>>>>> On 01.02.16 at 18:56, <andrew.cooper3@citrix.com> wrote:
>>>> For safety, NULL out the pointers after freeing them, in an attempt to make
>>>> mistakes more obvious in the future.
>>> Except that NULLing isn't really adding that much safety, and we'd
>>> be better off poisoning such pointers. Nevertheless ...
>> NULLing the pointers would cause things like rtc_deinit() to always blow
>> up when it followed the NULL pointer.
>>
>> IMO, we should unconditionally NULL pointers when freeing a pointer
>> which isn't in local scope. It would make issues such as these
>> completely obvious.
> As would poisoning the pointers, yet poisoning has the advantage
> of not allowing PV guests to control what the hypervisor might
> access when erroneously de-referencing such a pointer.

Hmm. If we taught xfree() about this poisoned value and it treated it
just as it would NULL, then this would work.

I will put it on my todo list, unless anyone else beats me to it.

~Andrew
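A minimal sketch of what teaching xfree() about the poison could mean --
the name XFREE_POISON and the shape of the early-out are assumptions for
illustration, not the actual xen/common/xmalloc code (the poison value is
Andrew's example from later in the thread):

    /* Assumed poison constant; see below for why a non-canonical
     * value is chosen. */
    #define XFREE_POISON ((void *)0xdeadc0de00000000UL)

    void xfree(void *p)
    {
        /* Treat the poison exactly like NULL: freeing a pointer which
         * was already freed and poisoned becomes a harmless no-op
         * rather than a double free. */
        if ( p == NULL || p == XFREE_POISON )
            return;

        /* ... normal free path ... */
    }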
On 2/2/2016 12:52 PM, Jan Beulich wrote:
>> NULLing the pointers would cause things like rtc_deinit() to always blow
>> up when it followed the NULL pointer.
>>
>> IMO, we should unconditionally NULL pointers when freeing a pointer
>> which isn't in local scope. It would make issues such as these
>> completely obvious.
> As would poisoning the pointers, yet poisoning has the advantage
> of not allowing PV guests to control what the hypervisor might
> access when erroneously de-referencing such a pointer.
>
> Jan

Jan, that sounds interesting. I hope I'm not intruding, but when you
have the time, could you please expand on this?

Besides distinguishing a nuked pointer from zeroed-out memory, I did not
know of any other advantage of 0xDEADBEEF pointer poisoning (generally
or specifically).

How could setting a pointer to NULL possibly allow a PV guest to control
what the hypervisor might access, if the hypervisor *can't access* a
NULL pointer? And can a PV guest write data @ the *hypervisor's* page 0
(virtual and/or physical)?

Corneliu.
>>> On 02.02.16 at 12:39, <czuzu@bitdefender.com> wrote:
> On 2/2/2016 12:52 PM, Jan Beulich wrote:
>>> NULLing the pointers would cause things like rtc_deinit() to always blow
>>> up when it followed the NULL pointer.
>>>
>>> IMO, we should unconditionally NULL pointers when freeing a pointer
>>> which isn't in local scope. It would make issues such as these
>>> completely obvious.
>> As would poisoning the pointers, yet poisoning has the advantage
>> of not allowing PV guests to control what the hypervisor might
>> access when erroneously de-referencing such a pointer.
>
> Jan, that sounds interesting. I hope I'm not intruding, but when you
> have the time, could you please expand on this?
> Besides distinguishing a nuked pointer from zeroed-out memory, I did not
> know of any other advantage of 0xDEADBEEF pointer poisoning (generally
> or specifically).
> How could setting a pointer to NULL possibly allow a PV guest to control
> what the hypervisor might access, if the hypervisor *can't access* a
> NULL pointer?
> And can a PV guest write data @ the *hypervisor's* page 0 (virtual
> and/or physical)?

Since the answer to this last question is "yes" (for the virtual page 0
only, of course), I suppose the rest is obvious.

Jan
On 02/02/16 11:39, Corneliu ZUZU wrote:
> On 2/2/2016 12:52 PM, Jan Beulich wrote:
>>> NULLing the pointers would cause things like rtc_deinit() to always
>>> blow up when it followed the NULL pointer.
>>>
>>> IMO, we should unconditionally NULL pointers when freeing a pointer
>>> which isn't in local scope. It would make issues such as these
>>> completely obvious.
>> As would poisoning the pointers, yet poisoning has the advantage
>> of not allowing PV guests to control what the hypervisor might
>> access when erroneously de-referencing such a pointer.
>>
>> Jan
>
> Jan, that sounds interesting. I hope I'm not intruding, but when you
> have the time, could you please expand on this?
> Besides distinguishing a nuked pointer from zeroed-out memory, I did
> not know of any other advantage of 0xDEADBEEF pointer poisoning
> (generally or specifically).

Xen is 64bit-only these days. Bit 47 of a 64bit pointer must be
sign-extended into bits 63:48, or a #GP fault occurs because of the use
of a non-canonical address. Therefore, a pointer such as
0xdeadc0de00000000ULL will unconditionally cause a #GP fault if Xen
accidentally uses it.

> How could setting a pointer to NULL possibly allow a PV guest to
> control what the hypervisor might access, if the hypervisor *can't
> access* a NULL pointer?
> And can a PV guest write data @ the *hypervisor's* page 0 (virtual
> and/or physical)?

Xen and PV guests share the virtual address space, in exactly the same
way as a native kernel and its userspace. PV guests can map pages at 0.
Therefore, if Xen were to accidentally follow a NULL pointer, it may not
result in a page fault. (Hardware mechanisms such as SMEP and SMAP
provide added protection against this, but don't work on older
hardware.)

~Andrew
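To make the canonicality argument concrete, here is a small standalone
illustration; the helper is for exposition only (it is not a Xen
function), and the poison value is Andrew's example:

    #include <stdbool.h>
    #include <stdint.h>

    /* With 48-bit virtual addresses, bits 63:48 must all be copies of
     * bit 47; any other pattern is non-canonical and dereferencing it
     * raises #GP. (Relies on arithmetic right shift of signed values,
     * as all mainstream compilers implement.) */
    static bool is_canonical(uint64_t va)
    {
        int64_t top = (int64_t)va >> 47;

        return top == 0 || top == -1;
    }

    /*
     * is_canonical(0x0)                  -> true: NULL is a mappable
     *     address, which a PV guest can arrange to back with its own
     *     memory.
     * is_canonical(0xdeadc0de00000000UL) -> false: bits 63:48 (0xdead)
     *     are not copies of bit 47, so no mapping can ever exist at
     *     this address, for guest or hypervisor.
     */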
On 2/2/2016 2:05 PM, Andrew Cooper wrote:
> Xen and PV guests share the virtual address space, in exactly the same
> way as a native kernel and its userspace. PV guests can map pages at 0.
> Therefore, if Xen were to accidentally follow a NULL pointer, it may
> not result in a page fault. (Hardware mechanisms such as SMEP and SMAP
> provide added protection against this, but don't work on older
> hardware.)
>
> ~Andrew

Thank you, I finally got it. (I also read
http://wiki.xenproject.org/wiki/X86_Paravirtualised_Memory_Management,
which cleared things up.)

Corneliu.
diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
index f24400d..38c65b3 100644
--- a/xen/arch/x86/hvm/hvm.c
+++ b/xen/arch/x86/hvm/hvm.c
@@ -1674,8 +1674,10 @@ void hvm_domain_relinquish_resources(struct domain *d)
 void hvm_domain_destroy(struct domain *d)
 {
     xfree(d->arch.hvm_domain.io_handler);
+    d->arch.hvm_domain.io_handler = NULL;
+
     xfree(d->arch.hvm_domain.params);
-    xfree(d->arch.hvm_domain.pl_time);
+    d->arch.hvm_domain.params = NULL;
 
     hvm_destroy_cacheattr_region_list(d);
 
@@ -1686,6 +1688,9 @@ void hvm_domain_destroy(struct domain *d)
     rtc_deinit(d);
     stdvga_deinit(d);
     vioapic_deinit(d);
+
+    xfree(d->arch.hvm_domain.pl_time);
+    d->arch.hvm_domain.pl_time = NULL;
 }
 
 static int hvm_save_tsc_adjust(struct domain *d, hvm_domain_context_t *h)
c/s 428607a "x86: shrink 'struct domain', was already PAGE_SIZE" introduced a
use-after-free error during domain destruction, because of the order in which
timers are torn down.

(XEN) Xen call trace:
(XEN)    [<ffff82d08013344e>] spinlock.c#check_lock+0x1e/0x40
(XEN)    [<ffff82d08013349b>] _spin_lock+0x11/0x52
(XEN)    [<ffff82d0801e8076>] vpt.c#pt_lock+0x24/0x40
(XEN)    [<ffff82d0801e88f4>] destroy_periodic_time+0x18/0x81
(XEN)    [<ffff82d0801e1089>] rtc_deinit+0x53/0x78
(XEN)    [<ffff82d0801d1e5a>] hvm_domain_destroy+0x52/0x69
(XEN)    [<ffff82d08016a758>] arch_domain_destroy+0x1a/0x98
(XEN)    [<ffff82d080107cd5>] domain.c#complete_domain_destroy+0x6f/0x182
(XEN)    [<ffff82d080126a19>] rcupdate.c#rcu_process_callbacks+0x144/0x1a6
(XEN)    [<ffff82d080132c52>] softirq.c#__do_softirq+0x82/0x8d
(XEN)    [<ffff82d080132caa>] do_softirq+0x13/0x15
(XEN)    [<ffff82d080248ae1>] entry.o#process_softirqs+0x21/0x30
(XEN)
(XEN)
(XEN) ****************************************
(XEN) Panic on CPU 3:
(XEN) GENERAL PROTECTION FAULT
(XEN) [error_code=0000]
(XEN) ****************************************

Defer the freeing of d->arch.hvm_domain.pl_time until all timers have been
destroyed.

For safety, NULL out the pointers after freeing them, in an attempt to make
mistakes more obvious in the future.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
---
CC: Jan Beulich <JBeulich@suse.com>
CC: Corneliu ZUZU <czuzu@bitdefender.com>
---
 xen/arch/x86/hvm/hvm.c | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)