Message ID | 1454349419-18430-1-git-send-email-andrew.cooper3@citrix.com (mailing list archive)
---|---
State | New, archived
On 2/1/2016 7:56 PM, Andrew Cooper wrote:
> c/s 428607a "x86: shrink 'struct domain', was already PAGE_SIZE" introduced a
> use-after-free error during domain destruction, because of the order in which
> timers are torn down.
>
> (XEN) Xen call trace:
> (XEN)    [<ffff82d08013344e>] spinlock.c#check_lock+0x1e/0x40
> (XEN)    [<ffff82d08013349b>] _spin_lock+0x11/0x52
> (XEN)    [<ffff82d0801e8076>] vpt.c#pt_lock+0x24/0x40
> (XEN)    [<ffff82d0801e88f4>] destroy_periodic_time+0x18/0x81
> (XEN)    [<ffff82d0801e1089>] rtc_deinit+0x53/0x78
> (XEN)    [<ffff82d0801d1e5a>] hvm_domain_destroy+0x52/0x69
> (XEN)    [<ffff82d08016a758>] arch_domain_destroy+0x1a/0x98
> (XEN)    [<ffff82d080107cd5>] domain.c#complete_domain_destroy+0x6f/0x182
> (XEN)    [<ffff82d080126a19>] rcupdate.c#rcu_process_callbacks+0x144/0x1a6
> (XEN)    [<ffff82d080132c52>] softirq.c#__do_softirq+0x82/0x8d
> (XEN)    [<ffff82d080132caa>] do_softirq+0x13/0x15
> (XEN)    [<ffff82d080248ae1>] entry.o#process_softirqs+0x21/0x30
> (XEN)
> (XEN)
> (XEN) ****************************************
> (XEN) Panic on CPU 3:
> (XEN) GENERAL PROTECTION FAULT
> (XEN) [error_code=0000]
> (XEN) ****************************************
>
> Defer the freeing of d->arch.hvm_domain.pl_time until all timers have been
> destroyed.
>
> For safety, NULL out the pointers after freeing them, in an attempt to make
> mistakes more obvious in the future.
>
> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
> ---
> CC: Jan Beulich <JBeulich@suse.com>
> CC: Corneliu ZUZU <czuzu@bitdefender.com>
> ---
>  xen/arch/x86/hvm/hvm.c | 7 ++++++-
>  1 file changed, 6 insertions(+), 1 deletion(-)
>
> diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
> index f24400d..38c65b3 100644
> --- a/xen/arch/x86/hvm/hvm.c
> +++ b/xen/arch/x86/hvm/hvm.c
> @@ -1674,8 +1674,10 @@ void hvm_domain_relinquish_resources(struct domain *d)
>  void hvm_domain_destroy(struct domain *d)
>  {
>      xfree(d->arch.hvm_domain.io_handler);
> +    d->arch.hvm_domain.io_handler = NULL;
> +
>      xfree(d->arch.hvm_domain.params);
> -    xfree(d->arch.hvm_domain.pl_time);
> +    d->arch.hvm_domain.params = NULL;
>
>      hvm_destroy_cacheattr_region_list(d);
>
> @@ -1686,6 +1688,9 @@ void hvm_domain_destroy(struct domain *d)
>      rtc_deinit(d);
>      stdvga_deinit(d);
>      vioapic_deinit(d);
> +
> +    xfree(d->arch.hvm_domain.pl_time);
> +    d->arch.hvm_domain.pl_time = NULL;
>  }
>
>  static int hvm_save_tsc_adjust(struct domain *d, hvm_domain_context_t *h)

Oops, really sorry, ashamed to say I've made the mistake of not actually
testing this on a machine (working on ARM here). It won't happen again;
if I'm to send another patch I'll be sure to actually set up an x86 Dom0
and at least do some basic tests.

Regarding the "set to NULL after free" practice, wouldn't it be wise to
define an xfreeandnull(void**) macro and use the "unsafe" xfree only
when it makes sense to (probably most of the time it won't)?

Corneliu.
On 02/02/16 08:00, Corneliu ZUZU wrote:
> On 2/1/2016 7:56 PM, Andrew Cooper wrote:
>> c/s 428607a "x86: shrink 'struct domain', was already PAGE_SIZE"
>> introduced a use-after-free error during domain destruction, because
>> of the order in which timers are torn down.
>>
>> (XEN) Xen call trace:
>> (XEN)    [<ffff82d08013344e>] spinlock.c#check_lock+0x1e/0x40
>> (XEN)    [<ffff82d08013349b>] _spin_lock+0x11/0x52
>> (XEN)    [<ffff82d0801e8076>] vpt.c#pt_lock+0x24/0x40
>> (XEN)    [<ffff82d0801e88f4>] destroy_periodic_time+0x18/0x81
>> (XEN)    [<ffff82d0801e1089>] rtc_deinit+0x53/0x78
>> (XEN)    [<ffff82d0801d1e5a>] hvm_domain_destroy+0x52/0x69
>> (XEN)    [<ffff82d08016a758>] arch_domain_destroy+0x1a/0x98
>> (XEN)    [<ffff82d080107cd5>] domain.c#complete_domain_destroy+0x6f/0x182
>> (XEN)    [<ffff82d080126a19>] rcupdate.c#rcu_process_callbacks+0x144/0x1a6
>> (XEN)    [<ffff82d080132c52>] softirq.c#__do_softirq+0x82/0x8d
>> (XEN)    [<ffff82d080132caa>] do_softirq+0x13/0x15
>> (XEN)    [<ffff82d080248ae1>] entry.o#process_softirqs+0x21/0x30
>> (XEN)
>> (XEN)
>> (XEN) ****************************************
>> (XEN) Panic on CPU 3:
>> (XEN) GENERAL PROTECTION FAULT
>> (XEN) [error_code=0000]
>> (XEN) ****************************************
>>
>> Defer the freeing of d->arch.hvm_domain.pl_time until all timers have
>> been destroyed.
>>
>> For safety, NULL out the pointers after freeing them, in an attempt
>> to make mistakes more obvious in the future.
>>
>> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
>> ---
>> CC: Jan Beulich <JBeulich@suse.com>
>> CC: Corneliu ZUZU <czuzu@bitdefender.com>
>> ---
>>  xen/arch/x86/hvm/hvm.c | 7 ++++++-
>>  1 file changed, 6 insertions(+), 1 deletion(-)
>>
>> diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
>> index f24400d..38c65b3 100644
>> --- a/xen/arch/x86/hvm/hvm.c
>> +++ b/xen/arch/x86/hvm/hvm.c
>> @@ -1674,8 +1674,10 @@ void hvm_domain_relinquish_resources(struct domain *d)
>>  void hvm_domain_destroy(struct domain *d)
>>  {
>>      xfree(d->arch.hvm_domain.io_handler);
>> +    d->arch.hvm_domain.io_handler = NULL;
>> +
>>      xfree(d->arch.hvm_domain.params);
>> -    xfree(d->arch.hvm_domain.pl_time);
>> +    d->arch.hvm_domain.params = NULL;
>>
>>      hvm_destroy_cacheattr_region_list(d);
>>
>> @@ -1686,6 +1688,9 @@ void hvm_domain_destroy(struct domain *d)
>>      rtc_deinit(d);
>>      stdvga_deinit(d);
>>      vioapic_deinit(d);
>> +
>> +    xfree(d->arch.hvm_domain.pl_time);
>> +    d->arch.hvm_domain.pl_time = NULL;
>>  }
>>
>>  static int hvm_save_tsc_adjust(struct domain *d, hvm_domain_context_t *h)
>
> Oops, really sorry, ashamed to say I've made the mistake of not
> actually testing this on a machine (working on ARM here). It won't
> happen again; if I'm to send another patch I'll be sure to actually
> set up an x86 Dom0 and at least do some basic tests.

At least a boot test would certainly be nice.

In this case, however, the build passed all but a single one of the
XenServer sanity tests, and even that failure was just chance: the freed
region happened to be reused in a way which would be noticed. This is,
after all, precisely the reason why development branches are not
generally known to be stable.

> Regarding the "set to NULL after free" practice, wouldn't it be wise
> to define an xfreeandnull(void**) macro and use the "unsafe" xfree
> only when it makes sense to (probably most of the time it won't)?

This idea has been suggested in the past. I can't remember what became
of it.

~Andrew
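For concreteness, such a helper might look like the sketch below. This is
a hypothetical illustration of the thread's suggestion -- the name
xfreeandnull is taken from Corneliu's mail, and Xen did not gain this
interface in the exchange above. Writing it as a macro rather than the
void** function from the quoted mail also sidesteps C's lack of an
implicit conversion from T ** to void **:

    /* Hypothetical helper: free the object and NULL the (non-local)
     * pointer through which it was reached, so that a stale dereference
     * faults deterministically instead of touching freed memory. */
    #define xfreeandnull(pp)    \
        do {                    \
            xfree(*(pp));       \
            *(pp) = NULL;       \
        } while ( 0 )

With it, each two-line free-then-NULL pattern from the patch would
collapse to a single call, e.g. xfreeandnull(&d->arch.hvm_domain.pl_time);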
>>> On 01.02.16 at 18:56, <andrew.cooper3@citrix.com> wrote:
> For safety, NULL out the pointers after freeing them, in an attempt to make
> mistakes more obvious in the future.

Except that NULLing isn't really adding that much safety, and we'd be
better off poisoning such pointers. Nevertheless ...

> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>

Reviewed-by: Jan Beulich <jbeulich@suse.com>
On 02/02/16 10:43, Jan Beulich wrote:
>>>> On 01.02.16 at 18:56, <andrew.cooper3@citrix.com> wrote:
>> For safety, NULL out the pointers after freeing them, in an attempt to make
>> mistakes more obvious in the future.
> Except that NULLing isn't really adding that much safety, and we'd
> be better off poisoning such pointers. Nevertheless ...

NULLing the pointers would cause things like rtc_deinit() to always blow
up when it followed the NULL pointer.

IMO, we should unconditionally NULL pointers when freeing a pointer
which isn't in local scope. It would make issues such as these
completely obvious.

~Andrew
>>> On 02.02.16 at 11:48, <andrew.cooper3@citrix.com> wrote:
> On 02/02/16 10:43, Jan Beulich wrote:
>>>>> On 01.02.16 at 18:56, <andrew.cooper3@citrix.com> wrote:
>>> For safety, NULL out the pointers after freeing them, in an attempt to make
>>> mistakes more obvious in the future.
>> Except that NULLing isn't really adding that much safety, and we'd
>> be better off poisoning such pointers. Nevertheless ...
>
> NULLing the pointers would cause things like rtc_deinit() to always blow
> up when it followed the NULL pointer.
>
> IMO, we should unconditionally NULL pointers when freeing a pointer
> which isn't in local scope. It would make issues such as these
> completely obvious.

As would poisoning the pointers, yet poisoning has the advantage of not
allowing PV guests to control what the hypervisor might access when
erroneously de-referencing such a pointer.

Jan
On 02/02/16 10:52, Jan Beulich wrote:
>>>> On 02.02.16 at 11:48, <andrew.cooper3@citrix.com> wrote:
>> On 02/02/16 10:43, Jan Beulich wrote:
>>>>>> On 01.02.16 at 18:56, <andrew.cooper3@citrix.com> wrote:
>>>> For safety, NULL out the pointers after freeing them, in an attempt to make
>>>> mistakes more obvious in the future.
>>> Except that NULLing isn't really adding that much safety, and we'd
>>> be better off poisoning such pointers. Nevertheless ...
>> NULLing the pointers would cause things like rtc_deinit() to always blow
>> up when it followed the NULL pointer.
>>
>> IMO, we should unconditionally NULL pointers when freeing a pointer
>> which isn't in local scope. It would make issues such as these
>> completely obvious.
> As would poisoning the pointers, yet poisoning has the advantage
> of not allowing PV guests to control what the hypervisor might
> access when erroneously de-referencing such a pointer.

Hmm. If we taught xfree() about this poisoned value and it treated it
just as it would NULL, then this would work.

I will put it on my todo list, unless anyone else beats me to it.

~Andrew
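A minimal sketch of what teaching xfree() about the poison could mean --
the name XFREE_POISON and the shape of the early-out are assumptions for
illustration, not the actual xen/common/xmalloc code (the poison value is
Andrew's example from later in the thread):

    /* Assumed poison constant; see below for why a non-canonical
     * value is chosen. */
    #define XFREE_POISON ((void *)0xdeadc0de00000000UL)

    void xfree(void *p)
    {
        /* Treat the poison exactly like NULL: freeing a pointer which
         * was already freed and poisoned becomes a harmless no-op
         * rather than a double free. */
        if ( p == NULL || p == XFREE_POISON )
            return;

        /* ... normal free path ... */
    }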
On 2/2/2016 12:52 PM, Jan Beulich wrote:
>> NULLing the pointers would cause things like rtc_deinit() to always blow
>> up when it followed the NULL pointer.
>>
>> IMO, we should unconditionally NULL pointers when freeing a pointer
>> which isn't in local scope. It would make issues such as these
>> completely obvious.
> As would poisoning the pointers, yet poisoning has the advantage
> of not allowing PV guests to control what the hypervisor might
> access when erroneously de-referencing such a pointer.
>
> Jan

Jan, that sounds interesting. I hope I'm not intruding, but when you
have the time, could you please expand on this?

Besides distinguishing a nuked pointer from zeroed-out memory, I did not
know of any other advantage of 0xDEADBEEF pointer poisoning (generally
or specifically).

How could setting a pointer to NULL possibly allow a PV guest to control
what the hypervisor might access, if the hypervisor *can't access* a
NULL pointer? And can a PV guest write data @ the *hypervisor's* page 0
(virtual and/or physical)?

Corneliu.
>>> On 02.02.16 at 12:39, <czuzu@bitdefender.com> wrote:
> On 2/2/2016 12:52 PM, Jan Beulich wrote:
>>> NULLing the pointers would cause things like rtc_deinit() to always blow
>>> up when it followed the NULL pointer.
>>>
>>> IMO, we should unconditionally NULL pointers when freeing a pointer
>>> which isn't in local scope. It would make issues such as these
>>> completely obvious.
>> As would poisoning the pointers, yet poisoning has the advantage
>> of not allowing PV guests to control what the hypervisor might
>> access when erroneously de-referencing such a pointer.
>
> Jan, that sounds interesting. I hope I'm not intruding, but when you
> have the time, could you please expand on this?
> Besides distinguishing a nuked pointer from zeroed-out memory, I did not
> know of any other advantage of 0xDEADBEEF pointer poisoning (generally
> or specifically).
> How could setting a pointer to NULL possibly allow a PV guest to control
> what the hypervisor might access, if the hypervisor *can't access* a
> NULL pointer?
> And can a PV guest write data @ the *hypervisor's* page 0 (virtual
> and/or physical)?

Since the answer to this last question is "yes" (for the virtual page 0
only, of course), I suppose the rest is obvious.

Jan
On 02/02/16 11:39, Corneliu ZUZU wrote:
> On 2/2/2016 12:52 PM, Jan Beulich wrote:
>>> NULLing the pointers would cause things like rtc_deinit() to always
>>> blow up when it followed the NULL pointer.
>>>
>>> IMO, we should unconditionally NULL pointers when freeing a pointer
>>> which isn't in local scope. It would make issues such as these
>>> completely obvious.
>> As would poisoning the pointers, yet poisoning has the advantage
>> of not allowing PV guests to control what the hypervisor might
>> access when erroneously de-referencing such a pointer.
>>
>> Jan
>
> Jan, that sounds interesting. I hope I'm not intruding, but when you
> have the time, could you please expand on this?
> Besides distinguishing a nuked pointer from zeroed-out memory, I did
> not know of any other advantage of 0xDEADBEEF pointer poisoning
> (generally or specifically).

Xen is 64bit-only these days. Bit 47 of a 64bit pointer must be
sign-extended into bits 63:48, or a #GP fault occurs because of the use
of a non-canonical address. Therefore, a pointer such as
0xdeadc0de00000000ULL will unconditionally cause a #GP fault if Xen
accidentally uses it.

> How could setting a pointer to NULL possibly allow a PV guest to
> control what the hypervisor might access, if the hypervisor *can't
> access* a NULL pointer?
> And can a PV guest write data @ the *hypervisor's* page 0 (virtual
> and/or physical)?

Xen and PV guests share the virtual address space, in exactly the same
way as a native kernel and its userspace. PV guests can map pages at 0.
Therefore, if Xen were to accidentally follow a NULL pointer, it may not
result in a page fault. (Hardware mechanisms such as SMEP and SMAP
provide added protection against this, but don't work on older
hardware.)

~Andrew
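To make the canonicality argument concrete, here is a small standalone
illustration; the helper is for exposition only (it is not a Xen
function), and the poison value is Andrew's example:

    #include <stdbool.h>
    #include <stdint.h>

    /* With 48-bit virtual addresses, bits 63:48 must all be copies of
     * bit 47; any other pattern is non-canonical and dereferencing it
     * raises #GP. (Relies on arithmetic right shift of signed values,
     * as all mainstream compilers implement.) */
    static bool is_canonical(uint64_t va)
    {
        int64_t top = (int64_t)va >> 47;

        return top == 0 || top == -1;
    }

    /*
     * is_canonical(0x0)                  -> true: NULL is a mappable
     *     address, which a PV guest can arrange to back with its own
     *     memory.
     * is_canonical(0xdeadc0de00000000UL) -> false: bits 63:48 (0xdead)
     *     are not copies of bit 47, so no mapping can ever exist at
     *     this address, for guest or hypervisor.
     */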
On 2/2/2016 2:05 PM, Andrew Cooper wrote:
> Xen and PV guests share the virtual address space, in exactly the same
> way as a native kernel and its userspace. PV guests can map pages at 0.
> Therefore, if Xen were to accidentally follow a NULL pointer, it may
> not result in a page fault. (Hardware mechanisms such as SMEP and SMAP
> provide added protection against this, but don't work on older
> hardware.)
>
> ~Andrew

Thank you, I finally got it. (I also read
http://wiki.xenproject.org/wiki/X86_Paravirtualised_Memory_Management,
which cleared things up.)

Corneliu.
diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
index f24400d..38c65b3 100644
--- a/xen/arch/x86/hvm/hvm.c
+++ b/xen/arch/x86/hvm/hvm.c
@@ -1674,8 +1674,10 @@ void hvm_domain_relinquish_resources(struct domain *d)
 void hvm_domain_destroy(struct domain *d)
 {
     xfree(d->arch.hvm_domain.io_handler);
+    d->arch.hvm_domain.io_handler = NULL;
+
     xfree(d->arch.hvm_domain.params);
-    xfree(d->arch.hvm_domain.pl_time);
+    d->arch.hvm_domain.params = NULL;
 
     hvm_destroy_cacheattr_region_list(d);
 
@@ -1686,6 +1688,9 @@ void hvm_domain_destroy(struct domain *d)
     rtc_deinit(d);
     stdvga_deinit(d);
     vioapic_deinit(d);
+
+    xfree(d->arch.hvm_domain.pl_time);
+    d->arch.hvm_domain.pl_time = NULL;
 }
 
 static int hvm_save_tsc_adjust(struct domain *d, hvm_domain_context_t *h)
c/s 428607a "x86: shrink 'struct domain', was already PAGE_SIZE" introduced a
use-after-free error during domain destruction, because of the order in which
timers are torn down.

(XEN) Xen call trace:
(XEN)    [<ffff82d08013344e>] spinlock.c#check_lock+0x1e/0x40
(XEN)    [<ffff82d08013349b>] _spin_lock+0x11/0x52
(XEN)    [<ffff82d0801e8076>] vpt.c#pt_lock+0x24/0x40
(XEN)    [<ffff82d0801e88f4>] destroy_periodic_time+0x18/0x81
(XEN)    [<ffff82d0801e1089>] rtc_deinit+0x53/0x78
(XEN)    [<ffff82d0801d1e5a>] hvm_domain_destroy+0x52/0x69
(XEN)    [<ffff82d08016a758>] arch_domain_destroy+0x1a/0x98
(XEN)    [<ffff82d080107cd5>] domain.c#complete_domain_destroy+0x6f/0x182
(XEN)    [<ffff82d080126a19>] rcupdate.c#rcu_process_callbacks+0x144/0x1a6
(XEN)    [<ffff82d080132c52>] softirq.c#__do_softirq+0x82/0x8d
(XEN)    [<ffff82d080132caa>] do_softirq+0x13/0x15
(XEN)    [<ffff82d080248ae1>] entry.o#process_softirqs+0x21/0x30
(XEN)
(XEN)
(XEN) ****************************************
(XEN) Panic on CPU 3:
(XEN) GENERAL PROTECTION FAULT
(XEN) [error_code=0000]
(XEN) ****************************************

Defer the freeing of d->arch.hvm_domain.pl_time until all timers have been
destroyed.

For safety, NULL out the pointers after freeing them, in an attempt to make
mistakes more obvious in the future.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
---
CC: Jan Beulich <JBeulich@suse.com>
CC: Corneliu ZUZU <czuzu@bitdefender.com>
---
 xen/arch/x86/hvm/hvm.c | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)