diff mbox

[v2,for-4.9] x86/mm: Fix incorrect unmapping of 2MB and 1GB pages

Message ID d21f3b6d-d1c5-be00-6a1d-1944099b4bbc@oracle.com (mailing list archive)
State New, archived
Headers show

Commit Message

Boris Ostrovsky May 23, 2017, 1:40 p.m. UTC
On 05/23/2017 09:17 AM, Jan Beulich wrote:
>>>> On 23.05.17 at 15:00, <boris.ostrovsky@oracle.com> wrote:
>> (d1) Testing HVM environment:
>> (XEN) d1v0 Triple fault - invoking HVM shutdown action 1
>> (XEN) *** Dumping Dom1 vcpu#0 state: ***
>> (XEN) ----[ Xen-4.9-rc  x86_64  debug=y   Tainted:  C   ]----
>> (XEN) CPU:    11
>> (XEN) RIP:    0018:[<000000000010815c>]
>> (XEN) RFLAGS: 0000000000000086   CONTEXT: hvm guest (d1v0)
>> (XEN) rax: 0000000080000011   rbx: 000000000017c000   rcx: 0000000000003000
>> (XEN) rdx: 00000000ffefffff   rsi: 0000000000000000   rdi: 0000000000000000
>> (XEN) rbp: 0000000000136478   rsp: 0000000000136478   r8:  0000000000000000
>> (XEN) r9:  0000000000000000   r10: 0000000000000000   r11: 0000000000000000
>> (XEN) r12: 0000000000000000   r13: 0000000000000000   r14: 0000000000000000
>> (XEN) r15: 0000000000000000   cr0: 0000000080000011   cr4: 0000000000000000
>> (XEN) cr3: 0000000000800000   cr2: 000000000010815c
>> (XEN) ds: 0020   es: 0020   fs: 0020   gs: 0020   ss: 0020   cs: 0018
>>
>>
>> 0x10815c is tools/firmware/hvmloader/tests.c:start_paging(), the 'jmp'
>> after we enable paging (load cr0 with bit 31 set).
> Odd. Suggests page tables are completely screwed.
>
>> root@ovs101> xl create -c ~/virt/pvh.conf
>> Parsing config from /root/virt/pvh.conf
>> libxl: notice: libxl_numa.c:518:libxl__get_numa_candidate: NUMA
>> placement failed, performance might be affected
>> S3 disabled
>> S4 disabled
>> CONV disabled
>> xc: error: panic: xc_dom_boot.c:178: xc_dom_boot_domU_map: failed to
>> mmap domU pages 0x1000+0x1062 [mmap, errno=22 (Invalid argument)]:
>> Internal error
> This is even more strange. I can't seem to make a connection to
> the changes in said commit at all. And I did go through p2m-pt.c's
> relevant code another time this morning, without spotting any
> possible oversight by Igor. IOW I'm now really curious what it is
> that I'm not seeing (and that's apparently NPT-specific).

And you haven't been able to reproduce this? I see this fail on two AMD
systems (different processor families).

What's interesting (I just noticed this) is that while PVH fails in the
same manner, HVM crashes differently. The second crash is

(d11) xen: copy e820...
(XEN) d11v0 Triple fault - invoking HVM shutdown action 1
(XEN) *** Dumping Dom11 vcpu#0 state: ***
(XEN) ----[ Xen-4.9-rc  x86_64  debug=n   Tainted:  C   ]----
(XEN) CPU:    0
(XEN) RIP:    0008:[<00000000000da54c>]
(XEN) RFLAGS: 0000000000000093   CONTEXT: hvm guest (d11v0)
(XEN) rax: 000000008ee08e90   rbx: 000000008ee08ec0   rcx: 00000000ffffffff
(XEN) rdx: 0000000000006f74   rsi: 000000009ea91ce8   rdi: 0000000000000000
(XEN) rbp: 0000000000006fd8   rsp: 0000000000006f64   r8:  0000000000000000
(XEN) r9:  0000000000000000   r10: 0000000000000000   r11: 0000000000000000
(XEN) r12: 0000000000000000   r13: 0000000000000000   r14: 0000000000000000
(XEN) r15: 0000000000000000   cr0: 0000000000000011   cr4: 0000000000000000
(XEN) cr3: 0000000000000000   cr2: 0000000000000000
(XEN) ds: 0010   es: 0010   fs: 0010   gs: 0010   ss: 0010   cs: 0008


so paging is off and it dies not in hvmloader.

And this:

 
             order = (!(fn_mask & ((1ul << PAGE_ORDER_1G) - 1)) &&


makes the problem go away.

-boris

Comments

Jan Beulich May 23, 2017, 2:05 p.m. UTC | #1
>>> On 23.05.17 at 15:40, <boris.ostrovsky@oracle.com> wrote:
> And you haven't been able to reproduce this? I see this fail on two AMD
> systems (different processor families).

I didn't even have the time to try.

> And this:
> 
> --- a/xen/arch/x86/mm/p2m.c
> +++ b/xen/arch/x86/mm/p2m.c
> @@ -560,7 +560,7 @@ int p2m_set_entry(struct p2m_domain *p2m, unsigned long gfn, mfn_t mfn,
>      {
>          if ( hap_enabled(d) )
>          {
> -            unsigned long fn_mask = !mfn_eq(mfn, INVALID_MFN) ?
> +            unsigned long fn_mask = (!mfn_eq(mfn, INVALID_MFN) || 1)?
>                                       (gfn | mfn_x(mfn) | todo) : (gfn | todo);
>  
>              order = (!(fn_mask & ((1ul << PAGE_ORDER_1G) - 1)) &&
> 
> 
> makes the problem go away.

Interesting. I took another look at p2m-pt.c, this time paying
particular attention to INVALID_MFN uses. And right the first one
may already provide a hint: Perhaps we now need L2 and L3
counterparts to p2m_l1e_from_pfn(). Further changes may then
be needed to the splitting of large pages (in p2m_next_level())
depending on whether INVALID_MFN entries can make it there.

Jan
Boris Ostrovsky May 23, 2017, 2:32 p.m. UTC | #2
On 05/23/2017 10:05 AM, Jan Beulich wrote:
>>>> On 23.05.17 at 15:40, <boris.ostrovsky@oracle.com> wrote:
>> And you haven't been able to reproduce this? I see this fail on two AMD
>> systems (different processor families).
> I didn't even have the time to try.
>
>> And this:
>>
>> --- a/xen/arch/x86/mm/p2m.c
>> +++ b/xen/arch/x86/mm/p2m.c
>> @@ -560,7 +560,7 @@ int p2m_set_entry(struct p2m_domain *p2m, unsigned long gfn, mfn_t mfn,
>>      {
>>          if ( hap_enabled(d) )
>>          {
>> -            unsigned long fn_mask = !mfn_eq(mfn, INVALID_MFN) ?
>> +            unsigned long fn_mask = (!mfn_eq(mfn, INVALID_MFN) || 1)?
>>                                       (gfn | mfn_x(mfn) | todo) : (gfn | todo);
>>  
>>              order = (!(fn_mask & ((1ul << PAGE_ORDER_1G) - 1)) &&
>>
>>
>> makes the problem go away.
> Interesting. I took another look at p2m-pt.c, this time paying
> particular attention to INVALID_MFN uses. And right the first one
> may already provide a hint: Perhaps we now need L2 and L3
> counterparts to p2m_l1e_from_pfn(). 

Defining p2m_l2e_from_pfn indeed helps a bit with HVM --- the guest now
goes as far as loading balloon driver (when it crashes).


> Further changes may then
> be needed to the splitting of large pages (in p2m_next_level())
> depending on whether INVALID_MFN entries can make it there.

Let me see what I can do here.

-boris
Boris Ostrovsky May 23, 2017, 10:25 p.m. UTC | #3
On 05/23/2017 10:32 AM, Boris Ostrovsky wrote:
> On 05/23/2017 10:05 AM, Jan Beulich wrote:
>>>>> On 23.05.17 at 15:40, <boris.ostrovsky@oracle.com> wrote:
>>> And you haven't been able to reproduce this? I see this fail on two AMD
>>> systems (different processor families).
>> I didn't even have the time to try.
>>
>>> And this:
>>>
>>> --- a/xen/arch/x86/mm/p2m.c
>>> +++ b/xen/arch/x86/mm/p2m.c
>>> @@ -560,7 +560,7 @@ int p2m_set_entry(struct p2m_domain *p2m, unsigned long gfn, mfn_t mfn,
>>>      {
>>>          if ( hap_enabled(d) )
>>>          {
>>> -            unsigned long fn_mask = !mfn_eq(mfn, INVALID_MFN) ?
>>> +            unsigned long fn_mask = (!mfn_eq(mfn, INVALID_MFN) || 1)?
>>>                                       (gfn | mfn_x(mfn) | todo) : (gfn | todo);
>>>  
>>>              order = (!(fn_mask & ((1ul << PAGE_ORDER_1G) - 1)) &&
>>>
>>>
>>> makes the problem go away.
>> Interesting. I took another look at p2m-pt.c, this time paying
>> particular attention to INVALID_MFN uses. And right the first one
>> may already provide a hint: Perhaps we now need L2 and L3
>> counterparts to p2m_l1e_from_pfn(). 
> Defining p2m_l2e_from_pfn indeed helps a bit with HVM --- the guest now
> goes as far as loading balloon driver (when it crashes).
>
>
>> Further changes may then
>> be needed to the splitting of large pages (in p2m_next_level())
>> depending on whether INVALID_MFN entries can make it there.
> Let me see what I can do here.

TBH, I don't see what needs to be done in p2m_next_level(). mfn doesn't
enter the calculation there.


-boris
Jan Beulich May 24, 2017, 6:20 a.m. UTC | #4
>>> On 24.05.17 at 00:25, <boris.ostrovsky@oracle.com> wrote:
> On 05/23/2017 10:32 AM, Boris Ostrovsky wrote:
>> On 05/23/2017 10:05 AM, Jan Beulich wrote:
>>> Further changes may then
>>> be needed to the splitting of large pages (in p2m_next_level())
>>> depending on whether INVALID_MFN entries can make it there.
>> Let me see what I can do here.
> 
> TBH, I don't see what needs to be done in p2m_next_level(). mfn doesn't
> enter the calculation there.

The relevant variables are named "pfn" there, and so far I'm only
guessing there might be an issue (or actually I meanwhile thing it
should only be a cosmetic one): What's being read out of the old
PTE is being taken and incremented for each split PTE. That'll (in
the 2Mb page case) result in 4k PTEs referencing MFNs 0xffffffffff
and 0 ... 0x1fe. But generally this should be harmless, as these
are non-present PTEs, and the frame numbers read back out of
non-present PTEs should be of no interest to anyone.

I'm pondering to convert the code to use mfn_add() (perhaps we
should also have mfn_inc()), making the helper saturate.

In any event I hope to find time later today to look into the issue
myself. I'm kind of disappointed that there was no visible attempt
from Igor so far to help the situation.

Jan
Andrew Cooper May 24, 2017, 6:53 a.m. UTC | #5
On 24/05/2017 07:20, Jan Beulich wrote:
> I'm kind of disappointed that there was no visible attempt
> from Igor so far to help the situation.

Igor is out of the office at the moment, and besides, this thread of
problem is only 36h old at this point, which is a single working day in
this timezone.

~Andrew
diff mbox

Patch

diff --git a/xen/arch/x86/mm/p2m.c b/xen/arch/x86/mm/p2m.c
index c6ec1a4..0051623 100644
--- a/xen/arch/x86/mm/p2m.c
+++ b/xen/arch/x86/mm/p2m.c
@@ -560,7 +560,7 @@  int p2m_set_entry(struct p2m_domain *p2m, unsigned
long gfn, mfn_t mfn,
     {
         if ( hap_enabled(d) )
         {
-            unsigned long fn_mask = !mfn_eq(mfn, INVALID_MFN) ?
+            unsigned long fn_mask = (!mfn_eq(mfn, INVALID_MFN) || 1)?
                                      (gfn | mfn_x(mfn) | todo) : (gfn |
todo);