kvm: Fix memory slot page alignment logic

Message ID	1415395125-18926-1-git-send-email-agraf@suse.de (mailing list archive)
State	New, archived
Headers
Return-Path: <kvm-owner@kernel.org>
From: Alexander Graf <agraf@suse.de>
To: qemu-ppc@nongnu.org
Cc: stuart.yoder@freescale.com, qemu-devel@nongnu.org, pbonzini@redhat.com, kvm@vger.kernel.org, qemu-stable@nongnu.org
Subject: [PATCH] kvm: Fix memory slot page alignment logic
Date: Fri, 7 Nov 2014 22:18:45 +0100
Message-Id: <1415395125-18926-1-git-send-email-agraf@suse.de>
Sender: kvm-owner@vger.kernel.org
Precedence: bulk

Alexander Graf Nov. 7, 2014, 9:18 p.m. UTC

Memory slots have to be page aligned to get entered into KVM. There
is existing logic that tries to ensure that we pad memory slots that
are not page aligned to the biggest region that would still fit in the
alignment requirements.

Unfortunately, that logic is broken. It tries to calculate the start
offset based on the region size.

Fix up the logic to do the thing it was intended to do and document it
properly in the comment above it.

With this patch applied, I can successfully run an e500 guest with more
than 3GB RAM (at which point RAM starts overlapping subpage memory regions).

Cc: qemu-stable@nongnu.org
Signed-off-by: Alexander Graf <agraf@suse.de>
---
 kvm-all.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

Alexander Graf Nov. 7, 2014, 9:24 p.m. UTC | #1

On 07.11.14 22:18, Alexander Graf wrote:
> Memory slots have to be page aligned to get entered into KVM. There
> is existing logic that tries to ensure that we pad memory slots that
> are not page aligned to the biggest region that would still fit in the
> alignment requirements.
> 
> Unfortunately, that logic is broken. It tries to calculate the start
> offset based on the region size.
> 
> Fix up the logic to do the thing it was intended to do and document it
> properly in the comment above it.
> 
> With this patch applied, I can successfully run an e500 guest with more
> than 3GB RAM (at which point RAM starts overlapping subpage memory regions).
> 
> Cc: qemu-stable@nongnu.org
> Signed-off-by: Alexander Graf <agraf@suse.de>

If everyone agrees that this patch does indeed do what the code is
intended to do (I think it's quite correct; to be 100% right, it should
use getpagesize() rather than TARGET_PAGE_SIZE), this should still go
into 2.2.


Alex
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Igor Mammedov Nov. 10, 2014, 12:31 p.m. UTC | #2

On Fri,  7 Nov 2014 22:18:45 +0100
Alexander Graf <agraf@suse.de> wrote:

> diff --git a/kvm-all.c b/kvm-all.c
> index 44a5e72..596e7ce 100644
> --- a/kvm-all.c
> +++ b/kvm-all.c
> @@ -634,8 +634,10 @@ static void kvm_set_phys_mem(MemoryRegionSection *section, bool add)
>      unsigned delta;
>  
>      /* kvm works in page size chunks, but the function may be called
> -       with sub-page size and unaligned start address. */
> -    delta = TARGET_PAGE_ALIGN(size) - size;
> +       with sub-page size and unaligned start address. Pad the start
> +       address to next and truncate size to previous page boundary. */
I'm a bit confused how it works at all.
Let's assume that there are no mapped pages that include start_addr.
Then, if start_addr were padded to the next page, KVM would map it from
there, but the rest of QEMU would still use the unaligned start_addr for a
MemoryRegion that isn't even mapped.

It would seem that instead of padding up to the next page, start_addr
should be moved to the start of the page that includes it, to make the
page with the original start_addr available to the guest.

> +    delta = (TARGET_PAGE_SIZE - (start_addr & ~TARGET_PAGE_MASK));
> +    delta &= ~TARGET_PAGE_MASK;
>      if (delta > size) {
>          return;
>      }


Alexander Graf Nov. 10, 2014, 1:16 p.m. UTC | #3

On 10.11.14 13:31, Igor Mammedov wrote:
> I'm a bit confused how it works at all.
> Let's assume that there are no mapped pages that include start_addr.
> Then, if start_addr were padded to the next page, KVM would map it from
> there, but the rest of QEMU would still use the unaligned start_addr for a
> MemoryRegion that isn't even mapped.

Sorry, I don't understand this paragraph. Memory slots in general are
accelerations for memory access - for MMIO (RAM is usually aligned), KVM
can always exit to QEMU and just do a manual MMIO exit.

> It would seem that instead of padding up to the next page, start_addr
> should be moved to the start of the page that includes it, to make the
> page with the original start_addr available to the guest.

No, because in that case you would map something as RAM that really
isn't RAM.

Imagine you have the following memory layout:

0x1000 page size

1) 0x00000 - 0x10000 RAM
2) 0x10000 - 0x10100 MMIO
3) 0x10100 - 0x20000 RAM

Then you want to map 1) as a memory slot and 3) from 0x11000 onwards as
a memory slot.

You can't map the page from 0x10000 - 0x11000 as memory slot, because
part of it is MMIO.


Alex

Paolo Bonzini Nov. 10, 2014, 1:54 p.m. UTC | #4

On 10/11/2014 14:16, Alexander Graf wrote:
> No, because in that case you would map something as RAM that really
> isn't RAM.
> 
> Imagine you have the following memory layout:
> 
> 0x1000 page size
> 
> 1) 0x00000 - 0x10000 RAM
> 2) 0x10000 - 0x10100 MMIO
> 3) 0x10100 - 0x20000 RAM
> 
> Then you want to map 1) as a memory slot and 3) from 0x11000 onwards as
> a memory slot.
> 
> You can't map the page from 0x10000 - 0x11000 as memory slot, because
> part of it is MMIO.

Right.  The partial RAM page remains marked as MMIO as far as KVM is
concerned, so accesses are slow and you cannot run code from it.
However, it is fundamental that MMIO areas are not marked as RAM.

Paolo

Peter Maydell Nov. 10, 2014, 1:55 p.m. UTC | #5

On 10 November 2014 13:16, Alexander Graf <agraf@suse.de> wrote:
> Sorry, I don't understand this paragraph. Memory slots in general are
> accelerations for memory access - for MMIO (RAM is usually aligned), KVM
> can always exit to QEMU and just do a manual MMIO exit.

...you're a bit stuck if you were hoping to execute code from
that RAM, though, so they're not *purely* acceleration, right?

-- PMM

Igor Mammedov Nov. 10, 2014, 1:55 p.m. UTC | #6

On Mon, 10 Nov 2014 14:16:58 +0100
Alexander Graf <agraf@suse.de> wrote:

> 
> 
> On 10.11.14 13:31, Igor Mammedov wrote:
> > I'm a bit confused how it works at all.
> > Let's assume that there are no mapped pages that include start_addr.
> > Then, if start_addr were padded to the next page, KVM would map it from
> > there, but the rest of QEMU would still use the unaligned start_addr for a
> > MemoryRegion that isn't even mapped.
> 
> Sorry, I don't understand this paragraph. Memory slots in general are
> accelerations for memory access - for MMIO (RAM is usually aligned), KVM
> can always exit to QEMU and just do a manual MMIO exit.
> 
> > It would seem that instead of padding up to the next page, start_addr
> > should be moved to the start of the page that includes it, to make the
> > page with the original start_addr available to the guest.
> 
> No, because in that case you would map something as RAM that really
> isn't RAM.
> 
> Imagine you have the following memory layout:
> 
> 0x1000 page size
> 
> 1) 0x00000 - 0x10000 RAM
> 2) 0x10000 - 0x10100 MMIO
> 3) 0x10100 - 0x20000 RAM
> 
> Then you want to map 1) as a memory slot and 3) from 0x11000 onwards as
> a memory slot.
So every access to RAM at 0x10100-0x11000, which is not represented by a
memory slot, would cause a VMEXIT?

> 
> You can't map the page from 0x10000 - 0x11000 as memory slot, because
> part of it is MMIO.
> 
> 
> Alex


Alexander Graf Nov. 10, 2014, 2:47 p.m. UTC | #7

On 10.11.14 14:55, Igor Mammedov wrote:
> On Mon, 10 Nov 2014 14:16:58 +0100
> Alexander Graf <agraf@suse.de> wrote:
> 
>>
>>
>> On 10.11.14 13:31, Igor Mammedov wrote:
>>> I'm a bit confused how it works at all.
>>> Let's assume that there are no mapped pages that include start_addr.
>>> Then, if start_addr were padded to the next page, KVM would map it from
>>> there, but the rest of QEMU would still use the unaligned start_addr for a
>>> MemoryRegion that isn't even mapped.
>>
>> Sorry, I don't understand this paragraph. Memory slots in general are
>> accelerations for memory access - for MMIO (RAM is usually aligned), KVM
>> can always exit to QEMU and just do a manual MMIO exit.
>>
>>> It would seem that instead of padding up to the next page, start_addr
>>> should be moved to the start of the page that includes it, to make the
>>> page with the original start_addr available to the guest.
>>
>> No, because in that case you would map something as RAM that really
>> isn't RAM.
>>
>> Imagine you have the following memory layout:
>>
>> 0x1000 page size
>>
>> 1) 0x00000 - 0x10000 RAM
>> 2) 0x10000 - 0x10100 MMIO
>> 3) 0x10100 - 0x20000 RAM
>>
>> Then you want to map 1) as a memory slot and 3) from 0x11000 onwards as
>> a memory slot.
> So every access to RAM at 0x10100-0x11000, which is not represented by a
> memory slot, would cause a VMEXIT?

Yes, there's no other way. Otherwise we wouldn't be able to trap on
accesses to 0x10000 - 0x10100. Hardware only gives us page granularity.

Usually this isn't an issue because overlapping MMIO regions are pretty
large chunks of power-of-2 size - if you see any overlapping at all. On
e500 this bites us though, because we end up with small MSI-X windows
inside our address space (which in turn might also be a bug, but that
doesn't mean that the slot mapping logic should be left as broken as it is).


Alex

Alexander Graf Nov. 10, 2014, 2:48 p.m. UTC | #8

On 10.11.14 14:55, Peter Maydell wrote:
> On 10 November 2014 13:16, Alexander Graf <agraf@suse.de> wrote:
>> Sorry, I don't understand this paragraph. Memory slots in general are
>> accelerations for memory access - for MMIO (RAM is usually aligned), KVM
>> can always exit to QEMU and just do a manual MMIO exit.
> 
> ...you're a bit stuck if you were hoping to execute code from
> that RAM, though, so they're not *purely* acceleration, right?

Yes and no. Technically, there's no reason KVM couldn't do an MMIO exit
dance to fetch the next instruction. From user space this should be
indistinguishable.

Today, I don't think it's implemented though :).


Alex
