[3/3] arm, arm64: KVM: handle potential incoherency of readonly memslots

Message ID 546A1905.6080607@redhat.com (mailing list archive)
State New, archived

Commit Message

Laszlo Ersek Nov. 17, 2014, 3:49 p.m. UTC
On 11/17/14 16:29, Paolo Bonzini wrote:
> 
> 
> On 17/11/2014 15:58, Ard Biesheuvel wrote:
>> Readonly memslots are often used to implement emulation of ROMs and
>> NOR flashes, in which case the guest may legally map these regions as
>> uncached.
>> To deal with the incoherency associated with uncached guest mappings,
>> treat all readonly memslots as incoherent, and ensure that pages that
>> belong to regions tagged as such are flushed to DRAM before being passed
>> to the guest.
> 
> On x86, the processor combines the cacheability values from the two
> levels of page tables.  Is there no way to do the same on ARM?

Combining occurs on ARMv8 too. The Stage1 (guest) mapping is very strict
(Device non-Gathering, non-Reordering, no Early Write Acknowledgement --
for EFI_MEMORY_UC), which basically "overrides" the Stage2 (very lax
host) memory attributes.
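
(Concretely, per the combining rules: Stage1 Device-nGnRnE + Stage2 Normal
Cacheable yields Device-nGnRnE; the stricter attribute wins, so the guest's
reads bypass the dcache no matter how lax the Stage2 attributes are.)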

When qemu writes, as part of emulating the flash programming commands,
to the RAMBlock that *otherwise* backs the flash range (as a r/o
memslot), those writes (from host userspace) tend to end up in dcache.

But, when the guest flips back the flash to romd mode, and tries to read
back the values from the flash as plain ROM, the dcache is completely
bypassed due to the strict stage1 mapping, and the guest goes directly
to DRAM.

Where qemu's earlier writes are not yet, or not necessarily, visible.

Please see my original patch (which was incomplete) in the attachment; it
has a very verbose commit message.
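
(The crux of it, reproduced from the full patch at the bottom of this page,
is one extra condition on the existing dcache flush in
coherent_cache_guest_page():

    if (!vcpu_has_cache_enabled(vcpu) || ipa_uncached)
            kvm_flush_dcache_to_poc((void *)hva, size);

i.e. flush to the point of coherency not only when the VCPU's caches are
disabled, but also when the IPA in question is mapped uncached.)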

Anyway, I'll let others explain; they can word it better than I can :)

FWIW,

Series
Reviewed-by: Laszlo Ersek <lersek@redhat.com>

I ported this series to a 3.17.0+ based kernel, and tested it. It works
fine. The ROM-like view of the NOR flash now reflects the previously
programmed contents.

Series
Tested-by: Laszlo Ersek <lersek@redhat.com>

Thanks!
Laszlo

Comments

Mario Smarduch Nov. 19, 2014, 11:32 p.m. UTC | #1
Hi Laszlo,

couple observations.

     I'm wondering if access from qemu and guest won't
result in mixed memory attributes and if that's acceptable
to the CPU.

Also, if you update memory from qemu you may break
dirty page logging/migration. Unless there is some other way
you keep track. Of course it may not be applicable in your
case (i.e. flash unused after boot).

- Mario

Laszlo Ersek Nov. 20, 2014, 8:08 a.m. UTC | #2
On 11/20/14 00:32, Mario Smarduch wrote:
> Hi Laszlo,
> 
> couple observations.
> 
>      I'm wondering if access from qemu and guest won't
> result in mixed memory attributes and if that's acceptable
> to the CPU.

Normally this would be a problem I think (Jon raised the topic of live
migration). However, for flash programming specifically, I think the
guest's access pattern ensures that we'll see things OK.

When the guest issues the first write access, the memslot is deleted,
and everything is forwarded to qemu, both reads and writes. In response
qemu modifies the array that *otherwise* backs the flash. These
modifications by qemu end up in the dcache mostly. When the guest is
done "programming", it writes a special command (read array mode) at
which point the memslot is recreated (as read-only) and flushed / set up
for flushing during demand paging.

So from the emulated flash POV, the memslot either doesn't exist at all
(and then qemu serves all accesses just fine), or it exists r/o, at
which point qemu (host userspace) will have stopped writing to it, and
will have set it up for flushing before and during guest read accesses.
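
In ioctl terms, the dance looks roughly like this (an illustrative sketch
only -- the flash_* names are made up here, this is not qemu's actual code):

    struct kvm_userspace_memory_region region = {
            .slot            = flash_slot,
            .guest_phys_addr = flash_gpa,
            .memory_size     = 0,  /* delete the slot: flash accesses
                                      now trap out to qemu */
    };
    ioctl(vm_fd, KVM_SET_USER_MEMORY_REGION, &region);

    /* qemu emulates the programming commands by writing the backing
     * RAMBlock from userspace; those writes can sit in the host dcache */

    /* guest writes the "read array mode" command: recreate read-only */
    region.flags          = KVM_MEM_READONLY;
    region.memory_size    = flash_size;
    region.userspace_addr = (unsigned long)flash_backing;
    ioctl(vm_fd, KVM_SET_USER_MEMORY_REGION, &region);

with this series ensuring that the pages of the recreated r/o slot are
flushed to DRAM before the guest reads them.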

> Also, if you update memory from qemu you may break
> dirty page logging/migration.

Very probably. Jon said the same thing.

> Unless there is some other way
> you keep track. Of course it may not be applicable in your
> case (i.e. flash unused after boot).

The flash *is* used after boot, because the UEFI runtime variable
services *are* exercised by the guest kernel. However those use the same
access pattern (it's the same set of UEFI services just called by a
different "client").

*Uncoordinated* access from guest and host in parallel will be a big
problem; but we're not that far yet, and we need to get the flash
problem sorted, so that we can at least boot and work on the basic
stuff. The flash programming dance happens to provide coordination; the
flash mode changes (which are equivalent to the teardown and the
recreation of the memslot) can be considered barriers.

I hope this is acceptable for the time being...

Thanks
Laszlo

Mario Smarduch Nov. 20, 2014, 6:35 p.m. UTC | #3
On 11/20/2014 12:08 AM, Laszlo Ersek wrote:
> On 11/20/14 00:32, Mario Smarduch wrote:
>> Hi Laszlo,
>>
>> couple observations.
>>
>>      I'm wondering if access from qemu and guest won't
>> result in mixed memory attributes and if that's acceptable
>> to the CPU.
> 
> Normally this would be a problem I think (Jon raised the topic of live
> migration). However, for flash programming specifically, I think the
> guest's access pattern ensures that we'll see things OK.
> 
> When the guest issues the first write access, the memslot is deleted,
> and everything is forwarded to qemu, both reads and writes. In response
> qemu modifies the array that *otherwise* backs the flash. These
> modifications by qemu end up in the dcache mostly. When the guest is
> done "programming", it writes a special command (read array mode) at
> which point the memslot is recreated (as read-only) and flushed / set up
> for flushing during demand paging.
> 
> So from the emulated flash POV, the memslot either doesn't exist at all
> (and then qemu serves all accesses just fine), or it exists r/o, at
> which point qemu (host userspace) will have stopped writing to it, and
> will have set it up for flushing before and during guest read accesses.

I think beyond consistency, there should be no double mappings with
conflicting attributes at any time or CPU state is undefined. At least
that's what I recall for cases where the identity mapping was cacheable and
user mmap'ed regions were uncacheable: side effects like a CPU hard stop or
a victim invalidate of a dirty cache line. With the virtualization
extensions maybe the behavior is different. I guess if you're not seeing
lock-ups or crashes then it appears to work :) More senior folks in the ARM
community are probably in a better position to address this, but I thought
I'd raise a flag.

> 
>> Also, if you update memory from qemu you may break
>> dirty page logging/migration.
> 
> Very probably. Jon said the same thing.
> 
>> Unless there is some other way
>> you keep track. Of course it may not be applicable in your
>> case (i.e. flash unused after boot).
> 
> The flash *is* used after boot, because the UEFI runtime variable
> services *are* exercised by the guest kernel. However those use the same
> access pattern (it's the same set of UEFI services just called by a
> different "client").
> 
> *Uncoordinated* access from guest and host in parallel will be a big
> problem; but we're not that far yet, and we need to get the flash
> problem sorted, so that we can at least boot and work on the basic
> stuff. The flash programming dance happens to provide coordination; the
> flash mode changes (which are equivalent to the teardown and the
> recreation of the memslot) can be considered barriers.
> 
> I hope this is acceptable for the time being...

Yeah, I understand you have a more immediate requirement to support; the
migration issue is more FYI. Thanks for the details, they help to understand
the context.

- Mario
Peter Maydell Nov. 20, 2014, 6:40 p.m. UTC | #4
On 20 November 2014 18:35, Mario Smarduch <m.smarduch@samsung.com> wrote:
> I think beyond consistency, there should be no double mappings with
> conflicting attributes at any time or CPU state is undefined.

The situation is not so bleak as this. See section B2.9 "Mismatched
memory attributes" in the ARMv8 ARM ARM (DDI0487A.d), which lays
out in some detail what the results of mismatched attributes are
(generally, you lose ordering or coherency guarantees you might
have hoped to have). They're not pretty, but it's not as bad
as completely UNPREDICTABLE behaviour.

thanks
-- PMM
Mario Smarduch Nov. 20, 2014, 7:15 p.m. UTC | #5
On 11/20/2014 10:40 AM, Peter Maydell wrote:
> On 20 November 2014 18:35, Mario Smarduch <m.smarduch@samsung.com> wrote:
>> I think beyond consistency, there should be no double mappings with
>> conflicting attributes at any time or CPU state is undefined.
> 
> The situation is not so bleak as this. See section B2.9 "Mismatched
> memory attributes" in the ARMv8 ARM ARM (DDI0487A.d), which lays
> out in some detail what the results of mismatched attributes are
> (generally, you lose ordering or coherency guarantees you might
> have hoped to have). They're not pretty, but it's not as bad
> as completely UNPREDICTABLE behaviour.
> 
> thanks
> -- PMM
> 
Hi Peter,
  thanks for digging that up -- quite a list to navigate,
but it does provide detailed guidance.

- Mario
Jon Masters Nov. 20, 2014, 7:49 p.m. UTC | #6
On 11/20/2014 01:40 PM, Peter Maydell wrote:
> On 20 November 2014 18:35, Mario Smarduch <m.smarduch@samsung.com> wrote:
>> I think beyond consistency, there should be no double mappings with
>> conflicting attributes at any time or CPU state is undefined.
> 
> The situation is not so bleak as this. See section B2.9 "Mismatched
> memory attributes" in the ARMv8 ARM ARM (DDI0487A.d), which lays
> out in some detail what the results of mismatched attributes are
> (generally, you lose ordering or coherency guarantees you might
> have hoped to have). They're not pretty, but it's not as bad
> as completely UNPREDICTABLE behaviour.

Quick side note that I did raise exactly this issue with the ARM
Architecture team several years ago (that of mismatched memory
attributes between a guest and hypervisor) and it is a known concern.
I'm personally concerned about a couple of things that I won't go into
here, but I will follow up on what the longer-term plan might be.

Jon.


Peter Maydell Nov. 20, 2014, 8:10 p.m. UTC | #7
On 20 November 2014 19:49, Jon Masters <jcm@redhat.com> wrote:
> On 11/20/2014 01:40 PM, Peter Maydell wrote:
>> The situation is not so bleak as this. See section B2.9 "Mismatched
>> memory attributes" in the ARMv8 ARM ARM (DDI0487A.d), which lays
>> out in some detail what the results of mismatched attributes are
>> (generally, you lose ordering or coherency guarantees you might
>> have hoped to have). They're not pretty, but it's not as bad
>> as completely UNPREDICTABLE behaviour.
>
> Quick side note that I did raise exactly this issue with the ARM
> Architecture team several years ago (that of mismatched memory
> attributes between a guest and hypervisor) and it is a known concern.

I think in practice for a well-behaved guest we can arrange
that everything is fine (roughly, the guest has to treat
DMA-capable devices as doing coherent-dma, which we can tell
them to do via DT bindings or ACPI[*], plus the special
case handling we already have for bootup), and naughty guests
will only confuse themselves. But I need to think a bit more
about it (and we should probably write down how it works
somewhere :-)).

[*] We should be able to emulate non-coherent-DMA devices but
would need an extra API from KVM so userspace can do "clean
dcache to point of coherency". And in practice I'm not sure
we need to emulate those devices...
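
(Purely hypothetical sketch of what such an API might look like -- no such
ioctl exists today, all names below are invented:

    struct kvm_clean_dcache_range {
            __u64 guest_phys_addr;          /* hypothetical */
            __u64 size;
    };
    /* ioctl(vm_fd, KVM_CLEAN_DCACHE_TO_POC, &range); -- hypothetical */

so userspace could clean its writes to the point of coherency before the
guest's emulated non-coherent DMA "reads" them.)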

-- PMM
Laszlo Ersek Nov. 20, 2014, 9:13 p.m. UTC | #8
On 11/20/14 21:10, Peter Maydell wrote:
> On 20 November 2014 19:49, Jon Masters <jcm@redhat.com> wrote:
>> On 11/20/2014 01:40 PM, Peter Maydell wrote:
>>> The situation is not so bleak as this. See section B2.9 "Mismatched
>>> memory attributes" in the ARMv8 ARM ARM (DDI0487A.d), which lays
>>> out in some detail what the results of mismatched attributes are
>>> (generally, you lose ordering or coherency guarantees you might
>>> have hoped to have). They're not pretty, but it's not as bad
>>> as completely UNPREDICTABLE behaviour.
>>
>> Quick side note that I did raise exactly this issue with the ARM
>> Architecture team several years ago (that of mismatched memory
>> attributes between a guest and hypervisor) and it is a known concern.
> 
> I think in practice for a well-behaved guest we can arrange
> that everything is fine (roughly, the guest has to treat
> DMA-capable devices as doing coherent-dma, which we can tell
> them to do via DT bindings or ACPI[*], plus the special
> case handling we already have for bootup), and naughty guests
> will only confuse themselves. But I need to think a bit more
> about it (and we should probably write down how it works
> somewhere :-)).
> 
> [*] We should be able to emulate non-coherent-DMA devices but
> would need an extra API from KVM so userspace can do "clean
> dcache to point of coherency". And in practice I'm not sure
> we need to emulate those devices...

This basically means that virtio transfers should just use normal memory
in the guest (treating virtio transfers as "coherent DMA"), right?

We're actually doing that in the ArmVirtualizationQemu.dsc build of
edk2, and Things Work Great (TM) in guests on the Mustang.

Thanks
Laszlo
Peter Maydell Nov. 20, 2014, 9:59 p.m. UTC | #9
On 20 November 2014 21:13, Laszlo Ersek <lersek@redhat.com> wrote:
> On 11/20/14 21:10, Peter Maydell wrote:
>> I think in practice for a well-behaved guest we can arrange
>> that everything is fine (roughly, the guest has to treat
>> DMA-capable devices as doing coherent-dma, which we can tell
>> them to do via DT bindings or ACPI[*], plus the special
>> case handling we already have for bootup), and naughty guests
>> will only confuse themselves. But I need to think a bit more
>> about it (and we should probably write down how it works
>> somewhere :-)).

> This basically means that virtio transfers should just use normal memory
> in the guest (treating virtio transfers as "coherent DMA"), right?

Normal *cacheable*, but yes.

-- PMM
Christoffer Dall Nov. 21, 2014, 11:19 a.m. UTC | #10
Hi Mario,

On Wed, Nov 19, 2014 at 03:32:31PM -0800, Mario Smarduch wrote:
> Hi Laszlo,
> 
> couple observations.
> 
>      I'm wondering if access from qemu and guest won't
> result in mixed memory attributes and if that's acceptable
> to the CPU.
> 
> Also, if you update memory from qemu you may break
> dirty page logging/migration. Unless there is some other way
> you keep track. Of course it may not be applicable in your
> case (i.e. flash unused after boot).
> 
I'm not concerned about this particular case; dirty page logging exists
so KVM can inform userspace when a page may have been dirtied.  If
userspace directly dirties (is that a verb?) a page, then it already
knows that it needs to migrate that page and deal with it accordingly.

Or did I miss some more subtle point here?

-Christoffer
Mario Smarduch Nov. 22, 2014, 1:50 a.m. UTC | #11
On 11/21/2014 03:19 AM, Christoffer Dall wrote:
> Hi Mario,
> 
> On Wed, Nov 19, 2014 at 03:32:31PM -0800, Mario Smarduch wrote:
>> Hi Laszlo,
>>
>> couple observations.
>>
>>      I'm wondering if access from qemu and guest won't
>> result in mixed memory attributes and if that's acceptable
>> to the CPU.
>>
>> Also, if you update memory from qemu you may break
>> dirty page logging/migration. Unless there is some other way
>> you keep track. Of course it may not be applicable in your
>> case (i.e. flash unused after boot).
>>
> I'm not concerned about this particular case; dirty page logging exists
> so KVM can inform userspace when a page may have been dirtied.  If
> userspace directly dirties (is that a verb?) a page, 
I would think so, I rely on software too much :)
> then it already knows that it needs to migrate that page and 
> deal with it accordingly.
> 
> Or did I miss some more subtle point here?

QEMU has a global migration bitmap for all regions, initially set
dirty, and it's updated over iterations with KVM's dirty bitmap. Once
dirty pages are migrated, bits are cleared. If QEMU updates a
memory region directly I can't see how it's reflected in that migration
bitmap that determines what pages should be migrated as it makes
its passes. On x86, if the host updates guest memory it marks that
page dirty.

But virtio writes to guest memory directly and that appears to
work just fine. I read that code sometime back, and will need to revisit.

- Mario

> 
> -Christoffer
> 

Christoffer Dall Nov. 22, 2014, 10:18 a.m. UTC | #12
On Fri, Nov 21, 2014 at 05:50:43PM -0800, Mario Smarduch wrote:
> On 11/21/2014 03:19 AM, Christoffer Dall wrote:
> > Hi Mario,
> > 
> > On Wed, Nov 19, 2014 at 03:32:31PM -0800, Mario Smarduch wrote:
> >> Hi Laszlo,
> >>
> >> couple observations.
> >>
> >>      I'm wondering if access from qemu and guest won't
> >> result in mixed memory attributes and if that's acceptable
> >> to the CPU.
> >>
> >> Also, if you update memory from qemu you may break
> >> dirty page logging/migration. Unless there is some other way
> >> you keep track. Of course it may not be applicable in your
> >> case (i.e. flash unused after boot).
> >>
> > I'm not concerned about this particular case; dirty page logging exists
> > so KVM can inform userspace when a page may have been dirtied.  If
> > userspace directly dirties (is that a verb?) a page, 
> I would think so, I rely on software too much :)
> > then it already knows that it needs to migrate that page and 
> > deal with it accordingly.
> > 
> > Or did I miss some more subtle point here?
> 
> QEMU has a global migration bitmap for all regions, initially set
> dirty, and it's updated over iterations with KVM's dirty bitmap. Once
> dirty pages are migrated, bits are cleared. If QEMU updates a
> memory region directly I can't see how it's reflected in that migration
> bitmap that determines what pages should be migrated as it makes
> its passes. On x86, if the host updates guest memory it marks that
> page dirty.
> 
> But virtio writes to guest memory directly and that appears to
> work just fine. I read that code sometime back, and will need to revisit.
> 
In any case, that's a QEMU implementation issue and nothing the kernel
needs to be concerned with.

-Christoffer
Laszlo Ersek Nov. 22, 2014, 10:26 a.m. UTC | #13
On 11/22/14 11:18, Christoffer Dall wrote:
> On Fri, Nov 21, 2014 at 05:50:43PM -0800, Mario Smarduch wrote:

>> But virtio writes to guest memory directly and that appears to
>> work just fine. I read that code sometime back, and will need to revisit.
>>
> In any case, that's a QEMU implementation issue and nothing the kernel
> needs to be concerned with.

(Let's call it a "qemu implementation detail" -- there's no "issue". The
reason virtio works is not accidental; it has a known cause. As
confirmed by Peter, there's no problem with the virtio buffers because
the guest doesn't mark them as uncacheable, so *all* accesses go through
the dcache.)

Thanks
Laszlo
Peter Maydell Nov. 22, 2014, 12:27 p.m. UTC | #14
On 22 November 2014 at 01:50, Mario Smarduch <m.smarduch@samsung.com> wrote:
> QEMU has a global migration bitmap for all regions, initially set
> dirty, and it's updated over iterations with KVM's dirty bitmap. Once
> dirty pages are migrated, bits are cleared. If QEMU updates a
> memory region directly I can't see how it's reflected in that migration
> bitmap that determines what pages should be migrated as it makes
> its passes. On x86, if the host updates guest memory it marks that
> page dirty.
>
> But virtio writes to guest memory directly and that appears to
> work just fine. I read that code sometime back, and will need to revisit.

All devices in QEMU that write to guest memory will do it via
a function in exec.c (possibly through wrapper functions) which
eventually will call invalidate_and_set_dirty(), which is what
is responsible for updating our dirty bitmaps. In the specific
case of virtio, the virtio device ends up calling virtqueue_fill(),
which does a cpu_physical_memory_unmap(), which just calls
address_space_unmap(), which will either directly call
invalidate_and_set_dirty() or, if using a bounce buffer, will
copy the bounce buffer to guest RAM with address_space_write(),
which calls address_space_rw(), which will do the
invalidate_and_set_dirty().
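
Condensed, that path is (function names as in qemu's exec.c and virtio
code; this is just a summary of the above, not new information):

    virtqueue_fill()
      -> cpu_physical_memory_unmap()
         -> address_space_unmap()
            -> invalidate_and_set_dirty()   /* updates the dirty bitmaps */

with the bounce-buffer case detouring through address_space_write() and
address_space_rw() before reaching invalidate_and_set_dirty().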

There's no cache incoherency issue for migration, because the
migration code is in the QEMU process and so it will read the
most recent thing QEMU wrote whether that data is still in the
dcache or has migrated out to real (host) RAM.

-- PMM

Patch

From a2b4da9b03f03ccdb8b0988a5cc64d1967f00398 Mon Sep 17 00:00:00 2001
From: Laszlo Ersek <lersek@redhat.com>
Date: Sun, 16 Nov 2014 01:43:11 +0100
Subject: [PATCH] arm, arm64: KVM: clean cache on page fault also when IPA is
 uncached (WIP)

This patch builds on Marc Zyngier's commit
2d58b733c87689d3d5144e4ac94ea861cc729145.

(1) The guest bypasses the cache *not only* when the VCPU's dcache is
    disabled (see bit 0 and bit 2 in SCTLR_EL1, "MMU enable" and "Cache
    enable", respectively -- vcpu_has_cache_enabled()).

    The guest bypasses the cache *also* when the Stage 1 memory attributes
    say "device memory" about the Intermediate Page Address in question,
    independently of the Stage 2 memory attributes. Refer to:

      Table D5-38
      Combining the stage 1 and stage 2 memory type assignments

    in the ARM ARM. (This is likely similar to MTRRs on x86.)

(2) In edk2 (EFI Development Kit II), the ARM NOR flash driver,

      ArmPlatformPkg/Drivers/NorFlashDxe/NorFlashFvbDxe.c

    uses the AddMemorySpace() and SetMemorySpaceAttributes() Global
    Coherency Domain Services of DXE (Driver eXecution Environment) to
    *justifiedly* set the attributes of the guest memory covering the
    flash chip to EFI_MEMORY_UC ("uncached").

    According to the AArch64 bindings for UEFI (see "2.3.6.1 Memory types"
    in the UEFI-2.4A specification), EFI_MEMORY_UC is mapped to:

                       ARM Memory Type:
                       MAIR attribute encoding    ARM Memory Type:
    EFI Memory Type    Attr<n> [7:4] [3:0]        Meaning
    ---------------    -----------------------    ------------------------
    EFI_MEMORY_UC      0000 0000                  Device-nGnRnE (Device
    (Not cacheable)                               non-Gathering,
                                                  non-Reordering, no Early
                                                  Write Acknowledgement)

    This is correctly implemented in edk2, in the ArmConfigureMmu()
    function, via the ArmSetMAIR() call and the MAIR_ATTR() macro:

    The TT_ATTR_INDX_DEVICE_MEMORY (== 0) memory attribute index, which is
    used for EFI_MEMORY_UC memory, is associated with the
    MAIR_ATTR_DEVICE_MEMORY (== 0x00, see above) memory attribute value,
    in the MAIR_ELx register.

As a consequence of (1) and (2), when edk2 code running in the guest
accesses an IPA falling in the flash range, it will completely bypass
the cache.

Therefore, when such a page is faulted in in user_mem_abort(), we must
flush the data cache; otherwise the guest will see stale data in the flash
chip.

This patch is not complete because I have no clue how to calculate the
memory attribute for "fault_ipa" in user_mem_abort(). Right now I set
"fault_ipa_uncached" to constant true, which might incur some performance
penalty for data faults, but it certainly improves correctness -- the
ArmVirtualizationPkg platform build of edk2 actually boots as a KVM guest
on APM Mustang.

Signed-off-by: Laszlo Ersek <lersek@redhat.com>
---
 arch/arm/include/asm/kvm_mmu.h   |  5 +++--
 arch/arm64/include/asm/kvm_mmu.h |  5 +++--
 arch/arm/kvm/mmu.c               | 10 ++++++++--
 3 files changed, 14 insertions(+), 6 deletions(-)

diff --git a/arch/arm/include/asm/kvm_mmu.h b/arch/arm/include/asm/kvm_mmu.h
index acb0d57..f867060 100644
--- a/arch/arm/include/asm/kvm_mmu.h
+++ b/arch/arm/include/asm/kvm_mmu.h
@@ -161,9 +161,10 @@  static inline bool vcpu_has_cache_enabled(struct kvm_vcpu *vcpu)
 }
 
 static inline void coherent_cache_guest_page(struct kvm_vcpu *vcpu, hva_t hva,
-					     unsigned long size)
+					     unsigned long size,
+					     bool ipa_uncached)
 {
-	if (!vcpu_has_cache_enabled(vcpu))
+	if (!vcpu_has_cache_enabled(vcpu) || ipa_uncached)
 		kvm_flush_dcache_to_poc((void *)hva, size);
 	
 	/*
diff --git a/arch/arm64/include/asm/kvm_mmu.h b/arch/arm64/include/asm/kvm_mmu.h
index 0caf7a5..123b521 100644
--- a/arch/arm64/include/asm/kvm_mmu.h
+++ b/arch/arm64/include/asm/kvm_mmu.h
@@ -243,9 +243,10 @@  static inline bool vcpu_has_cache_enabled(struct kvm_vcpu *vcpu)
 }
 
 static inline void coherent_cache_guest_page(struct kvm_vcpu *vcpu, hva_t hva,
-					     unsigned long size)
+					     unsigned long size,
+					     bool ipa_uncached)
 {
-	if (!vcpu_has_cache_enabled(vcpu))
+	if (!vcpu_has_cache_enabled(vcpu) || ipa_uncached)
 		kvm_flush_dcache_to_poc((void *)hva, size);
 
 	if (!icache_is_aliasing()) {		/* PIPT */
diff --git a/arch/arm/kvm/mmu.c b/arch/arm/kvm/mmu.c
index 57a403a..e2323b7 100644
--- a/arch/arm/kvm/mmu.c
+++ b/arch/arm/kvm/mmu.c
@@ -847,6 +847,7 @@  static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
 	struct vm_area_struct *vma;
 	pfn_t pfn;
 	pgprot_t mem_type = PAGE_S2;
+	bool fault_ipa_uncached;
 
 	write_fault = kvm_is_write_fault(vcpu);
 	if (fault_status == FSC_PERM && !write_fault) {
@@ -913,6 +914,9 @@  static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
 	if (!hugetlb && !force_pte)
 		hugetlb = transparent_hugepage_adjust(&pfn, &fault_ipa);
 
+	/* TBD */
+	fault_ipa_uncached = true;
+
 	if (hugetlb) {
 		pmd_t new_pmd = pfn_pmd(pfn, mem_type);
 		new_pmd = pmd_mkhuge(new_pmd);
@@ -920,7 +924,8 @@  static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
 			kvm_set_s2pmd_writable(&new_pmd);
 			kvm_set_pfn_dirty(pfn);
 		}
-		coherent_cache_guest_page(vcpu, hva & PMD_MASK, PMD_SIZE);
+		coherent_cache_guest_page(vcpu, hva & PMD_MASK, PMD_SIZE,
+					  fault_ipa_uncached);
 		ret = stage2_set_pmd_huge(kvm, memcache, fault_ipa, &new_pmd);
 	} else {
 		pte_t new_pte = pfn_pte(pfn, mem_type);
@@ -928,7 +933,8 @@  static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
 			kvm_set_s2pte_writable(&new_pte);
 			kvm_set_pfn_dirty(pfn);
 		}
-		coherent_cache_guest_page(vcpu, hva, PAGE_SIZE);
+		coherent_cache_guest_page(vcpu, hva, PAGE_SIZE,
+					  fault_ipa_uncached);
 		ret = stage2_set_pte(kvm, memcache, fault_ipa, &new_pte,
 			pgprot_val(mem_type) == pgprot_val(PAGE_S2_DEVICE));
 	}
-- 
1.8.3.1