Message ID: 546A1905.6080607@redhat.com (mailing list archive)
State: New, archived
Hi Laszlo,

A couple of observations.

I'm wondering whether access from QEMU and the guest won't result in
mixed memory attributes, and whether that's acceptable to the CPU.

Also, if you update memory from QEMU you may break dirty page
logging/migration, unless there is some other way you keep track. Of
course it may not be applicable in your case (i.e. flash unused after
boot).

- Mario

On 11/17/2014 07:49 AM, Laszlo Ersek wrote:
> On 11/17/14 16:29, Paolo Bonzini wrote:
>> On 17/11/2014 15:58, Ard Biesheuvel wrote:
>>> Readonly memslots are often used to implement emulation of ROMs and
>>> NOR flashes, in which case the guest may legally map these regions as
>>> uncached.
>>> To deal with the incoherency associated with uncached guest mappings,
>>> treat all readonly memslots as incoherent, and ensure that pages that
>>> belong to regions tagged as such are flushed to DRAM before being
>>> passed to the guest.
>>
>> On x86, the processor combines the cacheability values from the two
>> levels of page tables. Is there no way to do the same on ARM?
>
> Combining occurs on ARMv8 too. The Stage 1 (guest) mapping is very
> strict (Device non-Gathering, non-Reordering, no Early Write
> Acknowledgement -- for EFI_MEMORY_UC), which basically "overrides" the
> Stage 2 (very lax host) memory attributes.
>
> When qemu writes, as part of emulating the flash programming commands,
> to the RAMBlock that *otherwise* backs the flash range (as a r/o
> memslot), those writes (from host userspace) tend to end up in dcache.
>
> But, when the guest flips back the flash to romd mode, and tries to
> read back the values from the flash as plain ROM, the dcache is
> completely bypassed due to the strict stage1 mapping, and the guest
> goes directly to DRAM.
>
> Where qemu's earlier writes are not yet / necessarily visible.
>
> Please see my original patch (which was incomplete) in the attachment,
> it has a very verbose commit message.
>
> Anyway, I'll let others explain; they can word it better than I can :)
>
> FWIW,
>
> Series
> Reviewed-by: Laszlo Ersek <lersek@redhat.com>
>
> I ported this series to a 3.17.0+ based kernel, and tested it. It
> works fine. The ROM-like view of the NOR flash now reflects the
> previously programmed contents.
>
> Series
> Tested-by: Laszlo Ersek <lersek@redhat.com>
>
> Thanks!
> Laszlo
>
> _______________________________________________
> kvmarm mailing list
> kvmarm@lists.cs.columbia.edu
> https://lists.cs.columbia.edu/mailman/listinfo/kvmarm

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
On 11/20/14 00:32, Mario Smarduch wrote:
> Hi Laszlo,
>
> couple observations.
>
> I'm wondering if access from qemu and guest won't
> result in mixed memory attributes and if that's acceptable
> to the CPU.

Normally this would be a problem, I think (Jon raised the topic of live
migration). However, for flash programming specifically, I think the
guest's access pattern ensures that we'll see things OK.

When the guest issues the first write access, the memslot is deleted,
and everything is forwarded to QEMU, both reads and writes. In response,
QEMU modifies the array that *otherwise* backs the flash. These
modifications by QEMU end up mostly in the dcache. When the guest is
done "programming", it writes a special command (read array mode), at
which point the memslot is recreated (as read-only) and flushed / set
up for flushing during demand paging.

So, from the emulated flash's POV, the memslot either doesn't exist at
all (and then QEMU serves all accesses just fine), or it exists r/o, at
which point QEMU (host userspace) will have stopped writing to it, and
will have set it up for flushing before and during guest read accesses.

> Also is if you update memory from qemu you may break
> dirty page logging/migration.

Very probably. Jon said the same thing.

> Unless there is some other way
> you keep track. Of course it may not be applicable in your
> case (i.e. flash unused after boot).

The flash *is* used after boot, because the UEFI runtime variable
services *are* exercised by the guest kernel. However, those use the
same access pattern (it's the same set of UEFI services, just called by
a different "client").

*Uncoordinated* access from guest and host in parallel will be a big
problem; but we're not that far yet, and we need to get the flash
problem sorted, so that we can at least boot and work on the basic
stuff. The flash programming dance happens to provide coordination; the
flash mode changes (which are equivalent to the teardown and the
recreation of the memslot) can be considered barriers.

I hope this is acceptable for the time being...

Thanks
Laszlo
On 11/20/2014 12:08 AM, Laszlo Ersek wrote:
> On 11/20/14 00:32, Mario Smarduch wrote:
>> I'm wondering if access from qemu and guest won't
>> result in mixed memory attributes and if that's acceptable
>> to the CPU.
>
> Normally this would be a problem I think (Jon raised the topic of live
> migration). However, for flash programming specifically, I think the
> guest's access pattern ensures that we'll see things OK.
>
> When the guest issues the first write access, the memslot is deleted,
> and everything is forwarded to qemu, both reads and writes. In response
> qemu modifies the array that *otherwise* backs the flash. These
> modifications by qemu end up in the dcache mostly. When the guest is
> done "programming", it writes a special command (read array mode) at
> which point the memslot is recreated (as read-only) and flushed / set up
> for flushing during demand paging.
>
> So from the emulated flash POV, the memslot either doesn't exist at all
> (and then qemu serves all accesses just fine), or it exists r/o, at
> which point qemu (host userspace) will have stopped writing to it, and
> will have set it up for flushing before and during guest read accesses.

I think that, beyond consistency, there should be no double mappings
with conflicting attributes at any time, or CPU state is undefined. At
least that's what I recall for cases where the identity mapping was
cacheable and user-mmap'ed regions were uncacheable; side effects were
things like a CPU hard stop or a victim invalidate of a dirty cache
line. With virtualization extensions maybe the behavior is different. I
guess if you're not seeing lockups or crashes, then it appears to work
:) Probably more senior folks in the ARM community are in a better
position to address this, but I thought I'd raise a flag.

>> Also is if you update memory from qemu you may break
>> dirty page logging/migration.
>
> Very probably. Jon said the same thing.
>
>> Unless there is some other way
>> you keep track. Of course it may not be applicable in your
>> case (i.e. flash unused after boot).
>
> The flash *is* used after boot, because the UEFI runtime variable
> services *are* exercised by the guest kernel. However those use the same
> access pattern (it's the same set of UEFI services just called by a
> different "client").
>
> *Uncoordinated* access from guest and host in parallel will be a big
> problem; but we're not that far yet, and we need to get the flash
> problem sorted, so that we can at least boot and work on the basic
> stuff. The flash programming dance happens to provide coordination; the
> flash mode changes (which are equivalent to the teardown and the
> recreation of the memslot) can be considered barriers.
>
> I hope this is acceptable for the time being...

Yeah, I understand you have a more immediate requirement to support;
the migration issue is more FYI. Thanks for the details, it helps to
understand the context.

- Mario
On 20 November 2014 18:35, Mario Smarduch <m.smarduch@samsung.com> wrote:
> I think beyond consistency, there should be no double mappings with
> conflicting attributes at any time or CPU state is undefined.

The situation is not so bleak as this. See section B2.9 "Mismatched
memory attributes" in the ARMv8 ARM ARM (DDI0487A.d), which lays out in
some detail what the results of mismatched attributes are (generally,
you lose ordering or coherency guarantees you might have hoped to
have). They're not pretty, but it's not as bad as completely
UNPREDICTABLE behaviour.

thanks
-- PMM
On 11/20/2014 10:40 AM, Peter Maydell wrote:
> The situation is not so bleak as this. See section B2.9 "Mismatched
> memory attributes" in the ARMv8 ARM ARM (DDI0487A.d), which lays
> out in some detail what the results of mismatched attributes are
> (generally, you lose ordering or coherency guarantees you might
> have hoped to have). They're not pretty, but it's not as bad
> as completely UNPREDICTABLE behaviour.

Hi Peter,

Thanks for digging that up; it's quite a list to navigate, but it does
provide detailed guidance.

- Mario
On 11/20/2014 01:40 PM, Peter Maydell wrote:
> On 20 November 2014 18:35, Mario Smarduch <m.smarduch@samsung.com> wrote:
>> I think beyond consistency, there should be no double mappings with
>> conflicting attributes at any time or CPU state is undefined.
>
> The situation is not so bleak as this. See section B2.9 "Mismatched
> memory attributes" in the ARMv8 ARM ARM (DDI0487A.d), which lays
> out in some detail what the results of mismatched attributes are
> (generally, you lose ordering or coherency guarantees you might
> have hoped to have). They're not pretty, but it's not as bad
> as completely UNPREDICTABLE behaviour.

Quick side note that I did raise exactly this issue with the ARM
Architecture team several years ago (that of mismatched memory
attributes between a guest and hypervisor) and it is a known concern.
I'm personally concerned about a couple of things that I won't go into
here, but I will follow up on what the longer-term plan might be.

Jon.
On 20 November 2014 19:49, Jon Masters <jcm@redhat.com> wrote:
> Quick side note that I did raise exactly this issue with the ARM
> Architecture team several years ago (that of mismatched memory
> attributes between a guest and hypervisor) and it is a known concern.

I think in practice for a well-behaved guest we can arrange that
everything is fine (roughly, the guest has to treat DMA-capable
devices as doing coherent DMA, which we can tell them to do via DT
bindings or ACPI[*], plus the special case handling we already have
for bootup), and naughty guests will only confuse themselves. But I
need to think a bit more about it (and we should probably write down
how it works somewhere :-)).

[*] We should be able to emulate non-coherent-DMA devices, but we
would need an extra API from KVM so userspace can do "clean dcache to
point of coherency". And in practice I'm not sure we need to emulate
those devices...

-- PMM
On 11/20/14 21:10, Peter Maydell wrote:
> I think in practice for a well-behaved guest we can arrange
> that everything is fine (roughly, the guest has to treat
> DMA-capable devices as doing coherent-dma, which we can tell
> them to do via DT bindings or ACPI[*], plus the special
> case handling we already have for bootup), and naughty guests
> will only confuse themselves. But I need to think a bit more
> about it (and we should probably write down how it works
> somewhere :-)).
>
> [*] We should be able to emulate non-coherent-DMA devices but
> would need an extra API from KVM so userspace can do "clean
> dcache to point of coherency". And in practice I'm not sure
> we need to emulate those devices...

This basically means that virtio transfers should just use normal
memory in the guest (treating virtio transfers as "coherent DMA"),
right?

We're actually doing that in the ArmVirtualizationQemu.dsc build of
edk2, and Things Work Great (TM) in guests on the Mustang.

Thanks
Laszlo
On 20 November 2014 21:13, Laszlo Ersek <lersek@redhat.com> wrote:
> This basically means that virtio transfers should just use normal memory
> in the guest (treating virtio transfers as "coherent DMA"), right?

Normal *cacheable*, but yes.

-- PMM
Hi Mario,

On Wed, Nov 19, 2014 at 03:32:31PM -0800, Mario Smarduch wrote:
> Also is if you update memory from qemu you may break
> dirty page logging/migration. Unless there is some other way
> you keep track. Of course it may not be applicable in your
> case (i.e. flash unused after boot).

I'm not concerned about this particular case; dirty page logging exists
so KVM can inform userspace when a page may have been dirtied. If
userspace directly dirties (is that a verb?) a page, then it already
knows that it needs to migrate that page and deal with it accordingly.

Or did I miss some more subtle point here?

-Christoffer
On 11/21/2014 03:19 AM, Christoffer Dall wrote:
> I'm not concerned about this particular case; dirty page logging exists
> so KVM can inform userspace when a page may have been dirtied. If
> userspace directly dirties (is that a verb?) a page,

I would think so, I rely on software too much :)

> then it already knows that it needs to migrate that page and
> deal with it accordingly.
>
> Or did I miss some more subtle point here?

QEMU has a global migration bitmap for all regions, initially set all
dirty, and it's updated over the iterations with KVM's dirty bitmap.
Once dirty pages are migrated, the bits are cleared. If QEMU updates a
memory region directly, I can't see how that's reflected in the
migration bitmap that determines which pages should be migrated as it
makes its passes. On x86, if the host updates guest memory, it marks
that page dirty.

But virtio writes to guest memory directly, and that appears to work
just fine. I read that code some time back, and will need to revisit it.

- Mario
On Fri, Nov 21, 2014 at 05:50:43PM -0800, Mario Smarduch wrote:
> QEMU has a global migration bitmap for all regions initially set
> dirty, and it's updated over iterations with KVM's dirty bitmap. Once
> dirty pages are migrated bits are cleared. If QEMU updates a
> memory region directly I can't see how it's reflected in that migration
> bitmap that determines what pages should be migrated as it makes
> it's passes. On x86 if host updates guest memory it marks that
> page dirty.
>
> But virtio writes to guest memory directly and that appears to
> work just fine. I read that code sometime back, and will need to revisit.

In any case, that's a QEMU implementation issue and nothing the kernel
needs to be concerned with.

-Christoffer
On 11/22/14 11:18, Christoffer Dall wrote:
> On Fri, Nov 21, 2014 at 05:50:43PM -0800, Mario Smarduch wrote:
>> But virtio writes to guest memory directly and that appears to
>> work just fine. I read that code sometime back, and will need to revisit.
>
> In any case, that's a QEMU implementation issue and nothing the kernel
> needs to be concerned with.

(Let's call it a "QEMU implementation detail" -- there's no "issue".
The reason virtio works is not accidental; it has a known cause. As
confirmed by Peter, there's no problem with the virtio buffers because
the guest doesn't mark them as uncacheable, so *all* accesses go
through the dcache.)

Thanks
Laszlo
On 22 November 2014 at 01:50, Mario Smarduch <m.smarduch@samsung.com> wrote:
> QEMU has a global migration bitmap for all regions initially set
> dirty, and it's updated over iterations with KVM's dirty bitmap. Once
> dirty pages are migrated bits are cleared. If QEMU updates a
> memory region directly I can't see how it's reflected in that migration
> bitmap that determines what pages should be migrated as it makes
> it's passes. On x86 if host updates guest memory it marks that
> page dirty.
>
> But virtio writes to guest memory directly and that appears to
> work just fine. I read that code sometime back, and will need to revisit.

All devices in QEMU that write to guest memory will do it via a
function in exec.c (possibly through wrapper functions) which
eventually will call invalidate_and_set_dirty(), which is what is
responsible for updating our dirty bitmaps.

In the specific case of virtio, the virtio device ends up calling
virtqueue_fill(), which does a cpu_physical_memory_unmap(), which just
calls address_space_unmap(), which will either directly call
invalidate_and_set_dirty() or, if using a bounce buffer, will copy the
bounce buffer to guest RAM with address_space_write(), which calls
address_space_rw(), which will do the invalidate_and_set_dirty().

There's no cache incoherency issue for migration, because the
migration code is in the QEMU process, and so it will read the most
recent thing QEMU wrote, whether that data is still in the dcache or
has migrated out to real (host) RAM.

-- PMM
From a2b4da9b03f03ccdb8b0988a5cc64d1967f00398 Mon Sep 17 00:00:00 2001
From: Laszlo Ersek <lersek@redhat.com>
Date: Sun, 16 Nov 2014 01:43:11 +0100
Subject: [PATCH] arm, arm64: KVM: clean cache on page fault also when IPA is
 uncached (WIP)

This patch builds on Marc Zyngier's commit
2d58b733c87689d3d5144e4ac94ea861cc729145.

(1) The guest bypasses the cache *not only* when the VCPU's dcache is
    disabled (see bit 0 and bit 2 in SCTLR_EL1, "MMU enable" and "Cache
    enable", respectively -- vcpu_has_cache_enabled()).

    The guest bypasses the cache *also* when the Stage 1 memory
    attributes say "device memory" about the Intermediate Page Address
    in question, independently of the Stage 2 memory attributes. Refer
    to: Table D5-38 "Combining the stage 1 and stage 2 memory type
    assignments" in the ARM ARM. (This is likely similar to MTRRs on
    x86.)

(2) In edk2 (EFI Development Kit II), the ARM NOR flash driver,

      ArmPlatformPkg/Drivers/NorFlashDxe/NorFlashFvbDxe.c

    uses the AddMemorySpace() and SetMemorySpaceAttributes() Global
    Coherency Domain Services of DXE (Driver eXecution Environment) to
    *justifiedly* set the attributes of the guest memory covering the
    flash chip to EFI_MEMORY_UC ("uncached").

    According to the AArch64 bindings for UEFI (see "2.3.6.1 Memory
    types" in the UEFI-2.4A specification), EFI_MEMORY_UC is mapped to:

                       ARM Memory Type:
                       MAIR attribute encoding  ARM Memory Type:
    EFI Memory Type    Attr<n> [7:4] [3:0]      Meaning
    ---------------    -----------------------  ------------------------
    EFI_MEMORY_UC      0000 0000                Device-nGnRnE (Device
    (Not cacheable)                             non-Gathering,
                                                non-Reordering, no Early
                                                Write Acknowledgement)

    This is correctly implemented in edk2, in the ArmConfigureMmu()
    function, via the ArmSetMAIR() call and the MAIR_ATTR() macro: the
    TT_ATTR_INDX_DEVICE_MEMORY (== 0) memory attribute index, which is
    used for EFI_MEMORY_UC memory, is associated with the
    MAIR_ATTR_DEVICE_MEMORY (== 0x00, see above) memory attribute
    value, in the MAIR_ELx register.

As a consequence of (1) and (2), when edk2 code running in the guest
accesses an IPA falling in the flash range, it will completely bypass
the cache. Therefore, when such a page is faulted in in
user_mem_abort(), we must flush the data cache; otherwise the guest
will see stale data in the flash chip.

This patch is not complete because I have no clue how to calculate the
memory attribute for "fault_ipa" in user_mem_abort(). Right now I set
"fault_ipa_uncached" to constant true, which might incur some
performance penalty for data faults, but it certainly improves
correctness -- the ArmVirtualizationPkg platform build of edk2 actually
boots as a KVM guest on APM Mustang.

Signed-off-by: Laszlo Ersek <lersek@redhat.com>
---
 arch/arm/include/asm/kvm_mmu.h   |  5 +++--
 arch/arm64/include/asm/kvm_mmu.h |  5 +++--
 arch/arm/kvm/mmu.c               | 10 ++++++++--
 3 files changed, 14 insertions(+), 6 deletions(-)

diff --git a/arch/arm/include/asm/kvm_mmu.h b/arch/arm/include/asm/kvm_mmu.h
index acb0d57..f867060 100644
--- a/arch/arm/include/asm/kvm_mmu.h
+++ b/arch/arm/include/asm/kvm_mmu.h
@@ -161,9 +161,10 @@ static inline bool vcpu_has_cache_enabled(struct kvm_vcpu *vcpu)
 }
 
 static inline void coherent_cache_guest_page(struct kvm_vcpu *vcpu, hva_t hva,
-					     unsigned long size)
+					     unsigned long size,
+					     bool ipa_uncached)
 {
-	if (!vcpu_has_cache_enabled(vcpu))
+	if (!vcpu_has_cache_enabled(vcpu) || ipa_uncached)
 		kvm_flush_dcache_to_poc((void *)hva, size);
 
 	/*
diff --git a/arch/arm64/include/asm/kvm_mmu.h b/arch/arm64/include/asm/kvm_mmu.h
index 0caf7a5..123b521 100644
--- a/arch/arm64/include/asm/kvm_mmu.h
+++ b/arch/arm64/include/asm/kvm_mmu.h
@@ -243,9 +243,10 @@ static inline bool vcpu_has_cache_enabled(struct kvm_vcpu *vcpu)
 }
 
 static inline void coherent_cache_guest_page(struct kvm_vcpu *vcpu, hva_t hva,
-					     unsigned long size)
+					     unsigned long size,
+					     bool ipa_uncached)
 {
-	if (!vcpu_has_cache_enabled(vcpu))
+	if (!vcpu_has_cache_enabled(vcpu) || ipa_uncached)
 		kvm_flush_dcache_to_poc((void *)hva, size);
 
 	if (!icache_is_aliasing()) {	/* PIPT */
diff --git a/arch/arm/kvm/mmu.c b/arch/arm/kvm/mmu.c
index 57a403a..e2323b7 100644
--- a/arch/arm/kvm/mmu.c
+++ b/arch/arm/kvm/mmu.c
@@ -847,6 +847,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
 	struct vm_area_struct *vma;
 	pfn_t pfn;
 	pgprot_t mem_type = PAGE_S2;
+	bool fault_ipa_uncached;
 
 	write_fault = kvm_is_write_fault(vcpu);
 	if (fault_status == FSC_PERM && !write_fault) {
@@ -913,6 +914,9 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
 	if (!hugetlb && !force_pte)
 		hugetlb = transparent_hugepage_adjust(&pfn, &fault_ipa);
 
+	/* TBD */
+	fault_ipa_uncached = true;
+
 	if (hugetlb) {
 		pmd_t new_pmd = pfn_pmd(pfn, mem_type);
 		new_pmd = pmd_mkhuge(new_pmd);
@@ -920,7 +924,8 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
 			kvm_set_s2pmd_writable(&new_pmd);
 			kvm_set_pfn_dirty(pfn);
 		}
-		coherent_cache_guest_page(vcpu, hva & PMD_MASK, PMD_SIZE);
+		coherent_cache_guest_page(vcpu, hva & PMD_MASK, PMD_SIZE,
+					  fault_ipa_uncached);
 		ret = stage2_set_pmd_huge(kvm, memcache, fault_ipa, &new_pmd);
 	} else {
 		pte_t new_pte = pfn_pte(pfn, mem_type);
@@ -928,7 +933,8 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
 			kvm_set_s2pte_writable(&new_pte);
 			kvm_set_pfn_dirty(pfn);
 		}
-		coherent_cache_guest_page(vcpu, hva, PAGE_SIZE);
+		coherent_cache_guest_page(vcpu, hva, PAGE_SIZE,
+					  fault_ipa_uncached);
 		ret = stage2_set_pte(kvm, memcache, fault_ipa, &new_pte,
 			pgprot_val(mem_type) == pgprot_val(PAGE_S2_DEVICE));
 	}
-- 
1.8.3.1