Message ID: 20240105091237.24577-1-yan.y.zhao@intel.com
Series: KVM: Honor guest memory types for virtio GPU devices
On Fri, Jan 05, 2024 at 05:12:37PM +0800, Yan Zhao wrote: > This series allow user space to notify KVM of noncoherent DMA status so as > to let KVM honor guest memory types in specified memory slot ranges. > > Motivation > === > A virtio GPU device may want to configure GPU hardware to work in > noncoherent mode, i.e. some of its DMAs do not snoop CPU caches. Does this mean some DMA reads do not snoop the caches or does it include DMA writes not synchronizing the caches too? > This is generally for performance consideration. > In certain platform, GFX performance can improve 20+% with DMAs going to > noncoherent path. > > This noncoherent DMA mode works in below sequence: > 1. Host backend driver programs hardware not to snoop memory of target > DMA buffer. > 2. Host backend driver indicates guest frontend driver to program guest PAT > to WC for target DMA buffer. > 3. Guest frontend driver writes to the DMA buffer without clflush stuffs. > 4. Hardware does noncoherent DMA to the target buffer. > > In this noncoherent DMA mode, both guest and hardware regard a DMA buffer > as not cached. So, if KVM forces the effective memory type of this DMA > buffer to be WB, hardware DMA may read incorrect data and cause misc > failures. I don't know all the details, but a big concern would be that the caches remain fully coherent with the underlying memory at any point where kvm decides to revoke the page from the VM. If you allow an incoherence of cache != physical then it opens a security attack where the observed content of memory can change when it should not. ARM64 has issues like this and due to that ARM has to have explict, expensive, cache flushing at certain points. Jason
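As a concrete illustration of steps 2-3 in the sequence quoted above, here is a minimal sketch of how a guest frontend driver might map the DMA buffer with PAT=WC. The base/size would come from virtio negotiation in a real driver and are placeholders here; this is not code from the series.

    #include <linux/io.h>

    /* Map the shared DMA buffer write-combining in the guest (PAT=WC).
     * The guest then writes without clflush; only WC ordering rules
     * (e.g. a wmb() before kicking the device) apply. */
    static void __iomem *map_dma_buf_wc(phys_addr_t base, size_t size)
    {
            return ioremap_wc(base, size);
    }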
On Fri, Jan 05, 2024 at 03:55:51PM -0400, Jason Gunthorpe wrote: > On Fri, Jan 05, 2024 at 05:12:37PM +0800, Yan Zhao wrote: > > This series allow user space to notify KVM of noncoherent DMA status so as > > to let KVM honor guest memory types in specified memory slot ranges. > > > > Motivation > > === > > A virtio GPU device may want to configure GPU hardware to work in > > noncoherent mode, i.e. some of its DMAs do not snoop CPU caches. > > Does this mean some DMA reads do not snoop the caches or does it > include DMA writes not synchronizing the caches too? Both DMA reads and writes are not snooped. The virtio host side will mmap the buffer to WC (pgprot_writecombine) for CPU access and program the device to access the buffer in uncached way. Meanwhile, virtio host side will construct a memslot in KVM with the PTR returned from the mmap, and notify virtio guest side to mmap the same buffer in guest page table with PAT=WC, too. > > > This is generally for performance consideration. > > In certain platform, GFX performance can improve 20+% with DMAs going to > > noncoherent path. > > > > This noncoherent DMA mode works in below sequence: > > 1. Host backend driver programs hardware not to snoop memory of target > > DMA buffer. > > 2. Host backend driver indicates guest frontend driver to program guest PAT > > to WC for target DMA buffer. > > 3. Guest frontend driver writes to the DMA buffer without clflush stuffs. > > 4. Hardware does noncoherent DMA to the target buffer. > > > > In this noncoherent DMA mode, both guest and hardware regard a DMA buffer > > as not cached. So, if KVM forces the effective memory type of this DMA > > buffer to be WB, hardware DMA may read incorrect data and cause misc > > failures. > > I don't know all the details, but a big concern would be that the > caches remain fully coherent with the underlying memory at any point > where kvm decides to revoke the page from the VM. Ah, you mean, for page migration, the content of the page may not be copied correctly, right? Currently in x86, we have 2 ways to let KVM honor guest memory types: 1. through KVM memslot flag introduced in this series, for virtio GPUs, in memslot granularity. 2. through increasing noncoherent dma count, as what's done in VFIO, for Intel GPU passthrough, for all guest memory. This page migration issue should not be the case for virtio GPU, as both host and guest are synced to use the same memory type and actually the pages are not anonymous pages. For GPU pass-through, though host mmaps with WB, it's still fine for guest to use WC because page migration on pages of VMs with pass-through device is not allowed. But I agree, this should be a case if user space sets the memslot flag to honor guest memory type to memslots for guest system RAM where non-enlightened guest components may cause guest and host to access with different memory types. Or simply when the guest is a malicious one. > If you allow an incoherence of cache != physical then it opens a > security attack where the observed content of memory can change when > it should not. In this case, will this security attack impact other guests? > > ARM64 has issues like this and due to that ARM has to have explict, > expensive, cache flushing at certain points. >
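To make the host-side flow described here concrete, a hedged sketch of the VMM step: registering the WC-mmapped buffer as a KVM memslot. KVM_MEM_NON_COHERENT_DMA is a stand-in for whatever flag this series ends up defining, not an existing UAPI flag, and the slot number is arbitrary.

    #include <linux/kvm.h>
    #include <sys/ioctl.h>

    #define KVM_MEM_NON_COHERENT_DMA (1UL << 3)     /* hypothetical flag */

    static int add_wc_memslot(int vm_fd, void *wc_uva, __u64 gpa, __u64 size)
    {
            struct kvm_userspace_memory_region region = {
                    .slot            = 1,
                    .flags           = KVM_MEM_NON_COHERENT_DMA,
                    .guest_phys_addr = gpa,
                    .memory_size     = size,
                    .userspace_addr  = (__u64)(unsigned long)wc_uva,
            };

            /* wc_uva comes from the backend driver's mmap (WC in the host
             * page table); the flag asks KVM not to force WB in EPT so the
             * guest's PAT=WC mapping takes effect. */
            return ioctl(vm_fd, KVM_SET_USER_MEMORY_REGION, &region);
    }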
On Mon, Jan 08, 2024 at 02:02:57PM +0800, Yan Zhao wrote: > On Fri, Jan 05, 2024 at 03:55:51PM -0400, Jason Gunthorpe wrote: > > On Fri, Jan 05, 2024 at 05:12:37PM +0800, Yan Zhao wrote: > > > This series allow user space to notify KVM of noncoherent DMA status so as > > > to let KVM honor guest memory types in specified memory slot ranges. > > > > > > Motivation > > > === > > > A virtio GPU device may want to configure GPU hardware to work in > > > noncoherent mode, i.e. some of its DMAs do not snoop CPU caches. > > > > Does this mean some DMA reads do not snoop the caches or does it > > include DMA writes not synchronizing the caches too? > Both DMA reads and writes are not snooped. Oh that sounds really dangerous. > > > This is generally for performance consideration. > > > In certain platform, GFX performance can improve 20+% with DMAs going to > > > noncoherent path. > > > > > > This noncoherent DMA mode works in below sequence: > > > 1. Host backend driver programs hardware not to snoop memory of target > > > DMA buffer. > > > 2. Host backend driver indicates guest frontend driver to program guest PAT > > > to WC for target DMA buffer. > > > 3. Guest frontend driver writes to the DMA buffer without clflush stuffs. > > > 4. Hardware does noncoherent DMA to the target buffer. > > > > > > In this noncoherent DMA mode, both guest and hardware regard a DMA buffer > > > as not cached. So, if KVM forces the effective memory type of this DMA > > > buffer to be WB, hardware DMA may read incorrect data and cause misc > > > failures. > > > > I don't know all the details, but a big concern would be that the > > caches remain fully coherent with the underlying memory at any point > > where kvm decides to revoke the page from the VM. > Ah, you mean, for page migration, the content of the page may not be copied > correctly, right? Not just migration. Any point where KVM revokes the page from the VM. Ie just tearing down the VM still has to make the cache coherent with physical or there may be problems. > Currently in x86, we have 2 ways to let KVM honor guest memory types: > 1. through KVM memslot flag introduced in this series, for virtio GPUs, in > memslot granularity. > 2. through increasing noncoherent dma count, as what's done in VFIO, for > Intel GPU passthrough, for all guest memory. And where does all this fixup the coherency problem? > This page migration issue should not be the case for virtio GPU, as both host > and guest are synced to use the same memory type and actually the pages > are not anonymous pages. The guest isn't required to do this so it can force the cache to become incoherent. > > If you allow an incoherence of cache != physical then it opens a > > security attack where the observed content of memory can change when > > it should not. > > In this case, will this security attack impact other guests? It impacts the hypervisor potentially. It depends.. Jason
On Mon, Jan 08, 2024 at 10:02:50AM -0400, Jason Gunthorpe wrote: > On Mon, Jan 08, 2024 at 02:02:57PM +0800, Yan Zhao wrote: > > On Fri, Jan 05, 2024 at 03:55:51PM -0400, Jason Gunthorpe wrote: > > > On Fri, Jan 05, 2024 at 05:12:37PM +0800, Yan Zhao wrote: > > > > This series allow user space to notify KVM of noncoherent DMA status so as > > > > to let KVM honor guest memory types in specified memory slot ranges. > > > > > > > > Motivation > > > > === > > > > A virtio GPU device may want to configure GPU hardware to work in > > > > noncoherent mode, i.e. some of its DMAs do not snoop CPU caches. > > > > > > Does this mean some DMA reads do not snoop the caches or does it > > > include DMA writes not synchronizing the caches too? > > Both DMA reads and writes are not snooped. > > Oh that sounds really dangerous. So if this is an issue then we might already have a problem, because with many devices it's entirely up to the device programming whether the i/o is snooping or not. So the moment you pass such a device to a guest, whether there's explicit support for non-coherent or not, you have a problem. _If_ there is a fundamental problem. I'm not sure of that, because my assumption was that at most the guest shoots itself and the data corruption doesn't go any further the moment the hypervisor does the dma/iommu unmapping. Also, there's a pile of x86 devices where this very much applies, x86 being dma-coherent is not really the true ground story. Cheers, Sima > > > > This is generally for performance consideration. > > > > In certain platform, GFX performance can improve 20+% with DMAs going to > > > > noncoherent path. > > > > > > > > This noncoherent DMA mode works in below sequence: > > > > 1. Host backend driver programs hardware not to snoop memory of target > > > > DMA buffer. > > > > 2. Host backend driver indicates guest frontend driver to program guest PAT > > > > to WC for target DMA buffer. > > > > 3. Guest frontend driver writes to the DMA buffer without clflush stuffs. > > > > 4. Hardware does noncoherent DMA to the target buffer. > > > > > > > > In this noncoherent DMA mode, both guest and hardware regard a DMA buffer > > > > as not cached. So, if KVM forces the effective memory type of this DMA > > > > buffer to be WB, hardware DMA may read incorrect data and cause misc > > > > failures. > > > > > > I don't know all the details, but a big concern would be that the > > > caches remain fully coherent with the underlying memory at any point > > > where kvm decides to revoke the page from the VM. > > Ah, you mean, for page migration, the content of the page may not be copied > > correctly, right? > > Not just migration. Any point where KVM revokes the page from the > VM. Ie just tearing down the VM still has to make the cache coherent > with physical or there may be problems. > > > Currently in x86, we have 2 ways to let KVM honor guest memory types: > > 1. through KVM memslot flag introduced in this series, for virtio GPUs, in > > memslot granularity. > > 2. through increasing noncoherent dma count, as what's done in VFIO, for > > Intel GPU passthrough, for all guest memory. > > And where does all this fixup the coherency problem? > > > This page migration issue should not be the case for virtio GPU, as both host > > and guest are synced to use the same memory type and actually the pages > > are not anonymous pages. > > The guest isn't required to do this so it can force the cache to > become incoherent. 
> > > > If you allow an incoherence of cache != physical then it opens a > > > security attack where the observed content of memory can change when > > > it should not. > > > > In this case, will this security attack impact other guests? > > It impacts the hypervisor potentially. It depends.. > > Jason
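The "device programming" knob Sima refers to is, on the PCIe side, the Enable No Snoop bit in the Device Control register. A hedged sketch of forbidding it for a passed-through function follows; note this is not sufficient for devices such as Intel iGPUs that have additional non-snooping paths.

    #include <linux/pci.h>

    /* Clear Enable No Snoop so the function may not set the No Snoop
     * TLP attribute on its DMA. */
    static int forbid_no_snoop(struct pci_dev *pdev)
    {
            return pcie_capability_clear_word(pdev, PCI_EXP_DEVCTL,
                                              PCI_EXP_DEVCTL_NOSNOOP_EN);
    }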
On Mon, Jan 08, 2024 at 04:25:02PM +0100, Daniel Vetter wrote: > On Mon, Jan 08, 2024 at 10:02:50AM -0400, Jason Gunthorpe wrote: > > On Mon, Jan 08, 2024 at 02:02:57PM +0800, Yan Zhao wrote: > > > On Fri, Jan 05, 2024 at 03:55:51PM -0400, Jason Gunthorpe wrote: > > > > On Fri, Jan 05, 2024 at 05:12:37PM +0800, Yan Zhao wrote: > > > > > This series allow user space to notify KVM of noncoherent DMA status so as > > > > > to let KVM honor guest memory types in specified memory slot ranges. > > > > > > > > > > Motivation > > > > > === > > > > > A virtio GPU device may want to configure GPU hardware to work in > > > > > noncoherent mode, i.e. some of its DMAs do not snoop CPU caches. > > > > > > > > Does this mean some DMA reads do not snoop the caches or does it > > > > include DMA writes not synchronizing the caches too? > > > Both DMA reads and writes are not snooped. > > > > Oh that sounds really dangerous. > > So if this is an issue then we might already have a problem, because with > many devices it's entirely up to the device programming whether the i/o is > snooping or not. So the moment you pass such a device to a guest, whether > there's explicit support for non-coherent or not, you have a > problem. No, the iommus (except Intel and only for Intel integrated GPU, IIRC) prohibit the use of non-coherent DMA entirely from a VM. Eg AMD systems 100% block non-coherent DMA in VMs at the iommu level. > _If_ there is a fundamental problem. I'm not sure of that, because my > assumption was that at most the guest shoots itself and the data > corruption doesn't go any further the moment the hypervisor does the > dma/iommu unmapping. Who fixes the cache on the unmapping? I didn't see anything.. Jason
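A hedged sketch of the split Jason describes, glued to the existing KVM hook: when the IOMMU can enforce snooping, the guest simply cannot issue non-coherent DMA; when it cannot (the Intel iGPU case), KVM has to be told to honor guest memory types. The function below is illustrative, not the actual VFIO/iommufd code path.

    #include <linux/iommu.h>
    #include <linux/kvm_host.h>

    static void account_device_coherency(struct kvm *kvm, struct device *dev)
    {
            if (device_iommu_capable(dev, IOMMU_CAP_ENFORCE_CACHE_COHERENCY))
                    return;         /* IOMMU forces snoop; keep forcing WB */

            /* No snoop control: bump KVM's noncoherent DMA count so guest
             * memory types are honored, as VFIO does for Intel GPU
             * passthrough. */
            kvm_arch_register_noncoherent_dma(kvm);
    }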
On Mon, Jan 08, 2024 at 10:02:50AM -0400, Jason Gunthorpe wrote: > On Mon, Jan 08, 2024 at 02:02:57PM +0800, Yan Zhao wrote: > > On Fri, Jan 05, 2024 at 03:55:51PM -0400, Jason Gunthorpe wrote: > > > On Fri, Jan 05, 2024 at 05:12:37PM +0800, Yan Zhao wrote: > > > > This series allow user space to notify KVM of noncoherent DMA status so as > > > > to let KVM honor guest memory types in specified memory slot ranges. > > > > > > > > Motivation > > > > === > > > > A virtio GPU device may want to configure GPU hardware to work in > > > > noncoherent mode, i.e. some of its DMAs do not snoop CPU caches. > > > > > > Does this mean some DMA reads do not snoop the caches or does it > > > include DMA writes not synchronizing the caches too? > > Both DMA reads and writes are not snooped. > > Oh that sounds really dangerous. > But the IOMMU for Intel GPU does not do force-snoop, no matter KVM honors guest memory type or not. > > > > This is generally for performance consideration. > > > > In certain platform, GFX performance can improve 20+% with DMAs going to > > > > noncoherent path. > > > > > > > > This noncoherent DMA mode works in below sequence: > > > > 1. Host backend driver programs hardware not to snoop memory of target > > > > DMA buffer. > > > > 2. Host backend driver indicates guest frontend driver to program guest PAT > > > > to WC for target DMA buffer. > > > > 3. Guest frontend driver writes to the DMA buffer without clflush stuffs. > > > > 4. Hardware does noncoherent DMA to the target buffer. > > > > > > > > In this noncoherent DMA mode, both guest and hardware regard a DMA buffer > > > > as not cached. So, if KVM forces the effective memory type of this DMA > > > > buffer to be WB, hardware DMA may read incorrect data and cause misc > > > > failures. > > > > > > I don't know all the details, but a big concern would be that the > > > caches remain fully coherent with the underlying memory at any point > > > where kvm decides to revoke the page from the VM. > > Ah, you mean, for page migration, the content of the page may not be copied > > correctly, right? > > Not just migration. Any point where KVM revokes the page from the > VM. Ie just tearing down the VM still has to make the cache coherent > with physical or there may be problems. Not sure what's the mentioned problem during KVM revoking. In host, - If the memory type is WB, as the case in intel GPU passthrough, the mismatch can only happen when guest memory type is UC/WC/WT/WP, all stronger than WB. So, even after KVM revoking the page, the host will not get delayed data from cache. - If the memory type is WC, as the case in virtio GPU, after KVM revoking the page, the page is still hold in the virtio host side. Even though a incooperative guest can cause wrong data in the page, the guest can achieve the purpose in a more straight-forward way, i.e. writing a wrong data directly to the page. So, I don't see the problem in this case too. > > > Currently in x86, we have 2 ways to let KVM honor guest memory types: > > 1. through KVM memslot flag introduced in this series, for virtio GPUs, in > > memslot granularity. > > 2. through increasing noncoherent dma count, as what's done in VFIO, for > > Intel GPU passthrough, for all guest memory. > > And where does all this fixup the coherency problem? > > > This page migration issue should not be the case for virtio GPU, as both host > > and guest are synced to use the same memory type and actually the pages > > are not anonymous pages. 
> > The guest isn't required to do this so it can force the cache to > become incoherent. > > > > If you allow an incoherence of cache != physical then it opens a > > > security attack where the observed content of memory can change when > > > it should not. > > > > In this case, will this security attack impact other guests? > > It impacts the hypervisor potentially. It depends.. Could you elaborate more on how it will impact hypervisor? We can try to fix it if it's really a case.
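For reference, a hedged paraphrase of how KVM/VMX chooses the EPT memory type today (the vmx_get_mt_mask() logic). The interesting branch for this thread is the one where IPAT is left clear so the guest PAT can take effect; guest_memtype_for_gfn() stands in for KVM's MTRR/PAT lookup and details such as CR0.CD handling are omitted.

    static u64 ept_memtype(struct kvm_vcpu *vcpu, gfn_t gfn, bool is_mmio)
    {
            if (is_mmio)
                    return MTRR_TYPE_UNCACHABLE << VMX_EPT_MT_EPTE_SHIFT;

            if (!kvm_arch_has_noncoherent_dma(vcpu->kvm))
                    /* IPAT set: guest PAT ignored, effective type is WB */
                    return (MTRR_TYPE_WRBACK << VMX_EPT_MT_EPTE_SHIFT) |
                           VMX_EPT_IPAT_BIT;

            /* Noncoherent DMA (or, with this series, a flagged memslot):
             * leave IPAT clear so the guest memory type takes effect. */
            return guest_memtype_for_gfn(vcpu, gfn) << VMX_EPT_MT_EPTE_SHIFT;
    }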
On Tue, Jan 09, 2024 at 07:36:22AM +0800, Yan Zhao wrote: > On Mon, Jan 08, 2024 at 10:02:50AM -0400, Jason Gunthorpe wrote: > > On Mon, Jan 08, 2024 at 02:02:57PM +0800, Yan Zhao wrote: > > > On Fri, Jan 05, 2024 at 03:55:51PM -0400, Jason Gunthorpe wrote: > > > > On Fri, Jan 05, 2024 at 05:12:37PM +0800, Yan Zhao wrote: > > > > > This series allow user space to notify KVM of noncoherent DMA status so as > > > > > to let KVM honor guest memory types in specified memory slot ranges. > > > > > > > > > > Motivation > > > > > === > > > > > A virtio GPU device may want to configure GPU hardware to work in > > > > > noncoherent mode, i.e. some of its DMAs do not snoop CPU caches. > > > > > > > > Does this mean some DMA reads do not snoop the caches or does it > > > > include DMA writes not synchronizing the caches too? > > > Both DMA reads and writes are not snooped. > > > > Oh that sounds really dangerous. > > > But the IOMMU for Intel GPU does not do force-snoop, no matter KVM > honors guest memory type or not. Yes, I know. Sounds dangerous! > > Not just migration. Any point where KVM revokes the page from the > > VM. Ie just tearing down the VM still has to make the cache coherent > > with physical or there may be problems. > Not sure what's the mentioned problem during KVM revoking. > In host, > - If the memory type is WB, as the case in intel GPU passthrough, > the mismatch can only happen when guest memory type is UC/WC/WT/WP, all > stronger than WB. > So, even after KVM revoking the page, the host will not get delayed > data from cache. > - If the memory type is WC, as the case in virtio GPU, after KVM revoking > the page, the page is still hold in the virtio host side. > Even though a incooperative guest can cause wrong data in the page, > the guest can achieve the purpose in a more straight-forward way, i.e. > writing a wrong data directly to the page. > So, I don't see the problem in this case too. You can't let cache incoherent memory leak back into the hypervisor for other uses or who knows what can happen. In many cases something will zero the page and you can probably reliably argue that will make the cache coherent, but there are still all sorts of cases where pages are write protected and then used in the hypervisor context. Eg page out or something where the incoherence is a big problem. eg RAID parity and mirror calculations become at-rist of malfunction. Storage CRCs stop working reliably, etc, etc. It is certainly a big enough problem that a generic KVM switch to allow incoherence should be trated with alot of skepticism. You can't argue that the only use of the generic switch will be with GPUs that exclude all the troublesome cases! > > > In this case, will this security attack impact other guests? > > > > It impacts the hypervisor potentially. It depends.. > Could you elaborate more on how it will impact hypervisor? > We can try to fix it if it's really a case. Well, for instance, when you install pages into the KVM the hypervisor will have taken kernel memory, then zero'd it with cachable writes, however the VM can read it incoherently with DMA and access the pre-zero'd data since the zero'd writes potentially hasn't left the cache. That is an information leakage exploit. Who knows what else you can get up to if you are creative. The whole security model assumes there is only one view of memory, not two. Jason
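A hedged sketch of the mitigation implied by this zeroing example, for x86: make sure the zeroing writes have left the cache before the page becomes visible to a VM that may read it with non-snooping DMA. Where such a flush would live (KVM fault path, VFIO pinning, ...) is exactly the open question in this thread.

    #include <linux/mm.h>
    #include <asm/cacheflush.h>

    static void zero_and_flush_for_noncoherent_guest(struct page *page)
    {
            void *va = page_address(page);

            clear_page(va);                         /* cacheable writes  */
            clflush_cache_range(va, PAGE_SIZE);     /* push them to DRAM */
    }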
On Mon, Jan 08, 2024 at 08:22:20PM -0400, Jason Gunthorpe wrote: > On Tue, Jan 09, 2024 at 07:36:22AM +0800, Yan Zhao wrote: > > On Mon, Jan 08, 2024 at 10:02:50AM -0400, Jason Gunthorpe wrote: > > > On Mon, Jan 08, 2024 at 02:02:57PM +0800, Yan Zhao wrote: > > > > On Fri, Jan 05, 2024 at 03:55:51PM -0400, Jason Gunthorpe wrote: > > > > > On Fri, Jan 05, 2024 at 05:12:37PM +0800, Yan Zhao wrote: > > > > > > This series allow user space to notify KVM of noncoherent DMA status so as > > > > > > to let KVM honor guest memory types in specified memory slot ranges. > > > > > > > > > > > > Motivation > > > > > > === > > > > > > A virtio GPU device may want to configure GPU hardware to work in > > > > > > noncoherent mode, i.e. some of its DMAs do not snoop CPU caches. > > > > > > > > > > Does this mean some DMA reads do not snoop the caches or does it > > > > > include DMA writes not synchronizing the caches too? > > > > Both DMA reads and writes are not snooped. > > > > > > Oh that sounds really dangerous. > > > > > But the IOMMU for Intel GPU does not do force-snoop, no matter KVM > > honors guest memory type or not. > > Yes, I know. Sounds dangerous! > > > > Not just migration. Any point where KVM revokes the page from the > > > VM. Ie just tearing down the VM still has to make the cache coherent > > > with physical or there may be problems. > > Not sure what's the mentioned problem during KVM revoking. > > In host, > > - If the memory type is WB, as the case in intel GPU passthrough, > > the mismatch can only happen when guest memory type is UC/WC/WT/WP, all > > stronger than WB. > > So, even after KVM revoking the page, the host will not get delayed > > data from cache. > > - If the memory type is WC, as the case in virtio GPU, after KVM revoking > > the page, the page is still hold in the virtio host side. > > Even though a incooperative guest can cause wrong data in the page, > > the guest can achieve the purpose in a more straight-forward way, i.e. > > writing a wrong data directly to the page. > > So, I don't see the problem in this case too. > > You can't let cache incoherent memory leak back into the hypervisor > for other uses or who knows what can happen. In many cases something > will zero the page and you can probably reliably argue that will make > the cache coherent, but there are still all sorts of cases where pages > are write protected and then used in the hypervisor context. Eg page > out or something where the incoherence is a big problem. > > eg RAID parity and mirror calculations become at-rist of > malfunction. Storage CRCs stop working reliably, etc, etc. > > It is certainly a big enough problem that a generic KVM switch to > allow incoherence should be trated with alot of skepticism. You can't > argue that the only use of the generic switch will be with GPUs that > exclude all the troublesome cases! > You are right. It's more safe with only one view of memory. But even something will zero the page, if it happens before returning the page to host, looks the impact is constrained in VM scope? e.g. for the write protected page, hypervisor cannot rely on the page content is correct or expected. For virtio GPU's use case, do you think a better way for KVM is to pull the memory type from host page table in the specified memslot? 
But for noncoherent DMA device passthrough, we can't pull host memory type, because we rely on guest device driver to do cache flush properly, and if the guest device driver thinks a memory is uncached while it's effectively cached, the device cannot work properly. > > > > In this case, will this security attack impact other guests? > > > > > > It impacts the hypervisor potentially. It depends.. > > Could you elaborate more on how it will impact hypervisor? > > We can try to fix it if it's really a case. > > Well, for instance, when you install pages into the KVM the hypervisor > will have taken kernel memory, then zero'd it with cachable writes, > however the VM can read it incoherently with DMA and access the > pre-zero'd data since the zero'd writes potentially hasn't left the > cache. That is an information leakage exploit. This makes sense. How about KVM doing cache flush before installing/revoking the page if guest memory type is honored? > Who knows what else you can get up to if you are creative. The whole > security model assumes there is only one view of memory, not two. >
On Tue, Jan 09, 2024 at 10:11:23AM +0800, Yan Zhao wrote: > > Well, for instance, when you install pages into the KVM the hypervisor > > will have taken kernel memory, then zero'd it with cachable writes, > > however the VM can read it incoherently with DMA and access the > > pre-zero'd data since the zero'd writes potentially hasn't left the > > cache. That is an information leakage exploit. > > This makes sense. > How about KVM doing cache flush before installing/revoking the > page if guest memory type is honored? I think if you are going to allow the guest to bypass the cache in any way then KVM should fully flush the cache before allowing the guest to access memory and it should fully flush the cache after removing memory from the guest. Noting that fully removing the memory now includes VFIO too, which is going to be very hard to co-ordinate between KVM and VFIO. ARM has the hooks for most of this in the common code already, so it should not be outrageous to do, but slow I suspect. Jason
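A hedged sketch of the "fully flush after removing memory" half, modeled on what x86 already does for SEV guests through the kvm_arch_guest_memory_reclaimed() hook. The predicate is hypothetical (any reason the guest may have had a non-WB view), and wbinvd on all CPUs is the "slow I suspect" part.

    #include <linux/kvm_host.h>
    #include <asm/smp.h>

    void kvm_arch_guest_memory_reclaimed(struct kvm *kvm)
    {
            /* Flush the whole cache hierarchy whenever memory leaves a
             * guest that may have accessed it with a non-WB type. */
            if (kvm_arch_has_noncoherent_dma(kvm))
                    wbinvd_on_all_cpus();
    }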
> From: Jason Gunthorpe <jgg@nvidia.com> > Sent: Tuesday, January 16, 2024 12:31 AM > > On Tue, Jan 09, 2024 at 10:11:23AM +0800, Yan Zhao wrote: > > > > Well, for instance, when you install pages into the KVM the hypervisor > > > will have taken kernel memory, then zero'd it with cachable writes, > > > however the VM can read it incoherently with DMA and access the > > > pre-zero'd data since the zero'd writes potentially hasn't left the > > > cache. That is an information leakage exploit. > > > > This makes sense. > > How about KVM doing cache flush before installing/revoking the > > page if guest memory type is honored? > > I think if you are going to allow the guest to bypass the cache in any > way then KVM should fully flush the cache before allowing the guest to > access memory and it should fully flush the cache after removing > memory from the guest. For GPU passthrough can we rely on the fact that the entire guest memory is pinned so the only occurrence of removing memory is when killing the guest then the pages will be zero-ed by mm before next use? then we just need to flush the cache before the 1st guest run to avoid information leak. yes it's a more complex issue if allowing guest to bypass cache in a configuration mixing host mm activities on guest pages at run-time. > > Noting that fully removing the memory now includes VFIO too, which is > going to be very hard to co-ordinate between KVM and VFIO. if only talking about GPU passthrough do we still need such coordination? > > ARM has the hooks for most of this in the common code already, so it > should not be outrageous to do, but slow I suspect. > > Jason
> From: Tian, Kevin > Sent: Tuesday, January 16, 2024 8:46 AM > > > From: Jason Gunthorpe <jgg@nvidia.com> > > Sent: Tuesday, January 16, 2024 12:31 AM > > > > On Tue, Jan 09, 2024 at 10:11:23AM +0800, Yan Zhao wrote: > > > > > > Well, for instance, when you install pages into the KVM the hypervisor > > > > will have taken kernel memory, then zero'd it with cachable writes, > > > > however the VM can read it incoherently with DMA and access the > > > > pre-zero'd data since the zero'd writes potentially hasn't left the > > > > cache. That is an information leakage exploit. > > > > > > This makes sense. > > > How about KVM doing cache flush before installing/revoking the > > > page if guest memory type is honored? > > > > I think if you are going to allow the guest to bypass the cache in any > > way then KVM should fully flush the cache before allowing the guest to > > access memory and it should fully flush the cache after removing > > memory from the guest. > > For GPU passthrough can we rely on the fact that the entire guest memory > is pinned so the only occurrence of removing memory is when killing the > guest then the pages will be zero-ed by mm before next use? then we > just need to flush the cache before the 1st guest run to avoid information > leak. Just checked your past comments. If there is no guarantee that the removed pages will be zero-ed before next use then yes cache has to be flushed after the page is removed from the guest. :/ > > yes it's a more complex issue if allowing guest to bypass cache in a > configuration mixing host mm activities on guest pages at run-time. > > > > > Noting that fully removing the memory now includes VFIO too, which is > > going to be very hard to co-ordinate between KVM and VFIO. > Probably we could just handle cache flush in IOMMUFD or VFIO type1 map/unmap which is the gate of allowing/denying non-coherent DMAs to specific pages.
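A hedged sketch of the suggestion above: flush at the type1/iommufd (un)map boundary for domains that could not enforce snooping. The helper and the domain_enforces_snoop parameter are made up; the real unmap paths in vfio_iommu_type1 and iommufd would need an equivalent, and the live-migration wrinkle raised in the next reply still applies.

    #include <linux/mm.h>
    #include <asm/cacheflush.h>

    static void flush_before_unpin(struct page **pages, unsigned long npages,
                                   bool domain_enforces_snoop)
    {
            unsigned long i;

            if (domain_enforces_snoop)
                    return;

            /* x86 sketch: assumes lowmem pages (no kmap) for brevity. */
            for (i = 0; i < npages; i++)
                    clflush_cache_range(page_address(pages[i]), PAGE_SIZE);
    }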
On Tue, Jan 16, 2024 at 04:05:08AM +0000, Tian, Kevin wrote: > > From: Tian, Kevin > > Sent: Tuesday, January 16, 2024 8:46 AM > > > > > From: Jason Gunthorpe <jgg@nvidia.com> > > > Sent: Tuesday, January 16, 2024 12:31 AM > > > > > > On Tue, Jan 09, 2024 at 10:11:23AM +0800, Yan Zhao wrote: > > > > > > > > Well, for instance, when you install pages into the KVM the hypervisor > > > > > will have taken kernel memory, then zero'd it with cachable writes, > > > > > however the VM can read it incoherently with DMA and access the > > > > > pre-zero'd data since the zero'd writes potentially hasn't left the > > > > > cache. That is an information leakage exploit. > > > > > > > > This makes sense. > > > > How about KVM doing cache flush before installing/revoking the > > > > page if guest memory type is honored? > > > > > > I think if you are going to allow the guest to bypass the cache in any > > > way then KVM should fully flush the cache before allowing the guest to > > > access memory and it should fully flush the cache after removing > > > memory from the guest. > > > > For GPU passthrough can we rely on the fact that the entire guest memory > > is pinned so the only occurrence of removing memory is when killing the > > guest then the pages will be zero-ed by mm before next use? then we > > just need to flush the cache before the 1st guest run to avoid information > > leak. > > Just checked your past comments. If there is no guarantee that the removed > pages will be zero-ed before next use then yes cache has to be flushed > after the page is removed from the guest. :/ Next use may include things like swap to disk or live migrate the VM. So it isn't quite so simple in the general case. > > > Noting that fully removing the memory now includes VFIO too, which is > > > going to be very hard to co-ordinate between KVM and VFIO. > > Probably we could just handle cache flush in IOMMUFD or VFIO type1 > map/unmap which is the gate of allowing/denying non-coherent DMAs > to specific pages. Maybe, and on live migrate dma stop.. Jason