Message ID | 20190109144736.17452-1-pagupta@redhat.com
---|---
Series | kvm "virtio pmem" device
On Wed, Jan 09, 2019 at 08:17:31PM +0530, Pankaj Gupta wrote: > This patch series has implementation for "virtio pmem". > "virtio pmem" is fake persistent memory(nvdimm) in guest > which allows to bypass the guest page cache. This also > implements a VIRTIO based asynchronous flush mechanism. Hmmmm. Sharing the host page cache direct into the guest VM. Sounds like a good idea, but..... This means the guest VM can now run timing attacks to observe host side page cache residency, and depending on the implementation I'm guessing that the guest will be able to control host side page cache eviction, too (e.g. via discard or hole punch operations). Which means this functionality looks to me like a new vector for information leakage into and out of the guest VM via guest controlled host page cache manipulation. https://arxiv.org/pdf/1901.01161 I might be wrong, but if I'm not we're going to have to be very careful about how guest VMs can access and manipulate host side resources like the page cache..... Cheers, Dave.
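The class of timing attack referred to here needs nothing more than measuring how long the first access to a page of a shared mapping takes: a fast access suggests the page was already resident in the (host) page cache. A minimal sketch of such a probe, assuming only an unprivileged process with read access to a shared file (the path and any threshold are illustrative), might look like:

```c
/* Hedged sketch: probe page cache residency of one page of a shared,
 * read-only mapped file purely by timing the first access.
 * A "fast" access suggests the page was already cached. */
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/mman.h>
#include <time.h>
#include <unistd.h>

static uint64_t now_ns(void)
{
        struct timespec ts;
        clock_gettime(CLOCK_MONOTONIC, &ts);
        return (uint64_t)ts.tv_sec * 1000000000ull + ts.tv_nsec;
}

int main(int argc, char **argv)
{
        const char *path = argc > 1 ? argv[1] : "/tmp/shared-image"; /* illustrative */
        long pgsz = sysconf(_SC_PAGESIZE);
        int fd = open(path, O_RDONLY);
        if (fd < 0)
                return 1;

        volatile unsigned char *map =
                mmap(NULL, pgsz, PROT_READ, MAP_SHARED, fd, 0);
        if (map == MAP_FAILED)
                return 1;

        uint64_t t0 = now_ns();
        (void)map[0];                   /* touch the first page */
        uint64_t t1 = now_ns();

        /* The threshold is machine dependent; a cached page is typically
         * orders of magnitude faster than one that needs disk I/O. */
        printf("first access took %llu ns\n", (unsigned long long)(t1 - t0));

        munmap((void *)map, pgsz);
        close(fd);
        return 0;
}
```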
On Thu, 2019-01-10 at 12:26 +1100, Dave Chinner wrote: > On Wed, Jan 09, 2019 at 08:17:31PM +0530, Pankaj Gupta wrote: > > This patch series has implementation for "virtio pmem". > > "virtio pmem" is fake persistent memory(nvdimm) in guest > > which allows to bypass the guest page cache. This also > > implements a VIRTIO based asynchronous flush mechanism. > > Hmmmm. Sharing the host page cache direct into the guest VM. Sounds > like a good idea, but..... > > This means the guest VM can now run timing attacks to observe host > side page cache residency, and depending on the implementation I'm > guessing that the guest will be able to control host side page > cache eviction, too (e.g. via discard or hole punch operations). Even if the guest can only access its own disk image, using its own page cache pages, Pankaj's patches could still be a win because they allow the host kernel to easily evict page cache pages without doing IO. Evicting page cache pages that live inside a guest involves something like swapping, ballooning, or some sort of callback into the guest to figure out whether a page is clean. Evicting page cache pages that live in the host, OTOH...
On Thu 10-01-19 12:26:17, Dave Chinner wrote: > On Wed, Jan 09, 2019 at 08:17:31PM +0530, Pankaj Gupta wrote: > > This patch series has implementation for "virtio pmem". > > "virtio pmem" is fake persistent memory(nvdimm) in guest > > which allows to bypass the guest page cache. This also > > implements a VIRTIO based asynchronous flush mechanism. > > Hmmmm. Sharing the host page cache direct into the guest VM. Sounds > like a good idea, but..... > > This means the guest VM can now run timing attacks to observe host > side page cache residency, and depending on the implementation I'm > guessing that the guest will be able to control host side page > cache eviction, too (e.g. via discard or hole punch operations). > > Which means this functionality looks to me like a new vector for > information leakage into and out of the guest VM via guest > controlled host page cache manipulation. > > https://arxiv.org/pdf/1901.01161 > > I might be wrong, but if I'm not we're going to have to be very > careful about how guest VMs can access and manipulate host side > resources like the page cache..... Right. Thinking about this I would be more concerned about the fact that guest can effectively pin amount of host's page cache upto size of the device/file passed to guest as PMEM, can't it Pankaj? Or is there some QEMU magic that avoids this? Honza
> > On Wed, Jan 09, 2019 at 08:17:31PM +0530, Pankaj Gupta wrote: > > This patch series has implementation for "virtio pmem". > > "virtio pmem" is fake persistent memory(nvdimm) in guest > > which allows to bypass the guest page cache. This also > > implements a VIRTIO based asynchronous flush mechanism. > > Hmmmm. Sharing the host page cache direct into the guest VM. Sounds > like a good idea, but..... > > This means the guest VM can now run timing attacks to observe host > side page cache residency, and depending on the implementation I'm > guessing that the guest will be able to control host side page > cache eviction, too (e.g. via discard or hole punch operations). Not sure how? This is similar to mmapping shared file-backed memory from any userspace process. Any host userspace process can already mount such an attack on the host page cache using mincore and an mmap of a shared file, but I don't think the guest can do this alone. For the virtio-pmem use case the guest won't be using its own page cache, so a timing attack from the guest side alone is not possible unless host userspace can also check page cache eviction state using mincore etc. As rightly described by Rik, the guest will only access its own page cache pages, and if the guest page cache is managed directly by the host, this saves a lot of effort for the guest in transferring its page cache state. > > Which means this functionality looks to me like a new vector for > information leakage into and out of the guest VM via guest > controlled host page cache manipulation. > > https://arxiv.org/pdf/1901.01161 > > I might be wrong, but if I'm not we're going to have to be very > careful about how guest VMs can access and manipulate host side > resources like the page cache..... If I am following the discussion in the MM thread correctly, the important mitigation steps are: * Restrict mincore (make it a privileged operation, or limit what it reports) so it cannot be used to probe the eviction state of arbitrary page cache pages. * Tweak RWF_NOWAIT. I think if we close off the ways to learn the current state (cached/evicted) of a page on the host, we should be able to mitigate the impact of any page cache access attack, including via virtio-pmem. Thanks, Pankaj
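As a concrete illustration of the mincore-based probe referred to above (and why restricting it appears on the mitigation list), a minimal sketch could look like the following; it only needs read access to the shared file. The same residency information can also be inferred with preadv2() and RWF_NOWAIT, which returns EAGAIN when the data is not already cached, which is why that flag is also on the list. The file path is illustrative:

```c
/* Hedged sketch: use mincore(2) to ask the kernel which pages of a
 * shared file mapping are currently resident in the page cache. */
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

int main(int argc, char **argv)
{
        const char *path = argc > 1 ? argv[1] : "/tmp/shared-image"; /* illustrative */
        struct stat st;
        int fd = open(path, O_RDONLY);
        if (fd < 0 || fstat(fd, &st) < 0)
                return 1;

        long pgsz = sysconf(_SC_PAGESIZE);
        size_t npages = (st.st_size + pgsz - 1) / pgsz;

        /* Map without touching the pages, then ask for residency. */
        void *map = mmap(NULL, st.st_size, PROT_READ, MAP_SHARED, fd, 0);
        unsigned char *vec = malloc(npages);
        if (map == MAP_FAILED || !vec)
                return 1;

        if (mincore(map, st.st_size, vec) == 0) {
                size_t resident = 0;
                for (size_t i = 0; i < npages; i++)
                        resident += vec[i] & 1;
                printf("%zu of %zu pages resident in page cache\n",
                       resident, npages);
        }

        free(vec);
        munmap(map, st.st_size);
        close(fd);
        return 0;
}
```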
> > On Thu 10-01-19 12:26:17, Dave Chinner wrote: > > On Wed, Jan 09, 2019 at 08:17:31PM +0530, Pankaj Gupta wrote: > > > This patch series has implementation for "virtio pmem". > > > "virtio pmem" is fake persistent memory(nvdimm) in guest > > > which allows to bypass the guest page cache. This also > > > implements a VIRTIO based asynchronous flush mechanism. > > > > Hmmmm. Sharing the host page cache direct into the guest VM. Sounds > > like a good idea, but..... > > > > This means the guest VM can now run timing attacks to observe host > > side page cache residency, and depending on the implementation I'm > > guessing that the guest will be able to control host side page > > cache eviction, too (e.g. via discard or hole punch operations). > > > > Which means this functionality looks to me like a new vector for > > information leakage into and out of the guest VM via guest > > controlled host page cache manipulation. > > > > https://arxiv.org/pdf/1901.01161 > > > > I might be wrong, but if I'm not we're going to have to be very > > careful about how guest VMs can access and manipulate host side > > resources like the page cache..... > > Right. Thinking about this I would be more concerned about the fact that > guest can effectively pin amount of host's page cache upto size of the > device/file passed to guest as PMEM, can't it Pankaj? Or is there some QEMU > magic that avoids this? Yes, guest will pin these host page cache pages using 'get_user_pages' by elevating the page reference count. But these pages can be reclaimed by host at any time when there is memory pressure. KVM does not permanently pin pages. vfio does that but we are not using it here. Could you please elaborate what you are thinking? Thanks, Pankaj
On Sat, Jan 12, 2019 at 5:38 PM Pankaj Gupta <pagupta@redhat.com> wrote: > > > > > > > On Thu 10-01-19 12:26:17, Dave Chinner wrote: > > > On Wed, Jan 09, 2019 at 08:17:31PM +0530, Pankaj Gupta wrote: > > > > This patch series has implementation for "virtio pmem". > > > > "virtio pmem" is fake persistent memory(nvdimm) in guest > > > > which allows to bypass the guest page cache. This also > > > > implements a VIRTIO based asynchronous flush mechanism. > > > > > > Hmmmm. Sharing the host page cache direct into the guest VM. Sounds > > > like a good idea, but..... > > > > > > This means the guest VM can now run timing attacks to observe host > > > side page cache residency, and depending on the implementation I'm > > > guessing that the guest will be able to control host side page > > > cache eviction, too (e.g. via discard or hole punch operations). > > > > > > Which means this functionality looks to me like a new vector for > > > information leakage into and out of the guest VM via guest > > > controlled host page cache manipulation. > > > > > > https://arxiv.org/pdf/1901.01161 > > > > > > I might be wrong, but if I'm not we're going to have to be very > > > careful about how guest VMs can access and manipulate host side > > > resources like the page cache..... > > > > Right. Thinking about this I would be more concerned about the fact that > > guest can effectively pin amount of host's page cache upto size of the > > device/file passed to guest as PMEM, can't it Pankaj? Or is there some QEMU > > magic that avoids this? > > Yes, guest will pin these host page cache pages using 'get_user_pages' by > elevating the page reference count. But these pages can be reclaimed by host > at any time when there is memory pressure. Wait, how can the guest pin the host pages? I would expect this to happen only when using vfio and device assignment. Otherwise, no the host can't reclaim a pinned page, that's the whole point of a pin to prevent the mm from reclaiming ownership. > KVM does not permanently pin pages. vfio does that but we are not using > it here. Right, so I'm confused by your pin assertion above.
> > > > > > > > > > > > On Thu 10-01-19 12:26:17, Dave Chinner wrote: > > > > On Wed, Jan 09, 2019 at 08:17:31PM +0530, Pankaj Gupta wrote: > > > > > This patch series has implementation for "virtio pmem". > > > > > "virtio pmem" is fake persistent memory(nvdimm) in guest > > > > > which allows to bypass the guest page cache. This also > > > > > implements a VIRTIO based asynchronous flush mechanism. > > > > > > > > Hmmmm. Sharing the host page cache direct into the guest VM. Sounds > > > > like a good idea, but..... > > > > > > > > This means the guest VM can now run timing attacks to observe host > > > > side page cache residency, and depending on the implementation I'm > > > > guessing that the guest will be able to control host side page > > > > cache eviction, too (e.g. via discard or hole punch operations). > > > > > > > > Which means this functionality looks to me like a new vector for > > > > information leakage into and out of the guest VM via guest > > > > controlled host page cache manipulation. > > > > > > > > https://arxiv.org/pdf/1901.01161 > > > > > > > > I might be wrong, but if I'm not we're going to have to be very > > > > careful about how guest VMs can access and manipulate host side > > > > resources like the page cache..... > > > > > > Right. Thinking about this I would be more concerned about the fact that > > > guest can effectively pin amount of host's page cache upto size of the > > > device/file passed to guest as PMEM, can't it Pankaj? Or is there some > > > QEMU > > > magic that avoids this? > > > > Yes, guest will pin these host page cache pages using 'get_user_pages' by > > elevating the page reference count. But these pages can be reclaimed by > > host > > at any time when there is memory pressure. > > Wait, how can the guest pin the host pages? I would expect this to > happen only when using vfio and device assignment. Otherwise, no the > host can't reclaim a pinned page, that's the whole point of a pin to > prevent the mm from reclaiming ownership. yes. You are right I just used the pin word but it does not actually pin pages permanently. I had gone through the discussion on existing problems with get_user_pages and DMA e.g [1] to understand Jan's POV. It does mention GUP pin pages so I also used the word 'pin'. But guest does not permanently pin these pages and these pages can be reclaimed by host. > > > KVM does not permanently pin pages. vfio does that but we are not using > > it here. > > Right, so I'm confused by your pin assertion above. Sorry! for the confusion. [1] https://lwn.net/Articles/753027/ Thanks, Pankaj
On Fri, Jan 11, 2019 at 02:45:04AM -0500, Pankaj Gupta wrote: > > > > > On Wed, Jan 09, 2019 at 08:17:31PM +0530, Pankaj Gupta wrote: > > > This patch series has implementation for "virtio pmem". > > > "virtio pmem" is fake persistent memory(nvdimm) in guest > > > which allows to bypass the guest page cache. This also > > > implements a VIRTIO based asynchronous flush mechanism. > > > > Hmmmm. Sharing the host page cache direct into the guest VM. Sounds > > like a good idea, but..... > > > > This means the guest VM can now run timing attacks to observe host > > side page cache residency, and depending on the implementation I'm > > guessing that the guest will be able to control host side page > > cache eviction, too (e.g. via discard or hole punch operations). > > Not sure how? this is similar to mmapping virtual memory by any userspace > process. Any host userspace process can do such attack on host page cache > using mincore & mmap shared file. Mincore is for monitoring, not cached eviction. And it's not required to observe cache residency, either. That's a wide open field containing an uncountable number of moles... > But i don't think guest can do this alone. For virtio-pmem usecase > guest won't be using page cache so timing attack from only guest > side is not possible unless host userspace can run checks on page > cache eviction state using mincore etc. As rightly described by > Rik, guest will only access its own page cache pages and if guest > page cache is managed directly by host, this saves alot of effort > for guest in transferring guest state of page cache. Until you have images (and hence host page cache) shared between multiple guests. People will want to do this, because it means they only need a single set of pages in host memory for executable binaries rather than a set of pages per guest. Then you have multiple guests being able to detect residency of the same set of pages. If the guests can then, in any way, control eviction of the pages from the host cache, then we have a guest-to-guest information leak channel. i.e. it's something we need to be aware of and really careful about enabling infrastructure that /will/ be abused if guests can find a way to influence the host side cache residency. Cheers, Dave.
On Mon, Jan 14, 2019 at 10:29:02AM +1100, Dave Chinner wrote: > Until you have images (and hence host page cache) shared between > multiple guests. People will want to do this, because it means they > only need a single set of pages in host memory for executable > binaries rather than a set of pages per guest. Then you have > multiple guests being able to detect residency of the same set of > pages. If the guests can then, in any way, control eviction of the > pages from the host cache, then we have a guest-to-guest information > leak channel. I don't think we should ever be considering something that would allow a guest to evict page's from the host's pagecache [1]. The guest should be able to kick its own references to the host's pagecache out of its own pagecache, but not be able to influence whether the host or another guest has a read-only mapping cached. [1] Unless the guest is allowed to modify the host's file; obviously truncation, holepunching, etc are going to evict pages from the host's page cache.
On Sun, Jan 13, 2019 at 03:38:21PM -0800, Matthew Wilcox wrote: > On Mon, Jan 14, 2019 at 10:29:02AM +1100, Dave Chinner wrote: > > Until you have images (and hence host page cache) shared between > > multiple guests. People will want to do this, because it means they > > only need a single set of pages in host memory for executable > > binaries rather than a set of pages per guest. Then you have > > multiple guests being able to detect residency of the same set of > > pages. If the guests can then, in any way, control eviction of the > > pages from the host cache, then we have a guest-to-guest information > > leak channel. > > I don't think we should ever be considering something that would allow a > guest to evict page's from the host's pagecache [1]. The guest should > be able to kick its own references to the host's pagecache out of its > own pagecache, but not be able to influence whether the host or another > guest has a read-only mapping cached. > > [1] Unless the guest is allowed to modify the host's file; obviously > truncation, holepunching, etc are going to evict pages from the host's > page cache. Right, and that's exactly what I mean by "we need to be real careful with functionality like this". To be honest, I really don't think I've even touched the surface here. e.g. Filesystems and storage can share logical and physical extents. Which means that image files that share storage (e.g. because they are all cloned from the same master image and/or there's in-line deduplication running on the storage) and can be directly accessed by guests may very well be susceptible to detection of host side deduplication and subsequent copy-on-write operations. This really doesn't seem much different to me from the guest being able to infer host side KSM page deduplication and COW operation in the guest side page cache. The only difference is that DAX is being used to probe the host side page cache and storage rather than the guest side. IOWs, I suspect there's a world of pain waiting for us if we punch huge holes through the virtual machine abstractions like this. Improving performance is a laudible goal, but at what price? Cheers, Dave.
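The shared-extent situation described here typically arises from reflink-style cloning of a master image (or from dedup). As a hedged sketch, creating such a clone, so that two image files reference the same physical extents until a write triggers copy-on-write, can be done with the FICLONE ioctl (paths are illustrative):

```c
/* Hedged sketch: create a reflink clone of a master image with the
 * FICLONE ioctl (what `cp --reflink` uses on filesystems that support
 * shared extents, e.g. XFS or btrfs). Both files then share physical
 * extents until a write triggers copy-on-write - the host-side state a
 * guest might be able to infer through DAX access patterns. */
#include <fcntl.h>
#include <linux/fs.h>      /* FICLONE */
#include <stdio.h>
#include <sys/ioctl.h>
#include <unistd.h>

int main(void)
{
        /* Illustrative paths for a master image and a per-guest clone. */
        int src = open("/images/master.img", O_RDONLY);
        int dst = open("/images/guest1.img", O_WRONLY | O_CREAT | O_TRUNC, 0600);
        if (src < 0 || dst < 0) {
                perror("open");
                return 1;
        }

        if (ioctl(dst, FICLONE, src) < 0)
                perror("ioctl(FICLONE)");   /* fails on filesystems without reflink */

        close(src);
        close(dst);
        return 0;
}
```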
> > Until you have images (and hence host page cache) shared between > > multiple guests. People will want to do this, because it means they > > only need a single set of pages in host memory for executable > > binaries rather than a set of pages per guest. Then you have > > multiple guests being able to detect residency of the same set of > > pages. If the guests can then, in any way, control eviction of the > > pages from the host cache, then we have a guest-to-guest information > > leak channel. > > I don't think we should ever be considering something that would allow a > guest to evict page's from the host's pagecache [1]. The guest should > be able to kick its own references to the host's pagecache out of its > own pagecache, but not be able to influence whether the host or another > guest has a read-only mapping cached. > > [1] Unless the guest is allowed to modify the host's file; obviously > truncation, holepunching, etc are going to evict pages from the host's > page cache. This is exactly right. The guest does not evict host page cache pages directly. In the case of virtio-pmem & DAX, the guest only clears its own page cache exceptional entries; it is solely the host's decision to take action on the host page cache pages. With virtio-pmem, the guest does not modify the host file directly, i.e. it does not perform hole punch or truncation operations on the host file. Thanks, Pankaj
On Sat 12-01-19 21:17:46, Pankaj Gupta wrote: > > > > Right. Thinking about this I would be more concerned about the fact that > > > > guest can effectively pin amount of host's page cache upto size of the > > > > device/file passed to guest as PMEM, can't it Pankaj? Or is there some > > > > QEMU > > > > magic that avoids this? > > > > > > Yes, guest will pin these host page cache pages using 'get_user_pages' by > > > elevating the page reference count. But these pages can be reclaimed by > > > host > > > at any time when there is memory pressure. > > > > Wait, how can the guest pin the host pages? I would expect this to > > happen only when using vfio and device assignment. Otherwise, no the > > host can't reclaim a pinned page, that's the whole point of a pin to > > prevent the mm from reclaiming ownership. > > yes. You are right I just used the pin word but it does not actually pin pages > permanently. I had gone through the discussion on existing problems with > get_user_pages and DMA e.g [1] to understand Jan's POV. It does mention GUP > pin pages so I also used the word 'pin'. But guest does not permanently pin > these pages and these pages can be reclaimed by host. OK, then I was just confused how virtio-pmem is going to work. Thanks for explanation! So can I imagine this as guest mmaping the host file and providing the mapped range as "NVDIMM pages" to the kernel inside the guest? Or is it more complex? Honza
> > > > > Right. Thinking about this I would be more concerned about the fact > > > > > that > > > > > guest can effectively pin amount of host's page cache upto size of > > > > > the > > > > > device/file passed to guest as PMEM, can't it Pankaj? Or is there > > > > > some > > > > > QEMU > > > > > magic that avoids this? > > > > > > > > Yes, guest will pin these host page cache pages using 'get_user_pages' > > > > by > > > > elevating the page reference count. But these pages can be reclaimed by > > > > host > > > > at any time when there is memory pressure. > > > > > > Wait, how can the guest pin the host pages? I would expect this to > > > happen only when using vfio and device assignment. Otherwise, no the > > > host can't reclaim a pinned page, that's the whole point of a pin to > > > prevent the mm from reclaiming ownership. > > > > yes. You are right I just used the pin word but it does not actually pin > > pages > > permanently. I had gone through the discussion on existing problems with > > get_user_pages and DMA e.g [1] to understand Jan's POV. It does mention GUP > > pin pages so I also used the word 'pin'. But guest does not permanently pin > > these pages and these pages can be reclaimed by host. > > OK, then I was just confused how virtio-pmem is going to work. Thanks for > explanation! So can I imagine this as guest mmaping the host file and > providing the mapped range as "NVDIMM pages" to the kernel inside the > guest? Or is it more complex? yes, that's correct. Host's Qemu process virtual address range is used as guest physical address and a direct mapping(EPT/NPT) is established. At guest side, this physical memory range is plugged into guest system memory map and DAX mapping is setup using nvdimm calls. Thanks, Pankaj
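In other words, the host side of this is conceptually just a shared mapping of the backing file inside the QEMU process. A much-simplified sketch, ignoring all of QEMU's real memory-region plumbing (function and path names are illustrative), of what provides the guest-physical range:

```c
/* Hedged sketch: the essence of the host-side arrangement described
 * above. QEMU mmap()s the backing file MAP_SHARED, so the pages behind
 * the guest's "pmem" range are ordinary host page cache pages; KVM then
 * maps this virtual range into the guest physical address space via
 * EPT/NPT (not shown here). */
#include <fcntl.h>
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

/* Illustrative stand-in for QEMU's memory-backend-file setup. */
static void *map_pmem_backend(const char *path, size_t size)
{
        int fd = open(path, O_RDWR);
        if (fd < 0)
                return NULL;

        /* MAP_SHARED: guest stores become dirty host page cache pages,
         * written back when the host decides (or on an explicit flush). */
        void *va = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
        close(fd);                      /* mapping stays valid after close */
        return va == MAP_FAILED ? NULL : va;
}

int main(void)
{
        size_t size = 1ull << 30;       /* 1 GiB backing file, illustrative */
        void *guest_pmem = map_pmem_backend("/var/lib/images/pmem.img", size);
        if (!guest_pmem) {
                perror("map_pmem_backend");
                return 1;
        }
        printf("backing mapped at %p (this VA range becomes guest physical memory)\n",
               guest_pmem);
        munmap(guest_pmem, size);
        return 0;
}
```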
On Mon, Jan 14, 2019 at 02:15:40AM -0500, Pankaj Gupta wrote: > > > > Until you have images (and hence host page cache) shared between > > > multiple guests. People will want to do this, because it means they > > > only need a single set of pages in host memory for executable > > > binaries rather than a set of pages per guest. Then you have > > > multiple guests being able to detect residency of the same set of > > > pages. If the guests can then, in any way, control eviction of the > > > pages from the host cache, then we have a guest-to-guest information > > > leak channel. > > > > I don't think we should ever be considering something that would allow a > > guest to evict page's from the host's pagecache [1]. The guest should > > be able to kick its own references to the host's pagecache out of its > > own pagecache, but not be able to influence whether the host or another > > guest has a read-only mapping cached. > > > > [1] Unless the guest is allowed to modify the host's file; obviously > > truncation, holepunching, etc are going to evict pages from the host's > > page cache. > > This is so correct. Guest does not not evict host page cache pages directly. They don't right now. But someone is going to end up asking for discard to work so that the guest can free unused space in the underlying spares image (i.e. make use of fstrim or mount -o discard) because they have workloads that have bursts of space usage and they need to trim the image files afterwards to keep their overall space usage under control. And then.... > In case of virtio-pmem & DAX, guest clears guest page cache exceptional entries. > Its solely decision of host to take action on the host page cache pages. > > In case of virtio-pmem, guest does not modify host file directly i.e don't > perform hole punch & truncation operation directly on host file. ... this will no longer be true, and the nuclear landmine in this driver interface will have been armed.... Cheers, Dave.
On Mon, Jan 14, 2019 at 1:25 PM Dave Chinner <david@fromorbit.com> wrote: > > On Mon, Jan 14, 2019 at 02:15:40AM -0500, Pankaj Gupta wrote: > > > > > > Until you have images (and hence host page cache) shared between > > > > multiple guests. People will want to do this, because it means they > > > > only need a single set of pages in host memory for executable > > > > binaries rather than a set of pages per guest. Then you have > > > > multiple guests being able to detect residency of the same set of > > > > pages. If the guests can then, in any way, control eviction of the > > > > pages from the host cache, then we have a guest-to-guest information > > > > leak channel. > > > > > > I don't think we should ever be considering something that would allow a > > > guest to evict page's from the host's pagecache [1]. The guest should > > > be able to kick its own references to the host's pagecache out of its > > > own pagecache, but not be able to influence whether the host or another > > > guest has a read-only mapping cached. > > > > > > [1] Unless the guest is allowed to modify the host's file; obviously > > > truncation, holepunching, etc are going to evict pages from the host's > > > page cache. > > > > This is so correct. Guest does not not evict host page cache pages directly. > > They don't right now. > > But someone is going to end up asking for discard to work so that > the guest can free unused space in the underlying spares image (i.e. > make use of fstrim or mount -o discard) because they have workloads > that have bursts of space usage and they need to trim the image > files afterwards to keep their overall space usage under control. > > And then.... ...we reject / push back on that patch citing the above concern. > > In case of virtio-pmem & DAX, guest clears guest page cache exceptional entries. > > Its solely decision of host to take action on the host page cache pages. > > > > In case of virtio-pmem, guest does not modify host file directly i.e don't > > perform hole punch & truncation operation directly on host file. > > ... this will no longer be true, and the nuclear landmine in this > driver interface will have been armed.... I agree with the need to be careful when / if explicit cache control is added, but that's not the case today.
On Mon, Jan 14, 2019 at 01:35:57PM -0800, Dan Williams wrote: > On Mon, Jan 14, 2019 at 1:25 PM Dave Chinner <david@fromorbit.com> wrote: > > > > On Mon, Jan 14, 2019 at 02:15:40AM -0500, Pankaj Gupta wrote: > > > > > > > > Until you have images (and hence host page cache) shared between > > > > > multiple guests. People will want to do this, because it means they > > > > > only need a single set of pages in host memory for executable > > > > > binaries rather than a set of pages per guest. Then you have > > > > > multiple guests being able to detect residency of the same set of > > > > > pages. If the guests can then, in any way, control eviction of the > > > > > pages from the host cache, then we have a guest-to-guest information > > > > > leak channel. > > > > > > > > I don't think we should ever be considering something that would allow a > > > > guest to evict page's from the host's pagecache [1]. The guest should > > > > be able to kick its own references to the host's pagecache out of its > > > > own pagecache, but not be able to influence whether the host or another > > > > guest has a read-only mapping cached. > > > > > > > > [1] Unless the guest is allowed to modify the host's file; obviously > > > > truncation, holepunching, etc are going to evict pages from the host's > > > > page cache. > > > > > > This is so correct. Guest does not not evict host page cache pages directly. > > > > They don't right now. > > > > But someone is going to end up asking for discard to work so that > > the guest can free unused space in the underlying spares image (i.e. > > make use of fstrim or mount -o discard) because they have workloads > > that have bursts of space usage and they need to trim the image > > files afterwards to keep their overall space usage under control. > > > > And then.... > > ...we reject / push back on that patch citing the above concern. So at what point do we draw the line? We're allowing writable DAX mappings, but as I've pointed out that means we are going to be allowing a potential information leak via files with shared extents to be directly mapped and written to. But we won't allow useful admin operations that allow better management of host side storage space similar to how normal image files are used by guests because it's an information leak vector? That's splitting some really fine hairs there... > > > In case of virtio-pmem & DAX, guest clears guest page cache exceptional entries. > > > Its solely decision of host to take action on the host page cache pages. > > > > > > In case of virtio-pmem, guest does not modify host file directly i.e don't > > > perform hole punch & truncation operation directly on host file. > > > > ... this will no longer be true, and the nuclear landmine in this > > driver interface will have been armed.... > > I agree with the need to be careful when / if explicit cache control > is added, but that's not the case today. "if"? I expect it to be "when", not if. Expect the worst, plan for it now. Cheers, Dave.
On Tue, Jan 15, 2019 at 09:21:32AM +1100, Dave Chinner wrote: > On Mon, Jan 14, 2019 at 01:35:57PM -0800, Dan Williams wrote: > > On Mon, Jan 14, 2019 at 1:25 PM Dave Chinner <david@fromorbit.com> wrote: > > > > > > On Mon, Jan 14, 2019 at 02:15:40AM -0500, Pankaj Gupta wrote: > > > > > > > > > > Until you have images (and hence host page cache) shared between > > > > > > multiple guests. People will want to do this, because it means they > > > > > > only need a single set of pages in host memory for executable > > > > > > binaries rather than a set of pages per guest. Then you have > > > > > > multiple guests being able to detect residency of the same set of > > > > > > pages. If the guests can then, in any way, control eviction of the > > > > > > pages from the host cache, then we have a guest-to-guest information > > > > > > leak channel. > > > > > > > > > > I don't think we should ever be considering something that would allow a > > > > > guest to evict page's from the host's pagecache [1]. The guest should > > > > > be able to kick its own references to the host's pagecache out of its > > > > > own pagecache, but not be able to influence whether the host or another > > > > > guest has a read-only mapping cached. > > > > > > > > > > [1] Unless the guest is allowed to modify the host's file; obviously > > > > > truncation, holepunching, etc are going to evict pages from the host's > > > > > page cache. > > > > > > > > This is so correct. Guest does not not evict host page cache pages directly. > > > > > > They don't right now. > > > > > > But someone is going to end up asking for discard to work so that > > > the guest can free unused space in the underlying spares image (i.e. > > > make use of fstrim or mount -o discard) because they have workloads > > > that have bursts of space usage and they need to trim the image > > > files afterwards to keep their overall space usage under control. > > > > > > And then.... > > > > ...we reject / push back on that patch citing the above concern. > > So at what point do we draw the line? > > We're allowing writable DAX mappings, but as I've pointed out that > means we are going to be allowing a potential information leak via > files with shared extents to be directly mapped and written to. > > But we won't allow useful admin operations that allow better > management of host side storage space similar to how normal image > files are used by guests because it's an information leak vector? > > That's splitting some really fine hairs there... May I summarize that th security implications need to be documented? In fact that would make a fine security implications section in the device specification. > > > > In case of virtio-pmem & DAX, guest clears guest page cache exceptional entries. > > > > Its solely decision of host to take action on the host page cache pages. > > > > > > > > In case of virtio-pmem, guest does not modify host file directly i.e don't > > > > perform hole punch & truncation operation directly on host file. > > > > > > ... this will no longer be true, and the nuclear landmine in this > > > driver interface will have been armed.... > > > > I agree with the need to be careful when / if explicit cache control > > is added, but that's not the case today. > > "if"? > > I expect it to be "when", not if. Expect the worst, plan for it now. > > Cheers, > > Dave. > -- > Dave Chinner > david@fromorbit.com
> > > > On Mon, Jan 14, 2019 at 02:15:40AM -0500, Pankaj Gupta wrote: > > > > > > > > > > Until you have images (and hence host page cache) shared between > > > > > > multiple guests. People will want to do this, because it means they > > > > > > only need a single set of pages in host memory for executable > > > > > > binaries rather than a set of pages per guest. Then you have > > > > > > multiple guests being able to detect residency of the same set of > > > > > > pages. If the guests can then, in any way, control eviction of the > > > > > > pages from the host cache, then we have a guest-to-guest > > > > > > information > > > > > > leak channel. > > > > > > > > > > I don't think we should ever be considering something that would > > > > > allow a > > > > > guest to evict page's from the host's pagecache [1]. The guest > > > > > should > > > > > be able to kick its own references to the host's pagecache out of its > > > > > own pagecache, but not be able to influence whether the host or > > > > > another > > > > > guest has a read-only mapping cached. > > > > > > > > > > [1] Unless the guest is allowed to modify the host's file; obviously > > > > > truncation, holepunching, etc are going to evict pages from the > > > > > host's > > > > > page cache. > > > > > > > > This is so correct. Guest does not not evict host page cache pages > > > > directly. > > > > > > They don't right now. > > > > > > But someone is going to end up asking for discard to work so that > > > the guest can free unused space in the underlying spares image (i.e. > > > make use of fstrim or mount -o discard) because they have workloads > > > that have bursts of space usage and they need to trim the image > > > files afterwards to keep their overall space usage under control. > > > > > > And then.... > > > > ...we reject / push back on that patch citing the above concern. > > So at what point do we draw the line? > > We're allowing writable DAX mappings, but as I've pointed out that > means we are going to be allowing a potential information leak via > files with shared extents to be directly mapped and written to. > > But we won't allow useful admin operations that allow better > management of host side storage space similar to how normal image > files are used by guests because it's an information leak vector? First of all, thank you for all the useful discussion. To summarize: - We have to live with the limitation of not supporting fstrim and mount -o discard with virtio-pmem, since they would evict host page cache pages. We cannot allow this for virtio-pmem for security reasons; those filesystem commands will currently just zero out the unused pages instead. - If a lot of space is unused and not freed, the guest can ask the host administrator to truncate the host backing image. We are also planning to support the qcow2 sparse image format on the host side with virtio-pmem. - There is currently no existing solution for QEMU persistent memory emulation with write support. This solution provides a paravirtualized way of emulating persistent memory: it does not emulate ACPI structures, it just uses VIRTIO for communication between guest & host. It is fast because of its asynchronous nature and it works well, and on the guest side it makes use of the libnvdimm APIs. - If freeing unused disk space from the guest via trim/truncate is very important to users, they can still use real hardware, which gives them both (advanced disk features & page cache bypass).
Considering all the above reasons I think this feature is useful from virtualization point of view. As Dave rightly said we should be careful and I think now we are careful with the security implications of this device. Thanks again for all the inputs. Best regards, Pankaj > > That's splitting some really fine hairs there... > > > > > In case of virtio-pmem & DAX, guest clears guest page cache exceptional > > > > entries. > > > > Its solely decision of host to take action on the host page cache > > > > pages. > > > > > > > > In case of virtio-pmem, guest does not modify host file directly i.e > > > > don't > > > > perform hole punch & truncation operation directly on host file. > > > > > > ... this will no longer be true, and the nuclear landmine in this > > > driver interface will have been armed.... > > > > I agree with the need to be careful when / if explicit cache control > > is added, but that's not the case today. > > "if"? > > I expect it to be "when", not if. Expect the worst, plan for it now. > > Cheers, > > Dave. > -- > Dave Chinner > david@fromorbit.com >
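For the asynchronous flush mentioned in the cover letter, the guest's libnvdimm flush request travels over VIRTIO and is completed by the host rather than by CPU cache flush instructions. A hedged sketch of what the host-side handling could reduce to, assuming a flush is satisfied by syncing the backing file descriptor (the actual QEMU implementation may differ, and the request/response types below are made up for illustration):

```c
/* Hedged sketch: host-side handling of a guest flush request for a
 * virtio-pmem style device. The assumption is that "flush" means making
 * previously written (dirty page cache) data durable on the backing
 * file, i.e. an fdatasync(2) on the backing fd. */
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* Illustrative request/response types; not taken from the virtio spec. */
struct pmem_flush_req  { int dummy; };
struct pmem_flush_resp { int ret;   };

static void handle_guest_flush(int backing_fd,
                               const struct pmem_flush_req *req,
                               struct pmem_flush_resp *resp)
{
        (void)req;
        /* Push dirty host page cache pages for the backing file to stable
         * storage; report the result so the guest filesystem can fail its
         * fsync() if the host-side writeback failed. */
        resp->ret = fdatasync(backing_fd) ? -errno : 0;
}

int main(void)
{
        /* Illustrative backing file, only for the usage example. */
        int backing_fd = open("/var/lib/images/pmem.img", O_RDWR);
        if (backing_fd < 0) {
                perror("open");
                return 1;
        }

        struct pmem_flush_req req = { 0 };
        struct pmem_flush_resp resp;
        handle_guest_flush(backing_fd, &req, &resp);
        printf("flush completed with %d\n", resp.ret);

        close(backing_fd);
        return 0;
}
```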
> > > > > > > > On Mon, Jan 14, 2019 at 02:15:40AM -0500, Pankaj Gupta wrote: > > > > > > > > > > > > Until you have images (and hence host page cache) shared between > > > > > > > multiple guests. People will want to do this, because it means > > > > > > > they > > > > > > > only need a single set of pages in host memory for executable > > > > > > > binaries rather than a set of pages per guest. Then you have > > > > > > > multiple guests being able to detect residency of the same set of > > > > > > > pages. If the guests can then, in any way, control eviction of > > > > > > > the > > > > > > > pages from the host cache, then we have a guest-to-guest > > > > > > > information > > > > > > > leak channel. > > > > > > > > > > > > I don't think we should ever be considering something that would > > > > > > allow a > > > > > > guest to evict page's from the host's pagecache [1]. The guest > > > > > > should > > > > > > be able to kick its own references to the host's pagecache out of > > > > > > its > > > > > > own pagecache, but not be able to influence whether the host or > > > > > > another > > > > > > guest has a read-only mapping cached. > > > > > > > > > > > > [1] Unless the guest is allowed to modify the host's file; > > > > > > obviously > > > > > > truncation, holepunching, etc are going to evict pages from the > > > > > > host's > > > > > > page cache. > > > > > > > > > > This is so correct. Guest does not not evict host page cache pages > > > > > directly. > > > > > > > > They don't right now. > > > > > > > > But someone is going to end up asking for discard to work so that > > > > the guest can free unused space in the underlying spares image (i.e. > > > > make use of fstrim or mount -o discard) because they have workloads > > > > that have bursts of space usage and they need to trim the image > > > > files afterwards to keep their overall space usage under control. > > > > > > > > And then.... > > > > > > ...we reject / push back on that patch citing the above concern. > > > > So at what point do we draw the line? > > > > We're allowing writable DAX mappings, but as I've pointed out that > > means we are going to be allowing a potential information leak via > > files with shared extents to be directly mapped and written to. > > > > But we won't allow useful admin operations that allow better > > management of host side storage space similar to how normal image > > files are used by guests because it's an information leak vector? > > > > That's splitting some really fine hairs there... > > May I summarize that th security implications need to > be documented? > > In fact that would make a fine security implications section > in the device specification. This is a very good suggestion. I will document the security implications in details in device specification with details of what all filesystem features we don't support and why. Best regards, Pankaj > > > > > > > > > > In case of virtio-pmem & DAX, guest clears guest page cache > > > > > exceptional entries. > > > > > Its solely decision of host to take action on the host page cache > > > > > pages. > > > > > > > > > > In case of virtio-pmem, guest does not modify host file directly i.e > > > > > don't > > > > > perform hole punch & truncation operation directly on host file. > > > > > > > > ... this will no longer be true, and the nuclear landmine in this > > > > driver interface will have been armed.... > > > > > > I agree with the need to be careful when / if explicit cache control > > > is added, but that's not the case today. 
> > > > "if"? > > > > I expect it to be "when", not if. Expect the worst, plan for it now. > > > > Cheers, > > > > Dave. > > -- > > Dave Chinner > > david@fromorbit.com > >
On Tue, Jan 15, 2019 at 12:35:06AM -0500, Pankaj Gupta wrote: > > > > > On Mon, Jan 14, 2019 at 02:15:40AM -0500, Pankaj Gupta wrote: > > > > > > > > > > > > Until you have images (and hence host page cache) shared between > > > > > > > multiple guests. People will want to do this, because it means they > > > > > > > only need a single set of pages in host memory for executable > > > > > > > binaries rather than a set of pages per guest. Then you have > > > > > > > multiple guests being able to detect residency of the same set of > > > > > > > pages. If the guests can then, in any way, control eviction of the > > > > > > > pages from the host cache, then we have a guest-to-guest > > > > > > > information > > > > > > > leak channel. > > > > > > > > > > > > I don't think we should ever be considering something that would > > > > > > allow a > > > > > > guest to evict page's from the host's pagecache [1]. The guest > > > > > > should > > > > > > be able to kick its own references to the host's pagecache out of its > > > > > > own pagecache, but not be able to influence whether the host or > > > > > > another > > > > > > guest has a read-only mapping cached. > > > > > > > > > > > > [1] Unless the guest is allowed to modify the host's file; obviously > > > > > > truncation, holepunching, etc are going to evict pages from the > > > > > > host's > > > > > > page cache. > > > > > > > > > > This is so correct. Guest does not not evict host page cache pages > > > > > directly. > > > > > > > > They don't right now. > > > > > > > > But someone is going to end up asking for discard to work so that > > > > the guest can free unused space in the underlying spares image (i.e. > > > > make use of fstrim or mount -o discard) because they have workloads > > > > that have bursts of space usage and they need to trim the image > > > > files afterwards to keep their overall space usage under control. > > > > > > > > And then.... > > > > > > ...we reject / push back on that patch citing the above concern. > > > > So at what point do we draw the line? > > > > We're allowing writable DAX mappings, but as I've pointed out that > > means we are going to be allowing a potential information leak via > > files with shared extents to be directly mapped and written to. > > > > But we won't allow useful admin operations that allow better > > management of host side storage space similar to how normal image > > files are used by guests because it's an information leak vector? > > First of all Thank you for all the useful discussions. > I am summarizing here: > > - We have to live with the limitation to not support fstrim and > mount -o discard options with virtio-pmem as they will evict > host page cache pages. We cannot allow this for virtio-pmem > for security reasons. These filesystem commands will just zero out > unused pages currently. Not sure I follow you here - what pages are going to be zeroed and when will they be zeroed? If discard is not allowed, filesystems just don't issue such commands and the underlying device will never seen them. > - If alot of space is unused and not freed guest can request host > Administrator for truncating the host backing image. You can't use truncate to free space in a disk image file. The only way to do it safely in a generic, filesystem agnositic way is to mount the disk image (e.g. on loopback) and run fstrim on it. The loopback device will punches holes in the file where all the free space is reported by the filesystem via discard requests. 
Which is kinda my point - this could only be done if the guest is shut down, which makes it very difficult for admins to manage. > We are also planning to support qcow2 sparse image format at > host side with virtio-pmem. So you're going to be remapping a huge number of disjoint regions into a linear pmem mapping? ISTR discussions about similar things for virtio+fuse+dax that came up against "large numbers of mapped regions don't scale" and so it wasn't a practical solution compared to a just using raw sparse files.... > - There is no existing solution for Qemu persistent memory > emulation with write support currently. This solution provides > us the paravartualized way of emulating persistent memory. Sure, but the question is why do you need to create an emulation that doesn't actually perform like pmem? The whole point of pmem is performance, and emulating pmem by mmap() of a file on spinning disks is going to be horrible for performance. Even on SSDs it's going to be orders of magnitudes slower than real pmem. So exactly what problem are you trying to solve with this driver? Cheers, Dave.
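For reference, the fstrim step in the workflow described here boils down to the FITRIM ioctl on the mounted filesystem, which makes the filesystem issue discard requests for its free space (and a loop device turns those into hole punches in the backing image). A minimal sketch, with an illustrative mount point:

```c
/* Hedged sketch: the FITRIM ioctl is what fstrim(8) issues against a
 * mounted filesystem; the filesystem then sends discard requests for
 * its free space, which on a loop device become hole punches in the
 * backing image file. */
#include <fcntl.h>
#include <limits.h>
#include <linux/fs.h>      /* struct fstrim_range, FITRIM */
#include <stdio.h>
#include <string.h>
#include <sys/ioctl.h>
#include <unistd.h>

int main(void)
{
        /* Illustrative mount point of the loopback-mounted guest image. */
        int fd = open("/mnt/guest-image", O_RDONLY);
        if (fd < 0) {
                perror("open");
                return 1;
        }

        struct fstrim_range range;
        memset(&range, 0, sizeof(range));
        range.start = 0;
        range.len = ULLONG_MAX;         /* trim the whole filesystem */
        range.minlen = 0;

        if (ioctl(fd, FITRIM, &range) < 0)
                perror("ioctl(FITRIM)");
        else
                printf("trimmed %llu bytes\n",
                       (unsigned long long)range.len);

        close(fd);
        return 0;
}
```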
On Wed, Jan 09, 2019 at 08:17:31PM +0530, Pankaj Gupta wrote: > This patch series has implementation for "virtio pmem". > "virtio pmem" is fake persistent memory(nvdimm) in guest > which allows to bypass the guest page cache. This also > implements a VIRTIO based asynchronous flush mechanism. At Pankaj's request I looked at information leak implications of virtio pmem in light of the recent page cache side channels paper (https://arxiv.org/pdf/1901.01161.pdf) - to see what kind of side channels it might create if any. TLDR - I think that depending on the host side implementation there could be some, but this might be addressable by better documentation in both code and spec. The fake dax approach backing the guest memory by a host page cache does seem to have potential issues. For clarity: we are talking about leaking information either to a VM, or within a VM (I did not look into leaks to hypervisor in configurations such as SEV) through host page cache. Leaks into a VM: It seems clear that while pmem allows memory accesses versus read/write with e.g. a block device, from host page cache point of view this doesn't matter much: reads populate cache in the same way as memory faults. Thus ignoring presence of information leaks (which is an interesting question e.g. in light of recent discard support) pmem doesn't seem to be any better or worse for leaking information into a VM. Leaks within VM: Right now pmem seems to bypass the guest page cache completely. Whether pmem memory is then resident in a page cache would be up to the device/host. Assuming that it is, the "Preventing Efficient Eviction while Increasing the System Performance" countermeasure for the page cache side channel attack would appear to become ineffective with pmem. What is suggested is a per-process management of the page cache, and host does not have visibility of processes within a VM. Another possible countermeasure - not discussed in the paper - could be modify the applications to lock the security relevant pages in memory. Again this becomes impractical with pmem as host does not have visibility into that. However note that as long as the only countermeasure linux uses is "Privileged Access" (i.e. blocking mincore) nothing can be done as guest page cache remains as vulnerable as host page cache. Countermeasures: which host-side countermeasures can be designed would depend on which countermeasures are used guest-side - we would need to make sure they are not broken by pmem. For "Preventing Efficient Eviction while Increasing the System Performance" modifying the host implementation to ensure that pmem device bypasses the host page cache would seem to address the security problem.Similarly, ensuring that a real memory device (e.g. DAX, RAM such as hugetlbfs, pmem for nested virt) is used for pmem would make the memory locking countermeasure work. Whether with such limitations the device is still useful performance wise is an open question. These questions probably should be addressed in the documentation, spec and possible qemu code. Severity of the security implications: some people argue that the security implications of the page cache leaks are minor. I do not have an opinion on this: the severity would seem to depend on the specific configuration. Other security implications: recent discussion seems to suggest there are other concerns around e.g. resource management and thus DOS potential. If that's so, it's a matter for a separate discussion as I didn't look into that in depth. 
Some or all of the above might be based on a misunderstanding of the current pmem code, the whitepaper and linux page cache in general. If so I apologise, do not hesitate to call out any mistakes. Thanks!
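For completeness, the application-side countermeasure mentioned above, locking security-relevant pages so they cannot be evicted, is ordinary mlock(2); the point being made is that in a guest whose "memory" is really host page cache, this only pins guest-visible memory and the host knows nothing about it. A minimal sketch:

```c
/* Hedged sketch: an application locking a security-relevant buffer so
 * its pages cannot be evicted or paged out. Inside a guest backed by
 * host page cache (virtio-pmem), this pins guest-side memory only; the
 * host page cache pages behind a DAX mapping are not affected. */
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>

int main(void)
{
        static unsigned char secret[4096];      /* security-relevant data */

        if (mlock(secret, sizeof(secret)) != 0) {
                perror("mlock");                /* may fail if RLIMIT_MEMLOCK is low */
                return 1;
        }

        /* ... use the secret; its pages stay resident until munlock()/exit ... */
        memset(secret, 0xA5, sizeof(secret));

        munlock(secret, sizeof(secret));
        return 0;
}
```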
+CC [Dave Chinner], to maintain updated CC list > > This patch series has implementation for "virtio pmem". > > "virtio pmem" is fake persistent memory(nvdimm) in guest > > which allows to bypass the guest page cache. This also > > implements a VIRTIO based asynchronous flush mechanism. > > > At Pankaj's request I looked at information leak implications of virtio > pmem in light of the recent page cache side channels paper > (https://arxiv.org/pdf/1901.01161.pdf) - to see what > kind of side channels it might create if any. TLDR - I think that > depending on the host side implementation there could be some, but this > might be addressable by better documentation in both code and spec. > The fake dax approach backing the guest memory by a host page cache > does seem to have potential issues. > > For clarity: we are talking about leaking information either to a VM, or > within a VM (I did not look into leaks to hypervisor in configurations > such as SEV) through host page cache. > > Leaks into a VM: It seems clear that while pmem allows memory accesses > versus read/write with e.g. a block device, from host page cache point > of view this doesn't matter much: reads populate cache in the same way > as memory faults. Thus ignoring presence of information leaks (which is > an interesting question e.g. in light of recent discard support) pmem > doesn't seem to be any better or worse for leaking information into a > VM. > > Leaks within VM: Right now pmem seems to bypass the guest page cache > completely. Whether pmem memory is then resident in a page cache would > be up to the device/host. Assuming that it is, the "Preventing > Efficient Eviction while Increasing the System Performance" > countermeasure for the page cache side channel attack would appear to > become ineffective with pmem. What is suggested is a per-process > management of the page cache, and host does not have visibility of > processes within a VM. Another possible countermeasure - not discussed > in the paper - could be modify the applications to lock the security > relevant pages in memory. Again this becomes impractical with pmem as > host does not have visibility into that. However note that as long > as the only countermeasure linux uses is "Privileged Access" > (i.e. blocking mincore) nothing can be done as guest page cache > remains as vulnerable as host page cache. > > > Countermeasures: which host-side countermeasures can be designed would > depend on which countermeasures are used guest-side - we would need to > make sure they are not broken by pmem. For "Preventing Efficient > Eviction while Increasing the System Performance" modifying the host > implementation to ensure that pmem device bypasses the host page cache > would seem to address the security problem.Similarly, ensuring that a > real memory device (e.g. DAX, RAM such as hugetlbfs, pmem for nested > virt) is used for pmem would make the memory locking countermeasure > work. Whether with such limitations the device is still useful > performance wise is an open question. These questions probably should > be addressed in the documentation, spec and possible qemu code. > > > > Severity of the security implications: some people argue that the > security implications of the page cache leaks are minor. I do not have > an opinion on this: the severity would seem to depend on the specific > configuration. > > > Other security implications: recent discussion seems to suggest there > are other concerns around e.g. resource management and thus DOS > potential. 
If that's so, it's a matter for a separate discussion > as I didn't look into that in depth. > > Some or all of the above might be based on a misunderstanding of the > current pmem code, the whitepaper and linux page cache in general. > If so I apologise, do not hesitate to call out any mistakes. > > Thanks! > > -- > MST > >
On 04.02.19 23:56, Michael S. Tsirkin wrote: > > On Wed, Jan 09, 2019 at 08:17:31PM +0530, Pankaj Gupta wrote: >> This patch series has implementation for "virtio pmem". >> "virtio pmem" is fake persistent memory(nvdimm) in guest >> which allows to bypass the guest page cache. This also >> implements a VIRTIO based asynchronous flush mechanism. > > > At Pankaj's request I looked at information leak implications of virtio > pmem in light of the recent page cache side channels paper > (https://arxiv.org/pdf/1901.01161.pdf) - to see what > kind of side channels it might create if any. TLDR - I think that > depending on the host side implementation there could be some, but this > might be addressable by better documentation in both code and spec. > The fake dax approach backing the guest memory by a host page cache > does seem to have potential issues. > > For clarity: we are talking about leaking information either to a VM, or > within a VM (I did not look into leaks to hypervisor in configurations > such as SEV) through host page cache. > > Leaks into a VM: It seems clear that while pmem allows memory accesses > versus read/write with e.g. a block device, from host page cache point > of view this doesn't matter much: reads populate cache in the same way > as memory faults. Thus ignoring presence of information leaks (which is > an interesting question e.g. in light of recent discard support) pmem > doesn't seem to be any better or worse for leaking information into a > VM. +1, just a different way to access that cache. Conceptually a virtio-pmem devices is from the guest view a "device with a managed buffer". Some accesses might be faster than others. There are no guarantees on how fast a certain access is. And yes, actions on other guests can result in accesses being slower but not faster. Also other storage devices have caches like that (well, the caches size depends on the device) - thinking especially about storage systems - which would in my opinion, also allow similar leaks. How are such security concerns handled there? Are they different (besides eventually access speed)? > > Leaks within VM: Right now pmem seems to bypass the guest page cache > completely. Whether pmem memory is then resident in a page cache would > be up to the device/host. Assuming that it is, the "Preventing > Efficient Eviction while Increasing the System Performance" > countermeasure for the page cache side channel attack would appear to > become ineffective with pmem. What is suggested is a per-process > management of the page cache, and host does not have visibility of > processes within a VM. Another possible countermeasure - not discussed > in the paper - could be modify the applications to lock the security > relevant pages in memory. Again this becomes impractical with pmem as > host does not have visibility into that. However note that as long > as the only countermeasure linux uses is "Privileged Access" > (i.e. blocking mincore) nothing can be done as guest page cache > remains as vulnerable as host page cache. This sounds very use-case specific. If I run a VM only with a very specific workload (say, a container running one application), I usually don't care about leaks within the VM. At least not leaks between applications ;) In contrast, to running different applications (e.g. containers from different customers) on one system, I really care about leaks within a VM. 
> Countermeasures: which host-side countermeasures can be designed would depend on which countermeasures are used guest-side - we would need to make sure they are not broken by pmem. For "Preventing Efficient Eviction while Increasing the System Performance", modifying the host implementation to ensure that the pmem device bypasses the host page cache would seem to address the security problem. Similarly, ensuring that a real memory device (e.g. DAX, RAM such as hugetlbfs, pmem for nested virt) is used for pmem would make the memory locking countermeasure work. Whether with such limitations the device is still useful performance wise is an open question. These questions probably should be addressed in the documentation, spec and possibly qemu code.

I also want to note that using a disk/file as memory backend with NVDIMMs in QEMU essentially results in the exact same questions we have with virtio-pmem. E.g. kata-containers use nvdimms for the root file system (read-only) as far as I am aware.

Conceptually, a virtio-pmem device is just an emulated nvdimm device with a flush interface. And the nice thing is that it is designed to also work on architectures that don't speak "nvdimm".

> Severity of the security implications: some people argue that the security implications of the page cache leaks are minor. I do not have an opinion on this: the severity would seem to depend on the specific configuration.

I guess configuration and use case.

Nice summary, thanks for looking into this Michael!
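As an illustration of the "real memory device" countermeasure mentioned above, here is a rough host-side sketch - an assumption about how a backend could behave, not code from QEMU or from this series: requesting MAP_SYNC together with MAP_SHARED_VALIDATE makes the kernel refuse the mapping unless the backing file is on DAX capable storage (it fails with EOPNOTSUPP otherwise), so a backend built this way cannot silently fall back to a page-cache-backed regular file. The path and size are placeholders.

/*
 * Sketch: insist on DAX capable backing for a pmem-like region.
 */
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

#ifndef MAP_SHARED_VALIDATE
#define MAP_SHARED_VALIDATE 0x03
#endif
#ifndef MAP_SYNC
#define MAP_SYNC 0x80000
#endif

static void *map_pmem_backing(const char *path, size_t size)
{
	int fd = open(path, O_RDWR);
	if (fd < 0)
		return NULL;

	/* MAP_SYNC is only valid together with MAP_SHARED_VALIDATE; the
	 * kernel rejects it unless the file supports DAX. */
	void *p = mmap(NULL, size, PROT_READ | PROT_WRITE,
		       MAP_SHARED_VALIDATE | MAP_SYNC, fd, 0);
	close(fd);

	if (p == MAP_FAILED) {
		fprintf(stderr, "%s is not on DAX capable storage: %s\n",
			path, strerror(errno));
		return NULL;
	}
	return p;
}

int main(void)
{
	/* Placeholder path and size, for illustration only. */
	void *p = map_pmem_backing("/mnt/pmem0/guest-image", 1UL << 30);
	return p ? 0 : 1;
}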
On Wed, Feb 06, 2019 at 03:00:26PM +0100, David Hildenbrand wrote:
> On 04.02.19 23:56, Michael S. Tsirkin wrote:
> > On Wed, Jan 09, 2019 at 08:17:31PM +0530, Pankaj Gupta wrote:
> >> This patch series has implementation for "virtio pmem". "virtio pmem" is fake persistent memory (nvdimm) in guest which allows to bypass the guest page cache. This also implements a VIRTIO based asynchronous flush mechanism.
> >
> > At Pankaj's request I looked at information leak implications of virtio pmem in light of the recent page cache side channels paper (https://arxiv.org/pdf/1901.01161.pdf) - to see what kind of side channels it might create, if any. TLDR - I think that depending on the host side implementation there could be some, but this might be addressable by better documentation in both code and spec. The fake dax approach of backing the guest memory with a host page cache does seem to have potential issues.
> >
> > For clarity: we are talking about leaking information either to a VM, or within a VM (I did not look into leaks to the hypervisor in configurations such as SEV), through the host page cache.
> >
> > Leaks into a VM: It seems clear that while pmem allows memory accesses versus read/write with e.g. a block device, from the host page cache point of view this doesn't matter much: reads populate the cache in the same way as memory faults. Thus, ignoring presence of information leaks (which is an interesting question e.g. in light of recent discard support), pmem doesn't seem to be any better or worse for leaking information into a VM.
>
> +1, just a different way to access that cache.
>
> Conceptually, a virtio-pmem device is, from the guest's view, a "device with a managed buffer". Some accesses might be faster than others. There are no guarantees on how fast a certain access is. And yes, actions in other guests can result in accesses becoming slower, but not faster.
>
> Also, other storage devices have caches like that (the cache size depends on the device) - thinking especially about storage systems - which would, in my opinion, also allow similar leaks. How are such security concerns handled there? Are they different (besides eventual access speed)?
>
> > Leaks within a VM: Right now pmem seems to bypass the guest page cache completely. Whether pmem memory is then resident in a page cache would be up to the device/host. Assuming that it is, the "Preventing Efficient Eviction while Increasing the System Performance" countermeasure for the page cache side channel attack would appear to become ineffective with pmem. What is suggested is a per-process management of the page cache, and the host does not have visibility of processes within a VM. Another possible countermeasure - not discussed in the paper - could be to modify the applications to lock the security relevant pages in memory. Again, this becomes impractical with pmem as the host does not have visibility into that. However, note that as long as the only countermeasure linux uses is "Privileged Access" (i.e. blocking mincore) nothing can be done, as the guest page cache remains as vulnerable as the host page cache.
>
> This sounds very use-case specific. If I run a VM only with a very specific workload (say, a container running one application), I usually don't care about leaks within the VM. At least not leaks between applications ;)
>
> In contrast, when running different applications (e.g. containers from different customers) on one system, I really care about leaks within a VM.

Clearly, not everyone cares about closing off information leaks.

> > Countermeasures: which host-side countermeasures can be designed would depend on which countermeasures are used guest-side - we would need to make sure they are not broken by pmem. For "Preventing Efficient Eviction while Increasing the System Performance", modifying the host implementation to ensure that the pmem device bypasses the host page cache would seem to address the security problem. Similarly, ensuring that a real memory device (e.g. DAX, RAM such as hugetlbfs, pmem for nested virt) is used for pmem would make the memory locking countermeasure work. Whether with such limitations the device is still useful performance wise is an open question. These questions probably should be addressed in the documentation, spec and possibly qemu code.
>
> I also want to note that using a disk/file as memory backend with NVDIMMs in QEMU essentially results in the exact same questions we have with virtio-pmem. E.g. kata-containers use nvdimms for the root file system (read-only) as far as I am aware.
>
> Conceptually, a virtio-pmem device is just an emulated nvdimm device with a flush interface. And the nice thing is that it is designed to also work on architectures that don't speak "nvdimm".
>
> > Severity of the security implications: some people argue that the security implications of the page cache leaks are minor. I do not have an opinion on this: the severity would seem to depend on the specific configuration.
>
> I guess configuration and use case.

Good point.

> Nice summary, thanks for looking into this Michael!
>
> --
>
> Thanks,
>
> David / dhildenb
> _______________________________________________
> Virtualization mailing list
> Virtualization@lists.linux-foundation.org
> https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Hi Michael,

Thanks for looking into this and summarizing in detail.

> > This patch series has implementation for "virtio pmem". "virtio pmem" is fake persistent memory (nvdimm) in guest which allows to bypass the guest page cache. This also implements a VIRTIO based asynchronous flush mechanism.
>
> At Pankaj's request I looked at information leak implications of virtio pmem in light of the recent page cache side channels paper (https://arxiv.org/pdf/1901.01161.pdf) - to see what kind of side channels it might create, if any. TLDR - I think that depending on the host side implementation there could be some, but this might be addressable by better documentation in both code and spec. The fake dax approach of backing the guest memory with a host page cache does seem to have potential issues.
>
> For clarity: we are talking about leaking information either to a VM, or within a VM (I did not look into leaks to the hypervisor in configurations such as SEV), through the host page cache.
>
> Leaks into a VM: It seems clear that while pmem allows memory accesses versus read/write with e.g. a block device, from the host page cache point of view this doesn't matter much: reads populate the cache in the same way as memory faults. Thus, ignoring presence of information leaks (which is an interesting question e.g. in light of recent discard support), pmem doesn't seem to be any better or worse for leaking information into a VM.
>
> Leaks within a VM: Right now pmem seems to bypass the guest page cache completely. Whether pmem memory is then resident in a page cache would be up to the device/host. Assuming that it is, the "Preventing Efficient Eviction while Increasing the System Performance" countermeasure for the page cache side channel attack would appear to become ineffective with pmem. What is suggested is a per-process management of the page cache, and the host does not have visibility of processes within a VM. Another possible countermeasure - not discussed in the paper - could be to modify the applications to lock the security relevant pages in memory. Again, this becomes impractical with pmem as the host does not have visibility into that. However, note that as long as the only countermeasure linux uses is "Privileged Access" (i.e. blocking mincore) nothing can be done, as the guest page cache remains as vulnerable as the host page cache.
>
> Countermeasures: which host-side countermeasures can be designed would depend on which countermeasures are used guest-side - we would need to make sure they are not broken by pmem. For "Preventing Efficient Eviction while Increasing the System Performance", modifying the host implementation to ensure that the pmem device bypasses the host page cache would seem to address the security problem. Similarly, ensuring that a real memory device (e.g. DAX, RAM such as hugetlbfs, pmem for nested virt) is used for pmem would make the memory locking countermeasure work. Whether with such limitations the device is still useful performance wise is an open question. These questions probably should be addressed in the documentation, spec and possibly qemu code.
>
> Severity of the security implications: some people argue that the security implications of the page cache leaks are minor. I do not have an opinion on this: the severity would seem to depend on the specific configuration.
>
> Other security implications: recent discussion seems to suggest there are other concerns around e.g. resource management and thus DOS potential. If that's so, it's a matter for a separate discussion as I didn't look into that in depth.
>
> Some or all of the above might be based on a misunderstanding of the current pmem code, the whitepaper and linux page cache in general. If so I apologise, do not hesitate to call out any mistakes.

I agree that, similar to any guest VM (without virtio-pmem) or any host userspace process, virtio-pmem may also have some security implications. We need to document these in the virtio-pmem host device specification and in the qemu code. This is to make sure that, when creating a virtio-pmem device, we are aware of these implications and create/use the host side backing file accordingly, as per the use-case described by David here [1]. I will document these in the next version of my patch series.

Hello Dave,

Are we okay with this? Thank you everyone for the discussion.

[1] https://marc.info/?l=linux-kernel&m=154946989419403&w=2

Best regards,
Pankaj
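For reference, the guest-side memory locking countermeasure mentioned in the quoted analysis boils down to something like the following sketch (illustrative only; the buffer name and size are made up): an application pins its security relevant pages with mlock(2) so they cannot be evicted and probed via eviction - which, as discussed above, only helps when the pmem region is backed by real memory on the host rather than by a host managed cache.

/*
 * Sketch: pin a sensitive buffer so it stays resident.
 */
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>

#define SECRET_SIZE 4096

int main(void)
{
	static unsigned char secret[SECRET_SIZE]
		__attribute__((aligned(4096)));

	if (mlock(secret, sizeof(secret)) != 0) {
		perror("mlock");
		return 1;
	}

	/* Work with the pinned buffer; it stays resident until munlock()
	 * or process exit. */
	memset(secret, 0xa5, sizeof(secret));

	munlock(secret, sizeof(secret));
	return 0;
}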
On Mon, Feb 11, 2019 at 02:29:46AM -0500, Pankaj Gupta wrote:
> Hello Dave,
> Are we okay with this?

Sure.

I'm not sure I agree with all the analysis presented, but, well, I haven't looked any deeper because I'm tired of being shouted at and being called argumentative for daring to ask hard questions about this topic....

Cheers,

Dave.
On 11.02.19 23:29, Dave Chinner wrote:
> On Mon, Feb 11, 2019 at 02:29:46AM -0500, Pankaj Gupta wrote:
>> Hello Dave,
>> Are we okay with this?
>
> Sure.
>
> I'm not sure I agree with all the analysis presented, but, well, I haven't looked any deeper because I'm tired of being shouted at and being called argumentative for daring to ask hard questions about this topic....

I think if you have concerns, they should definitely be discussed. Making the people who review code frustrated is not what we want. Not at all.

I suggest that Pankaj properly documents what we found out so far about the security concerns, properly describes the intended use cases, and answers the other questions you had in the cover letter / documentation of the follow-up series.

Thanks Dave!

> Cheers,
>
> Dave.
On Mon, Feb 11, 2019 at 11:58:15PM +0100, David Hildenbrand wrote:
> On 11.02.19 23:29, Dave Chinner wrote:
> > On Mon, Feb 11, 2019 at 02:29:46AM -0500, Pankaj Gupta wrote:
> >> Hello Dave,
> >> Are we okay with this?
> >
> > Sure.
> >
> > I'm not sure I agree with all the analysis presented, but, well, I haven't looked any deeper because I'm tired of being shouted at and being called argumentative for daring to ask hard questions about this topic....
>
> I think if you have concerns, they should definitely be discussed. Making the people who review code frustrated is not what we want. Not at all.
>
> I suggest that Pankaj properly documents what we found out so far about the security concerns, properly describes the intended use cases, and answers the other questions you had in the cover letter / documentation of the follow-up series.
>
> Thanks Dave!

Right. Also, there's an open question that you posed:

    Also, other storage devices have caches like that (the cache size depends on the device) - thinking especially about storage systems - which would, in my opinion, also allow similar leaks. How are such security concerns handled there? Are they different (besides eventual access speed)?

and that needs some looking into, and reporting on.

> > Cheers,
> >
> > Dave.
>
> --
>
> Thanks,
>
> David / dhildenb