Message ID | 20210421032117.5177-1-jasowang@redhat.com (mailing list archive) |
---|---|
Series | Untrusted device support for virtio |
On Wed, Apr 21, 2021 at 11:21:10AM +0800, Jason Wang wrote:
> The behavior for non DMA API is kept for minimizing the performance
> impact.

NAK. Everyone should be using the DMA API in a modern world. So
treating the DMA API path worse than the broken legacy path does not
make any sense whatsoever.
On 2021/4/22 2:31 PM, Christoph Hellwig wrote:
> On Wed, Apr 21, 2021 at 11:21:10AM +0800, Jason Wang wrote:
>> The behavior for non DMA API is kept for minimizing the performance
>> impact.
> NAK. Everyone should be using the DMA API in a modern world. So
> treating the DMA API path worse than the broken legacy path does not
> make any sense whatsoever.

I think the goal is not to treat the DMA API path worse than legacy.
The issue is that the management layer should guarantee that
ACCESS_PLATFORM is set, so that the DMA API is guaranteed to be used by
the driver. So I'm not sure how much value we can gain from trying to
'fix' the legacy path. But I can change the behavior of the legacy path
to match the DMA API path.

Thanks
On Thu, Apr 22, 2021 at 04:19:16PM +0800, Jason Wang wrote:
> On 2021/4/22 2:31 PM, Christoph Hellwig wrote:
> > On Wed, Apr 21, 2021 at 11:21:10AM +0800, Jason Wang wrote:
> > > The behavior for non DMA API is kept for minimizing the performance
> > > impact.
> > NAK. Everyone should be using the DMA API in a modern world. So
> > treating the DMA API path worse than the broken legacy path does not
> > make any sense whatsoever.
>
> I think the goal is not to treat the DMA API path worse than legacy.
> The issue is that the management layer should guarantee that
> ACCESS_PLATFORM is set, so that the DMA API is guaranteed to be used by
> the driver. So I'm not sure how much value we can gain from trying to
> 'fix' the legacy path. But I can change the behavior of the legacy path
> to match the DMA API path.
>
> Thanks

I think before we maintain different paths with/without ACCESS_PLATFORM
it's worth checking whether it's even a net gain. Avoiding sharing
by storing data in private memory can actually turn out to be
a net gain even without the DMA API.

It is worth checking what the performance effect of this patch is.
On 2021/4/24 4:14 AM, Michael S. Tsirkin wrote:
> I think before we maintain different paths with/without ACCESS_PLATFORM
> it's worth checking whether it's even a net gain. Avoiding sharing
> by storing data in private memory can actually turn out to be
> a net gain even without DMA API.

I agree.

> It is worth checking what is the performance effect of this patch.

So I've posted v2, where private memory is used in the non DMA API path
as well (as is already done for packed). Pktgen and netperf don't show
any obvious difference.

Thanks
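The "private memory" idea discussed above can be illustrated with a small user-space sketch. It is not the code from the series: the struct and function names (`shared_desc`, `private_extra`, `ring_add`, `ring_unmap_len`) are hypothetical, and the ring is only a simplified model. The driver writes each descriptor to the shared ring but also keeps a private copy, and reads back only the private copy when it later needs the length for an unmap.

```c
#include <stdint.h>

#define RING_SIZE 4

/* Descriptor as seen by both driver and device (device-writable). */
struct shared_desc {
    uint64_t addr;
    uint32_t len;
    uint16_t flags;
    uint16_t next;
};

/* Hypothetical driver-private shadow entry, never exposed to the device. */
struct private_extra {
    uint64_t addr;
    uint32_t len;
    uint16_t flags;
};

struct ring {
    struct shared_desc desc[RING_SIZE];    /* shared memory */
    struct private_extra extra[RING_SIZE]; /* private memory */
};

/* Publish a buffer: fill the shared descriptor and remember a private copy. */
static void ring_add(struct ring *r, unsigned int i,
                     uint64_t addr, uint32_t len, uint16_t flags)
{
    r->desc[i].addr = addr;
    r->desc[i].len = len;
    r->desc[i].flags = flags;

    r->extra[i].addr = addr;
    r->extra[i].len = len;
    r->extra[i].flags = flags;
}

/* Length to use for unmap: taken from the private copy, not the shared ring. */
static uint32_t ring_unmap_len(const struct ring *r, unsigned int i)
{
    return r->extra[i].len;
}
```

In this model, a device that overwrites `desc[i].len` after the buffer was added has no influence on the length the driver uses at unmap time.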
On Wed, Apr 21, 2021 at 11:21:10AM +0800, Jason Wang wrote:
> Hi All:
>
> Sometimes, the driver doesn't trust the device. This usually
> happens for the encrypted VM or VDUSE[1]. In both cases, technology
> like swiotlb is used to prevent the poking/mangling of memory from the
> device. But this is not sufficient since current virtio driver may
> trust what is stored in the descriptor table (coherent mapping) for
> performing the DMA operations like unmap and bounce so the device may
> choose to utilize the behaviour of swiotlb to perform attacks[2].

We fixed it in the SWIOTLB. That is, it saves the expected length
of the DMA operation. See

commit daf9514fd5eb098d7d6f3a1247cb8cc48fc94155
Author: Martin Radev <martin.b.radev@gmail.com>
Date:   Tue Jan 12 16:07:29 2021 +0100

    swiotlb: Validate bounce size in the sync/unmap path

    The size of the buffer being bounced is not checked if it happens
    to be larger than the size of the mapped buffer. Because the size
    can be controlled by a device, as it's the case with virtio devices,
    this can lead to memory corruption.

> For double insurance, to protect from a malicious device, when DMA API
> is used for the device, this series store and use the descriptor
> metadata in an auxiliary structure which can not be accessed via
> swiotlb instead of the ones in the descriptor table. Actually, we've

Sorry for being dense here, but how would SWIOTLB be utilized for
this attack?

> almost achieved that through packed virtqueue and we just need to fix
> a corner case of handling mapping errors. For split virtqueue we just
> follow what's done in the packed.
>
> Note that we don't duplicate descriptor metadata for indirect
> descriptors since it uses stream mapping which is read only so it's
> safe if the metadata of non-indirect descriptors are correct.
>
> The behavior for non DMA API is kept for minimizing the performance
> impact.
>
> Slightly tested with packed on/off, iommu on/off, swiotlb force/off in
> the guest.
>
> Please review.
>
> [1] https://lore.kernel.org/netdev/fab615ce-5e13-a3b3-3715-a4203b4ab010@redhat.com/T/
> [2] https://yhbt.net/lore/all/c3629a27-3590-1d9f-211b-c0b7be152b32@redhat.com/T/#mc6b6e2343cbeffca68ca7a97e0f473aaa871c95b
>
> Jason Wang (7):
>   virtio-ring: maintain next in extra state for packed virtqueue
>   virtio_ring: rename vring_desc_extra_packed
>   virtio-ring: factor out desc_extra allocation
>   virtio_ring: secure handling of mapping errors
>   virtio_ring: introduce virtqueue_desc_add_split()
>   virtio: use err label in __vring_new_virtqueue()
>   virtio-ring: store DMA metadata in desc_extra for split virtqueue
>
>  drivers/virtio/virtio_ring.c | 189 ++++++++++++++++++++++++++---------
>  1 file changed, 141 insertions(+), 48 deletions(-)
>
> --
> 2.25.1
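The size check described in the commit message quoted above can be modelled roughly as follows. This is a user-space sketch only, not the swiotlb implementation: `mapped_size`, `bounce_map` and `bounce_unmap` are hypothetical names, and the sketch simply rejects an oversized request instead of reproducing whatever the kernel does in that case.

```c
#include <stdbool.h>
#include <stddef.h>

#define NSLOTS 8

/* Hypothetical bookkeeping: size recorded when a bounce slot is mapped. */
static size_t mapped_size[NSLOTS];

/* Map a slot and remember how many bytes were actually mapped. */
static bool bounce_map(unsigned int slot, size_t size)
{
    if (slot >= NSLOTS || size == 0)
        return false;
    mapped_size[slot] = size;
    return true;
}

/*
 * Accept an unmap only if the requested size fits the recorded mapping.
 * The requested size may come from device-controlled data, so it is
 * compared against the value stored at map time.
 */
static bool bounce_unmap(unsigned int slot, size_t size)
{
    if (slot >= NSLOTS || mapped_size[slot] == 0)
        return false;
    if (size > mapped_size[slot])
        return false; /* larger than what was mapped: refuse to bounce */
    mapped_size[slot] = 0;
    return true;
}
```

The point of the sketch is only that the reference size lives in memory the device cannot write, which is the same principle the series applies one layer up in the virtio driver.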
On 2021/4/29 5:06 AM, Konrad Rzeszutek Wilk wrote:
> On Wed, Apr 21, 2021 at 11:21:10AM +0800, Jason Wang wrote:
>> Hi All:
>>
>> Sometimes, the driver doesn't trust the device. This usually
>> happens for the encrypted VM or VDUSE[1]. In both cases, technology
>> like swiotlb is used to prevent the poking/mangling of memory from the
>> device. But this is not sufficient since current virtio driver may
>> trust what is stored in the descriptor table (coherent mapping) for
>> performing the DMA operations like unmap and bounce so the device may
>> choose to utilize the behaviour of swiotlb to perform attacks[2].
> We fixed it in the SWIOTLB. That is, it saves the expected length
> of the DMA operation. See
>
> commit daf9514fd5eb098d7d6f3a1247cb8cc48fc94155
> Author: Martin Radev <martin.b.radev@gmail.com>
> Date:   Tue Jan 12 16:07:29 2021 +0100
>
>     swiotlb: Validate bounce size in the sync/unmap path
>
>     The size of the buffer being bounced is not checked if it happens
>     to be larger than the size of the mapped buffer. Because the size
>     can be controlled by a device, as it's the case with virtio devices,
>     this can lead to memory corruption.

Good to know this, but this series tries to protect at a different
level. And I believe such protection needs to be done at both levels.

>> For double insurance, to protect from a malicious device, when DMA API
>> is used for the device, this series store and use the descriptor
>> metadata in an auxiliary structure which can not be accessed via
>> swiotlb instead of the ones in the descriptor table. Actually, we've
> Sorry for being dense here, but how would SWIOTLB be utilized for
> this attack?

So we still have behaviors that are triggered by a device that is not
trusted. Such behavior is what the series tries to avoid. We've learnt
a lot of lessons about eliminating potential attacks of this kind. And
it would be too late to fix if we found another issue in SWIOTLB.

Proving that "the unexpected device-triggered behavior is safe" is much
harder (or even impossible) than "eliminating the unexpected
device-triggered behavior totally".

E.g. I wonder whether something like this can happen: consider that the
DMA direction of the unmap is under the control of the device. The
device can cheat the SWIOTLB by changing the flag to modify a device
read-only buffer.

If yes, is it really safe?

The above patch only logs the bounce size, but it doesn't log the flag.
Even if it logged the flag, SWIOTLB still wouldn't know how each buffer
is used and when it is the appropriate (safe) time to unmap the buffer;
only the driver that is using the SWIOTLB knows that.

So I think we need protection at both layers instead of solely
depending on the SWIOTLB.

Thanks
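The direction scenario raised in this reply can be made concrete with a short sketch. It is a simplified model under stated assumptions: `DESC_F_WRITE` mirrors the value of the virtio descriptor write flag (2), while `enum dma_dir`, `struct slot` and the two helper functions are hypothetical names introduced only for illustration. The unmap direction is derived from a descriptor flag, so the result differs depending on whether the flag is read from the shared ring or from a driver-private copy.

```c
#include <stdint.h>

#define DESC_F_WRITE 2 /* same value as the virtio descriptor WRITE flag */

enum dma_dir { DIR_TO_DEVICE, DIR_FROM_DEVICE };

/* Derive the DMA direction from a descriptor flag word. */
static enum dma_dir dir_from_flags(uint16_t flags)
{
    return (flags & DESC_F_WRITE) ? DIR_FROM_DEVICE : DIR_TO_DEVICE;
}

struct slot {
    uint16_t shared_flags;  /* lives in the ring, device-writable */
    uint16_t private_flags; /* driver-private copy taken at add time */
};

/* Direction as computed when the driver trusts the shared ring. */
static enum dma_dir unmap_dir_untrusted(const struct slot *s)
{
    return dir_from_flags(s->shared_flags);
}

/* Direction as computed from the private copy. */
static enum dma_dir unmap_dir_trusted(const struct slot *s)
{
    return dir_from_flags(s->private_flags);
}
```

If the device sets the write flag on a buffer that was added as device-read-only, the first helper reports `DIR_FROM_DEVICE`; with a bounce buffer in between, an unmap in that direction would presumably copy device-controlled data back into the driver's buffer. The second helper is unaffected by the tampering.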
On 4/29/21 12:16 AM, Jason Wang wrote:
> On 2021/4/29 5:06 AM, Konrad Rzeszutek Wilk wrote:
>> We fixed it in the SWIOTLB. That is, it saves the expected length
>> of the DMA operation. See
>>
>> commit daf9514fd5eb098d7d6f3a1247cb8cc48fc94155
>> Author: Martin Radev <martin.b.radev@gmail.com>
>> Date:   Tue Jan 12 16:07:29 2021 +0100
>>
>>     swiotlb: Validate bounce size in the sync/unmap path
>
> Good to know this, but this series tries to protect at a different
> level. And I believe such protection needs to be done at both levels.

My apologies for taking so long to respond, somehow this disappeared in
one of the folders.

>> Sorry for being dense here, but how would SWIOTLB be utilized for
>> this attack?
>
> So we still have behaviors that are triggered by a device that is not
> trusted. Such behavior is what the series tries to avoid. We've learnt
> a lot of lessons about eliminating potential attacks of this kind. And
> it would be too late to fix if we found another issue in SWIOTLB.
>
> Proving that "the unexpected device-triggered behavior is safe" is much
> harder (or even impossible) than "eliminating the unexpected
> device-triggered behavior totally".
>
> E.g. I wonder whether something like this can happen: consider that the
> DMA direction of the unmap is under the control of the device. The
> device can cheat the SWIOTLB by changing the flag to modify a device
> read-only buffer.

<blinks> Why would you want to expose that to the device? And wouldn't
that be specific to Linux devices - because surely Windows DMA APIs are
different and this 'flag' seems very Linux-kernel specific?

> If yes, is it really safe?

Well no? But neither is rm -Rf / but we still allow folks to do that.

> The above patch only logs the bounce size, but it doesn't log the flag.

It logs and panics the system.

> Even if it logged the flag, SWIOTLB still wouldn't know how each buffer
> is used and when it is the appropriate (safe) time to unmap the buffer;
> only the driver that is using the SWIOTLB knows that.

Fair enough. Is the intent to do the same thing for all the other
drivers that could be running in an encrypted guest and would require
SWIOTLB?

Like legacy devices that KVM can expose (floppy driver?, SVGA driver)?

> So I think we need protection at both layers instead of solely
> depending on the SWIOTLB.

Please make sure that this explanation is part of the cover letter
or in the commit/Kconfig.

Also, are you aware of the patchset that Andi has been working on that
tries to make the DMA code have extra bells and whistles for this
purpose?

Thank you.
On 2021/6/4 11:17 PM, Konrad Rzeszutek Wilk wrote:
> My apologies for taking so long to respond, somehow this disappeared
> in one of the folders.

No problem.

>> E.g. I wonder whether something like this can happen: consider that the
>> DMA direction of the unmap is under the control of the device. The
>> device can cheat the SWIOTLB by changing the flag to modify a device
>> read-only buffer.
>
> <blinks> Why would you want to expose that to the device? And wouldn't
> that be specific to Linux devices - because surely Windows DMA APIs
> are different and this 'flag' seems very Linux-kernel specific?

Just to make sure we are on the same page: the "flag" I mean is
actually the virtio descriptor flag, which could be modified by the
device. And the driver deduces the DMA API flag from the descriptor
flag.

>> The above patch only logs the bounce size, but it doesn't log the flag.
>
> It logs and panics the system.

Good to know that.

>> Even if it logged the flag, SWIOTLB still wouldn't know how each buffer
>> is used and when it is the appropriate (safe) time to unmap the buffer;
>> only the driver that is using the SWIOTLB knows that.
>
> Fair enough. Is the intent to do the same thing for all the other
> drivers that could be running in an encrypted guest and would require
> SWIOTLB?
>
> Like legacy devices that KVM can expose (floppy driver?, SVGA driver)?

My understanding is that we shouldn't enable the legacy devices at all
in this case.

Note that virtio has been extended to various types of devices (we can
boot qemu without PCI and legacy devices, e.g. the micro VM):

- virtio input
- virtio gpu
- virtio sound
...

I'm not sure whether we need floppy, but it's not hard to have a
virtio-floppy if necessary.

So it would be sufficient for us to audit/harden the virtio drivers.

>> So I think we need protection at both layers instead of solely
>> depending on the SWIOTLB.
>
> Please make sure that this explanation is part of the cover letter
> or in the commit/Kconfig.

I will do that if the series needs a respin.

> Also, are you aware of the patchset that Andi has been working on that
> tries to make the DMA code have extra bells and whistles for this
> purpose?

Yes, but as described above, the two do not duplicate each other.
Protection at both levels would be optimal.

Another note is that this series is not only about DMA/swiotlb; it
eliminates all the possible attacks via the descriptor ring. (One
example is the attack via descriptor.next.)

Thanks

> Thank you.
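The descriptor.next example mentioned in the last reply can be sketched in the same spirit. This is a user-space model, not the code from the series: `struct queue`, `private_next` and `chain_last` are hypothetical names. The driver follows a chain of descriptors using a private copy of the links, so a `next` value rewritten by the device in the shared ring cannot redirect the walk or push it out of bounds.

```c
#include <stdint.h>

#define QSIZE 4

struct queue {
    uint16_t shared_next[QSIZE];  /* next field in the shared ring */
    uint16_t private_next[QSIZE]; /* driver-private copy of the links */
};

/*
 * Walk a chain of `count` descriptors starting at `head`, following only
 * the private links. Returns the index of the last descriptor in the
 * chain, or -1 if an index falls outside the queue.
 */
static int chain_last(const struct queue *q, uint16_t head, unsigned int count)
{
    uint16_t i = head;

    if (count == 0)
        return -1;
    while (--count) {
        if (i >= QSIZE)
            return -1;
        i = q->private_next[i];
    }
    return i < QSIZE ? i : -1;
}
```

The bounds check is kept even for the private links so that a driver bug cannot turn into an out-of-range access; the shared `next` values are never read on this path.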