Message ID | 155414977872.12780.13728555131525362206.stgit@gimli.home (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | vfio/type1: Limit DMA mappings per container | expand |
On Mon, Apr 01, 2019 at 02:16:52PM -0600, Alex Williamson wrote: [...] > @@ -1081,8 +1088,14 @@ static int vfio_dma_do_map(struct vfio_iommu *iommu, > goto out_unlock; > } > > + if (!atomic_add_unless(&iommu->dma_avail, -1, 0)) { > + ret = -ENOSPC; > + goto out_unlock; > + } > + > dma = kzalloc(sizeof(*dma), GFP_KERNEL); > if (!dma) { > + atomic_inc(&iommu->dma_avail); This should be the only special path to revert the change. Not sure whether this can be avoided by simply using atomic_read() or even READ_ONCE() (I feel like we don't need atomic ops with dma_avail because we've had the mutex but it of course it doesn't hurt...) to replace atomic_add_unless() above to check against zero then we do +1/-1 in vfio_[un]link_dma() only. But AFAICT this patch is correct. Reviewed-by: Peter Xu <peterx@redhat.com> Thanks,
On Tue, 2 Apr 2019 10:41:15 +0800 Peter Xu <peterx@redhat.com> wrote: > On Mon, Apr 01, 2019 at 02:16:52PM -0600, Alex Williamson wrote: > > [...] > > > @@ -1081,8 +1088,14 @@ static int vfio_dma_do_map(struct vfio_iommu *iommu, > > goto out_unlock; > > } > > > > + if (!atomic_add_unless(&iommu->dma_avail, -1, 0)) { > > + ret = -ENOSPC; > > + goto out_unlock; > > + } > > + > > dma = kzalloc(sizeof(*dma), GFP_KERNEL); > > if (!dma) { > > + atomic_inc(&iommu->dma_avail); > > This should be the only special path to revert the change. Not sure > whether this can be avoided by simply using atomic_read() or even > READ_ONCE() (I feel like we don't need atomic ops with dma_avail > because we've had the mutex but it of course it doesn't hurt...) to > replace atomic_add_unless() above to check against zero then we do > +1/-1 in vfio_[un]link_dma() only. But AFAICT this patch is correct. Thanks for the review, you're right, we're only twiddling this atomic while holding the iommu->lock mutex, so it appears unnecessary. Since we're within the mutex, I think we don't even need a READ_ONCE. We can simple test it before alloc and decrement after. Am I missing something that would specifically require READ_ONCE within our mutex critical section? Thanks, Alex
On Mon, Apr 01, 2019 at 10:34:13PM -0600, Alex Williamson wrote: > On Tue, 2 Apr 2019 10:41:15 +0800 > Peter Xu <peterx@redhat.com> wrote: > > > On Mon, Apr 01, 2019 at 02:16:52PM -0600, Alex Williamson wrote: > > > > [...] > > > > > @@ -1081,8 +1088,14 @@ static int vfio_dma_do_map(struct vfio_iommu *iommu, > > > goto out_unlock; > > > } > > > > > > + if (!atomic_add_unless(&iommu->dma_avail, -1, 0)) { > > > + ret = -ENOSPC; > > > + goto out_unlock; > > > + } > > > + > > > dma = kzalloc(sizeof(*dma), GFP_KERNEL); > > > if (!dma) { > > > + atomic_inc(&iommu->dma_avail); > > > > This should be the only special path to revert the change. Not sure > > whether this can be avoided by simply using atomic_read() or even > > READ_ONCE() (I feel like we don't need atomic ops with dma_avail > > because we've had the mutex but it of course it doesn't hurt...) to > > replace atomic_add_unless() above to check against zero then we do > > +1/-1 in vfio_[un]link_dma() only. But AFAICT this patch is correct. > > Thanks for the review, you're right, we're only twiddling this atomic > while holding the iommu->lock mutex, so it appears unnecessary. Since > we're within the mutex, I think we don't even need a READ_ONCE. We can > simple test it before alloc and decrement after. Am I missing something > that would specifically require READ_ONCE within our mutex critical > section? Thanks, I don't know very clear on this and I'd be glad to learn about that. My understanding is that [READ|WRITE]_ONCE() is the same as volatile mem operation and will make sure we don't keep variables in the registers. So if the mutex semantics can support that (say, a "*addr = val" following with a mutex_unlock will make sure "val" will definitely land into memory of "&addr") then I do think it's fine even without it (which corresponds to WRITE_ONCE(&addr, val) in this case). Thanks,
On Tue, 2 Apr 2019 13:18:02 +0800 Peter Xu <peterx@redhat.com> wrote: > On Mon, Apr 01, 2019 at 10:34:13PM -0600, Alex Williamson wrote: > > On Tue, 2 Apr 2019 10:41:15 +0800 > > Peter Xu <peterx@redhat.com> wrote: > > > > > On Mon, Apr 01, 2019 at 02:16:52PM -0600, Alex Williamson wrote: > > > > > > [...] > > > > > > > @@ -1081,8 +1088,14 @@ static int vfio_dma_do_map(struct vfio_iommu *iommu, > > > > goto out_unlock; > > > > } > > > > > > > > + if (!atomic_add_unless(&iommu->dma_avail, -1, 0)) { > > > > + ret = -ENOSPC; > > > > + goto out_unlock; > > > > + } > > > > + > > > > dma = kzalloc(sizeof(*dma), GFP_KERNEL); > > > > if (!dma) { > > > > + atomic_inc(&iommu->dma_avail); > > > > > > This should be the only special path to revert the change. Not sure > > > whether this can be avoided by simply using atomic_read() or even > > > READ_ONCE() (I feel like we don't need atomic ops with dma_avail > > > because we've had the mutex but it of course it doesn't hurt...) to > > > replace atomic_add_unless() above to check against zero then we do > > > +1/-1 in vfio_[un]link_dma() only. But AFAICT this patch is correct. > > > > Thanks for the review, you're right, we're only twiddling this atomic > > while holding the iommu->lock mutex, so it appears unnecessary. Since > > we're within the mutex, I think we don't even need a READ_ONCE. We can > > simple test it before alloc and decrement after. Am I missing something > > that would specifically require READ_ONCE within our mutex critical > > section? Thanks, > > I don't know very clear on this and I'd be glad to learn about that. > My understanding is that [READ|WRITE]_ONCE() is the same as volatile > mem operation and will make sure we don't keep variables in the > registers. So if the mutex semantics can support that (say, a "*addr > = val" following with a mutex_unlock will make sure "val" will > definitely land into memory of "&addr") then I do think it's fine even > without it (which corresponds to WRITE_ONCE(&addr, val) in this case). The READ/WRITE_ONCE macros add memory barriers, but we have the mutex for protecting concurrent access to the data. I don't see that there's anything special about a counter on the iommu object that needs special attention vs any other elements that might get modified in these sections. Thanks, Alex
On Mon, 01 Apr 2019 14:16:52 -0600 Alex Williamson <alex.williamson@redhat.com> wrote: > Memory backed DMA mappings are accounted against a user's locked > memory limit, including multiple mappings of the same memory. This > accounting bounds the number of such mappings that a user can create. > However, DMA mappings that are not backed by memory, such as DMA > mappings of device MMIO via mmaps, do not make use of page pinning > and therefore do not count against the user's locked memory limit. > These mappings still consume memory, but the memory is not well > associated to the process for the purpose of oom killing a task. > > To add bounding on this use case, we introduce a limit to the total > number of concurrent DMA mappings that a user is allowed to create. > This limit is exposed as a tunable module option where the default > value of 64K is expected to be well in excess of any reasonable use > case (a large virtual machine configuration would typically only make > use of tens of concurrent mappings). > > This fixes CVE-2019-3882. > > Signed-off-by: Alex Williamson <alex.williamson@redhat.com> > --- > drivers/vfio/vfio_iommu_type1.c | 14 ++++++++++++++ > 1 file changed, 14 insertions(+) Reviewed-by: Cornelia Huck <cohuck@redhat.com>
diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c index 73652e21efec..7fc8fd7d4dc7 100644 --- a/drivers/vfio/vfio_iommu_type1.c +++ b/drivers/vfio/vfio_iommu_type1.c @@ -58,12 +58,18 @@ module_param_named(disable_hugepages, MODULE_PARM_DESC(disable_hugepages, "Disable VFIO IOMMU support for IOMMU hugepages."); +static int dma_entry_limit __read_mostly = U16_MAX; +module_param_named(dma_entry_limit, dma_entry_limit, int, 0644); +MODULE_PARM_DESC(dma_entry_limit, + "Maximum number of user DMA mappings per container (65535)."); + struct vfio_iommu { struct list_head domain_list; struct vfio_domain *external_domain; /* domain for external user */ struct mutex lock; struct rb_root dma_list; struct blocking_notifier_head notifier; + atomic_t dma_avail; bool v2; bool nesting; }; @@ -836,6 +842,7 @@ static void vfio_remove_dma(struct vfio_iommu *iommu, struct vfio_dma *dma) vfio_unlink_dma(iommu, dma); put_task_struct(dma->task); kfree(dma); + atomic_inc(&iommu->dma_avail); } static unsigned long vfio_pgsize_bitmap(struct vfio_iommu *iommu) @@ -1081,8 +1088,14 @@ static int vfio_dma_do_map(struct vfio_iommu *iommu, goto out_unlock; } + if (!atomic_add_unless(&iommu->dma_avail, -1, 0)) { + ret = -ENOSPC; + goto out_unlock; + } + dma = kzalloc(sizeof(*dma), GFP_KERNEL); if (!dma) { + atomic_inc(&iommu->dma_avail); ret = -ENOMEM; goto out_unlock; } @@ -1583,6 +1596,7 @@ static void *vfio_iommu_type1_open(unsigned long arg) INIT_LIST_HEAD(&iommu->domain_list); iommu->dma_list = RB_ROOT; + atomic_set(&iommu->dma_avail, dma_entry_limit); mutex_init(&iommu->lock); BLOCKING_INIT_NOTIFIER_HEAD(&iommu->notifier);
Memory backed DMA mappings are accounted against a user's locked memory limit, including multiple mappings of the same memory. This accounting bounds the number of such mappings that a user can create. However, DMA mappings that are not backed by memory, such as DMA mappings of device MMIO via mmaps, do not make use of page pinning and therefore do not count against the user's locked memory limit. These mappings still consume memory, but the memory is not well associated to the process for the purpose of oom killing a task. To add bounding on this use case, we introduce a limit to the total number of concurrent DMA mappings that a user is allowed to create. This limit is exposed as a tunable module option where the default value of 64K is expected to be well in excess of any reasonable use case (a large virtual machine configuration would typically only make use of tens of concurrent mappings). This fixes CVE-2019-3882. Signed-off-by: Alex Williamson <alex.williamson@redhat.com> --- drivers/vfio/vfio_iommu_type1.c | 14 ++++++++++++++ 1 file changed, 14 insertions(+)