vfio/type1: Limit DMA mappings per container

Message ID	155414977872.12780.13728555131525362206.stgit@gimli.home (mailing list archive)
State	New, archived
Headers	show Return-Path: <kvm-owner@kernel.org> Subject: [PATCH] vfio/type1: Limit DMA mappings per container From: Alex Williamson <alex.williamson@redhat.com> To: alex.williamson@redhat.com Cc: kvm@vger.kernel.org, linux-kernel@vger.kernel.org, eric.auger@redhat.com, cohuck@redhat.com Date: Mon, 01 Apr 2019 14:16:52 -0600 Message-ID: <155414977872.12780.13728555131525362206.stgit@gimli.home> User-Agent: StGit/0.19-dirty MIME-Version: 1.0 Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: 7bit Sender: kvm-owner@vger.kernel.org Precedence: bulk
Series	vfio/type1: Limit DMA mappings per container \| expand vfio/type1: Limit DMA mappings per container

Alex Williamson April 1, 2019, 8:16 p.m. UTC

Memory backed DMA mappings are accounted against a user's locked
memory limit, including multiple mappings of the same memory.  This
accounting bounds the number of such mappings that a user can create.
However, DMA mappings that are not backed by memory, such as DMA
mappings of device MMIO via mmaps, do not make use of page pinning
and therefore do not count against the user's locked memory limit.
These mappings still consume memory, but the memory is not well
associated to the process for the purpose of oom killing a task.

To add bounding on this use case, we introduce a limit to the total
number of concurrent DMA mappings that a user is allowed to create.
This limit is exposed as a tunable module option where the default
value of 64K is expected to be well in excess of any reasonable use
case (a large virtual machine configuration would typically only make
use of tens of concurrent mappings).

This fixes CVE-2019-3882.

Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
---
 drivers/vfio/vfio_iommu_type1.c |   14 ++++++++++++++
 1 file changed, 14 insertions(+)

Peter Xu April 2, 2019, 2:41 a.m. UTC | #1

On Mon, Apr 01, 2019 at 02:16:52PM -0600, Alex Williamson wrote:

[...]

> @@ -1081,8 +1088,14 @@ static int vfio_dma_do_map(struct vfio_iommu *iommu,
>  		goto out_unlock;
>  	}
>  
> +	if (!atomic_add_unless(&iommu->dma_avail, -1, 0)) {
> +		ret = -ENOSPC;
> +		goto out_unlock;
> +	}
> +
>  	dma = kzalloc(sizeof(*dma), GFP_KERNEL);
>  	if (!dma) {
> +		atomic_inc(&iommu->dma_avail);

This should be the only special path to revert the change.  Not sure
whether this can be avoided by simply using atomic_read() or even
READ_ONCE() (I feel like we don't need atomic ops with dma_avail
because we've had the mutex but it of course it doesn't hurt...) to
replace atomic_add_unless() above to check against zero then we do
+1/-1 in vfio_[un]link_dma() only.  But AFAICT this patch is correct.

Reviewed-by: Peter Xu <peterx@redhat.com>

Thanks,

Alex Williamson April 2, 2019, 4:34 a.m. UTC | #2

On Tue, 2 Apr 2019 10:41:15 +0800
Peter Xu <peterx@redhat.com> wrote:

> On Mon, Apr 01, 2019 at 02:16:52PM -0600, Alex Williamson wrote:
> 
> [...]
> 
> > @@ -1081,8 +1088,14 @@ static int vfio_dma_do_map(struct vfio_iommu *iommu,
> >  		goto out_unlock;
> >  	}
> >  
> > +	if (!atomic_add_unless(&iommu->dma_avail, -1, 0)) {
> > +		ret = -ENOSPC;
> > +		goto out_unlock;
> > +	}
> > +
> >  	dma = kzalloc(sizeof(*dma), GFP_KERNEL);
> >  	if (!dma) {
> > +		atomic_inc(&iommu->dma_avail);  
> 
> This should be the only special path to revert the change.  Not sure
> whether this can be avoided by simply using atomic_read() or even
> READ_ONCE() (I feel like we don't need atomic ops with dma_avail
> because we've had the mutex but it of course it doesn't hurt...) to
> replace atomic_add_unless() above to check against zero then we do
> +1/-1 in vfio_[un]link_dma() only.  But AFAICT this patch is correct.

Thanks for the review, you're right, we're only twiddling this atomic
while holding the iommu->lock mutex, so it appears unnecessary.  Since
we're within the mutex, I think we don't even need a READ_ONCE.  We can
simple test it before alloc and decrement after.  Am I missing something
that would specifically require READ_ONCE within our mutex critical
section?  Thanks,

Alex

Peter Xu April 2, 2019, 5:18 a.m. UTC | #3

On Mon, Apr 01, 2019 at 10:34:13PM -0600, Alex Williamson wrote:
> On Tue, 2 Apr 2019 10:41:15 +0800
> Peter Xu <peterx@redhat.com> wrote:
> 
> > On Mon, Apr 01, 2019 at 02:16:52PM -0600, Alex Williamson wrote:
> > 
> > [...]
> > 
> > > @@ -1081,8 +1088,14 @@ static int vfio_dma_do_map(struct vfio_iommu *iommu,
> > >  		goto out_unlock;
> > >  	}
> > >  
> > > +	if (!atomic_add_unless(&iommu->dma_avail, -1, 0)) {
> > > +		ret = -ENOSPC;
> > > +		goto out_unlock;
> > > +	}
> > > +
> > >  	dma = kzalloc(sizeof(*dma), GFP_KERNEL);
> > >  	if (!dma) {
> > > +		atomic_inc(&iommu->dma_avail);  
> > 
> > This should be the only special path to revert the change.  Not sure
> > whether this can be avoided by simply using atomic_read() or even
> > READ_ONCE() (I feel like we don't need atomic ops with dma_avail
> > because we've had the mutex but it of course it doesn't hurt...) to
> > replace atomic_add_unless() above to check against zero then we do
> > +1/-1 in vfio_[un]link_dma() only.  But AFAICT this patch is correct.
> 
> Thanks for the review, you're right, we're only twiddling this atomic
> while holding the iommu->lock mutex, so it appears unnecessary.  Since
> we're within the mutex, I think we don't even need a READ_ONCE.  We can
> simple test it before alloc and decrement after.  Am I missing something
> that would specifically require READ_ONCE within our mutex critical
> section?  Thanks,

I don't know very clear on this and I'd be glad to learn about that.
My understanding is that [READ|WRITE]_ONCE() is the same as volatile
mem operation and will make sure we don't keep variables in the
registers.  So if the mutex semantics can support that (say, a "*addr
= val" following with a mutex_unlock will make sure "val" will
definitely land into memory of "&addr") then I do think it's fine even
without it (which corresponds to WRITE_ONCE(&addr, val) in this case).

Thanks,

Alex Williamson April 2, 2019, 2:40 p.m. UTC | #4

On Tue, 2 Apr 2019 13:18:02 +0800
Peter Xu <peterx@redhat.com> wrote:

> On Mon, Apr 01, 2019 at 10:34:13PM -0600, Alex Williamson wrote:
> > On Tue, 2 Apr 2019 10:41:15 +0800
> > Peter Xu <peterx@redhat.com> wrote:
> >   
> > > On Mon, Apr 01, 2019 at 02:16:52PM -0600, Alex Williamson wrote:
> > > 
> > > [...]
> > >   
> > > > @@ -1081,8 +1088,14 @@ static int vfio_dma_do_map(struct vfio_iommu *iommu,
> > > >  		goto out_unlock;
> > > >  	}
> > > >  
> > > > +	if (!atomic_add_unless(&iommu->dma_avail, -1, 0)) {
> > > > +		ret = -ENOSPC;
> > > > +		goto out_unlock;
> > > > +	}
> > > > +
> > > >  	dma = kzalloc(sizeof(*dma), GFP_KERNEL);
> > > >  	if (!dma) {
> > > > +		atomic_inc(&iommu->dma_avail);    
> > > 
> > > This should be the only special path to revert the change.  Not sure
> > > whether this can be avoided by simply using atomic_read() or even
> > > READ_ONCE() (I feel like we don't need atomic ops with dma_avail
> > > because we've had the mutex but it of course it doesn't hurt...) to
> > > replace atomic_add_unless() above to check against zero then we do
> > > +1/-1 in vfio_[un]link_dma() only.  But AFAICT this patch is correct.  
> > 
> > Thanks for the review, you're right, we're only twiddling this atomic
> > while holding the iommu->lock mutex, so it appears unnecessary.  Since
> > we're within the mutex, I think we don't even need a READ_ONCE.  We can
> > simple test it before alloc and decrement after.  Am I missing something
> > that would specifically require READ_ONCE within our mutex critical
> > section?  Thanks,  
> 
> I don't know very clear on this and I'd be glad to learn about that.
> My understanding is that [READ|WRITE]_ONCE() is the same as volatile
> mem operation and will make sure we don't keep variables in the
> registers.  So if the mutex semantics can support that (say, a "*addr
> = val" following with a mutex_unlock will make sure "val" will
> definitely land into memory of "&addr") then I do think it's fine even
> without it (which corresponds to WRITE_ONCE(&addr, val) in this case).

The READ/WRITE_ONCE macros add memory barriers, but we have the mutex
for protecting concurrent access to the data.  I don't see that there's
anything special about a counter on the iommu object that needs special
attention vs any other elements that might get modified in these
sections.  Thanks,

Alex

Cornelia Huck April 2, 2019, 2:58 p.m. UTC | #5

On Mon, 01 Apr 2019 14:16:52 -0600
Alex Williamson <alex.williamson@redhat.com> wrote:

> Memory backed DMA mappings are accounted against a user's locked
> memory limit, including multiple mappings of the same memory.  This
> accounting bounds the number of such mappings that a user can create.
> However, DMA mappings that are not backed by memory, such as DMA
> mappings of device MMIO via mmaps, do not make use of page pinning
> and therefore do not count against the user's locked memory limit.
> These mappings still consume memory, but the memory is not well
> associated to the process for the purpose of oom killing a task.
> 
> To add bounding on this use case, we introduce a limit to the total
> number of concurrent DMA mappings that a user is allowed to create.
> This limit is exposed as a tunable module option where the default
> value of 64K is expected to be well in excess of any reasonable use
> case (a large virtual machine configuration would typically only make
> use of tens of concurrent mappings).
> 
> This fixes CVE-2019-3882.
> 
> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
> ---
>  drivers/vfio/vfio_iommu_type1.c |   14 ++++++++++++++
>  1 file changed, 14 insertions(+)

Reviewed-by: Cornelia Huck <cohuck@redhat.com>

vfio/type1: Limit DMA mappings per container

Commit Message

Comments

Patch