diff mbox series

media: vb2-dc: skip CPU sync in map/unmap dma_buf

Message ID 20190228071943.13072-1-l.stach@pengutronix.de (mailing list archive)
State New, archived
Headers show
Series media: vb2-dc: skip CPU sync in map/unmap dma_buf | expand

Commit Message

Lucas Stach Feb. 28, 2019, 7:19 a.m. UTC
This is rougly equivalent to ca0e68e21aae (drm/prime: skip CPU sync
in map/unmap dma_buf). The contig memory allocated is already device
coherent memory, so there is no point in doing a CPU sync when
mapping it to another device. Also most importers currently cache
the mapping so the CPU sync would only happen on the first import.
With that in mind we are better off with not pretending to do a
cache synchronization at all.

This gets rid of a lot of CPU overhead in uses where those dma-bufs
are regularily imported and detached again, like Weston is currently
doing in the DRM compositor.

Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
---
 drivers/media/common/videobuf2/videobuf2-dma-contig.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

Comments

Hans Verkuil March 12, 2019, 7:57 a.m. UTC | #1
Pawel and/or Marek, can you take a look at this?

It looks sane to me, but I'd like to have a second opinion as well before
merging this.

On 2/28/19 8:19 AM, Lucas Stach wrote:
> This is rougly equivalent to ca0e68e21aae (drm/prime: skip CPU sync
> in map/unmap dma_buf). The contig memory allocated is already device
> coherent memory, so there is no point in doing a CPU sync when
> mapping it to another device. Also most importers currently cache
> the mapping so the CPU sync would only happen on the first import.
> With that in mind we are better off with not pretending to do a
> cache synchronization at all.
> 
> This gets rid of a lot of CPU overhead in uses where those dma-bufs
> are regularily imported and detached again, like Weston is currently
> doing in the DRM compositor.

Lucas, one question: shouldn't the same be done for dma-sg and vmalloc?

Regards,

	Hans

> 
> Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
> ---
>  drivers/media/common/videobuf2/videobuf2-dma-contig.c | 8 ++++----
>  1 file changed, 4 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/media/common/videobuf2/videobuf2-dma-contig.c b/drivers/media/common/videobuf2/videobuf2-dma-contig.c
> index aff0ab7bf83d..d38f097c14ae 100644
> --- a/drivers/media/common/videobuf2/videobuf2-dma-contig.c
> +++ b/drivers/media/common/videobuf2/videobuf2-dma-contig.c
> @@ -273,8 +273,8 @@ static void vb2_dc_dmabuf_ops_detach(struct dma_buf *dbuf,
>  
>  	/* release the scatterlist cache */
>  	if (attach->dma_dir != DMA_NONE)
> -		dma_unmap_sg(db_attach->dev, sgt->sgl, sgt->orig_nents,
> -			attach->dma_dir);
> +		dma_unmap_sg_attrs(db_attach->dev, sgt->sgl, sgt->orig_nents,
> +			attach->dma_dir, DMA_ATTR_SKIP_CPU_SYNC);
>  	sg_free_table(sgt);
>  	kfree(attach);
>  	db_attach->priv = NULL;
> @@ -305,8 +305,8 @@ static struct sg_table *vb2_dc_dmabuf_ops_map(
>  	}
>  
>  	/* mapping to the client with new direction */
> -	sgt->nents = dma_map_sg(db_attach->dev, sgt->sgl, sgt->orig_nents,
> -				dma_dir);
> +	sgt->nents = dma_map_sg_attrs(db_attach->dev, sgt->sgl, sgt->orig_nents,
> +				dma_dir, DMA_ATTR_SKIP_CPU_SYNC);
>  	if (!sgt->nents) {
>  		pr_err("failed to map scatterlist\n");
>  		mutex_unlock(lock);
>
Lucas Stach March 12, 2019, 10:03 a.m. UTC | #2
Hi Hans,

Am Dienstag, den 12.03.2019, 08:57 +0100 schrieb Hans Verkuil:
> Pawel and/or Marek, can you take a look at this?
> 
> It looks sane to me, but I'd like to have a second opinion as well before
> merging this.
> 
> On 2/28/19 8:19 AM, Lucas Stach wrote:
> > This is rougly equivalent to ca0e68e21aae (drm/prime: skip CPU sync
> > in map/unmap dma_buf). The contig memory allocated is already device
> > coherent memory, so there is no point in doing a CPU sync when
> > mapping it to another device. Also most importers currently cache
> > the mapping so the CPU sync would only happen on the first import.
> > With that in mind we are better off with not pretending to do a
> > cache synchronization at all.
> > 
> > This gets rid of a lot of CPU overhead in uses where those dma-bufs
> > are regularily imported and detached again, like Weston is currently
> > doing in the DRM compositor.
> 
> Lucas, one question: shouldn't the same be done for dma-sg and vmalloc?

I think we should do it eventually, as the importer side caching of the
mappings will undermine any cache maintenance we do in the dmabuf
map/unmap calls. But then the argument is less clear cut for dma-sg and
vmalloc as those buffers will most likely have valid allocations in the
cache. dc is a very clear special case, as the device coherent
allocations never have valid cache allocations, so dropping the
synchronization is very low risk to break any existing use-case.

Both dma-sg and vmalloc require a bit more thought to make the cache
maintenance both effective and low overhead, so this patch is only a
start by picking the low-hanging fruit.

Regards,
Lucas

> > 
> > > > Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
> > ---
> >  drivers/media/common/videobuf2/videobuf2-dma-contig.c | 8 ++++----
> >  1 file changed, 4 insertions(+), 4 deletions(-)
> > 
> > diff --git a/drivers/media/common/videobuf2/videobuf2-dma-contig.c b/drivers/media/common/videobuf2/videobuf2-dma-contig.c
> > index aff0ab7bf83d..d38f097c14ae 100644
> > --- a/drivers/media/common/videobuf2/videobuf2-dma-contig.c
> > +++ b/drivers/media/common/videobuf2/videobuf2-dma-contig.c
> > @@ -273,8 +273,8 @@ static void vb2_dc_dmabuf_ops_detach(struct dma_buf *dbuf,
> >  
> >  	/* release the scatterlist cache */
> >  	if (attach->dma_dir != DMA_NONE)
> > -		dma_unmap_sg(db_attach->dev, sgt->sgl, sgt->orig_nents,
> > -			attach->dma_dir);
> > +		dma_unmap_sg_attrs(db_attach->dev, sgt->sgl, sgt->orig_nents,
> > +			attach->dma_dir, DMA_ATTR_SKIP_CPU_SYNC);
> >  	sg_free_table(sgt);
> >  	kfree(attach);
> >  	db_attach->priv = NULL;
> > @@ -305,8 +305,8 @@ static struct sg_table *vb2_dc_dmabuf_ops_map(
> >  	}
> >  
> >  	/* mapping to the client with new direction */
> > -	sgt->nents = dma_map_sg(db_attach->dev, sgt->sgl, sgt->orig_nents,
> > -				dma_dir);
> > +	sgt->nents = dma_map_sg_attrs(db_attach->dev, sgt->sgl, sgt->orig_nents,
> > +				dma_dir, DMA_ATTR_SKIP_CPU_SYNC);
> >  	if (!sgt->nents) {
> >  		pr_err("failed to map scatterlist\n");
> >  		mutex_unlock(lock);
> > 
> 
>
Lucas Stach May 3, 2019, 10:59 a.m. UTC | #3
Hi Hans,

Am Dienstag, den 12.03.2019, 11:03 +0100 schrieb Lucas Stach:
> Hi Hans,
> 
> Am Dienstag, den 12.03.2019, 08:57 +0100 schrieb Hans Verkuil:
> > Pawel and/or Marek, can you take a look at this?

Seems nobody else is interested in this patch. Would you be willing to
take it without further review?

Regards,
Lucas

> > It looks sane to me, but I'd like to have a second opinion as well before
> > merging this.
> > 
> > On 2/28/19 8:19 AM, Lucas Stach wrote:
> > > This is rougly equivalent to ca0e68e21aae (drm/prime: skip CPU sync
> > > in map/unmap dma_buf). The contig memory allocated is already device
> > > coherent memory, so there is no point in doing a CPU sync when
> > > mapping it to another device. Also most importers currently cache
> > > the mapping so the CPU sync would only happen on the first import.
> > > With that in mind we are better off with not pretending to do a
> > > cache synchronization at all.
> > > 
> > > This gets rid of a lot of CPU overhead in uses where those dma-bufs
> > > are regularily imported and detached again, like Weston is currently
> > > doing in the DRM compositor.
> > 
> > Lucas, one question: shouldn't the same be done for dma-sg and vmalloc?
> 
> I think we should do it eventually, as the importer side caching of the
> mappings will undermine any cache maintenance we do in the dmabuf
> map/unmap calls. But then the argument is less clear cut for dma-sg and
> vmalloc as those buffers will most likely have valid allocations in the
> cache. dc is a very clear special case, as the device coherent
> allocations never have valid cache allocations, so dropping the
> synchronization is very low risk to break any existing use-case.
> 
> Both dma-sg and vmalloc require a bit more thought to make the cache
> maintenance both effective and low overhead, so this patch is only a
> start by picking the low-hanging fruit.
> 
> Regards,
> Lucas
> 
> > > 
> > > > > Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
> > > 
> > > ---
> > >  drivers/media/common/videobuf2/videobuf2-dma-contig.c | 8 ++++----
> > >  1 file changed, 4 insertions(+), 4 deletions(-)
> > > 
> > > diff --git a/drivers/media/common/videobuf2/videobuf2-dma-contig.c b/drivers/media/common/videobuf2/videobuf2-dma-contig.c
> > > index aff0ab7bf83d..d38f097c14ae 100644
> > > --- a/drivers/media/common/videobuf2/videobuf2-dma-contig.c
> > > +++ b/drivers/media/common/videobuf2/videobuf2-dma-contig.c
> > > @@ -273,8 +273,8 @@ static void vb2_dc_dmabuf_ops_detach(struct dma_buf *dbuf,
> > >  
> > > > > >  	/* release the scatterlist cache */
> > > > > >  	if (attach->dma_dir != DMA_NONE)
> > > > > > -		dma_unmap_sg(db_attach->dev, sgt->sgl, sgt->orig_nents,
> > > > > > -			attach->dma_dir);
> > > > > > +		dma_unmap_sg_attrs(db_attach->dev, sgt->sgl, sgt->orig_nents,
> > > > > > +			attach->dma_dir, DMA_ATTR_SKIP_CPU_SYNC);
> > > > > >  	sg_free_table(sgt);
> > > > > >  	kfree(attach);
> > > > > >  	db_attach->priv = NULL;
> > > @@ -305,8 +305,8 @@ static struct sg_table *vb2_dc_dmabuf_ops_map(
> > > > > >  	}
> > >  
> > > > > >  	/* mapping to the client with new direction */
> > > > > > -	sgt->nents = dma_map_sg(db_attach->dev, sgt->sgl, sgt->orig_nents,
> > > > > > -				dma_dir);
> > > > > > +	sgt->nents = dma_map_sg_attrs(db_attach->dev, sgt->sgl, sgt->orig_nents,
> > > > > > +				dma_dir, DMA_ATTR_SKIP_CPU_SYNC);
> > > > > >  	if (!sgt->nents) {
> > > > > >  		pr_err("failed to map scatterlist\n");
> > > > > >  		mutex_unlock(lock);
> > > 
> > 
> > 
> 
>
Hans Verkuil May 3, 2019, 12:38 p.m. UTC | #4
Hi Lucas,

As you mentioned there hasn't been any further review comments, so
it is fair not to postpone this.

However, I have one new review comment myself that I would like to
see addressed in a v2 before I merge this for 5.3:

On 2/28/19 8:19 AM, Lucas Stach wrote:
> This is rougly equivalent to ca0e68e21aae (drm/prime: skip CPU sync
> in map/unmap dma_buf). The contig memory allocated is already device
> coherent memory, so there is no point in doing a CPU sync when
> mapping it to another device. Also most importers currently cache
> the mapping so the CPU sync would only happen on the first import.
> With that in mind we are better off with not pretending to do a
> cache synchronization at all.
> 
> This gets rid of a lot of CPU overhead in uses where those dma-bufs
> are regularily imported and detached again, like Weston is currently
> doing in the DRM compositor.
> 
> Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
> ---
>  drivers/media/common/videobuf2/videobuf2-dma-contig.c | 8 ++++----
>  1 file changed, 4 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/media/common/videobuf2/videobuf2-dma-contig.c b/drivers/media/common/videobuf2/videobuf2-dma-contig.c
> index aff0ab7bf83d..d38f097c14ae 100644
> --- a/drivers/media/common/videobuf2/videobuf2-dma-contig.c
> +++ b/drivers/media/common/videobuf2/videobuf2-dma-contig.c
> @@ -273,8 +273,8 @@ static void vb2_dc_dmabuf_ops_detach(struct dma_buf *dbuf,
>  
>  	/* release the scatterlist cache */
>  	if (attach->dma_dir != DMA_NONE)
> -		dma_unmap_sg(db_attach->dev, sgt->sgl, sgt->orig_nents,
> -			attach->dma_dir);
> +		dma_unmap_sg_attrs(db_attach->dev, sgt->sgl, sgt->orig_nents,
> +			attach->dma_dir, DMA_ATTR_SKIP_CPU_SYNC);

Please add comments here...

>  	sg_free_table(sgt);
>  	kfree(attach);
>  	db_attach->priv = NULL;
> @@ -305,8 +305,8 @@ static struct sg_table *vb2_dc_dmabuf_ops_map(
>  	}
>  
>  	/* mapping to the client with new direction */
> -	sgt->nents = dma_map_sg(db_attach->dev, sgt->sgl, sgt->orig_nents,
> -				dma_dir);
> +	sgt->nents = dma_map_sg_attrs(db_attach->dev, sgt->sgl, sgt->orig_nents,
> +				dma_dir, DMA_ATTR_SKIP_CPU_SYNC);

... and here to explain why you can skip the cpu sync. The comment in ops_detach
can refer to the comment in ops_map, so there is no need to give the full
explanation in two places.

It is not obvious that you can skip the CPU sync, so an explanation is helpful.

Regards,

	Hans

>  	if (!sgt->nents) {
>  		pr_err("failed to map scatterlist\n");
>  		mutex_unlock(lock);
>
Tomasz Figa June 7, 2019, 4:32 a.m. UTC | #5
Hi Hans, Lucas,

On Fri, May 3, 2019 at 9:38 PM Hans Verkuil <hverkuil@xs4all.nl> wrote:
>
> Hi Lucas,
>
> As you mentioned there hasn't been any further review comments, so
> it is fair not to postpone this.
>

Sorry for being late to the party. I didn't notice this patch before.
(Perhaps it could be worth adding me as a reviewer to the MAINTAINERS
entries for these parts of the media subsystem?)

> However, I have one new review comment myself that I would like to
> see addressed in a v2 before I merge this for 5.3:
>
> On 2/28/19 8:19 AM, Lucas Stach wrote:
> > This is rougly equivalent to ca0e68e21aae (drm/prime: skip CPU sync
> > in map/unmap dma_buf). The contig memory allocated is already device
> > coherent memory, so there is no point in doing a CPU sync when
> > mapping it to another device. Also most importers currently cache
> > the mapping so the CPU sync would only happen on the first import.
> > With that in mind we are better off with not pretending to do a
> > cache synchronization at all.
> >
> > This gets rid of a lot of CPU overhead in uses where those dma-bufs
> > are regularily imported and detached again, like Weston is currently
> > doing in the DRM compositor.
> >
> > Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
> > ---
> >  drivers/media/common/videobuf2/videobuf2-dma-contig.c | 8 ++++----
> >  1 file changed, 4 insertions(+), 4 deletions(-)
> >
> > diff --git a/drivers/media/common/videobuf2/videobuf2-dma-contig.c b/drivers/media/common/videobuf2/videobuf2-dma-contig.c
> > index aff0ab7bf83d..d38f097c14ae 100644
> > --- a/drivers/media/common/videobuf2/videobuf2-dma-contig.c
> > +++ b/drivers/media/common/videobuf2/videobuf2-dma-contig.c
> > @@ -273,8 +273,8 @@ static void vb2_dc_dmabuf_ops_detach(struct dma_buf *dbuf,
> >
> >       /* release the scatterlist cache */
> >       if (attach->dma_dir != DMA_NONE)
> > -             dma_unmap_sg(db_attach->dev, sgt->sgl, sgt->orig_nents,
> > -                     attach->dma_dir);
> > +             dma_unmap_sg_attrs(db_attach->dev, sgt->sgl, sgt->orig_nents,
> > +                     attach->dma_dir, DMA_ATTR_SKIP_CPU_SYNC);
>
> Please add comments here...
>
> >       sg_free_table(sgt);
> >       kfree(attach);
> >       db_attach->priv = NULL;
> > @@ -305,8 +305,8 @@ static struct sg_table *vb2_dc_dmabuf_ops_map(
> >       }
> >
> >       /* mapping to the client with new direction */
> > -     sgt->nents = dma_map_sg(db_attach->dev, sgt->sgl, sgt->orig_nents,
> > -                             dma_dir);
> > +     sgt->nents = dma_map_sg_attrs(db_attach->dev, sgt->sgl, sgt->orig_nents,
> > +                             dma_dir, DMA_ATTR_SKIP_CPU_SYNC);
>
> ... and here to explain why you can skip the cpu sync. The comment in ops_detach
> can refer to the comment in ops_map, so there is no need to give the full
> explanation in two places.
>
> It is not obvious that you can skip the CPU sync, so an explanation is helpful.

The change makes sense indeed, thanks. With Hans' suggestion addressed:

Reviewed-by: Tomasz Figa <tfiga@chromium.org>

In the bigger perspective, the DMA-buf and DMA API "map" semantics are
kind of confusing themselves, because they imply "sync" by default. I
wonder what it would take to just completely split "sync" from "map"
in both APIs. It's even worse with DMA-buf, because you cannot ask
dma_buf_map_attachment() not to skip the sync for you, nor you cannot
sync again without unmapping and mapping again...

Best regards,
Tomasz
Hans Verkuil June 7, 2019, 6:34 a.m. UTC | #6
On 6/7/19 6:32 AM, Tomasz Figa wrote:
> Hi Hans, Lucas,
> 
> On Fri, May 3, 2019 at 9:38 PM Hans Verkuil <hverkuil@xs4all.nl> wrote:
>>
>> Hi Lucas,
>>
>> As you mentioned there hasn't been any further review comments, so
>> it is fair not to postpone this.
>>
> 
> Sorry for being late to the party. I didn't notice this patch before.
> (Perhaps it could be worth adding me as a reviewer to the MAINTAINERS
> entries for these parts of the media subsystem?)

Post a patch for that. I'd like to have more reviewers for patches in
this area since it is not really my area of expertise.

Regards,

	Hans
diff mbox series

Patch

diff --git a/drivers/media/common/videobuf2/videobuf2-dma-contig.c b/drivers/media/common/videobuf2/videobuf2-dma-contig.c
index aff0ab7bf83d..d38f097c14ae 100644
--- a/drivers/media/common/videobuf2/videobuf2-dma-contig.c
+++ b/drivers/media/common/videobuf2/videobuf2-dma-contig.c
@@ -273,8 +273,8 @@  static void vb2_dc_dmabuf_ops_detach(struct dma_buf *dbuf,
 
 	/* release the scatterlist cache */
 	if (attach->dma_dir != DMA_NONE)
-		dma_unmap_sg(db_attach->dev, sgt->sgl, sgt->orig_nents,
-			attach->dma_dir);
+		dma_unmap_sg_attrs(db_attach->dev, sgt->sgl, sgt->orig_nents,
+			attach->dma_dir, DMA_ATTR_SKIP_CPU_SYNC);
 	sg_free_table(sgt);
 	kfree(attach);
 	db_attach->priv = NULL;
@@ -305,8 +305,8 @@  static struct sg_table *vb2_dc_dmabuf_ops_map(
 	}
 
 	/* mapping to the client with new direction */
-	sgt->nents = dma_map_sg(db_attach->dev, sgt->sgl, sgt->orig_nents,
-				dma_dir);
+	sgt->nents = dma_map_sg_attrs(db_attach->dev, sgt->sgl, sgt->orig_nents,
+				dma_dir, DMA_ATTR_SKIP_CPU_SYNC);
 	if (!sgt->nents) {
 		pr_err("failed to map scatterlist\n");
 		mutex_unlock(lock);