Message ID | 20200819065555.1802761-6-hch@lst.de (mailing list archive) |
---|---|
State | Superseded |
Headers | show |
Series | [01/28] mm: turn alloc_pages into an inline function | expand |
Hi Christoph, On Wed, Aug 19, 2020 at 8:56 AM Christoph Hellwig <hch@lst.de> wrote: > > The V4L2-FLAG-MEMORY-NON-CONSISTENT flag is entirely unused, Could you explain what makes you think it's unused? It's a feature of the UAPI generally supported by the videobuf2 framework and relied on by Chromium OS to get any kind of reasonable performance when accessing V4L2 buffers in the userspace. > and causes > weird gymanstics with the DMA_ATTR_NON_CONSISTENT flag, which is > unimplemented except on PARISC and some MIPS configs, and about to be > removed. It is implemented by the generic DMA mapping layer [1], which is used by a number of architectures including ARM64 and supposed to be used by new architectures going forward. [1] https://elixir.bootlin.com/linux/v5.9-rc1/source/kernel/dma/mapping.c#L341 When removing features from generic kernel code, I'd suggest first providing viable alternatives for its users, rather than killing the users altogether. Given the above, I'm afraid I have to NAK this. Best regards, Tomasz > > Signed-off-by: Christoph Hellwig <hch@lst.de> > --- > .../userspace-api/media/v4l/buffer.rst | 17 --------- > .../media/v4l/vidioc-reqbufs.rst | 1 - > .../media/common/videobuf2/videobuf2-core.c | 36 +------------------ > .../common/videobuf2/videobuf2-dma-contig.c | 19 ---------- > .../media/common/videobuf2/videobuf2-dma-sg.c | 3 +- > .../media/common/videobuf2/videobuf2-v4l2.c | 12 ------- > include/media/videobuf2-core.h | 3 +- > include/uapi/linux/videodev2.h | 2 -- > 8 files changed, 3 insertions(+), 90 deletions(-) > > diff --git a/Documentation/userspace-api/media/v4l/buffer.rst b/Documentation/userspace-api/media/v4l/buffer.rst > index 57e752aaf414a7..2044ed13cd9d7d 100644 > --- a/Documentation/userspace-api/media/v4l/buffer.rst > +++ b/Documentation/userspace-api/media/v4l/buffer.rst > @@ -701,23 +701,6 @@ Memory Consistency Flags > :stub-columns: 0 > :widths: 3 1 4 > > - * .. _`V4L2-FLAG-MEMORY-NON-CONSISTENT`: > - > - - ``V4L2_FLAG_MEMORY_NON_CONSISTENT`` > - - 0x00000001 > - - A buffer is allocated either in consistent (it will be automatically > - coherent between the CPU and the bus) or non-consistent memory. The > - latter can provide performance gains, for instance the CPU cache > - sync/flush operations can be avoided if the buffer is accessed by the > - corresponding device only and the CPU does not read/write to/from that > - buffer. However, this requires extra care from the driver -- it must > - guarantee memory consistency by issuing a cache flush/sync when > - consistency is needed. If this flag is set V4L2 will attempt to > - allocate the buffer in non-consistent memory. The flag takes effect > - only if the buffer is used for :ref:`memory mapping <mmap>` I/O and the > - queue reports the :ref:`V4L2_BUF_CAP_SUPPORTS_MMAP_CACHE_HINTS > - <V4L2-BUF-CAP-SUPPORTS-MMAP-CACHE-HINTS>` capability. > - > .. c:type:: v4l2_memory > > enum v4l2_memory > diff --git a/Documentation/userspace-api/media/v4l/vidioc-reqbufs.rst b/Documentation/userspace-api/media/v4l/vidioc-reqbufs.rst > index 75d894d9c36c42..3180c111d368ee 100644 > --- a/Documentation/userspace-api/media/v4l/vidioc-reqbufs.rst > +++ b/Documentation/userspace-api/media/v4l/vidioc-reqbufs.rst > @@ -169,7 +169,6 @@ aborting or finishing any DMA in progress, an implicit > - This capability is set by the driver to indicate that the queue supports > cache and memory management hints. However, it's only valid when the > queue is used for :ref:`memory mapping <mmap>` streaming I/O. See > - :ref:`V4L2_FLAG_MEMORY_NON_CONSISTENT <V4L2-FLAG-MEMORY-NON-CONSISTENT>`, > :ref:`V4L2_BUF_FLAG_NO_CACHE_INVALIDATE <V4L2-BUF-FLAG-NO-CACHE-INVALIDATE>` and > :ref:`V4L2_BUF_FLAG_NO_CACHE_CLEAN <V4L2-BUF-FLAG-NO-CACHE-CLEAN>`. > > diff --git a/drivers/media/common/videobuf2/videobuf2-core.c b/drivers/media/common/videobuf2/videobuf2-core.c > index f544d3393e9d6b..66a41cef33c1b1 100644 > --- a/drivers/media/common/videobuf2/videobuf2-core.c > +++ b/drivers/media/common/videobuf2/videobuf2-core.c > @@ -721,39 +721,14 @@ int vb2_verify_memory_type(struct vb2_queue *q, > } > EXPORT_SYMBOL(vb2_verify_memory_type); > > -static void set_queue_consistency(struct vb2_queue *q, bool consistent_mem) > -{ > - q->dma_attrs &= ~DMA_ATTR_NON_CONSISTENT; > - > - if (!vb2_queue_allows_cache_hints(q)) > - return; > - if (!consistent_mem) > - q->dma_attrs |= DMA_ATTR_NON_CONSISTENT; > -} > - > -static bool verify_consistency_attr(struct vb2_queue *q, bool consistent_mem) > -{ > - bool queue_is_consistent = !(q->dma_attrs & DMA_ATTR_NON_CONSISTENT); > - > - if (consistent_mem != queue_is_consistent) { > - dprintk(q, 1, "memory consistency model mismatch\n"); > - return false; > - } > - return true; > -} > - > int vb2_core_reqbufs(struct vb2_queue *q, enum vb2_memory memory, > unsigned int flags, unsigned int *count) > { > unsigned int num_buffers, allocated_buffers, num_planes = 0; > unsigned plane_sizes[VB2_MAX_PLANES] = { }; > - bool consistent_mem = true; > unsigned int i; > int ret; > > - if (flags & V4L2_FLAG_MEMORY_NON_CONSISTENT) > - consistent_mem = false; > - > if (q->streaming) { > dprintk(q, 1, "streaming active\n"); > return -EBUSY; > @@ -765,8 +740,7 @@ int vb2_core_reqbufs(struct vb2_queue *q, enum vb2_memory memory, > } > > if (*count == 0 || q->num_buffers != 0 || > - (q->memory != VB2_MEMORY_UNKNOWN && q->memory != memory) || > - !verify_consistency_attr(q, consistent_mem)) { > + (q->memory != VB2_MEMORY_UNKNOWN && q->memory != memory)) { > /* > * We already have buffers allocated, so first check if they > * are not in use and can be freed. > @@ -803,7 +777,6 @@ int vb2_core_reqbufs(struct vb2_queue *q, enum vb2_memory memory, > num_buffers = min_t(unsigned int, num_buffers, VB2_MAX_FRAME); > memset(q->alloc_devs, 0, sizeof(q->alloc_devs)); > q->memory = memory; > - set_queue_consistency(q, consistent_mem); > > /* > * Ask the driver how many buffers and planes per buffer it requires. > @@ -894,12 +867,8 @@ int vb2_core_create_bufs(struct vb2_queue *q, enum vb2_memory memory, > { > unsigned int num_planes = 0, num_buffers, allocated_buffers; > unsigned plane_sizes[VB2_MAX_PLANES] = { }; > - bool consistent_mem = true; > int ret; > > - if (flags & V4L2_FLAG_MEMORY_NON_CONSISTENT) > - consistent_mem = false; > - > if (q->num_buffers == VB2_MAX_FRAME) { > dprintk(q, 1, "maximum number of buffers already allocated\n"); > return -ENOBUFS; > @@ -912,15 +881,12 @@ int vb2_core_create_bufs(struct vb2_queue *q, enum vb2_memory memory, > } > memset(q->alloc_devs, 0, sizeof(q->alloc_devs)); > q->memory = memory; > - set_queue_consistency(q, consistent_mem); > q->waiting_for_buffers = !q->is_output; > } else { > if (q->memory != memory) { > dprintk(q, 1, "memory model mismatch\n"); > return -EINVAL; > } > - if (!verify_consistency_attr(q, consistent_mem)) > - return -EINVAL; > } > > num_buffers = min(*count, VB2_MAX_FRAME - q->num_buffers); > diff --git a/drivers/media/common/videobuf2/videobuf2-dma-contig.c b/drivers/media/common/videobuf2/videobuf2-dma-contig.c > index ec3446cc45b8da..7b1b86ec942d7d 100644 > --- a/drivers/media/common/videobuf2/videobuf2-dma-contig.c > +++ b/drivers/media/common/videobuf2/videobuf2-dma-contig.c > @@ -42,11 +42,6 @@ struct vb2_dc_buf { > struct dma_buf_attachment *db_attach; > }; > > -static inline bool vb2_dc_buffer_consistent(unsigned long attr) > -{ > - return !(attr & DMA_ATTR_NON_CONSISTENT); > -} > - > /*********************************************/ > /* scatterlist table functions */ > /*********************************************/ > @@ -341,13 +336,6 @@ static int > vb2_dc_dmabuf_ops_begin_cpu_access(struct dma_buf *dbuf, > enum dma_data_direction direction) > { > - struct vb2_dc_buf *buf = dbuf->priv; > - struct sg_table *sgt = buf->dma_sgt; > - > - if (vb2_dc_buffer_consistent(buf->attrs)) > - return 0; > - > - dma_sync_sg_for_cpu(buf->dev, sgt->sgl, sgt->nents, buf->dma_dir); > return 0; > } > > @@ -355,13 +343,6 @@ static int > vb2_dc_dmabuf_ops_end_cpu_access(struct dma_buf *dbuf, > enum dma_data_direction direction) > { > - struct vb2_dc_buf *buf = dbuf->priv; > - struct sg_table *sgt = buf->dma_sgt; > - > - if (vb2_dc_buffer_consistent(buf->attrs)) > - return 0; > - > - dma_sync_sg_for_device(buf->dev, sgt->sgl, sgt->nents, buf->dma_dir); > return 0; > } > > diff --git a/drivers/media/common/videobuf2/videobuf2-dma-sg.c b/drivers/media/common/videobuf2/videobuf2-dma-sg.c > index 0a40e00f0d7e5c..a86fce5d8ea8bf 100644 > --- a/drivers/media/common/videobuf2/videobuf2-dma-sg.c > +++ b/drivers/media/common/videobuf2/videobuf2-dma-sg.c > @@ -123,8 +123,7 @@ static void *vb2_dma_sg_alloc(struct device *dev, unsigned long dma_attrs, > /* > * NOTE: dma-sg allocates memory using the page allocator directly, so > * there is no memory consistency guarantee, hence dma-sg ignores DMA > - * attributes passed from the upper layer. That means that > - * V4L2_FLAG_MEMORY_NON_CONSISTENT has no effect on dma-sg buffers. > + * attributes passed from the upper layer. > */ > buf->pages = kvmalloc_array(buf->num_pages, sizeof(struct page *), > GFP_KERNEL | __GFP_ZERO); > diff --git a/drivers/media/common/videobuf2/videobuf2-v4l2.c b/drivers/media/common/videobuf2/videobuf2-v4l2.c > index 30caad27281e1a..de83ad48783821 100644 > --- a/drivers/media/common/videobuf2/videobuf2-v4l2.c > +++ b/drivers/media/common/videobuf2/videobuf2-v4l2.c > @@ -722,20 +722,11 @@ static void fill_buf_caps(struct vb2_queue *q, u32 *caps) > #endif > } > > -static void clear_consistency_attr(struct vb2_queue *q, > - int memory, > - unsigned int *flags) > -{ > - if (!q->allow_cache_hints || memory != V4L2_MEMORY_MMAP) > - *flags &= ~V4L2_FLAG_MEMORY_NON_CONSISTENT; > -} > - > int vb2_reqbufs(struct vb2_queue *q, struct v4l2_requestbuffers *req) > { > int ret = vb2_verify_memory_type(q, req->memory, req->type); > > fill_buf_caps(q, &req->capabilities); > - clear_consistency_attr(q, req->memory, &req->flags); > return ret ? ret : vb2_core_reqbufs(q, req->memory, > req->flags, &req->count); > } > @@ -769,7 +760,6 @@ int vb2_create_bufs(struct vb2_queue *q, struct v4l2_create_buffers *create) > unsigned i; > > fill_buf_caps(q, &create->capabilities); > - clear_consistency_attr(q, create->memory, &create->flags); > create->index = q->num_buffers; > if (create->count == 0) > return ret != -EBUSY ? ret : 0; > @@ -998,7 +988,6 @@ int vb2_ioctl_reqbufs(struct file *file, void *priv, > int res = vb2_verify_memory_type(vdev->queue, p->memory, p->type); > > fill_buf_caps(vdev->queue, &p->capabilities); > - clear_consistency_attr(vdev->queue, p->memory, &p->flags); > if (res) > return res; > if (vb2_queue_is_busy(vdev, file)) > @@ -1021,7 +1010,6 @@ int vb2_ioctl_create_bufs(struct file *file, void *priv, > > p->index = vdev->queue->num_buffers; > fill_buf_caps(vdev->queue, &p->capabilities); > - clear_consistency_attr(vdev->queue, p->memory, &p->flags); > /* > * If count == 0, then just check if memory and type are valid. > * Any -EBUSY result from vb2_verify_memory_type can be mapped to 0. > diff --git a/include/media/videobuf2-core.h b/include/media/videobuf2-core.h > index 52ef92049073e3..4c7f25b07e9375 100644 > --- a/include/media/videobuf2-core.h > +++ b/include/media/videobuf2-core.h > @@ -744,8 +744,7 @@ void vb2_core_querybuf(struct vb2_queue *q, unsigned int index, void *pb); > * vb2_core_reqbufs() - Initiate streaming. > * @q: pointer to &struct vb2_queue with videobuf2 queue. > * @memory: memory type, as defined by &enum vb2_memory. > - * @flags: auxiliary queue/buffer management flags. Currently, the only > - * used flag is %V4L2_FLAG_MEMORY_NON_CONSISTENT. > + * @flags: auxiliary queue/buffer management flags. > * @count: requested buffer count. > * > * Videobuf2 core helper to implement VIDIOC_REQBUF() operation. It is called > diff --git a/include/uapi/linux/videodev2.h b/include/uapi/linux/videodev2.h > index c7b70ff53bc1dd..5c00f63d9c1b58 100644 > --- a/include/uapi/linux/videodev2.h > +++ b/include/uapi/linux/videodev2.h > @@ -191,8 +191,6 @@ enum v4l2_memory { > V4L2_MEMORY_DMABUF = 4, > }; > > -#define V4L2_FLAG_MEMORY_NON_CONSISTENT (1 << 0) > - > /* see also http://vektor.theorem.ca/graphics/ycbcr/ */ > enum v4l2_colorspace { > /* > -- > 2.28.0 > > _______________________________________________ > iommu mailing list > iommu@lists.linux-foundation.org > https://lists.linuxfoundation.org/mailman/listinfo/iommu
Hi Tomasz, On 2020-08-19 12:16, Tomasz Figa wrote: > Hi Christoph, > > On Wed, Aug 19, 2020 at 8:56 AM Christoph Hellwig <hch@lst.de> wrote: >> >> The V4L2-FLAG-MEMORY-NON-CONSISTENT flag is entirely unused, > > Could you explain what makes you think it's unused? It's a feature of > the UAPI generally supported by the videobuf2 framework and relied on > by Chromium OS to get any kind of reasonable performance when > accessing V4L2 buffers in the userspace. > >> and causes >> weird gymanstics with the DMA_ATTR_NON_CONSISTENT flag, which is >> unimplemented except on PARISC and some MIPS configs, and about to be >> removed. > > It is implemented by the generic DMA mapping layer [1], which is used > by a number of architectures including ARM64 and supposed to be used > by new architectures going forward. AFAICS all that V4L2_FLAG_MEMORY_NON_CONSISTENT does is end up controling whether DMA_ATTR_NON_CONSISTENT is added to vb2_queue::dma_attrs. Please can you point to where DMA_ATTR_NON_CONSISTENT does anything at all on arm64? Also, I posit that videobuf2 is not actually relying on DMA_ATTR_NON_CONSISTENT anyway, since it's clearly not using it properly: "By using this API, you are guaranteeing to the platform that you have all the correct and necessary sync points for this memory in the driver should it choose to return non-consistent memory." $ git grep dma_cache_sync drivers/media $ Robin. > [1] https://elixir.bootlin.com/linux/v5.9-rc1/source/kernel/dma/mapping.c#L341 > > When removing features from generic kernel code, I'd suggest first > providing viable alternatives for its users, rather than killing the > users altogether. > > Given the above, I'm afraid I have to NAK this. > > Best regards, > Tomasz > >> >> Signed-off-by: Christoph Hellwig <hch@lst.de> >> --- >> .../userspace-api/media/v4l/buffer.rst | 17 --------- >> .../media/v4l/vidioc-reqbufs.rst | 1 - >> .../media/common/videobuf2/videobuf2-core.c | 36 +------------------ >> .../common/videobuf2/videobuf2-dma-contig.c | 19 ---------- >> .../media/common/videobuf2/videobuf2-dma-sg.c | 3 +- >> .../media/common/videobuf2/videobuf2-v4l2.c | 12 ------- >> include/media/videobuf2-core.h | 3 +- >> include/uapi/linux/videodev2.h | 2 -- >> 8 files changed, 3 insertions(+), 90 deletions(-) >> >> diff --git a/Documentation/userspace-api/media/v4l/buffer.rst b/Documentation/userspace-api/media/v4l/buffer.rst >> index 57e752aaf414a7..2044ed13cd9d7d 100644 >> --- a/Documentation/userspace-api/media/v4l/buffer.rst >> +++ b/Documentation/userspace-api/media/v4l/buffer.rst >> @@ -701,23 +701,6 @@ Memory Consistency Flags >> :stub-columns: 0 >> :widths: 3 1 4 >> >> - * .. _`V4L2-FLAG-MEMORY-NON-CONSISTENT`: >> - >> - - ``V4L2_FLAG_MEMORY_NON_CONSISTENT`` >> - - 0x00000001 >> - - A buffer is allocated either in consistent (it will be automatically >> - coherent between the CPU and the bus) or non-consistent memory. The >> - latter can provide performance gains, for instance the CPU cache >> - sync/flush operations can be avoided if the buffer is accessed by the >> - corresponding device only and the CPU does not read/write to/from that >> - buffer. However, this requires extra care from the driver -- it must >> - guarantee memory consistency by issuing a cache flush/sync when >> - consistency is needed. If this flag is set V4L2 will attempt to >> - allocate the buffer in non-consistent memory. The flag takes effect >> - only if the buffer is used for :ref:`memory mapping <mmap>` I/O and the >> - queue reports the :ref:`V4L2_BUF_CAP_SUPPORTS_MMAP_CACHE_HINTS >> - <V4L2-BUF-CAP-SUPPORTS-MMAP-CACHE-HINTS>` capability. >> - >> .. c:type:: v4l2_memory >> >> enum v4l2_memory >> diff --git a/Documentation/userspace-api/media/v4l/vidioc-reqbufs.rst b/Documentation/userspace-api/media/v4l/vidioc-reqbufs.rst >> index 75d894d9c36c42..3180c111d368ee 100644 >> --- a/Documentation/userspace-api/media/v4l/vidioc-reqbufs.rst >> +++ b/Documentation/userspace-api/media/v4l/vidioc-reqbufs.rst >> @@ -169,7 +169,6 @@ aborting or finishing any DMA in progress, an implicit >> - This capability is set by the driver to indicate that the queue supports >> cache and memory management hints. However, it's only valid when the >> queue is used for :ref:`memory mapping <mmap>` streaming I/O. See >> - :ref:`V4L2_FLAG_MEMORY_NON_CONSISTENT <V4L2-FLAG-MEMORY-NON-CONSISTENT>`, >> :ref:`V4L2_BUF_FLAG_NO_CACHE_INVALIDATE <V4L2-BUF-FLAG-NO-CACHE-INVALIDATE>` and >> :ref:`V4L2_BUF_FLAG_NO_CACHE_CLEAN <V4L2-BUF-FLAG-NO-CACHE-CLEAN>`. >> >> diff --git a/drivers/media/common/videobuf2/videobuf2-core.c b/drivers/media/common/videobuf2/videobuf2-core.c >> index f544d3393e9d6b..66a41cef33c1b1 100644 >> --- a/drivers/media/common/videobuf2/videobuf2-core.c >> +++ b/drivers/media/common/videobuf2/videobuf2-core.c >> @@ -721,39 +721,14 @@ int vb2_verify_memory_type(struct vb2_queue *q, >> } >> EXPORT_SYMBOL(vb2_verify_memory_type); >> >> -static void set_queue_consistency(struct vb2_queue *q, bool consistent_mem) >> -{ >> - q->dma_attrs &= ~DMA_ATTR_NON_CONSISTENT; >> - >> - if (!vb2_queue_allows_cache_hints(q)) >> - return; >> - if (!consistent_mem) >> - q->dma_attrs |= DMA_ATTR_NON_CONSISTENT; >> -} >> - >> -static bool verify_consistency_attr(struct vb2_queue *q, bool consistent_mem) >> -{ >> - bool queue_is_consistent = !(q->dma_attrs & DMA_ATTR_NON_CONSISTENT); >> - >> - if (consistent_mem != queue_is_consistent) { >> - dprintk(q, 1, "memory consistency model mismatch\n"); >> - return false; >> - } >> - return true; >> -} >> - >> int vb2_core_reqbufs(struct vb2_queue *q, enum vb2_memory memory, >> unsigned int flags, unsigned int *count) >> { >> unsigned int num_buffers, allocated_buffers, num_planes = 0; >> unsigned plane_sizes[VB2_MAX_PLANES] = { }; >> - bool consistent_mem = true; >> unsigned int i; >> int ret; >> >> - if (flags & V4L2_FLAG_MEMORY_NON_CONSISTENT) >> - consistent_mem = false; >> - >> if (q->streaming) { >> dprintk(q, 1, "streaming active\n"); >> return -EBUSY; >> @@ -765,8 +740,7 @@ int vb2_core_reqbufs(struct vb2_queue *q, enum vb2_memory memory, >> } >> >> if (*count == 0 || q->num_buffers != 0 || >> - (q->memory != VB2_MEMORY_UNKNOWN && q->memory != memory) || >> - !verify_consistency_attr(q, consistent_mem)) { >> + (q->memory != VB2_MEMORY_UNKNOWN && q->memory != memory)) { >> /* >> * We already have buffers allocated, so first check if they >> * are not in use and can be freed. >> @@ -803,7 +777,6 @@ int vb2_core_reqbufs(struct vb2_queue *q, enum vb2_memory memory, >> num_buffers = min_t(unsigned int, num_buffers, VB2_MAX_FRAME); >> memset(q->alloc_devs, 0, sizeof(q->alloc_devs)); >> q->memory = memory; >> - set_queue_consistency(q, consistent_mem); >> >> /* >> * Ask the driver how many buffers and planes per buffer it requires. >> @@ -894,12 +867,8 @@ int vb2_core_create_bufs(struct vb2_queue *q, enum vb2_memory memory, >> { >> unsigned int num_planes = 0, num_buffers, allocated_buffers; >> unsigned plane_sizes[VB2_MAX_PLANES] = { }; >> - bool consistent_mem = true; >> int ret; >> >> - if (flags & V4L2_FLAG_MEMORY_NON_CONSISTENT) >> - consistent_mem = false; >> - >> if (q->num_buffers == VB2_MAX_FRAME) { >> dprintk(q, 1, "maximum number of buffers already allocated\n"); >> return -ENOBUFS; >> @@ -912,15 +881,12 @@ int vb2_core_create_bufs(struct vb2_queue *q, enum vb2_memory memory, >> } >> memset(q->alloc_devs, 0, sizeof(q->alloc_devs)); >> q->memory = memory; >> - set_queue_consistency(q, consistent_mem); >> q->waiting_for_buffers = !q->is_output; >> } else { >> if (q->memory != memory) { >> dprintk(q, 1, "memory model mismatch\n"); >> return -EINVAL; >> } >> - if (!verify_consistency_attr(q, consistent_mem)) >> - return -EINVAL; >> } >> >> num_buffers = min(*count, VB2_MAX_FRAME - q->num_buffers); >> diff --git a/drivers/media/common/videobuf2/videobuf2-dma-contig.c b/drivers/media/common/videobuf2/videobuf2-dma-contig.c >> index ec3446cc45b8da..7b1b86ec942d7d 100644 >> --- a/drivers/media/common/videobuf2/videobuf2-dma-contig.c >> +++ b/drivers/media/common/videobuf2/videobuf2-dma-contig.c >> @@ -42,11 +42,6 @@ struct vb2_dc_buf { >> struct dma_buf_attachment *db_attach; >> }; >> >> -static inline bool vb2_dc_buffer_consistent(unsigned long attr) >> -{ >> - return !(attr & DMA_ATTR_NON_CONSISTENT); >> -} >> - >> /*********************************************/ >> /* scatterlist table functions */ >> /*********************************************/ >> @@ -341,13 +336,6 @@ static int >> vb2_dc_dmabuf_ops_begin_cpu_access(struct dma_buf *dbuf, >> enum dma_data_direction direction) >> { >> - struct vb2_dc_buf *buf = dbuf->priv; >> - struct sg_table *sgt = buf->dma_sgt; >> - >> - if (vb2_dc_buffer_consistent(buf->attrs)) >> - return 0; >> - >> - dma_sync_sg_for_cpu(buf->dev, sgt->sgl, sgt->nents, buf->dma_dir); >> return 0; >> } >> >> @@ -355,13 +343,6 @@ static int >> vb2_dc_dmabuf_ops_end_cpu_access(struct dma_buf *dbuf, >> enum dma_data_direction direction) >> { >> - struct vb2_dc_buf *buf = dbuf->priv; >> - struct sg_table *sgt = buf->dma_sgt; >> - >> - if (vb2_dc_buffer_consistent(buf->attrs)) >> - return 0; >> - >> - dma_sync_sg_for_device(buf->dev, sgt->sgl, sgt->nents, buf->dma_dir); >> return 0; >> } >> >> diff --git a/drivers/media/common/videobuf2/videobuf2-dma-sg.c b/drivers/media/common/videobuf2/videobuf2-dma-sg.c >> index 0a40e00f0d7e5c..a86fce5d8ea8bf 100644 >> --- a/drivers/media/common/videobuf2/videobuf2-dma-sg.c >> +++ b/drivers/media/common/videobuf2/videobuf2-dma-sg.c >> @@ -123,8 +123,7 @@ static void *vb2_dma_sg_alloc(struct device *dev, unsigned long dma_attrs, >> /* >> * NOTE: dma-sg allocates memory using the page allocator directly, so >> * there is no memory consistency guarantee, hence dma-sg ignores DMA >> - * attributes passed from the upper layer. That means that >> - * V4L2_FLAG_MEMORY_NON_CONSISTENT has no effect on dma-sg buffers. >> + * attributes passed from the upper layer. >> */ >> buf->pages = kvmalloc_array(buf->num_pages, sizeof(struct page *), >> GFP_KERNEL | __GFP_ZERO); >> diff --git a/drivers/media/common/videobuf2/videobuf2-v4l2.c b/drivers/media/common/videobuf2/videobuf2-v4l2.c >> index 30caad27281e1a..de83ad48783821 100644 >> --- a/drivers/media/common/videobuf2/videobuf2-v4l2.c >> +++ b/drivers/media/common/videobuf2/videobuf2-v4l2.c >> @@ -722,20 +722,11 @@ static void fill_buf_caps(struct vb2_queue *q, u32 *caps) >> #endif >> } >> >> -static void clear_consistency_attr(struct vb2_queue *q, >> - int memory, >> - unsigned int *flags) >> -{ >> - if (!q->allow_cache_hints || memory != V4L2_MEMORY_MMAP) >> - *flags &= ~V4L2_FLAG_MEMORY_NON_CONSISTENT; >> -} >> - >> int vb2_reqbufs(struct vb2_queue *q, struct v4l2_requestbuffers *req) >> { >> int ret = vb2_verify_memory_type(q, req->memory, req->type); >> >> fill_buf_caps(q, &req->capabilities); >> - clear_consistency_attr(q, req->memory, &req->flags); >> return ret ? ret : vb2_core_reqbufs(q, req->memory, >> req->flags, &req->count); >> } >> @@ -769,7 +760,6 @@ int vb2_create_bufs(struct vb2_queue *q, struct v4l2_create_buffers *create) >> unsigned i; >> >> fill_buf_caps(q, &create->capabilities); >> - clear_consistency_attr(q, create->memory, &create->flags); >> create->index = q->num_buffers; >> if (create->count == 0) >> return ret != -EBUSY ? ret : 0; >> @@ -998,7 +988,6 @@ int vb2_ioctl_reqbufs(struct file *file, void *priv, >> int res = vb2_verify_memory_type(vdev->queue, p->memory, p->type); >> >> fill_buf_caps(vdev->queue, &p->capabilities); >> - clear_consistency_attr(vdev->queue, p->memory, &p->flags); >> if (res) >> return res; >> if (vb2_queue_is_busy(vdev, file)) >> @@ -1021,7 +1010,6 @@ int vb2_ioctl_create_bufs(struct file *file, void *priv, >> >> p->index = vdev->queue->num_buffers; >> fill_buf_caps(vdev->queue, &p->capabilities); >> - clear_consistency_attr(vdev->queue, p->memory, &p->flags); >> /* >> * If count == 0, then just check if memory and type are valid. >> * Any -EBUSY result from vb2_verify_memory_type can be mapped to 0. >> diff --git a/include/media/videobuf2-core.h b/include/media/videobuf2-core.h >> index 52ef92049073e3..4c7f25b07e9375 100644 >> --- a/include/media/videobuf2-core.h >> +++ b/include/media/videobuf2-core.h >> @@ -744,8 +744,7 @@ void vb2_core_querybuf(struct vb2_queue *q, unsigned int index, void *pb); >> * vb2_core_reqbufs() - Initiate streaming. >> * @q: pointer to &struct vb2_queue with videobuf2 queue. >> * @memory: memory type, as defined by &enum vb2_memory. >> - * @flags: auxiliary queue/buffer management flags. Currently, the only >> - * used flag is %V4L2_FLAG_MEMORY_NON_CONSISTENT. >> + * @flags: auxiliary queue/buffer management flags. >> * @count: requested buffer count. >> * >> * Videobuf2 core helper to implement VIDIOC_REQBUF() operation. It is called >> diff --git a/include/uapi/linux/videodev2.h b/include/uapi/linux/videodev2.h >> index c7b70ff53bc1dd..5c00f63d9c1b58 100644 >> --- a/include/uapi/linux/videodev2.h >> +++ b/include/uapi/linux/videodev2.h >> @@ -191,8 +191,6 @@ enum v4l2_memory { >> V4L2_MEMORY_DMABUF = 4, >> }; >> >> -#define V4L2_FLAG_MEMORY_NON_CONSISTENT (1 << 0) >> - >> /* see also http://vektor.theorem.ca/graphics/ycbcr/ */ >> enum v4l2_colorspace { >> /* >> -- >> 2.28.0 >> >> _______________________________________________ >> iommu mailing list >> iommu@lists.linux-foundation.org >> https://lists.linuxfoundation.org/mailman/listinfo/iommu > > _______________________________________________ > linux-arm-kernel mailing list > linux-arm-kernel@lists.infradead.org > http://lists.infradead.org/mailman/listinfo/linux-arm-kernel >
On Wed, Aug 19, 2020 at 1:51 PM Robin Murphy <robin.murphy@arm.com> wrote: > > Hi Tomasz, > > On 2020-08-19 12:16, Tomasz Figa wrote: > > Hi Christoph, > > > > On Wed, Aug 19, 2020 at 8:56 AM Christoph Hellwig <hch@lst.de> wrote: > >> > >> The V4L2-FLAG-MEMORY-NON-CONSISTENT flag is entirely unused, > > > > Could you explain what makes you think it's unused? It's a feature of > > the UAPI generally supported by the videobuf2 framework and relied on > > by Chromium OS to get any kind of reasonable performance when > > accessing V4L2 buffers in the userspace. > > > >> and causes > >> weird gymanstics with the DMA_ATTR_NON_CONSISTENT flag, which is > >> unimplemented except on PARISC and some MIPS configs, and about to be > >> removed. > > > > It is implemented by the generic DMA mapping layer [1], which is used > > by a number of architectures including ARM64 and supposed to be used > > by new architectures going forward. > > AFAICS all that V4L2_FLAG_MEMORY_NON_CONSISTENT does is end up > controling whether DMA_ATTR_NON_CONSISTENT is added to vb2_queue::dma_attrs. > > Please can you point to where DMA_ATTR_NON_CONSISTENT does anything at > all on arm64? > With the default config it doesn't, but with CONFIG_DMA_NONCOHERENT_CACHE_SYNC enabled it makes dma_pgprot() keep the pgprot value as is, without enforcing coherence attributes. > Also, I posit that videobuf2 is not actually relying on > DMA_ATTR_NON_CONSISTENT anyway, since it's clearly not using it properly: > > "By using this API, you are guaranteeing to the platform > that you have all the correct and necessary sync points for this memory > in the driver should it choose to return non-consistent memory." > > $ git grep dma_cache_sync drivers/media > $ AFAIK dma_cache_sync() isn't the only way to perform the cache synchronization. The earlier patch series that I reviewed relied on dma_get_sgtable() and then dma_sync_sg_*() (which existed in the vb2-dc since forever [1]). However, it looks like with the final code the sgtable isn't acquired and the synchronization isn't happening, so you have a point. FWIW, I asked back in time what the plan is for non-coherent allocations and it seemed like DMA_ATTR_NON_CONSISTENT and dma_sync_*() was supposed to be the right thing to go with. [2] The same thread also explains why dma_alloc_pages() isn't suitable for the users of dma_alloc_attrs() and DMA_ATTR_NON_CONSISTENT. I think we could make a deal here. We could revert back the parts using DMA_ATTR_NON_CONSISTENT, keeping the UAPI intact, but just rendering it no-op, since it's just a hint after all. Then, you would propose a proper, functionally equivalent and working for ARM64, replacement for dma_alloc_attrs(..., DMA_ATTR_NON_CONSISTENT), which we could then use to enable the functionality expected by this UAPI. Does it sound like something that could work as a way forward here? By the way, as a videobuf2 reviewer, I'd appreciate being CC'd on any series related to the subsystem-facing DMA API changes, since videobuf2 is one of the biggest users of it. [1] https://elixir.bootlin.com/linux/v5.9-rc1/source/drivers/media/common/videobuf2/videobuf2-dma-contig.c#L98 [2] https://patchwork.kernel.org/comment/23312203/ Best regards, Tomasz > > Robin. > > > [1] https://elixir.bootlin.com/linux/v5.9-rc1/source/kernel/dma/mapping.c#L341 > > > > When removing features from generic kernel code, I'd suggest first > > providing viable alternatives for its users, rather than killing the > > users altogether. > > > > Given the above, I'm afraid I have to NAK this. > > > > Best regards, > > Tomasz > > > >> > >> Signed-off-by: Christoph Hellwig <hch@lst.de> > >> --- > >> .../userspace-api/media/v4l/buffer.rst | 17 --------- > >> .../media/v4l/vidioc-reqbufs.rst | 1 - > >> .../media/common/videobuf2/videobuf2-core.c | 36 +------------------ > >> .../common/videobuf2/videobuf2-dma-contig.c | 19 ---------- > >> .../media/common/videobuf2/videobuf2-dma-sg.c | 3 +- > >> .../media/common/videobuf2/videobuf2-v4l2.c | 12 ------- > >> include/media/videobuf2-core.h | 3 +- > >> include/uapi/linux/videodev2.h | 2 -- > >> 8 files changed, 3 insertions(+), 90 deletions(-) > >> > >> diff --git a/Documentation/userspace-api/media/v4l/buffer.rst b/Documentation/userspace-api/media/v4l/buffer.rst > >> index 57e752aaf414a7..2044ed13cd9d7d 100644 > >> --- a/Documentation/userspace-api/media/v4l/buffer.rst > >> +++ b/Documentation/userspace-api/media/v4l/buffer.rst > >> @@ -701,23 +701,6 @@ Memory Consistency Flags > >> :stub-columns: 0 > >> :widths: 3 1 4 > >> > >> - * .. _`V4L2-FLAG-MEMORY-NON-CONSISTENT`: > >> - > >> - - ``V4L2_FLAG_MEMORY_NON_CONSISTENT`` > >> - - 0x00000001 > >> - - A buffer is allocated either in consistent (it will be automatically > >> - coherent between the CPU and the bus) or non-consistent memory. The > >> - latter can provide performance gains, for instance the CPU cache > >> - sync/flush operations can be avoided if the buffer is accessed by the > >> - corresponding device only and the CPU does not read/write to/from that > >> - buffer. However, this requires extra care from the driver -- it must > >> - guarantee memory consistency by issuing a cache flush/sync when > >> - consistency is needed. If this flag is set V4L2 will attempt to > >> - allocate the buffer in non-consistent memory. The flag takes effect > >> - only if the buffer is used for :ref:`memory mapping <mmap>` I/O and the > >> - queue reports the :ref:`V4L2_BUF_CAP_SUPPORTS_MMAP_CACHE_HINTS > >> - <V4L2-BUF-CAP-SUPPORTS-MMAP-CACHE-HINTS>` capability. > >> - > >> .. c:type:: v4l2_memory > >> > >> enum v4l2_memory > >> diff --git a/Documentation/userspace-api/media/v4l/vidioc-reqbufs.rst b/Documentation/userspace-api/media/v4l/vidioc-reqbufs.rst > >> index 75d894d9c36c42..3180c111d368ee 100644 > >> --- a/Documentation/userspace-api/media/v4l/vidioc-reqbufs.rst > >> +++ b/Documentation/userspace-api/media/v4l/vidioc-reqbufs.rst > >> @@ -169,7 +169,6 @@ aborting or finishing any DMA in progress, an implicit > >> - This capability is set by the driver to indicate that the queue supports > >> cache and memory management hints. However, it's only valid when the > >> queue is used for :ref:`memory mapping <mmap>` streaming I/O. See > >> - :ref:`V4L2_FLAG_MEMORY_NON_CONSISTENT <V4L2-FLAG-MEMORY-NON-CONSISTENT>`, > >> :ref:`V4L2_BUF_FLAG_NO_CACHE_INVALIDATE <V4L2-BUF-FLAG-NO-CACHE-INVALIDATE>` and > >> :ref:`V4L2_BUF_FLAG_NO_CACHE_CLEAN <V4L2-BUF-FLAG-NO-CACHE-CLEAN>`. > >> > >> diff --git a/drivers/media/common/videobuf2/videobuf2-core.c b/drivers/media/common/videobuf2/videobuf2-core.c > >> index f544d3393e9d6b..66a41cef33c1b1 100644 > >> --- a/drivers/media/common/videobuf2/videobuf2-core.c > >> +++ b/drivers/media/common/videobuf2/videobuf2-core.c > >> @@ -721,39 +721,14 @@ int vb2_verify_memory_type(struct vb2_queue *q, > >> } > >> EXPORT_SYMBOL(vb2_verify_memory_type); > >> > >> -static void set_queue_consistency(struct vb2_queue *q, bool consistent_mem) > >> -{ > >> - q->dma_attrs &= ~DMA_ATTR_NON_CONSISTENT; > >> - > >> - if (!vb2_queue_allows_cache_hints(q)) > >> - return; > >> - if (!consistent_mem) > >> - q->dma_attrs |= DMA_ATTR_NON_CONSISTENT; > >> -} > >> - > >> -static bool verify_consistency_attr(struct vb2_queue *q, bool consistent_mem) > >> -{ > >> - bool queue_is_consistent = !(q->dma_attrs & DMA_ATTR_NON_CONSISTENT); > >> - > >> - if (consistent_mem != queue_is_consistent) { > >> - dprintk(q, 1, "memory consistency model mismatch\n"); > >> - return false; > >> - } > >> - return true; > >> -} > >> - > >> int vb2_core_reqbufs(struct vb2_queue *q, enum vb2_memory memory, > >> unsigned int flags, unsigned int *count) > >> { > >> unsigned int num_buffers, allocated_buffers, num_planes = 0; > >> unsigned plane_sizes[VB2_MAX_PLANES] = { }; > >> - bool consistent_mem = true; > >> unsigned int i; > >> int ret; > >> > >> - if (flags & V4L2_FLAG_MEMORY_NON_CONSISTENT) > >> - consistent_mem = false; > >> - > >> if (q->streaming) { > >> dprintk(q, 1, "streaming active\n"); > >> return -EBUSY; > >> @@ -765,8 +740,7 @@ int vb2_core_reqbufs(struct vb2_queue *q, enum vb2_memory memory, > >> } > >> > >> if (*count == 0 || q->num_buffers != 0 || > >> - (q->memory != VB2_MEMORY_UNKNOWN && q->memory != memory) || > >> - !verify_consistency_attr(q, consistent_mem)) { > >> + (q->memory != VB2_MEMORY_UNKNOWN && q->memory != memory)) { > >> /* > >> * We already have buffers allocated, so first check if they > >> * are not in use and can be freed. > >> @@ -803,7 +777,6 @@ int vb2_core_reqbufs(struct vb2_queue *q, enum vb2_memory memory, > >> num_buffers = min_t(unsigned int, num_buffers, VB2_MAX_FRAME); > >> memset(q->alloc_devs, 0, sizeof(q->alloc_devs)); > >> q->memory = memory; > >> - set_queue_consistency(q, consistent_mem); > >> > >> /* > >> * Ask the driver how many buffers and planes per buffer it requires. > >> @@ -894,12 +867,8 @@ int vb2_core_create_bufs(struct vb2_queue *q, enum vb2_memory memory, > >> { > >> unsigned int num_planes = 0, num_buffers, allocated_buffers; > >> unsigned plane_sizes[VB2_MAX_PLANES] = { }; > >> - bool consistent_mem = true; > >> int ret; > >> > >> - if (flags & V4L2_FLAG_MEMORY_NON_CONSISTENT) > >> - consistent_mem = false; > >> - > >> if (q->num_buffers == VB2_MAX_FRAME) { > >> dprintk(q, 1, "maximum number of buffers already allocated\n"); > >> return -ENOBUFS; > >> @@ -912,15 +881,12 @@ int vb2_core_create_bufs(struct vb2_queue *q, enum vb2_memory memory, > >> } > >> memset(q->alloc_devs, 0, sizeof(q->alloc_devs)); > >> q->memory = memory; > >> - set_queue_consistency(q, consistent_mem); > >> q->waiting_for_buffers = !q->is_output; > >> } else { > >> if (q->memory != memory) { > >> dprintk(q, 1, "memory model mismatch\n"); > >> return -EINVAL; > >> } > >> - if (!verify_consistency_attr(q, consistent_mem)) > >> - return -EINVAL; > >> } > >> > >> num_buffers = min(*count, VB2_MAX_FRAME - q->num_buffers); > >> diff --git a/drivers/media/common/videobuf2/videobuf2-dma-contig.c b/drivers/media/common/videobuf2/videobuf2-dma-contig.c > >> index ec3446cc45b8da..7b1b86ec942d7d 100644 > >> --- a/drivers/media/common/videobuf2/videobuf2-dma-contig.c > >> +++ b/drivers/media/common/videobuf2/videobuf2-dma-contig.c > >> @@ -42,11 +42,6 @@ struct vb2_dc_buf { > >> struct dma_buf_attachment *db_attach; > >> }; > >> > >> -static inline bool vb2_dc_buffer_consistent(unsigned long attr) > >> -{ > >> - return !(attr & DMA_ATTR_NON_CONSISTENT); > >> -} > >> - > >> /*********************************************/ > >> /* scatterlist table functions */ > >> /*********************************************/ > >> @@ -341,13 +336,6 @@ static int > >> vb2_dc_dmabuf_ops_begin_cpu_access(struct dma_buf *dbuf, > >> enum dma_data_direction direction) > >> { > >> - struct vb2_dc_buf *buf = dbuf->priv; > >> - struct sg_table *sgt = buf->dma_sgt; > >> - > >> - if (vb2_dc_buffer_consistent(buf->attrs)) > >> - return 0; > >> - > >> - dma_sync_sg_for_cpu(buf->dev, sgt->sgl, sgt->nents, buf->dma_dir); > >> return 0; > >> } > >> > >> @@ -355,13 +343,6 @@ static int > >> vb2_dc_dmabuf_ops_end_cpu_access(struct dma_buf *dbuf, > >> enum dma_data_direction direction) > >> { > >> - struct vb2_dc_buf *buf = dbuf->priv; > >> - struct sg_table *sgt = buf->dma_sgt; > >> - > >> - if (vb2_dc_buffer_consistent(buf->attrs)) > >> - return 0; > >> - > >> - dma_sync_sg_for_device(buf->dev, sgt->sgl, sgt->nents, buf->dma_dir); > >> return 0; > >> } > >> > >> diff --git a/drivers/media/common/videobuf2/videobuf2-dma-sg.c b/drivers/media/common/videobuf2/videobuf2-dma-sg.c > >> index 0a40e00f0d7e5c..a86fce5d8ea8bf 100644 > >> --- a/drivers/media/common/videobuf2/videobuf2-dma-sg.c > >> +++ b/drivers/media/common/videobuf2/videobuf2-dma-sg.c > >> @@ -123,8 +123,7 @@ static void *vb2_dma_sg_alloc(struct device *dev, unsigned long dma_attrs, > >> /* > >> * NOTE: dma-sg allocates memory using the page allocator directly, so > >> * there is no memory consistency guarantee, hence dma-sg ignores DMA > >> - * attributes passed from the upper layer. That means that > >> - * V4L2_FLAG_MEMORY_NON_CONSISTENT has no effect on dma-sg buffers. > >> + * attributes passed from the upper layer. > >> */ > >> buf->pages = kvmalloc_array(buf->num_pages, sizeof(struct page *), > >> GFP_KERNEL | __GFP_ZERO); > >> diff --git a/drivers/media/common/videobuf2/videobuf2-v4l2.c b/drivers/media/common/videobuf2/videobuf2-v4l2.c > >> index 30caad27281e1a..de83ad48783821 100644 > >> --- a/drivers/media/common/videobuf2/videobuf2-v4l2.c > >> +++ b/drivers/media/common/videobuf2/videobuf2-v4l2.c > >> @@ -722,20 +722,11 @@ static void fill_buf_caps(struct vb2_queue *q, u32 *caps) > >> #endif > >> } > >> > >> -static void clear_consistency_attr(struct vb2_queue *q, > >> - int memory, > >> - unsigned int *flags) > >> -{ > >> - if (!q->allow_cache_hints || memory != V4L2_MEMORY_MMAP) > >> - *flags &= ~V4L2_FLAG_MEMORY_NON_CONSISTENT; > >> -} > >> - > >> int vb2_reqbufs(struct vb2_queue *q, struct v4l2_requestbuffers *req) > >> { > >> int ret = vb2_verify_memory_type(q, req->memory, req->type); > >> > >> fill_buf_caps(q, &req->capabilities); > >> - clear_consistency_attr(q, req->memory, &req->flags); > >> return ret ? ret : vb2_core_reqbufs(q, req->memory, > >> req->flags, &req->count); > >> } > >> @@ -769,7 +760,6 @@ int vb2_create_bufs(struct vb2_queue *q, struct v4l2_create_buffers *create) > >> unsigned i; > >> > >> fill_buf_caps(q, &create->capabilities); > >> - clear_consistency_attr(q, create->memory, &create->flags); > >> create->index = q->num_buffers; > >> if (create->count == 0) > >> return ret != -EBUSY ? ret : 0; > >> @@ -998,7 +988,6 @@ int vb2_ioctl_reqbufs(struct file *file, void *priv, > >> int res = vb2_verify_memory_type(vdev->queue, p->memory, p->type); > >> > >> fill_buf_caps(vdev->queue, &p->capabilities); > >> - clear_consistency_attr(vdev->queue, p->memory, &p->flags); > >> if (res) > >> return res; > >> if (vb2_queue_is_busy(vdev, file)) > >> @@ -1021,7 +1010,6 @@ int vb2_ioctl_create_bufs(struct file *file, void *priv, > >> > >> p->index = vdev->queue->num_buffers; > >> fill_buf_caps(vdev->queue, &p->capabilities); > >> - clear_consistency_attr(vdev->queue, p->memory, &p->flags); > >> /* > >> * If count == 0, then just check if memory and type are valid. > >> * Any -EBUSY result from vb2_verify_memory_type can be mapped to 0. > >> diff --git a/include/media/videobuf2-core.h b/include/media/videobuf2-core.h > >> index 52ef92049073e3..4c7f25b07e9375 100644 > >> --- a/include/media/videobuf2-core.h > >> +++ b/include/media/videobuf2-core.h > >> @@ -744,8 +744,7 @@ void vb2_core_querybuf(struct vb2_queue *q, unsigned int index, void *pb); > >> * vb2_core_reqbufs() - Initiate streaming. > >> * @q: pointer to &struct vb2_queue with videobuf2 queue. > >> * @memory: memory type, as defined by &enum vb2_memory. > >> - * @flags: auxiliary queue/buffer management flags. Currently, the only > >> - * used flag is %V4L2_FLAG_MEMORY_NON_CONSISTENT. > >> + * @flags: auxiliary queue/buffer management flags. > >> * @count: requested buffer count. > >> * > >> * Videobuf2 core helper to implement VIDIOC_REQBUF() operation. It is called > >> diff --git a/include/uapi/linux/videodev2.h b/include/uapi/linux/videodev2.h > >> index c7b70ff53bc1dd..5c00f63d9c1b58 100644 > >> --- a/include/uapi/linux/videodev2.h > >> +++ b/include/uapi/linux/videodev2.h > >> @@ -191,8 +191,6 @@ enum v4l2_memory { > >> V4L2_MEMORY_DMABUF = 4, > >> }; > >> > >> -#define V4L2_FLAG_MEMORY_NON_CONSISTENT (1 << 0) > >> - > >> /* see also http://vektor.theorem.ca/graphics/ycbcr/ */ > >> enum v4l2_colorspace { > >> /* > >> -- > >> 2.28.0 > >> > >> _______________________________________________ > >> iommu mailing list > >> iommu@lists.linux-foundation.org > >> https://lists.linuxfoundation.org/mailman/listinfo/iommu > > > > _______________________________________________ > > linux-arm-kernel mailing list > > linux-arm-kernel@lists.infradead.org > > http://lists.infradead.org/mailman/listinfo/linux-arm-kernel > >
On Wed, Aug 19, 2020 at 01:16:51PM +0200, Tomasz Figa wrote: > Hi Christoph, > > On Wed, Aug 19, 2020 at 8:56 AM Christoph Hellwig <hch@lst.de> wrote: > > > > The V4L2-FLAG-MEMORY-NON-CONSISTENT flag is entirely unused, > > Could you explain what makes you think it's unused? It's a feature of > the UAPI generally supported by the videobuf2 framework and relied on > by Chromium OS to get any kind of reasonable performance when > accessing V4L2 buffers in the userspace. Because it doesn't do anything except on PARISC and non-coherent MIPS, so by definition it isn't used by any of these media drivers.
On Wed, Aug 19, 2020 at 02:49:01PM +0200, Tomasz Figa wrote: > With the default config it doesn't, but with > CONFIG_DMA_NONCOHERENT_CACHE_SYNC enabled it makes dma_pgprot() keep > the pgprot value as is, without enforcing coherence attributes. Which isn't selected on arm64, and that is for a good reason. > AFAIK dma_cache_sync() isn't the only way to perform the cache > synchronization. Yes, it is the only documented way to do it. And if you read the whole series instead of screaming you'd see that it provides a proper way to deal with non-coherent memory which will also work with arm64. instead of screaming > By the way, as a videobuf2 reviewer, I'd appreciate being CC'd on any > series related to the subsystem-facing DMA API changes, since > videobuf2 is one of the biggest users of it. The cc list is too long - I cc lists and key maintainers. As a reviewer should should watch your subsystems lists closely.
On Wed, Aug 19, 2020 at 3:55 PM Christoph Hellwig <hch@lst.de> wrote: > > On Wed, Aug 19, 2020 at 01:16:51PM +0200, Tomasz Figa wrote: > > Hi Christoph, > > > > On Wed, Aug 19, 2020 at 8:56 AM Christoph Hellwig <hch@lst.de> wrote: > > > > > > The V4L2-FLAG-MEMORY-NON-CONSISTENT flag is entirely unused, > > > > Could you explain what makes you think it's unused? It's a feature of > > the UAPI generally supported by the videobuf2 framework and relied on > > by Chromium OS to get any kind of reasonable performance when > > accessing V4L2 buffers in the userspace. > > Because it doesn't do anything except on PARISC and non-coherent MIPS, > so by definition it isn't used by any of these media drivers. It's still an UAPI feature, so we can't simply remove the flag, it must stay there as a no-op, until the problem is resolved. Also, it of course might be disputable as an out-of-tree usage, but selecting CONFIG_DMA_NONCOHERENT_CACHE_SYNC makes the flag actually do something on other platforms, including ARM64. Best regards, Tomasz
On 2020-08-19 13:49, Tomasz Figa wrote: > On Wed, Aug 19, 2020 at 1:51 PM Robin Murphy <robin.murphy@arm.com> wrote: >> >> Hi Tomasz, >> >> On 2020-08-19 12:16, Tomasz Figa wrote: >>> Hi Christoph, >>> >>> On Wed, Aug 19, 2020 at 8:56 AM Christoph Hellwig <hch@lst.de> wrote: >>>> >>>> The V4L2-FLAG-MEMORY-NON-CONSISTENT flag is entirely unused, >>> >>> Could you explain what makes you think it's unused? It's a feature of >>> the UAPI generally supported by the videobuf2 framework and relied on >>> by Chromium OS to get any kind of reasonable performance when >>> accessing V4L2 buffers in the userspace. >>> >>>> and causes >>>> weird gymanstics with the DMA_ATTR_NON_CONSISTENT flag, which is >>>> unimplemented except on PARISC and some MIPS configs, and about to be >>>> removed. >>> >>> It is implemented by the generic DMA mapping layer [1], which is used >>> by a number of architectures including ARM64 and supposed to be used >>> by new architectures going forward. >> >> AFAICS all that V4L2_FLAG_MEMORY_NON_CONSISTENT does is end up >> controling whether DMA_ATTR_NON_CONSISTENT is added to vb2_queue::dma_attrs. >> >> Please can you point to where DMA_ATTR_NON_CONSISTENT does anything at >> all on arm64? >> > > With the default config it doesn't, but with > CONFIG_DMA_NONCOHERENT_CACHE_SYNC enabled it makes dma_pgprot() keep > the pgprot value as is, without enforcing coherence attributes. How active are the PA-RISC and MIPS ports of Chromium OS? Hacking CONFIG_DMA_NONCOHERENT_CACHE_SYNC into an architecture that doesn't provide dma_cache_sync() is wrong, since at worst it may break other drivers. If downstream is wildly misusing an API then so be it, but it's hardly a strong basis for an upstream argument. >> Also, I posit that videobuf2 is not actually relying on >> DMA_ATTR_NON_CONSISTENT anyway, since it's clearly not using it properly: >> >> "By using this API, you are guaranteeing to the platform >> that you have all the correct and necessary sync points for this memory >> in the driver should it choose to return non-consistent memory." >> >> $ git grep dma_cache_sync drivers/media >> $ > > AFAIK dma_cache_sync() isn't the only way to perform the cache > synchronization. The earlier patch series that I reviewed relied on > dma_get_sgtable() and then dma_sync_sg_*() (which existed in the > vb2-dc since forever [1]). However, it looks like with the final code > the sgtable isn't acquired and the synchronization isn't happening, so > you have a point. Using the streaming sync calls on coherent allocations has also always been wrong per the API, regardless of the bodies of code that have happened to get away with it for so long. > FWIW, I asked back in time what the plan is for non-coherent > allocations and it seemed like DMA_ATTR_NON_CONSISTENT and > dma_sync_*() was supposed to be the right thing to go with. [2] The > same thread also explains why dma_alloc_pages() isn't suitable for the > users of dma_alloc_attrs() and DMA_ATTR_NON_CONSISTENT. AFAICS even back then Christoph was implying getting rid of NON_CONSISTENT and *replacing* it with something streaming-API-based - i.e. this series - not encouraging mixing the existing APIs. It doesn't seem impossible to implement a remapping version of this new dma_alloc_pages() for IOMMU-backed ops if it's really warranted (although at that point it seems like "non-coherent" vb2-dc starts to have significant conceptual overlap with vb2-sg). Robin.
On Wed, Aug 19, 2020 at 3:57 PM Christoph Hellwig <hch@lst.de> wrote: > > On Wed, Aug 19, 2020 at 02:49:01PM +0200, Tomasz Figa wrote: > > With the default config it doesn't, but with > > CONFIG_DMA_NONCOHERENT_CACHE_SYNC enabled it makes dma_pgprot() keep > > the pgprot value as is, without enforcing coherence attributes. > > Which isn't selected on arm64, and that is for a good reason. > > > AFAIK dma_cache_sync() isn't the only way to perform the cache > > synchronization. > > Yes, it is the only documented way to do it. And if you read the whole > series instead of screaming you'd see that it provides a proper way > to deal with non-coherent memory which will also work with arm64. > instead of screaming > I'm sorry if I have offended you in any way, but would also appreciate it if a less aggressive tone was directed towards me as well. I have valid reasons to object to this patch, as stated in my previous emails. The fact that the original feature has problems is of course another story and, as I mentioned too, I'm willing to look into fixing them. I'm of course happy to review the rest of the series and even more happy to help migrating this code to whatever is added there, as long as the functionality is preserved. > > By the way, as a videobuf2 reviewer, I'd appreciate being CC'd on any > > series related to the subsystem-facing DMA API changes, since > > videobuf2 is one of the biggest users of it. > > The cc list is too long - I cc lists and key maintainers. As a reviewer > should should watch your subsystems lists closely. Well, I guess we can disagree on this, because there is no clear policy. I'm listed in the MAINTAINERS file for the subsystem and I believe the purpose of the file is to list the people to CC on relevant patches. We're all overloaded with work and having to look through the huge volume of mailing lists like linux-media doesn't help and thus I'd still appreciate being added on CC. Best regards, Tomasz
On Wed, Aug 19, 2020 at 4:07 PM Robin Murphy <robin.murphy@arm.com> wrote: > > On 2020-08-19 13:49, Tomasz Figa wrote: > > On Wed, Aug 19, 2020 at 1:51 PM Robin Murphy <robin.murphy@arm.com> wrote: > >> > >> Hi Tomasz, > >> > >> On 2020-08-19 12:16, Tomasz Figa wrote: > >>> Hi Christoph, > >>> > >>> On Wed, Aug 19, 2020 at 8:56 AM Christoph Hellwig <hch@lst.de> wrote: > >>>> > >>>> The V4L2-FLAG-MEMORY-NON-CONSISTENT flag is entirely unused, > >>> > >>> Could you explain what makes you think it's unused? It's a feature of > >>> the UAPI generally supported by the videobuf2 framework and relied on > >>> by Chromium OS to get any kind of reasonable performance when > >>> accessing V4L2 buffers in the userspace. > >>> > >>>> and causes > >>>> weird gymanstics with the DMA_ATTR_NON_CONSISTENT flag, which is > >>>> unimplemented except on PARISC and some MIPS configs, and about to be > >>>> removed. > >>> > >>> It is implemented by the generic DMA mapping layer [1], which is used > >>> by a number of architectures including ARM64 and supposed to be used > >>> by new architectures going forward. > >> > >> AFAICS all that V4L2_FLAG_MEMORY_NON_CONSISTENT does is end up > >> controling whether DMA_ATTR_NON_CONSISTENT is added to vb2_queue::dma_attrs. > >> > >> Please can you point to where DMA_ATTR_NON_CONSISTENT does anything at > >> all on arm64? > >> > > > > With the default config it doesn't, but with > > CONFIG_DMA_NONCOHERENT_CACHE_SYNC enabled it makes dma_pgprot() keep > > the pgprot value as is, without enforcing coherence attributes. > > How active are the PA-RISC and MIPS ports of Chromium OS? Not active. We enable CONFIG_DMA_NONCOHERENT_CACHE_SYNC for ARM64, given the directions received back in April when discussing the noncoherent memory functionality on the mailing list in the thread I pointed out in my previous message and no clarification on why it is disabled for ARM64 in upstream, despite making several attempts to get some. > > Hacking CONFIG_DMA_NONCOHERENT_CACHE_SYNC into an architecture that > doesn't provide dma_cache_sync() is wrong, since at worst it may break > other drivers. If downstream is wildly misusing an API then so be it, > but it's hardly a strong basis for an upstream argument. I guess it means that we're wildly misusing the API, but it still does work. Could you explain how it could break other drivers? > > >> Also, I posit that videobuf2 is not actually relying on > >> DMA_ATTR_NON_CONSISTENT anyway, since it's clearly not using it properly: > >> > >> "By using this API, you are guaranteeing to the platform > >> that you have all the correct and necessary sync points for this memory > >> in the driver should it choose to return non-consistent memory." > >> > >> $ git grep dma_cache_sync drivers/media > >> $ > > > > AFAIK dma_cache_sync() isn't the only way to perform the cache > > synchronization. The earlier patch series that I reviewed relied on > > dma_get_sgtable() and then dma_sync_sg_*() (which existed in the > > vb2-dc since forever [1]). However, it looks like with the final code > > the sgtable isn't acquired and the synchronization isn't happening, so > > you have a point. > > Using the streaming sync calls on coherent allocations has also always > been wrong per the API, regardless of the bodies of code that have > happened to get away with it for so long. > > > FWIW, I asked back in time what the plan is for non-coherent > > allocations and it seemed like DMA_ATTR_NON_CONSISTENT and > > dma_sync_*() was supposed to be the right thing to go with. [2] The > > same thread also explains why dma_alloc_pages() isn't suitable for the > > users of dma_alloc_attrs() and DMA_ATTR_NON_CONSISTENT. > > AFAICS even back then Christoph was implying getting rid of > NON_CONSISTENT and *replacing* it with something streaming-API-based - That's not how I read his reply from the thread I pointed to, but that might of course be my misunderstanding. > i.e. this series - not encouraging mixing the existing APIs. It doesn't > seem impossible to implement a remapping version of this new > dma_alloc_pages() for IOMMU-backed ops if it's really warranted > (although at that point it seems like "non-coherent" vb2-dc starts to > have significant conceptual overlap with vb2-sg). No, there is no overlap between vb2-dc and vb2-sg. They differ on another level - the former is to be used by devices without scatter-gather or internal mapping capabilities and gives the driver a single DMA address for the whole buffer, regardless of whether it's IOVA-contiguous (for devices behind an IOMMU) or physically contiguous (for the others), while the latter gives the driver an sgtable, which of course may be DMA-contiguous internally, but doesn't have to and usually isn't. This model makes it possible to hide the SoC implementation details from particular drivers, since those are very often reused on many SoCs which differ in the availability of IOMMU, DMA addressing restrictions and so on. Best regards, Tomasz
On Wed, Aug 19, 2020 at 03:57:53PM +0200, Tomasz Figa wrote: > > > Could you explain what makes you think it's unused? It's a feature of > > > the UAPI generally supported by the videobuf2 framework and relied on > > > by Chromium OS to get any kind of reasonable performance when > > > accessing V4L2 buffers in the userspace. > > > > Because it doesn't do anything except on PARISC and non-coherent MIPS, > > so by definition it isn't used by any of these media drivers. > > It's still an UAPI feature, so we can't simply remove the flag, it > must stay there as a no-op, until the problem is resolved. Ok, I'll switch to just ignoring it for the next version. > Also, it of course might be disputable as an out-of-tree usage, but > selecting CONFIG_DMA_NONCOHERENT_CACHE_SYNC makes the flag actually do > something on other platforms, including ARM64. It isn't just disputable, but by kernel policies simply is not relevant.
On Wed, Aug 19, 2020 at 04:11:52PM +0200, Tomasz Figa wrote: > > > By the way, as a videobuf2 reviewer, I'd appreciate being CC'd on any > > > series related to the subsystem-facing DMA API changes, since > > > videobuf2 is one of the biggest users of it. > > > > The cc list is too long - I cc lists and key maintainers. As a reviewer > > should should watch your subsystems lists closely. > > Well, I guess we can disagree on this, because there is no clear > policy. I'm listed in the MAINTAINERS file for the subsystem and I > believe the purpose of the file is to list the people to CC on > relevant patches. We're all overloaded with work and having to look > through the huge volume of mailing lists like linux-media doesn't help > and thus I'd still appreciate being added on CC. I'm happy to Cc and active participant in the discussion. I'm not going to add all reviewers because even with the trimmed CC list I'm already hitting the number of receipients limit on various lists.
On Wed, Aug 19, 2020 at 04:22:29PM +0200, Tomasz Figa wrote: > > > FWIW, I asked back in time what the plan is for non-coherent > > > allocations and it seemed like DMA_ATTR_NON_CONSISTENT and > > > dma_sync_*() was supposed to be the right thing to go with. [2] The > > > same thread also explains why dma_alloc_pages() isn't suitable for the > > > users of dma_alloc_attrs() and DMA_ATTR_NON_CONSISTENT. > > > > AFAICS even back then Christoph was implying getting rid of > > NON_CONSISTENT and *replacing* it with something streaming-API-based - > > That's not how I read his reply from the thread I pointed to, but that > might of course be my misunderstanding. Yes. Without changes like in this series just calling dma_sync_single_* will break in various cases, e.g. because dma_alloc_attrs returns memory remapped in the vmalloc space, and the dma_sync_single_* implementation implementation can't cope with vmalloc addresses.
On Wed, Aug 19, 2020 at 03:07:04PM +0100, Robin Murphy wrote: >> FWIW, I asked back in time what the plan is for non-coherent >> allocations and it seemed like DMA_ATTR_NON_CONSISTENT and >> dma_sync_*() was supposed to be the right thing to go with. [2] The >> same thread also explains why dma_alloc_pages() isn't suitable for the >> users of dma_alloc_attrs() and DMA_ATTR_NON_CONSISTENT. > > AFAICS even back then Christoph was implying getting rid of NON_CONSISTENT > and *replacing* it with something streaming-API-based - i.e. this series - > not encouraging mixing the existing APIs. It doesn't seem impossible to > implement a remapping version of this new dma_alloc_pages() for > IOMMU-backed ops if it's really warranted (although at that point it seems > like "non-coherent" vb2-dc starts to have significant conceptual overlap > with vb2-sg). You can alway vmap the returned pages from dma_alloc_pages, but it will make cache invalidation hell - you'll need to use invalidate_kernel_vmap_range and flush_kernel_vmap_range to properly handle virtually indexed caches. Or with remapping you mean using the iommu do de-scatter/gather? You can implement that trivially implement it yourself for the iommu case: { merge_boundary = dma_get_merge_boundary(dev); if (!merge_boundary || merge_boundary > chunk_size - 1) { /* can't coalesce */ return -EINVAL; } nents = DIV_ROUND_UP(total_size, chunk_size); sg = sgl_alloc(); for_each_sgl() { sg->page = __alloc_pages(get_order(chunk_size)) sg->len = chunk_size; } dma_map_sg(sg, DMA_ATTR_SKIP_CPU_SYNC); // you are guaranteed to get a single dma_addr out } Of course this still uses the scatterlist structure with its annoying mix of input and output parametes, so I'd rather not expose it as an official API at the DMA layer.
On Thu, Aug 20, 2020 at 06:43:47AM +0200, Christoph Hellwig wrote: > On Wed, Aug 19, 2020 at 03:57:53PM +0200, Tomasz Figa wrote: > > > > Could you explain what makes you think it's unused? It's a feature of > > > > the UAPI generally supported by the videobuf2 framework and relied on > > > > by Chromium OS to get any kind of reasonable performance when > > > > accessing V4L2 buffers in the userspace. > > > > > > Because it doesn't do anything except on PARISC and non-coherent MIPS, > > > so by definition it isn't used by any of these media drivers. > > > > It's still an UAPI feature, so we can't simply remove the flag, it > > must stay there as a no-op, until the problem is resolved. > > Ok, I'll switch to just ignoring it for the next version. So I took a deeper look. I don't really think it qualifies as a UAPI in our traditional sense. For one it only appeared in 5.9-rc1, so we can trivially expedite the patch into 5.9-rc and not actually make it show up in any released kernel version. And even as of the current Linus' tree the only user is a test driver. So I really think the best way to go ahead is to just revert it ASAP as the design wasn't thought out at all.
On Thu, Aug 20, 2020 at 7:20 AM Christoph Hellwig <hch@lst.de> wrote: > > On Thu, Aug 20, 2020 at 06:43:47AM +0200, Christoph Hellwig wrote: > > On Wed, Aug 19, 2020 at 03:57:53PM +0200, Tomasz Figa wrote: > > > > > Could you explain what makes you think it's unused? It's a feature of > > > > > the UAPI generally supported by the videobuf2 framework and relied on > > > > > by Chromium OS to get any kind of reasonable performance when > > > > > accessing V4L2 buffers in the userspace. > > > > > > > > Because it doesn't do anything except on PARISC and non-coherent MIPS, > > > > so by definition it isn't used by any of these media drivers. > > > > > > It's still an UAPI feature, so we can't simply remove the flag, it > > > must stay there as a no-op, until the problem is resolved. > > > > Ok, I'll switch to just ignoring it for the next version. > > So I took a deeper look. I don't really think it qualifies as a UAPI > in our traditional sense. For one it only appeared in 5.9-rc1, so we > can trivially expedite the patch into 5.9-rc and not actually make it > show up in any released kernel version. And even as of the current > Linus' tree the only user is a test driver. So I really think the best > way to go ahead is to just revert it ASAP as the design wasn't thought > out at all. The UAPI and V4L2/videobuf2 changes are in good shape and the only wrong part is the use of DMA API, which was based on an earlier email guidance anyway, and a change to the synchronization part . I find conclusions like the above insulting for people who put many hours into designing and implementing the related functionality, given the complexity of the videobuf2 framework and how ill-defined the DMA API was, and would feel better if such could be avoided in future communication. That said, we can revert it on the basis of the implementation issues, but I feel like we wouldn't get anything by doing so, because as I said, the design is sane and most of the implementation is fine as well. Instead. I'd suggest simply removing the use of the attribute being removed, so that the feature stays no-op until the DMA API provides a way to implement it or we just migrate videobuf2 to stop using the DMA API as much as possible, like many drivers in the DRM subsystem did. Best regards, Tomasz
On Thu, Aug 20, 2020 at 6:45 AM Christoph Hellwig <hch@lst.de> wrote: > > On Wed, Aug 19, 2020 at 04:11:52PM +0200, Tomasz Figa wrote: > > > > By the way, as a videobuf2 reviewer, I'd appreciate being CC'd on any > > > > series related to the subsystem-facing DMA API changes, since > > > > videobuf2 is one of the biggest users of it. > > > > > > The cc list is too long - I cc lists and key maintainers. As a reviewer > > > should should watch your subsystems lists closely. > > > > Well, I guess we can disagree on this, because there is no clear > > policy. I'm listed in the MAINTAINERS file for the subsystem and I > > believe the purpose of the file is to list the people to CC on > > relevant patches. We're all overloaded with work and having to look > > through the huge volume of mailing lists like linux-media doesn't help > > and thus I'd still appreciate being added on CC. > > I'm happy to Cc and active participant in the discussion. I'm not > going to add all reviewers because even with the trimmed CC list > I'm already hitting the number of receipients limit on various lists. Fair enough. We'll make your job easier and just turn my MAINTAINERS entry into a maintainer. :) Best regards, Tomasz
On Thu, Aug 20, 2020 at 7:02 AM Christoph Hellwig <hch@lst.de> wrote: > > On Wed, Aug 19, 2020 at 03:07:04PM +0100, Robin Murphy wrote: > >> FWIW, I asked back in time what the plan is for non-coherent > >> allocations and it seemed like DMA_ATTR_NON_CONSISTENT and > >> dma_sync_*() was supposed to be the right thing to go with. [2] The > >> same thread also explains why dma_alloc_pages() isn't suitable for the > >> users of dma_alloc_attrs() and DMA_ATTR_NON_CONSISTENT. > > > > AFAICS even back then Christoph was implying getting rid of NON_CONSISTENT > > and *replacing* it with something streaming-API-based - i.e. this series - > > not encouraging mixing the existing APIs. It doesn't seem impossible to > > implement a remapping version of this new dma_alloc_pages() for > > IOMMU-backed ops if it's really warranted (although at that point it seems > > like "non-coherent" vb2-dc starts to have significant conceptual overlap > > with vb2-sg). > > You can alway vmap the returned pages from dma_alloc_pages, but it will > make cache invalidation hell - you'll need to use > invalidate_kernel_vmap_range and flush_kernel_vmap_range to properly > handle virtually indexed caches. > > Or with remapping you mean using the iommu do de-scatter/gather? Ideally, both. For remapping in the CPU sense, there are drivers which rely on a contiguous kernel mapping of the vb2 buffers, which was provided by dma_alloc_attrs(). I think they could be reworked to work on single pages, but that would significantly complicate the code. At the same time, such drivers would actually benefit from a cached mapping, because they often have non-bursty, random access patterns. Then, in the IOMMU sense, the whole idea of videobuf2-dma-contig is to rely on the DMA API to always provide device-contiguous memory, as required by the hardware which only has a single pointer and size. > > You can implement that trivially implement it yourself for the iommu > case: > > { > merge_boundary = dma_get_merge_boundary(dev); > if (!merge_boundary || merge_boundary > chunk_size - 1) { > /* can't coalesce */ > return -EINVAL; > } > > > nents = DIV_ROUND_UP(total_size, chunk_size); > sg = sgl_alloc(); > for_each_sgl() { > sg->page = __alloc_pages(get_order(chunk_size)) > sg->len = chunk_size; > } > dma_map_sg(sg, DMA_ATTR_SKIP_CPU_SYNC); > // you are guaranteed to get a single dma_addr out > } > > Of course this still uses the scatterlist structure with its annoying > mix of input and output parametes, so I'd rather not expose it as > an official API at the DMA layer. The problem with the above open coded approach is that it requires explicit handling of the non-IOMMU and IOMMU cases and this is exactly what we don't want to have in vb2 and what was actually the job of the DMA API to hide. Is the plan to actually move the IOMMU handling out of the DMA API? Do you think we could instead turn it into a dma_alloc_noncoherent() helper, which has similar semantics as dma_alloc_attrs() and handles the various corner cases (e.g. invalidate_kernel_vmap_range and flush_kernel_vmap_range) to achieve the desired functionality without delegating the "hell", as you called it, to the users? Best regards, Tomasz
On Thu, Aug 20, 2020 at 12:09:34PM +0200, Tomasz Figa wrote: > > I'm happy to Cc and active participant in the discussion. I'm not > > going to add all reviewers because even with the trimmed CC list > > I'm already hitting the number of receipients limit on various lists. > > Fair enough. > > We'll make your job easier and just turn my MAINTAINERS entry into a > maintainer. :) Sounds like a plan.
On Thu, Aug 20, 2020 at 12:24:31PM +0200, Tomasz Figa wrote: > > Of course this still uses the scatterlist structure with its annoying > > mix of input and output parametes, so I'd rather not expose it as > > an official API at the DMA layer. > > The problem with the above open coded approach is that it requires > explicit handling of the non-IOMMU and IOMMU cases and this is exactly > what we don't want to have in vb2 and what was actually the job of the > DMA API to hide. Is the plan to actually move the IOMMU handling out > of the DMA API? > > Do you think we could instead turn it into a dma_alloc_noncoherent() > helper, which has similar semantics as dma_alloc_attrs() and handles > the various corner cases (e.g. invalidate_kernel_vmap_range and > flush_kernel_vmap_range) to achieve the desired functionality without > delegating the "hell", as you called it, to the users? Yes, I guess I could do something in that direction. At least for dma-iommu, which thanks to Robin should be all you'll need in the foreseeable future.
On Thu, Aug 20, 2020 at 12:05:29PM +0200, Tomasz Figa wrote: > The UAPI and V4L2/videobuf2 changes are in good shape and the only > wrong part is the use of DMA API, which was based on an earlier email > guidance anyway, and a change to the synchronization part . I find > conclusions like the above insulting for people who put many hours > into designing and implementing the related functionality, given the > complexity of the videobuf2 framework and how ill-defined the DMA API > was, and would feel better if such could be avoided in future > communication. It wasn't meant to be too insulting, but I found this out when trying to figure out how to just disable it. But it also ends up using the actual dma attr flags for it's own consistency checks, so just not setting the flag did not turn out to work that easily. But in general it helps to add a few more people to the Cc list for such things that do stranger things. Especially if you think you did it based on the advice of those people.
On Thu, Aug 20, 2020 at 6:54 PM Christoph Hellwig <hch@lst.de> wrote: > > On Thu, Aug 20, 2020 at 12:05:29PM +0200, Tomasz Figa wrote: > > The UAPI and V4L2/videobuf2 changes are in good shape and the only > > wrong part is the use of DMA API, which was based on an earlier email > > guidance anyway, and a change to the synchronization part . I find > > conclusions like the above insulting for people who put many hours > > into designing and implementing the related functionality, given the > > complexity of the videobuf2 framework and how ill-defined the DMA API > > was, and would feel better if such could be avoided in future > > communication. > > It wasn't meant to be too insulting, but I found this out when trying > to figure out how to just disable it. But it also ends up using > the actual dma attr flags for it's own consistency checks, so just > not setting the flag did not turn out to work that easily. > Yes, sadly the videobuf2 ended up becoming quite counterintuitive after growing for the long years and that is reflected in the design of this feature as well. I think we need to do something about it. > But in general it helps to add a few more people to the Cc list for > such things that do stranger things. Especially if you think you did > it based on the advice of those people. Indeed, we should have CCed you and other DMA folks. Sergey who worked on this series is quite new to these areas of the kernel (although not to the kernel itself) and it's my fault for not explicitly letting him know to do that. Best regards, Tomasz
On Thu, Aug 20, 2020 at 6:52 PM Christoph Hellwig <hch@lst.de> wrote: > > On Thu, Aug 20, 2020 at 12:24:31PM +0200, Tomasz Figa wrote: > > > Of course this still uses the scatterlist structure with its annoying > > > mix of input and output parametes, so I'd rather not expose it as > > > an official API at the DMA layer. > > > > The problem with the above open coded approach is that it requires > > explicit handling of the non-IOMMU and IOMMU cases and this is exactly > > what we don't want to have in vb2 and what was actually the job of the > > DMA API to hide. Is the plan to actually move the IOMMU handling out > > of the DMA API? > > > > Do you think we could instead turn it into a dma_alloc_noncoherent() > > helper, which has similar semantics as dma_alloc_attrs() and handles > > the various corner cases (e.g. invalidate_kernel_vmap_range and > > flush_kernel_vmap_range) to achieve the desired functionality without > > delegating the "hell", as you called it, to the users? > > Yes, I guess I could do something in that direction. At least for > dma-iommu, which thanks to Robin should be all you'll need in the > foreseeable future. That would be really great. Let me know if we can help by testing with V4L2/vb2 or in any other way. Best regards, Tomasz
On Thu, Aug 20, 2020 at 07:33:48PM +0200, Tomasz Figa wrote: > > It wasn't meant to be too insulting, but I found this out when trying > > to figure out how to just disable it. But it also ends up using > > the actual dma attr flags for it's own consistency checks, so just > > not setting the flag did not turn out to work that easily. > > > > Yes, sadly the videobuf2 ended up becoming quite counterintuitive > after growing for the long years and that is reflected in the design > of this feature as well. I think we need to do something about it. So I'm about to respin the series and wonder how we should proceed. I've failed to come up with a clean patch to keep the flag and make it a no-op. Can you or your team give it a spin? Also I wonder if the flag should be renamed from NON_CONSISTENT to NON_COHERENT - the consistent thing is a weird wart from the times the old PCI DMA API that is mostly gone now.
On Tue, Sep 1, 2020 at 1:06 PM Christoph Hellwig <hch@lst.de> wrote: > > On Thu, Aug 20, 2020 at 07:33:48PM +0200, Tomasz Figa wrote: > > > It wasn't meant to be too insulting, but I found this out when trying > > > to figure out how to just disable it. But it also ends up using > > > the actual dma attr flags for it's own consistency checks, so just > > > not setting the flag did not turn out to work that easily. > > > > > > > Yes, sadly the videobuf2 ended up becoming quite counterintuitive > > after growing for the long years and that is reflected in the design > > of this feature as well. I think we need to do something about it. > > So I'm about to respin the series and wonder how we should proceed. > I've failed to come up with a clean patch to keep the flag and make > it a no-op. Can you or your team give it a spin? > Okay, I'll take a look. > Also I wonder if the flag should be renamed from NON_CONSISTENT > to NON_COHERENT - the consistent thing is a weird wart from the times > the old PCI DMA API that is mostly gone now. It originated from the DMA_ATTR_NON_CONSISTENT flag, but agreed that NON_COHERENT would be more consistent (pun not intended) with the rest of the DMA API given the removal of that flag. Let me see if we can still change it. Best regards, Tomasz
diff --git a/Documentation/userspace-api/media/v4l/buffer.rst b/Documentation/userspace-api/media/v4l/buffer.rst index 57e752aaf414a7..2044ed13cd9d7d 100644 --- a/Documentation/userspace-api/media/v4l/buffer.rst +++ b/Documentation/userspace-api/media/v4l/buffer.rst @@ -701,23 +701,6 @@ Memory Consistency Flags :stub-columns: 0 :widths: 3 1 4 - * .. _`V4L2-FLAG-MEMORY-NON-CONSISTENT`: - - - ``V4L2_FLAG_MEMORY_NON_CONSISTENT`` - - 0x00000001 - - A buffer is allocated either in consistent (it will be automatically - coherent between the CPU and the bus) or non-consistent memory. The - latter can provide performance gains, for instance the CPU cache - sync/flush operations can be avoided if the buffer is accessed by the - corresponding device only and the CPU does not read/write to/from that - buffer. However, this requires extra care from the driver -- it must - guarantee memory consistency by issuing a cache flush/sync when - consistency is needed. If this flag is set V4L2 will attempt to - allocate the buffer in non-consistent memory. The flag takes effect - only if the buffer is used for :ref:`memory mapping <mmap>` I/O and the - queue reports the :ref:`V4L2_BUF_CAP_SUPPORTS_MMAP_CACHE_HINTS - <V4L2-BUF-CAP-SUPPORTS-MMAP-CACHE-HINTS>` capability. - .. c:type:: v4l2_memory enum v4l2_memory diff --git a/Documentation/userspace-api/media/v4l/vidioc-reqbufs.rst b/Documentation/userspace-api/media/v4l/vidioc-reqbufs.rst index 75d894d9c36c42..3180c111d368ee 100644 --- a/Documentation/userspace-api/media/v4l/vidioc-reqbufs.rst +++ b/Documentation/userspace-api/media/v4l/vidioc-reqbufs.rst @@ -169,7 +169,6 @@ aborting or finishing any DMA in progress, an implicit - This capability is set by the driver to indicate that the queue supports cache and memory management hints. However, it's only valid when the queue is used for :ref:`memory mapping <mmap>` streaming I/O. See - :ref:`V4L2_FLAG_MEMORY_NON_CONSISTENT <V4L2-FLAG-MEMORY-NON-CONSISTENT>`, :ref:`V4L2_BUF_FLAG_NO_CACHE_INVALIDATE <V4L2-BUF-FLAG-NO-CACHE-INVALIDATE>` and :ref:`V4L2_BUF_FLAG_NO_CACHE_CLEAN <V4L2-BUF-FLAG-NO-CACHE-CLEAN>`. diff --git a/drivers/media/common/videobuf2/videobuf2-core.c b/drivers/media/common/videobuf2/videobuf2-core.c index f544d3393e9d6b..66a41cef33c1b1 100644 --- a/drivers/media/common/videobuf2/videobuf2-core.c +++ b/drivers/media/common/videobuf2/videobuf2-core.c @@ -721,39 +721,14 @@ int vb2_verify_memory_type(struct vb2_queue *q, } EXPORT_SYMBOL(vb2_verify_memory_type); -static void set_queue_consistency(struct vb2_queue *q, bool consistent_mem) -{ - q->dma_attrs &= ~DMA_ATTR_NON_CONSISTENT; - - if (!vb2_queue_allows_cache_hints(q)) - return; - if (!consistent_mem) - q->dma_attrs |= DMA_ATTR_NON_CONSISTENT; -} - -static bool verify_consistency_attr(struct vb2_queue *q, bool consistent_mem) -{ - bool queue_is_consistent = !(q->dma_attrs & DMA_ATTR_NON_CONSISTENT); - - if (consistent_mem != queue_is_consistent) { - dprintk(q, 1, "memory consistency model mismatch\n"); - return false; - } - return true; -} - int vb2_core_reqbufs(struct vb2_queue *q, enum vb2_memory memory, unsigned int flags, unsigned int *count) { unsigned int num_buffers, allocated_buffers, num_planes = 0; unsigned plane_sizes[VB2_MAX_PLANES] = { }; - bool consistent_mem = true; unsigned int i; int ret; - if (flags & V4L2_FLAG_MEMORY_NON_CONSISTENT) - consistent_mem = false; - if (q->streaming) { dprintk(q, 1, "streaming active\n"); return -EBUSY; @@ -765,8 +740,7 @@ int vb2_core_reqbufs(struct vb2_queue *q, enum vb2_memory memory, } if (*count == 0 || q->num_buffers != 0 || - (q->memory != VB2_MEMORY_UNKNOWN && q->memory != memory) || - !verify_consistency_attr(q, consistent_mem)) { + (q->memory != VB2_MEMORY_UNKNOWN && q->memory != memory)) { /* * We already have buffers allocated, so first check if they * are not in use and can be freed. @@ -803,7 +777,6 @@ int vb2_core_reqbufs(struct vb2_queue *q, enum vb2_memory memory, num_buffers = min_t(unsigned int, num_buffers, VB2_MAX_FRAME); memset(q->alloc_devs, 0, sizeof(q->alloc_devs)); q->memory = memory; - set_queue_consistency(q, consistent_mem); /* * Ask the driver how many buffers and planes per buffer it requires. @@ -894,12 +867,8 @@ int vb2_core_create_bufs(struct vb2_queue *q, enum vb2_memory memory, { unsigned int num_planes = 0, num_buffers, allocated_buffers; unsigned plane_sizes[VB2_MAX_PLANES] = { }; - bool consistent_mem = true; int ret; - if (flags & V4L2_FLAG_MEMORY_NON_CONSISTENT) - consistent_mem = false; - if (q->num_buffers == VB2_MAX_FRAME) { dprintk(q, 1, "maximum number of buffers already allocated\n"); return -ENOBUFS; @@ -912,15 +881,12 @@ int vb2_core_create_bufs(struct vb2_queue *q, enum vb2_memory memory, } memset(q->alloc_devs, 0, sizeof(q->alloc_devs)); q->memory = memory; - set_queue_consistency(q, consistent_mem); q->waiting_for_buffers = !q->is_output; } else { if (q->memory != memory) { dprintk(q, 1, "memory model mismatch\n"); return -EINVAL; } - if (!verify_consistency_attr(q, consistent_mem)) - return -EINVAL; } num_buffers = min(*count, VB2_MAX_FRAME - q->num_buffers); diff --git a/drivers/media/common/videobuf2/videobuf2-dma-contig.c b/drivers/media/common/videobuf2/videobuf2-dma-contig.c index ec3446cc45b8da..7b1b86ec942d7d 100644 --- a/drivers/media/common/videobuf2/videobuf2-dma-contig.c +++ b/drivers/media/common/videobuf2/videobuf2-dma-contig.c @@ -42,11 +42,6 @@ struct vb2_dc_buf { struct dma_buf_attachment *db_attach; }; -static inline bool vb2_dc_buffer_consistent(unsigned long attr) -{ - return !(attr & DMA_ATTR_NON_CONSISTENT); -} - /*********************************************/ /* scatterlist table functions */ /*********************************************/ @@ -341,13 +336,6 @@ static int vb2_dc_dmabuf_ops_begin_cpu_access(struct dma_buf *dbuf, enum dma_data_direction direction) { - struct vb2_dc_buf *buf = dbuf->priv; - struct sg_table *sgt = buf->dma_sgt; - - if (vb2_dc_buffer_consistent(buf->attrs)) - return 0; - - dma_sync_sg_for_cpu(buf->dev, sgt->sgl, sgt->nents, buf->dma_dir); return 0; } @@ -355,13 +343,6 @@ static int vb2_dc_dmabuf_ops_end_cpu_access(struct dma_buf *dbuf, enum dma_data_direction direction) { - struct vb2_dc_buf *buf = dbuf->priv; - struct sg_table *sgt = buf->dma_sgt; - - if (vb2_dc_buffer_consistent(buf->attrs)) - return 0; - - dma_sync_sg_for_device(buf->dev, sgt->sgl, sgt->nents, buf->dma_dir); return 0; } diff --git a/drivers/media/common/videobuf2/videobuf2-dma-sg.c b/drivers/media/common/videobuf2/videobuf2-dma-sg.c index 0a40e00f0d7e5c..a86fce5d8ea8bf 100644 --- a/drivers/media/common/videobuf2/videobuf2-dma-sg.c +++ b/drivers/media/common/videobuf2/videobuf2-dma-sg.c @@ -123,8 +123,7 @@ static void *vb2_dma_sg_alloc(struct device *dev, unsigned long dma_attrs, /* * NOTE: dma-sg allocates memory using the page allocator directly, so * there is no memory consistency guarantee, hence dma-sg ignores DMA - * attributes passed from the upper layer. That means that - * V4L2_FLAG_MEMORY_NON_CONSISTENT has no effect on dma-sg buffers. + * attributes passed from the upper layer. */ buf->pages = kvmalloc_array(buf->num_pages, sizeof(struct page *), GFP_KERNEL | __GFP_ZERO); diff --git a/drivers/media/common/videobuf2/videobuf2-v4l2.c b/drivers/media/common/videobuf2/videobuf2-v4l2.c index 30caad27281e1a..de83ad48783821 100644 --- a/drivers/media/common/videobuf2/videobuf2-v4l2.c +++ b/drivers/media/common/videobuf2/videobuf2-v4l2.c @@ -722,20 +722,11 @@ static void fill_buf_caps(struct vb2_queue *q, u32 *caps) #endif } -static void clear_consistency_attr(struct vb2_queue *q, - int memory, - unsigned int *flags) -{ - if (!q->allow_cache_hints || memory != V4L2_MEMORY_MMAP) - *flags &= ~V4L2_FLAG_MEMORY_NON_CONSISTENT; -} - int vb2_reqbufs(struct vb2_queue *q, struct v4l2_requestbuffers *req) { int ret = vb2_verify_memory_type(q, req->memory, req->type); fill_buf_caps(q, &req->capabilities); - clear_consistency_attr(q, req->memory, &req->flags); return ret ? ret : vb2_core_reqbufs(q, req->memory, req->flags, &req->count); } @@ -769,7 +760,6 @@ int vb2_create_bufs(struct vb2_queue *q, struct v4l2_create_buffers *create) unsigned i; fill_buf_caps(q, &create->capabilities); - clear_consistency_attr(q, create->memory, &create->flags); create->index = q->num_buffers; if (create->count == 0) return ret != -EBUSY ? ret : 0; @@ -998,7 +988,6 @@ int vb2_ioctl_reqbufs(struct file *file, void *priv, int res = vb2_verify_memory_type(vdev->queue, p->memory, p->type); fill_buf_caps(vdev->queue, &p->capabilities); - clear_consistency_attr(vdev->queue, p->memory, &p->flags); if (res) return res; if (vb2_queue_is_busy(vdev, file)) @@ -1021,7 +1010,6 @@ int vb2_ioctl_create_bufs(struct file *file, void *priv, p->index = vdev->queue->num_buffers; fill_buf_caps(vdev->queue, &p->capabilities); - clear_consistency_attr(vdev->queue, p->memory, &p->flags); /* * If count == 0, then just check if memory and type are valid. * Any -EBUSY result from vb2_verify_memory_type can be mapped to 0. diff --git a/include/media/videobuf2-core.h b/include/media/videobuf2-core.h index 52ef92049073e3..4c7f25b07e9375 100644 --- a/include/media/videobuf2-core.h +++ b/include/media/videobuf2-core.h @@ -744,8 +744,7 @@ void vb2_core_querybuf(struct vb2_queue *q, unsigned int index, void *pb); * vb2_core_reqbufs() - Initiate streaming. * @q: pointer to &struct vb2_queue with videobuf2 queue. * @memory: memory type, as defined by &enum vb2_memory. - * @flags: auxiliary queue/buffer management flags. Currently, the only - * used flag is %V4L2_FLAG_MEMORY_NON_CONSISTENT. + * @flags: auxiliary queue/buffer management flags. * @count: requested buffer count. * * Videobuf2 core helper to implement VIDIOC_REQBUF() operation. It is called diff --git a/include/uapi/linux/videodev2.h b/include/uapi/linux/videodev2.h index c7b70ff53bc1dd..5c00f63d9c1b58 100644 --- a/include/uapi/linux/videodev2.h +++ b/include/uapi/linux/videodev2.h @@ -191,8 +191,6 @@ enum v4l2_memory { V4L2_MEMORY_DMABUF = 4, }; -#define V4L2_FLAG_MEMORY_NON_CONSISTENT (1 << 0) - /* see also http://vektor.theorem.ca/graphics/ycbcr/ */ enum v4l2_colorspace { /*
The V4L2-FLAG-MEMORY-NON-CONSISTENT flag is entirely unused, and causes weird gymanstics with the DMA_ATTR_NON_CONSISTENT flag, which is unimplemented except on PARISC and some MIPS configs, and about to be removed. Signed-off-by: Christoph Hellwig <hch@lst.de> --- .../userspace-api/media/v4l/buffer.rst | 17 --------- .../media/v4l/vidioc-reqbufs.rst | 1 - .../media/common/videobuf2/videobuf2-core.c | 36 +------------------ .../common/videobuf2/videobuf2-dma-contig.c | 19 ---------- .../media/common/videobuf2/videobuf2-dma-sg.c | 3 +- .../media/common/videobuf2/videobuf2-v4l2.c | 12 ------- include/media/videobuf2-core.h | 3 +- include/uapi/linux/videodev2.h | 2 -- 8 files changed, 3 insertions(+), 90 deletions(-)