[v2,2/2] media: imx: vdic: Introduce mem2mem VDI deinterlacer driver

Message ID 20240724002044.112544-2-marex@denx.de (mailing list archive)
State Not Applicable
Series [v2,1/2] gpu: ipu-v3: vdic: Simplify ipu_vdi_setup()

Commit Message

Marek Vasut July 24, 2024, 12:19 a.m. UTC
Introduce dedicated memory-to-memory IPUv3 VDI deinterlacer driver.
Currently the IPUv3 can operate VDI in DIRECT mode, from sensor to
memory. This only works for a single stream, that is, one input from
one camera is deinterlaced on the fly with a helper buffer in DRAM
and the result is written into memory.

The i.MX6Q/QP does support up to four analog cameras via two IPUv3
instances, each containing one VDI deinterlacer block. In order to
deinterlace all four streams from all four analog cameras live, it
is necessary to operate VDI in INDIRECT mode, where the interlaced
streams are written to buffers in memory, and then deinterlaced in
memory using VDI in INDIRECT memory-to-memory mode.

This driver also makes use of the IDMAC->VDI->IC->IDMAC data path
to provide pixel format conversion from input YUV formats to either
YUV or RGB output formats. The latter is useful in case the data
are imported into the GPU, which on this platform cannot directly
sample YUV buffers.
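
For context, a minimal userspace sketch of the format negotiation against
such a mem2mem video node (a hypothetical fragment, not part of this patch;
error handling and buffer handling are omitted): interlaced YUV420 fields
go in on the OUTPUT queue and progressive RGB565 frames come out on the
CAPTURE queue, exercising the conversion path described above.

	#include <string.h>
	#include <sys/ioctl.h>
	#include <linux/videodev2.h>

	static int vdic_set_formats(int fd, unsigned int w, unsigned int h)
	{
		struct v4l2_format f;

		memset(&f, 0, sizeof(f));
		f.type = V4L2_BUF_TYPE_VIDEO_OUTPUT;	/* interlaced input */
		f.fmt.pix.width = w;
		f.fmt.pix.height = h;
		f.fmt.pix.pixelformat = V4L2_PIX_FMT_YUV420;
		f.fmt.pix.field = V4L2_FIELD_SEQ_TB;
		if (ioctl(fd, VIDIOC_S_FMT, &f))
			return -1;

		memset(&f, 0, sizeof(f));
		f.type = V4L2_BUF_TYPE_VIDEO_CAPTURE;	/* progressive output */
		f.fmt.pix.width = w;
		f.fmt.pix.height = h;
		f.fmt.pix.pixelformat = V4L2_PIX_FMT_RGB565;
		f.fmt.pix.field = V4L2_FIELD_NONE;
		return ioctl(fd, VIDIOC_S_FMT, &f);
	}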

This is derived from previous work by Steve Longerbeam and from the
IPUv3 CSC Scaler mem2mem driver.

Signed-off-by: Marek Vasut <marex@denx.de>
---
Cc: Daniel Vetter <daniel@ffwll.ch>
Cc: David Airlie <airlied@gmail.com>
Cc: Fabio Estevam <festevam@gmail.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Helge Deller <deller@gmx.de>
Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
Cc: Pengutronix Kernel Team <kernel@pengutronix.de>
Cc: Philipp Zabel <p.zabel@pengutronix.de>
Cc: Sascha Hauer <s.hauer@pengutronix.de>
Cc: Shawn Guo <shawnguo@kernel.org>
Cc: Steve Longerbeam <slongerbeam@gmail.com>
Cc: dri-devel@lists.freedesktop.org
Cc: imx@lists.linux.dev
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-fbdev@vger.kernel.org
Cc: linux-media@vger.kernel.org
Cc: linux-staging@lists.linux.dev
---
V2: - Add complementary imx_media_mem2mem_vdic_uninit()
    - Drop uninitialized ret from ipu_mem2mem_vdic_device_run()
    - Drop duplicate nbuffers assignment in ipu_mem2mem_vdic_queue_setup()
    - Fix %u formatting string in ipu_mem2mem_vdic_queue_setup()
    - Drop devm_*free from ipu_mem2mem_vdic_get_ipu_resources() fail path
      and ipu_mem2mem_vdic_put_ipu_resources()
    - Add missing video_device_release()
---
 drivers/staging/media/imx/Makefile            |   2 +-
 drivers/staging/media/imx/imx-media-dev.c     |  55 +
 .../media/imx/imx-media-mem2mem-vdic.c        | 997 ++++++++++++++++++
 drivers/staging/media/imx/imx-media.h         |  10 +
 4 files changed, 1063 insertions(+), 1 deletion(-)
 create mode 100644 drivers/staging/media/imx/imx-media-mem2mem-vdic.c

Comments

Nicolas Dufresne July 24, 2024, 4:08 p.m. UTC | #1
Hi Marek,

On Wednesday, 24 July 2024 at 02:19 +0200, Marek Vasut wrote:
> Introduce dedicated memory-to-memory IPUv3 VDI deinterlacer driver.
> Currently the IPUv3 can operate VDI in DIRECT mode, from sensor to
> memory. This only works for a single stream, that is, one input from
> one camera is deinterlaced on the fly with a helper buffer in DRAM
> and the result is written into memory.
> 
> The i.MX6Q/QP does support up to four analog cameras via two IPUv3
> instances, each containing one VDI deinterlacer block. In order to
> deinterlace all four streams from all four analog cameras live, it
> is necessary to operate VDI in INDIRECT mode, where the interlaced
> streams are written to buffers in memory, and then deinterlaced in
> memory using VDI in INDIRECT memory-to-memory mode.

Just a quick design question. Is it possible to chain the deinterlacer and the
csc-scaler? If so, it would be much more efficient if all this could be
combined into the existing m2m driver, since you could save a memory roundtrip
when needing to deinterlace, change the colorspace and possibly scale too.

Nicolas

> 
> This driver also makes use of the IDMAC->VDI->IC->IDMAC data path
> to provide pixel format conversion from input YUV formats to either
> YUV or RGB output formats. The latter is useful in case the data
> are imported into the GPU, which on this platform cannot directly
> sample YUV buffers.
> 
> This is derived from previous work by Steve Longerbeam and from the
> IPUv3 CSC Scaler mem2mem driver.
> 
> Signed-off-by: Marek Vasut <marex@denx.de>
> ---
> Cc: Daniel Vetter <daniel@ffwll.ch>
> Cc: David Airlie <airlied@gmail.com>
> Cc: Fabio Estevam <festevam@gmail.com>
> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
> Cc: Helge Deller <deller@gmx.de>
> Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
> Cc: Pengutronix Kernel Team <kernel@pengutronix.de>
> Cc: Philipp Zabel <p.zabel@pengutronix.de>
> Cc: Sascha Hauer <s.hauer@pengutronix.de>
> Cc: Shawn Guo <shawnguo@kernel.org>
> Cc: Steve Longerbeam <slongerbeam@gmail.com>
> Cc: dri-devel@lists.freedesktop.org
> Cc: imx@lists.linux.dev
> Cc: linux-arm-kernel@lists.infradead.org
> Cc: linux-fbdev@vger.kernel.org
> Cc: linux-media@vger.kernel.org
> Cc: linux-staging@lists.linux.dev
> ---
> V2: - Add complementary imx_media_mem2mem_vdic_uninit()
>     - Drop uninitialized ret from ipu_mem2mem_vdic_device_run()
>     - Drop duplicate nbuffers assignment in ipu_mem2mem_vdic_queue_setup()
>     - Fix %u formatting string in ipu_mem2mem_vdic_queue_setup()
>     - Drop devm_*free from ipu_mem2mem_vdic_get_ipu_resources() fail path
>       and ipu_mem2mem_vdic_put_ipu_resources()
>     - Add missing video_device_release()
> ---
>  drivers/staging/media/imx/Makefile            |   2 +-
>  drivers/staging/media/imx/imx-media-dev.c     |  55 +
>  .../media/imx/imx-media-mem2mem-vdic.c        | 997 ++++++++++++++++++
>  drivers/staging/media/imx/imx-media.h         |  10 +
>  4 files changed, 1063 insertions(+), 1 deletion(-)
>  create mode 100644 drivers/staging/media/imx/imx-media-mem2mem-vdic.c
> 
> diff --git a/drivers/staging/media/imx/Makefile b/drivers/staging/media/imx/Makefile
> index 330e0825f506b..0cad87123b590 100644
> --- a/drivers/staging/media/imx/Makefile
> +++ b/drivers/staging/media/imx/Makefile
> @@ -4,7 +4,7 @@ imx-media-common-objs := imx-media-capture.o imx-media-dev-common.o \
>  
>  imx6-media-objs := imx-media-dev.o imx-media-internal-sd.o \
>  	imx-ic-common.o imx-ic-prp.o imx-ic-prpencvf.o imx-media-vdic.o \
> -	imx-media-csc-scaler.o
> +	imx-media-mem2mem-vdic.o imx-media-csc-scaler.o
>  
>  imx6-media-csi-objs := imx-media-csi.o imx-media-fim.o
>  
> diff --git a/drivers/staging/media/imx/imx-media-dev.c b/drivers/staging/media/imx/imx-media-dev.c
> index be54dca11465d..a841fdb4c2394 100644
> --- a/drivers/staging/media/imx/imx-media-dev.c
> +++ b/drivers/staging/media/imx/imx-media-dev.c
> @@ -57,7 +57,52 @@ static int imx6_media_probe_complete(struct v4l2_async_notifier *notifier)
>  		goto unlock;
>  	}
>  
> +	imxmd->m2m_vdic[0] = imx_media_mem2mem_vdic_init(imxmd, 0);
> +	if (IS_ERR(imxmd->m2m_vdic[0])) {
> +		ret = PTR_ERR(imxmd->m2m_vdic[0]);
> +		imxmd->m2m_vdic[0] = NULL;
> +		goto unlock;
> +	}
> +
> +	/* MX6S/DL has one IPUv3, init second VDI only on MX6Q/QP */
> +	if (imxmd->ipu[1]) {
> +		imxmd->m2m_vdic[1] = imx_media_mem2mem_vdic_init(imxmd, 1);
> +		if (IS_ERR(imxmd->m2m_vdic[1])) {
> +			ret = PTR_ERR(imxmd->m2m_vdic[1]);
> +			imxmd->m2m_vdic[1] = NULL;
> +			goto uninit_vdi0;
> +		}
> +	}
> +
>  	ret = imx_media_csc_scaler_device_register(imxmd->m2m_vdev);
> +	if (ret)
> +		goto uninit_vdi1;
> +
> +	ret = imx_media_mem2mem_vdic_register(imxmd->m2m_vdic[0]);
> +	if (ret)
> +		goto unreg_csc;
> +
> +	/* MX6S/DL has one IPUv3, init second VDI only on MX6Q/QP */
> +	if (imxmd->ipu[1]) {
> +		ret = imx_media_mem2mem_vdic_register(imxmd->m2m_vdic[1]);
> +		if (ret)
> +			goto unreg_vdic;
> +	}
> +
> +	mutex_unlock(&imxmd->mutex);
> +	return ret;
> +
> +unreg_vdic:
> +	imx_media_mem2mem_vdic_unregister(imxmd->m2m_vdic[0]);
> +	imxmd->m2m_vdic[0] = NULL;
> +unreg_csc:
> +	imx_media_csc_scaler_device_unregister(imxmd->m2m_vdev);
> +	imxmd->m2m_vdev = NULL;
> +uninit_vdi1:
> +	if (imxmd->ipu[1])
> +		imx_media_mem2mem_vdic_uninit(imxmd->m2m_vdic[1]);
> +uninit_vdi0:
> +	imx_media_mem2mem_vdic_uninit(imxmd->m2m_vdic[0]);
>  unlock:
>  	mutex_unlock(&imxmd->mutex);
>  	return ret;
> @@ -108,6 +153,16 @@ static void imx_media_remove(struct platform_device *pdev)
>  
>  	v4l2_info(&imxmd->v4l2_dev, "Removing imx-media\n");
>  
> +	if (imxmd->m2m_vdic[1]) {	/* MX6Q/QP only */
> +		imx_media_mem2mem_vdic_unregister(imxmd->m2m_vdic[1]);
> +		imxmd->m2m_vdic[1] = NULL;
> +	}
> +
> +	if (imxmd->m2m_vdic[0]) {
> +		imx_media_mem2mem_vdic_unregister(imxmd->m2m_vdic[0]);
> +		imxmd->m2m_vdic[0] = NULL;
> +	}
> +
>  	if (imxmd->m2m_vdev) {
>  		imx_media_csc_scaler_device_unregister(imxmd->m2m_vdev);
>  		imxmd->m2m_vdev = NULL;
> diff --git a/drivers/staging/media/imx/imx-media-mem2mem-vdic.c b/drivers/staging/media/imx/imx-media-mem2mem-vdic.c
> new file mode 100644
> index 0000000000000..71c6c023d2bf8
> --- /dev/null
> +++ b/drivers/staging/media/imx/imx-media-mem2mem-vdic.c
> @@ -0,0 +1,997 @@
> +// SPDX-License-Identifier: GPL-2.0-or-later
> +/*
> + * i.MX VDIC mem2mem de-interlace driver
> + *
> + * Copyright (c) 2024 Marek Vasut <marex@denx.de>
> + *
> + * Based on previous VDIC mem2mem work by Steve Longerbeam that is:
> + * Copyright (c) 2018 Mentor Graphics Inc.
> + */
> +
> +#include <linux/delay.h>
> +#include <linux/fs.h>
> +#include <linux/module.h>
> +#include <linux/sched.h>
> +#include <linux/slab.h>
> +#include <linux/version.h>
> +
> +#include <media/media-device.h>
> +#include <media/v4l2-ctrls.h>
> +#include <media/v4l2-device.h>
> +#include <media/v4l2-event.h>
> +#include <media/v4l2-ioctl.h>
> +#include <media/v4l2-mem2mem.h>
> +#include <media/videobuf2-dma-contig.h>
> +
> +#include "imx-media.h"
> +
> +#define fh_to_ctx(__fh)	container_of(__fh, struct ipu_mem2mem_vdic_ctx, fh)
> +
> +#define to_mem2mem_priv(v) container_of(v, struct ipu_mem2mem_vdic_priv, vdev)
> +
> +enum {
> +	V4L2_M2M_SRC = 0,
> +	V4L2_M2M_DST = 1,
> +};
> +
> +struct ipu_mem2mem_vdic_ctx;
> +
> +struct ipu_mem2mem_vdic_priv {
> +	struct imx_media_video_dev	vdev;
> +	struct imx_media_dev		*md;
> +	struct device			*dev;
> +	struct ipu_soc			*ipu_dev;
> +	int				ipu_id;
> +
> +	struct v4l2_m2m_dev		*m2m_dev;
> +	struct mutex			mutex;		/* mem2mem device mutex */
> +
> +	/* VDI resources */
> +	struct ipu_vdi			*vdi;
> +	struct ipu_ic			*ic;
> +	struct ipuv3_channel		*vdi_in_ch_p;
> +	struct ipuv3_channel		*vdi_in_ch;
> +	struct ipuv3_channel		*vdi_in_ch_n;
> +	struct ipuv3_channel		*vdi_out_ch;
> +	int				eof_irq;
> +	int				nfb4eof_irq;
> +	spinlock_t			irqlock;	/* protect eof_irq handler */
> +
> +	atomic_t			stream_count;
> +
> +	struct ipu_mem2mem_vdic_ctx	*curr_ctx;
> +
> +	struct v4l2_pix_format		fmt[2];
> +};
> +
> +struct ipu_mem2mem_vdic_ctx {
> +	struct ipu_mem2mem_vdic_priv	*priv;
> +	struct v4l2_fh			fh;
> +	unsigned int			sequence;
> +	struct vb2_v4l2_buffer		*prev_buf;
> +	struct vb2_v4l2_buffer		*curr_buf;
> +};
> +
> +static struct v4l2_pix_format *
> +ipu_mem2mem_vdic_get_format(struct ipu_mem2mem_vdic_priv *priv,
> +			    enum v4l2_buf_type type)
> +{
> +	return &priv->fmt[V4L2_TYPE_IS_OUTPUT(type) ? V4L2_M2M_SRC : V4L2_M2M_DST];
> +}
> +
> +static bool ipu_mem2mem_vdic_format_is_yuv420(const u32 pixelformat)
> +{
> +	/* All 4:2:0 subsampled formats supported by this hardware */
> +	return pixelformat == V4L2_PIX_FMT_YUV420 ||
> +	       pixelformat == V4L2_PIX_FMT_YVU420 ||
> +	       pixelformat == V4L2_PIX_FMT_NV12;
> +}
> +
> +static bool ipu_mem2mem_vdic_format_is_yuv422(const u32 pixelformat)
> +{
> +	/* All 4:2:2 subsampled formats supported by this hardware */
> +	return pixelformat == V4L2_PIX_FMT_UYVY ||
> +	       pixelformat == V4L2_PIX_FMT_YUYV ||
> +	       pixelformat == V4L2_PIX_FMT_YUV422P ||
> +	       pixelformat == V4L2_PIX_FMT_NV16;
> +}
> +
> +static bool ipu_mem2mem_vdic_format_is_yuv(const u32 pixelformat)
> +{
> +	return ipu_mem2mem_vdic_format_is_yuv420(pixelformat) ||
> +	       ipu_mem2mem_vdic_format_is_yuv422(pixelformat);
> +}
> +
> +static bool ipu_mem2mem_vdic_format_is_rgb16(const u32 pixelformat)
> +{
> +	/* All 16-bit RGB formats supported by this hardware */
> +	return pixelformat == V4L2_PIX_FMT_RGB565;
> +}
> +
> +static bool ipu_mem2mem_vdic_format_is_rgb24(const u32 pixelformat)
> +{
> +	/* All 24-bit RGB formats supported by this hardware */
> +	return pixelformat == V4L2_PIX_FMT_RGB24 ||
> +	       pixelformat == V4L2_PIX_FMT_BGR24;
> +}
> +
> +static bool ipu_mem2mem_vdic_format_is_rgb32(const u32 pixelformat)
> +{
> +	/* All 32-bit RGB formats supported by this hardware */
> +	return pixelformat == V4L2_PIX_FMT_XRGB32 ||
> +	       pixelformat == V4L2_PIX_FMT_XBGR32 ||
> +	       pixelformat == V4L2_PIX_FMT_BGRX32 ||
> +	       pixelformat == V4L2_PIX_FMT_RGBX32;
> +}
> +
> +/*
> + * mem2mem callbacks
> + */
> +static irqreturn_t ipu_mem2mem_vdic_eof_interrupt(int irq, void *dev_id)
> +{
> +	struct ipu_mem2mem_vdic_priv *priv = dev_id;
> +	struct ipu_mem2mem_vdic_ctx *ctx = priv->curr_ctx;
> +	struct vb2_v4l2_buffer *src_buf, *dst_buf;
> +
> +	spin_lock(&priv->irqlock);
> +
> +	src_buf = v4l2_m2m_src_buf_remove(ctx->fh.m2m_ctx);
> +	dst_buf = v4l2_m2m_dst_buf_remove(ctx->fh.m2m_ctx);
> +
> +	v4l2_m2m_buf_copy_metadata(src_buf, dst_buf, true);
> +
> +	src_buf->sequence = ctx->sequence++;
> +	dst_buf->sequence = src_buf->sequence;
> +
> +	v4l2_m2m_buf_done(src_buf, VB2_BUF_STATE_DONE);
> +	v4l2_m2m_buf_done(dst_buf, VB2_BUF_STATE_DONE);
> +
> +	v4l2_m2m_job_finish(priv->m2m_dev, ctx->fh.m2m_ctx);
> +
> +	spin_unlock(&priv->irqlock);
> +
> +	return IRQ_HANDLED;
> +}
> +
> +static irqreturn_t ipu_mem2mem_vdic_nfb4eof_interrupt(int irq, void *dev_id)
> +{
> +	struct ipu_mem2mem_vdic_priv *priv = dev_id;
> +
> +	/* That is about all we can do about it, report it. */
> +	dev_warn_ratelimited(priv->dev, "NFB4EOF error interrupt occurred\n");
> +
> +	return IRQ_HANDLED;
> +}
> +
> +static void ipu_mem2mem_vdic_device_run(void *_ctx)
> +{
> +	struct ipu_mem2mem_vdic_ctx *ctx = _ctx;
> +	struct ipu_mem2mem_vdic_priv *priv = ctx->priv;
> +	struct vb2_v4l2_buffer *curr_buf, *dst_buf;
> +	dma_addr_t prev_phys, curr_phys, out_phys;
> +	struct v4l2_pix_format *infmt;
> +	u32 phys_offset = 0;
> +	unsigned long flags;
> +
> +	infmt = ipu_mem2mem_vdic_get_format(priv, V4L2_BUF_TYPE_VIDEO_OUTPUT);
> +	if (V4L2_FIELD_IS_SEQUENTIAL(infmt->field))
> +		phys_offset = infmt->sizeimage / 2;
> +	else if (V4L2_FIELD_IS_INTERLACED(infmt->field))
> +		phys_offset = infmt->bytesperline;
> +	else
> +		dev_err(priv->dev, "Invalid field %d\n", infmt->field);
> +
> +	dst_buf = v4l2_m2m_next_dst_buf(ctx->fh.m2m_ctx);
> +	out_phys = vb2_dma_contig_plane_dma_addr(&dst_buf->vb2_buf, 0);
> +
> +	curr_buf = v4l2_m2m_next_src_buf(ctx->fh.m2m_ctx);
> +	if (!curr_buf) {
> +		dev_err(priv->dev, "Not enough buffers\n");
> +		return;
> +	}
> +
> +	spin_lock_irqsave(&priv->irqlock, flags);
> +
> +	if (ctx->curr_buf) {
> +		ctx->prev_buf = ctx->curr_buf;
> +		ctx->curr_buf = curr_buf;
> +	} else {
> +		ctx->prev_buf = curr_buf;
> +		ctx->curr_buf = curr_buf;
> +		dev_warn(priv->dev, "Single-buffer mode, fix your userspace\n");
> +	}
> +
> +	prev_phys = vb2_dma_contig_plane_dma_addr(&ctx->prev_buf->vb2_buf, 0);
> +	curr_phys = vb2_dma_contig_plane_dma_addr(&ctx->curr_buf->vb2_buf, 0);
> +
> +	priv->curr_ctx = ctx;
> +	spin_unlock_irqrestore(&priv->irqlock, flags);
> +
> +	ipu_cpmem_set_buffer(priv->vdi_out_ch,  0, out_phys);
> +	ipu_cpmem_set_buffer(priv->vdi_in_ch_p, 0, prev_phys + phys_offset);
> +	ipu_cpmem_set_buffer(priv->vdi_in_ch,   0, curr_phys);
> +	ipu_cpmem_set_buffer(priv->vdi_in_ch_n, 0, curr_phys + phys_offset);
> +
> +	/* No double buffering, always pick buffer 0 */
> +	ipu_idmac_select_buffer(priv->vdi_out_ch, 0);
> +	ipu_idmac_select_buffer(priv->vdi_in_ch_p, 0);
> +	ipu_idmac_select_buffer(priv->vdi_in_ch, 0);
> +	ipu_idmac_select_buffer(priv->vdi_in_ch_n, 0);
> +
> +	/* Enable the channels */
> +	ipu_idmac_enable_channel(priv->vdi_out_ch);
> +	ipu_idmac_enable_channel(priv->vdi_in_ch_p);
> +	ipu_idmac_enable_channel(priv->vdi_in_ch);
> +	ipu_idmac_enable_channel(priv->vdi_in_ch_n);
> +}
> +
> +/*
> + * Video ioctls
> + */
> +static int ipu_mem2mem_vdic_querycap(struct file *file, void *priv,
> +				     struct v4l2_capability *cap)
> +{
> +	strscpy(cap->driver, "imx-m2m-vdic", sizeof(cap->driver));
> +	strscpy(cap->card, "imx-m2m-vdic", sizeof(cap->card));
> +	strscpy(cap->bus_info, "platform:imx-m2m-vdic", sizeof(cap->bus_info));
> +	cap->device_caps = V4L2_CAP_VIDEO_M2M | V4L2_CAP_STREAMING;
> +	cap->capabilities = cap->device_caps | V4L2_CAP_DEVICE_CAPS;
> +
> +	return 0;
> +}
> +
> +static int ipu_mem2mem_vdic_enum_fmt(struct file *file, void *fh, struct v4l2_fmtdesc *f)
> +{
> +	struct ipu_mem2mem_vdic_ctx *ctx = fh_to_ctx(fh);
> +	struct vb2_queue *vq = v4l2_m2m_get_vq(ctx->fh.m2m_ctx, f->type);
> +	enum imx_pixfmt_sel cs = vq->type == V4L2_BUF_TYPE_VIDEO_CAPTURE ?
> +				 PIXFMT_SEL_YUV_RGB : PIXFMT_SEL_YUV;
> +	u32 fourcc;
> +	int ret;
> +
> +	ret = imx_media_enum_pixel_formats(&fourcc, f->index, cs, 0);
> +	if (ret)
> +		return ret;
> +
> +	f->pixelformat = fourcc;
> +
> +	return 0;
> +}
> +
> +static int ipu_mem2mem_vdic_g_fmt(struct file *file, void *fh, struct v4l2_format *f)
> +{
> +	struct ipu_mem2mem_vdic_ctx *ctx = fh_to_ctx(fh);
> +	struct ipu_mem2mem_vdic_priv *priv = ctx->priv;
> +	struct v4l2_pix_format *fmt = ipu_mem2mem_vdic_get_format(priv, f->type);
> +
> +	f->fmt.pix = *fmt;
> +
> +	return 0;
> +}
> +
> +static int ipu_mem2mem_vdic_try_fmt(struct file *file, void *fh,
> +				    struct v4l2_format *f)
> +{
> +	const struct imx_media_pixfmt *cc;
> +	enum imx_pixfmt_sel cs;
> +	u32 fourcc;
> +
> +	if (f->type == V4L2_BUF_TYPE_VIDEO_CAPTURE) {	/* Output */
> +		cs = PIXFMT_SEL_YUV_RGB;	/* YUV direct / RGB via IC */
> +
> +		f->fmt.pix.field = V4L2_FIELD_NONE;
> +	} else {
> +		cs = PIXFMT_SEL_YUV;		/* YUV input only */
> +
> +		/*
> +		 * Input must be interlaced with frame order.
> +		 * Fall back to SEQ_TB otherwise.
> +		 */
> +		if (!V4L2_FIELD_HAS_BOTH(f->fmt.pix.field) ||
> +		    f->fmt.pix.field == V4L2_FIELD_INTERLACED)
> +			f->fmt.pix.field = V4L2_FIELD_SEQ_TB;
> +	}
> +
> +	fourcc = f->fmt.pix.pixelformat;
> +	cc = imx_media_find_pixel_format(fourcc, cs);
> +	if (!cc) {
> +		imx_media_enum_pixel_formats(&fourcc, 0, cs, 0);
> +		cc = imx_media_find_pixel_format(fourcc, cs);
> +	}
> +
> +	f->fmt.pix.pixelformat = cc->fourcc;
> +
> +	v4l_bound_align_image(&f->fmt.pix.width,
> +			      1, 968, 1,
> +			      &f->fmt.pix.height,
> +			      1, 1024, 1, 1);
> +
> +	if (ipu_mem2mem_vdic_format_is_yuv420(f->fmt.pix.pixelformat))
> +		f->fmt.pix.bytesperline = f->fmt.pix.width * 3 / 2;
> +	else if (ipu_mem2mem_vdic_format_is_yuv422(f->fmt.pix.pixelformat))
> +		f->fmt.pix.bytesperline = f->fmt.pix.width * 2;
> +	else if (ipu_mem2mem_vdic_format_is_rgb16(f->fmt.pix.pixelformat))
> +		f->fmt.pix.bytesperline = f->fmt.pix.width * 2;
> +	else if (ipu_mem2mem_vdic_format_is_rgb24(f->fmt.pix.pixelformat))
> +		f->fmt.pix.bytesperline = f->fmt.pix.width * 3;
> +	else if (ipu_mem2mem_vdic_format_is_rgb32(f->fmt.pix.pixelformat))
> +		f->fmt.pix.bytesperline = f->fmt.pix.width * 4;
> +	else
> +		f->fmt.pix.bytesperline = f->fmt.pix.width;
> +
> +	f->fmt.pix.sizeimage = f->fmt.pix.height * f->fmt.pix.bytesperline;
> +
> +	return 0;
> +}
> +
> +static int ipu_mem2mem_vdic_s_fmt(struct file *file, void *fh, struct v4l2_format *f)
> +{
> +	struct ipu_mem2mem_vdic_ctx *ctx = fh_to_ctx(fh);
> +	struct ipu_mem2mem_vdic_priv *priv = ctx->priv;
> +	struct v4l2_pix_format *fmt, *infmt, *outfmt;
> +	struct vb2_queue *vq;
> +	int ret;
> +
> +	vq = v4l2_m2m_get_vq(ctx->fh.m2m_ctx, f->type);
> +	if (vb2_is_busy(vq)) {
> +		dev_err(priv->dev, "%s queue busy\n",  __func__);
> +		return -EBUSY;
> +	}
> +
> +	ret = ipu_mem2mem_vdic_try_fmt(file, fh, f);
> +	if (ret < 0)
> +		return ret;
> +
> +	fmt = ipu_mem2mem_vdic_get_format(priv, f->type);
> +	*fmt = f->fmt.pix;
> +
> +	/* Propagate colorimetry to the capture queue */
> +	infmt = ipu_mem2mem_vdic_get_format(priv, V4L2_BUF_TYPE_VIDEO_OUTPUT);
> +	outfmt = ipu_mem2mem_vdic_get_format(priv, V4L2_BUF_TYPE_VIDEO_CAPTURE);
> +	outfmt->colorspace = infmt->colorspace;
> +	outfmt->ycbcr_enc = infmt->ycbcr_enc;
> +	outfmt->xfer_func = infmt->xfer_func;
> +	outfmt->quantization = infmt->quantization;
> +
> +	return 0;
> +}
> +
> +static const struct v4l2_ioctl_ops mem2mem_ioctl_ops = {
> +	.vidioc_querycap		= ipu_mem2mem_vdic_querycap,
> +
> +	.vidioc_enum_fmt_vid_cap	= ipu_mem2mem_vdic_enum_fmt,
> +	.vidioc_g_fmt_vid_cap		= ipu_mem2mem_vdic_g_fmt,
> +	.vidioc_try_fmt_vid_cap		= ipu_mem2mem_vdic_try_fmt,
> +	.vidioc_s_fmt_vid_cap		= ipu_mem2mem_vdic_s_fmt,
> +
> +	.vidioc_enum_fmt_vid_out	= ipu_mem2mem_vdic_enum_fmt,
> +	.vidioc_g_fmt_vid_out		= ipu_mem2mem_vdic_g_fmt,
> +	.vidioc_try_fmt_vid_out		= ipu_mem2mem_vdic_try_fmt,
> +	.vidioc_s_fmt_vid_out		= ipu_mem2mem_vdic_s_fmt,
> +
> +	.vidioc_reqbufs			= v4l2_m2m_ioctl_reqbufs,
> +	.vidioc_querybuf		= v4l2_m2m_ioctl_querybuf,
> +
> +	.vidioc_qbuf			= v4l2_m2m_ioctl_qbuf,
> +	.vidioc_expbuf			= v4l2_m2m_ioctl_expbuf,
> +	.vidioc_dqbuf			= v4l2_m2m_ioctl_dqbuf,
> +	.vidioc_create_bufs		= v4l2_m2m_ioctl_create_bufs,
> +
> +	.vidioc_streamon		= v4l2_m2m_ioctl_streamon,
> +	.vidioc_streamoff		= v4l2_m2m_ioctl_streamoff,
> +
> +	.vidioc_subscribe_event		= v4l2_ctrl_subscribe_event,
> +	.vidioc_unsubscribe_event	= v4l2_event_unsubscribe,
> +};
> +
> +/*
> + * Queue operations
> + */
> +static int ipu_mem2mem_vdic_queue_setup(struct vb2_queue *vq, unsigned int *nbuffers,
> +					unsigned int *nplanes, unsigned int sizes[],
> +					struct device *alloc_devs[])
> +{
> +	struct ipu_mem2mem_vdic_ctx *ctx = vb2_get_drv_priv(vq);
> +	struct ipu_mem2mem_vdic_priv *priv = ctx->priv;
> +	struct v4l2_pix_format *fmt = ipu_mem2mem_vdic_get_format(priv, vq->type);
> +	unsigned int count = *nbuffers;
> +
> +	if (*nplanes)
> +		return sizes[0] < fmt->sizeimage ? -EINVAL : 0;
> +
> +	*nplanes = 1;
> +	sizes[0] = fmt->sizeimage;
> +
> +	dev_dbg(ctx->priv->dev, "get %u buffer(s) of size %d each.\n",
> +		count, fmt->sizeimage);
> +
> +	return 0;
> +}
> +
> +static int ipu_mem2mem_vdic_buf_prepare(struct vb2_buffer *vb)
> +{
> +	struct ipu_mem2mem_vdic_ctx *ctx = vb2_get_drv_priv(vb->vb2_queue);
> +	struct vb2_v4l2_buffer *vbuf = to_vb2_v4l2_buffer(vb);
> +	struct ipu_mem2mem_vdic_priv *priv = ctx->priv;
> +	struct vb2_queue *vq = vb->vb2_queue;
> +	struct v4l2_pix_format *fmt;
> +	unsigned long size;
> +
> +	dev_dbg(ctx->priv->dev, "type: %d\n", vb->vb2_queue->type);
> +
> +	if (V4L2_TYPE_IS_OUTPUT(vq->type)) {
> +		if (vbuf->field == V4L2_FIELD_ANY)
> +			vbuf->field = V4L2_FIELD_SEQ_TB;
> +		if (!V4L2_FIELD_HAS_BOTH(vbuf->field)) {
> +			dev_dbg(ctx->priv->dev, "%s: field isn't supported\n",
> +				__func__);
> +			return -EINVAL;
> +		}
> +	}
> +
> +	fmt = ipu_mem2mem_vdic_get_format(priv, vb->vb2_queue->type);
> +	size = fmt->sizeimage;
> +
> +	if (vb2_plane_size(vb, 0) < size) {
> +		dev_dbg(ctx->priv->dev,
> +			"%s: data will not fit into plane (%lu < %lu)\n",
> +			__func__, vb2_plane_size(vb, 0), size);
> +		return -EINVAL;
> +	}
> +
> +	vb2_set_plane_payload(vb, 0, fmt->sizeimage);
> +
> +	return 0;
> +}
> +
> +static void ipu_mem2mem_vdic_buf_queue(struct vb2_buffer *vb)
> +{
> +	struct ipu_mem2mem_vdic_ctx *ctx = vb2_get_drv_priv(vb->vb2_queue);
> +
> +	v4l2_m2m_buf_queue(ctx->fh.m2m_ctx, to_vb2_v4l2_buffer(vb));
> +}
> +
> +/* VDIC hardware setup */
> +static int ipu_mem2mem_vdic_setup_channel(struct ipu_mem2mem_vdic_priv *priv,
> +					  struct ipuv3_channel *channel,
> +					  struct v4l2_pix_format *fmt,
> +					  bool in)
> +{
> +	struct ipu_image image = { 0 };
> +	unsigned int burst_size;
> +	int ret;
> +
> +	image.pix = *fmt;
> +	image.rect.width = image.pix.width;
> +	image.rect.height = image.pix.height;
> +
> +	ipu_cpmem_zero(channel);
> +
> +	if (in) {
> +		/* One field to VDIC channels */
> +		image.pix.height /= 2;
> +		image.rect.height /= 2;
> +	} else {
> +		/* Skip writing U and V components to odd rows */
> +		if (ipu_mem2mem_vdic_format_is_yuv420(image.pix.pixelformat))
> +			ipu_cpmem_skip_odd_chroma_rows(channel);
> +	}
> +
> +	ret = ipu_cpmem_set_image(channel, &image);
> +	if (ret)
> +		return ret;
> +
> +	burst_size = (image.pix.width & 0xf) ? 8 : 16;
> +	ipu_cpmem_set_burstsize(channel, burst_size);
> +
> +	if (!ipu_prg_present(priv->ipu_dev))
> +		ipu_cpmem_set_axi_id(channel, 1);
> +
> +	ipu_idmac_set_double_buffer(channel, false);
> +
> +	return 0;
> +}
> +
> +static int ipu_mem2mem_vdic_setup_hardware(struct ipu_mem2mem_vdic_priv *priv)
> +{
> +	struct v4l2_pix_format *infmt, *outfmt;
> +	struct ipu_ic_csc csc;
> +	bool in422, outyuv;
> +	int ret;
> +
> +	infmt = ipu_mem2mem_vdic_get_format(priv, V4L2_BUF_TYPE_VIDEO_OUTPUT);
> +	outfmt = ipu_mem2mem_vdic_get_format(priv, V4L2_BUF_TYPE_VIDEO_CAPTURE);
> +	in422 = ipu_mem2mem_vdic_format_is_yuv422(infmt->pixelformat);
> +	outyuv = ipu_mem2mem_vdic_format_is_yuv(outfmt->pixelformat);
> +
> +	ipu_vdi_setup(priv->vdi, in422, infmt->width, infmt->height);
> +	ipu_vdi_set_field_order(priv->vdi, V4L2_STD_UNKNOWN, infmt->field);
> +	ipu_vdi_set_motion(priv->vdi, HIGH_MOTION);
> +
> +	/* Initialize the VDI IDMAC channels */
> +	ret = ipu_mem2mem_vdic_setup_channel(priv, priv->vdi_in_ch_p, infmt, true);
> +	if (ret)
> +		return ret;
> +
> +	ret = ipu_mem2mem_vdic_setup_channel(priv, priv->vdi_in_ch, infmt, true);
> +	if (ret)
> +		return ret;
> +
> +	ret = ipu_mem2mem_vdic_setup_channel(priv, priv->vdi_in_ch_n, infmt, true);
> +	if (ret)
> +		return ret;
> +
> +	ret = ipu_mem2mem_vdic_setup_channel(priv, priv->vdi_out_ch, outfmt, false);
> +	if (ret)
> +		return ret;
> +
> +	ret = ipu_ic_calc_csc(&csc,
> +			      infmt->ycbcr_enc, infmt->quantization,
> +			      IPUV3_COLORSPACE_YUV,
> +			      outfmt->ycbcr_enc, outfmt->quantization,
> +			      outyuv ? IPUV3_COLORSPACE_YUV :
> +				       IPUV3_COLORSPACE_RGB);
> +	if (ret)
> +		return ret;
> +
> +	/* Enable the IC */
> +	ipu_ic_task_init(priv->ic, &csc,
> +			 infmt->width, infmt->height,
> +			 outfmt->width, outfmt->height);
> +	ipu_ic_task_idma_init(priv->ic, priv->vdi_out_ch,
> +			      infmt->width, infmt->height, 16, 0);
> +	ipu_ic_enable(priv->ic);
> +	ipu_ic_task_enable(priv->ic);
> +
> +	/* Enable the VDI */
> +	ipu_vdi_enable(priv->vdi);
> +
> +	return 0;
> +}
> +
> +static struct vb2_queue *ipu_mem2mem_vdic_get_other_q(struct vb2_queue *q)
> +{
> +	struct ipu_mem2mem_vdic_ctx *ctx = vb2_get_drv_priv(q);
> +	enum v4l2_buf_type type = q->type == V4L2_BUF_TYPE_VIDEO_CAPTURE ?
> +				  V4L2_BUF_TYPE_VIDEO_OUTPUT :
> +				  V4L2_BUF_TYPE_VIDEO_CAPTURE;
> +
> +	return v4l2_m2m_get_vq(ctx->fh.m2m_ctx, type);
> +}
> +
> +static void ipu_mem2mem_vdic_return_bufs(struct vb2_queue *q)
> +{
> +	struct ipu_mem2mem_vdic_ctx *ctx = vb2_get_drv_priv(q);
> +	struct vb2_v4l2_buffer *buf;
> +
> +	if (q->type == V4L2_BUF_TYPE_VIDEO_OUTPUT)
> +		while ((buf = v4l2_m2m_src_buf_remove(ctx->fh.m2m_ctx)))
> +			v4l2_m2m_buf_done(buf, VB2_BUF_STATE_QUEUED);
> +	else
> +		while ((buf = v4l2_m2m_dst_buf_remove(ctx->fh.m2m_ctx)))
> +			v4l2_m2m_buf_done(buf, VB2_BUF_STATE_QUEUED);
> +}
> +
> +static int ipu_mem2mem_vdic_start_streaming(struct vb2_queue *q, unsigned int count)
> +{
> +	struct vb2_queue *other_q = ipu_mem2mem_vdic_get_other_q(q);
> +	struct ipu_mem2mem_vdic_ctx *ctx = vb2_get_drv_priv(q);
> +	struct ipu_mem2mem_vdic_priv *priv = ctx->priv;
> +	int ret;
> +
> +	if (!vb2_is_streaming(other_q))
> +		return 0;
> +
> +	/* Already streaming, do not reconfigure the VDI. */
> +	if (atomic_inc_return(&priv->stream_count) != 1)
> +		return 0;
> +
> +	/* Start streaming */
> +	ret = ipu_mem2mem_vdic_setup_hardware(priv);
> +	if (ret)
> +		ipu_mem2mem_vdic_return_bufs(q);
> +
> +	return ret;
> +}
> +
> +static void ipu_mem2mem_vdic_stop_streaming(struct vb2_queue *q)
> +{
> +	struct vb2_queue *other_q = ipu_mem2mem_vdic_get_other_q(q);
> +	struct ipu_mem2mem_vdic_ctx *ctx = vb2_get_drv_priv(q);
> +	struct ipu_mem2mem_vdic_priv *priv = ctx->priv;
> +
> +	if (vb2_is_streaming(other_q)) {
> +		ipu_mem2mem_vdic_return_bufs(q);
> +		return;
> +	}
> +
> +	if (atomic_dec_return(&priv->stream_count) == 0) {
> +		/* Stop streaming */
> +		ipu_idmac_disable_channel(priv->vdi_in_ch_p);
> +		ipu_idmac_disable_channel(priv->vdi_in_ch);
> +		ipu_idmac_disable_channel(priv->vdi_in_ch_n);
> +		ipu_idmac_disable_channel(priv->vdi_out_ch);
> +
> +		ipu_vdi_disable(priv->vdi);
> +		ipu_ic_task_disable(priv->ic);
> +		ipu_ic_disable(priv->ic);
> +	}
> +
> +	ctx->sequence = 0;
> +
> +	ipu_mem2mem_vdic_return_bufs(q);
> +}
> +
> +static const struct vb2_ops mem2mem_qops = {
> +	.queue_setup	= ipu_mem2mem_vdic_queue_setup,
> +	.buf_prepare	= ipu_mem2mem_vdic_buf_prepare,
> +	.buf_queue	= ipu_mem2mem_vdic_buf_queue,
> +	.wait_prepare	= vb2_ops_wait_prepare,
> +	.wait_finish	= vb2_ops_wait_finish,
> +	.start_streaming = ipu_mem2mem_vdic_start_streaming,
> +	.stop_streaming = ipu_mem2mem_vdic_stop_streaming,
> +};
> +
> +static int ipu_mem2mem_vdic_queue_init(void *priv, struct vb2_queue *src_vq,
> +				       struct vb2_queue *dst_vq)
> +{
> +	struct ipu_mem2mem_vdic_ctx *ctx = priv;
> +	int ret;
> +
> +	memset(src_vq, 0, sizeof(*src_vq));
> +	src_vq->type = V4L2_BUF_TYPE_VIDEO_OUTPUT;
> +	src_vq->io_modes = VB2_MMAP | VB2_DMABUF;
> +	src_vq->drv_priv = ctx;
> +	src_vq->buf_struct_size = sizeof(struct v4l2_m2m_buffer);
> +	src_vq->ops = &mem2mem_qops;
> +	src_vq->mem_ops = &vb2_dma_contig_memops;
> +	src_vq->timestamp_flags = V4L2_BUF_FLAG_TIMESTAMP_COPY;
> +	src_vq->lock = &ctx->priv->mutex;
> +	src_vq->dev = ctx->priv->dev;
> +
> +	ret = vb2_queue_init(src_vq);
> +	if (ret)
> +		return ret;
> +
> +	memset(dst_vq, 0, sizeof(*dst_vq));
> +	dst_vq->type = V4L2_BUF_TYPE_VIDEO_CAPTURE;
> +	dst_vq->io_modes = VB2_MMAP | VB2_DMABUF;
> +	dst_vq->drv_priv = ctx;
> +	dst_vq->buf_struct_size = sizeof(struct v4l2_m2m_buffer);
> +	dst_vq->ops = &mem2mem_qops;
> +	dst_vq->mem_ops = &vb2_dma_contig_memops;
> +	dst_vq->timestamp_flags = V4L2_BUF_FLAG_TIMESTAMP_COPY;
> +	dst_vq->lock = &ctx->priv->mutex;
> +	dst_vq->dev = ctx->priv->dev;
> +
> +	return vb2_queue_init(dst_vq);
> +}
> +
> +#define DEFAULT_WIDTH	720
> +#define DEFAULT_HEIGHT	576
> +static const struct v4l2_pix_format ipu_mem2mem_vdic_default = {
> +	.width		= DEFAULT_WIDTH,
> +	.height		= DEFAULT_HEIGHT,
> +	.pixelformat	= V4L2_PIX_FMT_YUV420,
> +	.field		= V4L2_FIELD_SEQ_TB,
> +	.bytesperline	= DEFAULT_WIDTH,
> +	.sizeimage	= DEFAULT_WIDTH * DEFAULT_HEIGHT * 3 / 2,
> +	.colorspace	= V4L2_COLORSPACE_SRGB,
> +	.ycbcr_enc	= V4L2_YCBCR_ENC_601,
> +	.xfer_func	= V4L2_XFER_FUNC_DEFAULT,
> +	.quantization	= V4L2_QUANTIZATION_DEFAULT,
> +};
> +
> +/*
> + * File operations
> + */
> +static int ipu_mem2mem_vdic_open(struct file *file)
> +{
> +	struct ipu_mem2mem_vdic_priv *priv = video_drvdata(file);
> +	struct ipu_mem2mem_vdic_ctx *ctx = NULL;
> +	int ret;
> +
> +	ctx = kzalloc(sizeof(*ctx), GFP_KERNEL);
> +	if (!ctx)
> +		return -ENOMEM;
> +
> +	v4l2_fh_init(&ctx->fh, video_devdata(file));
> +	file->private_data = &ctx->fh;
> +	v4l2_fh_add(&ctx->fh);
> +	ctx->priv = priv;
> +
> +	ctx->fh.m2m_ctx = v4l2_m2m_ctx_init(priv->m2m_dev, ctx,
> +					    &ipu_mem2mem_vdic_queue_init);
> +	if (IS_ERR(ctx->fh.m2m_ctx)) {
> +		ret = PTR_ERR(ctx->fh.m2m_ctx);
> +		goto err_ctx;
> +	}
> +
> +	dev_dbg(priv->dev, "Created instance %p, m2m_ctx: %p\n",
> +		ctx, ctx->fh.m2m_ctx);
> +
> +	return 0;
> +
> +err_ctx:
> +	v4l2_fh_del(&ctx->fh);
> +	v4l2_fh_exit(&ctx->fh);
> +	kfree(ctx);
> +	return ret;
> +}
> +
> +static int ipu_mem2mem_vdic_release(struct file *file)
> +{
> +	struct ipu_mem2mem_vdic_priv *priv = video_drvdata(file);
> +	struct ipu_mem2mem_vdic_ctx *ctx = fh_to_ctx(file->private_data);
> +
> +	dev_dbg(priv->dev, "Releasing instance %p\n", ctx);
> +
> +	v4l2_m2m_ctx_release(ctx->fh.m2m_ctx);
> +	v4l2_fh_del(&ctx->fh);
> +	v4l2_fh_exit(&ctx->fh);
> +	kfree(ctx);
> +
> +	return 0;
> +}
> +
> +static const struct v4l2_file_operations mem2mem_fops = {
> +	.owner		= THIS_MODULE,
> +	.open		= ipu_mem2mem_vdic_open,
> +	.release	= ipu_mem2mem_vdic_release,
> +	.poll		= v4l2_m2m_fop_poll,
> +	.unlocked_ioctl	= video_ioctl2,
> +	.mmap		= v4l2_m2m_fop_mmap,
> +};
> +
> +static struct v4l2_m2m_ops m2m_ops = {
> +	.device_run	= ipu_mem2mem_vdic_device_run,
> +};
> +
> +static void ipu_mem2mem_vdic_device_release(struct video_device *vdev)
> +{
> +	struct ipu_mem2mem_vdic_priv *priv = video_get_drvdata(vdev);
> +
> +	v4l2_m2m_release(priv->m2m_dev);
> +	video_device_release(vdev);
> +	kfree(priv);
> +}
> +
> +static const struct video_device mem2mem_template = {
> +	.name		= "ipu_vdic",
> +	.fops		= &mem2mem_fops,
> +	.ioctl_ops	= &mem2mem_ioctl_ops,
> +	.minor		= -1,
> +	.release	= ipu_mem2mem_vdic_device_release,
> +	.vfl_dir	= VFL_DIR_M2M,
> +	.tvnorms	= V4L2_STD_NTSC | V4L2_STD_PAL | V4L2_STD_SECAM,
> +	.device_caps	= V4L2_CAP_VIDEO_M2M | V4L2_CAP_STREAMING,
> +};
> +
> +static int ipu_mem2mem_vdic_get_ipu_resources(struct ipu_mem2mem_vdic_priv *priv,
> +					      struct video_device *vfd)
> +{
> +	char *nfbname, *eofname;
> +	int ret;
> +
> +	nfbname = devm_kasprintf(priv->dev, GFP_KERNEL, "%s_nfb4eof:%u",
> +				 vfd->name, priv->ipu_id);
> +	if (!nfbname)
> +		return -ENOMEM;
> +
> +	eofname = devm_kasprintf(priv->dev, GFP_KERNEL, "%s_eof:%u",
> +				 vfd->name, priv->ipu_id);
> +	if (!eofname)
> +		return -ENOMEM;
> +
> +	priv->vdi = ipu_vdi_get(priv->ipu_dev);
> +	if (IS_ERR(priv->vdi)) {
> +		ret = PTR_ERR(priv->vdi);
> +		goto err_vdi;
> +	}
> +
> +	priv->ic = ipu_ic_get(priv->ipu_dev, IC_TASK_VIEWFINDER);
> +	if (IS_ERR(priv->ic)) {
> +		ret = PTR_ERR(priv->ic);
> +		goto err_ic;
> +	}
> +
> +	priv->vdi_in_ch_p = ipu_idmac_get(priv->ipu_dev,
> +					  IPUV3_CHANNEL_MEM_VDI_PREV);
> +	if (IS_ERR(priv->vdi_in_ch_p)) {
> +		ret = PTR_ERR(priv->vdi_in_ch_p);
> +		goto err_prev;
> +	}
> +
> +	priv->vdi_in_ch = ipu_idmac_get(priv->ipu_dev,
> +					IPUV3_CHANNEL_MEM_VDI_CUR);
> +	if (IS_ERR(priv->vdi_in_ch)) {
> +		ret = PTR_ERR(priv->vdi_in_ch);
> +		goto err_curr;
> +	}
> +
> +	priv->vdi_in_ch_n = ipu_idmac_get(priv->ipu_dev,
> +					  IPUV3_CHANNEL_MEM_VDI_NEXT);
> +	if (IS_ERR(priv->vdi_in_ch_n)) {
> +		ret = PTR_ERR(priv->vdi_in_ch_n);
> +		goto err_next;
> +	}
> +
> +	priv->vdi_out_ch = ipu_idmac_get(priv->ipu_dev,
> +					 IPUV3_CHANNEL_IC_PRP_VF_MEM);
> +	if (IS_ERR(priv->vdi_out_ch)) {
> +		ret = PTR_ERR(priv->vdi_out_ch);
> +		goto err_out;
> +	}
> +
> +	priv->nfb4eof_irq = ipu_idmac_channel_irq(priv->ipu_dev,
> +						  priv->vdi_out_ch,
> +						  IPU_IRQ_NFB4EOF);
> +	ret = devm_request_irq(priv->dev, priv->nfb4eof_irq,
> +			       ipu_mem2mem_vdic_nfb4eof_interrupt, 0,
> +			       nfbname, priv);
> +	if (ret)
> +		goto err_irq_eof;
> +
> +	priv->eof_irq = ipu_idmac_channel_irq(priv->ipu_dev,
> +					      priv->vdi_out_ch,
> +					      IPU_IRQ_EOF);
> +	ret = devm_request_irq(priv->dev, priv->eof_irq,
> +			       ipu_mem2mem_vdic_eof_interrupt, 0,
> +			       eofname, priv);
> +	if (ret)
> +		goto err_irq_eof;
> +
> +	/*
> +	 * Enable PRG, without PRG clock enabled (CCGR6:prg_clk_enable[0]
> +	 * and CCGR6:prg_clk_enable[1]), the VDI does not produce any
> +	 * interrupts at all.
> +	 */
> +	if (ipu_prg_present(priv->ipu_dev))
> +		ipu_prg_enable(priv->ipu_dev);
> +
> +	return 0;
> +
> +err_irq_eof:
> +	ipu_idmac_put(priv->vdi_out_ch);
> +err_out:
> +	ipu_idmac_put(priv->vdi_in_ch_n);
> +err_next:
> +	ipu_idmac_put(priv->vdi_in_ch);
> +err_curr:
> +	ipu_idmac_put(priv->vdi_in_ch_p);
> +err_prev:
> +	ipu_ic_put(priv->ic);
> +err_ic:
> +	ipu_vdi_put(priv->vdi);
> +err_vdi:
> +	return ret;
> +}
> +
> +static void ipu_mem2mem_vdic_put_ipu_resources(struct ipu_mem2mem_vdic_priv *priv)
> +{
> +	ipu_idmac_put(priv->vdi_out_ch);
> +	ipu_idmac_put(priv->vdi_in_ch_n);
> +	ipu_idmac_put(priv->vdi_in_ch);
> +	ipu_idmac_put(priv->vdi_in_ch_p);
> +	ipu_ic_put(priv->ic);
> +	ipu_vdi_put(priv->vdi);
> +}
> +
> +int imx_media_mem2mem_vdic_register(struct imx_media_video_dev *vdev)
> +{
> +	struct ipu_mem2mem_vdic_priv *priv = to_mem2mem_priv(vdev);
> +	struct video_device *vfd = vdev->vfd;
> +	int ret;
> +
> +	vfd->v4l2_dev = &priv->md->v4l2_dev;
> +
> +	ret = ipu_mem2mem_vdic_get_ipu_resources(priv, vfd);
> +	if (ret) {
> +		v4l2_err(vfd->v4l2_dev, "Failed to get VDIC resources (%d)\n", ret);
> +		return ret;
> +	}
> +
> +	ret = video_register_device(vfd, VFL_TYPE_VIDEO, -1);
> +	if (ret) {
> +		v4l2_err(vfd->v4l2_dev, "Failed to register video device\n");
> +		goto err_register;
> +	}
> +
> +	v4l2_info(vfd->v4l2_dev, "Registered %s as /dev/%s\n", vfd->name,
> +		  video_device_node_name(vfd));
> +
> +	return 0;
> +
> +err_register:
> +	ipu_mem2mem_vdic_put_ipu_resources(priv);
> +	return ret;
> +}
> +
> +void imx_media_mem2mem_vdic_unregister(struct imx_media_video_dev *vdev)
> +{
> +	struct ipu_mem2mem_vdic_priv *priv = to_mem2mem_priv(vdev);
> +	struct video_device *vfd = priv->vdev.vfd;
> +
> +	video_unregister_device(vfd);
> +
> +	ipu_mem2mem_vdic_put_ipu_resources(priv);
> +}
> +
> +struct imx_media_video_dev *
> +imx_media_mem2mem_vdic_init(struct imx_media_dev *md, int ipu_id)
> +{
> +	struct ipu_mem2mem_vdic_priv *priv;
> +	struct video_device *vfd;
> +	int ret;
> +
> +	priv = kzalloc(sizeof(*priv), GFP_KERNEL);
> +	if (!priv)
> +		return ERR_PTR(-ENOMEM);
> +
> +	priv->md = md;
> +	priv->ipu_id = ipu_id;
> +	priv->ipu_dev = md->ipu[ipu_id];
> +	priv->dev = md->md.dev;
> +
> +	mutex_init(&priv->mutex);
> +
> +	vfd = video_device_alloc();
> +	if (!vfd) {
> +		ret = -ENOMEM;
> +		goto err_vfd;
> +	}
> +
> +	*vfd = mem2mem_template;
> +	vfd->lock = &priv->mutex;
> +	priv->vdev.vfd = vfd;
> +
> +	INIT_LIST_HEAD(&priv->vdev.list);
> +	spin_lock_init(&priv->irqlock);
> +	atomic_set(&priv->stream_count, 0);
> +
> +	video_set_drvdata(vfd, priv);
> +
> +	priv->m2m_dev = v4l2_m2m_init(&m2m_ops);
> +	if (IS_ERR(priv->m2m_dev)) {
> +		ret = PTR_ERR(priv->m2m_dev);
> +		v4l2_err(&md->v4l2_dev, "Failed to init mem2mem device: %d\n",
> +			 ret);
> +		goto err_m2m;
> +	}
> +
> +	/* Reset formats */
> +	priv->fmt[V4L2_M2M_SRC] = ipu_mem2mem_vdic_default;
> +	priv->fmt[V4L2_M2M_SRC].pixelformat = V4L2_PIX_FMT_YUV420;
> +	priv->fmt[V4L2_M2M_SRC].field = V4L2_FIELD_SEQ_TB;
> +	priv->fmt[V4L2_M2M_SRC].bytesperline = DEFAULT_WIDTH;
> +	priv->fmt[V4L2_M2M_SRC].sizeimage = DEFAULT_WIDTH * DEFAULT_HEIGHT * 3 / 2;
> +
> +	priv->fmt[V4L2_M2M_DST] = ipu_mem2mem_vdic_default;
> +	priv->fmt[V4L2_M2M_DST].pixelformat = V4L2_PIX_FMT_RGB565;
> +	priv->fmt[V4L2_M2M_DST].field = V4L2_FIELD_NONE;
> +	priv->fmt[V4L2_M2M_DST].bytesperline = DEFAULT_WIDTH * 2;
> +	priv->fmt[V4L2_M2M_DST].sizeimage = DEFAULT_WIDTH * DEFAULT_HEIGHT * 2;
> +
> +	return &priv->vdev;
> +
> +err_m2m:
> +	video_device_release(vfd);
> +	video_set_drvdata(vfd, NULL);
> +err_vfd:
> +	kfree(priv);
> +	return ERR_PTR(ret);
> +}
> +
> +void imx_media_mem2mem_vdic_uninit(struct imx_media_video_dev *vdev)
> +{
> +	struct ipu_mem2mem_vdic_priv *priv = to_mem2mem_priv(vdev);
> +	struct video_device *vfd = priv->vdev.vfd;
> +
> +	video_device_release(vfd);
> +	video_set_drvdata(vfd, NULL);
> +	kfree(priv);
> +}
> +
> +MODULE_DESCRIPTION("i.MX VDIC mem2mem de-interlace driver");
> +MODULE_AUTHOR("Marek Vasut <marex@denx.de>");
> +MODULE_LICENSE("GPL");
> diff --git a/drivers/staging/media/imx/imx-media.h b/drivers/staging/media/imx/imx-media.h
> index f095d9134fee4..9f2388e306727 100644
> --- a/drivers/staging/media/imx/imx-media.h
> +++ b/drivers/staging/media/imx/imx-media.h
> @@ -162,6 +162,9 @@ struct imx_media_dev {
>  	/* IC scaler/CSC mem2mem video device */
>  	struct imx_media_video_dev *m2m_vdev;
>  
> +	/* VDIC mem2mem video device */
> +	struct imx_media_video_dev *m2m_vdic[2];
> +
>  	/* the IPU internal subdev's registered synchronously */
>  	struct v4l2_subdev *sync_sd[2][NUM_IPU_SUBDEVS];
>  };
> @@ -284,6 +287,13 @@ imx_media_csc_scaler_device_init(struct imx_media_dev *dev);
>  int imx_media_csc_scaler_device_register(struct imx_media_video_dev *vdev);
>  void imx_media_csc_scaler_device_unregister(struct imx_media_video_dev *vdev);
>  
> +/* imx-media-mem2mem-vdic.c */
> +struct imx_media_video_dev *
> +imx_media_mem2mem_vdic_init(struct imx_media_dev *dev, int ipu_id);
> +void imx_media_mem2mem_vdic_uninit(struct imx_media_video_dev *vdev);
> +int imx_media_mem2mem_vdic_register(struct imx_media_video_dev *vdev);
> +void imx_media_mem2mem_vdic_unregister(struct imx_media_video_dev *vdev);
> +
>  /* subdev group ids */
>  #define IMX_MEDIA_GRP_ID_CSI2          BIT(8)
>  #define IMX_MEDIA_GRP_ID_IPU_CSI_BIT   10
Dan Carpenter July 24, 2024, 4:16 p.m. UTC | #2
On Wed, Jul 24, 2024 at 02:19:38AM +0200, Marek Vasut wrote:
> diff --git a/drivers/staging/media/imx/imx-media-dev.c b/drivers/staging/media/imx/imx-media-dev.c
> index be54dca11465d..a841fdb4c2394 100644
> --- a/drivers/staging/media/imx/imx-media-dev.c
> +++ b/drivers/staging/media/imx/imx-media-dev.c
> @@ -57,7 +57,52 @@ static int imx6_media_probe_complete(struct v4l2_async_notifier *notifier)
>  		goto unlock;
>  	}
>  
> +	imxmd->m2m_vdic[0] = imx_media_mem2mem_vdic_init(imxmd, 0);
> +	if (IS_ERR(imxmd->m2m_vdic[0])) {
> +		ret = PTR_ERR(imxmd->m2m_vdic[0]);
> +		imxmd->m2m_vdic[0] = NULL;
> +		goto unlock;
> +	}
> +
> +	/* MX6S/DL has one IPUv3, init second VDI only on MX6Q/QP */
> +	if (imxmd->ipu[1]) {
> +		imxmd->m2m_vdic[1] = imx_media_mem2mem_vdic_init(imxmd, 1);
> +		if (IS_ERR(imxmd->m2m_vdic[1])) {
> +			ret = PTR_ERR(imxmd->m2m_vdic[1]);
> +			imxmd->m2m_vdic[1] = NULL;
> +			goto uninit_vdi0;
> +		}
> +	}
> +
>  	ret = imx_media_csc_scaler_device_register(imxmd->m2m_vdev);
> +	if (ret)
> +		goto uninit_vdi1;
> +
> +	ret = imx_media_mem2mem_vdic_register(imxmd->m2m_vdic[0]);
> +	if (ret)
> +		goto unreg_csc;
> +
> +	/* MX6S/DL has one IPUv3, init second VDI only on MX6Q/QP */
> +	if (imxmd->ipu[1]) {
> +		ret = imx_media_mem2mem_vdic_register(imxmd->m2m_vdic[1]);
> +		if (ret)
> +			goto unreg_vdic;
> +	}
> +
> +	mutex_unlock(&imxmd->mutex);
> +	return ret;

Since it looks like you're going to do another version of this, could
you change this to return 0;

> +
> +unreg_vdic:
> +	imx_media_mem2mem_vdic_unregister(imxmd->m2m_vdic[0]);
> +	imxmd->m2m_vdic[0] = NULL;
> +unreg_csc:
> +	imx_media_csc_scaler_device_unregister(imxmd->m2m_vdev);
> +	imxmd->m2m_vdev = NULL;
> +uninit_vdi1:
> +	if (imxmd->ipu[1])
> +		imx_media_mem2mem_vdic_uninit(imxmd->m2m_vdic[1]);
> +uninit_vdi0:
> +	imx_media_mem2mem_vdic_uninit(imxmd->m2m_vdic[0]);
>  unlock:
>  	mutex_unlock(&imxmd->mutex);
>  	return ret;

[ snip ]

> +static int ipu_mem2mem_vdic_querycap(struct file *file, void *priv,
> +				     struct v4l2_capability *cap)
> +{
> +	strscpy(cap->driver, "imx-m2m-vdic", sizeof(cap->driver));
> +	strscpy(cap->card, "imx-m2m-vdic", sizeof(cap->card));
> +	strscpy(cap->bus_info, "platform:imx-m2m-vdic", sizeof(cap->bus_info));

These days strscpy() is a magic function where the third parameter is
optional.

	strscpy(cap->driver, "imx-m2m-vdic");
	strscpy(cap->card, "imx-m2m-vdic");
	strscpy(cap->bus_info, "platform:imx-m2m-vdic");

Shazaaam!  Magic!

> +	cap->device_caps = V4L2_CAP_VIDEO_M2M | V4L2_CAP_STREAMING;
> +	cap->capabilities = cap->device_caps | V4L2_CAP_DEVICE_CAPS;
> +
> +	return 0;
> +}

regards,
dan carpenter
Marek Vasut July 29, 2024, 2:16 a.m. UTC | #3
On 7/24/24 6:08 PM, Nicolas Dufresne wrote:
> Hi Marek,

Hi,

> On Wednesday, 24 July 2024 at 02:19 +0200, Marek Vasut wrote:
>> Introduce dedicated memory-to-memory IPUv3 VDI deinterlacer driver.
>> Currently the IPUv3 can operate VDI in DIRECT mode, from sensor to
>> memory. This only works for a single stream, that is, one input from
>> one camera is deinterlaced on the fly with a helper buffer in DRAM
>> and the result is written into memory.
>>
>> The i.MX6Q/QP does support up to four analog cameras via two IPUv3
>> instances, each containing one VDI deinterlacer block. In order to
>> deinterlace all four streams from all four analog cameras live, it
>> is necessary to operate VDI in INDIRECT mode, where the interlaced
>> streams are written to buffers in memory, and then deinterlaced in
>> memory using VDI in INDIRECT memory-to-memory mode.
> 
> Just a quick design question. Is it possible to chain the deinterlacer and the
> csc-scaler?

I think you could do that.

> If so, it would be much more efficient if all this could be
> combined into the existing m2m driver, since you could save a memory roundtrip
> when needing to deinterlace, change the colorspace and possibly scale too.

The existing PRP/IC driver is similar to what this driver does, yes, but 
it uses a different DMA path, I believe it is IDMAC->PRP->IC->IDMAC.
This driver uses IDMAC->VDI->IC->IDMAC. I am not convinced mixing the
two paths into a single driver would be beneficial, but I am reasonably 
sure it would be very convoluted. Instead, this driver could be extended 
to do deinterlacing and scaling using the IC if that was needed. I think 
that would be the cleaner approach.
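
For reference, the IC task setup in this patch already takes separate input
and output sizes, so IC-based scaling would plausibly reduce to letting the
capture format dimensions differ from the input ones. A sketch, based on the
existing call in ipu_mem2mem_vdic_setup_hardware() (an untested assumption,
not something this patch implements):

	/* Hypothetical: the IC resizes when input and output sizes differ */
	ipu_ic_task_init(priv->ic, &csc,
			 infmt->width, infmt->height,	 /* e.g. 720x576  */
			 outfmt->width, outfmt->height); /* e.g. 1280x720 */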
Marek Vasut July 29, 2024, 2:19 a.m. UTC | #4
On 7/24/24 6:16 PM, Dan Carpenter wrote:
> On Wed, Jul 24, 2024 at 02:19:38AM +0200, Marek Vasut wrote:
>> diff --git a/drivers/staging/media/imx/imx-media-dev.c b/drivers/staging/media/imx/imx-media-dev.c
>> index be54dca11465d..a841fdb4c2394 100644
>> --- a/drivers/staging/media/imx/imx-media-dev.c
>> +++ b/drivers/staging/media/imx/imx-media-dev.c
>> @@ -57,7 +57,52 @@ static int imx6_media_probe_complete(struct v4l2_async_notifier *notifier)
>>   		goto unlock;
>>   	}
>>   
>> +	imxmd->m2m_vdic[0] = imx_media_mem2mem_vdic_init(imxmd, 0);
>> +	if (IS_ERR(imxmd->m2m_vdic[0])) {
>> +		ret = PTR_ERR(imxmd->m2m_vdic[0]);
>> +		imxmd->m2m_vdic[0] = NULL;
>> +		goto unlock;
>> +	}
>> +
>> +	/* MX6S/DL has one IPUv3, init second VDI only on MX6Q/QP */
>> +	if (imxmd->ipu[1]) {
>> +		imxmd->m2m_vdic[1] = imx_media_mem2mem_vdic_init(imxmd, 1);
>> +		if (IS_ERR(imxmd->m2m_vdic[1])) {
>> +			ret = PTR_ERR(imxmd->m2m_vdic[1]);
>> +			imxmd->m2m_vdic[1] = NULL;
>> +			goto uninit_vdi0;
>> +		}
>> +	}
>> +
>>   	ret = imx_media_csc_scaler_device_register(imxmd->m2m_vdev);
>> +	if (ret)
>> +		goto uninit_vdi1;
>> +
>> +	ret = imx_media_mem2mem_vdic_register(imxmd->m2m_vdic[0]);
>> +	if (ret)
>> +		goto unreg_csc;
>> +
>> +	/* MX6S/DL has one IPUv3, init second VDI only on MX6Q/QP */
>> +	if (imxmd->ipu[1]) {
>> +		ret = imx_media_mem2mem_vdic_register(imxmd->m2m_vdic[1]);
>> +		if (ret)
>> +			goto unreg_vdic;
>> +	}
>> +
>> +	mutex_unlock(&imxmd->mutex);
>> +	return ret;
> 
> Since it looks like you're going to do another version of this, could
> you change this to return 0;

Fixed up both for V3, thanks.
Nicolas Dufresne July 30, 2024, 4:05 p.m. UTC | #5
On Monday, 29 July 2024 at 04:16 +0200, Marek Vasut wrote:
> On 7/24/24 6:08 PM, Nicolas Dufresne wrote:
> > Hi Marek,
> 
> Hi,
> 
> > On Wednesday, 24 July 2024 at 02:19 +0200, Marek Vasut wrote:
> > > Introduce dedicated memory-to-memory IPUv3 VDI deinterlacer driver.
> > > Currently the IPUv3 can operate VDI in DIRECT mode, from sensor to
> > > memory. This only works for a single stream, that is, one input from
> > > one camera is deinterlaced on the fly with a helper buffer in DRAM
> > > and the result is written into memory.
> > > 
> > > The i.MX6Q/QP does support up to four analog cameras via two IPUv3
> > > instances, each containing one VDI deinterlacer block. In order to
> > > deinterlace all four streams from all four analog cameras live, it
> > > is necessary to operate VDI in INDIRECT mode, where the interlaced
> > > streams are written to buffers in memory, and then deinterlaced in
> > > memory using VDI in INDIRECT memory-to-memory mode.
> > 
> > Just a quick design question. Is it possible to chain the deinterlacer and the
> > csc-scaler?
> 
> I think you could do that.
> 
> > If so, it would be much more efficient if all this could be
> > combined into the existing m2m driver, since you could save a memory roundtrip
> > when needing to deinterlace, change the colorspace and possibly scale too.
> 
> The existing PRP/IC driver is similar to what this driver does, yes, but 
> it uses a different DMA path , I believe it is IDMAC->PRP->IC->IDMAC . 
> This driver uses IDMAC->VDI->IC->IDMAC . I am not convinced mixing the 
> two paths into a single driver would be beneficial, but I am reasonably 
> sure it would be very convoluted. Instead, this driver could be extended 
> to do deinterlacing and scaling using the IC if that was needed. I think 
> that would be the cleaner approach.

Note that I only meant to ask if there was a path to combine
CSC/Scaling/Deinterlacing without a memory roundtrip. If a roundtrip is needed
anyway, I would rather make separate video nodes, and leave it to userspace to
deal with. Though, if we can avoid it, a combined driver should be highly
beneficial.

cheers,
Nicolas
Philipp Zabel Sept. 6, 2024, 9:01 a.m. UTC | #6
Hi Marek,

On Mi, 2024-07-24 at 02:19 +0200, Marek Vasut wrote:
> Introduce dedicated memory-to-memory IPUv3 VDI deinterlacer driver.
> Currently the IPUv3 can operate VDI in DIRECT mode, from sensor to
> memory. This only works for a single stream, that is, one input from
> one camera is deinterlaced on the fly with a helper buffer in DRAM
> and the result is written into memory.
> 
> The i.MX6Q/QP does support up to four analog cameras via two IPUv3
> instances, each containing one VDI deinterlacer block. In order to
> deinterlace all four streams from all four analog cameras live, it
> is necessary to operate VDI in INDIRECT mode, where the interlaced
> streams are written to buffers in memory, and then deinterlaced in
> memory using VDI in INDIRECT memory-to-memory mode.
> 
> This driver also makes use of the IDMAC->VDI->IC->IDMAC data path
> to provide pixel format conversion from input YUV formats to either
> YUV or RGB output formats. The latter is useful in case the data
> are imported into the GPU, which on this platform cannot directly
> sample YUV buffers.
> 
> This is derived from previous work by Steve Longerbeam and from the
> IPUv3 CSC Scaler mem2mem driver.
> 
> Signed-off-by: Marek Vasut <marex@denx.de>
> ---
> Cc: Daniel Vetter <daniel@ffwll.ch>
> Cc: David Airlie <airlied@gmail.com>
> Cc: Fabio Estevam <festevam@gmail.com>
> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
> Cc: Helge Deller <deller@gmx.de>
> Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
> Cc: Pengutronix Kernel Team <kernel@pengutronix.de>
> Cc: Philipp Zabel <p.zabel@pengutronix.de>
> Cc: Sascha Hauer <s.hauer@pengutronix.de>
> Cc: Shawn Guo <shawnguo@kernel.org>
> Cc: Steve Longerbeam <slongerbeam@gmail.com>
> Cc: dri-devel@lists.freedesktop.org
> Cc: imx@lists.linux.dev
> Cc: linux-arm-kernel@lists.infradead.org
> Cc: linux-fbdev@vger.kernel.org
> Cc: linux-media@vger.kernel.org
> Cc: linux-staging@lists.linux.dev
> ---
> V2: - Add complementary imx_media_mem2mem_vdic_uninit()
>     - Drop uninitialized ret from ipu_mem2mem_vdic_device_run()
>     - Drop duplicate nbuffers assignment in ipu_mem2mem_vdic_queue_setup()
>     - Fix %u formatting string in ipu_mem2mem_vdic_queue_setup()
>     - Drop devm_*free from ipu_mem2mem_vdic_get_ipu_resources() fail path
>       and ipu_mem2mem_vdic_put_ipu_resources()
>     - Add missing video_device_release()
> ---
>  drivers/staging/media/imx/Makefile            |   2 +-
>  drivers/staging/media/imx/imx-media-dev.c     |  55 +
>  .../media/imx/imx-media-mem2mem-vdic.c        | 997 ++++++++++++++++++
>  drivers/staging/media/imx/imx-media.h         |  10 +
>  4 files changed, 1063 insertions(+), 1 deletion(-)
>  create mode 100644 drivers/staging/media/imx/imx-media-mem2mem-vdic.c
> 
> diff --git a/drivers/staging/media/imx/Makefile b/drivers/staging/media/imx/Makefile
> index 330e0825f506b..0cad87123b590 100644
> --- a/drivers/staging/media/imx/Makefile
> +++ b/drivers/staging/media/imx/Makefile
> @@ -4,7 +4,7 @@ imx-media-common-objs := imx-media-capture.o imx-media-dev-common.o \
>  
>  imx6-media-objs := imx-media-dev.o imx-media-internal-sd.o \
>  	imx-ic-common.o imx-ic-prp.o imx-ic-prpencvf.o imx-media-vdic.o \
> -	imx-media-csc-scaler.o
> +	imx-media-mem2mem-vdic.o imx-media-csc-scaler.o
>  
>  imx6-media-csi-objs := imx-media-csi.o imx-media-fim.o
>  
> diff --git a/drivers/staging/media/imx/imx-media-dev.c b/drivers/staging/media/imx/imx-media-dev.c
> index be54dca11465d..a841fdb4c2394 100644
> --- a/drivers/staging/media/imx/imx-media-dev.c
> +++ b/drivers/staging/media/imx/imx-media-dev.c
> @@ -57,7 +57,52 @@ static int imx6_media_probe_complete(struct v4l2_async_notifier *notifier)
>  		goto unlock;
>  	}
>  
> +	imxmd->m2m_vdic[0] = imx_media_mem2mem_vdic_init(imxmd, 0);
> +	if (IS_ERR(imxmd->m2m_vdic[0])) {
> +		ret = PTR_ERR(imxmd->m2m_vdic[0]);
> +		imxmd->m2m_vdic[0] = NULL;
> +		goto unlock;
> +	}
> +
> +	/* MX6S/DL has one IPUv3, init second VDI only on MX6Q/QP */
> +	if (imxmd->ipu[1]) {
> +		imxmd->m2m_vdic[1] = imx_media_mem2mem_vdic_init(imxmd, 1);
> +		if (IS_ERR(imxmd->m2m_vdic[1])) {
> +			ret = PTR_ERR(imxmd->m2m_vdic[1]);
> +			imxmd->m2m_vdic[1] = NULL;
> +			goto uninit_vdi0;
> +		}
> +	}

Instead of presenting two devices to userspace, it would be better to
have a single video device that can distribute work to both IPUs.
To be fair, we never implemented that for the CSC/scaler mem2mem device
either.
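
One possible shape, sketched here with purely hypothetical names (only to
illustrate the dispatch idea, not a concrete API proposal):

	/* Hypothetical: one video node, per-IPU engines, pick the idler */
	struct vdic_engine {
		struct ipu_soc	*ipu;
		atomic_t	jobs;
		bool		present;
	};

	static struct vdic_engine *vdic_pick_engine(struct vdic_engine eng[2])
	{
		if (eng[1].present &&
		    atomic_read(&eng[1].jobs) < atomic_read(&eng[0].jobs))
			return &eng[1];
		return &eng[0];
	}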

> +
>  	ret = imx_media_csc_scaler_device_register(imxmd->m2m_vdev);
> +	if (ret)
> +		goto uninit_vdi1;
> +
> +	ret = imx_media_mem2mem_vdic_register(imxmd->m2m_vdic[0]);
> +	if (ret)
> +		goto unreg_csc;
> +
> +	/* MX6S/DL has one IPUv3, init second VDI only on MX6Q/QP */
> +	if (imxmd->ipu[1]) {
> +		ret = imx_media_mem2mem_vdic_register(imxmd->m2m_vdic[1]);
> +		if (ret)
> +			goto unreg_vdic;
> +	}
> +
> +	mutex_unlock(&imxmd->mutex);
> +	return ret;
> +
> +unreg_vdic:
> +	imx_media_mem2mem_vdic_unregister(imxmd->m2m_vdic[0]);
> +	imxmd->m2m_vdic[0] = NULL;
> +unreg_csc:
> +	imx_media_csc_scaler_device_unregister(imxmd->m2m_vdev);
> +	imxmd->m2m_vdev = NULL;
> +uninit_vdi1:
> +	if (imxmd->ipu[1])
> +		imx_media_mem2mem_vdic_uninit(imxmd->m2m_vdic[1]);
> +uninit_vdi0:
> +	imx_media_mem2mem_vdic_uninit(imxmd->m2m_vdic[0]);
>  unlock:
>  	mutex_unlock(&imxmd->mutex);
>  	return ret;
> @@ -108,6 +153,16 @@ static void imx_media_remove(struct platform_device *pdev)
>  
>  	v4l2_info(&imxmd->v4l2_dev, "Removing imx-media\n");
>  
> +	if (imxmd->m2m_vdic[1]) {	/* MX6Q/QP only */
> +		imx_media_mem2mem_vdic_unregister(imxmd->m2m_vdic[1]);
> +		imxmd->m2m_vdic[1] = NULL;
> +	}
> +
> +	if (imxmd->m2m_vdic[0]) {
> +		imx_media_mem2mem_vdic_unregister(imxmd->m2m_vdic[0]);
> +		imxmd->m2m_vdic[0] = NULL;
> +	}
> +
>  	if (imxmd->m2m_vdev) {
>  		imx_media_csc_scaler_device_unregister(imxmd->m2m_vdev);
>  		imxmd->m2m_vdev = NULL;
> diff --git a/drivers/staging/media/imx/imx-media-mem2mem-vdic.c b/drivers/staging/media/imx/imx-media-mem2mem-vdic.c
> new file mode 100644
> index 0000000000000..71c6c023d2bf8
> --- /dev/null
> +++ b/drivers/staging/media/imx/imx-media-mem2mem-vdic.c
> @@ -0,0 +1,997 @@
> +// SPDX-License-Identifier: GPL-2.0-or-later
> +/*
> + * i.MX VDIC mem2mem de-interlace driver
> + *
> + * Copyright (c) 2024 Marek Vasut <marex@denx.de>
> + *
> + * Based on previous VDIC mem2mem work by Steve Longerbeam that is:
> + * Copyright (c) 2018 Mentor Graphics Inc.
> + */
> +
> +#include <linux/delay.h>
> +#include <linux/fs.h>
> +#include <linux/module.h>
> +#include <linux/sched.h>
> +#include <linux/slab.h>
> +#include <linux/version.h>
> +
> +#include <media/media-device.h>
> +#include <media/v4l2-ctrls.h>
> +#include <media/v4l2-device.h>
> +#include <media/v4l2-event.h>
> +#include <media/v4l2-ioctl.h>
> +#include <media/v4l2-mem2mem.h>
> +#include <media/videobuf2-dma-contig.h>
> +
> +#include "imx-media.h"
> +
> +#define fh_to_ctx(__fh)	container_of(__fh, struct ipu_mem2mem_vdic_ctx, fh)
> +
> +#define to_mem2mem_priv(v) container_of(v, struct ipu_mem2mem_vdic_priv, vdev)

These could be inline functions for added type safety.
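
For illustration, the inline-function form would look something like this
(same semantics as the macros above):

	static inline struct ipu_mem2mem_vdic_ctx *fh_to_ctx(struct v4l2_fh *fh)
	{
		return container_of(fh, struct ipu_mem2mem_vdic_ctx, fh);
	}

	static inline struct ipu_mem2mem_vdic_priv *
	to_mem2mem_priv(struct imx_media_video_dev *vdev)
	{
		return container_of(vdev, struct ipu_mem2mem_vdic_priv, vdev);
	}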

> +
> +enum {
> +	V4L2_M2M_SRC = 0,
> +	V4L2_M2M_DST = 1,
> +};
> +
> +struct ipu_mem2mem_vdic_ctx;
> +
> +struct ipu_mem2mem_vdic_priv {
> +	struct imx_media_video_dev	vdev;
> +	struct imx_media_dev		*md;
> +	struct device			*dev;
> +	struct ipu_soc			*ipu_dev;
> +	int				ipu_id;
> +
> +	struct v4l2_m2m_dev		*m2m_dev;
> +	struct mutex			mutex;		/* mem2mem device mutex */
> +
> +	/* VDI resources */
> +	struct ipu_vdi			*vdi;
> +	struct ipu_ic			*ic;
> +	struct ipuv3_channel		*vdi_in_ch_p;
> +	struct ipuv3_channel		*vdi_in_ch;
> +	struct ipuv3_channel		*vdi_in_ch_n;
> +	struct ipuv3_channel		*vdi_out_ch;
> +	int				eof_irq;
> +	int				nfb4eof_irq;
> +	spinlock_t			irqlock;	/* protect eof_irq handler */
> +
> +	atomic_t			stream_count;
> +
> +	struct ipu_mem2mem_vdic_ctx	*curr_ctx;
> +
> +	struct v4l2_pix_format		fmt[2];
> +};
> +
> +struct ipu_mem2mem_vdic_ctx {
> +	struct ipu_mem2mem_vdic_priv	*priv;
> +	struct v4l2_fh			fh;
> +	unsigned int			sequence;
> +	struct vb2_v4l2_buffer		*prev_buf;
> +	struct vb2_v4l2_buffer		*curr_buf;
> +};
> +
> +static struct v4l2_pix_format *
> +ipu_mem2mem_vdic_get_format(struct ipu_mem2mem_vdic_priv *priv,
> +			    enum v4l2_buf_type type)
> +{
> +	return &priv->fmt[V4L2_TYPE_IS_OUTPUT(type) ? V4L2_M2M_SRC : V4L2_M2M_DST];
> +}
> +
> +static bool ipu_mem2mem_vdic_format_is_yuv420(const u32 pixelformat)
> +{
> +	/* All 4:2:0 subsampled formats supported by this hardware */
> +	return pixelformat == V4L2_PIX_FMT_YUV420 ||
> +	       pixelformat == V4L2_PIX_FMT_YVU420 ||
> +	       pixelformat == V4L2_PIX_FMT_NV12;
> +}
> +
> +static bool ipu_mem2mem_vdic_format_is_yuv422(const u32 pixelformat)
> +{
> +	/* All 4:2:2 subsampled formats supported by this hardware */
> +	return pixelformat == V4L2_PIX_FMT_UYVY ||
> +	       pixelformat == V4L2_PIX_FMT_YUYV ||
> +	       pixelformat == V4L2_PIX_FMT_YUV422P ||
> +	       pixelformat == V4L2_PIX_FMT_NV16;
> +}
> +
> +static bool ipu_mem2mem_vdic_format_is_yuv(const u32 pixelformat)
> +{
> +	return ipu_mem2mem_vdic_format_is_yuv420(pixelformat) ||
> +	       ipu_mem2mem_vdic_format_is_yuv422(pixelformat);
> +}
> +
> +static bool ipu_mem2mem_vdic_format_is_rgb16(const u32 pixelformat)
> +{
> +	/* All 16-bit RGB formats supported by this hardware */
> +	return pixelformat == V4L2_PIX_FMT_RGB565;
> +}
> +
> +static bool ipu_mem2mem_vdic_format_is_rgb24(const u32 pixelformat)
> +{
> +	/* All 24-bit RGB formats supported by this hardware */
> +	return pixelformat == V4L2_PIX_FMT_RGB24 ||
> +	       pixelformat == V4L2_PIX_FMT_BGR24;
> +}
> +
> +static bool ipu_mem2mem_vdic_format_is_rgb32(const u32 pixelformat)
> +{
> +	/* All 32-bit RGB formats supported by this hardware */
> +	return pixelformat == V4L2_PIX_FMT_XRGB32 ||
> +	       pixelformat == V4L2_PIX_FMT_XBGR32 ||
> +	       pixelformat == V4L2_PIX_FMT_BGRX32 ||
> +	       pixelformat == V4L2_PIX_FMT_RGBX32;
> +}
> +
> +/*
> + * mem2mem callbacks
> + */
> +static irqreturn_t ipu_mem2mem_vdic_eof_interrupt(int irq, void *dev_id)
> +{
> +	struct ipu_mem2mem_vdic_priv *priv = dev_id;
> +	struct ipu_mem2mem_vdic_ctx *ctx = priv->curr_ctx;
> +	struct vb2_v4l2_buffer *src_buf, *dst_buf;
> +
> +	spin_lock(&priv->irqlock);
> +
> +	src_buf = v4l2_m2m_src_buf_remove(ctx->fh.m2m_ctx);
> +	dst_buf = v4l2_m2m_dst_buf_remove(ctx->fh.m2m_ctx);
> +
> +	v4l2_m2m_buf_copy_metadata(src_buf, dst_buf, true);
> +
> +	src_buf->sequence = ctx->sequence++;
> +	dst_buf->sequence = src_buf->sequence;
> +
> +	v4l2_m2m_buf_done(src_buf, VB2_BUF_STATE_DONE);
> +	v4l2_m2m_buf_done(dst_buf, VB2_BUF_STATE_DONE);
> +
> +	v4l2_m2m_job_finish(priv->m2m_dev, ctx->fh.m2m_ctx);
> +
> +	spin_unlock(&priv->irqlock);
> +
> +	return IRQ_HANDLED;
> +}
> +
> +static irqreturn_t ipu_mem2mem_vdic_nfb4eof_interrupt(int irq, void *dev_id)
> +{
> +	struct ipu_mem2mem_vdic_priv *priv = dev_id;
> +
> +	/* That is about all we can do about it, report it. */
> +	dev_warn_ratelimited(priv->dev, "NFB4EOF error interrupt occurred\n");
> +
> +	return IRQ_HANDLED;
> +}
> +
> +static void ipu_mem2mem_vdic_device_run(void *_ctx)
> +{
> +	struct ipu_mem2mem_vdic_ctx *ctx = _ctx;
> +	struct ipu_mem2mem_vdic_priv *priv = ctx->priv;
> +	struct vb2_v4l2_buffer *curr_buf, *dst_buf;
> +	dma_addr_t prev_phys, curr_phys, out_phys;
> +	struct v4l2_pix_format *infmt;
> +	u32 phys_offset = 0;
> +	unsigned long flags;
> +
> +	infmt = ipu_mem2mem_vdic_get_format(priv, V4L2_BUF_TYPE_VIDEO_OUTPUT);
> +	if (V4L2_FIELD_IS_SEQUENTIAL(infmt->field))
> +		phys_offset = infmt->sizeimage / 2;
> +	else if (V4L2_FIELD_IS_INTERLACED(infmt->field))
> +		phys_offset = infmt->bytesperline;
> +	else
> +		dev_err(priv->dev, "Invalid field %d\n", infmt->field);
> +
> +	dst_buf = v4l2_m2m_next_dst_buf(ctx->fh.m2m_ctx);
> +	out_phys = vb2_dma_contig_plane_dma_addr(&dst_buf->vb2_buf, 0);
> +
> +	curr_buf = v4l2_m2m_next_src_buf(ctx->fh.m2m_ctx);
> +	if (!curr_buf) {
> +		dev_err(priv->dev, "Not enough buffers\n");
> +		return;
> +	}
> +
> +	spin_lock_irqsave(&priv->irqlock, flags);
> +
> +	if (ctx->curr_buf) {
> +		ctx->prev_buf = ctx->curr_buf;
> +		ctx->curr_buf = curr_buf;
> +	} else {
> +		ctx->prev_buf = curr_buf;
> +		ctx->curr_buf = curr_buf;
> +		dev_warn(priv->dev, "Single-buffer mode, fix your userspace\n");
> +	}
> +
> +	prev_phys = vb2_dma_contig_plane_dma_addr(&ctx->prev_buf->vb2_buf, 0);
> +	curr_phys = vb2_dma_contig_plane_dma_addr(&ctx->curr_buf->vb2_buf, 0);
> +
> +	priv->curr_ctx = ctx;
> +	spin_unlock_irqrestore(&priv->irqlock, flags);
> +
> +	ipu_cpmem_set_buffer(priv->vdi_out_ch,  0, out_phys);
> +	ipu_cpmem_set_buffer(priv->vdi_in_ch_p, 0, prev_phys + phys_offset);
> +	ipu_cpmem_set_buffer(priv->vdi_in_ch,   0, curr_phys);
> +	ipu_cpmem_set_buffer(priv->vdi_in_ch_n, 0, curr_phys + phys_offset);

This always outputs at a frame rate of half the field rate, and only
top fields are ever used as current field, and bottom fields as
previous/next fields, right?

I think it would be good to add a mode that doesn't drop the

	ipu_cpmem_set_buffer(priv->vdi_in_ch_p, 0, prev_phys);
	ipu_cpmem_set_buffer(priv->vdi_in_ch,   0, prev_phys + phys_offset);
	ipu_cpmem_set_buffer(priv->vdi_in_ch_n, 0, curr_phys);

output frames, right from the start.

If we don't start with that supported, I fear userspace will make
assumptions and be surprised when a full rate mode is added later.

> +
> +	/* No double buffering, always pick buffer 0 */
> +	ipu_idmac_select_buffer(priv->vdi_out_ch, 0);
> +	ipu_idmac_select_buffer(priv->vdi_in_ch_p, 0);
> +	ipu_idmac_select_buffer(priv->vdi_in_ch, 0);
> +	ipu_idmac_select_buffer(priv->vdi_in_ch_n, 0);
> +
> +	/* Enable the channels */
> +	ipu_idmac_enable_channel(priv->vdi_out_ch);
> +	ipu_idmac_enable_channel(priv->vdi_in_ch_p);
> +	ipu_idmac_enable_channel(priv->vdi_in_ch);
> +	ipu_idmac_enable_channel(priv->vdi_in_ch_n);
> +}
> +
> +/*
> + * Video ioctls
> + */
> +static int ipu_mem2mem_vdic_querycap(struct file *file, void *priv,
> +				     struct v4l2_capability *cap)
> +{
> +	strscpy(cap->driver, "imx-m2m-vdic", sizeof(cap->driver));
> +	strscpy(cap->card, "imx-m2m-vdic", sizeof(cap->card));
> +	strscpy(cap->bus_info, "platform:imx-m2m-vdic", sizeof(cap->bus_info));
> +	cap->device_caps = V4L2_CAP_VIDEO_M2M | V4L2_CAP_STREAMING;
> +	cap->capabilities = cap->device_caps | V4L2_CAP_DEVICE_CAPS;
> +
> +	return 0;
> +}
> +
> +static int ipu_mem2mem_vdic_enum_fmt(struct file *file, void *fh, struct v4l2_fmtdesc *f)
> +{
> +	struct ipu_mem2mem_vdic_ctx *ctx = fh_to_ctx(fh);
> +	struct vb2_queue *vq = v4l2_m2m_get_vq(ctx->fh.m2m_ctx, f->type);
> +	enum imx_pixfmt_sel cs = vq->type == V4L2_BUF_TYPE_VIDEO_CAPTURE ?
> +				 PIXFMT_SEL_YUV_RGB : PIXFMT_SEL_YUV;
> +	u32 fourcc;
> +	int ret;
> +
> +	ret = imx_media_enum_pixel_formats(&fourcc, f->index, cs, 0);
> +	if (ret)
> +		return ret;
> +
> +	f->pixelformat = fourcc;
> +
> +	return 0;
> +}
> +
> +static int ipu_mem2mem_vdic_g_fmt(struct file *file, void *fh, struct v4l2_format *f)
> +{
> +	struct ipu_mem2mem_vdic_ctx *ctx = fh_to_ctx(fh);
> +	struct ipu_mem2mem_vdic_priv *priv = ctx->priv;
> +	struct v4l2_pix_format *fmt = ipu_mem2mem_vdic_get_format(priv, f->type);
> +
> +	f->fmt.pix = *fmt;
> +
> +	return 0;
> +}
> +
> +static int ipu_mem2mem_vdic_try_fmt(struct file *file, void *fh,
> +				    struct v4l2_format *f)
> +{
> +	const struct imx_media_pixfmt *cc;
> +	enum imx_pixfmt_sel cs;
> +	u32 fourcc;
> +
> +	if (f->type == V4L2_BUF_TYPE_VIDEO_CAPTURE) {	/* Output */
> +		cs = PIXFMT_SEL_YUV_RGB;	/* YUV direct / RGB via IC */
> +
> +		f->fmt.pix.field = V4L2_FIELD_NONE;
> +	} else {
> +		cs = PIXFMT_SEL_YUV;		/* YUV input only */
> +
> +		/*
> +		 * Input must be interlaced with frame order.
> +		 * Fall back to SEQ_TB otherwise.
> +		 */
> +		if (!V4L2_FIELD_HAS_BOTH(f->fmt.pix.field) ||
> +		    f->fmt.pix.field == V4L2_FIELD_INTERLACED)
> +			f->fmt.pix.field = V4L2_FIELD_SEQ_TB;
> +	}
> +
> +	fourcc = f->fmt.pix.pixelformat;
> +	cc = imx_media_find_pixel_format(fourcc, cs);
> +	if (!cc) {
> +		imx_media_enum_pixel_formats(&fourcc, 0, cs, 0);
> +		cc = imx_media_find_pixel_format(fourcc, cs);
> +	}
> +
> +	f->fmt.pix.pixelformat = cc->fourcc;
> +
> +	v4l_bound_align_image(&f->fmt.pix.width,
> +			      1, 968, 1,
> +			      &f->fmt.pix.height,
> +			      1, 1024, 1, 1);
> +
> +	if (ipu_mem2mem_vdic_format_is_yuv420(f->fmt.pix.pixelformat))
> +		f->fmt.pix.bytesperline = f->fmt.pix.width * 3 / 2;
> +	else if (ipu_mem2mem_vdic_format_is_yuv422(f->fmt.pix.pixelformat))
> +		f->fmt.pix.bytesperline = f->fmt.pix.width * 2;
> +	else if (ipu_mem2mem_vdic_format_is_rgb16(f->fmt.pix.pixelformat))
> +		f->fmt.pix.bytesperline = f->fmt.pix.width * 2;
> +	else if (ipu_mem2mem_vdic_format_is_rgb24(f->fmt.pix.pixelformat))
> +		f->fmt.pix.bytesperline = f->fmt.pix.width * 3;
> +	else if (ipu_mem2mem_vdic_format_is_rgb32(f->fmt.pix.pixelformat))
> +		f->fmt.pix.bytesperline = f->fmt.pix.width * 4;
> +	else
> +		f->fmt.pix.bytesperline = f->fmt.pix.width;
> +
> +	f->fmt.pix.sizeimage = f->fmt.pix.height * f->fmt.pix.bytesperline;
> +
> +	return 0;
> +}
> +
> +static int ipu_mem2mem_vdic_s_fmt(struct file *file, void *fh, struct v4l2_format *f)
> +{
> +	struct ipu_mem2mem_vdic_ctx *ctx = fh_to_ctx(fh);
> +	struct ipu_mem2mem_vdic_priv *priv = ctx->priv;
> +	struct v4l2_pix_format *fmt, *infmt, *outfmt;
> +	struct vb2_queue *vq;
> +	int ret;
> +
> +	vq = v4l2_m2m_get_vq(ctx->fh.m2m_ctx, f->type);
> +	if (vb2_is_busy(vq)) {
> +		dev_err(priv->dev, "%s queue busy\n",  __func__);
> +		return -EBUSY;
> +	}
> +
> +	ret = ipu_mem2mem_vdic_try_fmt(file, fh, f);
> +	if (ret < 0)
> +		return ret;
> +
> +	fmt = ipu_mem2mem_vdic_get_format(priv, f->type);
> +	*fmt = f->fmt.pix;
> +
> +	/* Propagate colorimetry to the capture queue */
> +	infmt = ipu_mem2mem_vdic_get_format(priv, V4L2_BUF_TYPE_VIDEO_OUTPUT);
> +	outfmt = ipu_mem2mem_vdic_get_format(priv, V4L2_BUF_TYPE_VIDEO_CAPTURE);
> +	outfmt->colorspace = infmt->colorspace;
> +	outfmt->ycbcr_enc = infmt->ycbcr_enc;
> +	outfmt->xfer_func = infmt->xfer_func;
> +	outfmt->quantization = infmt->quantization;
> +
> +	return 0;
> +}
> +
> +static const struct v4l2_ioctl_ops mem2mem_ioctl_ops = {
> +	.vidioc_querycap		= ipu_mem2mem_vdic_querycap,
> +
> +	.vidioc_enum_fmt_vid_cap	= ipu_mem2mem_vdic_enum_fmt,
> +	.vidioc_g_fmt_vid_cap		= ipu_mem2mem_vdic_g_fmt,
> +	.vidioc_try_fmt_vid_cap		= ipu_mem2mem_vdic_try_fmt,
> +	.vidioc_s_fmt_vid_cap		= ipu_mem2mem_vdic_s_fmt,
> +
> +	.vidioc_enum_fmt_vid_out	= ipu_mem2mem_vdic_enum_fmt,
> +	.vidioc_g_fmt_vid_out		= ipu_mem2mem_vdic_g_fmt,
> +	.vidioc_try_fmt_vid_out		= ipu_mem2mem_vdic_try_fmt,
> +	.vidioc_s_fmt_vid_out		= ipu_mem2mem_vdic_s_fmt,
> +
> +	.vidioc_reqbufs			= v4l2_m2m_ioctl_reqbufs,
> +	.vidioc_querybuf		= v4l2_m2m_ioctl_querybuf,
> +
> +	.vidioc_qbuf			= v4l2_m2m_ioctl_qbuf,
> +	.vidioc_expbuf			= v4l2_m2m_ioctl_expbuf,
> +	.vidioc_dqbuf			= v4l2_m2m_ioctl_dqbuf,
> +	.vidioc_create_bufs		= v4l2_m2m_ioctl_create_bufs,
> +
> +	.vidioc_streamon		= v4l2_m2m_ioctl_streamon,
> +	.vidioc_streamoff		= v4l2_m2m_ioctl_streamoff,
> +
> +	.vidioc_subscribe_event		= v4l2_ctrl_subscribe_event,
> +	.vidioc_unsubscribe_event	= v4l2_event_unsubscribe,
> +};
> +
> +/*
> + * Queue operations
> + */
> +static int ipu_mem2mem_vdic_queue_setup(struct vb2_queue *vq, unsigned int *nbuffers,
> +					unsigned int *nplanes, unsigned int sizes[],
> +					struct device *alloc_devs[])
> +{
> +	struct ipu_mem2mem_vdic_ctx *ctx = vb2_get_drv_priv(vq);
> +	struct ipu_mem2mem_vdic_priv *priv = ctx->priv;
> +	struct v4l2_pix_format *fmt = ipu_mem2mem_vdic_get_format(priv, vq->type);
> +	unsigned int count = *nbuffers;
> +
> +	if (*nplanes)
> +		return sizes[0] < fmt->sizeimage ? -EINVAL : 0;
> +
> +	*nplanes = 1;
> +	sizes[0] = fmt->sizeimage;
> +
> +	dev_dbg(ctx->priv->dev, "get %u buffer(s) of size %d each.\n",
> +		count, fmt->sizeimage);
> +
> +	return 0;
> +}
> +
> +static int ipu_mem2mem_vdic_buf_prepare(struct vb2_buffer *vb)
> +{
> +	struct ipu_mem2mem_vdic_ctx *ctx = vb2_get_drv_priv(vb->vb2_queue);
> +	struct vb2_v4l2_buffer *vbuf = to_vb2_v4l2_buffer(vb);
> +	struct ipu_mem2mem_vdic_priv *priv = ctx->priv;
> +	struct vb2_queue *vq = vb->vb2_queue;
> +	struct v4l2_pix_format *fmt;
> +	unsigned long size;
> +
> +	dev_dbg(ctx->priv->dev, "type: %d\n", vb->vb2_queue->type);
> +
> +	if (V4L2_TYPE_IS_OUTPUT(vq->type)) {
> +		if (vbuf->field == V4L2_FIELD_ANY)
> +			vbuf->field = V4L2_FIELD_SEQ_TB;
> +		if (!V4L2_FIELD_HAS_BOTH(vbuf->field)) {
> +			dev_dbg(ctx->priv->dev, "%s: field isn't supported\n",
> +				__func__);
> +			return -EINVAL;
> +		}
> +	}
> +
> +	fmt = ipu_mem2mem_vdic_get_format(priv, vb->vb2_queue->type);
> +	size = fmt->sizeimage;
> +
> +	if (vb2_plane_size(vb, 0) < size) {
> +		dev_dbg(ctx->priv->dev,
> +			"%s: data will not fit into plane (%lu < %lu)\n",
> +			__func__, vb2_plane_size(vb, 0), size);
> +		return -EINVAL;
> +	}
> +
> +	vb2_set_plane_payload(vb, 0, fmt->sizeimage);
> +
> +	return 0;
> +}
> +
> +static void ipu_mem2mem_vdic_buf_queue(struct vb2_buffer *vb)
> +{
> +	struct ipu_mem2mem_vdic_ctx *ctx = vb2_get_drv_priv(vb->vb2_queue);
> +
> +	v4l2_m2m_buf_queue(ctx->fh.m2m_ctx, to_vb2_v4l2_buffer(vb));
> +}
> +
> +/* VDIC hardware setup */
> +static int ipu_mem2mem_vdic_setup_channel(struct ipu_mem2mem_vdic_priv *priv,
> +					  struct ipuv3_channel *channel,
> +					  struct v4l2_pix_format *fmt,
> +					  bool in)
> +{
> +	struct ipu_image image = { 0 };
> +	unsigned int burst_size;
> +	int ret;
> +
> +	image.pix = *fmt;
> +	image.rect.width = image.pix.width;
> +	image.rect.height = image.pix.height;
> +
> +	ipu_cpmem_zero(channel);
> +
> +	if (in) {
> +		/* One field to VDIC channels */
> +		image.pix.height /= 2;
> +		image.rect.height /= 2;
> +	} else {
> +		/* Skip writing U and V components to odd rows */
> +		if (ipu_mem2mem_vdic_format_is_yuv420(image.pix.pixelformat))
> +			ipu_cpmem_skip_odd_chroma_rows(channel);
> +	}
> +
> +	ret = ipu_cpmem_set_image(channel, &image);
> +	if (ret)
> +		return ret;
> +
> +	burst_size = (image.pix.width & 0xf) ? 8 : 16;
> +	ipu_cpmem_set_burstsize(channel, burst_size);
> +
> +	if (!ipu_prg_present(priv->ipu_dev))
> +		ipu_cpmem_set_axi_id(channel, 1);
> +
> +	ipu_idmac_set_double_buffer(channel, false);
> +
> +	return 0;
> +}
> +
> +static int ipu_mem2mem_vdic_setup_hardware(struct ipu_mem2mem_vdic_priv *priv)
> +{
> +	struct v4l2_pix_format *infmt, *outfmt;
> +	struct ipu_ic_csc csc;
> +	bool in422, outyuv;
> +	int ret;
> +
> +	infmt = ipu_mem2mem_vdic_get_format(priv, V4L2_BUF_TYPE_VIDEO_OUTPUT);
> +	outfmt = ipu_mem2mem_vdic_get_format(priv, V4L2_BUF_TYPE_VIDEO_CAPTURE);
> +	in422 = ipu_mem2mem_vdic_format_is_yuv422(infmt->pixelformat);
> +	outyuv = ipu_mem2mem_vdic_format_is_yuv(outfmt->pixelformat);
> +
> +	ipu_vdi_setup(priv->vdi, in422, infmt->width, infmt->height);
> +	ipu_vdi_set_field_order(priv->vdi, V4L2_STD_UNKNOWN, infmt->field);
> +	ipu_vdi_set_motion(priv->vdi, HIGH_MOTION);

This maps to VDI_C_MOT_SEL_FULL aka VDI_MOT_SEL=2, which is documented
as "full motion, only vertical filter is used". Doesn't this completely
ignore the previous/next fields and only use the output of the di_vfilt
four tap vertical filter block to fill in missing lines from the
surrounding pixels (above and below) of the current field?

I think this should at least be configurable, and probably default to
MED_MOTION.

regards
Philipp
Marek Vasut Sept. 24, 2024, 3:28 p.m. UTC | #7
On 9/6/24 11:01 AM, Philipp Zabel wrote:

Hi,

>> diff --git a/drivers/staging/media/imx/imx-media-dev.c b/drivers/staging/media/imx/imx-media-dev.c
>> index be54dca11465d..a841fdb4c2394 100644
>> --- a/drivers/staging/media/imx/imx-media-dev.c
>> +++ b/drivers/staging/media/imx/imx-media-dev.c
>> @@ -57,7 +57,52 @@ static int imx6_media_probe_complete(struct v4l2_async_notifier *notifier)
>>   		goto unlock;
>>   	}
>>   
>> +	imxmd->m2m_vdic[0] = imx_media_mem2mem_vdic_init(imxmd, 0);
>> +	if (IS_ERR(imxmd->m2m_vdic[0])) {
>> +		ret = PTR_ERR(imxmd->m2m_vdic[0]);
>> +		imxmd->m2m_vdic[0] = NULL;
>> +		goto unlock;
>> +	}
>> +
>> +	/* MX6S/DL has one IPUv3, init second VDI only on MX6Q/QP */
>> +	if (imxmd->ipu[1]) {
>> +		imxmd->m2m_vdic[1] = imx_media_mem2mem_vdic_init(imxmd, 1);
>> +		if (IS_ERR(imxmd->m2m_vdic[1])) {
>> +			ret = PTR_ERR(imxmd->m2m_vdic[1]);
>> +			imxmd->m2m_vdic[1] = NULL;
>> +			goto uninit_vdi0;
>> +		}
>> +	}
> 
> Instead of presenting two devices to userspace, it would be better to
> have a single video device that can distribute work to both IPUs.

Why do you think so?

I think it is better to keep the kernel code as simple as possible, i.e. 
provide the device node for each m2m device to userspace and handle the 
m2m device hardware interaction in the kernel driver, but let userspace 
take care of policy like job scheduling, access permissions assignment 
to each device (e.g. if different user accounts should have access to 
different VDICs), or other such topics.

> To be fair, we never implemented that for the CSC/scaler mem2mem device
> either.

I don't think that is actually a good idea. Instead, it would be better 
to have two scaler nodes in userspace.

[...]

>> +++ b/drivers/staging/media/imx/imx-media-mem2mem-vdic.c
>> @@ -0,0 +1,997 @@
>> +// SPDX-License-Identifier: GPL-2.0-or-later
>> +/*
>> + * i.MX VDIC mem2mem de-interlace driver
>> + *
>> + * Copyright (c) 2024 Marek Vasut <marex@denx.de>
>> + *
>> + * Based on previous VDIC mem2mem work by Steve Longerbeam that is:
>> + * Copyright (c) 2018 Mentor Graphics Inc.
>> + */
>> +
>> +#include <linux/delay.h>
>> +#include <linux/fs.h>
>> +#include <linux/module.h>
>> +#include <linux/sched.h>
>> +#include <linux/slab.h>
>> +#include <linux/version.h>
>> +
>> +#include <media/media-device.h>
>> +#include <media/v4l2-ctrls.h>
>> +#include <media/v4l2-device.h>
>> +#include <media/v4l2-event.h>
>> +#include <media/v4l2-ioctl.h>
>> +#include <media/v4l2-mem2mem.h>
>> +#include <media/videobuf2-dma-contig.h>
>> +
>> +#include "imx-media.h"
>> +
>> +#define fh_to_ctx(__fh)	container_of(__fh, struct ipu_mem2mem_vdic_ctx, fh)
>> +
>> +#define to_mem2mem_priv(v) container_of(v, struct ipu_mem2mem_vdic_priv, vdev)
> 
> These could be inline functions for added type safety.

Fixed in v3

[...]

>> +static void ipu_mem2mem_vdic_device_run(void *_ctx)
>> +{
>> +	struct ipu_mem2mem_vdic_ctx *ctx = _ctx;
>> +	struct ipu_mem2mem_vdic_priv *priv = ctx->priv;
>> +	struct vb2_v4l2_buffer *curr_buf, *dst_buf;
>> +	dma_addr_t prev_phys, curr_phys, out_phys;
>> +	struct v4l2_pix_format *infmt;
>> +	u32 phys_offset = 0;
>> +	unsigned long flags;
>> +
>> +	infmt = ipu_mem2mem_vdic_get_format(priv, V4L2_BUF_TYPE_VIDEO_OUTPUT);
>> +	if (V4L2_FIELD_IS_SEQUENTIAL(infmt->field))
>> +		phys_offset = infmt->sizeimage / 2;
>> +	else if (V4L2_FIELD_IS_INTERLACED(infmt->field))
>> +		phys_offset = infmt->bytesperline;
>> +	else
>> +		dev_err(priv->dev, "Invalid field %d\n", infmt->field);
>> +
>> +	dst_buf = v4l2_m2m_next_dst_buf(ctx->fh.m2m_ctx);
>> +	out_phys = vb2_dma_contig_plane_dma_addr(&dst_buf->vb2_buf, 0);
>> +
>> +	curr_buf = v4l2_m2m_next_src_buf(ctx->fh.m2m_ctx);
>> +	if (!curr_buf) {
>> +		dev_err(priv->dev, "Not enough buffers\n");
>> +		return;
>> +	}
>> +
>> +	spin_lock_irqsave(&priv->irqlock, flags);
>> +
>> +	if (ctx->curr_buf) {
>> +		ctx->prev_buf = ctx->curr_buf;
>> +		ctx->curr_buf = curr_buf;
>> +	} else {
>> +		ctx->prev_buf = curr_buf;
>> +		ctx->curr_buf = curr_buf;
>> +		dev_warn(priv->dev, "Single-buffer mode, fix your userspace\n");
>> +	}
>> +
>> +	prev_phys = vb2_dma_contig_plane_dma_addr(&ctx->prev_buf->vb2_buf, 0);
>> +	curr_phys = vb2_dma_contig_plane_dma_addr(&ctx->curr_buf->vb2_buf, 0);
>> +
>> +	priv->curr_ctx = ctx;
>> +	spin_unlock_irqrestore(&priv->irqlock, flags);
>> +
>> +	ipu_cpmem_set_buffer(priv->vdi_out_ch,  0, out_phys);
>> +	ipu_cpmem_set_buffer(priv->vdi_in_ch_p, 0, prev_phys + phys_offset);
>> +	ipu_cpmem_set_buffer(priv->vdi_in_ch,   0, curr_phys);
>> +	ipu_cpmem_set_buffer(priv->vdi_in_ch_n, 0, curr_phys + phys_offset);
> 
> This always outputs at a frame rate of half the field rate, and only
> top fields are ever used as current field, and bottom fields as
> previous/next fields, right?

Yes, currently the driver extracts 1 frame from three consecutive incoming
fields (previous Bottom, and current Top and Bottom):

(frame 1 and 3 below is omitted)

     1  2  3  4
...|T |T |T |T |...
...| B| B| B| B|...
      | ||  | ||
      '-''  '-''
       ||    ||
       ||    \/
       \/  Frame#4
     Frame#2

As far as I understand it, this is how the current VDI implementation 
behaves too, right?

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/staging/media/imx/imx-media-vdic.c#n207

> I think it would be good to add a mode that doesn't drop the
> 
> 	ipu_cpmem_set_buffer(priv->vdi_in_ch_p, 0, prev_phys);
> 	ipu_cpmem_set_buffer(priv->vdi_in_ch,   0, prev_phys + phys_offset);
> 	ipu_cpmem_set_buffer(priv->vdi_in_ch_n, 0, curr_phys);
> 
> output frames, right from the start.

This would make the VDI act as a frame-rate doubler, which would spend a
lot more memory bandwidth, a resource that is already limited on the MX6,
so I would also like to keep a frame-drop mode (i.e. the current behavior).

Can we make that behavior configurable? Since this is a mem2mem device,
we do not really have any notion of input and output frame-rate, so I
suspect this would need some VIDIOC_* ioctl?

> If we don't start with that supported, I fear userspace will make
> assumptions and be surprised when a full rate mode is added later.

I'm afraid that since the current VDI already retains the input frame
rate instead of doubling it, userspace already makes an assumption,
so that ship has sailed.

But I think we can make the frame doubling configurable?

>> +	/* No double buffering, always pick buffer 0 */
>> +	ipu_idmac_select_buffer(priv->vdi_out_ch, 0);
>> +	ipu_idmac_select_buffer(priv->vdi_in_ch_p, 0);
>> +	ipu_idmac_select_buffer(priv->vdi_in_ch, 0);
>> +	ipu_idmac_select_buffer(priv->vdi_in_ch_n, 0);
>> +
>> +	/* Enable the channels */
>> +	ipu_idmac_enable_channel(priv->vdi_out_ch);
>> +	ipu_idmac_enable_channel(priv->vdi_in_ch_p);
>> +	ipu_idmac_enable_channel(priv->vdi_in_ch);
>> +	ipu_idmac_enable_channel(priv->vdi_in_ch_n);
>> +}

[...]

>> +static int ipu_mem2mem_vdic_setup_hardware(struct ipu_mem2mem_vdic_priv *priv)
>> +{
>> +	struct v4l2_pix_format *infmt, *outfmt;
>> +	struct ipu_ic_csc csc;
>> +	bool in422, outyuv;
>> +	int ret;
>> +
>> +	infmt = ipu_mem2mem_vdic_get_format(priv, V4L2_BUF_TYPE_VIDEO_OUTPUT);
>> +	outfmt = ipu_mem2mem_vdic_get_format(priv, V4L2_BUF_TYPE_VIDEO_CAPTURE);
>> +	in422 = ipu_mem2mem_vdic_format_is_yuv422(infmt->pixelformat);
>> +	outyuv = ipu_mem2mem_vdic_format_is_yuv(outfmt->pixelformat);
>> +
>> +	ipu_vdi_setup(priv->vdi, in422, infmt->width, infmt->height);
>> +	ipu_vdi_set_field_order(priv->vdi, V4L2_STD_UNKNOWN, infmt->field);
>> +	ipu_vdi_set_motion(priv->vdi, HIGH_MOTION);
> 
> This maps to VDI_C_MOT_SEL_FULL aka VDI_MOT_SEL=2, which is documented
> as "full motion, only vertical filter is used". Doesn't this completely
> ignore the previous/next fields and only use the output of the di_vfilt
> four tap vertical filter block to fill in missing lines from the
> surrounding pixels (above and below) of the current field?

Is there a suitable knob for this, or shall I introduce a device-specific
one, like the vdic_ctrl_motion_menu for the current VDIC direct driver?

If we introduce such a knob, then it is all the more reason to provide 
one device node per one VDIC hardware instance, since each can be 
configured for different motion settings.
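
For context, the knob in the current direct VDIC driver is a standard menu
control; from memory (please double-check against imx-media-vdic.c), it is
registered roughly like this:

	static const char * const vdic_ctrl_motion_menu[] = {
		"No Motion Compensation",
		"Low Motion",
		"Medium Motion",
		"High Motion",
	};

	ctrl = v4l2_ctrl_new_std_menu_items(hdlr, &vdic_ctrl_ops,
					    V4L2_CID_DEINTERLACING_MODE,
					    HIGH_MOTION, 0, HIGH_MOTION,
					    vdic_ctrl_motion_menu);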

> I think this should at least be configurable, and probably default to
> MED_MOTION.

I think to be compatible with the current VDI behavior and to reduce 
memory bandwidth usage, let's default to the HIGH/full mode. That one 
produces reasonably good results without spending too much memory 
bandwidth, which is already constrained on the MX6, and if the user needs 
better image quality, they can configure another mode using the V4L2 
control.

[...]
Marek Vasut Sept. 24, 2024, 3:42 p.m. UTC | #8
On 7/30/24 6:05 PM, Nicolas Dufresne wrote:

Hi,

sorry for the abysmal delay.

>>> Le mercredi 24 juillet 2024 à 02:19 +0200, Marek Vasut a écrit :
>>>> Introduce dedicated memory-to-memory IPUv3 VDI deinterlacer driver.
>>>> Currently the IPUv3 can operate VDI in DIRECT mode, from sensor to
>>>> memory. This only works for single stream, that is, one input from
>>>> one camera is deinterlaced on the fly with a helper buffer in DRAM
>>>> and the result is written into memory.
>>>>
>>>> The i.MX6Q/QP does support up to four analog cameras via two IPUv3
>>>> instances, each containing one VDI deinterlacer block. In order to
>>>> deinterlace all four streams from all four analog cameras live, it
>>>> is necessary to operate VDI in INDIRECT mode, where the interlaced
>>>> streams are written to buffers in memory, and then deinterlaced in
>>>> memory using VDI in INDIRECT memory-to-memory mode.
>>>
>>> Just a quick design question. Is it possible to chain the deinterlacer and the
>>> csc-scaler ?
>>
>> I think you could do that.
>>
>>> If so, it would be much more efficient if all this could be
>>> combined into the existing m2m driver, since you could save a memory rountrip
>>> when needing to deinterlace, change the colorspace and possibly scale too.
>>
>> The existing PRP/IC driver is similar to what this driver does, yes, but
>> it uses a different DMA path, I believe it is IDMAC->PRP->IC->IDMAC.
>> This driver uses IDMAC->VDI->IC->IDMAC. I am not convinced mixing the
>> two paths into a single driver would be beneficial, but I am reasonably
>> sure it would be very convoluted. Instead, this driver could be extended
>> to do deinterlacing and scaling using the IC if that was needed. I think
>> that would be the cleaner approach.
> 
> Note that I only meant to ask if there was a path to combine
> CSC/Scaling/Deinterlacing without a memory roundtrip. If a roundtrip is needed
> anyway, I would rather make separate video nodes, and leave it to userspace to
> deal with. Though, if we can avoid it, a combined driver should be highly
> beneficial.
The VDI mem2mem driver already uses the IC as an output path from the
deinterlacer; the IC is capable of scaling and could be configured to do
so. The IC configuration in the VDI mem2mem driver is some 10 lines
of code (select input and output colorspace, and input and output image
resolution); the rest of the VDI mem2mem driver is interaction with the
VDI itself.

Since the IC configuration (i.e. color space conversion and scaling) is 
already well factored out, I think mixing the VDI and CSC drivers 
wouldn't bring any real benefit; it would only make the code more 
complicated.

[...]
Philipp Zabel Sept. 25, 2024, 3:07 p.m. UTC | #9
Hi,

On Di, 2024-09-24 at 17:28 +0200, Marek Vasut wrote:
> On 9/6/24 11:01 AM, Philipp Zabel wrote:
[...]
> > Instead of presenting two devices to userspace, it would be better to
> > have a single video device that can distribute work to both IPUs.
> 
> Why do you think so?

The scaler/colorspace converter supports frames larger than the
1024x1024 hardware limit by splitting each frame into multiple tiles. It
currently does so sequentially on a single IC. Speed could be improved
by distributing the tiles to both ICs. This is not an option anymore if
there are two video devices that are fixed to one IC each.

The same would be possible for the deinterlacer, e.g. to support 720i
frames split into two tiles each sent to one of the two VDICs.

> I think it is better to keep the kernel code as simple as possible, i.e. 
> provide the device node for each m2m device to userspace and handle the 
> m2m device hardware interaction in the kernel driver, but let userspace 
> take care of policy like job scheduling, access permissions assignment 
> to each device (e.g. if different user accounts should have access to 
> different VDICs), or other such topics.

I both agree and disagree with you at the same time.

If the programming model were more similar to DRM, I'd agree in a
heartbeat. If the kernel driver just had to do memory/fence handling
and command submission (and parameter sanitization, because there is no
MMU), and there was some userspace API on top, it would make sense to
me to handle parameter calculation and job scheduling in a
hardware-specific userspace driver that can just open one device for
each IPU.

With the rigid V4L2 model though, where memory handling, parameter
calculation, and job scheduling of tiles in a single frame all have to
be hidden behind the V4L2 API, I don't think requiring userspace to
combine multiple mem2mem video devices to work together on a single
frame is feasible.

Is limiting different users to the different deinterlacer hardware
units a real use case? I saw the two ICs, when used as mem2mem devices,
as interchangeable resources.

> > To be fair, we never implemented that for the CSC/scaler mem2mem device
> > either.
> 
> I don't think that is actually a good idea. Instead, it would be better 
> to have two scaler nodes in userspace.

See above, that would make it impossible (or rather unreasonably
complicated) to distribute work on a single frame to both IPUs.

[...]
> > > +	ipu_cpmem_set_buffer(priv->vdi_out_ch,  0, out_phys);
> > > +	ipu_cpmem_set_buffer(priv->vdi_in_ch_p, 0, prev_phys + phys_offset);
> > > +	ipu_cpmem_set_buffer(priv->vdi_in_ch,   0, curr_phys);
> > > +	ipu_cpmem_set_buffer(priv->vdi_in_ch_n, 0, curr_phys + phys_offset);
> > 
> > This always outputs at a frame rate of half the field rate, and only
> > top fields are ever used as current field, and bottom fields as
> > previous/next fields, right?
> 
> Yes, currently the driver extracts 1 frame from three consecutive incoming
> fields (previous Bottom, and current Top and Bottom):
> 
> (frame 1 and 3 below is omitted)
> 
>      1  2  3  4
> ...|T |T |T |T |...
> ...| B| B| B| B|...
>       | ||  | ||
>       '-''  '-''
>        ||    ||
>        ||    \/
>        \/  Frame#4
>      Frame#2
> 
> As far as I understand it, this is how the current VDI implementation 
> behaves too, right?

Yes, that is a hardware limitation when using the direct CSI->VDIC
path. As far as I understand, for each frame (two fields) the
CSI only sends the first ("PREV") field directly to the VDIC, which
therefore can only be run in full motion mode (use the filter to fill in
the missing lines).
The second ("CUR") field is just ignored. It could be written to RAM
via IDMAC output channel 13 (IPUV3_CHANNEL_VDI_MEM_RECENT), which cannot
be used by the VDIC in direct mode. So this is not implemented.

> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/staging/media/imx/imx-media-vdic.c#n207

That code is unused. The direct hardware path doesn't use
IPUV3_CHANNEL_MEM_VDI_PREV/CUR/NEXT, but it has a similar effect: half
of the incoming fields are dropped. The setup is in vdic_setup_direct().

> > I think it would be good to add a mode that doesn't drop the
> > 
> > 	ipu_cpmem_set_buffer(priv->vdi_in_ch_p, 0, prev_phys);
> > 	ipu_cpmem_set_buffer(priv->vdi_in_ch,   0, prev_phys + phys_offset);
> > 	ipu_cpmem_set_buffer(priv->vdi_in_ch_n, 0, curr_phys);
> > 
> > output frames, right from the start.
> 
> This would make the VDI act as a frame-rate doubler, which would spend a
> lot more memory bandwidth, a resource that is already limited on the MX6,
> so I would also like to keep a frame-drop mode (i.e. the current behavior).
>
> Can we make that behavior configurable? Since this is a mem2mem device,
> we do not really have any notion of input and output frame-rate, so I
> suspect this would need some VIDIOC_* ioctl?

That would be good. The situation I'd like to avoid is that this device
becomes available without the full frame-rate mode, userspace then
assumes this is a 1:1 frame converter device, and then we can't add the
full frame-rate mode later without breaking userspace.

> > If we don't start with that supported, I fear userspace will make
> > assumptions and be surprised when a full rate mode is added later.
> 
> I'm afraid that since the current VDI already retains the input frame
> rate instead of doubling it, userspace already makes an assumption,
> so that ship has sailed.

No, this is about the deinterlacer mem2mem device, which doesn't exist
before this series.

The CSI capture path already has configurable framedrops (in the CSI).

> But I think we can make the frame doubling configurable?

That would be good. Specifically, there must be no guarantee that one
input frame with two fields only produces one deinterlaced output
frame, and userspace should somehow be able to understand this.

This would be an argument against Nicolas' suggestion of including this
in the csc/scaler device, which always must produce one output frame
per input frame.

[...]
> > This maps to VDI_C_MOT_SEL_FULL aka VDI_MOT_SEL=2, which is documented
> > as "full motion, only vertical filter is used". Doesn't this completely
> > ignore the previous/next fields and only use the output of the di_vfilt
> > four tap vertical filter block to fill in missing lines from the
> > surrounding pixels (above and below) of the current field?
> 
> Is there a suitable knob for this, or shall I introduce a device-specific
> one, like the vdic_ctrl_motion_menu for the current VDIC direct driver?
> 
> If we introduce such a knob, then it is all the more reason to provide 
> one device node per one VDIC hardware instance, since each can be 
> configured for different motion settings.

As far as I know, there is no such control yet. I don't think this
should be per-device, but per-stream (or even per-frame).

> > I think this should at least be configurable, and probably default to
> > MED_MOTION.
> 
> I think to be compatible with the current VDI behavior and to reduce 
> memory bandwidth usage, let's default to the HIGH/full mode. That one 
> produces reasonably good results without spending too much memory 
> bandwidth, which is already constrained on the MX6, and if the user needs 
> better image quality, they can configure another mode using the V4L2 
> control.

I'd rather not default to the setting that throws away half of the
input data. Not using frame doubling by default is sensible, but now
that using all three input fields to calculate the output frame is
possible, why not make that the default?

regards
Philipp
Nicolas Dufresne Sept. 25, 2024, 5:58 p.m. UTC | #10
Hi,

Le mercredi 24 juillet 2024 à 02:19 +0200, Marek Vasut a écrit :
> Introduce dedicated memory-to-memory IPUv3 VDI deinterlacer driver.
> Currently the IPUv3 can operate VDI in DIRECT mode, from sensor to
> memory. This only works for single stream, that is, one input from
> one camera is deinterlaced on the fly with a helper buffer in DRAM
> and the result is written into memory.
> 
> The i.MX6Q/QP does support up to four analog cameras via two IPUv3
> instances, each containing one VDI deinterlacer block. In order to
> deinterlace all four streams from all four analog cameras live, it
> is necessary to operate VDI in INDIRECT mode, where the interlaced
> streams are written to buffers in memory, and then deinterlaced in
> memory using VDI in INDIRECT memory-to-memory mode.
> 
> This driver also makes use of the IDMAC->VDI->IC->IDMAC data path
> to provide pixel format conversion from input YUV formats to both
> output YUV or RGB formats. The later is useful in case the data
> are imported into the GPU, which on this platform cannot directly
> sample YUV buffers.
> 
> This is derived from previous work by Steve Longerbeam and from the
> IPUv3 CSC Scaler mem2mem driver.
> 
> Signed-off-by: Marek Vasut <marex@denx.de>
> ---
> Cc: Daniel Vetter <daniel@ffwll.ch>
> Cc: David Airlie <airlied@gmail.com>
> Cc: Fabio Estevam <festevam@gmail.com>
> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
> Cc: Helge Deller <deller@gmx.de>
> Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
> Cc: Pengutronix Kernel Team <kernel@pengutronix.de>
> Cc: Philipp Zabel <p.zabel@pengutronix.de>
> Cc: Sascha Hauer <s.hauer@pengutronix.de>
> Cc: Shawn Guo <shawnguo@kernel.org>
> Cc: Steve Longerbeam <slongerbeam@gmail.com>
> Cc: dri-devel@lists.freedesktop.org
> Cc: imx@lists.linux.dev
> Cc: linux-arm-kernel@lists.infradead.org
> Cc: linux-fbdev@vger.kernel.org
> Cc: linux-media@vger.kernel.org
> Cc: linux-staging@lists.linux.dev
> ---
> V2: - Add complementary imx_media_mem2mem_vdic_uninit()
>     - Drop uninitiaized ret from ipu_mem2mem_vdic_device_run()
>     - Drop duplicate nbuffers assignment in ipu_mem2mem_vdic_queue_setup()
>     - Fix %u formatting string in ipu_mem2mem_vdic_queue_setup()
>     - Drop devm_*free from ipu_mem2mem_vdic_get_ipu_resources() fail path
>       and ipu_mem2mem_vdic_put_ipu_resources()
>     - Add missing video_device_release()
> ---
>  drivers/staging/media/imx/Makefile            |   2 +-
>  drivers/staging/media/imx/imx-media-dev.c     |  55 +
>  .../media/imx/imx-media-mem2mem-vdic.c        | 997 ++++++++++++++++++
>  drivers/staging/media/imx/imx-media.h         |  10 +
>  4 files changed, 1063 insertions(+), 1 deletion(-)
>  create mode 100644 drivers/staging/media/imx/imx-media-mem2mem-vdic.c
> 
> diff --git a/drivers/staging/media/imx/Makefile b/drivers/staging/media/imx/Makefile
> index 330e0825f506b..0cad87123b590 100644
> --- a/drivers/staging/media/imx/Makefile
> +++ b/drivers/staging/media/imx/Makefile
> @@ -4,7 +4,7 @@ imx-media-common-objs := imx-media-capture.o imx-media-dev-common.o \
>  
>  imx6-media-objs := imx-media-dev.o imx-media-internal-sd.o \
>  	imx-ic-common.o imx-ic-prp.o imx-ic-prpencvf.o imx-media-vdic.o \
> -	imx-media-csc-scaler.o
> +	imx-media-mem2mem-vdic.o imx-media-csc-scaler.o
>  
>  imx6-media-csi-objs := imx-media-csi.o imx-media-fim.o
>  
> diff --git a/drivers/staging/media/imx/imx-media-dev.c b/drivers/staging/media/imx/imx-media-dev.c
> index be54dca11465d..a841fdb4c2394 100644
> --- a/drivers/staging/media/imx/imx-media-dev.c
> +++ b/drivers/staging/media/imx/imx-media-dev.c
> @@ -57,7 +57,52 @@ static int imx6_media_probe_complete(struct v4l2_async_notifier *notifier)
>  		goto unlock;
>  	}
>  
> +	imxmd->m2m_vdic[0] = imx_media_mem2mem_vdic_init(imxmd, 0);
> +	if (IS_ERR(imxmd->m2m_vdic[0])) {
> +		ret = PTR_ERR(imxmd->m2m_vdic[0]);
> +		imxmd->m2m_vdic[0] = NULL;
> +		goto unlock;
> +	}
> +
> +	/* MX6S/DL has one IPUv3, init second VDI only on MX6Q/QP */
> +	if (imxmd->ipu[1]) {
> +		imxmd->m2m_vdic[1] = imx_media_mem2mem_vdic_init(imxmd, 1);
> +		if (IS_ERR(imxmd->m2m_vdic[1])) {
> +			ret = PTR_ERR(imxmd->m2m_vdic[1]);
> +			imxmd->m2m_vdic[1] = NULL;
> +			goto uninit_vdi0;
> +		}
> +	}
> +
>  	ret = imx_media_csc_scaler_device_register(imxmd->m2m_vdev);
> +	if (ret)
> +		goto uninit_vdi1;
> +
> +	ret = imx_media_mem2mem_vdic_register(imxmd->m2m_vdic[0]);
> +	if (ret)
> +		goto unreg_csc;
> +
> +	/* MX6S/DL has one IPUv3, init second VDI only on MX6Q/QP */
> +	if (imxmd->ipu[1]) {
> +		ret = imx_media_mem2mem_vdic_register(imxmd->m2m_vdic[1]);
> +		if (ret)
> +			goto unreg_vdic;
> +	}
> +
> +	mutex_unlock(&imxmd->mutex);
> +	return ret;
> +
> +unreg_vdic:
> +	imx_media_mem2mem_vdic_unregister(imxmd->m2m_vdic[0]);
> +	imxmd->m2m_vdic[0] = NULL;
> +unreg_csc:
> +	imx_media_csc_scaler_device_unregister(imxmd->m2m_vdev);
> +	imxmd->m2m_vdev = NULL;
> +uninit_vdi1:
> +	if (imxmd->ipu[1])
> +		imx_media_mem2mem_vdic_uninit(imxmd->m2m_vdic[1]);
> +uninit_vdi0:
> +	imx_media_mem2mem_vdic_uninit(imxmd->m2m_vdic[0]);
>  unlock:
>  	mutex_unlock(&imxmd->mutex);
>  	return ret;
> @@ -108,6 +153,16 @@ static void imx_media_remove(struct platform_device *pdev)
>  
>  	v4l2_info(&imxmd->v4l2_dev, "Removing imx-media\n");
>  
> +	if (imxmd->m2m_vdic[1]) {	/* MX6Q/QP only */
> +		imx_media_mem2mem_vdic_unregister(imxmd->m2m_vdic[1]);
> +		imxmd->m2m_vdic[1] = NULL;
> +	}
> +
> +	if (imxmd->m2m_vdic[0]) {
> +		imx_media_mem2mem_vdic_unregister(imxmd->m2m_vdic[0]);
> +		imxmd->m2m_vdic[0] = NULL;
> +	}
> +
>  	if (imxmd->m2m_vdev) {
>  		imx_media_csc_scaler_device_unregister(imxmd->m2m_vdev);
>  		imxmd->m2m_vdev = NULL;
> diff --git a/drivers/staging/media/imx/imx-media-mem2mem-vdic.c b/drivers/staging/media/imx/imx-media-mem2mem-vdic.c
> new file mode 100644
> index 0000000000000..71c6c023d2bf8
> --- /dev/null
> +++ b/drivers/staging/media/imx/imx-media-mem2mem-vdic.c
> @@ -0,0 +1,997 @@
> +// SPDX-License-Identifier: GPL-2.0-or-later
> +/*
> + * i.MX VDIC mem2mem de-interlace driver
> + *
> + * Copyright (c) 2024 Marek Vasut <marex@denx.de>
> + *
> + * Based on previous VDIC mem2mem work by Steve Longerbeam that is:
> + * Copyright (c) 2018 Mentor Graphics Inc.
> + */
> +
> +#include <linux/delay.h>
> +#include <linux/fs.h>
> +#include <linux/module.h>
> +#include <linux/sched.h>
> +#include <linux/slab.h>
> +#include <linux/version.h>
> +
> +#include <media/media-device.h>
> +#include <media/v4l2-ctrls.h>
> +#include <media/v4l2-device.h>
> +#include <media/v4l2-event.h>
> +#include <media/v4l2-ioctl.h>
> +#include <media/v4l2-mem2mem.h>
> +#include <media/videobuf2-dma-contig.h>
> +
> +#include "imx-media.h"
> +
> +#define fh_to_ctx(__fh)	container_of(__fh, struct ipu_mem2mem_vdic_ctx, fh)
> +
> +#define to_mem2mem_priv(v) container_of(v, struct ipu_mem2mem_vdic_priv, vdev)
> +
> +enum {
> +	V4L2_M2M_SRC = 0,
> +	V4L2_M2M_DST = 1,
> +};
> +
> +struct ipu_mem2mem_vdic_ctx;
> +
> +struct ipu_mem2mem_vdic_priv {
> +	struct imx_media_video_dev	vdev;
> +	struct imx_media_dev		*md;
> +	struct device			*dev;
> +	struct ipu_soc			*ipu_dev;
> +	int				ipu_id;
> +
> +	struct v4l2_m2m_dev		*m2m_dev;
> +	struct mutex			mutex;		/* mem2mem device mutex */
> +
> +	/* VDI resources */
> +	struct ipu_vdi			*vdi;
> +	struct ipu_ic			*ic;
> +	struct ipuv3_channel		*vdi_in_ch_p;
> +	struct ipuv3_channel		*vdi_in_ch;
> +	struct ipuv3_channel		*vdi_in_ch_n;
> +	struct ipuv3_channel		*vdi_out_ch;
> +	int				eof_irq;
> +	int				nfb4eof_irq;
> +	spinlock_t			irqlock;	/* protect eof_irq handler */
> +
> +	atomic_t			stream_count;
> +
> +	struct ipu_mem2mem_vdic_ctx	*curr_ctx;
> +
> +	struct v4l2_pix_format		fmt[2];
> +};
> +
> +struct ipu_mem2mem_vdic_ctx {
> +	struct ipu_mem2mem_vdic_priv	*priv;
> +	struct v4l2_fh			fh;
> +	unsigned int			sequence;
> +	struct vb2_v4l2_buffer		*prev_buf;
> +	struct vb2_v4l2_buffer		*curr_buf;
> +};
> +
> +static struct v4l2_pix_format *
> +ipu_mem2mem_vdic_get_format(struct ipu_mem2mem_vdic_priv *priv,
> +			    enum v4l2_buf_type type)
> +{
> +	return &priv->fmt[V4L2_TYPE_IS_OUTPUT(type) ? V4L2_M2M_SRC : V4L2_M2M_DST];
> +}

From here ...

> +
> +static bool ipu_mem2mem_vdic_format_is_yuv420(const u32 pixelformat)
> +{
> +	/* All 4:2:0 subsampled formats supported by this hardware */
> +	return pixelformat == V4L2_PIX_FMT_YUV420 ||
> +	       pixelformat == V4L2_PIX_FMT_YVU420 ||
> +	       pixelformat == V4L2_PIX_FMT_NV12;
> +}
> +
> +static bool ipu_mem2mem_vdic_format_is_yuv422(const u32 pixelformat)
> +{
> +	/* All 4:2:2 subsampled formats supported by this hardware */
> +	return pixelformat == V4L2_PIX_FMT_UYVY ||
> +	       pixelformat == V4L2_PIX_FMT_YUYV ||
> +	       pixelformat == V4L2_PIX_FMT_YUV422P ||
> +	       pixelformat == V4L2_PIX_FMT_NV16;
> +}
> +
> +static bool ipu_mem2mem_vdic_format_is_yuv(const u32 pixelformat)
> +{
> +	return ipu_mem2mem_vdic_format_is_yuv420(pixelformat) ||
> +	       ipu_mem2mem_vdic_format_is_yuv422(pixelformat);
> +}
> +
> +static bool ipu_mem2mem_vdic_format_is_rgb16(const u32 pixelformat)
> +{
> +	/* All 16-bit RGB formats supported by this hardware */
> +	return pixelformat == V4L2_PIX_FMT_RGB565;
> +}
> +
> +static bool ipu_mem2mem_vdic_format_is_rgb24(const u32 pixelformat)
> +{
> +	/* All 24-bit RGB formats supported by this hardware */
> +	return pixelformat == V4L2_PIX_FMT_RGB24 ||
> +	       pixelformat == V4L2_PIX_FMT_BGR24;
> +}
> +
> +static bool ipu_mem2mem_vdic_format_is_rgb32(const u32 pixelformat)
> +{
> +	/* All 32-bit RGB formats supported by this hardware */
> +	return pixelformat == V4L2_PIX_FMT_XRGB32 ||
> +	       pixelformat == V4L2_PIX_FMT_XBGR32 ||
> +	       pixelformat == V4L2_PIX_FMT_BGRX32 ||
> +	       pixelformat == V4L2_PIX_FMT_RGBX32;
> +}

To here: these days, all of this information can be derived from
v4l2_format_info() in v4l2-common, in a way that doesn't create a big
barrier to adding more formats in the future.
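
Something along these lines (untested), keying off the format info table
instead of open-coded fourcc lists:

	static bool ipu_mem2mem_vdic_format_is_yuv420(u32 pixelformat)
	{
		const struct v4l2_format_info *info = v4l2_format_info(pixelformat);

		/* 4:2:0: chroma is subsampled in both directions */
		return info && info->pixel_enc == V4L2_PIXEL_ENC_YUV &&
		       info->hdiv == 2 && info->vdiv == 2;
	}

	static bool ipu_mem2mem_vdic_format_is_yuv422(u32 pixelformat)
	{
		const struct v4l2_format_info *info = v4l2_format_info(pixelformat);

		/* 4:2:2: chroma is subsampled horizontally only */
		return info && info->pixel_enc == V4L2_PIXEL_ENC_YUV &&
		       info->hdiv == 2 && info->vdiv == 1;
	}

The RGB16/24/32 distinction then collapses into info->bpp[0].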

> +
> +/*
> + * mem2mem callbacks
> + */
> +static irqreturn_t ipu_mem2mem_vdic_eof_interrupt(int irq, void *dev_id)
> +{
> +	struct ipu_mem2mem_vdic_priv *priv = dev_id;
> +	struct ipu_mem2mem_vdic_ctx *ctx = priv->curr_ctx;
> +	struct vb2_v4l2_buffer *src_buf, *dst_buf;
> +
> +	spin_lock(&priv->irqlock);
> +
> +	src_buf = v4l2_m2m_src_buf_remove(ctx->fh.m2m_ctx);
> +	dst_buf = v4l2_m2m_dst_buf_remove(ctx->fh.m2m_ctx);
> +
> +	v4l2_m2m_buf_copy_metadata(src_buf, dst_buf, true);
> +
> +	src_buf->sequence = ctx->sequence++;
> +	dst_buf->sequence = src_buf->sequence;
> +
> +	v4l2_m2m_buf_done(src_buf, VB2_BUF_STATE_DONE);
> +	v4l2_m2m_buf_done(dst_buf, VB2_BUF_STATE_DONE);
> +
> +	v4l2_m2m_job_finish(priv->m2m_dev, ctx->fh.m2m_ctx);
> +
> +	spin_unlock(&priv->irqlock);
> +
> +	return IRQ_HANDLED;
> +}
> +
> +static irqreturn_t ipu_mem2mem_vdic_nfb4eof_interrupt(int irq, void *dev_id)
> +{
> +	struct ipu_mem2mem_vdic_priv *priv = dev_id;
> +
> +	/* That is about all we can do about it, report it. */
> +	dev_warn_ratelimited(priv->dev, "NFB4EOF error interrupt occurred\n");

Not sure this is right. If that means ipu_mem2mem_vdic_eof_interrupt won't
fire, then streamoff/close after that will hang forever, leaving a zombie
process behind.

Perhaps mark the buffers as ERROR, and finish the job.
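
Something like this (untested sketch; same curr_ctx handling and locking
caveats as the EOF handler):

	static irqreturn_t ipu_mem2mem_vdic_nfb4eof_interrupt(int irq, void *dev_id)
	{
		struct ipu_mem2mem_vdic_priv *priv = dev_id;
		struct ipu_mem2mem_vdic_ctx *ctx = priv->curr_ctx;
		struct vb2_v4l2_buffer *src_buf, *dst_buf;

		dev_warn_ratelimited(priv->dev, "NFB4EOF error interrupt occurred\n");

		spin_lock(&priv->irqlock);

		/* Return both buffers to userspace marked as failed */
		src_buf = v4l2_m2m_src_buf_remove(ctx->fh.m2m_ctx);
		dst_buf = v4l2_m2m_dst_buf_remove(ctx->fh.m2m_ctx);
		v4l2_m2m_buf_done(src_buf, VB2_BUF_STATE_ERROR);
		v4l2_m2m_buf_done(dst_buf, VB2_BUF_STATE_ERROR);

		/* Complete the job so streamoff/close can't hang on it */
		v4l2_m2m_job_finish(priv->m2m_dev, ctx->fh.m2m_ctx);

		spin_unlock(&priv->irqlock);

		return IRQ_HANDLED;
	}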

> +
> +	return IRQ_HANDLED;
> +}
> +
> +static void ipu_mem2mem_vdic_device_run(void *_ctx)
> +{
> +	struct ipu_mem2mem_vdic_ctx *ctx = _ctx;
> +	struct ipu_mem2mem_vdic_priv *priv = ctx->priv;
> +	struct vb2_v4l2_buffer *curr_buf, *dst_buf;
> +	dma_addr_t prev_phys, curr_phys, out_phys;
> +	struct v4l2_pix_format *infmt;
> +	u32 phys_offset = 0;
> +	unsigned long flags;
> +
> +	infmt = ipu_mem2mem_vdic_get_format(priv, V4L2_BUF_TYPE_VIDEO_OUTPUT);
> +	if (V4L2_FIELD_IS_SEQUENTIAL(infmt->field))
> +		phys_offset = infmt->sizeimage / 2;
> +	else if (V4L2_FIELD_IS_INTERLACED(infmt->field))
> +		phys_offset = infmt->bytesperline;
> +	else
> +		dev_err(priv->dev, "Invalid field %d\n", infmt->field);
> +
> +	dst_buf = v4l2_m2m_next_dst_buf(ctx->fh.m2m_ctx);
> +	out_phys = vb2_dma_contig_plane_dma_addr(&dst_buf->vb2_buf, 0);
> +
> +	curr_buf = v4l2_m2m_next_src_buf(ctx->fh.m2m_ctx);
> +	if (!curr_buf) {
> +		dev_err(priv->dev, "Not enough buffers\n");
> +		return;

Impossible branch, has been checked by __v4l2_m2m_try_queue().

> +	}
> +
> +	spin_lock_irqsave(&priv->irqlock, flags);
> +
> +	if (ctx->curr_buf) {
> +		ctx->prev_buf = ctx->curr_buf;
> +		ctx->curr_buf = curr_buf;
> +	} else {
> +		ctx->prev_buf = curr_buf;
> +		ctx->curr_buf = curr_buf;
> +		dev_warn(priv->dev, "Single-buffer mode, fix your userspace\n");
> +	}

The driver is not taking ownership of prev_buf; only curr_buf is guaranteed to
exist until v4l2_m2m_job_finish() is called. Userspace could streamoff, allocate
new buffers, and then an old freed buffer may end up being used.

It's also unclear to me how userspace can avoid this ugly warning; how can you
have curr_buf set the first time? (I might be missing something on this one
though.)

Perhaps what you want is a custom job_ready() callback that ensures you have 2
buffers in the OUTPUT queue? You also need to adjust the CID
MIN_BUFFERS_FOR_OUTPUT accordingly.
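
i.e. something like (untested):

	static int ipu_mem2mem_vdic_job_ready(void *_ctx)
	{
		struct ipu_mem2mem_vdic_ctx *ctx = _ctx;

		/* Need both the previous and the current field pair queued */
		return v4l2_m2m_num_src_bufs_ready(ctx->fh.m2m_ctx) >= 2;
	}

hooked up via the job_ready member of struct v4l2_m2m_ops (with some care
so the last queued buffer still gets processed at stream stop).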

> +
> +	prev_phys = vb2_dma_contig_plane_dma_addr(&ctx->prev_buf->vb2_buf, 0);
> +	curr_phys = vb2_dma_contig_plane_dma_addr(&ctx->curr_buf->vb2_buf, 0);
> +
> +	priv->curr_ctx = ctx;
> +	spin_unlock_irqrestore(&priv->irqlock, flags);
> +
> +	ipu_cpmem_set_buffer(priv->vdi_out_ch,  0, out_phys);
> +	ipu_cpmem_set_buffer(priv->vdi_in_ch_p, 0, prev_phys + phys_offset);
> +	ipu_cpmem_set_buffer(priv->vdi_in_ch,   0, curr_phys);
> +	ipu_cpmem_set_buffer(priv->vdi_in_ch_n, 0, curr_phys + phys_offset);
> +
> +	/* No double buffering, always pick buffer 0 */
> +	ipu_idmac_select_buffer(priv->vdi_out_ch, 0);
> +	ipu_idmac_select_buffer(priv->vdi_in_ch_p, 0);
> +	ipu_idmac_select_buffer(priv->vdi_in_ch, 0);
> +	ipu_idmac_select_buffer(priv->vdi_in_ch_n, 0);
> +
> +	/* Enable the channels */
> +	ipu_idmac_enable_channel(priv->vdi_out_ch);
> +	ipu_idmac_enable_channel(priv->vdi_in_ch_p);
> +	ipu_idmac_enable_channel(priv->vdi_in_ch);
> +	ipu_idmac_enable_channel(priv->vdi_in_ch_n);
> +}
> +
> +/*
> + * Video ioctls
> + */
> +static int ipu_mem2mem_vdic_querycap(struct file *file, void *priv,
> +				     struct v4l2_capability *cap)
> +{
> +	strscpy(cap->driver, "imx-m2m-vdic", sizeof(cap->driver));
> +	strscpy(cap->card, "imx-m2m-vdic", sizeof(cap->card));
> +	strscpy(cap->bus_info, "platform:imx-m2m-vdic", sizeof(cap->bus_info));
> +	cap->device_caps = V4L2_CAP_VIDEO_M2M | V4L2_CAP_STREAMING;
> +	cap->capabilities = cap->device_caps | V4L2_CAP_DEVICE_CAPS;
> +
> +	return 0;
> +}
> +
> +static int ipu_mem2mem_vdic_enum_fmt(struct file *file, void *fh, struct v4l2_fmtdesc *f)
> +{
> +	struct ipu_mem2mem_vdic_ctx *ctx = fh_to_ctx(fh);
> +	struct vb2_queue *vq = v4l2_m2m_get_vq(ctx->fh.m2m_ctx, f->type);
> +	enum imx_pixfmt_sel cs = vq->type == V4L2_BUF_TYPE_VIDEO_CAPTURE ?
> +				 PIXFMT_SEL_YUV_RGB : PIXFMT_SEL_YUV;
> +	u32 fourcc;
> +	int ret;
> +
> +	ret = imx_media_enum_pixel_formats(&fourcc, f->index, cs, 0);
> +	if (ret)
> +		return ret;
> +
> +	f->pixelformat = fourcc;
> +
> +	return 0;
> +}
> +
> +static int ipu_mem2mem_vdic_g_fmt(struct file *file, void *fh, struct v4l2_format *f)
> +{
> +	struct ipu_mem2mem_vdic_ctx *ctx = fh_to_ctx(fh);
> +	struct ipu_mem2mem_vdic_priv *priv = ctx->priv;
> +	struct v4l2_pix_format *fmt = ipu_mem2mem_vdic_get_format(priv, f->type);
> +
> +	f->fmt.pix = *fmt;
> +
> +	return 0;
> +}
> +
> +static int ipu_mem2mem_vdic_try_fmt(struct file *file, void *fh,
> +				    struct v4l2_format *f)
> +{
> +	const struct imx_media_pixfmt *cc;
> +	enum imx_pixfmt_sel cs;
> +	u32 fourcc;
> +
> +	if (f->type == V4L2_BUF_TYPE_VIDEO_CAPTURE) {	/* Output */
> +		cs = PIXFMT_SEL_YUV_RGB;	/* YUV direct / RGB via IC */
> +
> +		f->fmt.pix.field = V4L2_FIELD_NONE;
> +	} else {
> +		cs = PIXFMT_SEL_YUV;		/* YUV input only */
> +
> +		/*
> +		 * Input must be interlaced with frame order.
> +		 * Fall back to SEQ_TB otherwise.
> +		 */
> +		if (!V4L2_FIELD_HAS_BOTH(f->fmt.pix.field) ||
> +		    f->fmt.pix.field == V4L2_FIELD_INTERLACED)
> +			f->fmt.pix.field = V4L2_FIELD_SEQ_TB;
> +	}
> +
> +	fourcc = f->fmt.pix.pixelformat;
> +	cc = imx_media_find_pixel_format(fourcc, cs);
> +	if (!cc) {
> +		imx_media_enum_pixel_formats(&fourcc, 0, cs, 0);
> +		cc = imx_media_find_pixel_format(fourcc, cs);
> +	}
> +
> +	f->fmt.pix.pixelformat = cc->fourcc;
> +
> +	v4l_bound_align_image(&f->fmt.pix.width,
> +			      1, 968, 1,
> +			      &f->fmt.pix.height,
> +			      1, 1024, 1, 1);

Perhaps use defines for the magic numbers?
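
e.g. (names are only a suggestion):

	#define MEM2MEM_VDIC_MAX_WIDTH	968
	#define MEM2MEM_VDIC_MAX_HEIGHT	1024

	v4l_bound_align_image(&f->fmt.pix.width, 1, MEM2MEM_VDIC_MAX_WIDTH, 1,
			      &f->fmt.pix.height, 1, MEM2MEM_VDIC_MAX_HEIGHT,
			      1, 1);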

> +
> +	if (ipu_mem2mem_vdic_format_is_yuv420(f->fmt.pix.pixelformat))
> +		f->fmt.pix.bytesperline = f->fmt.pix.width * 3 / 2;
> +	else if (ipu_mem2mem_vdic_format_is_yuv422(f->fmt.pix.pixelformat))
> +		f->fmt.pix.bytesperline = f->fmt.pix.width * 2;
> +	else if (ipu_mem2mem_vdic_format_is_rgb16(f->fmt.pix.pixelformat))
> +		f->fmt.pix.bytesperline = f->fmt.pix.width * 2;
> +	else if (ipu_mem2mem_vdic_format_is_rgb24(f->fmt.pix.pixelformat))
> +		f->fmt.pix.bytesperline = f->fmt.pix.width * 3;
> +	else if (ipu_mem2mem_vdic_format_is_rgb32(f->fmt.pix.pixelformat))
> +		f->fmt.pix.bytesperline = f->fmt.pix.width * 4;
> +	else
> +		f->fmt.pix.bytesperline = f->fmt.pix.width;
> +
> +	f->fmt.pix.sizeimage = f->fmt.pix.height * f->fmt.pix.bytesperline;

And use v4l2-common?
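
i.e. the whole ladder above could become (untested):

	return v4l2_fill_pixfmt(&f->fmt.pix, f->fmt.pix.pixelformat,
				f->fmt.pix.width, f->fmt.pix.height);

which would also set bytesperline to the first plane's stride for the
planar YUV formats instead of a per-pixel average.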

> +
> +	return 0;
> +}
> +
> +static int ipu_mem2mem_vdic_s_fmt(struct file *file, void *fh, struct v4l2_format *f)
> +{
> +	struct ipu_mem2mem_vdic_ctx *ctx = fh_to_ctx(fh);
> +	struct ipu_mem2mem_vdic_priv *priv = ctx->priv;
> +	struct v4l2_pix_format *fmt, *infmt, *outfmt;
> +	struct vb2_queue *vq;
> +	int ret;
> +
> +	vq = v4l2_m2m_get_vq(ctx->fh.m2m_ctx, f->type);
> +	if (vb2_is_busy(vq)) {
> +		dev_err(priv->dev, "%s queue busy\n",  __func__);
> +		return -EBUSY;
> +	}
> +
> +	ret = ipu_mem2mem_vdic_try_fmt(file, fh, f);
> +	if (ret < 0)
> +		return ret;
> +
> +	fmt = ipu_mem2mem_vdic_get_format(priv, f->type);
> +	*fmt = f->fmt.pix;
> +
> +	/* Propagate colorimetry to the capture queue */
> +	infmt = ipu_mem2mem_vdic_get_format(priv, V4L2_BUF_TYPE_VIDEO_OUTPUT);
> +	outfmt = ipu_mem2mem_vdic_get_format(priv, V4L2_BUF_TYPE_VIDEO_CAPTURE);
> +	outfmt->colorspace = infmt->colorspace;
> +	outfmt->ycbcr_enc = infmt->ycbcr_enc;
> +	outfmt->xfer_func = infmt->xfer_func;
> +	outfmt->quantization = infmt->quantization;

So you can do CSC conversion but not colorimetry? We have
V4L2_PIX_FMT_FLAG_SET_CSC if you can do colorimetry transforms too. I have
patches that I'll send for the csc-scaler driver.

> +
> +	return 0;
> +}
> +
> +static const struct v4l2_ioctl_ops mem2mem_ioctl_ops = {
> +	.vidioc_querycap		= ipu_mem2mem_vdic_querycap,
> +
> +	.vidioc_enum_fmt_vid_cap	= ipu_mem2mem_vdic_enum_fmt,
> +	.vidioc_g_fmt_vid_cap		= ipu_mem2mem_vdic_g_fmt,
> +	.vidioc_try_fmt_vid_cap		= ipu_mem2mem_vdic_try_fmt,
> +	.vidioc_s_fmt_vid_cap		= ipu_mem2mem_vdic_s_fmt,
> +
> +	.vidioc_enum_fmt_vid_out	= ipu_mem2mem_vdic_enum_fmt,
> +	.vidioc_g_fmt_vid_out		= ipu_mem2mem_vdic_g_fmt,
> +	.vidioc_try_fmt_vid_out		= ipu_mem2mem_vdic_try_fmt,
> +	.vidioc_s_fmt_vid_out		= ipu_mem2mem_vdic_s_fmt,
> +
> +	.vidioc_reqbufs			= v4l2_m2m_ioctl_reqbufs,
> +	.vidioc_querybuf		= v4l2_m2m_ioctl_querybuf,
> +
> +	.vidioc_qbuf			= v4l2_m2m_ioctl_qbuf,
> +	.vidioc_expbuf			= v4l2_m2m_ioctl_expbuf,
> +	.vidioc_dqbuf			= v4l2_m2m_ioctl_dqbuf,
> +	.vidioc_create_bufs		= v4l2_m2m_ioctl_create_bufs,
> +
> +	.vidioc_streamon		= v4l2_m2m_ioctl_streamon,
> +	.vidioc_streamoff		= v4l2_m2m_ioctl_streamoff,
> +
> +	.vidioc_subscribe_event		= v4l2_ctrl_subscribe_event,
> +	.vidioc_unsubscribe_event	= v4l2_event_unsubscribe,
> +};
> +
> +/*
> + * Queue operations
> + */
> +static int ipu_mem2mem_vdic_queue_setup(struct vb2_queue *vq, unsigned int *nbuffers,
> +					unsigned int *nplanes, unsigned int sizes[],
> +					struct device *alloc_devs[])
> +{
> +	struct ipu_mem2mem_vdic_ctx *ctx = vb2_get_drv_priv(vq);
> +	struct ipu_mem2mem_vdic_priv *priv = ctx->priv;
> +	struct v4l2_pix_format *fmt = ipu_mem2mem_vdic_get_format(priv, vq->type);
> +	unsigned int count = *nbuffers;
> +
> +	if (*nplanes)
> +		return sizes[0] < fmt->sizeimage ? -EINVAL : 0;
> +
> +	*nplanes = 1;
> +	sizes[0] = fmt->sizeimage;
> +
> +	dev_dbg(ctx->priv->dev, "get %u buffer(s) of size %d each.\n",
> +		count, fmt->sizeimage);
> +
> +	return 0;
> +}
> +
> +static int ipu_mem2mem_vdic_buf_prepare(struct vb2_buffer *vb)
> +{
> +	struct ipu_mem2mem_vdic_ctx *ctx = vb2_get_drv_priv(vb->vb2_queue);
> +	struct vb2_v4l2_buffer *vbuf = to_vb2_v4l2_buffer(vb);
> +	struct ipu_mem2mem_vdic_priv *priv = ctx->priv;
> +	struct vb2_queue *vq = vb->vb2_queue;
> +	struct v4l2_pix_format *fmt;
> +	unsigned long size;
> +
> +	dev_dbg(ctx->priv->dev, "type: %d\n", vb->vb2_queue->type);
> +
> +	if (V4L2_TYPE_IS_OUTPUT(vq->type)) {
> +		if (vbuf->field == V4L2_FIELD_ANY)
> +			vbuf->field = V4L2_FIELD_SEQ_TB;
> +		if (!V4L2_FIELD_HAS_BOTH(vbuf->field)) {
> +			dev_dbg(ctx->priv->dev, "%s: field isn't supported\n",
> +				__func__);
> +			return -EINVAL;
> +		}
> +	}
> +
> +	fmt = ipu_mem2mem_vdic_get_format(priv, vb->vb2_queue->type);
> +	size = fmt->sizeimage;
> +
> +	if (vb2_plane_size(vb, 0) < size) {
> +		dev_dbg(ctx->priv->dev,
> +			"%s: data will not fit into plane (%lu < %lu)\n",
> +			__func__, vb2_plane_size(vb, 0), size);
> +		return -EINVAL;
> +	}
> +
> +	vb2_set_plane_payload(vb, 0, fmt->sizeimage);
> +
> +	return 0;
> +}
> +
> +static void ipu_mem2mem_vdic_buf_queue(struct vb2_buffer *vb)
> +{
> +	struct ipu_mem2mem_vdic_ctx *ctx = vb2_get_drv_priv(vb->vb2_queue);
> +
> +	v4l2_m2m_buf_queue(ctx->fh.m2m_ctx, to_vb2_v4l2_buffer(vb));
> +}
> +
> +/* VDIC hardware setup */
> +static int ipu_mem2mem_vdic_setup_channel(struct ipu_mem2mem_vdic_priv *priv,
> +					  struct ipuv3_channel *channel,
> +					  struct v4l2_pix_format *fmt,
> +					  bool in)
> +{
> +	struct ipu_image image = { 0 };
> +	unsigned int burst_size;
> +	int ret;
> +
> +	image.pix = *fmt;
> +	image.rect.width = image.pix.width;
> +	image.rect.height = image.pix.height;
> +
> +	ipu_cpmem_zero(channel);
> +
> +	if (in) {
> +		/* One field to VDIC channels */
> +		image.pix.height /= 2;
> +		image.rect.height /= 2;
> +	} else {
> +		/* Skip writing U and V components to odd rows */
> +		if (ipu_mem2mem_vdic_format_is_yuv420(image.pix.pixelformat))
> +			ipu_cpmem_skip_odd_chroma_rows(channel);
> +	}
> +
> +	ret = ipu_cpmem_set_image(channel, &image);
> +	if (ret)
> +		return ret;
> +
> +	burst_size = (image.pix.width & 0xf) ? 8 : 16;
> +	ipu_cpmem_set_burstsize(channel, burst_size);
> +
> +	if (!ipu_prg_present(priv->ipu_dev))
> +		ipu_cpmem_set_axi_id(channel, 1);
> +
> +	ipu_idmac_set_double_buffer(channel, false);
> +
> +	return 0;
> +}
> +
> +static int ipu_mem2mem_vdic_setup_hardware(struct ipu_mem2mem_vdic_priv *priv)
> +{
> +	struct v4l2_pix_format *infmt, *outfmt;
> +	struct ipu_ic_csc csc;
> +	bool in422, outyuv;
> +	int ret;
> +
> +	infmt = ipu_mem2mem_vdic_get_format(priv, V4L2_BUF_TYPE_VIDEO_OUTPUT);
> +	outfmt = ipu_mem2mem_vdic_get_format(priv, V4L2_BUF_TYPE_VIDEO_CAPTURE);
> +	in422 = ipu_mem2mem_vdic_format_is_yuv422(infmt->pixelformat);
> +	outyuv = ipu_mem2mem_vdic_format_is_yuv(outfmt->pixelformat);
> +
> +	ipu_vdi_setup(priv->vdi, in422, infmt->width, infmt->height);
> +	ipu_vdi_set_field_order(priv->vdi, V4L2_STD_UNKNOWN, infmt->field);
> +	ipu_vdi_set_motion(priv->vdi, HIGH_MOTION);
> +
> +	/* Initialize the VDI IDMAC channels */
> +	ret = ipu_mem2mem_vdic_setup_channel(priv, priv->vdi_in_ch_p, infmt, true);
> +	if (ret)
> +		return ret;
> +
> +	ret = ipu_mem2mem_vdic_setup_channel(priv, priv->vdi_in_ch, infmt, true);
> +	if (ret)
> +		return ret;
> +
> +	ret = ipu_mem2mem_vdic_setup_channel(priv, priv->vdi_in_ch_n, infmt, true);
> +	if (ret)
> +		return ret;
> +
> +	ret = ipu_mem2mem_vdic_setup_channel(priv, priv->vdi_out_ch, outfmt, false);
> +	if (ret)
> +		return ret;
> +
> +	ret = ipu_ic_calc_csc(&csc,
> +			      infmt->ycbcr_enc, infmt->quantization,
> +			      IPUV3_COLORSPACE_YUV,
> +			      outfmt->ycbcr_enc, outfmt->quantization,
> +			      outyuv ? IPUV3_COLORSPACE_YUV :
> +				       IPUV3_COLORSPACE_RGB);
> +	if (ret)
> +		return ret;
> +
> +	/* Enable the IC */
> +	ipu_ic_task_init(priv->ic, &csc,
> +			 infmt->width, infmt->height,
> +			 outfmt->width, outfmt->height);
> +	ipu_ic_task_idma_init(priv->ic, priv->vdi_out_ch,
> +			      infmt->width, infmt->height, 16, 0);
> +	ipu_ic_enable(priv->ic);
> +	ipu_ic_task_enable(priv->ic);
> +
> +	/* Enable the VDI */
> +	ipu_vdi_enable(priv->vdi);
> +
> +	return 0;
> +}
> +
> +static struct vb2_queue *ipu_mem2mem_vdic_get_other_q(struct vb2_queue *q)
> +{
> +	struct ipu_mem2mem_vdic_ctx *ctx = vb2_get_drv_priv(q);
> +	enum v4l2_buf_type type = q->type == V4L2_BUF_TYPE_VIDEO_CAPTURE ?
> +				  V4L2_BUF_TYPE_VIDEO_OUTPUT :
> +				  V4L2_BUF_TYPE_VIDEO_CAPTURE;
> +
> +	return v4l2_m2m_get_vq(ctx->fh.m2m_ctx, type);
> +}
> +
> +static void ipu_mem2mem_vdic_return_bufs(struct vb2_queue *q)
> +{
> +	struct ipu_mem2mem_vdic_ctx *ctx = vb2_get_drv_priv(q);
> +	struct vb2_v4l2_buffer *buf;
> +
> +	if (q->type == V4L2_BUF_TYPE_VIDEO_OUTPUT)
> +		while ((buf = v4l2_m2m_src_buf_remove(ctx->fh.m2m_ctx)))
> +			v4l2_m2m_buf_done(buf, VB2_BUF_STATE_QUEUED);
> +	else
> +		while ((buf = v4l2_m2m_dst_buf_remove(ctx->fh.m2m_ctx)))
> +			v4l2_m2m_buf_done(buf, VB2_BUF_STATE_QUEUED);
> +}
> +
> +static int ipu_mem2mem_vdic_start_streaming(struct vb2_queue *q, unsigned int count)
> +{
> +	struct vb2_queue *other_q = ipu_mem2mem_vdic_get_other_q(q);
> +	struct ipu_mem2mem_vdic_ctx *ctx = vb2_get_drv_priv(q);
> +	struct ipu_mem2mem_vdic_priv *priv = ctx->priv;
> +	int ret;
> +
> +	if (!vb2_is_streaming(other_q))
> +		return 0;
> +
> +	/* Already streaming, do not reconfigure the VDI. */
> +	if (atomic_inc_return(&priv->stream_count) != 1)
> +		return 0;
> +
> +	/* Start streaming */
> +	ret = ipu_mem2mem_vdic_setup_hardware(priv);
> +	if (ret)
> +		ipu_mem2mem_vdic_return_bufs(q);
> +
> +	return ret;
> +}
> +
> +static void ipu_mem2mem_vdic_stop_streaming(struct vb2_queue *q)
> +{
> +	struct vb2_queue *other_q = ipu_mem2mem_vdic_get_other_q(q);
> +	struct ipu_mem2mem_vdic_ctx *ctx = vb2_get_drv_priv(q);
> +	struct ipu_mem2mem_vdic_priv *priv = ctx->priv;
> +
> +	if (vb2_is_streaming(other_q)) {
> +		ipu_mem2mem_vdic_return_bufs(q);
> +		return;
> +	}
> +
> +	if (atomic_dec_return(&priv->stream_count) == 0) {
> +		/* Stop streaming */
> +		ipu_idmac_disable_channel(priv->vdi_in_ch_p);
> +		ipu_idmac_disable_channel(priv->vdi_in_ch);
> +		ipu_idmac_disable_channel(priv->vdi_in_ch_n);
> +		ipu_idmac_disable_channel(priv->vdi_out_ch);
> +
> +		ipu_vdi_disable(priv->vdi);
> +		ipu_ic_task_disable(priv->ic);
> +		ipu_ic_disable(priv->ic);
> +	}
> +
> +	ctx->sequence = 0;
> +
> +	ipu_mem2mem_vdic_return_bufs(q);
> +}
> +
> +static const struct vb2_ops mem2mem_qops = {
> +	.queue_setup	= ipu_mem2mem_vdic_queue_setup,
> +	.buf_prepare	= ipu_mem2mem_vdic_buf_prepare,
> +	.buf_queue	= ipu_mem2mem_vdic_buf_queue,
> +	.wait_prepare	= vb2_ops_wait_prepare,
> +	.wait_finish	= vb2_ops_wait_finish,
> +	.start_streaming = ipu_mem2mem_vdic_start_streaming,
> +	.stop_streaming = ipu_mem2mem_vdic_stop_streaming,
> +};
> +
> +static int ipu_mem2mem_vdic_queue_init(void *priv, struct vb2_queue *src_vq,
> +				       struct vb2_queue *dst_vq)
> +{
> +	struct ipu_mem2mem_vdic_ctx *ctx = priv;
> +	int ret;
> +
> +	memset(src_vq, 0, sizeof(*src_vq));
> +	src_vq->type = V4L2_BUF_TYPE_VIDEO_OUTPUT;
> +	src_vq->io_modes = VB2_MMAP | VB2_DMABUF;
> +	src_vq->drv_priv = ctx;
> +	src_vq->buf_struct_size = sizeof(struct v4l2_m2m_buffer);
> +	src_vq->ops = &mem2mem_qops;
> +	src_vq->mem_ops = &vb2_dma_contig_memops;
> +	src_vq->timestamp_flags = V4L2_BUF_FLAG_TIMESTAMP_COPY;
> +	src_vq->lock = &ctx->priv->mutex;
> +	src_vq->dev = ctx->priv->dev;
> +
> +	ret = vb2_queue_init(src_vq);
> +	if (ret)
> +		return ret;
> +
> +	memset(dst_vq, 0, sizeof(*dst_vq));
> +	dst_vq->type = V4L2_BUF_TYPE_VIDEO_CAPTURE;
> +	dst_vq->io_modes = VB2_MMAP | VB2_DMABUF;
> +	dst_vq->drv_priv = ctx;
> +	dst_vq->buf_struct_size = sizeof(struct v4l2_m2m_buffer);
> +	dst_vq->ops = &mem2mem_qops;
> +	dst_vq->mem_ops = &vb2_dma_contig_memops;
> +	dst_vq->timestamp_flags = V4L2_BUF_FLAG_TIMESTAMP_COPY;
> +	dst_vq->lock = &ctx->priv->mutex;
> +	dst_vq->dev = ctx->priv->dev;
> +
> +	return vb2_queue_init(dst_vq);
> +}
> +
> +#define DEFAULT_WIDTH	720
> +#define DEFAULT_HEIGHT	576
> +static const struct v4l2_pix_format ipu_mem2mem_vdic_default = {
> +	.width		= DEFAULT_WIDTH,
> +	.height		= DEFAULT_HEIGHT,
> +	.pixelformat	= V4L2_PIX_FMT_YUV420,
> +	.field		= V4L2_FIELD_SEQ_TB,
> +	.bytesperline	= DEFAULT_WIDTH,
> +	.sizeimage	= DEFAULT_WIDTH * DEFAULT_HEIGHT * 3 / 2,
> +	.colorspace	= V4L2_COLORSPACE_SRGB,
> +	.ycbcr_enc	= V4L2_YCBCR_ENC_601,
> +	.xfer_func	= V4L2_XFER_FUNC_DEFAULT,
> +	.quantization	= V4L2_QUANTIZATION_DEFAULT,
> +};
> +
> +/*
> + * File operations
> + */
> +static int ipu_mem2mem_vdic_open(struct file *file)
> +{
> +	struct ipu_mem2mem_vdic_priv *priv = video_drvdata(file);
> +	struct ipu_mem2mem_vdic_ctx *ctx = NULL;
> +	int ret;
> +
> +	ctx = kzalloc(sizeof(*ctx), GFP_KERNEL);
> +	if (!ctx)
> +		return -ENOMEM;
> +
> +	v4l2_fh_init(&ctx->fh, video_devdata(file));
> +	file->private_data = &ctx->fh;
> +	v4l2_fh_add(&ctx->fh);
> +	ctx->priv = priv;
> +
> +	ctx->fh.m2m_ctx = v4l2_m2m_ctx_init(priv->m2m_dev, ctx,
> +					    &ipu_mem2mem_vdic_queue_init);
> +	if (IS_ERR(ctx->fh.m2m_ctx)) {
> +		ret = PTR_ERR(ctx->fh.m2m_ctx);
> +		goto err_ctx;
> +	}
> +
> +	dev_dbg(priv->dev, "Created instance %p, m2m_ctx: %p\n",
> +		ctx, ctx->fh.m2m_ctx);
> +
> +	return 0;
> +
> +err_ctx:
> +	v4l2_fh_del(&ctx->fh);
> +	v4l2_fh_exit(&ctx->fh);
> +	kfree(ctx);
> +	return ret;
> +}
> +
> +static int ipu_mem2mem_vdic_release(struct file *file)
> +{
> +	struct ipu_mem2mem_vdic_priv *priv = video_drvdata(file);
> +	struct ipu_mem2mem_vdic_ctx *ctx = fh_to_ctx(file->private_data);
> +
> +	dev_dbg(priv->dev, "Releasing instance %p\n", ctx);
> +
> +	v4l2_m2m_ctx_release(ctx->fh.m2m_ctx);
> +	v4l2_fh_del(&ctx->fh);
> +	v4l2_fh_exit(&ctx->fh);
> +	kfree(ctx);
> +
> +	return 0;
> +}
> +
> +static const struct v4l2_file_operations mem2mem_fops = {
> +	.owner		= THIS_MODULE,
> +	.open		= ipu_mem2mem_vdic_open,
> +	.release	= ipu_mem2mem_vdic_release,
> +	.poll		= v4l2_m2m_fop_poll,
> +	.unlocked_ioctl	= video_ioctl2,
> +	.mmap		= v4l2_m2m_fop_mmap,
> +};
> +
> +static struct v4l2_m2m_ops m2m_ops = {
> +	.device_run	= ipu_mem2mem_vdic_device_run,
> +};
> +
> +static void ipu_mem2mem_vdic_device_release(struct video_device *vdev)
> +{
> +	struct ipu_mem2mem_vdic_priv *priv = video_get_drvdata(vdev);
> +
> +	v4l2_m2m_release(priv->m2m_dev);
> +	video_device_release(vdev);
> +	kfree(priv);
> +}
> +
> +static const struct video_device mem2mem_template = {
> +	.name		= "ipu_vdic",
> +	.fops		= &mem2mem_fops,
> +	.ioctl_ops	= &mem2mem_ioctl_ops,
> +	.minor		= -1,
> +	.release	= ipu_mem2mem_vdic_device_release,
> +	.vfl_dir	= VFL_DIR_M2M,
> +	.tvnorms	= V4L2_STD_NTSC | V4L2_STD_PAL | V4L2_STD_SECAM,
> +	.device_caps	= V4L2_CAP_VIDEO_M2M | V4L2_CAP_STREAMING,
> +};
> +
> +static int ipu_mem2mem_vdic_get_ipu_resources(struct ipu_mem2mem_vdic_priv *priv,
> +					      struct video_device *vfd)
> +{
> +	char *nfbname, *eofname;
> +	int ret;
> +
> +	nfbname = devm_kasprintf(priv->dev, GFP_KERNEL, "%s_nfb4eof:%u",
> +				 vfd->name, priv->ipu_id);
> +	if (!nfbname)
> +		return -ENOMEM;
> +
> +	eofname = devm_kasprintf(priv->dev, GFP_KERNEL, "%s_eof:%u",
> +				 vfd->name, priv->ipu_id);
> +	if (!eofname)
> +		return -ENOMEM;
> +
> +	priv->vdi = ipu_vdi_get(priv->ipu_dev);
> +	if (IS_ERR(priv->vdi)) {
> +		ret = PTR_ERR(priv->vdi);
> +		goto err_vdi;
> +	}
> +
> +	priv->ic = ipu_ic_get(priv->ipu_dev, IC_TASK_VIEWFINDER);
> +	if (IS_ERR(priv->ic)) {
> +		ret = PTR_ERR(priv->ic);
> +		goto err_ic;
> +	}
> +
> +	priv->vdi_in_ch_p = ipu_idmac_get(priv->ipu_dev,
> +					  IPUV3_CHANNEL_MEM_VDI_PREV);
> +	if (IS_ERR(priv->vdi_in_ch_p)) {
> +		ret = PTR_ERR(priv->vdi_in_ch_p);
> +		goto err_prev;
> +	}
> +
> +	priv->vdi_in_ch = ipu_idmac_get(priv->ipu_dev,
> +					IPUV3_CHANNEL_MEM_VDI_CUR);
> +	if (IS_ERR(priv->vdi_in_ch)) {
> +		ret = PTR_ERR(priv->vdi_in_ch);
> +		goto err_curr;
> +	}
> +
> +	priv->vdi_in_ch_n = ipu_idmac_get(priv->ipu_dev,
> +					  IPUV3_CHANNEL_MEM_VDI_NEXT);
> +	if (IS_ERR(priv->vdi_in_ch_n)) {
> +		ret = PTR_ERR(priv->vdi_in_ch_n);
> +		goto err_next;
> +	}
> +
> +	priv->vdi_out_ch = ipu_idmac_get(priv->ipu_dev,
> +					 IPUV3_CHANNEL_IC_PRP_VF_MEM);
> +	if (IS_ERR(priv->vdi_out_ch)) {
> +		ret = PTR_ERR(priv->vdi_out_ch);
> +		goto err_out;
> +	}
> +
> +	priv->nfb4eof_irq = ipu_idmac_channel_irq(priv->ipu_dev,
> +						  priv->vdi_out_ch,
> +						  IPU_IRQ_NFB4EOF);
> +	ret = devm_request_irq(priv->dev, priv->nfb4eof_irq,
> +			       ipu_mem2mem_vdic_nfb4eof_interrupt, 0,
> +			       nfbname, priv);
> +	if (ret)
> +		goto err_irq_eof;
> +
> +	priv->eof_irq = ipu_idmac_channel_irq(priv->ipu_dev,
> +					      priv->vdi_out_ch,
> +					      IPU_IRQ_EOF);
> +	ret = devm_request_irq(priv->dev, priv->eof_irq,
> +			       ipu_mem2mem_vdic_eof_interrupt, 0,
> +			       eofname, priv);
> +	if (ret)
> +		goto err_irq_eof;
> +
> +	/*
> +	 * Enable PRG, without PRG clock enabled (CCGR6:prg_clk_enable[0]
> +	 * and CCGR6:prg_clk_enable[1]), the VDI does not produce any
> +	 * interrupts at all.
> +	 */
> +	if (ipu_prg_present(priv->ipu_dev))
> +		ipu_prg_enable(priv->ipu_dev);
> +
> +	return 0;
> +
> +err_irq_eof:
> +	ipu_idmac_put(priv->vdi_out_ch);
> +err_out:
> +	ipu_idmac_put(priv->vdi_in_ch_n);
> +err_next:
> +	ipu_idmac_put(priv->vdi_in_ch);
> +err_curr:
> +	ipu_idmac_put(priv->vdi_in_ch_p);
> +err_prev:
> +	ipu_ic_put(priv->ic);
> +err_ic:
> +	ipu_vdi_put(priv->vdi);
> +err_vdi:
> +	return ret;
> +}
> +
> +static void ipu_mem2mem_vdic_put_ipu_resources(struct ipu_mem2mem_vdic_priv *priv)
> +{
> +	ipu_idmac_put(priv->vdi_out_ch);
> +	ipu_idmac_put(priv->vdi_in_ch_n);
> +	ipu_idmac_put(priv->vdi_in_ch);
> +	ipu_idmac_put(priv->vdi_in_ch_p);
> +	ipu_ic_put(priv->ic);
> +	ipu_vdi_put(priv->vdi);
> +}
> +
> +int imx_media_mem2mem_vdic_register(struct imx_media_video_dev *vdev)
> +{
> +	struct ipu_mem2mem_vdic_priv *priv = to_mem2mem_priv(vdev);
> +	struct video_device *vfd = vdev->vfd;
> +	int ret;
> +
> +	vfd->v4l2_dev = &priv->md->v4l2_dev;
> +
> +	ret = ipu_mem2mem_vdic_get_ipu_resources(priv, vfd);
> +	if (ret) {
> +		v4l2_err(vfd->v4l2_dev, "Failed to get VDIC resources (%d)\n", ret);
> +		return ret;
> +	}
> +
> +	ret = video_register_device(vfd, VFL_TYPE_VIDEO, -1);
> +	if (ret) {
> +		v4l2_err(vfd->v4l2_dev, "Failed to register video device\n");
> +		goto err_register;
> +	}
> +
> +	v4l2_info(vfd->v4l2_dev, "Registered %s as /dev/%s\n", vfd->name,
> +		  video_device_node_name(vfd));
> +
> +	return 0;
> +
> +err_register:
> +	ipu_mem2mem_vdic_put_ipu_resources(priv);
> +	return ret;
> +}
> +
> +void imx_media_mem2mem_vdic_unregister(struct imx_media_video_dev *vdev)
> +{
> +	struct ipu_mem2mem_vdic_priv *priv = to_mem2mem_priv(vdev);
> +	struct video_device *vfd = priv->vdev.vfd;
> +
> +	video_unregister_device(vfd);
> +
> +	ipu_mem2mem_vdic_put_ipu_resources(priv);
> +}
> +
> +struct imx_media_video_dev *
> +imx_media_mem2mem_vdic_init(struct imx_media_dev *md, int ipu_id)
> +{
> +	struct ipu_mem2mem_vdic_priv *priv;
> +	struct video_device *vfd;
> +	int ret;
> +
> +	priv = kzalloc(sizeof(*priv), GFP_KERNEL);
> +	if (!priv)
> +		return ERR_PTR(-ENOMEM);
> +
> +	priv->md = md;
> +	priv->ipu_id = ipu_id;
> +	priv->ipu_dev = md->ipu[ipu_id];
> +	priv->dev = md->md.dev;
> +
> +	mutex_init(&priv->mutex);
> +
> +	vfd = video_device_alloc();
> +	if (!vfd) {
> +		ret = -ENOMEM;
> +		goto err_vfd;
> +	}
> +
> +	*vfd = mem2mem_template;
> +	vfd->lock = &priv->mutex;
> +	priv->vdev.vfd = vfd;
> +
> +	INIT_LIST_HEAD(&priv->vdev.list);
> +	spin_lock_init(&priv->irqlock);
> +	atomic_set(&priv->stream_count, 0);
> +
> +	video_set_drvdata(vfd, priv);
> +
> +	priv->m2m_dev = v4l2_m2m_init(&m2m_ops);
> +	if (IS_ERR(priv->m2m_dev)) {
> +		ret = PTR_ERR(priv->m2m_dev);
> +		v4l2_err(&md->v4l2_dev, "Failed to init mem2mem device: %d\n",
> +			 ret);
> +		goto err_m2m;
> +	}
> +
> +	/* Reset formats */
> +	priv->fmt[V4L2_M2M_SRC] = ipu_mem2mem_vdic_default;
> +	priv->fmt[V4L2_M2M_SRC].pixelformat = V4L2_PIX_FMT_YUV420;
> +	priv->fmt[V4L2_M2M_SRC].field = V4L2_FIELD_SEQ_TB;
> +	priv->fmt[V4L2_M2M_SRC].bytesperline = DEFAULT_WIDTH;
> +	priv->fmt[V4L2_M2M_SRC].sizeimage = DEFAULT_WIDTH * DEFAULT_HEIGHT * 3 / 2;
> +
> +	priv->fmt[V4L2_M2M_DST] = ipu_mem2mem_vdic_default;
> +	priv->fmt[V4L2_M2M_DST].pixelformat = V4L2_PIX_FMT_RGB565;
> +	priv->fmt[V4L2_M2M_DST].field = V4L2_FIELD_NONE;
> +	priv->fmt[V4L2_M2M_DST].bytesperline = DEFAULT_WIDTH * 2;
> +	priv->fmt[V4L2_M2M_DST].sizeimage = DEFAULT_WIDTH * DEFAULT_HEIGHT * 2;
> +
> +	return &priv->vdev;
> +
> +err_m2m:
> +	video_device_release(vfd);
> +	video_set_drvdata(vfd, NULL);
> +err_vfd:
> +	kfree(priv);
> +	return ERR_PTR(ret);
> +}
> +
> +void imx_media_mem2mem_vdic_uninit(struct imx_media_video_dev *vdev)
> +{
> +	struct ipu_mem2mem_vdic_priv *priv = to_mem2mem_priv(vdev);
> +	struct video_device *vfd = priv->vdev.vfd;
> +
> +	video_device_release(vfd);
> +	video_set_drvdata(vfd, NULL);
> +	kfree(priv);
> +}
> +
> +MODULE_DESCRIPTION("i.MX VDIC mem2mem de-interlace driver");
> +MODULE_AUTHOR("Marek Vasut <marex@denx.de>");
> +MODULE_LICENSE("GPL");
> diff --git a/drivers/staging/media/imx/imx-media.h b/drivers/staging/media/imx/imx-media.h
> index f095d9134fee4..9f2388e306727 100644
> --- a/drivers/staging/media/imx/imx-media.h
> +++ b/drivers/staging/media/imx/imx-media.h
> @@ -162,6 +162,9 @@ struct imx_media_dev {
>  	/* IC scaler/CSC mem2mem video device */
>  	struct imx_media_video_dev *m2m_vdev;
>  
> +	/* VDIC mem2mem video device */
> +	struct imx_media_video_dev *m2m_vdic[2];
> +
>  	/* the IPU internal subdev's registered synchronously */
>  	struct v4l2_subdev *sync_sd[2][NUM_IPU_SUBDEVS];
>  };
> @@ -284,6 +287,13 @@ imx_media_csc_scaler_device_init(struct imx_media_dev *dev);
>  int imx_media_csc_scaler_device_register(struct imx_media_video_dev *vdev);
>  void imx_media_csc_scaler_device_unregister(struct imx_media_video_dev *vdev);
>  
> +/* imx-media-mem2mem-vdic.c */
> +struct imx_media_video_dev *
> +imx_media_mem2mem_vdic_init(struct imx_media_dev *dev, int ipu_id);
> +void imx_media_mem2mem_vdic_uninit(struct imx_media_video_dev *vdev);
> +int imx_media_mem2mem_vdic_register(struct imx_media_video_dev *vdev);
> +void imx_media_mem2mem_vdic_unregister(struct imx_media_video_dev *vdev);
> +
>  /* subdev group ids */
>  #define IMX_MEDIA_GRP_ID_CSI2          BIT(8)
>  #define IMX_MEDIA_GRP_ID_IPU_CSI_BIT   10
Marek Vasut Sept. 25, 2024, 8:14 p.m. UTC | #11
On 9/25/24 5:07 PM, Philipp Zabel wrote:

Hi,

> On Di, 2024-09-24 at 17:28 +0200, Marek Vasut wrote:
>> On 9/6/24 11:01 AM, Philipp Zabel wrote:
> [...]
>>> Instead of presenting two devices to userspace, it would be better to
>>> have a single video device that can distribute work to both IPUs.
>>
>> Why do you think so ?
> 
> The scaler/colorspace converter supports frames larger than the
> 1024x1024 hardware by splitting each frame into multiple tiles. It
> currently does so sequentially on a single IC. Speed could be improved
> by distributing the tiles to both ICs. This is not an option anymore if
> there are two video devices that are fixed to one IC each.

The userspace could distribute the frames between the two devices in an 
alternating manner, can it not ?
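
For illustration, distributing full frames round-robin could be as
simple as this userspace helper (purely a sketch, the names are made
up, and it ignores the temporal-filtering complication discussed later
in this thread):

/* Hypothetical helper: pick one of two VDIC m2m device fds per frame. */
static int pick_vdic_fd(unsigned int frame_nr, int fd_ipu0_vdic, int fd_ipu1_vdic)
{
	/* Even frames go to the IPU0 VDIC, odd frames to the IPU1 VDIC. */
	return (frame_nr & 1) ? fd_ipu1_vdic : fd_ipu0_vdic;
}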

> The same would be possible for the deinterlacer, e.g. to support 720i
> frames split into two tiles each sent to one of the two VDICs.

Would the 1280x360 field be split into two tiles vertically, with each 
tile (then 1280/2 x 360) enqueued on one of the two VDICs ? I don't 
think that works, because you wouldn't be able to stitch those tiles 
back together nicely after the deinterlacing, would you? I would expect 
to see some sort of artifact exactly where the two tiles got stitched 
back together, because the VDICs are unaware of each other and of how 
each deinterlaced its tile.

>> I think it is better to keep the kernel code as simple as possible, i.e.
>> provide the device node for each m2m device to userspace and handle the
>> m2m device hardware interaction in the kernel driver, but let userspace
>> take care of policy like job scheduling, access permissions assignment
>> to each device (e.g. if different user accounts should have access to
>> different VDICs), or other such topics.
> 
> I both agree and disagree with you at the same time.
> 
> If the programming model were more similar to DRM, I'd agree in a
> heartbeat. If the kernel driver just had to do memory/fence handling
> and command submission (and parameter sanitization, because there is no
> MMU), and there was some userspace API on top, it would make sense to
> me to handle parameter calculation and job scheduling in a hardware
> specific userspace driver that can just open one device for each IPU.
> 
> With the rigid V4L2 model though, where memory handling, parameter
> calculation, and job scheduling of tiles in a single frame all have to
> be hidden behind the V4L2 API, I don't think requiring userspace to
> combine multiple mem2mem video devices to work together on a single
> frame is feasible.

If your concern is throughput (from what I gathered from the text 
above), userspace could schedule frames on either VDIC in an alternating 
manner.

I think this is a much better and more generic approach than trying to 
combine two independent devices at the kernel level and introducing 
some sort of scheduler into the kernel driver to distribute jobs 
between the two devices. Generic, because this approach works even if 
either of the two devices is not a VDIC. Independent devices, because 
yes, the MX6Q IPUs are two independent blocks; it is only the current 
design of the IPUv3 driver that makes them look kind of like one single 
big device. I am not happy about that design, but rewriting the IPUv3 
driver is way out of scope here. (*)

> Is limiting different users to the different deinterlacer hardware
> units a real usecase? I saw the two ICs, when used as mem2mem devices,
> as interchangeable resources.

I do not have that use case, but I can imagine it could come up.
In my case, I schedule different cameras to different VDICs from 
userspace as needed.

>>> To be fair, we never implemented that for the CSC/scaler mem2mem device
>>> either.
>>
>> I don't think that is actually a good idea. Instead, it would be better
>> to have two scaler nodes in userspace.
> 
> See above, that would make it impossible (or rather unreasonably
> complicated) to distribute work on a single frame to both IPUs.

Is your concern latency instead of throughput ? See my comment in 
paragraph (*).

> 
> [...]
>>>> +	ipu_cpmem_set_buffer(priv->vdi_out_ch,  0, out_phys);
>>>> +	ipu_cpmem_set_buffer(priv->vdi_in_ch_p, 0, prev_phys + phys_offset);
>>>> +	ipu_cpmem_set_buffer(priv->vdi_in_ch,   0, curr_phys);
>>>> +	ipu_cpmem_set_buffer(priv->vdi_in_ch_n, 0, curr_phys + phys_offset);
>>>
>>> This always outputs at a frame rate of half the field rate, and only
>>> top fields are ever used as current field, and bottom fields as
>>> previous/next fields, right?
>>
>> Yes, currently the driver extracts 1 frame from two consecutive incoming
>> fields (previous Bottom, and current Top and Bottom):
>>
>> (frame 1 and 3 below is omitted)
>>
>>       1  2  3  4
>> ...|T |T |T |T |...
>> ...| B| B| B| B|...
>>        | ||  | ||
>>        '-''  '-''
>>         ||    ||
>>         ||    \/
>>         \/  Frame#4
>>       Frame#2
>>
>> As far as I understand it, this is how the current VDI implementation
>> behaves too, right ?
> 
> Yes, that is a hardware limitation when using the direct CSI->VDIC
> direct path. As far as I understand, for each frame (two fields) the
> CSI only sends the first ("PREV") field directly to the VDIC, which
> therefore can only be run in full motion mode (use the filter to add in
> the missing lines).
> The second ("CUR") field is just ignored. It could be written to RAM
> via IDMAC output channel 13 (IPUV3_CHANNEL_VDI_MEM_RECENT), which can
> not be used by the VDIC in direct mode. So this is not implemented.
> 
>> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/staging/media/imx/imx-media-vdic.c#n207
> 
> That code is unused. The direct hardware path doesn't use
> IPUV3_CHANNEL_MEM_VDI_PREV/CUR/NEXT, but it has a similar effect, half
> of the incoming fields are dropped. The setup is vdic_setup_direct().

All right, let's drop that unused code then, I'll prepare a patch.

But it seems the bottom line is, the VDI direct mode does not act as a 
frame-rate doubler ?

>>> I think it would be good to add a mode that doesn't drop the
>>>
>>> 	ipu_cpmem_set_buffer(priv->vdi_in_ch_p, 0, prev_phys);
>>> 	ipu_cpmem_set_buffer(priv->vdi_in_ch,   0, prev_phys + phys_offset);
>>> 	ipu_cpmem_set_buffer(priv->vdi_in_ch_n, 0, curr_phys);
>>>
>>> output frames, right from the start.
>>
>> This would make the VDI act as a frame-rate doubler, which would spend a
>> lot more memory bandwidth, which is limited on MX6, so I would also like
>> to have a frame-drop mode (i.e. current behavior).
>>
>> Can we make that behavior configurable ? Since this is a mem2mem device,
>> we do not really have any notion of input and output frame-rate, so I
>> suspect this would need some VIDIOC_* ioctl ?
> 
> That would be good. The situation I'd like to avoid is that this device
> becomes available without the full frame-rate mode, userspace then
> assumes this is a 1:1 frame converter device, and then we can't add the
> full frame-rate later without breaking userspace.

Why would adding the (configurable) frame-rate doubling mode break 
userspace if this is not the default ?

>>> If we don't start with that supported, I fear userspace will make
>>> assumptions and be surprised when a full rate mode is added later.
>>
>> I'm afraid that since the current VDI already does retain input frame
>> rate instead of doubling it, the userspace already makes an assumption,
>> so that ship has sailed.
> 
> No, this is about the deinterlacer mem2mem device, which doesn't exist
> before this series.

I am not convinced it is OK if the direct VDI path and mem2mem VDI 
behave differently, that would be surprising to me as a user ?

> The CSI capture path already has configurable framedrops (in the CSI).

What am I looking for ? git grep doesn't give me any hits ? (**)

>> But I think we can make the frame doubling configurable ?
> 
> That would be good. Specifically, there must be no guarantee that one
> input frame with two fields only produces one deinterlaced output
> frame, and userspace should somehow be able to understand this.

See my question (**), where is this configurable framedrops thing ?

> This would be an argument against Nicolas' suggestion of including this
> in the csc/scaler device, which always must produce one output frame
> per input frame.
> 
> [...]
>>> This maps to VDI_C_MOT_SEL_FULL aka VDI_MOT_SEL=2, which is documented
>>> as "full motion, only vertical filter is used". Doesn't this completely
>>> ignore the previous/next fields and only use the output of the di_vfilt
>>> four tap vertical filter block to fill in missing lines from the
>>> surrounding pixels (above and below) of the current field?
>>
>> Is there a suitable knob for this or shall I introduce a device specific
>> one, like the vdic_ctrl_motion_menu for the current VDIC direct driver ?
>>
>> If we introduce such a knob, then it is all the more reason to provide
>> one device node per one VDIC hardware instance, since each can be
>> configured for different motion settings.
> 
> As far as I know, there is no such control yet. I don't think this
> should be per-device, but per-stream (or even per-frame).
> 
>>> I think this should at least be configurable, and probably default to
>>> MED_MOTION.
>>
>> I think to be compatible with the current VDI behavior and to reduce
>> memory bandwidth usage, let's default to the HIGH/full mode. That one
>> produces reasonably good results without spending too much memory
>> bandwidth which is constrained already on the MX6, and if the user needs
>> better image quality, they can configure another mode using the V4L2
>> control.
> 
> I'd rather not default to the setting that throws away half of the
> input data. Not using frame doubling by default is sensible, but now
> that using all three input fields to calculate the output frame is
> possible, why not make that the default.
To save memory bandwidth on the MX6, that's my main concern.
Marek Vasut Sept. 25, 2024, 8:45 p.m. UTC | #12
On 9/25/24 7:58 PM, Nicolas Dufresne wrote:

Hi,

[...]

>> +static struct v4l2_pix_format *
>> +ipu_mem2mem_vdic_get_format(struct ipu_mem2mem_vdic_priv *priv,
>> +			    enum v4l2_buf_type type)
>> +{
>> +	return &priv->fmt[V4L2_TYPE_IS_OUTPUT(type) ? V4L2_M2M_SRC : V4L2_M2M_DST];
>> +}
> 
>  From here ...
> 
>> +
>> +static bool ipu_mem2mem_vdic_format_is_yuv420(const u32 pixelformat)
>> +{
>> +	/* All 4:2:0 subsampled formats supported by this hardware */
>> +	return pixelformat == V4L2_PIX_FMT_YUV420 ||
>> +	       pixelformat == V4L2_PIX_FMT_YVU420 ||
>> +	       pixelformat == V4L2_PIX_FMT_NV12;
>> +}
>> +
>> +static bool ipu_mem2mem_vdic_format_is_yuv422(const u32 pixelformat)
>> +{
>> +	/* All 4:2:2 subsampled formats supported by this hardware */
>> +	return pixelformat == V4L2_PIX_FMT_UYVY ||
>> +	       pixelformat == V4L2_PIX_FMT_YUYV ||
>> +	       pixelformat == V4L2_PIX_FMT_YUV422P ||
>> +	       pixelformat == V4L2_PIX_FMT_NV16;
>> +}
>> +
>> +static bool ipu_mem2mem_vdic_format_is_yuv(const u32 pixelformat)
>> +{
>> +	return ipu_mem2mem_vdic_format_is_yuv420(pixelformat) ||
>> +	       ipu_mem2mem_vdic_format_is_yuv422(pixelformat);
>> +}
>> +
>> +static bool ipu_mem2mem_vdic_format_is_rgb16(const u32 pixelformat)
>> +{
>> +	/* All 16-bit RGB formats supported by this hardware */
>> +	return pixelformat == V4L2_PIX_FMT_RGB565;
>> +}
>> +
>> +static bool ipu_mem2mem_vdic_format_is_rgb24(const u32 pixelformat)
>> +{
>> +	/* All 24-bit RGB formats supported by this hardware */
>> +	return pixelformat == V4L2_PIX_FMT_RGB24 ||
>> +	       pixelformat == V4L2_PIX_FMT_BGR24;
>> +}
>> +
>> +static bool ipu_mem2mem_vdic_format_is_rgb32(const u32 pixelformat)
>> +{
>> +	/* All 32-bit RGB formats supported by this hardware */
>> +	return pixelformat == V4L2_PIX_FMT_XRGB32 ||
>> +	       pixelformat == V4L2_PIX_FMT_XBGR32 ||
>> +	       pixelformat == V4L2_PIX_FMT_BGRX32 ||
>> +	       pixelformat == V4L2_PIX_FMT_RGBX32;
>> +}
> 
> To here: these days, all this information can be derived from v4l2_format_info
> in v4l2-common, in a way that doesn't create a big barrier to adding more
> formats in the future.

I am not sure I quite understand this suggestion, what should I change here?

Note that the IPUv3 seems to be done; it does not seem like there will 
be new SoCs with this block, so the list of formats here is likely final.

[...]

>> +static irqreturn_t ipu_mem2mem_vdic_nfb4eof_interrupt(int irq, void *dev_id)
>> +{
>> +	struct ipu_mem2mem_vdic_priv *priv = dev_id;
>> +
>> +	/* That is about all we can do about it, report it. */
>> +	dev_warn_ratelimited(priv->dev, "NFB4EOF error interrupt occurred\n");
> 
> Not sure this is right. If that means ipu_mem2mem_vdic_eof_interrupt won't fire,
> then it means streamoff/close after that will hang forever, leaving a zombie
> process behind.
> 
> Perhaps mark the buffers as ERROR, and finish the job.

The NFB4EOF interrupt is generated when the VDIC didn't write (all of) 
the output frame. I think it stands for "New Frame Before EOF" or some 
such. Basically the currently written frame will be corrupted and the 
next frame(s) are likely going to be OK again.

>> +
>> +	return IRQ_HANDLED;
>> +}
>> +
>> +static void ipu_mem2mem_vdic_device_run(void *_ctx)
>> +{
>> +	struct ipu_mem2mem_vdic_ctx *ctx = _ctx;
>> +	struct ipu_mem2mem_vdic_priv *priv = ctx->priv;
>> +	struct vb2_v4l2_buffer *curr_buf, *dst_buf;
>> +	dma_addr_t prev_phys, curr_phys, out_phys;
>> +	struct v4l2_pix_format *infmt;
>> +	u32 phys_offset = 0;
>> +	unsigned long flags;
>> +
>> +	infmt = ipu_mem2mem_vdic_get_format(priv, V4L2_BUF_TYPE_VIDEO_OUTPUT);
>> +	if (V4L2_FIELD_IS_SEQUENTIAL(infmt->field))
>> +		phys_offset = infmt->sizeimage / 2;
>> +	else if (V4L2_FIELD_IS_INTERLACED(infmt->field))
>> +		phys_offset = infmt->bytesperline;
>> +	else
>> +		dev_err(priv->dev, "Invalid field %d\n", infmt->field);
>> +
>> +	dst_buf = v4l2_m2m_next_dst_buf(ctx->fh.m2m_ctx);
>> +	out_phys = vb2_dma_contig_plane_dma_addr(&dst_buf->vb2_buf, 0);
>> +
>> +	curr_buf = v4l2_m2m_next_src_buf(ctx->fh.m2m_ctx);
>> +	if (!curr_buf) {
>> +		dev_err(priv->dev, "Not enough buffers\n");
>> +		return;
> 
> Impossible branch, has been checked by __v4l2_m2m_try_queue().

Fixed in V3

>> +	}
>> +
>> +	spin_lock_irqsave(&priv->irqlock, flags);
>> +
>> +	if (ctx->curr_buf) {
>> +		ctx->prev_buf = ctx->curr_buf;
>> +		ctx->curr_buf = curr_buf;
>> +	} else {
>> +		ctx->prev_buf = curr_buf;
>> +		ctx->curr_buf = curr_buf;
>> +		dev_warn(priv->dev, "Single-buffer mode, fix your userspace\n");
>> +	}
> 
> The driver is not taking ownership of prev_buf, only curr_buf is guaranteed to
> exist until v4l2_m2m_job_finish() is called. Userspace could streamoff, allocate
> new buffers, and then an old freed buffer may end up being used.

So, what should I do about this ? Is there some way to ref the buffer to 
keep it around ?

> It's also unclear to me how userspace can avoid this ugly warning, how can you
> have curr_buf set the first time ? (I might be missing something on this one
> though).

The warning happens when streaming starts and there is only one input 
frame available for the VDIC, which needs three fields to work 
correctly. So, if there is only one input frame, the VDI uses the input 
frame bottom field as PREV field for the prediction, and input frame top 
and bottom fields as CURR and NEXT fields for the prediction, the result 
may be one sub-optimal deinterlaced output frame (the first one). Once 
another input frame gets enqueued, the VDIC uses the previous frame 
bottom field as PREV and the newly enqueued frame top and bottom fields 
as CURR and NEXT and the prediction works correctly from that point on.

> Perhaps what you want is a custom job_ready() callback, that ensures you have 2
> buffers in the OUTPUT queue ? You also need to adjust the CID
> MIN_BUFFERS_FOR_OUTPUT accordingly.

I had that before, but gstreamer didn't enqueue the two frames for me, 
so I went back to this variant for maximum compatibility.

>> +	prev_phys = vb2_dma_contig_plane_dma_addr(&ctx->prev_buf->vb2_buf, 0);
>> +	curr_phys = vb2_dma_contig_plane_dma_addr(&ctx->curr_buf->vb2_buf, 0);
>> +
>> +	priv->curr_ctx = ctx;
>> +	spin_unlock_irqrestore(&priv->irqlock, flags);
>> +
>> +	ipu_cpmem_set_buffer(priv->vdi_out_ch,  0, out_phys);
>> +	ipu_cpmem_set_buffer(priv->vdi_in_ch_p, 0, prev_phys + phys_offset);
>> +	ipu_cpmem_set_buffer(priv->vdi_in_ch,   0, curr_phys);
>> +	ipu_cpmem_set_buffer(priv->vdi_in_ch_n, 0, curr_phys + phys_offset);
>> +
>> +	/* No double buffering, always pick buffer 0 */
>> +	ipu_idmac_select_buffer(priv->vdi_out_ch, 0);
>> +	ipu_idmac_select_buffer(priv->vdi_in_ch_p, 0);
>> +	ipu_idmac_select_buffer(priv->vdi_in_ch, 0);
>> +	ipu_idmac_select_buffer(priv->vdi_in_ch_n, 0);
>> +
>> +	/* Enable the channels */
>> +	ipu_idmac_enable_channel(priv->vdi_out_ch);
>> +	ipu_idmac_enable_channel(priv->vdi_in_ch_p);
>> +	ipu_idmac_enable_channel(priv->vdi_in_ch);
>> +	ipu_idmac_enable_channel(priv->vdi_in_ch_n);
>> +}

[...]

>> +static int ipu_mem2mem_vdic_try_fmt(struct file *file, void *fh,
>> +				    struct v4l2_format *f)
>> +{
>> +	const struct imx_media_pixfmt *cc;
>> +	enum imx_pixfmt_sel cs;
>> +	u32 fourcc;
>> +
>> +	if (f->type == V4L2_BUF_TYPE_VIDEO_CAPTURE) {	/* Output */
>> +		cs = PIXFMT_SEL_YUV_RGB;	/* YUV direct / RGB via IC */
>> +
>> +		f->fmt.pix.field = V4L2_FIELD_NONE;
>> +	} else {
>> +		cs = PIXFMT_SEL_YUV;		/* YUV input only */
>> +
>> +		/*
>> +		 * Input must be interlaced with frame order.
>> +		 * Fall back to SEQ_TB otherwise.
>> +		 */
>> +		if (!V4L2_FIELD_HAS_BOTH(f->fmt.pix.field) ||
>> +		    f->fmt.pix.field == V4L2_FIELD_INTERLACED)
>> +			f->fmt.pix.field = V4L2_FIELD_SEQ_TB;
>> +	}
>> +
>> +	fourcc = f->fmt.pix.pixelformat;
>> +	cc = imx_media_find_pixel_format(fourcc, cs);
>> +	if (!cc) {
>> +		imx_media_enum_pixel_formats(&fourcc, 0, cs, 0);
>> +		cc = imx_media_find_pixel_format(fourcc, cs);
>> +	}
>> +
>> +	f->fmt.pix.pixelformat = cc->fourcc;
>> +
>> +	v4l_bound_align_image(&f->fmt.pix.width,
>> +			      1, 968, 1,
>> +			      &f->fmt.pix.height,
>> +			      1, 1024, 1, 1);
> 
> Perhaps use defines for the magic numbers ?

Fixed in V3, thanks

>> +
>> +	if (ipu_mem2mem_vdic_format_is_yuv420(f->fmt.pix.pixelformat))
>> +		f->fmt.pix.bytesperline = f->fmt.pix.width * 3 / 2;
>> +	else if (ipu_mem2mem_vdic_format_is_yuv422(f->fmt.pix.pixelformat))
>> +		f->fmt.pix.bytesperline = f->fmt.pix.width * 2;
>> +	else if (ipu_mem2mem_vdic_format_is_rgb16(f->fmt.pix.pixelformat))
>> +		f->fmt.pix.bytesperline = f->fmt.pix.width * 2;
>> +	else if (ipu_mem2mem_vdic_format_is_rgb24(f->fmt.pix.pixelformat))
>> +		f->fmt.pix.bytesperline = f->fmt.pix.width * 3;
>> +	else if (ipu_mem2mem_vdic_format_is_rgb32(f->fmt.pix.pixelformat))
>> +		f->fmt.pix.bytesperline = f->fmt.pix.width * 4;
>> +	else
>> +		f->fmt.pix.bytesperline = f->fmt.pix.width;
>> +
>> +	f->fmt.pix.sizeimage = f->fmt.pix.height * f->fmt.pix.bytesperline;
> 
> And use v4l2-common ?

I don't really understand, there is nothing in v4l2-common.c that would 
be a really useful replacement for this ?

>> +	return 0;
>> +}
>> +
>> +static int ipu_mem2mem_vdic_s_fmt(struct file *file, void *fh, struct v4l2_format *f)
>> +{
>> +	struct ipu_mem2mem_vdic_ctx *ctx = fh_to_ctx(fh);
>> +	struct ipu_mem2mem_vdic_priv *priv = ctx->priv;
>> +	struct v4l2_pix_format *fmt, *infmt, *outfmt;
>> +	struct vb2_queue *vq;
>> +	int ret;
>> +
>> +	vq = v4l2_m2m_get_vq(ctx->fh.m2m_ctx, f->type);
>> +	if (vb2_is_busy(vq)) {
>> +		dev_err(priv->dev, "%s queue busy\n",  __func__);
>> +		return -EBUSY;
>> +	}
>> +
>> +	ret = ipu_mem2mem_vdic_try_fmt(file, fh, f);
>> +	if (ret < 0)
>> +		return ret;
>> +
>> +	fmt = ipu_mem2mem_vdic_get_format(priv, f->type);
>> +	*fmt = f->fmt.pix;
>> +
>> +	/* Propagate colorimetry to the capture queue */
>> +	infmt = ipu_mem2mem_vdic_get_format(priv, V4L2_BUF_TYPE_VIDEO_OUTPUT);
>> +	outfmt = ipu_mem2mem_vdic_get_format(priv, V4L2_BUF_TYPE_VIDEO_CAPTURE);
>> +	outfmt->colorspace = infmt->colorspace;
>> +	outfmt->ycbcr_enc = infmt->ycbcr_enc;
>> +	outfmt->xfer_func = infmt->xfer_func;
>> +	outfmt->quantization = infmt->quantization;
> 
> So you can do CSC conversion but not colorimetry ? We have
> V4L2_PIX_FMT_FLAG_SET_CSC if you can do colorimetry transforms too. I have
> patches that I'll send for the csc-scaler driver.

See ipu_ic_calc_csc(), that's what does the colorspace conversion in 
this driver (on output from the VDI).

[...]
Philipp Zabel Sept. 26, 2024, 11:14 a.m. UTC | #13
Hi,

On Mi, 2024-09-25 at 22:14 +0200, Marek Vasut wrote:
> The userspace could distribute the frames between the two devices in an 
> alternating manner, can it not ?

This doesn't help with latency, or when converting a single large
frame.

For the deinterlacer, this can't be done with the motion-aware
temporal filtering modes. Those need a field from the previous frame.

> 
> Would the 1280x360 field be split into two tiles vertically, with each
> tile (then 1280/2 x 360) enqueued on one of the two VDICs ? I don't
> think that works, because you wouldn't be able to stitch those tiles
> back together nicely after the deinterlacing, would you? I would expect
> to see some sort of artifact exactly where the two tiles got stitched
> back together, because the VDICs are unaware of each other and of how
> each deinterlaced its tile.

I was thinking horizontally, two 640x720 tiles side by side. 1280 is
larger than the 968 pixel maximum horizontal resolution of the VDIC.

As you say, splitting vertically (which would be required for 1080i)
should cause artifacts at the seam due to the 4-tap vertical filter.

[...]
> > 
> > With the rigid V4L2 model though, where memory handling, parameter
> > calculation, and job scheduling of tiles in a single frame all have to
> > be hidden behind the V4L2 API, I don't think requiring userspace to
> > combine multiple mem2mem video devices to work together on a single
> > frame is feasible.
> 
> If your concern is throughput (from what I gathered from the text 
> above), userspace could schedule frames on either VDIC in an alternating
> manner.

Both throughput and latency.

Yes, alternating to different devices would help with throughput where
possible, but it's worse for frame pacing, a hassle to implement
generically in userspace, and it's straight up impossible with temporal
filtering.

> I think this is a much better and more generic approach than trying to
> combine two independent devices at the kernel level and introducing
> some sort of scheduler into the kernel driver to distribute jobs
> between the two devices. Generic, because this approach works even if
> either of the two devices is not a VDIC. Independent devices, because
> yes, the MX6Q IPUs are two independent blocks; it is only the current
> design of the IPUv3 driver that makes them look kind of like one single
> big device. I am not happy about that design, but rewriting the IPUv3
> driver is way out of scope here. (*)

The IPUs are glued together at the capture and output paths, so yes,
they are independent blocks, but also work together as a big device.

> > Is limiting different users to the different deinterlacer hardware
> > units a real usecase? I saw the two ICs, when used as mem2mem devices,
> > as interchangeable resources.
> 
> I do not have that use case, but I can imagine it could come up.
> In my case, I schedule different cameras to different VDICs from 
> userspace as needed.

Is this just because a single VDIC does not have enough throughput to
serve all cameras, or is there some reason for a fixed assignment
between cameras and VDICs?

> > > > To be fair, we never implemented that for the CSC/scaler mem2mem device
> > > > either.
> > > 
> > > I don't think that is actually a good idea. Instead, it would be better
> > > to have two scaler nodes in userspace.
> > 
> > See above, that would make it impossible (or rather unreasonably
> > complicated) to distribute work on a single frame to both IPUs.
> 
> Is your concern latency instead of throughput ? See my comment in 
> paragraph (*).

Either, depending on the use case.

[...]
> > > > 
> > > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/staging/media/imx/imx-media-vdic.c#n207
> > 
> > That code is unused. The direct hardware path doesn't use
> > IPUV3_CHANNEL_MEM_VDI_PREV/CUR/NEXT, but it has a similar effect, half
> > of the incoming fields are dropped. The setup is vdic_setup_direct().
> 
> All right, let's drop that unused code then, I'll prepare a patch.

Thanks!

> But it seems the bottom line is, the VDI direct mode does not act as a 
> frame-rate doubler ?

Yes, it can't. In direct mode, VDIC only receives half of the fields.

[...]
> > > 
> Why would adding the (configurable) frame-rate doubling mode break 
> userspace if this is not the default ?

I'm not sure it would. Maybe there should be a deinterlacer control to
choose between full and half field rate output (aka frame doubling and
1:1 input to output frame rate).

Also, my initial assumption was that currently there is 1:1 input
frames to output frames. But with temporal filtering enabled there's
already one input frame (the first one) that doesn't produce any
output.

> > > > If we don't start with that supported, I fear userspace will make
> > > > assumptions and be surprised when a full rate mode is added later.
> > > 
> > > I'm afraid that since the current VDI already does retain input frame
> > > rate instead of doubling it, the userspace already makes an assumption,
> > > so that ship has sailed.
> > 
> > No, this is about the deinterlacer mem2mem device, which doesn't exist
> > before this series.
> 
> I am not convinced it is OK if the direct VDI path and mem2mem VDI 
> behave differently, that would be surprising to me as a user ?

Is this still about the frame rate doubling? Surely supporting it in
the mem2mem device and not in the capture path is ok. I'm not arguing
that frame doubling should be enabled by default.

> > The CSI capture path already has configurable framedrops (in the CSI).
> 
> What am I looking for ? git grep doesn't give me any hits ? (**)

That's configured by the set_frame_interval pad op of the CSI subdevice
- on the IDMAC output pad. See csi_find_best_skip().
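
From userspace that looks roughly like this (a sketch; the pad index
and the frame interval values are assumptions, not taken from this
series):

#include <stdio.h>
#include <sys/ioctl.h>
#include <linux/v4l2-subdev.h>

/* Hypothetical example: ask the CSI IDMAC output pad to deliver every
 * second frame only, i.e. 30 fps in, 15 fps out. Pad 2 is assumed to
 * be CSI_SRC_PAD_IDMAC. */
static int csi_set_idmac_interval(int csi_subdev_fd)
{
	struct v4l2_subdev_frame_interval fi = {
		.pad = 2,
		.interval = { .numerator = 1, .denominator = 15 },
	};

	if (ioctl(csi_subdev_fd, VIDIOC_SUBDEV_S_FRAME_INTERVAL, &fi) < 0) {
		perror("VIDIOC_SUBDEV_S_FRAME_INTERVAL");
		return -1;
	}

	return 0;
}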

> > > But I think we can make the frame doubling configurable ?
> > 
> > That would be good. Specifically, there must be no guarantee that one
> > input frame with two fields only produces one deinterlaced output
> > frame, and userspace should somehow be able to understand this.
> 
> See my question (**), where is this configurable framedrops thing ?

This would have to be done differently, though. Here we don't have
subdev set_frame_interval configuration, and while VIDIOC_S_PARM /
v4l2_captureparm were used to configure frame dropping on capture
devices, that's not really applicable to mem2mem deinterlacers.

> > I'd rather not default to the setting that throws away half of the
> > input data. Not using frame doubling by default is sensible, but now
> > that using all three input fields to calculate the output frame is
> > possible, why not make that the default.
>
> To save memory bandwidth on the MX6, that's my main concern.

What userspace are you using to exercise this driver? Maybe we can back
this concern with a few numbers (or mine with pictures).

regards
Philipp
Philipp Zabel Sept. 26, 2024, 11:16 a.m. UTC | #14
On Mi, 2024-09-25 at 22:45 +0200, Marek Vasut wrote:
[...]
> > The driver is not taking ownership of prev_buf, only curr_buf is guaranteed to
> > exist until v4l2_m2m_job_finish() is called. Userspace could streamoff, allocate
> > new buffers, and then an old freed buffer may end up being used.
> 
> So, what should I do about this ? Is there some way to ref the buffer to 
> keep it around ?

Have a look at how other deinterlacers with temporal filtering do it.
sunxi/sun8i-di or ti/vpe look like candidates.
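
Roughly this pattern (a from-memory sketch of what e.g. sun8i-di does,
not verified against current mainline; names reuse this driver):

	/* device_run(): take the source buffer off the ready list, so the
	 * m2m core cannot hand it back to userspace while it is still
	 * referenced as the "previous" field source. */
	curr_buf = v4l2_m2m_src_buf_remove(ctx->fh.m2m_ctx);

	/* EOF interrupt: the old previous buffer is no longer needed,
	 * return it only now, and keep the current buffer as "prev".
	 * stop_streaming() then has to return a leftover ctx->prev_buf
	 * as well, so it is not leaked. */
	if (ctx->prev_buf)
		v4l2_m2m_buf_done(ctx->prev_buf, VB2_BUF_STATE_DONE);
	ctx->prev_buf = curr_buf;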

> 
> > 
regards
Philipp
Nicolas Dufresne Sept. 27, 2024, 7:33 p.m. UTC | #15
Le mercredi 25 septembre 2024 à 22:45 +0200, Marek Vasut a écrit :
> On 9/25/24 7:58 PM, Nicolas Dufresne wrote:
> 
> 

[...]

> 
> > > +static irqreturn_t ipu_mem2mem_vdic_nfb4eof_interrupt(int irq, void *dev_id)
> > > +{
> > > +	struct ipu_mem2mem_vdic_priv *priv = dev_id;
> > > +
> > > +	/* That is about all we can do about it, report it. */
> > > +	dev_warn_ratelimited(priv->dev, "NFB4EOF error interrupt occurred\n");
> > 
> > Not sure this is right. If that means ipu_mem2mem_vdic_eof_interrupt won't fire,
> > then it means streamoff/close after that will hang forever, leaving a zombie
> > process behind.
> > 
> > Perhaps mark the buffers as ERROR, and finish the job.
> 
> The NFB4EOF interrupt is generated when the VDIC didn't write (all of) 
> the output frame. I think it stands for "New Frame Before EOF" or some 
> such. Basically the currently written frame will be corrupted and the 
> next frame(s) are likely going to be OK again.

So the other IRQ will be triggered ? After this one ? If so, perhaps take a
moment to mark the frames as ERROR (which means corrupted).
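
Something like this, perhaps (an untested sketch reusing the names from
this patch; the frame_error field would be a new addition):

static irqreturn_t ipu_mem2mem_vdic_nfb4eof_interrupt(int irq, void *dev_id)
{
	struct ipu_mem2mem_vdic_priv *priv = dev_id;

	dev_warn_ratelimited(priv->dev, "NFB4EOF error interrupt occurred\n");

	/* Remember that the frame currently being written is corrupted,
	 * so the EOF handler can complete it with VB2_BUF_STATE_ERROR
	 * instead of VB2_BUF_STATE_DONE. */
	priv->frame_error = true;

	return IRQ_HANDLED;
}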

[...]

> > 
> > The driver is not taking ownership of prev_buf, only curr_buf is guaranteed to
> > exist until v4l2_m2m_job_finish() is called. Userspace could streamoff, allocate
> > new buffers, and then an old freed buffer may end up being used.
> 
> So, what should I do about this ? Is there some way to ref the buffer to 
> keep it around ?
> 
> > It's also unclear to me how userspace can avoid this ugly warning, how can you
> > have curr_buf set the first time ? (I might be missing something on this one
> > though).
> 
> The warning happens when streaming starts and there is only one input 
> frame available for the VDIC, which needs three fields to work 
> correctly. So, if there is only one input frame, the VDI uses the input 
> frame bottom field as PREV field for the prediction, and input frame top 
> and bottom fields as CURR and NEXT fields for the prediction, the result 
> may be one sub-optimal deinterlaced output frame (the first one). Once 
> another input frame gets enqueued, the VDIC uses the previous frame 
> bottom field as PREV and the newly enqueued frame top and bottom fields 
> as CURR and NEXT and the prediction works correctly from that point on.

Warnings by default are not acceptable.

> 
> > Perhaps what you want is a custom job_ready() callback, that ensures you have 2
> > buffers in the OUTPUT queue ? You also need to adjust the CID
> > MIN_BUFFERS_FOR_OUTPUT accordingly.
> 
> I had that before, but gstreamer didn't enqueue the two frames for me, 
> so I went back to this variant for maximum compatibility.

It's well known that the GStreamer v4l2convert element has no support for
deinterlacing and needs to be improved to support any deinterlace drivers out
there.

Other drivers simply hold on to output buffers until they have enough to produce
the first valid picture. Holding means not marking them done, which keeps them
in the ACTIVE state, which is tracked by the core for you.
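
For the job_ready() part, a minimal sketch (again reusing the names
from this patch):

static int ipu_mem2mem_vdic_job_ready(void *_ctx)
{
	struct ipu_mem2mem_vdic_ctx *ctx = _ctx;

	/* Temporal filtering needs the previous and the current frame,
	 * so do not schedule a job until two source buffers are queued. */
	return v4l2_m2m_num_src_bufs_ready(ctx->fh.m2m_ctx) >= 2;
}

static struct v4l2_m2m_ops m2m_ops = {
	.device_run	= ipu_mem2mem_vdic_device_run,
	.job_ready	= ipu_mem2mem_vdic_job_ready,
};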

[...]

> > > +
> > > +	if (ipu_mem2mem_vdic_format_is_yuv420(f->fmt.pix.pixelformat))
> > > +		f->fmt.pix.bytesperline = f->fmt.pix.width * 3 / 2;
> > > +	else if (ipu_mem2mem_vdic_format_is_yuv422(f->fmt.pix.pixelformat))
> > > +		f->fmt.pix.bytesperline = f->fmt.pix.width * 2;
> > > +	else if (ipu_mem2mem_vdic_format_is_rgb16(f->fmt.pix.pixelformat))
> > > +		f->fmt.pix.bytesperline = f->fmt.pix.width * 2;
> > > +	else if (ipu_mem2mem_vdic_format_is_rgb24(f->fmt.pix.pixelformat))
> > > +		f->fmt.pix.bytesperline = f->fmt.pix.width * 3;
> > > +	else if (ipu_mem2mem_vdic_format_is_rgb32(f->fmt.pix.pixelformat))
> > > +		f->fmt.pix.bytesperline = f->fmt.pix.width * 4;
> > > +	else
> > > +		f->fmt.pix.bytesperline = f->fmt.pix.width;
> > > +
> > > +	f->fmt.pix.sizeimage = f->fmt.pix.height * f->fmt.pix.bytesperline;
> > 
> > And use v4l2-common ?
> 
> I don't really understand, there is nothing in v4l2-common.c that would 
> be a really useful replacement for this ?

Not sure I get your response; v4l2-common is used in many drivers already, and
we intend to keep improving it so that all drivers use it in the long term. It
was created because folks believed they could calculate bytesperline and
sizeimage by hand, but as the number of formats grows, those calculations always
end up wrong, causing the HW to overflow and break the system at a larger scale.
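
Concretely, the whole if/else ladder in try_fmt above could collapse
into something like this (untested sketch):

	/* v4l2_fill_pixfmt() looks up the v4l2_format_info entry for the
	 * fourcc and fills in bytesperline and sizeimage. */
	ret = v4l2_fill_pixfmt(&f->fmt.pix, f->fmt.pix.pixelformat,
			       f->fmt.pix.width, f->fmt.pix.height);
	if (ret)
		return ret;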

> 
> > > +	return 0;
> > > +}
> > > +
> > > +static int ipu_mem2mem_vdic_s_fmt(struct file *file, void *fh, struct v4l2_format *f)
> > > +{
> > > +	struct ipu_mem2mem_vdic_ctx *ctx = fh_to_ctx(fh);
> > > +	struct ipu_mem2mem_vdic_priv *priv = ctx->priv;
> > > +	struct v4l2_pix_format *fmt, *infmt, *outfmt;
> > > +	struct vb2_queue *vq;
> > > +	int ret;
> > > +
> > > +	vq = v4l2_m2m_get_vq(ctx->fh.m2m_ctx, f->type);
> > > +	if (vb2_is_busy(vq)) {
> > > +		dev_err(priv->dev, "%s queue busy\n",  __func__);
> > > +		return -EBUSY;
> > > +	}
> > > +
> > > +	ret = ipu_mem2mem_vdic_try_fmt(file, fh, f);
> > > +	if (ret < 0)
> > > +		return ret;
> > > +
> > > +	fmt = ipu_mem2mem_vdic_get_format(priv, f->type);
> > > +	*fmt = f->fmt.pix;
> > > +
> > > +	/* Propagate colorimetry to the capture queue */
> > > +	infmt = ipu_mem2mem_vdic_get_format(priv, V4L2_BUF_TYPE_VIDEO_OUTPUT);
> > > +	outfmt = ipu_mem2mem_vdic_get_format(priv, V4L2_BUF_TYPE_VIDEO_CAPTURE);
> > > +	outfmt->colorspace = infmt->colorspace;
> > > +	outfmt->ycbcr_enc = infmt->ycbcr_enc;
> > > +	outfmt->xfer_func = infmt->xfer_func;
> > > +	outfmt->quantization = infmt->quantization;
> > 
> > So you can do CSC conversion but not colorimetry ? We have
> > V4L2_PIX_FMT_FLAG_SET_CSC if you can do colorimetry transforms too. I have
> > patches that I'll send for the csc-scaler driver.
> 
> See ipu_ic_calc_csc(), that's what does the colorspace conversion in 
> this driver (on output from the VDI).

int ipu_ic_calc_csc(struct ipu_ic_csc *csc,
                    enum v4l2_ycbcr_encoding in_enc,
                    enum v4l2_quantization in_quant,
                    enum ipu_color_space in_cs,
                    enum v4l2_ycbcr_encoding out_enc,
                    enum v4l2_quantization out_quant,
                    enum ipu_color_space out_cs)

So instead of simply overriding the CSC like you do, let userspace set different
CSC parameters in and out, so that the IPU can handle the conversion properly
with correct colors. That requires flagging these in the fmt_desc structure
during format enumeration, and only acknowledging the CSC when userspace has set
V4L2_PIX_FMT_FLAG_SET_CSC; in any other condition, the information must be
ignored (which you currently don't do).
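
Roughly like this (an untested sketch; how much of it the IC can really
honor is for you to decide):

	/* In s_fmt on the capture queue: only take the colorimetry that
	 * userspace asked for when it set V4L2_PIX_FMT_FLAG_SET_CSC,
	 * otherwise keep propagating it from the output queue as today. */
	if (f->fmt.pix.flags & V4L2_PIX_FMT_FLAG_SET_CSC) {
		outfmt->ycbcr_enc = f->fmt.pix.ycbcr_enc;
		outfmt->quantization = f->fmt.pix.quantization;
	} else {
		outfmt->ycbcr_enc = infmt->ycbcr_enc;
		outfmt->quantization = infmt->quantization;
	}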

Nicolas
Marek Vasut Oct. 3, 2024, 2:57 p.m. UTC | #16
On 9/26/24 1:16 PM, Philipp Zabel wrote:
> On Mi, 2024-09-25 at 22:45 +0200, Marek Vasut wrote:
> [...]
>>> The driver is not taking ownership of prev_buf, only curr_buf is guaranteed to
>>> exist until v4l2_m2m_job_finish() is called. Userspace could streamoff, allocate
>>> new buffers, and then an old freed buffer may end up being used.
>>
>> So, what should I do about this ? Is there some way to ref the buffer to
>> keep it around ?
> 
> Have a look at how other deinterlacers with temporal filtering do it.
> sunxi/sun8i-di or ti/vpe look like candidates.
I don't see exactly what those drivers are doing differently to protect 
the prev buffer during deinterlacing. Can you be more specific ?
Marek Vasut Oct. 3, 2024, 3:11 p.m. UTC | #17
On 9/26/24 1:14 PM, Philipp Zabel wrote:
> Hi,

Hi,

> On Mi, 2024-09-25 at 22:14 +0200, Marek Vasut wrote:
>> The userspace could distribute the frames between the two devices in an
>> alternating manner, can it not ?
> 
> This doesn't help with latency, or when converting a single large
> frame.
> 
> For the deinterlacer, this can't be done with the motion-aware
> temporal filtering modes. Those need a field from the previous frame.

It is up to the userspace to pass the correct frames to the deinterlacer.

>> Would the 1280x360 field be split into two tiles vertically, with each
>> tile (then 1280/2 x 360) enqueued on one of the two VDICs ? I don't
>> think that works, because you wouldn't be able to stitch those tiles
>> back together nicely after the deinterlacing, would you? I would expect
>> to see some sort of artifact exactly where the two tiles got stitched
>> back together, because the VDICs are unaware of each other and of how
>> each deinterlaced its tile.
> 
> I was thinking horizontally, two 640x720 tiles side by side. 1280 is
> larger than the 968 pixel maximum horizontal resolution of the VDIC.
> 
> As you say, splitting vertically (which would be required for 1080i)
> should cause artifacts at the seam due to the 4-tap vertical filter.

Can the userspace set some sort of offset/stride in each buffer and 
distribute the task between the two VDIs then ?

> [...]
>>>
>>> With the rigid V4L2 model though, where memory handling, parameter
>>> calculation, and job scheduling of tiles in a single frame all have to
>>> be hidden behind the V4L2 API, I don't think requiring userspace to
>>> combine multiple mem2mem video devices to work together on a single
>>> frame is feasible.
>>
>> If your concern is throughput (from what I gathered from the text
>> above), userspace could schedule frames on either VDIC in an alternating
>> manner.
> 
> Both throughput and latency.
> 
> Yes, alternating to different devices would help with throughput where
> possible, but it's worse for frame pacing, a hassle to implement
> generically in userspace, and it's straight up impossible with temporal
> filtering.

See above, userspace should be able to pass the correct frames to the m2m 
device.

>> I think this is a much better and more generic approach than trying to
>> combine two independent devices at the kernel level and introducing
>> some sort of scheduler into the kernel driver to distribute jobs
>> between the two devices. Generic, because this approach works even if
>> either of the two devices is not a VDIC. Independent devices, because
>> yes, the MX6Q IPUs are two independent blocks; it is only the current
>> design of the IPUv3 driver that makes them look kind of like one single
>> big device. I am not happy about that design, but rewriting the IPUv3
>> driver is way out of scope here. (*)
> 
> The IPUs are glued together at the capture and output paths, so yes,
> they are independent blocks, but also work together as a big device.
> 
>>> Is limiting different users to the different deinterlacer hardware
>>> units a real usecase? I saw the two ICs, when used as mem2mem devices,
>>> as interchangeable resources.
>>
>> I do not have that use case, but I can imagine it could come up.
>> In my case, I schedule different cameras to different VDICs from
>> userspace as needed.
> 
> Is this just because a single VDIC does not have enough throughput to
> serve all cameras, or is there some reason for a fixed assignment
> between cameras and VDICs?

I want to be able to distribute the bandwidth utilization between the 
two IPUs.

>>>>> To be fair, we never implemented that for the CSC/scaler mem2mem device
>>>>> either.
>>>>
>>>> I don't think that is actually a good idea. Instead, it would be better
>>>> to have two scaler nodes in userspace.
>>>
>>> See above, that would make it impossible (or rather unreasonably
>>> complicated) to distribute work on a single frame to both IPUs.
>>
>> Is your concern latency instead of throughput ? See my comment in
>> paragraph (*).
> 
> Either, depending on the use case.
> 
> [...]
>>>>>
>>>> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/staging/media/imx/imx-media-vdic.c#n207
>>>
>>> That code is unused. The direct hardware path doesn't use
>> IPUV3_CHANNEL_MEM_VDI_PREV/CUR/NEXT, but it has a similar effect: half
>>> of the incoming fields are dropped. The setup is vdic_setup_direct().
>>
>> All right, let's drop that unused code then, I'll prepare a patch.
> 
> Thanks!
> 
>> But it seems the bottom line is, the VDI direct mode does not act as a
>> frame-rate doubler ?
> 
> Yes, it can't. In direct mode, VDIC only receives half of the fields.
> 
> [...]
>>>>
>> Why would adding the (configurable) frame-rate doubling mode break
>> userspace if this is not the default ?
> 
> I'm not sure it would. Maybe there should be a deinterlacer control to
> choose between full and half field rate output (aka frame doubling and
> 1:1 input to output frame rate).
> 
> Also, my initial assumption was that currently there is 1:1 input
> frames to output frames. But with temporal filtering enabled there's
> already one input frame (the first one) that doesn't produce any
> output.

Hum, ok.

>>>>> If we don't start with that supported, I fear userspace will make
>>>>> assumptions and be surprised when a full rate mode is added later.
>>>>
>>>> I'm afraid that since the current VDI already does retain input frame
>>>> rate instead of doubling it, the userspace already makes an assumption,
>>>> so that ship has sailed.
>>>
>>> No, this is about the deinterlacer mem2mem device, which doesn't exist
>>> before this series.
>>
>> I am not convinced it is OK if the direct VDI path and mem2mem VDI
>> behave differently, that would be surprising to me as a user ?
> 
> Is this still about the frame rate doubling? Surely supporting it in
> the mem2mem device and not in the capture path is ok. I'm not arguing
> that frame doubling should be enabled by default.

My understanding was that your concern was that frame doubling should be
the default, because it not being the default would break userspace.
Maybe that's not the case?

>>> The CSI capture path already has configurable framedrops (in the CSI).
>>
>> What am I looking for ? git grep doesn't give me any hits ? (**)
> 
> That's configured by the set_frame_interval pad op of the CSI subdevice
> - on the IDMAC output pad. See csi_find_best_skip().
> 
>>>> But I think we can make the frame doubling configurable ?
>>>
>>> That would be good. Specifically, there must be no guarantee that one
>>> input frame with two fields only produces one deinterlaced output
>>> frame, and userspace should somehow be able to understand this.
>>
>> See my question (**) , where is this configurable framedrops thing ?
> 
> This would have to be done differently, though. Here we don't have
> subdev set_frame_interval configuration, and while VIDIOC_S_PARM /
> v4l2_captureparm were used to configure frame dropping on capture
> devices, that's not really applicable to mem2mem deinterlacers.

V4L2_CID_DEINTERLACING_MODE should probably work.
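
A minimal sketch of how that could look, modelled on the
vdic_ctrl_motion_menu control in imx-media-vdic.c; reusing
V4L2_CID_DEINTERLACING_MODE for rate selection and the menu entries
below are hypothetical:

	/* Hypothetical half/full rate menu for the mem2mem device. */
	static const char * const vdic_rate_menu[] = {
		"Half field rate (drop fields)",
		"Full field rate (frame doubling)",
	};

	ctrl = v4l2_ctrl_new_std_menu_items(&priv->ctrl_hdlr, &vdic_ctrl_ops,
					    V4L2_CID_DEINTERLACING_MODE,
					    ARRAY_SIZE(vdic_rate_menu) - 1,
					    0, 0, vdic_rate_menu);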

>>> I'd rather not default to the setting that throws away half of the
>>> input data. Not using frame doubling by default is sensible, but now
>>> that using all three input fields to calculate the output frame is
>>> possible, why not make that the default.
>>
>> To save memory bandwidth on the MX6, that's my main concern.
> 
> What userspace are you using to exercise this driver? Maybe we can back
> this concern with a few numbers (or mine with pictures).

Custom one, but with gstreamer 1.22 and 1.24 driving the media pipeline.
Marek Vasut Oct. 3, 2024, 5:13 p.m. UTC | #18
On 9/27/24 9:33 PM, Nicolas Dufresne wrote:
> Le mercredi 25 septembre 2024 à 22:45 +0200, Marek Vasut a écrit :
>> On 9/25/24 7:58 PM, Nicolas Dufresne wrote:
>>
>>
> 
> [...]
> 
>>
>>>> +static irqreturn_t ipu_mem2mem_vdic_nfb4eof_interrupt(int irq, void *dev_id)
>>>> +{
>>>> +	struct ipu_mem2mem_vdic_priv *priv = dev_id;
>>>> +
>>>> +	/* That is about all we can do about it, report it. */
>>>> +	dev_warn_ratelimited(priv->dev, "NFB4EOF error interrupt occurred\n");
>>>
>>> Not sure this is right. If that means ipu_mem2mem_vdic_eof_interrupt won't fire,
>>> then it means streamoff/close after that will hang forever, leaving a zombie
>>> process behind.
>>>
>>> Perhaps mark the buffers as ERROR, and finish the job.
>>
>> The NFB4EOF interrupt is generated when the VDIC didn't write (all of)
>> the output frame. I think it stands for "New Frame Before EOF" or some
>> such. Basically the currently written frame will be corrupted and the
>> next frame(s) are likely going to be OK again.
> 
> So the other IRQ will be triggered? After this one? If so, perhaps take a
> moment to mark the frames as ERROR (which means corrupted).

OK, fixed in V3.

> [...]
> 
>>>
>>> The driver is not taking ownership of prev_buf, only curr_buf is guaranteed to
>>> exist until v4l2_m2m_job_finish() is called. Userspace could streamoff, allocate
>>> new buffers, and then an old freed buffer may end up being used.
>>
>> So, what should I do about this ? Is there some way to ref the buffer to
>> keep it around ?
>>
>>> It's also unclear to me how userspace can avoid this ugly warning: how can you
>>> have curr_buf set the first time? (I might be missing something on this one
>>> though).
>>
>> The warning happens when streaming starts and there is only one input
>> frame available for the VDIC, which needs three fields to work
>> correctly. So, if there is only one input frame, the VDI uses the input
>> frame bottom field as PREV field for the prediction, and input frame top
>> and bottom fields as CURR and NEXT fields for the prediction, the result
>> may be one sub-optimal deinterlaced output frame (the first one). Once
>> another input frame gets enqueued, the VDIC uses the previous frame
>> bottom field as PREV and the newly enqueued frame top and bottom fields
>> as CURR and NEXT and the prediction works correctly from that point on.
> 
> Warnings by default are not acceptable.

This is a workaround so that older gstreamer versions would work; what
else can I do here?

>>> Perhaps what you want is a custom job_ready() callback, that ensure you have 2
>>> buffers in the OUTPUT queue ? You also need to ajust the CID
>>> MIN_BUFFERS_FOR_OUTPUT accordingly.
>>
>> I had that before, but gstreamer didn't enqueue the two frames for me,
>> so I got back to this variant for maximum compatibility.
> 
> It's well known that the GStreamer v4l2convert element has no support for
> deinterlacing and needs to be improved to support any deinterlace drivers out
> there.

It seems v4l2convert disable-passthrough=true works with deinterlacers
just fine, except for this one reused frame at stream start?

> Other drivers will simply hold on to output buffers until they have enough to produce
> the first valid picture. Holding meaning not marking them done, which keeps them
> in the ACTIVE state, which is tracked by the core for you.

As far as I understand this, when the EOF interrupt happens,
v4l2_m2m_src_buf_remove() pulls the oldest input buffer from the queue,
and that buffer is then marked as DONE (or ERROR in v3); that is the
->prev buffer, isn't it?

Once deinterlacing of the next frame starts, the (new) current frame and
the prev frame are both active, the deinterlacing happens, and then in
the EOF interrupt the ->prev frame gets marked as DONE again.

What am I missing here?

> [...]
> 
>>>> +
>>>> +	if (ipu_mem2mem_vdic_format_is_yuv420(f->fmt.pix.pixelformat))
>>>> +		f->fmt.pix.bytesperline = f->fmt.pix.width * 3 / 2;
>>>> +	else if (ipu_mem2mem_vdic_format_is_yuv422(f->fmt.pix.pixelformat))
>>>> +		f->fmt.pix.bytesperline = f->fmt.pix.width * 2;
>>>> +	else if (ipu_mem2mem_vdic_format_is_rgb16(f->fmt.pix.pixelformat))
>>>> +		f->fmt.pix.bytesperline = f->fmt.pix.width * 2;
>>>> +	else if (ipu_mem2mem_vdic_format_is_rgb24(f->fmt.pix.pixelformat))
>>>> +		f->fmt.pix.bytesperline = f->fmt.pix.width * 3;
>>>> +	else if (ipu_mem2mem_vdic_format_is_rgb32(f->fmt.pix.pixelformat))
>>>> +		f->fmt.pix.bytesperline = f->fmt.pix.width * 4;
>>>> +	else
>>>> +		f->fmt.pix.bytesperline = f->fmt.pix.width;
>>>> +
>>>> +	f->fmt.pix.sizeimage = f->fmt.pix.height * f->fmt.pix.bytesperline;
>>>
>>> And use v4l2-common ?
>>
>> I don't really understand, there is nothing in v4l2-common.c that would
>> be a really useful replacement for this?
> 
> Not sure I get your response, v4l2-common is used in many drivers already, and
> we intend to keep improving it so that all drivers use it in the long term. It
> was created because folks believed they could calculate bytesperline and
> sizeimage, but as the number of formats grows, it always ends up wrong, causing the
> HW to overflow and break the system at a larger scale.

Do you want me to introduce some new generic helper? Because I don't
see an existing generic one.
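
For reference, a minimal sketch of what the v4l2-common route could look
like, assuming v4l2_fill_pixfmt() knows every format this driver exposes:

	#include <media/v4l2-common.h>

	/* Replaces the open-coded bytesperline/sizeimage ladder in
	 * ipu_mem2mem_vdic_try_fmt(); the core computes both from
	 * its per-format v4l2_format_info table. */
	ret = v4l2_fill_pixfmt(&f->fmt.pix, f->fmt.pix.pixelformat,
			       f->fmt.pix.width, f->fmt.pix.height);
	if (ret)
		return ret;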

>>>> +	return 0;
>>>> +}
>>>> +
>>>> +static int ipu_mem2mem_vdic_s_fmt(struct file *file, void *fh, struct v4l2_format *f)
>>>> +{
>>>> +	struct ipu_mem2mem_vdic_ctx *ctx = fh_to_ctx(fh);
>>>> +	struct ipu_mem2mem_vdic_priv *priv = ctx->priv;
>>>> +	struct v4l2_pix_format *fmt, *infmt, *outfmt;
>>>> +	struct vb2_queue *vq;
>>>> +	int ret;
>>>> +
>>>> +	vq = v4l2_m2m_get_vq(ctx->fh.m2m_ctx, f->type);
>>>> +	if (vb2_is_busy(vq)) {
>>>> +		dev_err(priv->dev, "%s queue busy\n",  __func__);
>>>> +		return -EBUSY;
>>>> +	}
>>>> +
>>>> +	ret = ipu_mem2mem_vdic_try_fmt(file, fh, f);
>>>> +	if (ret < 0)
>>>> +		return ret;
>>>> +
>>>> +	fmt = ipu_mem2mem_vdic_get_format(priv, f->type);
>>>> +	*fmt = f->fmt.pix;
>>>> +
>>>> +	/* Propagate colorimetry to the capture queue */
>>>> +	infmt = ipu_mem2mem_vdic_get_format(priv, V4L2_BUF_TYPE_VIDEO_OUTPUT);
>>>> +	outfmt = ipu_mem2mem_vdic_get_format(priv, V4L2_BUF_TYPE_VIDEO_CAPTURE);
>>>> +	outfmt->colorspace = infmt->colorspace;
>>>> +	outfmt->ycbcr_enc = infmt->ycbcr_enc;
>>>> +	outfmt->xfer_func = infmt->xfer_func;
>>>> +	outfmt->quantization = infmt->quantization;
>>>
>>> So you can do CSC conversion but not colorimetry ? We have
>>> V4L2_PIX_FMT_FLAG_SET_CSC if you can do colorimetry transforms too. I have
>>> patches that I'll send for the csc-scaler driver.
>>
>> See ipu_ic_calc_csc() , that's what does the colorspace conversion in
>> this driver (on output from VDI).
> 
> int ipu_ic_calc_csc(struct ipu_ic_csc *csc,
>                      enum v4l2_ycbcr_encoding in_enc,
>                      enum v4l2_quantization in_quant,
>                      enum ipu_color_space in_cs,
>                      enum v4l2_ycbcr_encoding out_enc,
>                      enum v4l2_quantization out_quant,
>                      enum ipu_color_space out_cs)
> 
> So instead of simply overriding CSC like you do, let userspace set different CSC
> in and out, so that the IPU can handle the conversion properly with correct colors.
> That requires flagging these in the fmt_desc structure during format enumeration, and
> only acknowledging the CSC if userspace has set V4L2_PIX_FMT_FLAG_SET_CSC;
> in any other condition, the information must be ignored (which you don't do).
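
(A minimal sketch of the scheme Nicolas describes, using the V4L2 core
CSC flags; where exactly this lands in the driver is an assumption.)

	/* In enum_fmt on the capture queue: advertise that the driver
	 * can convert colorimetry for this format. */
	f->flags |= V4L2_FMT_FLAG_CSC_YCBCR_ENC |
		    V4L2_FMT_FLAG_CSC_QUANTIZATION;

	/* In s_fmt on the capture queue: take the userspace colorimetry
	 * only when explicitly requested, else keep propagating it from
	 * the OUTPUT queue as the driver does now. */
	if (f->fmt.pix.flags & V4L2_PIX_FMT_FLAG_SET_CSC) {
		outfmt->ycbcr_enc = f->fmt.pix.ycbcr_enc;
		outfmt->quantization = f->fmt.pix.quantization;
	} else {
		outfmt->ycbcr_enc = infmt->ycbcr_enc;
		outfmt->quantization = infmt->quantization;
	}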

The input is from the VDI and that always has to be YUV. Can you maybe
just CC me on the CSC-scaler patches? Then I'll see what can be done here.

Thanks
Philipp Zabel Oct. 8, 2024, 2:23 p.m. UTC | #19
On Do, 2024-10-03 at 16:57 +0200, Marek Vasut wrote:
> On 9/26/24 1:16 PM, Philipp Zabel wrote:
> > On Mi, 2024-09-25 at 22:45 +0200, Marek Vasut wrote:
> > [...]
> > > > The driver is not taking ownership of prev_buf, only curr_buf is guaranteed to
> > > > exist until v4l2_m2m_job_finish() is called. Userspace could streamoff, allocate
> > > > new buffers, and then an old freed buffer may end up being used.
> > > 
> > > So, what should I do about this ? Is there some way to ref the buffer to
> > > keep it around ?
> > 
> > Have a look how other deinterlacers with temporal filtering do it.
> > sunxi/sun8i-di or ti/vpe look like candidates.

I don't see exactly what those drivers are doing differently to protect
the prev buffer during deinterlacing. Can you be more specific?

In the EOF interrupt you are calling v4l2_m2m_buf_done() on src_buf,
which should be the same as ctx->curr_buf in the previous device_run.
Instead, you could release ctx->prev_buf and then store src_buf into
ctx->prev_buf. Storing curr_buf on the ctx doesn't seem to be necessary
at all. The mentioned deinterlacer drivers do something similar [1][2].

[1] https://elixir.bootlin.com/linux/master/source/drivers/media/platform/sunxi/sun8i-di/sun8i-di.c#L236
[2] https://elixir.bootlin.com/linux/master/source/drivers/media/platform/ti/vpe/vpe.c#L1481
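
A minimal sketch of that scheme against this driver's names (sequence
numbering, metadata copying and the NFB4EOF error case are elided):

	static irqreturn_t ipu_mem2mem_vdic_eof_interrupt(int irq, void *dev_id)
	{
		struct ipu_mem2mem_vdic_priv *priv = dev_id;
		struct ipu_mem2mem_vdic_ctx *ctx = priv->curr_ctx;
		struct vb2_v4l2_buffer *src_buf, *dst_buf;

		spin_lock(&priv->irqlock);

		src_buf = v4l2_m2m_src_buf_remove(ctx->fh.m2m_ctx);
		dst_buf = v4l2_m2m_dst_buf_remove(ctx->fh.m2m_ctx);

		/* The PREV field of this run is no longer referenced. */
		if (ctx->prev_buf)
			v4l2_m2m_buf_done(ctx->prev_buf, VB2_BUF_STATE_DONE);

		/* Keep src_buf alive; it is the PREV field of the next run. */
		ctx->prev_buf = src_buf;

		v4l2_m2m_buf_done(dst_buf, VB2_BUF_STATE_DONE);
		v4l2_m2m_job_finish(priv->m2m_dev, ctx->fh.m2m_ctx);

		spin_unlock(&priv->irqlock);

		return IRQ_HANDLED;
	}

(The leftover prev_buf would still need to be released on stop_streaming.)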

regards
Philipp
Nicolas Dufresne Oct. 15, 2024, 5:31 p.m. UTC | #20
Le lundi 29 juillet 2024 à 04:16 +0200, Marek Vasut a écrit :
> On 7/24/24 6:08 PM, Nicolas Dufresne wrote:
> > Hi Marek,
> 
> Hi,
> 
> > Le mercredi 24 juillet 2024 à 02:19 +0200, Marek Vasut a écrit :
> > > Introduce dedicated memory-to-memory IPUv3 VDI deinterlacer driver.
> > > Currently the IPUv3 can operate VDI in DIRECT mode, from sensor to
> > > memory. This only works for single stream, that is, one input from
> > > one camera is deinterlaced on the fly with a helper buffer in DRAM
> > > and the result is written into memory.
> > > 
> > > The i.MX6Q/QP does support up to four analog cameras via two IPUv3
> > > instances, each containing one VDI deinterlacer block. In order to
> > > deinterlace all four streams from all four analog cameras live, it
> > > is necessary to operate VDI in INDIRECT mode, where the interlaced
> > > streams are written to buffers in memory, and then deinterlaced in
> > > memory using VDI in INDIRECT memory-to-memory mode.
> > 
> > Just a quick design question. Is it possible to chain the deinterlacer and the
> > csc-scaler ?
> 
> I think you could do that.
> 
> > If so, it would be much more efficient if all this could be
> > combined into the existing m2m driver, since you could save a memory roundtrip
> > when needing to deinterlace, change the colorspace and possibly scale too.
> 
> The existing PRP/IC driver is similar to what this driver does, yes, but 
> it uses a different DMA path , I believe it is IDMAC->PRP->IC->IDMAC . 
> This driver uses IDMAC->VDI->IC->IDMAC . I am not convinced mixing the 
> two paths into a single driver would be beneficial, but I am reasonably 
> sure it would be very convoluted. Instead, this driver could be extended 
> to do deinterlacing and scaling using the IC if that was needed. I think 
> that would be the cleaner approach.

No strong opinion; in an ideal world all these hacks are removed and we do a
single multi-context / m2m media controller that lets the user pick the path they
need for their task. When I look at the hardware documentation, you can do
inline from VDI to IC, and having the IC in both drivers duplicates the CSC
handling. If you allow bypassing the VDI, then you have a duplicated driver and
highly confused users. The fact that the ipuv3 (internal) DRM driver does not have
the VDI already seems to be because the display controller driver is missing
interlaced video support, but I could be wrong. Same if you want to support the IRT
(even though that is not inline, but using a custom memory protocol).

Nicolas
Nicolas Dufresne Oct. 15, 2024, 5:46 p.m. UTC | #21
Le mardi 24 septembre 2024 à 17:28 +0200, Marek Vasut a écrit :
> On 9/6/24 11:01 AM, Philipp Zabel wrote:
> 
> Hi,
> 
> > > diff --git a/drivers/staging/media/imx/imx-media-dev.c b/drivers/staging/media/imx/imx-media-dev.c
> > > index be54dca11465d..a841fdb4c2394 100644
> > > --- a/drivers/staging/media/imx/imx-media-dev.c
> > > +++ b/drivers/staging/media/imx/imx-media-dev.c
> > > @@ -57,7 +57,52 @@ static int imx6_media_probe_complete(struct v4l2_async_notifier *notifier)
> > >   		goto unlock;
> > >   	}
> > >   
> > > +	imxmd->m2m_vdic[0] = imx_media_mem2mem_vdic_init(imxmd, 0);
> > > +	if (IS_ERR(imxmd->m2m_vdic[0])) {
> > > +		ret = PTR_ERR(imxmd->m2m_vdic[0]);
> > > +		imxmd->m2m_vdic[0] = NULL;
> > > +		goto unlock;
> > > +	}
> > > +
> > > +	/* MX6S/DL has one IPUv3, init second VDI only on MX6Q/QP */
> > > +	if (imxmd->ipu[1]) {
> > > +		imxmd->m2m_vdic[1] = imx_media_mem2mem_vdic_init(imxmd, 1);
> > > +		if (IS_ERR(imxmd->m2m_vdic[1])) {
> > > +			ret = PTR_ERR(imxmd->m2m_vdic[1]);
> > > +			imxmd->m2m_vdic[1] = NULL;
> > > +			goto uninit_vdi0;
> > > +		}
> > > +	}
> > 
> > Instead of presenting two devices to userspace, it would be better to
> > have a single video device that can distribute work to both IPUs.
> 
> Why do you think so ?
> 
> I think it is better to keep the kernel code as simple as possible, i.e. 
> provide the device node for each m2m device to userspace and handle the 
> m2m device hardware interaction in the kernel driver, but let userspace 
> take care of policy like job scheduling, access permissions assignment 
> to each device (e.g. if different user accounts should have access to 
> different VDICs), or other such topics.

We have run through this topic already for multi-core stateless CODECs. It is
preferable to schedule interchangeable cores inside the Linux kernel.
> 
> > To be fair, we never implemented that for the CSC/scaler mem2mem device
> > either.
> 
> I don't think that is actually a good idea. Instead, it would be better 
> to have two scaler nodes in userspace.

It is impossible for userspace to properly dispatch the work and ensure maximal
performance across multiple processes. As long as there is no state that can reside
on the chip, of course.

Nicolas

> 
> [...]
> 
> > > +++ b/drivers/staging/media/imx/imx-media-mem2mem-vdic.c
> > > @@ -0,0 +1,997 @@
> > > +// SPDX-License-Identifier: GPL-2.0-or-later
> > > +/*
> > > + * i.MX VDIC mem2mem de-interlace driver
> > > + *
> > > + * Copyright (c) 2024 Marek Vasut <marex@denx.de>
> > > + *
> > > + * Based on previous VDIC mem2mem work by Steve Longerbeam that is:
> > > + * Copyright (c) 2018 Mentor Graphics Inc.
> > > + */
> > > +
> > > +#include <linux/delay.h>
> > > +#include <linux/fs.h>
> > > +#include <linux/module.h>
> > > +#include <linux/sched.h>
> > > +#include <linux/slab.h>
> > > +#include <linux/version.h>
> > > +
> > > +#include <media/media-device.h>
> > > +#include <media/v4l2-ctrls.h>
> > > +#include <media/v4l2-device.h>
> > > +#include <media/v4l2-event.h>
> > > +#include <media/v4l2-ioctl.h>
> > > +#include <media/v4l2-mem2mem.h>
> > > +#include <media/videobuf2-dma-contig.h>
> > > +
> > > +#include "imx-media.h"
> > > +
> > > +#define fh_to_ctx(__fh)	container_of(__fh, struct ipu_mem2mem_vdic_ctx, fh)
> > > +
> > > +#define to_mem2mem_priv(v) container_of(v, struct ipu_mem2mem_vdic_priv, vdev)
> > 
> > These could be inline functions for added type safety.
> 
> Fixed in v3
> 
> [...]
> 
> > > +static void ipu_mem2mem_vdic_device_run(void *_ctx)
> > > +{
> > > +	struct ipu_mem2mem_vdic_ctx *ctx = _ctx;
> > > +	struct ipu_mem2mem_vdic_priv *priv = ctx->priv;
> > > +	struct vb2_v4l2_buffer *curr_buf, *dst_buf;
> > > +	dma_addr_t prev_phys, curr_phys, out_phys;
> > > +	struct v4l2_pix_format *infmt;
> > > +	u32 phys_offset = 0;
> > > +	unsigned long flags;
> > > +
> > > +	infmt = ipu_mem2mem_vdic_get_format(priv, V4L2_BUF_TYPE_VIDEO_OUTPUT);
> > > +	if (V4L2_FIELD_IS_SEQUENTIAL(infmt->field))
> > > +		phys_offset = infmt->sizeimage / 2;
> > > +	else if (V4L2_FIELD_IS_INTERLACED(infmt->field))
> > > +		phys_offset = infmt->bytesperline;
> > > +	else
> > > +		dev_err(priv->dev, "Invalid field %d\n", infmt->field);
> > > +
> > > +	dst_buf = v4l2_m2m_next_dst_buf(ctx->fh.m2m_ctx);
> > > +	out_phys = vb2_dma_contig_plane_dma_addr(&dst_buf->vb2_buf, 0);
> > > +
> > > +	curr_buf = v4l2_m2m_next_src_buf(ctx->fh.m2m_ctx);
> > > +	if (!curr_buf) {
> > > +		dev_err(priv->dev, "Not enough buffers\n");
> > > +		return;
> > > +	}
> > > +
> > > +	spin_lock_irqsave(&priv->irqlock, flags);
> > > +
> > > +	if (ctx->curr_buf) {
> > > +		ctx->prev_buf = ctx->curr_buf;
> > > +		ctx->curr_buf = curr_buf;
> > > +	} else {
> > > +		ctx->prev_buf = curr_buf;
> > > +		ctx->curr_buf = curr_buf;
> > > +		dev_warn(priv->dev, "Single-buffer mode, fix your userspace\n");
> > > +	}
> > > +
> > > +	prev_phys = vb2_dma_contig_plane_dma_addr(&ctx->prev_buf->vb2_buf, 0);
> > > +	curr_phys = vb2_dma_contig_plane_dma_addr(&ctx->curr_buf->vb2_buf, 0);
> > > +
> > > +	priv->curr_ctx = ctx;
> > > +	spin_unlock_irqrestore(&priv->irqlock, flags);
> > > +
> > > +	ipu_cpmem_set_buffer(priv->vdi_out_ch,  0, out_phys);
> > > +	ipu_cpmem_set_buffer(priv->vdi_in_ch_p, 0, prev_phys + phys_offset);
> > > +	ipu_cpmem_set_buffer(priv->vdi_in_ch,   0, curr_phys);
> > > +	ipu_cpmem_set_buffer(priv->vdi_in_ch_n, 0, curr_phys + phys_offset);
> > 
> > This always outputs at a frame rate of half the field rate, and only
> > top fields are ever used as current field, and bottom fields as
> > previous/next fields, right?
> 
> Yes, currently the driver extracts 1 frame from two consecutive incoming 
> fields (previous Bottom, and current Top and Bottom):
> 
> (frame 1 and 3 below is omitted)
> 
>      1  2  3  4
> ...|T |T |T |T |...
> ...| B| B| B| B|...
>       | ||  | ||
>       '-''  '-''
>        ||    ||
>        ||    \/
>        \/  Frame#4
>      Frame#2
> 
> As far as I understand it, this is how the current VDI implementation 
> behaves too, right ?
> 
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/staging/media/imx/imx-media-vdic.c#n207
> 
> > I think it would be good to add a mode that doesn't drop the
> > 
> > 	ipu_cpmem_set_buffer(priv->vdi_in_ch_p, 0, prev_phys);
> > 	ipu_cpmem_set_buffer(priv->vdi_in_ch,   0, prev_phys + phys_offset);
> > 	ipu_cpmem_set_buffer(priv->vdi_in_ch_n, 0, curr_phys);
> > 
> > output frames, right from the start.
> 
> This would make the VDI act as a frame-rate doubler, which would spend a 
> lot more memory bandwidth, which is limited on MX6, so I would also like 
> to have a frame-drop mode (i.e. current behavior).
> 
> Can we make that behavior configurable ? Since this is a mem2mem device, 
> we do not really have any notion of input and output frame-rate, so I 
> suspect this would need some VIDIOC_* ioctl ?
> 
> > If we don't start with that supported, I fear userspace will make
> > assumptions and be surprised when a full rate mode is added later.
> 
> I'm afraid that since the current VDI already does retain input frame 
> rate instead of doubling it, the userspace already makes an assumption, 
> so that ship has sailed.
> 
> But I think we can make the frame doubling configurable ?
> 
> > > +	/* No double buffering, always pick buffer 0 */
> > > +	ipu_idmac_select_buffer(priv->vdi_out_ch, 0);
> > > +	ipu_idmac_select_buffer(priv->vdi_in_ch_p, 0);
> > > +	ipu_idmac_select_buffer(priv->vdi_in_ch, 0);
> > > +	ipu_idmac_select_buffer(priv->vdi_in_ch_n, 0);
> > > +
> > > +	/* Enable the channels */
> > > +	ipu_idmac_enable_channel(priv->vdi_out_ch);
> > > +	ipu_idmac_enable_channel(priv->vdi_in_ch_p);
> > > +	ipu_idmac_enable_channel(priv->vdi_in_ch);
> > > +	ipu_idmac_enable_channel(priv->vdi_in_ch_n);
> > > +}
> 
> [...]
> 
> > > +static int ipu_mem2mem_vdic_setup_hardware(struct ipu_mem2mem_vdic_priv *priv)
> > > +{
> > > +	struct v4l2_pix_format *infmt, *outfmt;
> > > +	struct ipu_ic_csc csc;
> > > +	bool in422, outyuv;
> > > +	int ret;
> > > +
> > > +	infmt = ipu_mem2mem_vdic_get_format(priv, V4L2_BUF_TYPE_VIDEO_OUTPUT);
> > > +	outfmt = ipu_mem2mem_vdic_get_format(priv, V4L2_BUF_TYPE_VIDEO_CAPTURE);
> > > +	in422 = ipu_mem2mem_vdic_format_is_yuv422(infmt->pixelformat);
> > > +	outyuv = ipu_mem2mem_vdic_format_is_yuv(outfmt->pixelformat);
> > > +
> > > +	ipu_vdi_setup(priv->vdi, in422, infmt->width, infmt->height);
> > > +	ipu_vdi_set_field_order(priv->vdi, V4L2_STD_UNKNOWN, infmt->field);
> > > +	ipu_vdi_set_motion(priv->vdi, HIGH_MOTION);
> > 
> > This maps to VDI_C_MOT_SEL_FULL aka VDI_MOT_SEL=2, which is documented
> > as "full motion, only vertical filter is used". Doesn't this completely
> > ignore the previous/next fields and only use the output of the di_vfilt
> > four tap vertical filter block to fill in missing lines from the
> > surrounding pixels (above and below) of the current field?
> 
> Is there a suitable knob for this or shall I introduce a device specific 
> one, like the vdic_ctrl_motion_menu for the current VDIC direct driver ?
> 
> If we introduce such a knob, then it is all the more reason to provide 
> one device node per one VDIC hardware instance, since each can be 
> configured for different motion settings.
> 
> > I think this should at least be configurable, and probably default to
> > MED_MOTION.
> 
> I think to be compatible with the current VDI behavior and to reduce 
> memory bandwidth usage, let's default to the HIGH/full mode. That one 
> produces reasonably good results without spending too much memory 
> bandwidth which is constrained already on the MX6, and if the user needs 
> better image quality, they can configure another mode using the V4L2 
> control.
> 
> [...]
>
Nicolas Dufresne Oct. 15, 2024, 6:13 p.m. UTC | #22
Le jeudi 03 octobre 2024 à 16:57 +0200, Marek Vasut a écrit :
> On 9/26/24 1:16 PM, Philipp Zabel wrote:
> > On Mi, 2024-09-25 at 22:45 +0200, Marek Vasut wrote:
> > [...]
> > > > The driver is not taking ownership of prev_buf, only curr_buf is guaranteed to
> > > > exist until v4l2_m2m_job_finish() is called. Userspace could streamoff, allocate
> > > > new buffers, and then an old freed buffer may end up being used.
> > > 
> > > So, what should I do about this ? Is there some way to ref the buffer to
> > > keep it around ?
> > 
> > Have a look how other deinterlacers with temporal filtering do it.
> > sunxi/sun8i-di or ti/vpe look like candidates.
> I don't see exactly what those drivers are doing differently to protect 
> the prev buffer during deinterlacing. Can you be more specific?

drivers/media/platform/sunxi/sun8i-di/sun8i-di.c:

                src = v4l2_m2m_src_buf_remove(ctx->fh.m2m_ctx);
                if (ctx->prev)
                        v4l2_m2m_buf_done(ctx->prev, state);
                ctx->prev = src;


What that does is that whenever a src buffer has been processed and needs to be
kept as prev, it is removed from the m2m pending queue
(v4l2_m2m_src_buf_remove()), but not marked done. At the VB2 level it means that
the buffer will keep its ACTIVE/QUEUED state, meaning it is currently under driver
ownership. I also expect the driver to start producing frames on the second
device run, but I didn't spend the extra time to check if that is the case for
the sun8i-di driver.

As for the GStreamer wrapper, since it does not support deinterlacing, it does not
always allocate this one extra buffer for prev. If the driver implements the
MIN_BUFFERS_FOR_OUTPUT CID though, it will allocate a matching number of extras.
Though, this has a side effect at the driver level, since streaming start will be
delayed until 2 buffers have been queued, and either way you need to queue 2 buffers
before the driver produces its first buffer.
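
A minimal sketch of the driver side, assuming a per-context control
handler is added (this driver does not have one yet):

	/* Advertise that two OUTPUT buffers must be queued before the
	 * first deinterlaced frame can be produced. */
	v4l2_ctrl_handler_init(&ctx->ctrl_hdlr, 1);
	v4l2_ctrl_new_std(&ctx->ctrl_hdlr, NULL,
			  V4L2_CID_MIN_BUFFERS_FOR_OUTPUT, 2, 2, 1, 2);
	ctx->fh.ctrl_handler = &ctx->ctrl_hdlr;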

This comes to the next reason why the wrapper will fail: for each buffer
that is pushed, it synchronously waits for the output. So it systematically stalls
on the first frame. As the author of that wrapper, I'm well aware of that, but never
had a use case where I needed to fix it. I will be happy to accept support for
that, though in the current mainline state, there is no generic way to actually
know. One way is to thread the transform, but then the GstBaseTransform class can't
be used; it's a lot of work and adds complexity.

We can certainly fix the gstv4l2transform.c behaviour by adding
MIN_BUFFERS_FOR_OUTPUT in upstream drivers. That would be easy to handle by
adding a matching buffering delay. These deinterlacers work for Kodi, since the
userspace code there is not generic and has internal knowledge of the
hardware it is running on.

Nicolas
diff mbox series

Patch

diff --git a/drivers/staging/media/imx/Makefile b/drivers/staging/media/imx/Makefile
index 330e0825f506b..0cad87123b590 100644
--- a/drivers/staging/media/imx/Makefile
+++ b/drivers/staging/media/imx/Makefile
@@ -4,7 +4,7 @@  imx-media-common-objs := imx-media-capture.o imx-media-dev-common.o \
 
 imx6-media-objs := imx-media-dev.o imx-media-internal-sd.o \
 	imx-ic-common.o imx-ic-prp.o imx-ic-prpencvf.o imx-media-vdic.o \
-	imx-media-csc-scaler.o
+	imx-media-mem2mem-vdic.o imx-media-csc-scaler.o
 
 imx6-media-csi-objs := imx-media-csi.o imx-media-fim.o
 
diff --git a/drivers/staging/media/imx/imx-media-dev.c b/drivers/staging/media/imx/imx-media-dev.c
index be54dca11465d..a841fdb4c2394 100644
--- a/drivers/staging/media/imx/imx-media-dev.c
+++ b/drivers/staging/media/imx/imx-media-dev.c
@@ -57,7 +57,52 @@  static int imx6_media_probe_complete(struct v4l2_async_notifier *notifier)
 		goto unlock;
 	}
 
+	imxmd->m2m_vdic[0] = imx_media_mem2mem_vdic_init(imxmd, 0);
+	if (IS_ERR(imxmd->m2m_vdic[0])) {
+		ret = PTR_ERR(imxmd->m2m_vdic[0]);
+		imxmd->m2m_vdic[0] = NULL;
+		goto unlock;
+	}
+
+	/* MX6S/DL has one IPUv3, init second VDI only on MX6Q/QP */
+	if (imxmd->ipu[1]) {
+		imxmd->m2m_vdic[1] = imx_media_mem2mem_vdic_init(imxmd, 1);
+		if (IS_ERR(imxmd->m2m_vdic[1])) {
+			ret = PTR_ERR(imxmd->m2m_vdic[1]);
+			imxmd->m2m_vdic[1] = NULL;
+			goto uninit_vdi0;
+		}
+	}
+
 	ret = imx_media_csc_scaler_device_register(imxmd->m2m_vdev);
+	if (ret)
+		goto uninit_vdi1;
+
+	ret = imx_media_mem2mem_vdic_register(imxmd->m2m_vdic[0]);
+	if (ret)
+		goto unreg_csc;
+
+	/* MX6S/DL has one IPUv3, init second VDI only on MX6Q/QP */
+	if (imxmd->ipu[1]) {
+		ret = imx_media_mem2mem_vdic_register(imxmd->m2m_vdic[1]);
+		if (ret)
+			goto unreg_vdic;
+	}
+
+	mutex_unlock(&imxmd->mutex);
+	return ret;
+
+unreg_vdic:
+	imx_media_mem2mem_vdic_unregister(imxmd->m2m_vdic[0]);
+	imxmd->m2m_vdic[0] = NULL;
+unreg_csc:
+	imx_media_csc_scaler_device_unregister(imxmd->m2m_vdev);
+	imxmd->m2m_vdev = NULL;
+uninit_vdi1:
+	if (imxmd->ipu[1])
+		imx_media_mem2mem_vdic_uninit(imxmd->m2m_vdic[1]);
+uninit_vdi0:
+	imx_media_mem2mem_vdic_uninit(imxmd->m2m_vdic[0]);
 unlock:
 	mutex_unlock(&imxmd->mutex);
 	return ret;
@@ -108,6 +153,16 @@  static void imx_media_remove(struct platform_device *pdev)
 
 	v4l2_info(&imxmd->v4l2_dev, "Removing imx-media\n");
 
+	if (imxmd->m2m_vdic[1]) {	/* MX6Q/QP only */
+		imx_media_mem2mem_vdic_unregister(imxmd->m2m_vdic[1]);
+		imxmd->m2m_vdic[1] = NULL;
+	}
+
+	if (imxmd->m2m_vdic[0]) {
+		imx_media_mem2mem_vdic_unregister(imxmd->m2m_vdic[0]);
+		imxmd->m2m_vdic[0] = NULL;
+	}
+
 	if (imxmd->m2m_vdev) {
 		imx_media_csc_scaler_device_unregister(imxmd->m2m_vdev);
 		imxmd->m2m_vdev = NULL;
diff --git a/drivers/staging/media/imx/imx-media-mem2mem-vdic.c b/drivers/staging/media/imx/imx-media-mem2mem-vdic.c
new file mode 100644
index 0000000000000..71c6c023d2bf8
--- /dev/null
+++ b/drivers/staging/media/imx/imx-media-mem2mem-vdic.c
@@ -0,0 +1,997 @@ 
+// SPDX-License-Identifier: GPL-2.0-or-later
+/*
+ * i.MX VDIC mem2mem de-interlace driver
+ *
+ * Copyright (c) 2024 Marek Vasut <marex@denx.de>
+ *
+ * Based on previous VDIC mem2mem work by Steve Longerbeam that is:
+ * Copyright (c) 2018 Mentor Graphics Inc.
+ */
+
+#include <linux/delay.h>
+#include <linux/fs.h>
+#include <linux/module.h>
+#include <linux/sched.h>
+#include <linux/slab.h>
+#include <linux/version.h>
+
+#include <media/media-device.h>
+#include <media/v4l2-ctrls.h>
+#include <media/v4l2-device.h>
+#include <media/v4l2-event.h>
+#include <media/v4l2-ioctl.h>
+#include <media/v4l2-mem2mem.h>
+#include <media/videobuf2-dma-contig.h>
+
+#include "imx-media.h"
+
+#define fh_to_ctx(__fh)	container_of(__fh, struct ipu_mem2mem_vdic_ctx, fh)
+
+#define to_mem2mem_priv(v) container_of(v, struct ipu_mem2mem_vdic_priv, vdev)
+
+enum {
+	V4L2_M2M_SRC = 0,
+	V4L2_M2M_DST = 1,
+};
+
+struct ipu_mem2mem_vdic_ctx;
+
+struct ipu_mem2mem_vdic_priv {
+	struct imx_media_video_dev	vdev;
+	struct imx_media_dev		*md;
+	struct device			*dev;
+	struct ipu_soc			*ipu_dev;
+	int				ipu_id;
+
+	struct v4l2_m2m_dev		*m2m_dev;
+	struct mutex			mutex;		/* mem2mem device mutex */
+
+	/* VDI resources */
+	struct ipu_vdi			*vdi;
+	struct ipu_ic			*ic;
+	struct ipuv3_channel		*vdi_in_ch_p;
+	struct ipuv3_channel		*vdi_in_ch;
+	struct ipuv3_channel		*vdi_in_ch_n;
+	struct ipuv3_channel		*vdi_out_ch;
+	int				eof_irq;
+	int				nfb4eof_irq;
+	spinlock_t			irqlock;	/* protect eof_irq handler */
+
+	atomic_t			stream_count;
+
+	struct ipu_mem2mem_vdic_ctx	*curr_ctx;
+
+	struct v4l2_pix_format		fmt[2];
+};
+
+struct ipu_mem2mem_vdic_ctx {
+	struct ipu_mem2mem_vdic_priv	*priv;
+	struct v4l2_fh			fh;
+	unsigned int			sequence;
+	struct vb2_v4l2_buffer		*prev_buf;
+	struct vb2_v4l2_buffer		*curr_buf;
+};
+
+static struct v4l2_pix_format *
+ipu_mem2mem_vdic_get_format(struct ipu_mem2mem_vdic_priv *priv,
+			    enum v4l2_buf_type type)
+{
+	return &priv->fmt[V4L2_TYPE_IS_OUTPUT(type) ? V4L2_M2M_SRC : V4L2_M2M_DST];
+}
+
+static bool ipu_mem2mem_vdic_format_is_yuv420(const u32 pixelformat)
+{
+	/* All 4:2:0 subsampled formats supported by this hardware */
+	return pixelformat == V4L2_PIX_FMT_YUV420 ||
+	       pixelformat == V4L2_PIX_FMT_YVU420 ||
+	       pixelformat == V4L2_PIX_FMT_NV12;
+}
+
+static bool ipu_mem2mem_vdic_format_is_yuv422(const u32 pixelformat)
+{
+	/* All 4:2:2 subsampled formats supported by this hardware */
+	return pixelformat == V4L2_PIX_FMT_UYVY ||
+	       pixelformat == V4L2_PIX_FMT_YUYV ||
+	       pixelformat == V4L2_PIX_FMT_YUV422P ||
+	       pixelformat == V4L2_PIX_FMT_NV16;
+}
+
+static bool ipu_mem2mem_vdic_format_is_yuv(const u32 pixelformat)
+{
+	return ipu_mem2mem_vdic_format_is_yuv420(pixelformat) ||
+	       ipu_mem2mem_vdic_format_is_yuv422(pixelformat);
+}
+
+static bool ipu_mem2mem_vdic_format_is_rgb16(const u32 pixelformat)
+{
+	/* All 16-bit RGB formats supported by this hardware */
+	return pixelformat == V4L2_PIX_FMT_RGB565;
+}
+
+static bool ipu_mem2mem_vdic_format_is_rgb24(const u32 pixelformat)
+{
+	/* All 24-bit RGB formats supported by this hardware */
+	return pixelformat == V4L2_PIX_FMT_RGB24 ||
+	       pixelformat == V4L2_PIX_FMT_BGR24;
+}
+
+static bool ipu_mem2mem_vdic_format_is_rgb32(const u32 pixelformat)
+{
+	/* All 32-bit RGB formats supported by this hardware */
+	return pixelformat == V4L2_PIX_FMT_XRGB32 ||
+	       pixelformat == V4L2_PIX_FMT_XBGR32 ||
+	       pixelformat == V4L2_PIX_FMT_BGRX32 ||
+	       pixelformat == V4L2_PIX_FMT_RGBX32;
+}
+
+/*
+ * mem2mem callbacks
+ */
+static irqreturn_t ipu_mem2mem_vdic_eof_interrupt(int irq, void *dev_id)
+{
+	struct ipu_mem2mem_vdic_priv *priv = dev_id;
+	struct ipu_mem2mem_vdic_ctx *ctx = priv->curr_ctx;
+	struct vb2_v4l2_buffer *src_buf, *dst_buf;
+
+	spin_lock(&priv->irqlock);
+
+	src_buf = v4l2_m2m_src_buf_remove(ctx->fh.m2m_ctx);
+	dst_buf = v4l2_m2m_dst_buf_remove(ctx->fh.m2m_ctx);
+
+	v4l2_m2m_buf_copy_metadata(src_buf, dst_buf, true);
+
+	src_buf->sequence = ctx->sequence++;
+	dst_buf->sequence = src_buf->sequence;
+
+	v4l2_m2m_buf_done(src_buf, VB2_BUF_STATE_DONE);
+	v4l2_m2m_buf_done(dst_buf, VB2_BUF_STATE_DONE);
+
+	v4l2_m2m_job_finish(priv->m2m_dev, ctx->fh.m2m_ctx);
+
+	spin_unlock(&priv->irqlock);
+
+	return IRQ_HANDLED;
+}
+
+static irqreturn_t ipu_mem2mem_vdic_nfb4eof_interrupt(int irq, void *dev_id)
+{
+	struct ipu_mem2mem_vdic_priv *priv = dev_id;
+
+	/* That is about all we can do about it, report it. */
+	dev_warn_ratelimited(priv->dev, "NFB4EOF error interrupt occurred\n");
+
+	return IRQ_HANDLED;
+}
+
+static void ipu_mem2mem_vdic_device_run(void *_ctx)
+{
+	struct ipu_mem2mem_vdic_ctx *ctx = _ctx;
+	struct ipu_mem2mem_vdic_priv *priv = ctx->priv;
+	struct vb2_v4l2_buffer *curr_buf, *dst_buf;
+	dma_addr_t prev_phys, curr_phys, out_phys;
+	struct v4l2_pix_format *infmt;
+	u32 phys_offset = 0;
+	unsigned long flags;
+
+	infmt = ipu_mem2mem_vdic_get_format(priv, V4L2_BUF_TYPE_VIDEO_OUTPUT);
+	if (V4L2_FIELD_IS_SEQUENTIAL(infmt->field))
+		phys_offset = infmt->sizeimage / 2;
+	else if (V4L2_FIELD_IS_INTERLACED(infmt->field))
+		phys_offset = infmt->bytesperline;
+	else
+		dev_err(priv->dev, "Invalid field %d\n", infmt->field);
+
+	dst_buf = v4l2_m2m_next_dst_buf(ctx->fh.m2m_ctx);
+	out_phys = vb2_dma_contig_plane_dma_addr(&dst_buf->vb2_buf, 0);
+
+	curr_buf = v4l2_m2m_next_src_buf(ctx->fh.m2m_ctx);
+	if (!curr_buf) {
+		dev_err(priv->dev, "Not enough buffers\n");
+		return;
+	}
+
+	spin_lock_irqsave(&priv->irqlock, flags);
+
+	if (ctx->curr_buf) {
+		ctx->prev_buf = ctx->curr_buf;
+		ctx->curr_buf = curr_buf;
+	} else {
+		ctx->prev_buf = curr_buf;
+		ctx->curr_buf = curr_buf;
+		dev_warn(priv->dev, "Single-buffer mode, fix your userspace\n");
+	}
+
+	prev_phys = vb2_dma_contig_plane_dma_addr(&ctx->prev_buf->vb2_buf, 0);
+	curr_phys = vb2_dma_contig_plane_dma_addr(&ctx->curr_buf->vb2_buf, 0);
+
+	priv->curr_ctx = ctx;
+	spin_unlock_irqrestore(&priv->irqlock, flags);
+
+	ipu_cpmem_set_buffer(priv->vdi_out_ch,  0, out_phys);
+	ipu_cpmem_set_buffer(priv->vdi_in_ch_p, 0, prev_phys + phys_offset);
+	ipu_cpmem_set_buffer(priv->vdi_in_ch,   0, curr_phys);
+	ipu_cpmem_set_buffer(priv->vdi_in_ch_n, 0, curr_phys + phys_offset);
+
+	/* No double buffering, always pick buffer 0 */
+	ipu_idmac_select_buffer(priv->vdi_out_ch, 0);
+	ipu_idmac_select_buffer(priv->vdi_in_ch_p, 0);
+	ipu_idmac_select_buffer(priv->vdi_in_ch, 0);
+	ipu_idmac_select_buffer(priv->vdi_in_ch_n, 0);
+
+	/* Enable the channels */
+	ipu_idmac_enable_channel(priv->vdi_out_ch);
+	ipu_idmac_enable_channel(priv->vdi_in_ch_p);
+	ipu_idmac_enable_channel(priv->vdi_in_ch);
+	ipu_idmac_enable_channel(priv->vdi_in_ch_n);
+}
+
+/*
+ * Video ioctls
+ */
+static int ipu_mem2mem_vdic_querycap(struct file *file, void *priv,
+				     struct v4l2_capability *cap)
+{
+	strscpy(cap->driver, "imx-m2m-vdic", sizeof(cap->driver));
+	strscpy(cap->card, "imx-m2m-vdic", sizeof(cap->card));
+	strscpy(cap->bus_info, "platform:imx-m2m-vdic", sizeof(cap->bus_info));
+	cap->device_caps = V4L2_CAP_VIDEO_M2M | V4L2_CAP_STREAMING;
+	cap->capabilities = cap->device_caps | V4L2_CAP_DEVICE_CAPS;
+
+	return 0;
+}
+
+static int ipu_mem2mem_vdic_enum_fmt(struct file *file, void *fh, struct v4l2_fmtdesc *f)
+{
+	struct ipu_mem2mem_vdic_ctx *ctx = fh_to_ctx(fh);
+	struct vb2_queue *vq = v4l2_m2m_get_vq(ctx->fh.m2m_ctx, f->type);
+	enum imx_pixfmt_sel cs = vq->type == V4L2_BUF_TYPE_VIDEO_CAPTURE ?
+				 PIXFMT_SEL_YUV_RGB : PIXFMT_SEL_YUV;
+	u32 fourcc;
+	int ret;
+
+	ret = imx_media_enum_pixel_formats(&fourcc, f->index, cs, 0);
+	if (ret)
+		return ret;
+
+	f->pixelformat = fourcc;
+
+	return 0;
+}
+
+static int ipu_mem2mem_vdic_g_fmt(struct file *file, void *fh, struct v4l2_format *f)
+{
+	struct ipu_mem2mem_vdic_ctx *ctx = fh_to_ctx(fh);
+	struct ipu_mem2mem_vdic_priv *priv = ctx->priv;
+	struct v4l2_pix_format *fmt = ipu_mem2mem_vdic_get_format(priv, f->type);
+
+	f->fmt.pix = *fmt;
+
+	return 0;
+}
+
+static int ipu_mem2mem_vdic_try_fmt(struct file *file, void *fh,
+				    struct v4l2_format *f)
+{
+	const struct imx_media_pixfmt *cc;
+	enum imx_pixfmt_sel cs;
+	u32 fourcc;
+
+	if (f->type == V4L2_BUF_TYPE_VIDEO_CAPTURE) {	/* Output */
+		cs = PIXFMT_SEL_YUV_RGB;	/* YUV direct / RGB via IC */
+
+		f->fmt.pix.field = V4L2_FIELD_NONE;
+	} else {
+		cs = PIXFMT_SEL_YUV;		/* YUV input only */
+
+		/*
+		 * Input must be interlaced with frame order.
+		 * Fall back to SEQ_TB otherwise.
+		 */
+		if (!V4L2_FIELD_HAS_BOTH(f->fmt.pix.field) ||
+		    f->fmt.pix.field == V4L2_FIELD_INTERLACED)
+			f->fmt.pix.field = V4L2_FIELD_SEQ_TB;
+	}
+
+	fourcc = f->fmt.pix.pixelformat;
+	cc = imx_media_find_pixel_format(fourcc, cs);
+	if (!cc) {
+		imx_media_enum_pixel_formats(&fourcc, 0, cs, 0);
+		cc = imx_media_find_pixel_format(fourcc, cs);
+	}
+
+	f->fmt.pix.pixelformat = cc->fourcc;
+
+	v4l_bound_align_image(&f->fmt.pix.width,
+			      1, 968, 1,
+			      &f->fmt.pix.height,
+			      1, 1024, 1, 1);
+
+	if (ipu_mem2mem_vdic_format_is_yuv420(f->fmt.pix.pixelformat))
+		f->fmt.pix.bytesperline = f->fmt.pix.width * 3 / 2;
+	else if (ipu_mem2mem_vdic_format_is_yuv422(f->fmt.pix.pixelformat))
+		f->fmt.pix.bytesperline = f->fmt.pix.width * 2;
+	else if (ipu_mem2mem_vdic_format_is_rgb16(f->fmt.pix.pixelformat))
+		f->fmt.pix.bytesperline = f->fmt.pix.width * 2;
+	else if (ipu_mem2mem_vdic_format_is_rgb24(f->fmt.pix.pixelformat))
+		f->fmt.pix.bytesperline = f->fmt.pix.width * 3;
+	else if (ipu_mem2mem_vdic_format_is_rgb32(f->fmt.pix.pixelformat))
+		f->fmt.pix.bytesperline = f->fmt.pix.width * 4;
+	else
+		f->fmt.pix.bytesperline = f->fmt.pix.width;
+
+	f->fmt.pix.sizeimage = f->fmt.pix.height * f->fmt.pix.bytesperline;
+
+	return 0;
+}
+
+static int ipu_mem2mem_vdic_s_fmt(struct file *file, void *fh, struct v4l2_format *f)
+{
+	struct ipu_mem2mem_vdic_ctx *ctx = fh_to_ctx(fh);
+	struct ipu_mem2mem_vdic_priv *priv = ctx->priv;
+	struct v4l2_pix_format *fmt, *infmt, *outfmt;
+	struct vb2_queue *vq;
+	int ret;
+
+	vq = v4l2_m2m_get_vq(ctx->fh.m2m_ctx, f->type);
+	if (vb2_is_busy(vq)) {
+		dev_err(priv->dev, "%s queue busy\n",  __func__);
+		return -EBUSY;
+	}
+
+	ret = ipu_mem2mem_vdic_try_fmt(file, fh, f);
+	if (ret < 0)
+		return ret;
+
+	fmt = ipu_mem2mem_vdic_get_format(priv, f->type);
+	*fmt = f->fmt.pix;
+
+	/* Propagate colorimetry to the capture queue */
+	infmt = ipu_mem2mem_vdic_get_format(priv, V4L2_BUF_TYPE_VIDEO_OUTPUT);
+	outfmt = ipu_mem2mem_vdic_get_format(priv, V4L2_BUF_TYPE_VIDEO_CAPTURE);
+	outfmt->colorspace = infmt->colorspace;
+	outfmt->ycbcr_enc = infmt->ycbcr_enc;
+	outfmt->xfer_func = infmt->xfer_func;
+	outfmt->quantization = infmt->quantization;
+
+	return 0;
+}
+
+static const struct v4l2_ioctl_ops mem2mem_ioctl_ops = {
+	.vidioc_querycap		= ipu_mem2mem_vdic_querycap,
+
+	.vidioc_enum_fmt_vid_cap	= ipu_mem2mem_vdic_enum_fmt,
+	.vidioc_g_fmt_vid_cap		= ipu_mem2mem_vdic_g_fmt,
+	.vidioc_try_fmt_vid_cap		= ipu_mem2mem_vdic_try_fmt,
+	.vidioc_s_fmt_vid_cap		= ipu_mem2mem_vdic_s_fmt,
+
+	.vidioc_enum_fmt_vid_out	= ipu_mem2mem_vdic_enum_fmt,
+	.vidioc_g_fmt_vid_out		= ipu_mem2mem_vdic_g_fmt,
+	.vidioc_try_fmt_vid_out		= ipu_mem2mem_vdic_try_fmt,
+	.vidioc_s_fmt_vid_out		= ipu_mem2mem_vdic_s_fmt,
+
+	.vidioc_reqbufs			= v4l2_m2m_ioctl_reqbufs,
+	.vidioc_querybuf		= v4l2_m2m_ioctl_querybuf,
+
+	.vidioc_qbuf			= v4l2_m2m_ioctl_qbuf,
+	.vidioc_expbuf			= v4l2_m2m_ioctl_expbuf,
+	.vidioc_dqbuf			= v4l2_m2m_ioctl_dqbuf,
+	.vidioc_create_bufs		= v4l2_m2m_ioctl_create_bufs,
+
+	.vidioc_streamon		= v4l2_m2m_ioctl_streamon,
+	.vidioc_streamoff		= v4l2_m2m_ioctl_streamoff,
+
+	.vidioc_subscribe_event		= v4l2_ctrl_subscribe_event,
+	.vidioc_unsubscribe_event	= v4l2_event_unsubscribe,
+};
+
+/*
+ * Queue operations
+ */
+static int ipu_mem2mem_vdic_queue_setup(struct vb2_queue *vq, unsigned int *nbuffers,
+					unsigned int *nplanes, unsigned int sizes[],
+					struct device *alloc_devs[])
+{
+	struct ipu_mem2mem_vdic_ctx *ctx = vb2_get_drv_priv(vq);
+	struct ipu_mem2mem_vdic_priv *priv = ctx->priv;
+	struct v4l2_pix_format *fmt = ipu_mem2mem_vdic_get_format(priv, vq->type);
+	unsigned int count = *nbuffers;
+
+	if (*nplanes)
+		return sizes[0] < fmt->sizeimage ? -EINVAL : 0;
+
+	*nplanes = 1;
+	sizes[0] = fmt->sizeimage;
+
+	dev_dbg(ctx->priv->dev, "get %u buffer(s) of size %d each.\n",
+		count, fmt->sizeimage);
+
+	return 0;
+}
+
+static int ipu_mem2mem_vdic_buf_prepare(struct vb2_buffer *vb)
+{
+	struct ipu_mem2mem_vdic_ctx *ctx = vb2_get_drv_priv(vb->vb2_queue);
+	struct vb2_v4l2_buffer *vbuf = to_vb2_v4l2_buffer(vb);
+	struct ipu_mem2mem_vdic_priv *priv = ctx->priv;
+	struct vb2_queue *vq = vb->vb2_queue;
+	struct v4l2_pix_format *fmt;
+	unsigned long size;
+
+	dev_dbg(ctx->priv->dev, "type: %d\n", vb->vb2_queue->type);
+
+	if (V4L2_TYPE_IS_OUTPUT(vq->type)) {
+		if (vbuf->field == V4L2_FIELD_ANY)
+			vbuf->field = V4L2_FIELD_SEQ_TB;
+		if (!V4L2_FIELD_HAS_BOTH(vbuf->field)) {
+			dev_dbg(ctx->priv->dev, "%s: field isn't supported\n",
+				__func__);
+			return -EINVAL;
+		}
+	}
+
+	fmt = ipu_mem2mem_vdic_get_format(priv, vb->vb2_queue->type);
+	size = fmt->sizeimage;
+
+	if (vb2_plane_size(vb, 0) < size) {
+		dev_dbg(ctx->priv->dev,
+			"%s: data will not fit into plane (%lu < %lu)\n",
+			__func__, vb2_plane_size(vb, 0), size);
+		return -EINVAL;
+	}
+
+	vb2_set_plane_payload(vb, 0, fmt->sizeimage);
+
+	return 0;
+}
+
+static void ipu_mem2mem_vdic_buf_queue(struct vb2_buffer *vb)
+{
+	struct ipu_mem2mem_vdic_ctx *ctx = vb2_get_drv_priv(vb->vb2_queue);
+
+	v4l2_m2m_buf_queue(ctx->fh.m2m_ctx, to_vb2_v4l2_buffer(vb));
+}
+
+/* VDIC hardware setup */
+static int ipu_mem2mem_vdic_setup_channel(struct ipu_mem2mem_vdic_priv *priv,
+					  struct ipuv3_channel *channel,
+					  struct v4l2_pix_format *fmt,
+					  bool in)
+{
+	struct ipu_image image = { 0 };
+	unsigned int burst_size;
+	int ret;
+
+	image.pix = *fmt;
+	image.rect.width = image.pix.width;
+	image.rect.height = image.pix.height;
+
+	ipu_cpmem_zero(channel);
+
+	if (in) {
+		/* One field to VDIC channels */
+		image.pix.height /= 2;
+		image.rect.height /= 2;
+	} else {
+		/* Skip writing U and V components to odd rows */
+		if (ipu_mem2mem_vdic_format_is_yuv420(image.pix.pixelformat))
+			ipu_cpmem_skip_odd_chroma_rows(channel);
+	}
+
+	ret = ipu_cpmem_set_image(channel, &image);
+	if (ret)
+		return ret;
+
+	burst_size = (image.pix.width & 0xf) ? 8 : 16;
+	ipu_cpmem_set_burstsize(channel, burst_size);
+
+	if (!ipu_prg_present(priv->ipu_dev))
+		ipu_cpmem_set_axi_id(channel, 1);
+
+	ipu_idmac_set_double_buffer(channel, false);
+
+	return 0;
+}
+
+static int ipu_mem2mem_vdic_setup_hardware(struct ipu_mem2mem_vdic_priv *priv)
+{
+	struct v4l2_pix_format *infmt, *outfmt;
+	struct ipu_ic_csc csc;
+	bool in422, outyuv;
+	int ret;
+
+	infmt = ipu_mem2mem_vdic_get_format(priv, V4L2_BUF_TYPE_VIDEO_OUTPUT);
+	outfmt = ipu_mem2mem_vdic_get_format(priv, V4L2_BUF_TYPE_VIDEO_CAPTURE);
+	in422 = ipu_mem2mem_vdic_format_is_yuv422(infmt->pixelformat);
+	outyuv = ipu_mem2mem_vdic_format_is_yuv(outfmt->pixelformat);
+
+	ipu_vdi_setup(priv->vdi, in422, infmt->width, infmt->height);
+	ipu_vdi_set_field_order(priv->vdi, V4L2_STD_UNKNOWN, infmt->field);
+	ipu_vdi_set_motion(priv->vdi, HIGH_MOTION);
+
+	/* Initialize the VDI IDMAC channels */
+	ret = ipu_mem2mem_vdic_setup_channel(priv, priv->vdi_in_ch_p, infmt, true);
+	if (ret)
+		return ret;
+
+	ret = ipu_mem2mem_vdic_setup_channel(priv, priv->vdi_in_ch, infmt, true);
+	if (ret)
+		return ret;
+
+	ret = ipu_mem2mem_vdic_setup_channel(priv, priv->vdi_in_ch_n, infmt, true);
+	if (ret)
+		return ret;
+
+	ret = ipu_mem2mem_vdic_setup_channel(priv, priv->vdi_out_ch, outfmt, false);
+	if (ret)
+		return ret;
+
+	ret = ipu_ic_calc_csc(&csc,
+			      infmt->ycbcr_enc, infmt->quantization,
+			      IPUV3_COLORSPACE_YUV,
+			      outfmt->ycbcr_enc, outfmt->quantization,
+			      outyuv ? IPUV3_COLORSPACE_YUV :
+				       IPUV3_COLORSPACE_RGB);
+	if (ret)
+		return ret;
+
+	/* Enable the IC */
+	ipu_ic_task_init(priv->ic, &csc,
+			 infmt->width, infmt->height,
+			 outfmt->width, outfmt->height);
+	ipu_ic_task_idma_init(priv->ic, priv->vdi_out_ch,
+			      infmt->width, infmt->height, 16, 0);
+	ipu_ic_enable(priv->ic);
+	ipu_ic_task_enable(priv->ic);
+
+	/* Enable the VDI */
+	ipu_vdi_enable(priv->vdi);
+
+	return 0;
+}
+
+static struct vb2_queue *ipu_mem2mem_vdic_get_other_q(struct vb2_queue *q)
+{
+	struct ipu_mem2mem_vdic_ctx *ctx = vb2_get_drv_priv(q);
+	enum v4l2_buf_type type = q->type == V4L2_BUF_TYPE_VIDEO_CAPTURE ?
+				  V4L2_BUF_TYPE_VIDEO_OUTPUT :
+				  V4L2_BUF_TYPE_VIDEO_CAPTURE;
+
+	return v4l2_m2m_get_vq(ctx->fh.m2m_ctx, type);
+}
+
+static void ipu_mem2mem_vdic_return_bufs(struct vb2_queue *q)
+{
+	struct ipu_mem2mem_vdic_ctx *ctx = vb2_get_drv_priv(q);
+	struct vb2_v4l2_buffer *buf;
+
+	if (q->type == V4L2_BUF_TYPE_VIDEO_OUTPUT)
+		while ((buf = v4l2_m2m_src_buf_remove(ctx->fh.m2m_ctx)))
+			v4l2_m2m_buf_done(buf, VB2_BUF_STATE_QUEUED);
+	else
+		while ((buf = v4l2_m2m_dst_buf_remove(ctx->fh.m2m_ctx)))
+			v4l2_m2m_buf_done(buf, VB2_BUF_STATE_QUEUED);
+}
+
+static int ipu_mem2mem_vdic_start_streaming(struct vb2_queue *q, unsigned int count)
+{
+	struct vb2_queue *other_q = ipu_mem2mem_vdic_get_other_q(q);
+	struct ipu_mem2mem_vdic_ctx *ctx = vb2_get_drv_priv(q);
+	struct ipu_mem2mem_vdic_priv *priv = ctx->priv;
+	int ret;
+
+	if (!vb2_is_streaming(other_q))
+		return 0;
+
+	/* Already streaming, do not reconfigure the VDI. */
+	if (atomic_inc_return(&priv->stream_count) != 1)
+		return 0;
+
+	/* Start streaming */
+	ret = ipu_mem2mem_vdic_setup_hardware(priv);
+	if (ret)
+		ipu_mem2mem_vdic_return_bufs(q);
+
+	return ret;
+}
+
+static void ipu_mem2mem_vdic_stop_streaming(struct vb2_queue *q)
+{
+	struct vb2_queue *other_q = ipu_mem2mem_vdic_get_other_q(q);
+	struct ipu_mem2mem_vdic_ctx *ctx = vb2_get_drv_priv(q);
+	struct ipu_mem2mem_vdic_priv *priv = ctx->priv;
+
+	if (vb2_is_streaming(other_q)) {
+		ipu_mem2mem_vdic_return_bufs(q);
+		return;
+	}
+
+	if (atomic_dec_return(&priv->stream_count) == 0) {
+		/* Stop streaming */
+		ipu_idmac_disable_channel(priv->vdi_in_ch_p);
+		ipu_idmac_disable_channel(priv->vdi_in_ch);
+		ipu_idmac_disable_channel(priv->vdi_in_ch_n);
+		ipu_idmac_disable_channel(priv->vdi_out_ch);
+
+		ipu_vdi_disable(priv->vdi);
+		ipu_ic_task_disable(priv->ic);
+		ipu_ic_disable(priv->ic);
+	}
+
+	ctx->sequence = 0;
+
+	ipu_mem2mem_vdic_return_bufs(q);
+}
+
+static const struct vb2_ops mem2mem_qops = {
+	.queue_setup	= ipu_mem2mem_vdic_queue_setup,
+	.buf_prepare	= ipu_mem2mem_vdic_buf_prepare,
+	.buf_queue	= ipu_mem2mem_vdic_buf_queue,
+	.wait_prepare	= vb2_ops_wait_prepare,
+	.wait_finish	= vb2_ops_wait_finish,
+	.start_streaming = ipu_mem2mem_vdic_start_streaming,
+	.stop_streaming = ipu_mem2mem_vdic_stop_streaming,
+};
+
+static int ipu_mem2mem_vdic_queue_init(void *priv, struct vb2_queue *src_vq,
+				       struct vb2_queue *dst_vq)
+{
+	struct ipu_mem2mem_vdic_ctx *ctx = priv;
+	int ret;
+
+	memset(src_vq, 0, sizeof(*src_vq));
+	src_vq->type = V4L2_BUF_TYPE_VIDEO_OUTPUT;
+	src_vq->io_modes = VB2_MMAP | VB2_DMABUF;
+	src_vq->drv_priv = ctx;
+	src_vq->buf_struct_size = sizeof(struct v4l2_m2m_buffer);
+	src_vq->ops = &mem2mem_qops;
+	src_vq->mem_ops = &vb2_dma_contig_memops;
+	src_vq->timestamp_flags = V4L2_BUF_FLAG_TIMESTAMP_COPY;
+	src_vq->lock = &ctx->priv->mutex;
+	src_vq->dev = ctx->priv->dev;
+
+	ret = vb2_queue_init(src_vq);
+	if (ret)
+		return ret;
+
+	memset(dst_vq, 0, sizeof(*dst_vq));
+	dst_vq->type = V4L2_BUF_TYPE_VIDEO_CAPTURE;
+	dst_vq->io_modes = VB2_MMAP | VB2_DMABUF;
+	dst_vq->drv_priv = ctx;
+	dst_vq->buf_struct_size = sizeof(struct v4l2_m2m_buffer);
+	dst_vq->ops = &mem2mem_qops;
+	dst_vq->mem_ops = &vb2_dma_contig_memops;
+	dst_vq->timestamp_flags = V4L2_BUF_FLAG_TIMESTAMP_COPY;
+	dst_vq->lock = &ctx->priv->mutex;
+	dst_vq->dev = ctx->priv->dev;
+
+	return vb2_queue_init(dst_vq);
+}
+
+#define DEFAULT_WIDTH	720
+#define DEFAULT_HEIGHT	576
+static const struct v4l2_pix_format ipu_mem2mem_vdic_default = {
+	.width		= DEFAULT_WIDTH,
+	.height		= DEFAULT_HEIGHT,
+	.pixelformat	= V4L2_PIX_FMT_YUV420,
+	.field		= V4L2_FIELD_SEQ_TB,
+	.bytesperline	= DEFAULT_WIDTH,
+	.sizeimage	= DEFAULT_WIDTH * DEFAULT_HEIGHT * 3 / 2,
+	.colorspace	= V4L2_COLORSPACE_SRGB,
+	.ycbcr_enc	= V4L2_YCBCR_ENC_601,
+	.xfer_func	= V4L2_XFER_FUNC_DEFAULT,
+	.quantization	= V4L2_QUANTIZATION_DEFAULT,
+};
+
+/*
+ * File operations
+ */
+static int ipu_mem2mem_vdic_open(struct file *file)
+{
+	struct ipu_mem2mem_vdic_priv *priv = video_drvdata(file);
+	struct ipu_mem2mem_vdic_ctx *ctx = NULL;
+	int ret;
+
+	ctx = kzalloc(sizeof(*ctx), GFP_KERNEL);
+	if (!ctx)
+		return -ENOMEM;
+
+	v4l2_fh_init(&ctx->fh, video_devdata(file));
+	file->private_data = &ctx->fh;
+	v4l2_fh_add(&ctx->fh);
+	ctx->priv = priv;
+
+	ctx->fh.m2m_ctx = v4l2_m2m_ctx_init(priv->m2m_dev, ctx,
+					    &ipu_mem2mem_vdic_queue_init);
+	if (IS_ERR(ctx->fh.m2m_ctx)) {
+		ret = PTR_ERR(ctx->fh.m2m_ctx);
+		goto err_ctx;
+	}
+
+	dev_dbg(priv->dev, "Created instance %p, m2m_ctx: %p\n",
+		ctx, ctx->fh.m2m_ctx);
+
+	return 0;
+
+err_ctx:
+	v4l2_fh_del(&ctx->fh);
+	v4l2_fh_exit(&ctx->fh);
+	kfree(ctx);
+	return ret;
+}
+
+static int ipu_mem2mem_vdic_release(struct file *file)
+{
+	struct ipu_mem2mem_vdic_priv *priv = video_drvdata(file);
+	struct ipu_mem2mem_vdic_ctx *ctx = fh_to_ctx(file->private_data);
+
+	dev_dbg(priv->dev, "Releasing instance %p\n", ctx);
+
+	v4l2_m2m_ctx_release(ctx->fh.m2m_ctx);
+	v4l2_fh_del(&ctx->fh);
+	v4l2_fh_exit(&ctx->fh);
+	kfree(ctx);
+
+	return 0;
+}
+
+static const struct v4l2_file_operations mem2mem_fops = {
+	.owner		= THIS_MODULE,
+	.open		= ipu_mem2mem_vdic_open,
+	.release	= ipu_mem2mem_vdic_release,
+	.poll		= v4l2_m2m_fop_poll,
+	.unlocked_ioctl	= video_ioctl2,
+	.mmap		= v4l2_m2m_fop_mmap,
+};
+
+static struct v4l2_m2m_ops m2m_ops = {
+	.device_run	= ipu_mem2mem_vdic_device_run,
+};
+
+static void ipu_mem2mem_vdic_device_release(struct video_device *vdev)
+{
+	struct ipu_mem2mem_vdic_priv *priv = video_get_drvdata(vdev);
+
+	v4l2_m2m_release(priv->m2m_dev);
+	video_device_release(vdev);
+	kfree(priv);
+}
+
+static const struct video_device mem2mem_template = {
+	.name		= "ipu_vdic",
+	.fops		= &mem2mem_fops,
+	.ioctl_ops	= &mem2mem_ioctl_ops,
+	.minor		= -1,
+	.release	= ipu_mem2mem_vdic_device_release,
+	.vfl_dir	= VFL_DIR_M2M,
+	.tvnorms	= V4L2_STD_NTSC | V4L2_STD_PAL | V4L2_STD_SECAM,
+	.device_caps	= V4L2_CAP_VIDEO_M2M | V4L2_CAP_STREAMING,
+};
+
+static int ipu_mem2mem_vdic_get_ipu_resources(struct ipu_mem2mem_vdic_priv *priv,
+					      struct video_device *vfd)
+{
+	char *nfbname, *eofname;
+	int ret;
+
+	nfbname = devm_kasprintf(priv->dev, GFP_KERNEL, "%s_nfb4eof:%u",
+				 vfd->name, priv->ipu_id);
+	if (!nfbname)
+		return -ENOMEM;
+
+	eofname = devm_kasprintf(priv->dev, GFP_KERNEL, "%s_eof:%u",
+				 vfd->name, priv->ipu_id);
+	if (!eofname)
+		return -ENOMEM;
+
+	priv->vdi = ipu_vdi_get(priv->ipu_dev);
+	if (IS_ERR(priv->vdi)) {
+		ret = PTR_ERR(priv->vdi);
+		goto err_vdi;
+	}
+
+	priv->ic = ipu_ic_get(priv->ipu_dev, IC_TASK_VIEWFINDER);
+	if (IS_ERR(priv->ic)) {
+		ret = PTR_ERR(priv->ic);
+		goto err_ic;
+	}
+
+	priv->vdi_in_ch_p = ipu_idmac_get(priv->ipu_dev,
+					  IPUV3_CHANNEL_MEM_VDI_PREV);
+	if (IS_ERR(priv->vdi_in_ch_p)) {
+		ret = PTR_ERR(priv->vdi_in_ch_p);
+		goto err_prev;
+	}
+
+	priv->vdi_in_ch = ipu_idmac_get(priv->ipu_dev,
+					IPUV3_CHANNEL_MEM_VDI_CUR);
+	if (IS_ERR(priv->vdi_in_ch)) {
+		ret = PTR_ERR(priv->vdi_in_ch);
+		goto err_curr;
+	}
+
+	priv->vdi_in_ch_n = ipu_idmac_get(priv->ipu_dev,
+					  IPUV3_CHANNEL_MEM_VDI_NEXT);
+	if (IS_ERR(priv->vdi_in_ch_n)) {
+		ret = PTR_ERR(priv->vdi_in_ch_n);
+		goto err_next;
+	}
+
+	priv->vdi_out_ch = ipu_idmac_get(priv->ipu_dev,
+					 IPUV3_CHANNEL_IC_PRP_VF_MEM);
+	if (IS_ERR(priv->vdi_out_ch)) {
+		ret = PTR_ERR(priv->vdi_out_ch);
+		goto err_out;
+	}
+
+	priv->nfb4eof_irq = ipu_idmac_channel_irq(priv->ipu_dev,
+						  priv->vdi_out_ch,
+						  IPU_IRQ_NFB4EOF);
+	ret = devm_request_irq(priv->dev, priv->nfb4eof_irq,
+			       ipu_mem2mem_vdic_nfb4eof_interrupt, 0,
+			       nfbname, priv);
+	if (ret)
+		goto err_irq_eof;
+
+	priv->eof_irq = ipu_idmac_channel_irq(priv->ipu_dev,
+					      priv->vdi_out_ch,
+					      IPU_IRQ_EOF);
+	ret = devm_request_irq(priv->dev, priv->eof_irq,
+			       ipu_mem2mem_vdic_eof_interrupt, 0,
+			       eofname, priv);
+	if (ret)
+		goto err_irq_eof;
+
+	/*
+	 * Enable PRG, without PRG clock enabled (CCGR6:prg_clk_enable[0]
+	 * and CCGR6:prg_clk_enable[1]), the VDI does not produce any
+	 * interrupts at all.
+	 */
+	if (ipu_prg_present(priv->ipu_dev))
+		ipu_prg_enable(priv->ipu_dev);
+
+	return 0;
+
+err_irq_eof:
+	ipu_idmac_put(priv->vdi_out_ch);
+err_out:
+	ipu_idmac_put(priv->vdi_in_ch_n);
+err_next:
+	ipu_idmac_put(priv->vdi_in_ch);
+err_curr:
+	ipu_idmac_put(priv->vdi_in_ch_p);
+err_prev:
+	ipu_ic_put(priv->ic);
+err_ic:
+	ipu_vdi_put(priv->vdi);
+err_vdi:
+	return ret;
+}
+
+static void ipu_mem2mem_vdic_put_ipu_resources(struct ipu_mem2mem_vdic_priv *priv)
+{
+	ipu_idmac_put(priv->vdi_out_ch);
+	ipu_idmac_put(priv->vdi_in_ch_n);
+	ipu_idmac_put(priv->vdi_in_ch);
+	ipu_idmac_put(priv->vdi_in_ch_p);
+	ipu_ic_put(priv->ic);
+	ipu_vdi_put(priv->vdi);
+}
+
+int imx_media_mem2mem_vdic_register(struct imx_media_video_dev *vdev)
+{
+	struct ipu_mem2mem_vdic_priv *priv = to_mem2mem_priv(vdev);
+	struct video_device *vfd = vdev->vfd;
+	int ret;
+
+	vfd->v4l2_dev = &priv->md->v4l2_dev;
+
+	ret = ipu_mem2mem_vdic_get_ipu_resources(priv, vfd);
+	if (ret) {
+		v4l2_err(vfd->v4l2_dev, "Failed to get VDIC resources (%d)\n", ret);
+		return ret;
+	}
+
+	ret = video_register_device(vfd, VFL_TYPE_VIDEO, -1);
+	if (ret) {
+		v4l2_err(vfd->v4l2_dev, "Failed to register video device\n");
+		goto err_register;
+	}
+
+	v4l2_info(vfd->v4l2_dev, "Registered %s as /dev/%s\n", vfd->name,
+		  video_device_node_name(vfd));
+
+	return 0;
+
+err_register:
+	ipu_mem2mem_vdic_put_ipu_resources(priv);
+	return ret;
+}
+
+void imx_media_mem2mem_vdic_unregister(struct imx_media_video_dev *vdev)
+{
+	struct ipu_mem2mem_vdic_priv *priv = to_mem2mem_priv(vdev);
+	struct video_device *vfd = priv->vdev.vfd;
+
+	video_unregister_device(vfd);
+
+	ipu_mem2mem_vdic_put_ipu_resources(priv);
+}
+
+struct imx_media_video_dev *
+imx_media_mem2mem_vdic_init(struct imx_media_dev *md, int ipu_id)
+{
+	struct ipu_mem2mem_vdic_priv *priv;
+	struct video_device *vfd;
+	int ret;
+
+	priv = kzalloc(sizeof(*priv), GFP_KERNEL);
+	if (!priv)
+		return ERR_PTR(-ENOMEM);
+
+	priv->md = md;
+	priv->ipu_id = ipu_id;
+	priv->ipu_dev = md->ipu[ipu_id];
+	priv->dev = md->md.dev;
+
+	mutex_init(&priv->mutex);
+
+	vfd = video_device_alloc();
+	if (!vfd) {
+		ret = -ENOMEM;
+		goto err_vfd;
+	}
+
+	*vfd = mem2mem_template;
+	vfd->lock = &priv->mutex;
+	priv->vdev.vfd = vfd;
+
+	INIT_LIST_HEAD(&priv->vdev.list);
+	spin_lock_init(&priv->irqlock);
+	atomic_set(&priv->stream_count, 0);
+
+	video_set_drvdata(vfd, priv);
+
+	priv->m2m_dev = v4l2_m2m_init(&m2m_ops);
+	if (IS_ERR(priv->m2m_dev)) {
+		ret = PTR_ERR(priv->m2m_dev);
+		v4l2_err(&md->v4l2_dev, "Failed to init mem2mem device: %d\n",
+			 ret);
+		goto err_m2m;
+	}
+
+	/* Reset formats */
+	priv->fmt[V4L2_M2M_SRC] = ipu_mem2mem_vdic_default;
+	priv->fmt[V4L2_M2M_SRC].pixelformat = V4L2_PIX_FMT_YUV420;
+	priv->fmt[V4L2_M2M_SRC].field = V4L2_FIELD_SEQ_TB;
+	priv->fmt[V4L2_M2M_SRC].bytesperline = DEFAULT_WIDTH;
+	priv->fmt[V4L2_M2M_SRC].sizeimage = DEFAULT_WIDTH * DEFAULT_HEIGHT * 3 / 2;
+
+	priv->fmt[V4L2_M2M_DST] = ipu_mem2mem_vdic_default;
+	priv->fmt[V4L2_M2M_DST].pixelformat = V4L2_PIX_FMT_RGB565;
+	priv->fmt[V4L2_M2M_DST].field = V4L2_FIELD_NONE;
+	priv->fmt[V4L2_M2M_DST].bytesperline = DEFAULT_WIDTH * 2;
+	priv->fmt[V4L2_M2M_DST].sizeimage = DEFAULT_WIDTH * DEFAULT_HEIGHT * 2;
+
+	return &priv->vdev;
+
+err_m2m:
+	video_device_release(vfd);
+	video_set_drvdata(vfd, NULL);
+err_vfd:
+	kfree(priv);
+	return ERR_PTR(ret);
+}
+
+void imx_media_mem2mem_vdic_uninit(struct imx_media_video_dev *vdev)
+{
+	struct ipu_mem2mem_vdic_priv *priv = to_mem2mem_priv(vdev);
+	struct video_device *vfd = priv->vdev.vfd;
+
+	video_device_release(vfd);
+	video_set_drvdata(vfd, NULL);
+	kfree(priv);
+}
+
+MODULE_DESCRIPTION("i.MX VDIC mem2mem de-interlace driver");
+MODULE_AUTHOR("Marek Vasut <marex@denx.de>");
+MODULE_LICENSE("GPL");
diff --git a/drivers/staging/media/imx/imx-media.h b/drivers/staging/media/imx/imx-media.h
index f095d9134fee4..9f2388e306727 100644
--- a/drivers/staging/media/imx/imx-media.h
+++ b/drivers/staging/media/imx/imx-media.h
@@ -162,6 +162,9 @@  struct imx_media_dev {
 	/* IC scaler/CSC mem2mem video device */
 	struct imx_media_video_dev *m2m_vdev;
 
+	/* VDIC mem2mem video device */
+	struct imx_media_video_dev *m2m_vdic[2];
+
 	/* the IPU internal subdev's registered synchronously */
 	struct v4l2_subdev *sync_sd[2][NUM_IPU_SUBDEVS];
 };
@@ -284,6 +287,13 @@  imx_media_csc_scaler_device_init(struct imx_media_dev *dev);
 int imx_media_csc_scaler_device_register(struct imx_media_video_dev *vdev);
 void imx_media_csc_scaler_device_unregister(struct imx_media_video_dev *vdev);
 
+/* imx-media-mem2mem-vdic.c */
+struct imx_media_video_dev *
+imx_media_mem2mem_vdic_init(struct imx_media_dev *dev, int ipu_id);
+void imx_media_mem2mem_vdic_uninit(struct imx_media_video_dev *vdev);
+int imx_media_mem2mem_vdic_register(struct imx_media_video_dev *vdev);
+void imx_media_mem2mem_vdic_unregister(struct imx_media_video_dev *vdev);
+
 /* subdev group ids */
 #define IMX_MEDIA_GRP_ID_CSI2          BIT(8)
 #define IMX_MEDIA_GRP_ID_IPU_CSI_BIT   10