media: dw100: Enable dynamic vertex map

Message ID	20241022063155.506191-1-umang.jain@ideasonboard.com (mailing list archive)
State	New
Headers	show Received: from perceval.ideasonboard.com (perceval.ideasonboard.com [213.167.242.64]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id A24BF126C02 for <linux-media@vger.kernel.org>; Tue, 22 Oct 2024 06:32:04 +0000 (UTC) From: Umang Jain <umang.jain@ideasonboard.com> To: linux-media@vger.kernel.org, Xavier Roumegue <xavier.roumegue@oss.nxp.com> Cc: Mauro Carvalho Chehab <mchehab@kernel.org>, Sakari Ailus <sakari.ailus@linux.intel.com>, Laurent Pinchart <laurent.pinchart@ideasonboard.com>, Stefan Klug <stefan.klug@ideasonboard.com>, Kieran Bingham <kieran.bingham@ideasonboard.com>, Umang Jain <umang.jain@ideasonboard.com> Subject: [PATCH] media: dw100: Enable dynamic vertex map Date: Tue, 22 Oct 2024 12:01:55 +0530 Message-ID: <20241022063155.506191-1-umang.jain@ideasonboard.com> Precedence: bulk MIME-Version: 1.0 Content-Transfer-Encoding: 8bit
Series	media: dw100: Enable dynamic vertex map \| expand media: dw100: Enable dynamic vertex map

Umang Jain Oct. 22, 2024, 6:31 a.m. UTC

Currently, vertex maps cannot be updated dynamically while dw100
is streaming. This patch enables the support to update the vertex
map dynamically at runtime.

To support this functionality, we need to allocate and track two
sets of DMA-allocated vertex maps, one for the currently applied map
and another for the updated (pending) map. Before the start of next
frame, if a new user-supplied vertex map is available, the hardware
mapping is changed to use new vertex map, thus enabling the user to
update the vertex map at runtime.

We should ensure no race occurs when the vertex map is updated multiple
times when a frame is processing. Hence, vertex map is never updated to
the applied vertex map index in .s_ctrl(). It is always updated on the
pending vertex map slot, with `maps_mutex` lock held. `maps_mutex` lock
is also held when the pending vertex map is applied to the hardware in
dw100_start().

Ability to update the vertex map at runtime, enables abritrary rotation
and digital zoom features for the input frames, through the dw100
hardware.

Signed-off-by: Umang Jain <umang.jain@ideasonboard.com>
---
 drivers/media/platform/nxp/dw100/dw100.c | 73 ++++++++++++++++++------
 1 file changed, 56 insertions(+), 17 deletions(-)

kernel test robot Oct. 22, 2024, 11:09 p.m. UTC | #1

Hi Umang,

kernel test robot noticed the following build warnings:

[auto build test WARNING on linuxtv-media-stage/master]
[also build test WARNING on media-tree/master sailus-media-tree/streams sailus-media-tree/master linus/master v6.12-rc4 next-20241022]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url:    https://github.com/intel-lab-lkp/linux/commits/Umang-Jain/media-dw100-Enable-dynamic-vertex-map/20241022-143315
base:   https://git.linuxtv.org/media_stage.git master
patch link:    https://lore.kernel.org/r/20241022063155.506191-1-umang.jain%40ideasonboard.com
patch subject: [PATCH] media: dw100: Enable dynamic vertex map
config: x86_64-allyesconfig (https://download.01.org/0day-ci/archive/20241023/202410230647.fPoPRh2O-lkp@intel.com/config)
compiler: clang version 18.1.8 (https://github.com/llvm/llvm-project 3b5b5c1ec4a3095ab096dd780e84d7ab81f3d7ff)
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20241023/202410230647.fPoPRh2O-lkp@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202410230647.fPoPRh2O-lkp@intel.com/

All warnings (new ones prefixed by >>):

>> drivers/media/platform/nxp/dw100/dw100.c:369:3: warning: label followed by a declaration is a C23 extension [-Wc23-extensions]
     369 |                 u32 *user_map = ctrl->p_new.p_u32;
         |                 ^
   1 warning generated.

Kconfig warnings: (for reference only)
   WARNING: unmet direct dependencies detected for MODVERSIONS
   Depends on [n]: MODULES [=y] && !COMPILE_TEST [=y]
   Selected by [y]:
   - RANDSTRUCT_FULL [=y] && (CC_HAS_RANDSTRUCT [=y] || GCC_PLUGINS [=n]) && MODULES [=y]


vim +369 drivers/media/platform/nxp/dw100/dw100.c

   361	
   362	static int dw100_s_ctrl(struct v4l2_ctrl *ctrl)
   363	{
   364		struct dw100_ctx *ctx =
   365			container_of(ctrl->handler, struct dw100_ctx, hdl);
   366	
   367		switch (ctrl->id) {
   368		case V4L2_CID_DW100_DEWARPING_16x16_VERTEX_MAP:
 > 369			u32 *user_map = ctrl->p_new.p_u32;
   370			unsigned int id;
   371	
   372			mutex_lock(&ctx->maps_mutex);
   373			id = ctx->applied_map_id ? 0 : 1;
   374			memcpy(ctx->maps[id].map, user_map, ctx->map_size);
   375			ctx->user_map_is_updated = true;
   376			mutex_unlock(&ctx->maps_mutex);
   377	
   378			ctx->user_map_is_set = true;
   379			break;
   380		}
   381	
   382		return 0;
   383	}
   384

Xavier Roumegue (OSS) Oct. 26, 2024, 7:52 p.m. UTC | #2

Hi Umang,

Thanks for the patch, this feature sounds promising.

On 10/22/24 8:31 AM, Umang Jain wrote:
> Currently, vertex maps cannot be updated dynamically while dw100
> is streaming. This patch enables the support to update the vertex
> map dynamically at runtime.
> 
> To support this functionality, we need to allocate and track two
> sets of DMA-allocated vertex maps, one for the currently applied map
> and another for the updated (pending) map. Before the start of next
> frame, if a new user-supplied vertex map is available, the hardware
> mapping is changed to use new vertex map, thus enabling the user to
> update the vertex map at runtime.
> 
> We should ensure no race occurs when the vertex map is updated multiple
> times when a frame is processing. Hence, vertex map is never updated to
> the applied vertex map index in .s_ctrl(). It is always updated on the
> pending vertex map slot, with `maps_mutex` lock held. `maps_mutex` lock
> is also held when the pending vertex map is applied to the hardware in
> dw100_start().
> 
> Ability to update the vertex map at runtime, enables abritrary rotation
> and digital zoom features for the input frames, through the dw100
> hardware.
> 
> Signed-off-by: Umang Jain <umang.jain@ideasonboard.com>
> ---
>   drivers/media/platform/nxp/dw100/dw100.c | 73 ++++++++++++++++++------
>   1 file changed, 56 insertions(+), 17 deletions(-)
> 
> diff --git a/drivers/media/platform/nxp/dw100/dw100.c b/drivers/media/platform/nxp/dw100/dw100.c
> index 54ebf59df682..42712ccff754 100644
> --- a/drivers/media/platform/nxp/dw100/dw100.c
> +++ b/drivers/media/platform/nxp/dw100/dw100.c
> @@ -83,6 +83,11 @@ struct dw100_q_data {
>   	struct v4l2_rect		crop;
>   };
>   
> +struct dw100_map {
> +	unsigned int *map;
> +	dma_addr_t map_dma;
> +};
> +
>   struct dw100_ctx {
>   	struct v4l2_fh			fh;
>   	struct dw100_device		*dw_dev;
> @@ -92,12 +97,14 @@ struct dw100_ctx {
>   	struct mutex			vq_mutex;
>   
>   	/* Look Up Table for pixel remapping */
> -	unsigned int			*map;
> -	dma_addr_t			map_dma;
> +	struct dw100_map		maps[2];
> +	unsigned int			applied_map_id;
>   	size_t				map_size;
>   	unsigned int			map_width;
>   	unsigned int			map_height;
>   	bool				user_map_is_set;
> +	bool				user_map_is_updated;
> +	struct mutex			maps_mutex;
>   
>   	/* Source and destination queue data */
>   	struct dw100_q_data		q_data[2];
> @@ -308,24 +315,31 @@ static int dw100_create_mapping(struct dw100_ctx *ctx)
>   {
>   	u32 *user_map;
>   
> -	if (ctx->map)
> -		dma_free_coherent(&ctx->dw_dev->pdev->dev, ctx->map_size,
> -				  ctx->map, ctx->map_dma);
> +	for (unsigned int i = 0; i < 2; i++) {
> +		if (ctx->maps[i].map)
> +			dma_free_coherent(&ctx->dw_dev->pdev->dev, ctx->map_size,
> +					  ctx->maps[i].map, ctx->maps[i].map_dma);
>   
> -	ctx->map = dma_alloc_coherent(&ctx->dw_dev->pdev->dev, ctx->map_size,
> -				      &ctx->map_dma, GFP_KERNEL);
> +		ctx->maps[i].map = dma_alloc_coherent(&ctx->dw_dev->pdev->dev, ctx->map_size,
> +						      &ctx->maps[i].map_dma, GFP_KERNEL);
>   
> -	if (!ctx->map)
> -		return -ENOMEM;
> +		if (!ctx->maps[i].map)
> +			return -ENOMEM;
> +	}
>   
>   	user_map = dw100_get_user_map(ctx);
> -	memcpy(ctx->map, user_map, ctx->map_size);
> +
> +	mutex_lock(&ctx->maps_mutex);
> +	ctx->applied_map_id = 0;
> +	memcpy(ctx->maps[ctx->applied_map_id].map, user_map, ctx->map_size);
> +	mutex_unlock(&ctx->maps_mutex);
>   
>   	dev_dbg(&ctx->dw_dev->pdev->dev,
>   		"%ux%u %s mapping created (d:%pad-c:%p) for stream %ux%u->%ux%u\n",
>   		ctx->map_width, ctx->map_height,
>   		ctx->user_map_is_set ? "user" : "identity",
> -		&ctx->map_dma, ctx->map,
> +		&ctx->maps[ctx->applied_map_id].map_dma,
> +		ctx->maps[ctx->applied_map_id].map,
>   		ctx->q_data[DW100_QUEUE_SRC].pix_fmt.width,
>   		ctx->q_data[DW100_QUEUE_DST].pix_fmt.height,
>   		ctx->q_data[DW100_QUEUE_SRC].pix_fmt.width,
> @@ -336,10 +350,12 @@ static int dw100_create_mapping(struct dw100_ctx *ctx)
>   
>   static void dw100_destroy_mapping(struct dw100_ctx *ctx)
>   {
> -	if (ctx->map) {
> -		dma_free_coherent(&ctx->dw_dev->pdev->dev, ctx->map_size,
> -				  ctx->map, ctx->map_dma);
> -		ctx->map = NULL;
> +	for (unsigned int i = 0; i < 2; i++) {
> +		if (ctx->maps[i].map)
> +			dma_free_coherent(&ctx->dw_dev->pdev->dev, ctx->map_size,
> +					  ctx->maps[i].map, ctx->maps[i].map_dma);
> +
> +		ctx->maps[i].map = NULL;
>   	}
>   }
>   
> @@ -350,6 +366,15 @@ static int dw100_s_ctrl(struct v4l2_ctrl *ctrl)
>   
>   	switch (ctrl->id) {
>   	case V4L2_CID_DW100_DEWARPING_16x16_VERTEX_MAP:
> +		u32 *user_map = ctrl->p_new.p_u32;
A warning to fix here.
> +		unsigned int id;
> +
> +		mutex_lock(&ctx->maps_mutex);
> +		id = ctx->applied_map_id ? 0 : 1;
> +		memcpy(ctx->maps[id].map, user_map, ctx->map_size);
> +		ctx->user_map_is_updated = true;
If you call the control before to start the stream, the dma mapping is 
not yet done(dw100_create_mapping not yet called). Hence, copying the 
user map to a NULL pointer.
> +		mutex_unlock(&ctx->maps_mutex);
> +
>   		ctx->user_map_is_set = true;
>   		break;
>   	}
> @@ -655,6 +680,8 @@ static int dw100_open(struct file *file)
>   
>   	v4l2_fh_add(&ctx->fh);
>   
> +	mutex_init(&ctx->maps_mutex);
> +
>   	return 0;
>   
>   err:
> @@ -675,6 +702,7 @@ static int dw100_release(struct file *file)
>   	v4l2_ctrl_handler_free(&ctx->hdl);
>   	v4l2_m2m_ctx_release(ctx->fh.m2m_ctx);
>   	mutex_destroy(&ctx->vq_mutex);
> +	mutex_destroy(&ctx->maps_mutex);
>   	kfree(ctx);
>   
>   	return 0;
> @@ -1453,8 +1481,19 @@ static void dw100_start(struct dw100_ctx *ctx, struct vb2_v4l2_buffer *in_vb,
>   	dw100_hw_set_destination(dw_dev, &ctx->q_data[DW100_QUEUE_DST],
>   				 ctx->q_data[DW100_QUEUE_SRC].fmt,
>   				 &out_vb->vb2_buf);
> -	dw100_hw_set_mapping(dw_dev, ctx->map_dma,
> -			     ctx->map_width, ctx->map_height);
> +
> +
> +	mutex_lock(&ctx->maps_mutex);
> +	if (ctx->user_map_is_updated) {
The hardware register must unconditionally be updated while starting a 
new context, as a v4l2 m2m supports multi context operations. Otherwise, 
you may be running with the user mapping used by the previous context.

Moreover, the hardware mapping will not be set in case you use the 
driver as a simple scaler without user mapping, which causes the process 
to hang as the run does not start and never completes.
> +		unsigned int id = ctx->applied_map_id ? 0 : 1;
> +
> +		dw100_hw_set_mapping(dw_dev, ctx->maps[id].map_dma,
> +				     ctx->map_width, ctx->map_height);
> +		ctx->applied_map_id = id;
> +		ctx->user_map_is_updated = false;
> +	}
> +	mutex_unlock(&ctx->maps_mutex);
> +
>   	dw100_hw_enable_irq(dw_dev);
>   	dw100_hw_dewarp_start(dw_dev);
>   

It sounds as this patch requires a collaborative application for running 
well. All my simple tests failed.

You can test a simple scaler/pixfmt conversion operation with v4l2 utils:


v4l2-ctl \
-d 0 \
--set-fmt-video-out width=640,height=480,pixelformat=NV12,field=none \
--set-fmt-video width=640,height=480,pixelformat=NV21,field=none \
--stream-out-pattern 3 \
--set-selection-output\ 
target=crop,top=0,left=0,width=640,height=480,flags= \
--stream-count 100 \
--stream-mmap \
--stream-out-mmap

Xavier

Laurent Pinchart Oct. 27, 2024, 2:40 p.m. UTC | #3

On Sat, Oct 26, 2024 at 09:52:43PM +0200, Xavier Roumegue wrote:
> Hi Umang,
> 
> Thanks for the patch, this feature sounds promising.
> 
> On 10/22/24 8:31 AM, Umang Jain wrote:
> > Currently, vertex maps cannot be updated dynamically while dw100
> > is streaming. This patch enables the support to update the vertex
> > map dynamically at runtime.
> > 
> > To support this functionality, we need to allocate and track two
> > sets of DMA-allocated vertex maps, one for the currently applied map
> > and another for the updated (pending) map. Before the start of next
> > frame, if a new user-supplied vertex map is available, the hardware
> > mapping is changed to use new vertex map, thus enabling the user to
> > update the vertex map at runtime.

How do you synchronize the new map with the jobs ? That doesn't seem to
be supported by the patch, is it a feature that you don't need ?

> > 
> > We should ensure no race occurs when the vertex map is updated multiple
> > times when a frame is processing. Hence, vertex map is never updated to
> > the applied vertex map index in .s_ctrl(). It is always updated on the
> > pending vertex map slot, with `maps_mutex` lock held. `maps_mutex` lock
> > is also held when the pending vertex map is applied to the hardware in
> > dw100_start().
> > 
> > Ability to update the vertex map at runtime, enables abritrary rotation

s/abritrary/arbitrary/

> > and digital zoom features for the input frames, through the dw100
> > hardware.
> > 
> > Signed-off-by: Umang Jain <umang.jain@ideasonboard.com>
> > ---
> >   drivers/media/platform/nxp/dw100/dw100.c | 73 ++++++++++++++++++------
> >   1 file changed, 56 insertions(+), 17 deletions(-)
> > 
> > diff --git a/drivers/media/platform/nxp/dw100/dw100.c b/drivers/media/platform/nxp/dw100/dw100.c
> > index 54ebf59df682..42712ccff754 100644
> > --- a/drivers/media/platform/nxp/dw100/dw100.c
> > +++ b/drivers/media/platform/nxp/dw100/dw100.c
> > @@ -83,6 +83,11 @@ struct dw100_q_data {
> >   	struct v4l2_rect		crop;
> >   };
> >   
> > +struct dw100_map {
> > +	unsigned int *map;
> > +	dma_addr_t map_dma;

I would have called the field just 'dma' as it's already qualified by
the structure name or the field name in dw100_ctx.

> > +};
> > +
> >   struct dw100_ctx {
> >   	struct v4l2_fh			fh;
> >   	struct dw100_device		*dw_dev;
> > @@ -92,12 +97,14 @@ struct dw100_ctx {
> >   	struct mutex			vq_mutex;
> >   
> >   	/* Look Up Table for pixel remapping */
> > -	unsigned int			*map;
> > -	dma_addr_t			map_dma;
> > +	struct dw100_map		maps[2];
> > +	unsigned int			applied_map_id;
> >   	size_t				map_size;
> >   	unsigned int			map_width;
> >   	unsigned int			map_height;
> >   	bool				user_map_is_set;
> > +	bool				user_map_is_updated;
> > +	struct mutex			maps_mutex;
> >   
> >   	/* Source and destination queue data */
> >   	struct dw100_q_data		q_data[2];
> > @@ -308,24 +315,31 @@ static int dw100_create_mapping(struct dw100_ctx *ctx)
> >   {
> >   	u32 *user_map;
> >   
> > -	if (ctx->map)
> > -		dma_free_coherent(&ctx->dw_dev->pdev->dev, ctx->map_size,
> > -				  ctx->map, ctx->map_dma);
> > +	for (unsigned int i = 0; i < 2; i++) {

	for (unsigned int i = 0; i < ARRAY_SIZE(ctx->maps); i++) {
		struct dw100_map *map = &ctx->maps[i];

and use map below.


> > +		if (ctx->maps[i].map)
> > +			dma_free_coherent(&ctx->dw_dev->pdev->dev, ctx->map_size,
> > +					  ctx->maps[i].map, ctx->maps[i].map_dma);
> >   
> > -	ctx->map = dma_alloc_coherent(&ctx->dw_dev->pdev->dev, ctx->map_size,
> > -				      &ctx->map_dma, GFP_KERNEL);
> > +		ctx->maps[i].map = dma_alloc_coherent(&ctx->dw_dev->pdev->dev, ctx->map_size,
> > +						      &ctx->maps[i].map_dma, GFP_KERNEL);
> >   
> > -	if (!ctx->map)
> > -		return -ENOMEM;
> > +		if (!ctx->maps[i].map)
> > +			return -ENOMEM;
> > +	}
> >   
> >   	user_map = dw100_get_user_map(ctx);
> > -	memcpy(ctx->map, user_map, ctx->map_size);
> > +
> > +	mutex_lock(&ctx->maps_mutex);
> > +	ctx->applied_map_id = 0;
> > +	memcpy(ctx->maps[ctx->applied_map_id].map, user_map, ctx->map_size);
> > +	mutex_unlock(&ctx->maps_mutex);
> >   
> >   	dev_dbg(&ctx->dw_dev->pdev->dev,
> >   		"%ux%u %s mapping created (d:%pad-c:%p) for stream %ux%u->%ux%u\n",
> >   		ctx->map_width, ctx->map_height,
> >   		ctx->user_map_is_set ? "user" : "identity",
> > -		&ctx->map_dma, ctx->map,
> > +		&ctx->maps[ctx->applied_map_id].map_dma,
> > +		ctx->maps[ctx->applied_map_id].map,
> >   		ctx->q_data[DW100_QUEUE_SRC].pix_fmt.width,
> >   		ctx->q_data[DW100_QUEUE_DST].pix_fmt.height,
> >   		ctx->q_data[DW100_QUEUE_SRC].pix_fmt.width,
> > @@ -336,10 +350,12 @@ static int dw100_create_mapping(struct dw100_ctx *ctx)
> >   
> >   static void dw100_destroy_mapping(struct dw100_ctx *ctx)
> >   {
> > -	if (ctx->map) {
> > -		dma_free_coherent(&ctx->dw_dev->pdev->dev, ctx->map_size,
> > -				  ctx->map, ctx->map_dma);
> > -		ctx->map = NULL;
> > +	for (unsigned int i = 0; i < 2; i++) {

	for (unsigned int i = 0; i < ARRAY_SIZE(ctx->maps); i++) {
		struct dw100_map *map = &ctx->maps[i];

and use map below.

> > +		if (ctx->maps[i].map)
> > +			dma_free_coherent(&ctx->dw_dev->pdev->dev, ctx->map_size,
> > +					  ctx->maps[i].map, ctx->maps[i].map_dma);
> > +
> > +		ctx->maps[i].map = NULL;
> >   	}
> >   }
> >   
> > @@ -350,6 +366,15 @@ static int dw100_s_ctrl(struct v4l2_ctrl *ctrl)
> >   
> >   	switch (ctrl->id) {
> >   	case V4L2_CID_DW100_DEWARPING_16x16_VERTEX_MAP:
> > +		u32 *user_map = ctrl->p_new.p_u32;
>
> A warning to fix here.
>
> > +		unsigned int id;
> > +
> > +		mutex_lock(&ctx->maps_mutex);
> > +		id = ctx->applied_map_id ? 0 : 1;
> > +		memcpy(ctx->maps[id].map, user_map, ctx->map_size);
> > +		ctx->user_map_is_updated = true;
>
> If you call the control before to start the stream, the dma mapping is 
> not yet done(dw100_create_mapping not yet called). Hence, copying the 
> user map to a NULL pointer.

The maps could be allocated in dw100_open() when creating the context.
That would likely require moving the initialization of ctx->map_width,
ctx->map_height and ctx->map_size as well. The handling of the identity
map would probably need to be rewritten too.

> > +		mutex_unlock(&ctx->maps_mutex);
> > +
> >   		ctx->user_map_is_set = true;
> >   		break;
> >   	}
> > @@ -655,6 +680,8 @@ static int dw100_open(struct file *file)
> >   
> >   	v4l2_fh_add(&ctx->fh);
> >   
> > +	mutex_init(&ctx->maps_mutex);
> > +
> >   	return 0;
> >   
> >   err:
> > @@ -675,6 +702,7 @@ static int dw100_release(struct file *file)
> >   	v4l2_ctrl_handler_free(&ctx->hdl);
> >   	v4l2_m2m_ctx_release(ctx->fh.m2m_ctx);
> >   	mutex_destroy(&ctx->vq_mutex);
> > +	mutex_destroy(&ctx->maps_mutex);
> >   	kfree(ctx);
> >   
> >   	return 0;
> > @@ -1453,8 +1481,19 @@ static void dw100_start(struct dw100_ctx *ctx, struct vb2_v4l2_buffer *in_vb,
> >   	dw100_hw_set_destination(dw_dev, &ctx->q_data[DW100_QUEUE_DST],
> >   				 ctx->q_data[DW100_QUEUE_SRC].fmt,
> >   				 &out_vb->vb2_buf);
> > -	dw100_hw_set_mapping(dw_dev, ctx->map_dma,
> > -			     ctx->map_width, ctx->map_height);
> > +
> > +
> > +	mutex_lock(&ctx->maps_mutex);
> > +	if (ctx->user_map_is_updated) {
>
> The hardware register must unconditionally be updated while starting a 
> new context, as a v4l2 m2m supports multi context operations. Otherwise, 
> you may be running with the user mapping used by the previous context.
> 
> Moreover, the hardware mapping will not be set in case you use the 
> driver as a simple scaler without user mapping, which causes the process 
> to hang as the run does not start and never completes.
>
> > +		unsigned int id = ctx->applied_map_id ? 0 : 1;
> > +
> > +		dw100_hw_set_mapping(dw_dev, ctx->maps[id].map_dma,
> > +				     ctx->map_width, ctx->map_height);
> > +		ctx->applied_map_id = id;
> > +		ctx->user_map_is_updated = false;
> > +	}
> > +	mutex_unlock(&ctx->maps_mutex);
> > +
> >   	dw100_hw_enable_irq(dw_dev);
> >   	dw100_hw_dewarp_start(dw_dev);
> >   
> 
> It sounds as this patch requires a collaborative application for running 
> well. All my simple tests failed.
> 
> You can test a simple scaler/pixfmt conversion operation with v4l2 utils:
> 
> 
> v4l2-ctl \
> -d 0 \
> --set-fmt-video-out width=640,height=480,pixelformat=NV12,field=none \
> --set-fmt-video width=640,height=480,pixelformat=NV21,field=none \
> --stream-out-pattern 3 \
> --set-selection-output\ 
> target=crop,top=0,left=0,width=640,height=480,flags= \
> --stream-count 100 \
> --stream-mmap \
> --stream-out-mmap

Umang Jain Oct. 28, 2024, 7:27 a.m. UTC | #4

Hi Laurent,

On 27/10/24 8:10 pm, Laurent Pinchart wrote:
> On Sat, Oct 26, 2024 at 09:52:43PM +0200, Xavier Roumegue wrote:
>> Hi Umang,
>>
>> Thanks for the patch, this feature sounds promising.
>>
>> On 10/22/24 8:31 AM, Umang Jain wrote:
>>> Currently, vertex maps cannot be updated dynamically while dw100
>>> is streaming. This patch enables the support to update the vertex
>>> map dynamically at runtime.
>>>
>>> To support this functionality, we need to allocate and track two
>>> sets of DMA-allocated vertex maps, one for the currently applied map
>>> and another for the updated (pending) map. Before the start of next
>>> frame, if a new user-supplied vertex map is available, the hardware
>>> mapping is changed to use new vertex map, thus enabling the user to
>>> update the vertex map at runtime.
> How do you synchronize the new map with the jobs ? That doesn't seem to
> be supported by the patch, is it a feature that you don't need ?

First question around running jobs - isn't the DW100 dewarper run with 1 
src buffer and 1 dst buffer per context ? Do we need addiitonal 
synchronisation there? I think same point was mentioned by you for 
scaler crop setting but maybe I am missing something to see here.

 From the description of .job_ready():

  * @job_ready:  optional. Should return 0 if the driver does not have a 
job
  *              fully prepared to run yet (i.e. it will not be able to 
finish a
  *              transaction without sleeping). If not provided, it will be
  *              assumed that one source and one destination buffer are all
  *              that is required for the driver to perform one full 
transaction.
  *              This method may not sleep.

And when the IRQ handler completes, it calls to v4l2_m2m_job_finish 
which then schedules the next job through v4l2_m2m_schedule_next_job()

So I assumed that, .device_run() and ... in the IRQ handler (call 
to v4l2_m2m_job_finish) will be counted as one job transaction - since 
we are not working with multiple source and destination buffers for one 
transaction.

When the next job is started (i.e. dw100_start()) the vertex map index 
which is updated, is programmed to the register. That's is why this 
patch introduced two vertex maps for tracking - one which is applied and 
other one which can be updated by user.


>
>>> We should ensure no race occurs when the vertex map is updated multiple
>>> times when a frame is processing. Hence, vertex map is never updated to
>>> the applied vertex map index in .s_ctrl(). It is always updated on the
>>> pending vertex map slot, with `maps_mutex` lock held. `maps_mutex` lock
>>> is also held when the pending vertex map is applied to the hardware in
>>> dw100_start().
>>>
>>> Ability to update the vertex map at runtime, enables abritrary rotation
> s/abritrary/arbitrary/
>
>>> and digital zoom features for the input frames, through the dw100
>>> hardware.
>>>
>>> Signed-off-by: Umang Jain <umang.jain@ideasonboard.com>
>>> ---
>>>    drivers/media/platform/nxp/dw100/dw100.c | 73 ++++++++++++++++++------
>>>    1 file changed, 56 insertions(+), 17 deletions(-)
>>>
>>> diff --git a/drivers/media/platform/nxp/dw100/dw100.c b/drivers/media/platform/nxp/dw100/dw100.c
>>> index 54ebf59df682..42712ccff754 100644
>>> --- a/drivers/media/platform/nxp/dw100/dw100.c
>>> +++ b/drivers/media/platform/nxp/dw100/dw100.c
>>> @@ -83,6 +83,11 @@ struct dw100_q_data {
>>>    	struct v4l2_rect		crop;
>>>    };
>>>    
>>> +struct dw100_map {
>>> +	unsigned int *map;
>>> +	dma_addr_t map_dma;
> I would have called the field just 'dma' as it's already qualified by
> the structure name or the field name in dw100_ctx.
>
>>> +};
>>> +
>>>    struct dw100_ctx {
>>>    	struct v4l2_fh			fh;
>>>    	struct dw100_device		*dw_dev;
>>> @@ -92,12 +97,14 @@ struct dw100_ctx {
>>>    	struct mutex			vq_mutex;
>>>    
>>>    	/* Look Up Table for pixel remapping */
>>> -	unsigned int			*map;
>>> -	dma_addr_t			map_dma;
>>> +	struct dw100_map		maps[2];
>>> +	unsigned int			applied_map_id;
>>>    	size_t				map_size;
>>>    	unsigned int			map_width;
>>>    	unsigned int			map_height;
>>>    	bool				user_map_is_set;
>>> +	bool				user_map_is_updated;
>>> +	struct mutex			maps_mutex;
>>>    
>>>    	/* Source and destination queue data */
>>>    	struct dw100_q_data		q_data[2];
>>> @@ -308,24 +315,31 @@ static int dw100_create_mapping(struct dw100_ctx *ctx)
>>>    {
>>>    	u32 *user_map;
>>>    
>>> -	if (ctx->map)
>>> -		dma_free_coherent(&ctx->dw_dev->pdev->dev, ctx->map_size,
>>> -				  ctx->map, ctx->map_dma);
>>> +	for (unsigned int i = 0; i < 2; i++) {
> 	for (unsigned int i = 0; i < ARRAY_SIZE(ctx->maps); i++) {
> 		struct dw100_map *map = &ctx->maps[i];
>
> and use map below.
>
>
>>> +		if (ctx->maps[i].map)
>>> +			dma_free_coherent(&ctx->dw_dev->pdev->dev, ctx->map_size,
>>> +					  ctx->maps[i].map, ctx->maps[i].map_dma);
>>>    
>>> -	ctx->map = dma_alloc_coherent(&ctx->dw_dev->pdev->dev, ctx->map_size,
>>> -				      &ctx->map_dma, GFP_KERNEL);
>>> +		ctx->maps[i].map = dma_alloc_coherent(&ctx->dw_dev->pdev->dev, ctx->map_size,
>>> +						      &ctx->maps[i].map_dma, GFP_KERNEL);
>>>    
>>> -	if (!ctx->map)
>>> -		return -ENOMEM;
>>> +		if (!ctx->maps[i].map)
>>> +			return -ENOMEM;
>>> +	}
>>>    
>>>    	user_map = dw100_get_user_map(ctx);
>>> -	memcpy(ctx->map, user_map, ctx->map_size);
>>> +
>>> +	mutex_lock(&ctx->maps_mutex);
>>> +	ctx->applied_map_id = 0;
>>> +	memcpy(ctx->maps[ctx->applied_map_id].map, user_map, ctx->map_size);
>>> +	mutex_unlock(&ctx->maps_mutex);
>>>    
>>>    	dev_dbg(&ctx->dw_dev->pdev->dev,
>>>    		"%ux%u %s mapping created (d:%pad-c:%p) for stream %ux%u->%ux%u\n",
>>>    		ctx->map_width, ctx->map_height,
>>>    		ctx->user_map_is_set ? "user" : "identity",
>>> -		&ctx->map_dma, ctx->map,
>>> +		&ctx->maps[ctx->applied_map_id].map_dma,
>>> +		ctx->maps[ctx->applied_map_id].map,
>>>    		ctx->q_data[DW100_QUEUE_SRC].pix_fmt.width,
>>>    		ctx->q_data[DW100_QUEUE_DST].pix_fmt.height,
>>>    		ctx->q_data[DW100_QUEUE_SRC].pix_fmt.width,
>>> @@ -336,10 +350,12 @@ static int dw100_create_mapping(struct dw100_ctx *ctx)
>>>    
>>>    static void dw100_destroy_mapping(struct dw100_ctx *ctx)
>>>    {
>>> -	if (ctx->map) {
>>> -		dma_free_coherent(&ctx->dw_dev->pdev->dev, ctx->map_size,
>>> -				  ctx->map, ctx->map_dma);
>>> -		ctx->map = NULL;
>>> +	for (unsigned int i = 0; i < 2; i++) {
> 	for (unsigned int i = 0; i < ARRAY_SIZE(ctx->maps); i++) {
> 		struct dw100_map *map = &ctx->maps[i];
>
> and use map below.
>
>>> +		if (ctx->maps[i].map)
>>> +			dma_free_coherent(&ctx->dw_dev->pdev->dev, ctx->map_size,
>>> +					  ctx->maps[i].map, ctx->maps[i].map_dma);
>>> +
>>> +		ctx->maps[i].map = NULL;
>>>    	}
>>>    }
>>>    
>>> @@ -350,6 +366,15 @@ static int dw100_s_ctrl(struct v4l2_ctrl *ctrl)
>>>    
>>>    	switch (ctrl->id) {
>>>    	case V4L2_CID_DW100_DEWARPING_16x16_VERTEX_MAP:
>>> +		u32 *user_map = ctrl->p_new.p_u32;
>> A warning to fix here.
>>
>>> +		unsigned int id;
>>> +
>>> +		mutex_lock(&ctx->maps_mutex);
>>> +		id = ctx->applied_map_id ? 0 : 1;
>>> +		memcpy(ctx->maps[id].map, user_map, ctx->map_size);
>>> +		ctx->user_map_is_updated = true;
>> If you call the control before to start the stream, the dma mapping is
>> not yet done(dw100_create_mapping not yet called). Hence, copying the
>> user map to a NULL pointer.
> The maps could be allocated in dw100_open() when creating the context.
> That would likely require moving the initialization of ctx->map_width,
> ctx->map_height and ctx->map_size as well. The handling of the identity
> map would probably need to be rewritten too.
>
>>> +		mutex_unlock(&ctx->maps_mutex);
>>> +
>>>    		ctx->user_map_is_set = true;
>>>    		break;
>>>    	}
>>> @@ -655,6 +680,8 @@ static int dw100_open(struct file *file)
>>>    
>>>    	v4l2_fh_add(&ctx->fh);
>>>    
>>> +	mutex_init(&ctx->maps_mutex);
>>> +
>>>    	return 0;
>>>    
>>>    err:
>>> @@ -675,6 +702,7 @@ static int dw100_release(struct file *file)
>>>    	v4l2_ctrl_handler_free(&ctx->hdl);
>>>    	v4l2_m2m_ctx_release(ctx->fh.m2m_ctx);
>>>    	mutex_destroy(&ctx->vq_mutex);
>>> +	mutex_destroy(&ctx->maps_mutex);
>>>    	kfree(ctx);
>>>    
>>>    	return 0;
>>> @@ -1453,8 +1481,19 @@ static void dw100_start(struct dw100_ctx *ctx, struct vb2_v4l2_buffer *in_vb,
>>>    	dw100_hw_set_destination(dw_dev, &ctx->q_data[DW100_QUEUE_DST],
>>>    				 ctx->q_data[DW100_QUEUE_SRC].fmt,
>>>    				 &out_vb->vb2_buf);
>>> -	dw100_hw_set_mapping(dw_dev, ctx->map_dma,
>>> -			     ctx->map_width, ctx->map_height);
>>> +
>>> +
>>> +	mutex_lock(&ctx->maps_mutex);
>>> +	if (ctx->user_map_is_updated) {
>> The hardware register must unconditionally be updated while starting a
>> new context, as a v4l2 m2m supports multi context operations. Otherwise,
>> you may be running with the user mapping used by the previous context.
>>
>> Moreover, the hardware mapping will not be set in case you use the
>> driver as a simple scaler without user mapping, which causes the process
>> to hang as the run does not start and never completes.
>>
>>> +		unsigned int id = ctx->applied_map_id ? 0 : 1;
>>> +
>>> +		dw100_hw_set_mapping(dw_dev, ctx->maps[id].map_dma,
>>> +				     ctx->map_width, ctx->map_height);
>>> +		ctx->applied_map_id = id;
>>> +		ctx->user_map_is_updated = false;
>>> +	}
>>> +	mutex_unlock(&ctx->maps_mutex);
>>> +
>>>    	dw100_hw_enable_irq(dw_dev);
>>>    	dw100_hw_dewarp_start(dw_dev);
>>>    
>> It sounds as this patch requires a collaborative application for running
>> well. All my simple tests failed.
>>
>> You can test a simple scaler/pixfmt conversion operation with v4l2 utils:
>>
>>
>> v4l2-ctl \
>> -d 0 \
>> --set-fmt-video-out width=640,height=480,pixelformat=NV12,field=none \
>> --set-fmt-video width=640,height=480,pixelformat=NV21,field=none \
>> --stream-out-pattern 3 \
>> --set-selection-output\
>> target=crop,top=0,left=0,width=640,height=480,flags= \
>> --stream-count 100 \
>> --stream-mmap \
>> --stream-out-mmap

Umang Jain Oct. 28, 2024, 7:32 a.m. UTC | #5

Hi Xavier

On 27/10/24 1:22 am, Xavier Roumegue (OSS) wrote:
> Hi Umang,
>
> Thanks for the patch, this feature sounds promising.
>
> On 10/22/24 8:31 AM, Umang Jain wrote:
>> Currently, vertex maps cannot be updated dynamically while dw100
>> is streaming. This patch enables the support to update the vertex
>> map dynamically at runtime.
>>
>> To support this functionality, we need to allocate and track two
>> sets of DMA-allocated vertex maps, one for the currently applied map
>> and another for the updated (pending) map. Before the start of next
>> frame, if a new user-supplied vertex map is available, the hardware
>> mapping is changed to use new vertex map, thus enabling the user to
>> update the vertex map at runtime.
>>
>> We should ensure no race occurs when the vertex map is updated multiple
>> times when a frame is processing. Hence, vertex map is never updated to
>> the applied vertex map index in .s_ctrl(). It is always updated on the
>> pending vertex map slot, with `maps_mutex` lock held. `maps_mutex` lock
>> is also held when the pending vertex map is applied to the hardware in
>> dw100_start().
>>
>> Ability to update the vertex map at runtime, enables abritrary rotation
>> and digital zoom features for the input frames, through the dw100
>> hardware.
>>
>> Signed-off-by: Umang Jain <umang.jain@ideasonboard.com>
>> ---
>>   drivers/media/platform/nxp/dw100/dw100.c | 73 ++++++++++++++++++------
>>   1 file changed, 56 insertions(+), 17 deletions(-)
>>
>> diff --git a/drivers/media/platform/nxp/dw100/dw100.c 
>> b/drivers/media/platform/nxp/dw100/dw100.c
>> index 54ebf59df682..42712ccff754 100644
>> --- a/drivers/media/platform/nxp/dw100/dw100.c
>> +++ b/drivers/media/platform/nxp/dw100/dw100.c
>> @@ -83,6 +83,11 @@ struct dw100_q_data {
>>       struct v4l2_rect        crop;
>>   };
>>   +struct dw100_map {
>> +    unsigned int *map;
>> +    dma_addr_t map_dma;
>> +};
>> +
>>   struct dw100_ctx {
>>       struct v4l2_fh            fh;
>>       struct dw100_device        *dw_dev;
>> @@ -92,12 +97,14 @@ struct dw100_ctx {
>>       struct mutex            vq_mutex;
>>         /* Look Up Table for pixel remapping */
>> -    unsigned int            *map;
>> -    dma_addr_t            map_dma;
>> +    struct dw100_map        maps[2];
>> +    unsigned int            applied_map_id;
>>       size_t                map_size;
>>       unsigned int            map_width;
>>       unsigned int            map_height;
>>       bool                user_map_is_set;
>> +    bool                user_map_is_updated;
>> +    struct mutex            maps_mutex;
>>         /* Source and destination queue data */
>>       struct dw100_q_data        q_data[2];
>> @@ -308,24 +315,31 @@ static int dw100_create_mapping(struct 
>> dw100_ctx *ctx)
>>   {
>>       u32 *user_map;
>>   -    if (ctx->map)
>> -        dma_free_coherent(&ctx->dw_dev->pdev->dev, ctx->map_size,
>> -                  ctx->map, ctx->map_dma);
>> +    for (unsigned int i = 0; i < 2; i++) {
>> +        if (ctx->maps[i].map)
>> + dma_free_coherent(&ctx->dw_dev->pdev->dev, ctx->map_size,
>> +                      ctx->maps[i].map, ctx->maps[i].map_dma);
>>   -    ctx->map = dma_alloc_coherent(&ctx->dw_dev->pdev->dev, 
>> ctx->map_size,
>> -                      &ctx->map_dma, GFP_KERNEL);
>> +        ctx->maps[i].map = 
>> dma_alloc_coherent(&ctx->dw_dev->pdev->dev, ctx->map_size,
>> +                              &ctx->maps[i].map_dma, GFP_KERNEL);
>>   -    if (!ctx->map)
>> -        return -ENOMEM;
>> +        if (!ctx->maps[i].map)
>> +            return -ENOMEM;
>> +    }
>>         user_map = dw100_get_user_map(ctx);
>> -    memcpy(ctx->map, user_map, ctx->map_size);
>> +
>> +    mutex_lock(&ctx->maps_mutex);
>> +    ctx->applied_map_id = 0;
>> +    memcpy(ctx->maps[ctx->applied_map_id].map, user_map, 
>> ctx->map_size);
>> +    mutex_unlock(&ctx->maps_mutex);
>>         dev_dbg(&ctx->dw_dev->pdev->dev,
>>           "%ux%u %s mapping created (d:%pad-c:%p) for stream 
>> %ux%u->%ux%u\n",
>>           ctx->map_width, ctx->map_height,
>>           ctx->user_map_is_set ? "user" : "identity",
>> -        &ctx->map_dma, ctx->map,
>> +        &ctx->maps[ctx->applied_map_id].map_dma,
>> +        ctx->maps[ctx->applied_map_id].map,
>>           ctx->q_data[DW100_QUEUE_SRC].pix_fmt.width,
>>           ctx->q_data[DW100_QUEUE_DST].pix_fmt.height,
>>           ctx->q_data[DW100_QUEUE_SRC].pix_fmt.width,
>> @@ -336,10 +350,12 @@ static int dw100_create_mapping(struct 
>> dw100_ctx *ctx)
>>     static void dw100_destroy_mapping(struct dw100_ctx *ctx)
>>   {
>> -    if (ctx->map) {
>> -        dma_free_coherent(&ctx->dw_dev->pdev->dev, ctx->map_size,
>> -                  ctx->map, ctx->map_dma);
>> -        ctx->map = NULL;
>> +    for (unsigned int i = 0; i < 2; i++) {
>> +        if (ctx->maps[i].map)
>> + dma_free_coherent(&ctx->dw_dev->pdev->dev, ctx->map_size,
>> +                      ctx->maps[i].map, ctx->maps[i].map_dma);
>> +
>> +        ctx->maps[i].map = NULL;
>>       }
>>   }
>>   @@ -350,6 +366,15 @@ static int dw100_s_ctrl(struct v4l2_ctrl *ctrl)
>>         switch (ctrl->id) {
>>       case V4L2_CID_DW100_DEWARPING_16x16_VERTEX_MAP:
>> +        u32 *user_map = ctrl->p_new.p_u32;
> A warning to fix here.
>> +        unsigned int id;
>> +
>> +        mutex_lock(&ctx->maps_mutex);
>> +        id = ctx->applied_map_id ? 0 : 1;
>> +        memcpy(ctx->maps[id].map, user_map, ctx->map_size);
>> +        ctx->user_map_is_updated = true;
> If you call the control before to start the stream, the dma mapping is 
> not yet done(dw100_create_mapping not yet called). Hence, copying the 
> user map to a NULL pointer.
>> + mutex_unlock(&ctx->maps_mutex);
>> +
>>           ctx->user_map_is_set = true;
>>           break;
>>       }
>> @@ -655,6 +680,8 @@ static int dw100_open(struct file *file)
>>         v4l2_fh_add(&ctx->fh);
>>   +    mutex_init(&ctx->maps_mutex);
>> +
>>       return 0;
>>     err:
>> @@ -675,6 +702,7 @@ static int dw100_release(struct file *file)
>>       v4l2_ctrl_handler_free(&ctx->hdl);
>>       v4l2_m2m_ctx_release(ctx->fh.m2m_ctx);
>>       mutex_destroy(&ctx->vq_mutex);
>> +    mutex_destroy(&ctx->maps_mutex);
>>       kfree(ctx);
>>         return 0;
>> @@ -1453,8 +1481,19 @@ static void dw100_start(struct dw100_ctx *ctx, 
>> struct vb2_v4l2_buffer *in_vb,
>>       dw100_hw_set_destination(dw_dev, &ctx->q_data[DW100_QUEUE_DST],
>>                    ctx->q_data[DW100_QUEUE_SRC].fmt,
>>                    &out_vb->vb2_buf);
>> -    dw100_hw_set_mapping(dw_dev, ctx->map_dma,
>> -                 ctx->map_width, ctx->map_height);
>> +
>> +
>> +    mutex_lock(&ctx->maps_mutex);
>> +    if (ctx->user_map_is_updated) {
> The hardware register must unconditionally be updated while starting a 
> new context, as a v4l2 m2m supports multi context operations. 
> Otherwise, you may be running with the user mapping used by the 
> previous context.

Indeed, I missed to take into multi-context view point into account and 
indeed there can be some contexts which have not set the user map - 
user_map_is_updated=false.

>
> Moreover, the hardware mapping will not be set in case you use the 
> driver as a simple scaler without user mapping, which causes the 
> process to hang as the run does not start and never completes.

Hence, I am noticing hangs right now, so I would update this in v2.

>> +        unsigned int id = ctx->applied_map_id ? 0 : 1;
>> +
>> +        dw100_hw_set_mapping(dw_dev, ctx->maps[id].map_dma,
>> +                     ctx->map_width, ctx->map_height);
>> +        ctx->applied_map_id = id;
>> +        ctx->user_map_is_updated = false;
>> +    }
>> +    mutex_unlock(&ctx->maps_mutex);
>> +
>>       dw100_hw_enable_irq(dw_dev);
>>       dw100_hw_dewarp_start(dw_dev);
>
> It sounds as this patch requires a collaborative application for 
> running well. All my simple tests failed.

I think we have tested Zoom/Rotate with this patch running (i.e. user 
maps being updated) and seeing the effect visually.

>
> You can test a simple scaler/pixfmt conversion operation with v4l2 utils:

I'll check this as well before submitting for v2.
>
>
> v4l2-ctl \
> -d 0 \
> --set-fmt-video-out width=640,height=480,pixelformat=NV12,field=none \
> --set-fmt-video width=640,height=480,pixelformat=NV21,field=none \
> --stream-out-pattern 3 \
> --set-selection-output\ 
> target=crop,top=0,left=0,width=640,height=480,flags= \
> --stream-count 100 \
> --stream-mmap \
> --stream-out-mmap
>
> Xavier

Umang Jain Oct. 28, 2024, 2:17 p.m. UTC | #6

Hi Laurent

On 27/10/24 8:10 pm, Laurent Pinchart wrote:
> On Sat, Oct 26, 2024 at 09:52:43PM +0200, Xavier Roumegue wrote:
>> Hi Umang,
>>
>> Thanks for the patch, this feature sounds promising.
>>
>> On 10/22/24 8:31 AM, Umang Jain wrote:
>>> Currently, vertex maps cannot be updated dynamically while dw100
>>> is streaming. This patch enables the support to update the vertex
>>> map dynamically at runtime.
>>>
>>> To support this functionality, we need to allocate and track two
>>> sets of DMA-allocated vertex maps, one for the currently applied map
>>> and another for the updated (pending) map. Before the start of next
>>> frame, if a new user-supplied vertex map is available, the hardware
>>> mapping is changed to use new vertex map, thus enabling the user to
>>> update the vertex map at runtime.
> How do you synchronize the new map with the jobs ? That doesn't seem to
> be supported by the patch, is it a feature that you don't need ?
>
>>> We should ensure no race occurs when the vertex map is updated multiple
>>> times when a frame is processing. Hence, vertex map is never updated to
>>> the applied vertex map index in .s_ctrl(). It is always updated on the
>>> pending vertex map slot, with `maps_mutex` lock held. `maps_mutex` lock
>>> is also held when the pending vertex map is applied to the hardware in
>>> dw100_start().
>>>
>>> Ability to update the vertex map at runtime, enables abritrary rotation
> s/abritrary/arbitrary/
>
>>> and digital zoom features for the input frames, through the dw100
>>> hardware.
>>>
>>> Signed-off-by: Umang Jain <umang.jain@ideasonboard.com>
>>> ---
>>>    drivers/media/platform/nxp/dw100/dw100.c | 73 ++++++++++++++++++------
>>>    1 file changed, 56 insertions(+), 17 deletions(-)
>>>
>>> diff --git a/drivers/media/platform/nxp/dw100/dw100.c b/drivers/media/platform/nxp/dw100/dw100.c
>>> index 54ebf59df682..42712ccff754 100644
>>> --- a/drivers/media/platform/nxp/dw100/dw100.c
>>> +++ b/drivers/media/platform/nxp/dw100/dw100.c
>>> @@ -83,6 +83,11 @@ struct dw100_q_data {
>>>    	struct v4l2_rect		crop;
>>>    };
>>>    
>>> +struct dw100_map {
>>> +	unsigned int *map;
>>> +	dma_addr_t map_dma;
> I would have called the field just 'dma' as it's already qualified by
> the structure name or the field name in dw100_ctx.
>
>>> +};
>>> +
>>>    struct dw100_ctx {
>>>    	struct v4l2_fh			fh;
>>>    	struct dw100_device		*dw_dev;
>>> @@ -92,12 +97,14 @@ struct dw100_ctx {
>>>    	struct mutex			vq_mutex;
>>>    
>>>    	/* Look Up Table for pixel remapping */
>>> -	unsigned int			*map;
>>> -	dma_addr_t			map_dma;
>>> +	struct dw100_map		maps[2];
>>> +	unsigned int			applied_map_id;
>>>    	size_t				map_size;
>>>    	unsigned int			map_width;
>>>    	unsigned int			map_height;
>>>    	bool				user_map_is_set;
>>> +	bool				user_map_is_updated;
>>> +	struct mutex			maps_mutex;
>>>    
>>>    	/* Source and destination queue data */
>>>    	struct dw100_q_data		q_data[2];
>>> @@ -308,24 +315,31 @@ static int dw100_create_mapping(struct dw100_ctx *ctx)
>>>    {
>>>    	u32 *user_map;
>>>    
>>> -	if (ctx->map)
>>> -		dma_free_coherent(&ctx->dw_dev->pdev->dev, ctx->map_size,
>>> -				  ctx->map, ctx->map_dma);
>>> +	for (unsigned int i = 0; i < 2; i++) {
> 	for (unsigned int i = 0; i < ARRAY_SIZE(ctx->maps); i++) {
> 		struct dw100_map *map = &ctx->maps[i];
>
> and use map below.
>
>
>>> +		if (ctx->maps[i].map)
>>> +			dma_free_coherent(&ctx->dw_dev->pdev->dev, ctx->map_size,
>>> +					  ctx->maps[i].map, ctx->maps[i].map_dma);
>>>    
>>> -	ctx->map = dma_alloc_coherent(&ctx->dw_dev->pdev->dev, ctx->map_size,
>>> -				      &ctx->map_dma, GFP_KERNEL);
>>> +		ctx->maps[i].map = dma_alloc_coherent(&ctx->dw_dev->pdev->dev, ctx->map_size,
>>> +						      &ctx->maps[i].map_dma, GFP_KERNEL);
>>>    
>>> -	if (!ctx->map)
>>> -		return -ENOMEM;
>>> +		if (!ctx->maps[i].map)
>>> +			return -ENOMEM;
>>> +	}
>>>    
>>>    	user_map = dw100_get_user_map(ctx);
>>> -	memcpy(ctx->map, user_map, ctx->map_size);
>>> +
>>> +	mutex_lock(&ctx->maps_mutex);
>>> +	ctx->applied_map_id = 0;
>>> +	memcpy(ctx->maps[ctx->applied_map_id].map, user_map, ctx->map_size);
>>> +	mutex_unlock(&ctx->maps_mutex);
>>>    
>>>    	dev_dbg(&ctx->dw_dev->pdev->dev,
>>>    		"%ux%u %s mapping created (d:%pad-c:%p) for stream %ux%u->%ux%u\n",
>>>    		ctx->map_width, ctx->map_height,
>>>    		ctx->user_map_is_set ? "user" : "identity",
>>> -		&ctx->map_dma, ctx->map,
>>> +		&ctx->maps[ctx->applied_map_id].map_dma,
>>> +		ctx->maps[ctx->applied_map_id].map,
>>>    		ctx->q_data[DW100_QUEUE_SRC].pix_fmt.width,
>>>    		ctx->q_data[DW100_QUEUE_DST].pix_fmt.height,
>>>    		ctx->q_data[DW100_QUEUE_SRC].pix_fmt.width,
>>> @@ -336,10 +350,12 @@ static int dw100_create_mapping(struct dw100_ctx *ctx)
>>>    
>>>    static void dw100_destroy_mapping(struct dw100_ctx *ctx)
>>>    {
>>> -	if (ctx->map) {
>>> -		dma_free_coherent(&ctx->dw_dev->pdev->dev, ctx->map_size,
>>> -				  ctx->map, ctx->map_dma);
>>> -		ctx->map = NULL;
>>> +	for (unsigned int i = 0; i < 2; i++) {
> 	for (unsigned int i = 0; i < ARRAY_SIZE(ctx->maps); i++) {
> 		struct dw100_map *map = &ctx->maps[i];
>
> and use map below.
>
>>> +		if (ctx->maps[i].map)
>>> +			dma_free_coherent(&ctx->dw_dev->pdev->dev, ctx->map_size,
>>> +					  ctx->maps[i].map, ctx->maps[i].map_dma);
>>> +
>>> +		ctx->maps[i].map = NULL;
>>>    	}
>>>    }
>>>    
>>> @@ -350,6 +366,15 @@ static int dw100_s_ctrl(struct v4l2_ctrl *ctrl)
>>>    
>>>    	switch (ctrl->id) {
>>>    	case V4L2_CID_DW100_DEWARPING_16x16_VERTEX_MAP:
>>> +		u32 *user_map = ctrl->p_new.p_u32;
>> A warning to fix here.
>>
>>> +		unsigned int id;
>>> +
>>> +		mutex_lock(&ctx->maps_mutex);
>>> +		id = ctx->applied_map_id ? 0 : 1;
>>> +		memcpy(ctx->maps[id].map, user_map, ctx->map_size);
>>> +		ctx->user_map_is_updated = true;
>> If you call the control before to start the stream, the dma mapping is
>> not yet done(dw100_create_mapping not yet called). Hence, copying the
>> user map to a NULL pointer.
> The maps could be allocated in dw100_open() when creating the context.
> That would likely require moving the initialization of ctx->map_width,
> ctx->map_height and ctx->map_size as well. The handling of the identity
> map would probably need to be rewritten too.

The ctx->map_width, ctx->map_height and ctx->map_size would be updated 
on s_fmt().

I think we can solve the NULL pointer issue by allocating when creating 
the context (open()), however, it would require updating (re-allocation) 
again before the map can be memcpy()ed before streaming. Because the map 
dimensions would have changed.

See dw100_s_fmt()

...
dims[0] = dw100_get_n_vertices_from_length(q_data->pix_fmt.width);
dims[1] = dw100_get_n_vertices_from_length(q_data->pix_fmt.height);

ret = v4l2_ctrl_modify_dimensions(ctrl, dims);
```

I checked the v4l2_ctrl_modify_dimensions definition to check if it 
issues a call v4l2_ctrl_type_ops.init  (where the map dimensions are 
updated for dw100) and it does.

So, I think I will have to introduce allocations in dw100_open() so that 
NULL pointer issue doesn't occur and let the dma allocation get 
re-allocated with new dimensions just before stream start.

Also, we do not have to move the ctx->map_width, ctx->height abd 
ctx->map_size inititlisation, since they are already gets initialised to 
defaults, on the open() path when v4l2_ctrl_new_custom() is done.

>>> +		mutex_unlock(&ctx->maps_mutex);
>>> +
>>>    		ctx->user_map_is_set = true;
>>>    		break;
>>>    	}
>>> @@ -655,6 +680,8 @@ static int dw100_open(struct file *file)
>>>    
>>>    	v4l2_fh_add(&ctx->fh);
>>>    
>>> +	mutex_init(&ctx->maps_mutex);
>>> +
>>>    	return 0;
>>>    
>>>    err:
>>> @@ -675,6 +702,7 @@ static int dw100_release(struct file *file)
>>>    	v4l2_ctrl_handler_free(&ctx->hdl);
>>>    	v4l2_m2m_ctx_release(ctx->fh.m2m_ctx);
>>>    	mutex_destroy(&ctx->vq_mutex);
>>> +	mutex_destroy(&ctx->maps_mutex);
>>>    	kfree(ctx);
>>>    
>>>    	return 0;
>>> @@ -1453,8 +1481,19 @@ static void dw100_start(struct dw100_ctx *ctx, struct vb2_v4l2_buffer *in_vb,
>>>    	dw100_hw_set_destination(dw_dev, &ctx->q_data[DW100_QUEUE_DST],
>>>    				 ctx->q_data[DW100_QUEUE_SRC].fmt,
>>>    				 &out_vb->vb2_buf);
>>> -	dw100_hw_set_mapping(dw_dev, ctx->map_dma,
>>> -			     ctx->map_width, ctx->map_height);
>>> +
>>> +
>>> +	mutex_lock(&ctx->maps_mutex);
>>> +	if (ctx->user_map_is_updated) {
>> The hardware register must unconditionally be updated while starting a
>> new context, as a v4l2 m2m supports multi context operations. Otherwise,
>> you may be running with the user mapping used by the previous context.
>>
>> Moreover, the hardware mapping will not be set in case you use the
>> driver as a simple scaler without user mapping, which causes the process
>> to hang as the run does not start and never completes.
>>
>>> +		unsigned int id = ctx->applied_map_id ? 0 : 1;
>>> +
>>> +		dw100_hw_set_mapping(dw_dev, ctx->maps[id].map_dma,
>>> +				     ctx->map_width, ctx->map_height);
>>> +		ctx->applied_map_id = id;
>>> +		ctx->user_map_is_updated = false;
>>> +	}
>>> +	mutex_unlock(&ctx->maps_mutex);
>>> +
>>>    	dw100_hw_enable_irq(dw_dev);
>>>    	dw100_hw_dewarp_start(dw_dev);
>>>    
>> It sounds as this patch requires a collaborative application for running
>> well. All my simple tests failed.
>>
>> You can test a simple scaler/pixfmt conversion operation with v4l2 utils:
>>
>>
>> v4l2-ctl \
>> -d 0 \
>> --set-fmt-video-out width=640,height=480,pixelformat=NV12,field=none \
>> --set-fmt-video width=640,height=480,pixelformat=NV21,field=none \
>> --stream-out-pattern 3 \
>> --set-selection-output\
>> target=crop,top=0,left=0,width=640,height=480,flags= \
>> --stream-count 100 \
>> --stream-mmap \
>> --stream-out-mmap

Laurent Pinchart Oct. 28, 2024, 2:32 p.m. UTC | #7

Hi Umang,

On Mon, Oct 28, 2024 at 12:57:52PM +0530, Umang Jain wrote:
> On 27/10/24 8:10 pm, Laurent Pinchart wrote:
> > On Sat, Oct 26, 2024 at 09:52:43PM +0200, Xavier Roumegue wrote:
> >> On 10/22/24 8:31 AM, Umang Jain wrote:
> >>> Currently, vertex maps cannot be updated dynamically while dw100
> >>> is streaming. This patch enables the support to update the vertex
> >>> map dynamically at runtime.
> >>>
> >>> To support this functionality, we need to allocate and track two
> >>> sets of DMA-allocated vertex maps, one for the currently applied map
> >>> and another for the updated (pending) map. Before the start of next
> >>> frame, if a new user-supplied vertex map is available, the hardware
> >>> mapping is changed to use new vertex map, thus enabling the user to
> >>> update the vertex map at runtime.
> >
> > How do you synchronize the new map with the jobs ? That doesn't seem to
> > be supported by the patch, is it a feature that you don't need ?
> 
> First question around running jobs - isn't the DW100 dewarper run with 1 
> src buffer and 1 dst buffer per context ? Do we need addiitonal 
> synchronisation there? I think same point was mentioned by you for 
> scaler crop setting but maybe I am missing something to see here.
> 
>  From the description of .job_ready():
> 
>   * @job_ready:  optional. Should return 0 if the driver does not have a job
>   *              fully prepared to run yet (i.e. it will not be able to finish a
>   *              transaction without sleeping). If not provided, it will be
>   *              assumed that one source and one destination buffer are all
>   *              that is required for the driver to perform one full transaction.
>   *              This method may not sleep.
> 
> And when the IRQ handler completes, it calls to v4l2_m2m_job_finish 
> which then schedules the next job through v4l2_m2m_schedule_next_job()
> 
> So I assumed that, .device_run() and ... in the IRQ handler (call 
> to v4l2_m2m_job_finish) will be counted as one job transaction - since 
> we are not working with multiple source and destination buffers for one 
> transaction.

Correct, but userspace can queue multiple jobs by queuing multiple
output and capture buffers, one of each per job. It doesn't have to wait
for the completion of a job before starting the next one.

> When the next job is started (i.e. dw100_start()) the vertex map index 
> which is updated, is programmed to the register. That's is why this 
> patch introduced two vertex maps for tracking - one which is applied and 
> other one which can be updated by user.

When multiple jobs are queued, setting the control will become
asynchronous relative to the jobs, you won't know to which job the new
map applies. This is due to the asynchronous nature of V4L2 controls.
The way to synchronize controls with jobs would be to use the request
API, which adds complexity. Another option is to avoid queuing multiple
jobs, which is a bit of a workaround that results in reduce throughput.

What I wonder, and would like to have your feedback on, is whether
synchronous operation is needed in some of the DW100 use cases (and in
particular in your use cases).

> >>> We should ensure no race occurs when the vertex map is updated multiple
> >>> times when a frame is processing. Hence, vertex map is never updated to
> >>> the applied vertex map index in .s_ctrl(). It is always updated on the
> >>> pending vertex map slot, with `maps_mutex` lock held. `maps_mutex` lock
> >>> is also held when the pending vertex map is applied to the hardware in
> >>> dw100_start().
> >>>
> >>> Ability to update the vertex map at runtime, enables abritrary rotation
> >
> > s/abritrary/arbitrary/
> >
> >>> and digital zoom features for the input frames, through the dw100
> >>> hardware.
> >>>
> >>> Signed-off-by: Umang Jain <umang.jain@ideasonboard.com>
> >>> ---
> >>>    drivers/media/platform/nxp/dw100/dw100.c | 73 ++++++++++++++++++------
> >>>    1 file changed, 56 insertions(+), 17 deletions(-)
> >>>
> >>> diff --git a/drivers/media/platform/nxp/dw100/dw100.c b/drivers/media/platform/nxp/dw100/dw100.c
> >>> index 54ebf59df682..42712ccff754 100644
> >>> --- a/drivers/media/platform/nxp/dw100/dw100.c
> >>> +++ b/drivers/media/platform/nxp/dw100/dw100.c
> >>> @@ -83,6 +83,11 @@ struct dw100_q_data {
> >>>    	struct v4l2_rect		crop;
> >>>    };
> >>>    
> >>> +struct dw100_map {
> >>> +	unsigned int *map;
> >>> +	dma_addr_t map_dma;
> >
> > I would have called the field just 'dma' as it's already qualified by
> > the structure name or the field name in dw100_ctx.
> >
> >>> +};
> >>> +
> >>>    struct dw100_ctx {
> >>>    	struct v4l2_fh			fh;
> >>>    	struct dw100_device		*dw_dev;
> >>> @@ -92,12 +97,14 @@ struct dw100_ctx {
> >>>    	struct mutex			vq_mutex;
> >>>    
> >>>    	/* Look Up Table for pixel remapping */
> >>> -	unsigned int			*map;
> >>> -	dma_addr_t			map_dma;
> >>> +	struct dw100_map		maps[2];
> >>> +	unsigned int			applied_map_id;
> >>>    	size_t				map_size;
> >>>    	unsigned int			map_width;
> >>>    	unsigned int			map_height;
> >>>    	bool				user_map_is_set;
> >>> +	bool				user_map_is_updated;
> >>> +	struct mutex			maps_mutex;
> >>>    
> >>>    	/* Source and destination queue data */
> >>>    	struct dw100_q_data		q_data[2];
> >>> @@ -308,24 +315,31 @@ static int dw100_create_mapping(struct dw100_ctx *ctx)
> >>>    {
> >>>    	u32 *user_map;
> >>>    
> >>> -	if (ctx->map)
> >>> -		dma_free_coherent(&ctx->dw_dev->pdev->dev, ctx->map_size,
> >>> -				  ctx->map, ctx->map_dma);
> >>> +	for (unsigned int i = 0; i < 2; i++) {
> >
> > 	for (unsigned int i = 0; i < ARRAY_SIZE(ctx->maps); i++) {
> > 		struct dw100_map *map = &ctx->maps[i];
> >
> > and use map below.
> >
> >
> >>> +		if (ctx->maps[i].map)
> >>> +			dma_free_coherent(&ctx->dw_dev->pdev->dev, ctx->map_size,
> >>> +					  ctx->maps[i].map, ctx->maps[i].map_dma);
> >>>    
> >>> -	ctx->map = dma_alloc_coherent(&ctx->dw_dev->pdev->dev, ctx->map_size,
> >>> -				      &ctx->map_dma, GFP_KERNEL);
> >>> +		ctx->maps[i].map = dma_alloc_coherent(&ctx->dw_dev->pdev->dev, ctx->map_size,
> >>> +						      &ctx->maps[i].map_dma, GFP_KERNEL);
> >>>    
> >>> -	if (!ctx->map)
> >>> -		return -ENOMEM;
> >>> +		if (!ctx->maps[i].map)
> >>> +			return -ENOMEM;
> >>> +	}
> >>>    
> >>>    	user_map = dw100_get_user_map(ctx);
> >>> -	memcpy(ctx->map, user_map, ctx->map_size);
> >>> +
> >>> +	mutex_lock(&ctx->maps_mutex);
> >>> +	ctx->applied_map_id = 0;
> >>> +	memcpy(ctx->maps[ctx->applied_map_id].map, user_map, ctx->map_size);
> >>> +	mutex_unlock(&ctx->maps_mutex);
> >>>    
> >>>    	dev_dbg(&ctx->dw_dev->pdev->dev,
> >>>    		"%ux%u %s mapping created (d:%pad-c:%p) for stream %ux%u->%ux%u\n",
> >>>    		ctx->map_width, ctx->map_height,
> >>>    		ctx->user_map_is_set ? "user" : "identity",
> >>> -		&ctx->map_dma, ctx->map,
> >>> +		&ctx->maps[ctx->applied_map_id].map_dma,
> >>> +		ctx->maps[ctx->applied_map_id].map,
> >>>    		ctx->q_data[DW100_QUEUE_SRC].pix_fmt.width,
> >>>    		ctx->q_data[DW100_QUEUE_DST].pix_fmt.height,
> >>>    		ctx->q_data[DW100_QUEUE_SRC].pix_fmt.width,
> >>> @@ -336,10 +350,12 @@ static int dw100_create_mapping(struct dw100_ctx *ctx)
> >>>    
> >>>    static void dw100_destroy_mapping(struct dw100_ctx *ctx)
> >>>    {
> >>> -	if (ctx->map) {
> >>> -		dma_free_coherent(&ctx->dw_dev->pdev->dev, ctx->map_size,
> >>> -				  ctx->map, ctx->map_dma);
> >>> -		ctx->map = NULL;
> >>> +	for (unsigned int i = 0; i < 2; i++) {
> >
> > 	for (unsigned int i = 0; i < ARRAY_SIZE(ctx->maps); i++) {
> > 		struct dw100_map *map = &ctx->maps[i];
> >
> > and use map below.
> >
> >>> +		if (ctx->maps[i].map)
> >>> +			dma_free_coherent(&ctx->dw_dev->pdev->dev, ctx->map_size,
> >>> +					  ctx->maps[i].map, ctx->maps[i].map_dma);
> >>> +
> >>> +		ctx->maps[i].map = NULL;
> >>>    	}
> >>>    }
> >>>    
> >>> @@ -350,6 +366,15 @@ static int dw100_s_ctrl(struct v4l2_ctrl *ctrl)
> >>>    
> >>>    	switch (ctrl->id) {
> >>>    	case V4L2_CID_DW100_DEWARPING_16x16_VERTEX_MAP:
> >>> +		u32 *user_map = ctrl->p_new.p_u32;
> >>
> >> A warning to fix here.
> >>
> >>> +		unsigned int id;
> >>> +
> >>> +		mutex_lock(&ctx->maps_mutex);
> >>> +		id = ctx->applied_map_id ? 0 : 1;
> >>> +		memcpy(ctx->maps[id].map, user_map, ctx->map_size);
> >>> +		ctx->user_map_is_updated = true;
> >>
> >> If you call the control before to start the stream, the dma mapping is
> >> not yet done(dw100_create_mapping not yet called). Hence, copying the
> >> user map to a NULL pointer.
> >
> > The maps could be allocated in dw100_open() when creating the context.
> > That would likely require moving the initialization of ctx->map_width,
> > ctx->map_height and ctx->map_size as well. The handling of the identity
> > map would probably need to be rewritten too.
> >
> >>> +		mutex_unlock(&ctx->maps_mutex);
> >>> +
> >>>    		ctx->user_map_is_set = true;
> >>>    		break;
> >>>    	}
> >>> @@ -655,6 +680,8 @@ static int dw100_open(struct file *file)
> >>>    
> >>>    	v4l2_fh_add(&ctx->fh);
> >>>    
> >>> +	mutex_init(&ctx->maps_mutex);
> >>> +
> >>>    	return 0;
> >>>    
> >>>    err:
> >>> @@ -675,6 +702,7 @@ static int dw100_release(struct file *file)
> >>>    	v4l2_ctrl_handler_free(&ctx->hdl);
> >>>    	v4l2_m2m_ctx_release(ctx->fh.m2m_ctx);
> >>>    	mutex_destroy(&ctx->vq_mutex);
> >>> +	mutex_destroy(&ctx->maps_mutex);
> >>>    	kfree(ctx);
> >>>    
> >>>    	return 0;
> >>> @@ -1453,8 +1481,19 @@ static void dw100_start(struct dw100_ctx *ctx, struct vb2_v4l2_buffer *in_vb,
> >>>    	dw100_hw_set_destination(dw_dev, &ctx->q_data[DW100_QUEUE_DST],
> >>>    				 ctx->q_data[DW100_QUEUE_SRC].fmt,
> >>>    				 &out_vb->vb2_buf);
> >>> -	dw100_hw_set_mapping(dw_dev, ctx->map_dma,
> >>> -			     ctx->map_width, ctx->map_height);
> >>> +
> >>> +
> >>> +	mutex_lock(&ctx->maps_mutex);
> >>> +	if (ctx->user_map_is_updated) {
> >>
> >> The hardware register must unconditionally be updated while starting a
> >> new context, as a v4l2 m2m supports multi context operations. Otherwise,
> >> you may be running with the user mapping used by the previous context.
> >>
> >> Moreover, the hardware mapping will not be set in case you use the
> >> driver as a simple scaler without user mapping, which causes the process
> >> to hang as the run does not start and never completes.
> >>
> >>> +		unsigned int id = ctx->applied_map_id ? 0 : 1;
> >>> +
> >>> +		dw100_hw_set_mapping(dw_dev, ctx->maps[id].map_dma,
> >>> +				     ctx->map_width, ctx->map_height);
> >>> +		ctx->applied_map_id = id;
> >>> +		ctx->user_map_is_updated = false;
> >>> +	}
> >>> +	mutex_unlock(&ctx->maps_mutex);
> >>> +
> >>>    	dw100_hw_enable_irq(dw_dev);
> >>>    	dw100_hw_dewarp_start(dw_dev);
> >>>    
> >> It sounds as this patch requires a collaborative application for running
> >> well. All my simple tests failed.
> >>
> >> You can test a simple scaler/pixfmt conversion operation with v4l2 utils:
> >>
> >>
> >> v4l2-ctl \
> >> -d 0 \
> >> --set-fmt-video-out width=640,height=480,pixelformat=NV12,field=none \
> >> --set-fmt-video width=640,height=480,pixelformat=NV21,field=none \
> >> --stream-out-pattern 3 \
> >> --set-selection-output\
> >> target=crop,top=0,left=0,width=640,height=480,flags= \
> >> --stream-count 100 \
> >> --stream-mmap \
> >> --stream-out-mmap

Laurent Pinchart Oct. 28, 2024, 2:41 p.m. UTC | #8

Hi Umang,

On Mon, Oct 28, 2024 at 07:47:46PM +0530, Umang Jain wrote:
> On 27/10/24 8:10 pm, Laurent Pinchart wrote:
> > On Sat, Oct 26, 2024 at 09:52:43PM +0200, Xavier Roumegue wrote:
> >> On 10/22/24 8:31 AM, Umang Jain wrote:
> >>> Currently, vertex maps cannot be updated dynamically while dw100
> >>> is streaming. This patch enables the support to update the vertex
> >>> map dynamically at runtime.
> >>>
> >>> To support this functionality, we need to allocate and track two
> >>> sets of DMA-allocated vertex maps, one for the currently applied map
> >>> and another for the updated (pending) map. Before the start of next
> >>> frame, if a new user-supplied vertex map is available, the hardware
> >>> mapping is changed to use new vertex map, thus enabling the user to
> >>> update the vertex map at runtime.
> >
> > How do you synchronize the new map with the jobs ? That doesn't seem to
> > be supported by the patch, is it a feature that you don't need ?
> >
> >>> We should ensure no race occurs when the vertex map is updated multiple
> >>> times when a frame is processing. Hence, vertex map is never updated to
> >>> the applied vertex map index in .s_ctrl(). It is always updated on the
> >>> pending vertex map slot, with `maps_mutex` lock held. `maps_mutex` lock
> >>> is also held when the pending vertex map is applied to the hardware in
> >>> dw100_start().
> >>>
> >>> Ability to update the vertex map at runtime, enables abritrary rotation
> >
> > s/abritrary/arbitrary/
> >
> >>> and digital zoom features for the input frames, through the dw100
> >>> hardware.
> >>>
> >>> Signed-off-by: Umang Jain <umang.jain@ideasonboard.com>
> >>> ---
> >>>    drivers/media/platform/nxp/dw100/dw100.c | 73 ++++++++++++++++++------
> >>>    1 file changed, 56 insertions(+), 17 deletions(-)
> >>>
> >>> diff --git a/drivers/media/platform/nxp/dw100/dw100.c b/drivers/media/platform/nxp/dw100/dw100.c
> >>> index 54ebf59df682..42712ccff754 100644
> >>> --- a/drivers/media/platform/nxp/dw100/dw100.c
> >>> +++ b/drivers/media/platform/nxp/dw100/dw100.c
> >>> @@ -83,6 +83,11 @@ struct dw100_q_data {
> >>>    	struct v4l2_rect		crop;
> >>>    };
> >>>    
> >>> +struct dw100_map {
> >>> +	unsigned int *map;
> >>> +	dma_addr_t map_dma;
> >
> > I would have called the field just 'dma' as it's already qualified by
> > the structure name or the field name in dw100_ctx.
> >
> >>> +};
> >>> +
> >>>    struct dw100_ctx {
> >>>    	struct v4l2_fh			fh;
> >>>    	struct dw100_device		*dw_dev;
> >>> @@ -92,12 +97,14 @@ struct dw100_ctx {
> >>>    	struct mutex			vq_mutex;
> >>>    
> >>>    	/* Look Up Table for pixel remapping */
> >>> -	unsigned int			*map;
> >>> -	dma_addr_t			map_dma;
> >>> +	struct dw100_map		maps[2];
> >>> +	unsigned int			applied_map_id;
> >>>    	size_t				map_size;
> >>>    	unsigned int			map_width;
> >>>    	unsigned int			map_height;
> >>>    	bool				user_map_is_set;
> >>> +	bool				user_map_is_updated;
> >>> +	struct mutex			maps_mutex;
> >>>    
> >>>    	/* Source and destination queue data */
> >>>    	struct dw100_q_data		q_data[2];
> >>> @@ -308,24 +315,31 @@ static int dw100_create_mapping(struct dw100_ctx *ctx)
> >>>    {
> >>>    	u32 *user_map;
> >>>    
> >>> -	if (ctx->map)
> >>> -		dma_free_coherent(&ctx->dw_dev->pdev->dev, ctx->map_size,
> >>> -				  ctx->map, ctx->map_dma);
> >>> +	for (unsigned int i = 0; i < 2; i++) {
> >
> > 	for (unsigned int i = 0; i < ARRAY_SIZE(ctx->maps); i++) {
> > 		struct dw100_map *map = &ctx->maps[i];
> >
> > and use map below.
> >
> >
> >>> +		if (ctx->maps[i].map)
> >>> +			dma_free_coherent(&ctx->dw_dev->pdev->dev, ctx->map_size,
> >>> +					  ctx->maps[i].map, ctx->maps[i].map_dma);
> >>>    
> >>> -	ctx->map = dma_alloc_coherent(&ctx->dw_dev->pdev->dev, ctx->map_size,
> >>> -				      &ctx->map_dma, GFP_KERNEL);
> >>> +		ctx->maps[i].map = dma_alloc_coherent(&ctx->dw_dev->pdev->dev, ctx->map_size,
> >>> +						      &ctx->maps[i].map_dma, GFP_KERNEL);
> >>>    
> >>> -	if (!ctx->map)
> >>> -		return -ENOMEM;
> >>> +		if (!ctx->maps[i].map)
> >>> +			return -ENOMEM;
> >>> +	}
> >>>    
> >>>    	user_map = dw100_get_user_map(ctx);
> >>> -	memcpy(ctx->map, user_map, ctx->map_size);
> >>> +
> >>> +	mutex_lock(&ctx->maps_mutex);
> >>> +	ctx->applied_map_id = 0;
> >>> +	memcpy(ctx->maps[ctx->applied_map_id].map, user_map, ctx->map_size);
> >>> +	mutex_unlock(&ctx->maps_mutex);
> >>>    
> >>>    	dev_dbg(&ctx->dw_dev->pdev->dev,
> >>>    		"%ux%u %s mapping created (d:%pad-c:%p) for stream %ux%u->%ux%u\n",
> >>>    		ctx->map_width, ctx->map_height,
> >>>    		ctx->user_map_is_set ? "user" : "identity",
> >>> -		&ctx->map_dma, ctx->map,
> >>> +		&ctx->maps[ctx->applied_map_id].map_dma,
> >>> +		ctx->maps[ctx->applied_map_id].map,
> >>>    		ctx->q_data[DW100_QUEUE_SRC].pix_fmt.width,
> >>>    		ctx->q_data[DW100_QUEUE_DST].pix_fmt.height,
> >>>    		ctx->q_data[DW100_QUEUE_SRC].pix_fmt.width,
> >>> @@ -336,10 +350,12 @@ static int dw100_create_mapping(struct dw100_ctx *ctx)
> >>>    
> >>>    static void dw100_destroy_mapping(struct dw100_ctx *ctx)
> >>>    {
> >>> -	if (ctx->map) {
> >>> -		dma_free_coherent(&ctx->dw_dev->pdev->dev, ctx->map_size,
> >>> -				  ctx->map, ctx->map_dma);
> >>> -		ctx->map = NULL;
> >>> +	for (unsigned int i = 0; i < 2; i++) {
> >
> > 	for (unsigned int i = 0; i < ARRAY_SIZE(ctx->maps); i++) {
> > 		struct dw100_map *map = &ctx->maps[i];
> >
> > and use map below.
> >
> >>> +		if (ctx->maps[i].map)
> >>> +			dma_free_coherent(&ctx->dw_dev->pdev->dev, ctx->map_size,
> >>> +					  ctx->maps[i].map, ctx->maps[i].map_dma);
> >>> +
> >>> +		ctx->maps[i].map = NULL;
> >>>    	}
> >>>    }
> >>>    
> >>> @@ -350,6 +366,15 @@ static int dw100_s_ctrl(struct v4l2_ctrl *ctrl)
> >>>    
> >>>    	switch (ctrl->id) {
> >>>    	case V4L2_CID_DW100_DEWARPING_16x16_VERTEX_MAP:
> >>> +		u32 *user_map = ctrl->p_new.p_u32;
> >>
> >> A warning to fix here.
> >>
> >>> +		unsigned int id;
> >>> +
> >>> +		mutex_lock(&ctx->maps_mutex);
> >>> +		id = ctx->applied_map_id ? 0 : 1;
> >>> +		memcpy(ctx->maps[id].map, user_map, ctx->map_size);
> >>> +		ctx->user_map_is_updated = true;
> >>
> >> If you call the control before to start the stream, the dma mapping is
> >> not yet done(dw100_create_mapping not yet called). Hence, copying the
> >> user map to a NULL pointer.
> >
> > The maps could be allocated in dw100_open() when creating the context.
> > That would likely require moving the initialization of ctx->map_width,
> > ctx->map_height and ctx->map_size as well. The handling of the identity
> > map would probably need to be rewritten too.
> 
> The ctx->map_width, ctx->map_height and ctx->map_size would be updated 
> on s_fmt().

I saw that ctx->map_width, ctx->map_height and ctx->map_size are set in
dw100_ctrl_dewarping_map_init(), with

	mw = ctrl->dims[0];
	mh = ctrl->dims[1];

	[...]

	ctx->map_width = mw;
	ctx->map_height = mh;
	ctx->map_size = mh * mw * sizeof(u32);

but overlooked the fact that the dimensions are set in dw100_s_fmt().

> I think we can solve the NULL pointer issue by allocating when creating 
> the context (open()), however, it would require updating (re-allocation) 
> again before the map can be memcpy()ed before streaming. Because the map 
> dimensions would have changed.

If the map dimensions change, that invalidates the map contents set by
userspace. This is currently handled by dw100_ctrl_dewarping_map_init().
You won't be able to just memcpy() the previous control to the new one.

> See dw100_s_fmt()
> 
> ...
> dims[0] = dw100_get_n_vertices_from_length(q_data->pix_fmt.width);
> dims[1] = dw100_get_n_vertices_from_length(q_data->pix_fmt.height);
> 
> ret = v4l2_ctrl_modify_dimensions(ctrl, dims);
> ```
> 
> I checked the v4l2_ctrl_modify_dimensions definition to check if it 
> issues a call v4l2_ctrl_type_ops.init  (where the map dimensions are 
> updated for dw100) and it does.
> 
> So, I think I will have to introduce allocations in dw100_open() so that 
> NULL pointer issue doesn't occur and let the dma allocation get 
> re-allocated with new dimensions just before stream start.

That seems a bit pointless, if the map will be invalidated by a call to
VIDIOC_S_FMT anyway. The only case where it would be useful is if
userspace sets the control before starting streaming and doesn't call
VIDIOC_S_FMT.

I'm increasingly thinking the driver should use the request API to
synchronize the control with the jobs.

> Also, we do not have to move the ctx->map_width, ctx->height abd 
> ctx->map_size inititlisation, since they are already gets initialised to 
> defaults, on the open() path when v4l2_ctrl_new_custom() is done.
> 
> >>> +		mutex_unlock(&ctx->maps_mutex);
> >>> +
> >>>    		ctx->user_map_is_set = true;
> >>>    		break;
> >>>    	}
> >>> @@ -655,6 +680,8 @@ static int dw100_open(struct file *file)
> >>>    
> >>>    	v4l2_fh_add(&ctx->fh);
> >>>    
> >>> +	mutex_init(&ctx->maps_mutex);
> >>> +
> >>>    	return 0;
> >>>    
> >>>    err:
> >>> @@ -675,6 +702,7 @@ static int dw100_release(struct file *file)
> >>>    	v4l2_ctrl_handler_free(&ctx->hdl);
> >>>    	v4l2_m2m_ctx_release(ctx->fh.m2m_ctx);
> >>>    	mutex_destroy(&ctx->vq_mutex);
> >>> +	mutex_destroy(&ctx->maps_mutex);
> >>>    	kfree(ctx);
> >>>    
> >>>    	return 0;
> >>> @@ -1453,8 +1481,19 @@ static void dw100_start(struct dw100_ctx *ctx, struct vb2_v4l2_buffer *in_vb,
> >>>    	dw100_hw_set_destination(dw_dev, &ctx->q_data[DW100_QUEUE_DST],
> >>>    				 ctx->q_data[DW100_QUEUE_SRC].fmt,
> >>>    				 &out_vb->vb2_buf);
> >>> -	dw100_hw_set_mapping(dw_dev, ctx->map_dma,
> >>> -			     ctx->map_width, ctx->map_height);
> >>> +
> >>> +
> >>> +	mutex_lock(&ctx->maps_mutex);
> >>> +	if (ctx->user_map_is_updated) {
> >>
> >> The hardware register must unconditionally be updated while starting a
> >> new context, as a v4l2 m2m supports multi context operations. Otherwise,
> >> you may be running with the user mapping used by the previous context.
> >>
> >> Moreover, the hardware mapping will not be set in case you use the
> >> driver as a simple scaler without user mapping, which causes the process
> >> to hang as the run does not start and never completes.
> >>
> >>> +		unsigned int id = ctx->applied_map_id ? 0 : 1;
> >>> +
> >>> +		dw100_hw_set_mapping(dw_dev, ctx->maps[id].map_dma,
> >>> +				     ctx->map_width, ctx->map_height);
> >>> +		ctx->applied_map_id = id;
> >>> +		ctx->user_map_is_updated = false;
> >>> +	}
> >>> +	mutex_unlock(&ctx->maps_mutex);
> >>> +
> >>>    	dw100_hw_enable_irq(dw_dev);
> >>>    	dw100_hw_dewarp_start(dw_dev);
> >>>    
> >>
> >> It sounds as this patch requires a collaborative application for running
> >> well. All my simple tests failed.
> >>
> >> You can test a simple scaler/pixfmt conversion operation with v4l2 utils:
> >>
> >>
> >> v4l2-ctl \
> >> -d 0 \
> >> --set-fmt-video-out width=640,height=480,pixelformat=NV12,field=none \
> >> --set-fmt-video width=640,height=480,pixelformat=NV21,field=none \
> >> --stream-out-pattern 3 \
> >> --set-selection-output\
> >> target=crop,top=0,left=0,width=640,height=480,flags= \
> >> --stream-count 100 \
> >> --stream-mmap \
> >> --stream-out-mmap

Umang Jain Oct. 28, 2024, 2:58 p.m. UTC | #9

Hi Laurent,

On 28/10/24 8:02 pm, Laurent Pinchart wrote:
> Hi Umang,
>
> On Mon, Oct 28, 2024 at 12:57:52PM +0530, Umang Jain wrote:
>> On 27/10/24 8:10 pm, Laurent Pinchart wrote:
>>> On Sat, Oct 26, 2024 at 09:52:43PM +0200, Xavier Roumegue wrote:
>>>> On 10/22/24 8:31 AM, Umang Jain wrote:
>>>>> Currently, vertex maps cannot be updated dynamically while dw100
>>>>> is streaming. This patch enables the support to update the vertex
>>>>> map dynamically at runtime.
>>>>>
>>>>> To support this functionality, we need to allocate and track two
>>>>> sets of DMA-allocated vertex maps, one for the currently applied map
>>>>> and another for the updated (pending) map. Before the start of next
>>>>> frame, if a new user-supplied vertex map is available, the hardware
>>>>> mapping is changed to use new vertex map, thus enabling the user to
>>>>> update the vertex map at runtime.
>>> How do you synchronize the new map with the jobs ? That doesn't seem to
>>> be supported by the patch, is it a feature that you don't need ?
>> First question around running jobs - isn't the DW100 dewarper run with 1
>> src buffer and 1 dst buffer per context ? Do we need addiitonal
>> synchronisation there? I think same point was mentioned by you for
>> scaler crop setting but maybe I am missing something to see here.
>>
>>   From the description of .job_ready():
>>
>>    * @job_ready:  optional. Should return 0 if the driver does not have a job
>>    *              fully prepared to run yet (i.e. it will not be able to finish a
>>    *              transaction without sleeping). If not provided, it will be
>>    *              assumed that one source and one destination buffer are all
>>    *              that is required for the driver to perform one full transaction.
>>    *              This method may not sleep.
>>
>> And when the IRQ handler completes, it calls to v4l2_m2m_job_finish
>> which then schedules the next job through v4l2_m2m_schedule_next_job()
>>
>> So I assumed that, .device_run() and ... in the IRQ handler (call
>> to v4l2_m2m_job_finish) will be counted as one job transaction - since
>> we are not working with multiple source and destination buffers for one
>> transaction.
> Correct, but userspace can queue multiple jobs by queuing multiple
> output and capture buffers, one of each per job. It doesn't have to wait
> for the completion of a job before starting the next one.

Indeed, it seems to be possible - nothing is preventing the driver to 
accept multiple jobs

>
>> When the next job is started (i.e. dw100_start()) the vertex map index
>> which is updated, is programmed to the register. That's is why this
>> patch introduced two vertex maps for tracking - one which is applied and
>> other one which can be updated by user.
> When multiple jobs are queued, setting the control will become
> asynchronous relative to the jobs, you won't know to which job the new
> map applies. This is due to the asynchronous nature of V4L2 controls.
> The way to synchronize controls with jobs would be to use the request
> API, which adds complexity. Another option is to avoid queuing multiple
> jobs, which is a bit of a workaround that results in reduce throughput.
>
> What I wonder, and would like to have your feedback on, is whether
> synchronous operation is needed in some of the DW100 use cases (and in
> particular in your use cases).

We are probably operating from the view point that controls are applied 
for the next job that the the DW100 processes. But indeed, that's not 
the case due to the async V4L2 controls, as you said.

 From per-frame metadata reporting purposes - I think we will need to 
have synchronizing mechanism that the driver undertakes, to actually 
match that view point.

>
>>>>> We should ensure no race occurs when the vertex map is updated multiple
>>>>> times when a frame is processing. Hence, vertex map is never updated to
>>>>> the applied vertex map index in .s_ctrl(). It is always updated on the
>>>>> pending vertex map slot, with `maps_mutex` lock held. `maps_mutex` lock
>>>>> is also held when the pending vertex map is applied to the hardware in
>>>>> dw100_start().
>>>>>
>>>>> Ability to update the vertex map at runtime, enables abritrary rotation
>>> s/abritrary/arbitrary/
>>>
>>>>> and digital zoom features for the input frames, through the dw100
>>>>> hardware.
>>>>>
>>>>> Signed-off-by: Umang Jain <umang.jain@ideasonboard.com>
>>>>> ---
>>>>>     drivers/media/platform/nxp/dw100/dw100.c | 73 ++++++++++++++++++------
>>>>>     1 file changed, 56 insertions(+), 17 deletions(-)
>>>>>
>>>>> diff --git a/drivers/media/platform/nxp/dw100/dw100.c b/drivers/media/platform/nxp/dw100/dw100.c
>>>>> index 54ebf59df682..42712ccff754 100644
>>>>> --- a/drivers/media/platform/nxp/dw100/dw100.c
>>>>> +++ b/drivers/media/platform/nxp/dw100/dw100.c
>>>>> @@ -83,6 +83,11 @@ struct dw100_q_data {
>>>>>     	struct v4l2_rect		crop;
>>>>>     };
>>>>>     
>>>>> +struct dw100_map {
>>>>> +	unsigned int *map;
>>>>> +	dma_addr_t map_dma;
>>> I would have called the field just 'dma' as it's already qualified by
>>> the structure name or the field name in dw100_ctx.
>>>
>>>>> +};
>>>>> +
>>>>>     struct dw100_ctx {
>>>>>     	struct v4l2_fh			fh;
>>>>>     	struct dw100_device		*dw_dev;
>>>>> @@ -92,12 +97,14 @@ struct dw100_ctx {
>>>>>     	struct mutex			vq_mutex;
>>>>>     
>>>>>     	/* Look Up Table for pixel remapping */
>>>>> -	unsigned int			*map;
>>>>> -	dma_addr_t			map_dma;
>>>>> +	struct dw100_map		maps[2];
>>>>> +	unsigned int			applied_map_id;
>>>>>     	size_t				map_size;
>>>>>     	unsigned int			map_width;
>>>>>     	unsigned int			map_height;
>>>>>     	bool				user_map_is_set;
>>>>> +	bool				user_map_is_updated;
>>>>> +	struct mutex			maps_mutex;
>>>>>     
>>>>>     	/* Source and destination queue data */
>>>>>     	struct dw100_q_data		q_data[2];
>>>>> @@ -308,24 +315,31 @@ static int dw100_create_mapping(struct dw100_ctx *ctx)
>>>>>     {
>>>>>     	u32 *user_map;
>>>>>     
>>>>> -	if (ctx->map)
>>>>> -		dma_free_coherent(&ctx->dw_dev->pdev->dev, ctx->map_size,
>>>>> -				  ctx->map, ctx->map_dma);
>>>>> +	for (unsigned int i = 0; i < 2; i++) {
>>> 	for (unsigned int i = 0; i < ARRAY_SIZE(ctx->maps); i++) {
>>> 		struct dw100_map *map = &ctx->maps[i];
>>>
>>> and use map below.
>>>
>>>
>>>>> +		if (ctx->maps[i].map)
>>>>> +			dma_free_coherent(&ctx->dw_dev->pdev->dev, ctx->map_size,
>>>>> +					  ctx->maps[i].map, ctx->maps[i].map_dma);
>>>>>     
>>>>> -	ctx->map = dma_alloc_coherent(&ctx->dw_dev->pdev->dev, ctx->map_size,
>>>>> -				      &ctx->map_dma, GFP_KERNEL);
>>>>> +		ctx->maps[i].map = dma_alloc_coherent(&ctx->dw_dev->pdev->dev, ctx->map_size,
>>>>> +						      &ctx->maps[i].map_dma, GFP_KERNEL);
>>>>>     
>>>>> -	if (!ctx->map)
>>>>> -		return -ENOMEM;
>>>>> +		if (!ctx->maps[i].map)
>>>>> +			return -ENOMEM;
>>>>> +	}
>>>>>     
>>>>>     	user_map = dw100_get_user_map(ctx);
>>>>> -	memcpy(ctx->map, user_map, ctx->map_size);
>>>>> +
>>>>> +	mutex_lock(&ctx->maps_mutex);
>>>>> +	ctx->applied_map_id = 0;
>>>>> +	memcpy(ctx->maps[ctx->applied_map_id].map, user_map, ctx->map_size);
>>>>> +	mutex_unlock(&ctx->maps_mutex);
>>>>>     
>>>>>     	dev_dbg(&ctx->dw_dev->pdev->dev,
>>>>>     		"%ux%u %s mapping created (d:%pad-c:%p) for stream %ux%u->%ux%u\n",
>>>>>     		ctx->map_width, ctx->map_height,
>>>>>     		ctx->user_map_is_set ? "user" : "identity",
>>>>> -		&ctx->map_dma, ctx->map,
>>>>> +		&ctx->maps[ctx->applied_map_id].map_dma,
>>>>> +		ctx->maps[ctx->applied_map_id].map,
>>>>>     		ctx->q_data[DW100_QUEUE_SRC].pix_fmt.width,
>>>>>     		ctx->q_data[DW100_QUEUE_DST].pix_fmt.height,
>>>>>     		ctx->q_data[DW100_QUEUE_SRC].pix_fmt.width,
>>>>> @@ -336,10 +350,12 @@ static int dw100_create_mapping(struct dw100_ctx *ctx)
>>>>>     
>>>>>     static void dw100_destroy_mapping(struct dw100_ctx *ctx)
>>>>>     {
>>>>> -	if (ctx->map) {
>>>>> -		dma_free_coherent(&ctx->dw_dev->pdev->dev, ctx->map_size,
>>>>> -				  ctx->map, ctx->map_dma);
>>>>> -		ctx->map = NULL;
>>>>> +	for (unsigned int i = 0; i < 2; i++) {
>>> 	for (unsigned int i = 0; i < ARRAY_SIZE(ctx->maps); i++) {
>>> 		struct dw100_map *map = &ctx->maps[i];
>>>
>>> and use map below.
>>>
>>>>> +		if (ctx->maps[i].map)
>>>>> +			dma_free_coherent(&ctx->dw_dev->pdev->dev, ctx->map_size,
>>>>> +					  ctx->maps[i].map, ctx->maps[i].map_dma);
>>>>> +
>>>>> +		ctx->maps[i].map = NULL;
>>>>>     	}
>>>>>     }
>>>>>     
>>>>> @@ -350,6 +366,15 @@ static int dw100_s_ctrl(struct v4l2_ctrl *ctrl)
>>>>>     
>>>>>     	switch (ctrl->id) {
>>>>>     	case V4L2_CID_DW100_DEWARPING_16x16_VERTEX_MAP:
>>>>> +		u32 *user_map = ctrl->p_new.p_u32;
>>>> A warning to fix here.
>>>>
>>>>> +		unsigned int id;
>>>>> +
>>>>> +		mutex_lock(&ctx->maps_mutex);
>>>>> +		id = ctx->applied_map_id ? 0 : 1;
>>>>> +		memcpy(ctx->maps[id].map, user_map, ctx->map_size);
>>>>> +		ctx->user_map_is_updated = true;
>>>> If you call the control before to start the stream, the dma mapping is
>>>> not yet done(dw100_create_mapping not yet called). Hence, copying the
>>>> user map to a NULL pointer.
>>> The maps could be allocated in dw100_open() when creating the context.
>>> That would likely require moving the initialization of ctx->map_width,
>>> ctx->map_height and ctx->map_size as well. The handling of the identity
>>> map would probably need to be rewritten too.
>>>
>>>>> +		mutex_unlock(&ctx->maps_mutex);
>>>>> +
>>>>>     		ctx->user_map_is_set = true;
>>>>>     		break;
>>>>>     	}
>>>>> @@ -655,6 +680,8 @@ static int dw100_open(struct file *file)
>>>>>     
>>>>>     	v4l2_fh_add(&ctx->fh);
>>>>>     
>>>>> +	mutex_init(&ctx->maps_mutex);
>>>>> +
>>>>>     	return 0;
>>>>>     
>>>>>     err:
>>>>> @@ -675,6 +702,7 @@ static int dw100_release(struct file *file)
>>>>>     	v4l2_ctrl_handler_free(&ctx->hdl);
>>>>>     	v4l2_m2m_ctx_release(ctx->fh.m2m_ctx);
>>>>>     	mutex_destroy(&ctx->vq_mutex);
>>>>> +	mutex_destroy(&ctx->maps_mutex);
>>>>>     	kfree(ctx);
>>>>>     
>>>>>     	return 0;
>>>>> @@ -1453,8 +1481,19 @@ static void dw100_start(struct dw100_ctx *ctx, struct vb2_v4l2_buffer *in_vb,
>>>>>     	dw100_hw_set_destination(dw_dev, &ctx->q_data[DW100_QUEUE_DST],
>>>>>     				 ctx->q_data[DW100_QUEUE_SRC].fmt,
>>>>>     				 &out_vb->vb2_buf);
>>>>> -	dw100_hw_set_mapping(dw_dev, ctx->map_dma,
>>>>> -			     ctx->map_width, ctx->map_height);
>>>>> +
>>>>> +
>>>>> +	mutex_lock(&ctx->maps_mutex);
>>>>> +	if (ctx->user_map_is_updated) {
>>>> The hardware register must unconditionally be updated while starting a
>>>> new context, as a v4l2 m2m supports multi context operations. Otherwise,
>>>> you may be running with the user mapping used by the previous context.
>>>>
>>>> Moreover, the hardware mapping will not be set in case you use the
>>>> driver as a simple scaler without user mapping, which causes the process
>>>> to hang as the run does not start and never completes.
>>>>
>>>>> +		unsigned int id = ctx->applied_map_id ? 0 : 1;
>>>>> +
>>>>> +		dw100_hw_set_mapping(dw_dev, ctx->maps[id].map_dma,
>>>>> +				     ctx->map_width, ctx->map_height);
>>>>> +		ctx->applied_map_id = id;
>>>>> +		ctx->user_map_is_updated = false;
>>>>> +	}
>>>>> +	mutex_unlock(&ctx->maps_mutex);
>>>>> +
>>>>>     	dw100_hw_enable_irq(dw_dev);
>>>>>     	dw100_hw_dewarp_start(dw_dev);
>>>>>     
>>>> It sounds as this patch requires a collaborative application for running
>>>> well. All my simple tests failed.
>>>>
>>>> You can test a simple scaler/pixfmt conversion operation with v4l2 utils:
>>>>
>>>>
>>>> v4l2-ctl \
>>>> -d 0 \
>>>> --set-fmt-video-out width=640,height=480,pixelformat=NV12,field=none \
>>>> --set-fmt-video width=640,height=480,pixelformat=NV21,field=none \
>>>> --stream-out-pattern 3 \
>>>> --set-selection-output\
>>>> target=crop,top=0,left=0,width=640,height=480,flags= \
>>>> --stream-count 100 \
>>>> --stream-mmap \
>>>> --stream-out-mmap

Umang Jain Oct. 28, 2024, 3:02 p.m. UTC | #10

Hi Laurent,

On 28/10/24 8:11 pm, Laurent Pinchart wrote:
> Hi Umang,
>
> On Mon, Oct 28, 2024 at 07:47:46PM +0530, Umang Jain wrote:
>> On 27/10/24 8:10 pm, Laurent Pinchart wrote:
>>> On Sat, Oct 26, 2024 at 09:52:43PM +0200, Xavier Roumegue wrote:
>>>> On 10/22/24 8:31 AM, Umang Jain wrote:
>>>>> Currently, vertex maps cannot be updated dynamically while dw100
>>>>> is streaming. This patch enables the support to update the vertex
>>>>> map dynamically at runtime.
>>>>>
>>>>> To support this functionality, we need to allocate and track two
>>>>> sets of DMA-allocated vertex maps, one for the currently applied map
>>>>> and another for the updated (pending) map. Before the start of next
>>>>> frame, if a new user-supplied vertex map is available, the hardware
>>>>> mapping is changed to use new vertex map, thus enabling the user to
>>>>> update the vertex map at runtime.
>>> How do you synchronize the new map with the jobs ? That doesn't seem to
>>> be supported by the patch, is it a feature that you don't need ?
>>>
>>>>> We should ensure no race occurs when the vertex map is updated multiple
>>>>> times when a frame is processing. Hence, vertex map is never updated to
>>>>> the applied vertex map index in .s_ctrl(). It is always updated on the
>>>>> pending vertex map slot, with `maps_mutex` lock held. `maps_mutex` lock
>>>>> is also held when the pending vertex map is applied to the hardware in
>>>>> dw100_start().
>>>>>
>>>>> Ability to update the vertex map at runtime, enables abritrary rotation
>>> s/abritrary/arbitrary/
>>>
>>>>> and digital zoom features for the input frames, through the dw100
>>>>> hardware.
>>>>>
>>>>> Signed-off-by: Umang Jain <umang.jain@ideasonboard.com>
>>>>> ---
>>>>>     drivers/media/platform/nxp/dw100/dw100.c | 73 ++++++++++++++++++------
>>>>>     1 file changed, 56 insertions(+), 17 deletions(-)
>>>>>
>>>>> diff --git a/drivers/media/platform/nxp/dw100/dw100.c b/drivers/media/platform/nxp/dw100/dw100.c
>>>>> index 54ebf59df682..42712ccff754 100644
>>>>> --- a/drivers/media/platform/nxp/dw100/dw100.c
>>>>> +++ b/drivers/media/platform/nxp/dw100/dw100.c
>>>>> @@ -83,6 +83,11 @@ struct dw100_q_data {
>>>>>     	struct v4l2_rect		crop;
>>>>>     };
>>>>>     
>>>>> +struct dw100_map {
>>>>> +	unsigned int *map;
>>>>> +	dma_addr_t map_dma;
>>> I would have called the field just 'dma' as it's already qualified by
>>> the structure name or the field name in dw100_ctx.
>>>
>>>>> +};
>>>>> +
>>>>>     struct dw100_ctx {
>>>>>     	struct v4l2_fh			fh;
>>>>>     	struct dw100_device		*dw_dev;
>>>>> @@ -92,12 +97,14 @@ struct dw100_ctx {
>>>>>     	struct mutex			vq_mutex;
>>>>>     
>>>>>     	/* Look Up Table for pixel remapping */
>>>>> -	unsigned int			*map;
>>>>> -	dma_addr_t			map_dma;
>>>>> +	struct dw100_map		maps[2];
>>>>> +	unsigned int			applied_map_id;
>>>>>     	size_t				map_size;
>>>>>     	unsigned int			map_width;
>>>>>     	unsigned int			map_height;
>>>>>     	bool				user_map_is_set;
>>>>> +	bool				user_map_is_updated;
>>>>> +	struct mutex			maps_mutex;
>>>>>     
>>>>>     	/* Source and destination queue data */
>>>>>     	struct dw100_q_data		q_data[2];
>>>>> @@ -308,24 +315,31 @@ static int dw100_create_mapping(struct dw100_ctx *ctx)
>>>>>     {
>>>>>     	u32 *user_map;
>>>>>     
>>>>> -	if (ctx->map)
>>>>> -		dma_free_coherent(&ctx->dw_dev->pdev->dev, ctx->map_size,
>>>>> -				  ctx->map, ctx->map_dma);
>>>>> +	for (unsigned int i = 0; i < 2; i++) {
>>> 	for (unsigned int i = 0; i < ARRAY_SIZE(ctx->maps); i++) {
>>> 		struct dw100_map *map = &ctx->maps[i];
>>>
>>> and use map below.
>>>
>>>
>>>>> +		if (ctx->maps[i].map)
>>>>> +			dma_free_coherent(&ctx->dw_dev->pdev->dev, ctx->map_size,
>>>>> +					  ctx->maps[i].map, ctx->maps[i].map_dma);
>>>>>     
>>>>> -	ctx->map = dma_alloc_coherent(&ctx->dw_dev->pdev->dev, ctx->map_size,
>>>>> -				      &ctx->map_dma, GFP_KERNEL);
>>>>> +		ctx->maps[i].map = dma_alloc_coherent(&ctx->dw_dev->pdev->dev, ctx->map_size,
>>>>> +						      &ctx->maps[i].map_dma, GFP_KERNEL);
>>>>>     
>>>>> -	if (!ctx->map)
>>>>> -		return -ENOMEM;
>>>>> +		if (!ctx->maps[i].map)
>>>>> +			return -ENOMEM;
>>>>> +	}
>>>>>     
>>>>>     	user_map = dw100_get_user_map(ctx);
>>>>> -	memcpy(ctx->map, user_map, ctx->map_size);
>>>>> +
>>>>> +	mutex_lock(&ctx->maps_mutex);
>>>>> +	ctx->applied_map_id = 0;
>>>>> +	memcpy(ctx->maps[ctx->applied_map_id].map, user_map, ctx->map_size);
>>>>> +	mutex_unlock(&ctx->maps_mutex);
>>>>>     
>>>>>     	dev_dbg(&ctx->dw_dev->pdev->dev,
>>>>>     		"%ux%u %s mapping created (d:%pad-c:%p) for stream %ux%u->%ux%u\n",
>>>>>     		ctx->map_width, ctx->map_height,
>>>>>     		ctx->user_map_is_set ? "user" : "identity",
>>>>> -		&ctx->map_dma, ctx->map,
>>>>> +		&ctx->maps[ctx->applied_map_id].map_dma,
>>>>> +		ctx->maps[ctx->applied_map_id].map,
>>>>>     		ctx->q_data[DW100_QUEUE_SRC].pix_fmt.width,
>>>>>     		ctx->q_data[DW100_QUEUE_DST].pix_fmt.height,
>>>>>     		ctx->q_data[DW100_QUEUE_SRC].pix_fmt.width,
>>>>> @@ -336,10 +350,12 @@ static int dw100_create_mapping(struct dw100_ctx *ctx)
>>>>>     
>>>>>     static void dw100_destroy_mapping(struct dw100_ctx *ctx)
>>>>>     {
>>>>> -	if (ctx->map) {
>>>>> -		dma_free_coherent(&ctx->dw_dev->pdev->dev, ctx->map_size,
>>>>> -				  ctx->map, ctx->map_dma);
>>>>> -		ctx->map = NULL;
>>>>> +	for (unsigned int i = 0; i < 2; i++) {
>>> 	for (unsigned int i = 0; i < ARRAY_SIZE(ctx->maps); i++) {
>>> 		struct dw100_map *map = &ctx->maps[i];
>>>
>>> and use map below.
>>>
>>>>> +		if (ctx->maps[i].map)
>>>>> +			dma_free_coherent(&ctx->dw_dev->pdev->dev, ctx->map_size,
>>>>> +					  ctx->maps[i].map, ctx->maps[i].map_dma);
>>>>> +
>>>>> +		ctx->maps[i].map = NULL;
>>>>>     	}
>>>>>     }
>>>>>     
>>>>> @@ -350,6 +366,15 @@ static int dw100_s_ctrl(struct v4l2_ctrl *ctrl)
>>>>>     
>>>>>     	switch (ctrl->id) {
>>>>>     	case V4L2_CID_DW100_DEWARPING_16x16_VERTEX_MAP:
>>>>> +		u32 *user_map = ctrl->p_new.p_u32;
>>>> A warning to fix here.
>>>>
>>>>> +		unsigned int id;
>>>>> +
>>>>> +		mutex_lock(&ctx->maps_mutex);
>>>>> +		id = ctx->applied_map_id ? 0 : 1;
>>>>> +		memcpy(ctx->maps[id].map, user_map, ctx->map_size);
>>>>> +		ctx->user_map_is_updated = true;
>>>> If you call the control before to start the stream, the dma mapping is
>>>> not yet done(dw100_create_mapping not yet called). Hence, copying the
>>>> user map to a NULL pointer.
>>> The maps could be allocated in dw100_open() when creating the context.
>>> That would likely require moving the initialization of ctx->map_width,
>>> ctx->map_height and ctx->map_size as well. The handling of the identity
>>> map would probably need to be rewritten too.
>> The ctx->map_width, ctx->map_height and ctx->map_size would be updated
>> on s_fmt().
> I saw that ctx->map_width, ctx->map_height and ctx->map_size are set in
> dw100_ctrl_dewarping_map_init(), with
>
> 	mw = ctrl->dims[0];
> 	mh = ctrl->dims[1];
>
> 	[...]
>
> 	ctx->map_width = mw;
> 	ctx->map_height = mh;
> 	ctx->map_size = mh * mw * sizeof(u32);
>
> but overlooked the fact that the dimensions are set in dw100_s_fmt().
>
>> I think we can solve the NULL pointer issue by allocating when creating
>> the context (open()), however, it would require updating (re-allocation)
>> again before the map can be memcpy()ed before streaming. Because the map
>> dimensions would have changed.
> If the map dimensions change, that invalidates the map contents set by
> userspace. This is currently handled by dw100_ctrl_dewarping_map_init().
> You won't be able to just memcpy() the previous control to the new one.
>
>> See dw100_s_fmt()
>>
>> ...
>> dims[0] = dw100_get_n_vertices_from_length(q_data->pix_fmt.width);
>> dims[1] = dw100_get_n_vertices_from_length(q_data->pix_fmt.height);
>>
>> ret = v4l2_ctrl_modify_dimensions(ctrl, dims);
>> ```
>>
>> I checked the v4l2_ctrl_modify_dimensions definition to check if it
>> issues a call v4l2_ctrl_type_ops.init  (where the map dimensions are
>> updated for dw100) and it does.
>>
>> So, I think I will have to introduce allocations in dw100_open() so that
>> NULL pointer issue doesn't occur and let the dma allocation get
>> re-allocated with new dimensions just before stream start.
> That seems a bit pointless, if the map will be invalidated by a call to
> VIDIOC_S_FMT anyway. The only case where it would be useful is if
> userspace sets the control before starting streaming and doesn't call
> VIDIOC_S_FMT.

I was indeed not comfortable with the .open() dma-allocation approach 
and hence, I summarised it here for discussion.


>
> I'm increasingly thinking the driver should use the request API to
> synchronize the control with the jobs.

I will atleast consider and estimate how much complex it would be!

>
>> Also, we do not have to move the ctx->map_width, ctx->height abd
>> ctx->map_size inititlisation, since they are already gets initialised to
>> defaults, on the open() path when v4l2_ctrl_new_custom() is done.
>>
>>>>> +		mutex_unlock(&ctx->maps_mutex);
>>>>> +
>>>>>     		ctx->user_map_is_set = true;
>>>>>     		break;
>>>>>     	}
>>>>> @@ -655,6 +680,8 @@ static int dw100_open(struct file *file)
>>>>>     
>>>>>     	v4l2_fh_add(&ctx->fh);
>>>>>     
>>>>> +	mutex_init(&ctx->maps_mutex);
>>>>> +
>>>>>     	return 0;
>>>>>     
>>>>>     err:
>>>>> @@ -675,6 +702,7 @@ static int dw100_release(struct file *file)
>>>>>     	v4l2_ctrl_handler_free(&ctx->hdl);
>>>>>     	v4l2_m2m_ctx_release(ctx->fh.m2m_ctx);
>>>>>     	mutex_destroy(&ctx->vq_mutex);
>>>>> +	mutex_destroy(&ctx->maps_mutex);
>>>>>     	kfree(ctx);
>>>>>     
>>>>>     	return 0;
>>>>> @@ -1453,8 +1481,19 @@ static void dw100_start(struct dw100_ctx *ctx, struct vb2_v4l2_buffer *in_vb,
>>>>>     	dw100_hw_set_destination(dw_dev, &ctx->q_data[DW100_QUEUE_DST],
>>>>>     				 ctx->q_data[DW100_QUEUE_SRC].fmt,
>>>>>     				 &out_vb->vb2_buf);
>>>>> -	dw100_hw_set_mapping(dw_dev, ctx->map_dma,
>>>>> -			     ctx->map_width, ctx->map_height);
>>>>> +
>>>>> +
>>>>> +	mutex_lock(&ctx->maps_mutex);
>>>>> +	if (ctx->user_map_is_updated) {
>>>> The hardware register must unconditionally be updated while starting a
>>>> new context, as a v4l2 m2m supports multi context operations. Otherwise,
>>>> you may be running with the user mapping used by the previous context.
>>>>
>>>> Moreover, the hardware mapping will not be set in case you use the
>>>> driver as a simple scaler without user mapping, which causes the process
>>>> to hang as the run does not start and never completes.
>>>>
>>>>> +		unsigned int id = ctx->applied_map_id ? 0 : 1;
>>>>> +
>>>>> +		dw100_hw_set_mapping(dw_dev, ctx->maps[id].map_dma,
>>>>> +				     ctx->map_width, ctx->map_height);
>>>>> +		ctx->applied_map_id = id;
>>>>> +		ctx->user_map_is_updated = false;
>>>>> +	}
>>>>> +	mutex_unlock(&ctx->maps_mutex);
>>>>> +
>>>>>     	dw100_hw_enable_irq(dw_dev);
>>>>>     	dw100_hw_dewarp_start(dw_dev);
>>>>>     
>>>> It sounds as this patch requires a collaborative application for running
>>>> well. All my simple tests failed.
>>>>
>>>> You can test a simple scaler/pixfmt conversion operation with v4l2 utils:
>>>>
>>>>
>>>> v4l2-ctl \
>>>> -d 0 \
>>>> --set-fmt-video-out width=640,height=480,pixelformat=NV12,field=none \
>>>> --set-fmt-video width=640,height=480,pixelformat=NV21,field=none \
>>>> --stream-out-pattern 3 \
>>>> --set-selection-output\
>>>> target=crop,top=0,left=0,width=640,height=480,flags= \
>>>> --stream-count 100 \
>>>> --stream-mmap \
>>>> --stream-out-mmap

Laurent Pinchart Oct. 28, 2024, 4:38 p.m. UTC | #11

On Mon, Oct 28, 2024 at 08:32:51PM +0530, Umang Jain wrote:
> Hi Laurent,
> 
> On 28/10/24 8:11 pm, Laurent Pinchart wrote:
> > Hi Umang,
> >
> > On Mon, Oct 28, 2024 at 07:47:46PM +0530, Umang Jain wrote:
> >> On 27/10/24 8:10 pm, Laurent Pinchart wrote:
> >>> On Sat, Oct 26, 2024 at 09:52:43PM +0200, Xavier Roumegue wrote:
> >>>> On 10/22/24 8:31 AM, Umang Jain wrote:
> >>>>> Currently, vertex maps cannot be updated dynamically while dw100
> >>>>> is streaming. This patch enables the support to update the vertex
> >>>>> map dynamically at runtime.
> >>>>>
> >>>>> To support this functionality, we need to allocate and track two
> >>>>> sets of DMA-allocated vertex maps, one for the currently applied map
> >>>>> and another for the updated (pending) map. Before the start of next
> >>>>> frame, if a new user-supplied vertex map is available, the hardware
> >>>>> mapping is changed to use new vertex map, thus enabling the user to
> >>>>> update the vertex map at runtime.
> >>> How do you synchronize the new map with the jobs ? That doesn't seem to
> >>> be supported by the patch, is it a feature that you don't need ?
> >>>
> >>>>> We should ensure no race occurs when the vertex map is updated multiple
> >>>>> times when a frame is processing. Hence, vertex map is never updated to
> >>>>> the applied vertex map index in .s_ctrl(). It is always updated on the
> >>>>> pending vertex map slot, with `maps_mutex` lock held. `maps_mutex` lock
> >>>>> is also held when the pending vertex map is applied to the hardware in
> >>>>> dw100_start().
> >>>>>
> >>>>> Ability to update the vertex map at runtime, enables abritrary rotation
> >>> s/abritrary/arbitrary/
> >>>
> >>>>> and digital zoom features for the input frames, through the dw100
> >>>>> hardware.
> >>>>>
> >>>>> Signed-off-by: Umang Jain <umang.jain@ideasonboard.com>
> >>>>> ---
> >>>>>     drivers/media/platform/nxp/dw100/dw100.c | 73 ++++++++++++++++++------
> >>>>>     1 file changed, 56 insertions(+), 17 deletions(-)
> >>>>>
> >>>>> diff --git a/drivers/media/platform/nxp/dw100/dw100.c b/drivers/media/platform/nxp/dw100/dw100.c
> >>>>> index 54ebf59df682..42712ccff754 100644
> >>>>> --- a/drivers/media/platform/nxp/dw100/dw100.c
> >>>>> +++ b/drivers/media/platform/nxp/dw100/dw100.c
> >>>>> @@ -83,6 +83,11 @@ struct dw100_q_data {
> >>>>>     	struct v4l2_rect		crop;
> >>>>>     };
> >>>>>     
> >>>>> +struct dw100_map {
> >>>>> +	unsigned int *map;
> >>>>> +	dma_addr_t map_dma;
> >>> I would have called the field just 'dma' as it's already qualified by
> >>> the structure name or the field name in dw100_ctx.
> >>>
> >>>>> +};
> >>>>> +
> >>>>>     struct dw100_ctx {
> >>>>>     	struct v4l2_fh			fh;
> >>>>>     	struct dw100_device		*dw_dev;
> >>>>> @@ -92,12 +97,14 @@ struct dw100_ctx {
> >>>>>     	struct mutex			vq_mutex;
> >>>>>     
> >>>>>     	/* Look Up Table for pixel remapping */
> >>>>> -	unsigned int			*map;
> >>>>> -	dma_addr_t			map_dma;
> >>>>> +	struct dw100_map		maps[2];
> >>>>> +	unsigned int			applied_map_id;
> >>>>>     	size_t				map_size;
> >>>>>     	unsigned int			map_width;
> >>>>>     	unsigned int			map_height;
> >>>>>     	bool				user_map_is_set;
> >>>>> +	bool				user_map_is_updated;
> >>>>> +	struct mutex			maps_mutex;
> >>>>>     
> >>>>>     	/* Source and destination queue data */
> >>>>>     	struct dw100_q_data		q_data[2];
> >>>>> @@ -308,24 +315,31 @@ static int dw100_create_mapping(struct dw100_ctx *ctx)
> >>>>>     {
> >>>>>     	u32 *user_map;
> >>>>>     
> >>>>> -	if (ctx->map)
> >>>>> -		dma_free_coherent(&ctx->dw_dev->pdev->dev, ctx->map_size,
> >>>>> -				  ctx->map, ctx->map_dma);
> >>>>> +	for (unsigned int i = 0; i < 2; i++) {
> >>> 	for (unsigned int i = 0; i < ARRAY_SIZE(ctx->maps); i++) {
> >>> 		struct dw100_map *map = &ctx->maps[i];
> >>>
> >>> and use map below.
> >>>
> >>>
> >>>>> +		if (ctx->maps[i].map)
> >>>>> +			dma_free_coherent(&ctx->dw_dev->pdev->dev, ctx->map_size,
> >>>>> +					  ctx->maps[i].map, ctx->maps[i].map_dma);
> >>>>>     
> >>>>> -	ctx->map = dma_alloc_coherent(&ctx->dw_dev->pdev->dev, ctx->map_size,
> >>>>> -				      &ctx->map_dma, GFP_KERNEL);
> >>>>> +		ctx->maps[i].map = dma_alloc_coherent(&ctx->dw_dev->pdev->dev, ctx->map_size,
> >>>>> +						      &ctx->maps[i].map_dma, GFP_KERNEL);
> >>>>>     
> >>>>> -	if (!ctx->map)
> >>>>> -		return -ENOMEM;
> >>>>> +		if (!ctx->maps[i].map)
> >>>>> +			return -ENOMEM;
> >>>>> +	}
> >>>>>     
> >>>>>     	user_map = dw100_get_user_map(ctx);
> >>>>> -	memcpy(ctx->map, user_map, ctx->map_size);
> >>>>> +
> >>>>> +	mutex_lock(&ctx->maps_mutex);
> >>>>> +	ctx->applied_map_id = 0;
> >>>>> +	memcpy(ctx->maps[ctx->applied_map_id].map, user_map, ctx->map_size);
> >>>>> +	mutex_unlock(&ctx->maps_mutex);
> >>>>>     
> >>>>>     	dev_dbg(&ctx->dw_dev->pdev->dev,
> >>>>>     		"%ux%u %s mapping created (d:%pad-c:%p) for stream %ux%u->%ux%u\n",
> >>>>>     		ctx->map_width, ctx->map_height,
> >>>>>     		ctx->user_map_is_set ? "user" : "identity",
> >>>>> -		&ctx->map_dma, ctx->map,
> >>>>> +		&ctx->maps[ctx->applied_map_id].map_dma,
> >>>>> +		ctx->maps[ctx->applied_map_id].map,
> >>>>>     		ctx->q_data[DW100_QUEUE_SRC].pix_fmt.width,
> >>>>>     		ctx->q_data[DW100_QUEUE_DST].pix_fmt.height,
> >>>>>     		ctx->q_data[DW100_QUEUE_SRC].pix_fmt.width,
> >>>>> @@ -336,10 +350,12 @@ static int dw100_create_mapping(struct dw100_ctx *ctx)
> >>>>>     
> >>>>>     static void dw100_destroy_mapping(struct dw100_ctx *ctx)
> >>>>>     {
> >>>>> -	if (ctx->map) {
> >>>>> -		dma_free_coherent(&ctx->dw_dev->pdev->dev, ctx->map_size,
> >>>>> -				  ctx->map, ctx->map_dma);
> >>>>> -		ctx->map = NULL;
> >>>>> +	for (unsigned int i = 0; i < 2; i++) {
> >>> 	for (unsigned int i = 0; i < ARRAY_SIZE(ctx->maps); i++) {
> >>> 		struct dw100_map *map = &ctx->maps[i];
> >>>
> >>> and use map below.
> >>>
> >>>>> +		if (ctx->maps[i].map)
> >>>>> +			dma_free_coherent(&ctx->dw_dev->pdev->dev, ctx->map_size,
> >>>>> +					  ctx->maps[i].map, ctx->maps[i].map_dma);
> >>>>> +
> >>>>> +		ctx->maps[i].map = NULL;
> >>>>>     	}
> >>>>>     }
> >>>>>     
> >>>>> @@ -350,6 +366,15 @@ static int dw100_s_ctrl(struct v4l2_ctrl *ctrl)
> >>>>>     
> >>>>>     	switch (ctrl->id) {
> >>>>>     	case V4L2_CID_DW100_DEWARPING_16x16_VERTEX_MAP:
> >>>>> +		u32 *user_map = ctrl->p_new.p_u32;
> >>>> A warning to fix here.
> >>>>
> >>>>> +		unsigned int id;
> >>>>> +
> >>>>> +		mutex_lock(&ctx->maps_mutex);
> >>>>> +		id = ctx->applied_map_id ? 0 : 1;
> >>>>> +		memcpy(ctx->maps[id].map, user_map, ctx->map_size);
> >>>>> +		ctx->user_map_is_updated = true;
> >>>> If you call the control before to start the stream, the dma mapping is
> >>>> not yet done(dw100_create_mapping not yet called). Hence, copying the
> >>>> user map to a NULL pointer.
> >>> The maps could be allocated in dw100_open() when creating the context.
> >>> That would likely require moving the initialization of ctx->map_width,
> >>> ctx->map_height and ctx->map_size as well. The handling of the identity
> >>> map would probably need to be rewritten too.
> >> The ctx->map_width, ctx->map_height and ctx->map_size would be updated
> >> on s_fmt().
> > I saw that ctx->map_width, ctx->map_height and ctx->map_size are set in
> > dw100_ctrl_dewarping_map_init(), with
> >
> > 	mw = ctrl->dims[0];
> > 	mh = ctrl->dims[1];
> >
> > 	[...]
> >
> > 	ctx->map_width = mw;
> > 	ctx->map_height = mh;
> > 	ctx->map_size = mh * mw * sizeof(u32);
> >
> > but overlooked the fact that the dimensions are set in dw100_s_fmt().
> >
> >> I think we can solve the NULL pointer issue by allocating when creating
> >> the context (open()), however, it would require updating (re-allocation)
> >> again before the map can be memcpy()ed before streaming. Because the map
> >> dimensions would have changed.
> > If the map dimensions change, that invalidates the map contents set by
> > userspace. This is currently handled by dw100_ctrl_dewarping_map_init().
> > You won't be able to just memcpy() the previous control to the new one.
> >
> >> See dw100_s_fmt()
> >>
> >> ...
> >> dims[0] = dw100_get_n_vertices_from_length(q_data->pix_fmt.width);
> >> dims[1] = dw100_get_n_vertices_from_length(q_data->pix_fmt.height);
> >>
> >> ret = v4l2_ctrl_modify_dimensions(ctrl, dims);
> >> ```
> >>
> >> I checked the v4l2_ctrl_modify_dimensions definition to check if it
> >> issues a call v4l2_ctrl_type_ops.init  (where the map dimensions are
> >> updated for dw100) and it does.
> >>
> >> So, I think I will have to introduce allocations in dw100_open() so that
> >> NULL pointer issue doesn't occur and let the dma allocation get
> >> re-allocated with new dimensions just before stream start.
> >
> > That seems a bit pointless, if the map will be invalidated by a call to
> > VIDIOC_S_FMT anyway. The only case where it would be useful is if
> > userspace sets the control before starting streaming and doesn't call
> > VIDIOC_S_FMT.
> 
> I was indeed not comfortable with the .open() dma-allocation approach 
> and hence, I summarised it here for discussion.
> 
> > I'm increasingly thinking the driver should use the request API to
> > synchronize the control with the jobs.
> 
> I will atleast consider and estimate how much complex it would be!

Good idea, let's make a decision based on the need and complexity.

> >> Also, we do not have to move the ctx->map_width, ctx->height abd
> >> ctx->map_size inititlisation, since they are already gets initialised to
> >> defaults, on the open() path when v4l2_ctrl_new_custom() is done.
> >>
> >>>>> +		mutex_unlock(&ctx->maps_mutex);
> >>>>> +
> >>>>>     		ctx->user_map_is_set = true;
> >>>>>     		break;
> >>>>>     	}
> >>>>> @@ -655,6 +680,8 @@ static int dw100_open(struct file *file)
> >>>>>     
> >>>>>     	v4l2_fh_add(&ctx->fh);
> >>>>>     
> >>>>> +	mutex_init(&ctx->maps_mutex);
> >>>>> +
> >>>>>     	return 0;
> >>>>>     
> >>>>>     err:
> >>>>> @@ -675,6 +702,7 @@ static int dw100_release(struct file *file)
> >>>>>     	v4l2_ctrl_handler_free(&ctx->hdl);
> >>>>>     	v4l2_m2m_ctx_release(ctx->fh.m2m_ctx);
> >>>>>     	mutex_destroy(&ctx->vq_mutex);
> >>>>> +	mutex_destroy(&ctx->maps_mutex);
> >>>>>     	kfree(ctx);
> >>>>>     
> >>>>>     	return 0;
> >>>>> @@ -1453,8 +1481,19 @@ static void dw100_start(struct dw100_ctx *ctx, struct vb2_v4l2_buffer *in_vb,
> >>>>>     	dw100_hw_set_destination(dw_dev, &ctx->q_data[DW100_QUEUE_DST],
> >>>>>     				 ctx->q_data[DW100_QUEUE_SRC].fmt,
> >>>>>     				 &out_vb->vb2_buf);
> >>>>> -	dw100_hw_set_mapping(dw_dev, ctx->map_dma,
> >>>>> -			     ctx->map_width, ctx->map_height);
> >>>>> +
> >>>>> +
> >>>>> +	mutex_lock(&ctx->maps_mutex);
> >>>>> +	if (ctx->user_map_is_updated) {
> >>>> The hardware register must unconditionally be updated while starting a
> >>>> new context, as a v4l2 m2m supports multi context operations. Otherwise,
> >>>> you may be running with the user mapping used by the previous context.
> >>>>
> >>>> Moreover, the hardware mapping will not be set in case you use the
> >>>> driver as a simple scaler without user mapping, which causes the process
> >>>> to hang as the run does not start and never completes.
> >>>>
> >>>>> +		unsigned int id = ctx->applied_map_id ? 0 : 1;
> >>>>> +
> >>>>> +		dw100_hw_set_mapping(dw_dev, ctx->maps[id].map_dma,
> >>>>> +				     ctx->map_width, ctx->map_height);
> >>>>> +		ctx->applied_map_id = id;
> >>>>> +		ctx->user_map_is_updated = false;
> >>>>> +	}
> >>>>> +	mutex_unlock(&ctx->maps_mutex);
> >>>>> +
> >>>>>     	dw100_hw_enable_irq(dw_dev);
> >>>>>     	dw100_hw_dewarp_start(dw_dev);
> >>>>>     
> >>>> It sounds as this patch requires a collaborative application for running
> >>>> well. All my simple tests failed.
> >>>>
> >>>> You can test a simple scaler/pixfmt conversion operation with v4l2 utils:
> >>>>
> >>>>
> >>>> v4l2-ctl \
> >>>> -d 0 \
> >>>> --set-fmt-video-out width=640,height=480,pixelformat=NV12,field=none \
> >>>> --set-fmt-video width=640,height=480,pixelformat=NV21,field=none \
> >>>> --stream-out-pattern 3 \
> >>>> --set-selection-output\
> >>>> target=crop,top=0,left=0,width=640,height=480,flags= \
> >>>> --stream-count 100 \
> >>>> --stream-mmap \
> >>>> --stream-out-mmap

media: dw100: Enable dynamic vertex map

Commit Message

Comments

Patch