mbox series

[v2,0/6,WIP] : rockchip mpp for v4l2 video deocder

Message ID 20190307100316.925-1-randy.li@rock-chips.com (mailing list archive)
Headers show
Series : rockchip mpp for v4l2 video deocder | expand

Message

Randy Li March 7, 2019, 10:03 a.m. UTC
Although I really want to push those work after I added more codec
supports, but I found it is more urge to do those in v4l2 core framework and
userspace.

I would use this driver to present the current problems, write down a
summary here and I would reply to those threads later to push forward.

1. Slice construction is a bad idea. I think I have said my reason in
the IRC and mail before, vp9 is always good example.

And it would request the driver to update QP table/CABAC table every
slice.

I would make something to describe a buffer with some addtional meta
data.

But the current request API limit a buffer with associated with
an request buffer, which prevent sharing some sequence data, but it 
still can solve some problems.

2. Advantage DMA memory control.
I think I need to do some work at v4l2 core.

2.1 The DMA address of each planes. I have sent a mail before talked
about why multiple planes is necessary for the rockchip platform. And
it maybe required by the other platforms.

2.2 IOMMU resume
The most effective way to restore the decoder from critical error is
doing a restting  by reset controller.
Which would leading its slave IOMMU reset at the same time. Then none of
those v4l2 buffers are mapping in the IOMMU.

3. H.264 and HEVC header
I still think those structure have some not necessary fileds in dpb or
reference part, which I don't think hardware decoder would care about
that or can be predict from the other information.

I would join to talk later.

4. The work flow of V4L2
I need a method to prepare the register set before the device acutally
begin the transaction. Which is necessary for those high frame rate usecase.

Also it is useful for those device would share some hardware resources
with the other device and it can save more power.

I think I need to do some work at v4l2 core.

Randy Li (6):
  arm64: dts: rockchip: add power domain to iommu
  staging: video: rockchip: add v4l2 decoder
  [TEST]: rockchip: mpp: support qptable
  staging: video: rockchip: add video codec
  arm64: dts: rockchip: boost clocks for rk3328
  arm64: dts: rockchip: add video codec for rk3328

 .../arm64/boot/dts/rockchip/rk3328-rock64.dts |   32 +
 arch/arm64/boot/dts/rockchip/rk3328.dtsi      |  116 +-
 arch/arm64/boot/dts/rockchip/rk3399.dtsi      |    2 +
 drivers/staging/Kconfig                       |    2 +
 drivers/staging/Makefile                      |    1 +
 drivers/staging/rockchip-mpp/Kconfig          |   34 +
 drivers/staging/rockchip-mpp/Makefile         |   10 +
 drivers/staging/rockchip-mpp/mpp_debug.h      |   87 ++
 drivers/staging/rockchip-mpp/mpp_dev_common.c | 1367 +++++++++++++++++
 drivers/staging/rockchip-mpp/mpp_dev_common.h |  212 +++
 drivers/staging/rockchip-mpp/mpp_dev_rkvdec.c |  997 ++++++++++++
 drivers/staging/rockchip-mpp/mpp_dev_vdpu2.c  |  619 ++++++++
 drivers/staging/rockchip-mpp/mpp_service.c    |  197 +++
 drivers/staging/rockchip-mpp/mpp_service.h    |   38 +
 drivers/staging/rockchip-mpp/rkvdec/hal.h     |   63 +
 drivers/staging/rockchip-mpp/rkvdec/hevc.c    |  166 ++
 drivers/staging/rockchip-mpp/rkvdec/regs.h    |  608 ++++++++
 drivers/staging/rockchip-mpp/vdpu2/hal.h      |   52 +
 drivers/staging/rockchip-mpp/vdpu2/mpeg2.c    |  277 ++++
 drivers/staging/rockchip-mpp/vdpu2/regs.h     |  699 +++++++++
 20 files changed, 5575 insertions(+), 4 deletions(-)
 create mode 100644 drivers/staging/rockchip-mpp/Kconfig
 create mode 100644 drivers/staging/rockchip-mpp/Makefile
 create mode 100644 drivers/staging/rockchip-mpp/mpp_debug.h
 create mode 100644 drivers/staging/rockchip-mpp/mpp_dev_common.c
 create mode 100644 drivers/staging/rockchip-mpp/mpp_dev_common.h
 create mode 100644 drivers/staging/rockchip-mpp/mpp_dev_rkvdec.c
 create mode 100644 drivers/staging/rockchip-mpp/mpp_dev_vdpu2.c
 create mode 100644 drivers/staging/rockchip-mpp/mpp_service.c
 create mode 100644 drivers/staging/rockchip-mpp/mpp_service.h
 create mode 100644 drivers/staging/rockchip-mpp/rkvdec/hal.h
 create mode 100644 drivers/staging/rockchip-mpp/rkvdec/hevc.c
 create mode 100644 drivers/staging/rockchip-mpp/rkvdec/regs.h
 create mode 100644 drivers/staging/rockchip-mpp/vdpu2/hal.h
 create mode 100644 drivers/staging/rockchip-mpp/vdpu2/mpeg2.c
 create mode 100644 drivers/staging/rockchip-mpp/vdpu2/regs.h

Comments

Nicolas Dufresne March 7, 2019, 5:11 p.m. UTC | #1
Le jeudi 07 mars 2019 à 18:03 +0800, Randy Li a écrit :
> Although I really want to push those work after I added more codec
> supports, but I found it is more urge to do those in v4l2 core framework and
> userspace.
> 
> I would use this driver to present the current problems, write down a
> summary here and I would reply to those threads later to push forward.
> 
> 1. Slice construction is a bad idea. I think I have said my reason in
> the IRC and mail before, vp9 is always good example.
> 
> And it would request the driver to update QP table/CABAC table every
> slice.
> 
> I would make something to describe a buffer with some addtional meta
> data.
> 
> But the current request API limit a buffer with associated with
> an request buffer, which prevent sharing some sequence data, but it 
> still can solve some problems.

I guess you are trying to make some argument here, but I don't really
understand what you are referring to. Right now there is two concurrent
drivers for the Rockchip, yours and Ezequiel. Ezequiel does not seem to
have raised any blockers around any of this (yet).

> 
> 2. Advantage DMA memory control.
> I think I need to do some work at v4l2 core.
> 
> 2.1 The DMA address of each planes. I have sent a mail before talked
> about why multiple planes is necessary for the rockchip platform. And
> it maybe required by the other platforms.
> 
> 2.2 IOMMU resume
> The most effective way to restore the decoder from critical error is
> doing a restting  by reset controller.
> Which would leading its slave IOMMU reset at the same time. Then none of
> those v4l2 buffers are mapping in the IOMMU.

You can't invalidate application memory mapping in V4L2, this is
generic to V4L2 interface. If you still need to do this, you'll have to
tell user-space through the SRC_CHANGE event, forcing a reconfiguration
hence a reallocation of the buffers. Then your driver would be
responsible for caching the allocation in order to not introduce
delays. The framework does not prevent you from doing so, but yet, this
is likely difficult.

> 
> 3. H.264 and HEVC header
> I still think those structure have some not necessary fileds in dpb or
> reference part, which I don't think hardware decoder would care about
> that or can be predict from the other information.

This was discussed during the review, but all the information in there
exist in the in the slice headers bitstream. There is no reason some
information should not be made available to the driver, used or not. 

This is the only way we can guaranty that we won't prevent other HW to
be integrated in the future. This is specially needed for
H264_SLICE_RAW format used by Allwinner. In Ezequiel H264 decoder
patchset for Rockchip HW, the format is H264_SLICE_ANNEX_B, of course
in this format one could always do a bit of parsing in the kernel, but
we don't want that really.

That being said, I'd like to remind that we don't expose the control
publicly yet, these are still unstable. We will freeze these after
drivers get added to mainline and are considered stable.

> 
> I would join to talk later.
> 
> 4. The work flow of V4L2
> I need a method to prepare the register set before the device acutally
> begin the transaction. Which is necessary for those high frame rate usecase.

Can't you just increase the buffer queue size ? Then prepare the
buffers on application thread, and process on another or something.
This seems like driver specific, not an API thing.

> 
> Also it is useful for those device would share some hardware resources
> with the other device and it can save more power.
> 
> I think I need to do some work at v4l2 core.

Seems all possible to optimize, imho, we should aim at getting a
working driver into mainline first and then progressively tweak it to
gain best performance. The driver and userspace need complete re-
implementation, and it won't happen all in one pass. I'm sure there is
multiple years of effort, and multiple iterations on the vendor kernel
and userspace. We can learn from that, but my point is that we are not
yet at a stage where we should focus on driver specific optimization.

From what you have said so far, I haven't found anything for which the
kernel interface that is being merged would prevent doing the suggested
optimization. Instead I read you as some of the interface decision will
require a bit more work for this specific driver. This is often the
tradeoff we have to do to make sure we can expose generically usable
interface. And that's likely why mainline drivers are often a tad more
complex then their vendor equivalent.

> 
> Randy Li (6):
>   arm64: dts: rockchip: add power domain to iommu
>   staging: video: rockchip: add v4l2 decoder
>   [TEST]: rockchip: mpp: support qptable
>   staging: video: rockchip: add video codec
>   arm64: dts: rockchip: boost clocks for rk3328
>   arm64: dts: rockchip: add video codec for rk3328
> 
>  .../arm64/boot/dts/rockchip/rk3328-rock64.dts |   32 +
>  arch/arm64/boot/dts/rockchip/rk3328.dtsi      |  116 +-
>  arch/arm64/boot/dts/rockchip/rk3399.dtsi      |    2 +
>  drivers/staging/Kconfig                       |    2 +
>  drivers/staging/Makefile                      |    1 +
>  drivers/staging/rockchip-mpp/Kconfig          |   34 +
>  drivers/staging/rockchip-mpp/Makefile         |   10 +
>  drivers/staging/rockchip-mpp/mpp_debug.h      |   87 ++
>  drivers/staging/rockchip-mpp/mpp_dev_common.c | 1367 +++++++++++++++++
>  drivers/staging/rockchip-mpp/mpp_dev_common.h |  212 +++
>  drivers/staging/rockchip-mpp/mpp_dev_rkvdec.c |  997 ++++++++++++
>  drivers/staging/rockchip-mpp/mpp_dev_vdpu2.c  |  619 ++++++++
>  drivers/staging/rockchip-mpp/mpp_service.c    |  197 +++
>  drivers/staging/rockchip-mpp/mpp_service.h    |   38 +
>  drivers/staging/rockchip-mpp/rkvdec/hal.h     |   63 +
>  drivers/staging/rockchip-mpp/rkvdec/hevc.c    |  166 ++
>  drivers/staging/rockchip-mpp/rkvdec/regs.h    |  608 ++++++++
>  drivers/staging/rockchip-mpp/vdpu2/hal.h      |   52 +
>  drivers/staging/rockchip-mpp/vdpu2/mpeg2.c    |  277 ++++
>  drivers/staging/rockchip-mpp/vdpu2/regs.h     |  699 +++++++++
>  20 files changed, 5575 insertions(+), 4 deletions(-)
>  create mode 100644 drivers/staging/rockchip-mpp/Kconfig
>  create mode 100644 drivers/staging/rockchip-mpp/Makefile
>  create mode 100644 drivers/staging/rockchip-mpp/mpp_debug.h
>  create mode 100644 drivers/staging/rockchip-mpp/mpp_dev_common.c
>  create mode 100644 drivers/staging/rockchip-mpp/mpp_dev_common.h
>  create mode 100644 drivers/staging/rockchip-mpp/mpp_dev_rkvdec.c
>  create mode 100644 drivers/staging/rockchip-mpp/mpp_dev_vdpu2.c
>  create mode 100644 drivers/staging/rockchip-mpp/mpp_service.c
>  create mode 100644 drivers/staging/rockchip-mpp/mpp_service.h
>  create mode 100644 drivers/staging/rockchip-mpp/rkvdec/hal.h
>  create mode 100644 drivers/staging/rockchip-mpp/rkvdec/hevc.c
>  create mode 100644 drivers/staging/rockchip-mpp/rkvdec/regs.h
>  create mode 100644 drivers/staging/rockchip-mpp/vdpu2/hal.h
>  create mode 100644 drivers/staging/rockchip-mpp/vdpu2/mpeg2.c
>  create mode 100644 drivers/staging/rockchip-mpp/vdpu2/regs.h
>
Joe Perches March 8, 2019, 2:33 a.m. UTC | #2
On Thu, 2019-03-07 at 18:03 +0800, Randy Li wrote:
> It is based on the vendor driver sent to mail list before.

trivial notes:

> diff --git a/drivers/staging/rockchip-mpp/mpp_debug.h b/drivers/staging/rockchip-mpp/mpp_debug.h
[]
> +#define mpp_debug_func(type, fmt, args...)			\
> +	do {							\
> +		if (unlikely(debug & type)) {			\
> +			pr_info("%s:%d: " fmt,			\
> +				 __func__, __LINE__, ##args);	\
> +		}						\
> +	} while (0)
> +#define mpp_debug(type, fmt, args...)				\
> +	do {							\
> +		if (unlikely(debug & type)) {			\
> +			pr_info(fmt, ##args);			\
> +		}						\
> +	} while (0)
> +

It's generally better to emit debug messages at KERN_DEBUG

> +#define mpp_debug_enter()					\
> +	do {							\
> +		if (unlikely(debug & DEBUG_FUNCTION)) {		\
> +			pr_info("%s:%d: enter\n",		\
> +				 __func__, __LINE__);		\
> +		}						\
> +	} while (0)
> +
> +#define mpp_debug_leave()					\
> +	do {							\
> +		if (unlikely(debug & DEBUG_FUNCTION)) {		\
> +			pr_info("%s:%d: leave\n",		\
> +				 __func__, __LINE__);		\
> +		}						\
> +	} while (0)

I suggest removal of these macros and uses.

There's not much value in enter/leave markings as
the generic ftrace facility does this already.

> +
> +#define mpp_err(fmt, args...)					\
> +		pr_err("%s:%d: " fmt, __func__, __LINE__, ##args)

__func__, __LINE__ markings generally have little value.