Message ID | HE1PR06MB4011FF930111A869E4645C8CAC790@HE1PR06MB4011.eurprd06.prod.outlook.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | media: hantro: H264 fixes and improvements | expand |
Hi Jonas, On Thu, Nov 7, 2019 at 7:34 AM Jonas Karlman <jonas@kwiboo.se> wrote: > > A decoded 8-bit 4:2:0 frame need memory for up to 448 bytes per > macroblock with additional 32 bytes on multi-core variants. > > Memory layout is as follow: > > +---------------------------+ > | Y-plane 256 bytes x MBs | > +---------------------------+ > | UV-plane 128 bytes x MBs | > +---------------------------+ > | MV buffer 64 bytes x MBs | > +---------------------------+ > | MC sync 32 bytes | > +---------------------------+ > > Reduce the extra space allocated now that motion vector buffer offset no > longer is based on the extra space. > > Only allocate extra space for 64 bytes x MBs of motion vector buffer > and 32 bytes for multi-core sync. > > Fixes: a9471e25629b ("media: hantro: Add core bits to support H264 decoding") > Signed-off-by: Jonas Karlman <jonas@kwiboo.se> > Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com> > --- > Changes in v3: > - add memory layout to code comment (Boris) > Changes in v2: > - updated commit message > --- > drivers/staging/media/hantro/hantro_v4l2.c | 20 ++++++++++++++++++-- > 1 file changed, 18 insertions(+), 2 deletions(-) > Thanks for the patch! What platform did you test it on and how? Was it tested with IOMMU enabled? Best regards, Tomasz > diff --git a/drivers/staging/media/hantro/hantro_v4l2.c b/drivers/staging/media/hantro/hantro_v4l2.c > index 3dae52abb96c..c8c896a06f58 100644 > --- a/drivers/staging/media/hantro/hantro_v4l2.c > +++ b/drivers/staging/media/hantro/hantro_v4l2.c > @@ -240,14 +240,30 @@ static int vidioc_try_fmt(struct file *file, void *priv, struct v4l2_format *f, > v4l2_fill_pixfmt_mp(pix_mp, fmt->fourcc, pix_mp->width, > pix_mp->height); > /* > + * A decoded 8-bit 4:2:0 NV12 frame may need memory for up to > + * 448 bytes per macroblock with additional 32 bytes on > + * multi-core variants. > + * > * The H264 decoder needs extra space on the output buffers > * to store motion vectors. This is needed for reference > * frames. > + * > + * Memory layout is as follow: > + * > + * +---------------------------+ > + * | Y-plane 256 bytes x MBs | > + * +---------------------------+ > + * | UV-plane 128 bytes x MBs | > + * +---------------------------+ > + * | MV buffer 64 bytes x MBs | > + * +---------------------------+ > + * | MC sync 32 bytes | > + * +---------------------------+ > */ > if (ctx->vpu_src_fmt->fourcc == V4L2_PIX_FMT_H264_SLICE) > pix_mp->plane_fmt[0].sizeimage += > - 128 * DIV_ROUND_UP(pix_mp->width, 16) * > - DIV_ROUND_UP(pix_mp->height, 16); > + 64 * MB_WIDTH(pix_mp->width) * > + MB_WIDTH(pix_mp->height) + 32; > } else if (!pix_mp->plane_fmt[0].sizeimage) { > /* > * For coded formats the application can specify > -- > 2.17.1 >
On Wed, 2019-11-20 at 21:44 +0900, Tomasz Figa wrote: > Hi Jonas, > > On Thu, Nov 7, 2019 at 7:34 AM Jonas Karlman <jonas@kwiboo.se> wrote: > > A decoded 8-bit 4:2:0 frame need memory for up to 448 bytes per > > macroblock with additional 32 bytes on multi-core variants. > > > > Memory layout is as follow: > > > > +---------------------------+ > > > Y-plane 256 bytes x MBs | > > +---------------------------+ > > > UV-plane 128 bytes x MBs | > > +---------------------------+ > > > MV buffer 64 bytes x MBs | > > +---------------------------+ > > > MC sync 32 bytes | > > +---------------------------+ > > > > Reduce the extra space allocated now that motion vector buffer offset no > > longer is based on the extra space. > > > > Only allocate extra space for 64 bytes x MBs of motion vector buffer > > and 32 bytes for multi-core sync. > > > > Fixes: a9471e25629b ("media: hantro: Add core bits to support H264 decoding") > > Signed-off-by: Jonas Karlman <jonas@kwiboo.se> > > Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com> > > --- > > Changes in v3: > > - add memory layout to code comment (Boris) > > Changes in v2: > > - updated commit message > > --- > > drivers/staging/media/hantro/hantro_v4l2.c | 20 ++++++++++++++++++-- > > 1 file changed, 18 insertions(+), 2 deletions(-) > > > > Thanks for the patch! > > What platform did you test it on and how? Was it tested with IOMMU enabled? Hello Tomasz, Please note that this series has been picked-up and is merged in v5.5-rc1. IIRC, we tested these patches on RK3399 and RK3288 (that means with an IOMMU). I've just ran some more extensive tests on RK3288, on media/master; and I plan to test some more on RK3399 later this week. Do you have any specific concern in mind? Thanks, Ezequiel
On Tue, Dec 10, 2019 at 3:11 AM Ezequiel Garcia <ezequiel@collabora.com> wrote: > > On Wed, 2019-11-20 at 21:44 +0900, Tomasz Figa wrote: > > Hi Jonas, > > > > On Thu, Nov 7, 2019 at 7:34 AM Jonas Karlman <jonas@kwiboo.se> wrote: > > > A decoded 8-bit 4:2:0 frame need memory for up to 448 bytes per > > > macroblock with additional 32 bytes on multi-core variants. > > > > > > Memory layout is as follow: > > > > > > +---------------------------+ > > > > Y-plane 256 bytes x MBs | > > > +---------------------------+ > > > > UV-plane 128 bytes x MBs | > > > +---------------------------+ > > > > MV buffer 64 bytes x MBs | > > > +---------------------------+ > > > > MC sync 32 bytes | > > > +---------------------------+ > > > > > > Reduce the extra space allocated now that motion vector buffer offset no > > > longer is based on the extra space. > > > > > > Only allocate extra space for 64 bytes x MBs of motion vector buffer > > > and 32 bytes for multi-core sync. > > > > > > Fixes: a9471e25629b ("media: hantro: Add core bits to support H264 decoding") > > > Signed-off-by: Jonas Karlman <jonas@kwiboo.se> > > > Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com> > > > --- > > > Changes in v3: > > > - add memory layout to code comment (Boris) > > > Changes in v2: > > > - updated commit message > > > --- > > > drivers/staging/media/hantro/hantro_v4l2.c | 20 ++++++++++++++++++-- > > > 1 file changed, 18 insertions(+), 2 deletions(-) > > > > > > > Thanks for the patch! > > > > What platform did you test it on and how? Was it tested with IOMMU enabled? > > Hello Tomasz, > > Please note that this series has been picked-up and is merged > in v5.5-rc1. > > IIRC, we tested these patches on RK3399 and RK3288 (that means > with an IOMMU). I've just ran some more extensive tests on RK3288, > on media/master; and I plan to test some more on RK3399 later this week. > > Do you have any specific concern in mind? I specifically want to know whether we're 100% sure that those sizes are correct. The IOMMU still works on page granularity so it's possible that the allocation could be just big enough by luck. Best regards, Tomasz
On 2020-01-08 13:59, Tomasz Figa wrote: > On Tue, Dec 10, 2019 at 3:11 AM Ezequiel Garcia <ezequiel@collabora.com> wrote: >> >> On Wed, 2019-11-20 at 21:44 +0900, Tomasz Figa wrote: >>> Hi Jonas, >>> >>> On Thu, Nov 7, 2019 at 7:34 AM Jonas Karlman <jonas@kwiboo.se> wrote: >>>> A decoded 8-bit 4:2:0 frame need memory for up to 448 bytes per >>>> macroblock with additional 32 bytes on multi-core variants. >>>> >>>> Memory layout is as follow: >>>> >>>> +---------------------------+ >>>>> Y-plane 256 bytes x MBs | >>>> +---------------------------+ >>>>> UV-plane 128 bytes x MBs | >>>> +---------------------------+ >>>>> MV buffer 64 bytes x MBs | >>>> +---------------------------+ >>>>> MC sync 32 bytes | >>>> +---------------------------+ >>>> >>>> Reduce the extra space allocated now that motion vector buffer offset no >>>> longer is based on the extra space. >>>> >>>> Only allocate extra space for 64 bytes x MBs of motion vector buffer >>>> and 32 bytes for multi-core sync. >>>> >>>> Fixes: a9471e25629b ("media: hantro: Add core bits to support H264 decoding") >>>> Signed-off-by: Jonas Karlman <jonas@kwiboo.se> >>>> Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com> >>>> --- >>>> Changes in v3: >>>> - add memory layout to code comment (Boris) >>>> Changes in v2: >>>> - updated commit message >>>> --- >>>> drivers/staging/media/hantro/hantro_v4l2.c | 20 ++++++++++++++++++-- >>>> 1 file changed, 18 insertions(+), 2 deletions(-) >>>> >>> >>> Thanks for the patch! >>> >>> What platform did you test it on and how? Was it tested with IOMMU enabled? >> >> Hello Tomasz, >> >> Please note that this series has been picked-up and is merged >> in v5.5-rc1. >> >> IIRC, we tested these patches on RK3399 and RK3288 (that means >> with an IOMMU). I've just ran some more extensive tests on RK3288, >> on media/master; and I plan to test some more on RK3399 later this week. >> >> Do you have any specific concern in mind? > > I specifically want to know whether we're 100% sure that those sizes > are correct. The IOMMU still works on page granularity so it's > possible that the allocation could be just big enough by luck. One of my RK3288 TRM [1] contains the following: Direct mode motion vector write: Its base addr is right after decode output picture data Its length is mbwidth*mbheight*64 Also both the mpp library and imx-vpu-hantro code both use mbwidth*mbheight*64. So I feel confident that the buffer size is correct. [1] Rockchip RK3288TRM V1.1 Part3-Graphic and multi-media.pdf Regards, Jonas > > Best regards, > Tomasz >
On Thu, Jan 9, 2020 at 12:10 AM Jonas Karlman <jonas@kwiboo.se> wrote: > > On 2020-01-08 13:59, Tomasz Figa wrote: > > On Tue, Dec 10, 2019 at 3:11 AM Ezequiel Garcia <ezequiel@collabora.com> wrote: > >> > >> On Wed, 2019-11-20 at 21:44 +0900, Tomasz Figa wrote: > >>> Hi Jonas, > >>> > >>> On Thu, Nov 7, 2019 at 7:34 AM Jonas Karlman <jonas@kwiboo.se> wrote: > >>>> A decoded 8-bit 4:2:0 frame need memory for up to 448 bytes per > >>>> macroblock with additional 32 bytes on multi-core variants. > >>>> > >>>> Memory layout is as follow: > >>>> > >>>> +---------------------------+ > >>>>> Y-plane 256 bytes x MBs | > >>>> +---------------------------+ > >>>>> UV-plane 128 bytes x MBs | > >>>> +---------------------------+ > >>>>> MV buffer 64 bytes x MBs | > >>>> +---------------------------+ > >>>>> MC sync 32 bytes | > >>>> +---------------------------+ > >>>> > >>>> Reduce the extra space allocated now that motion vector buffer offset no > >>>> longer is based on the extra space. > >>>> > >>>> Only allocate extra space for 64 bytes x MBs of motion vector buffer > >>>> and 32 bytes for multi-core sync. > >>>> > >>>> Fixes: a9471e25629b ("media: hantro: Add core bits to support H264 decoding") > >>>> Signed-off-by: Jonas Karlman <jonas@kwiboo.se> > >>>> Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com> > >>>> --- > >>>> Changes in v3: > >>>> - add memory layout to code comment (Boris) > >>>> Changes in v2: > >>>> - updated commit message > >>>> --- > >>>> drivers/staging/media/hantro/hantro_v4l2.c | 20 ++++++++++++++++++-- > >>>> 1 file changed, 18 insertions(+), 2 deletions(-) > >>>> > >>> > >>> Thanks for the patch! > >>> > >>> What platform did you test it on and how? Was it tested with IOMMU enabled? > >> > >> Hello Tomasz, > >> > >> Please note that this series has been picked-up and is merged > >> in v5.5-rc1. > >> > >> IIRC, we tested these patches on RK3399 and RK3288 (that means > >> with an IOMMU). I've just ran some more extensive tests on RK3288, > >> on media/master; and I plan to test some more on RK3399 later this week. > >> > >> Do you have any specific concern in mind? > > > > I specifically want to know whether we're 100% sure that those sizes > > are correct. The IOMMU still works on page granularity so it's > > possible that the allocation could be just big enough by luck. > > One of my RK3288 TRM [1] contains the following: > > Direct mode motion vector write: > Its base addr is right after decode output picture data > Its length is mbwidth*mbheight*64 > > Also both the mpp library and imx-vpu-hantro code both use mbwidth*mbheight*64. > So I feel confident that the buffer size is correct. > > [1] Rockchip RK3288TRM V1.1 Part3-Graphic and multi-media.pdf Fair enough. Thanks! Best regards, Tomasz
diff --git a/drivers/staging/media/hantro/hantro_v4l2.c b/drivers/staging/media/hantro/hantro_v4l2.c index 3dae52abb96c..c8c896a06f58 100644 --- a/drivers/staging/media/hantro/hantro_v4l2.c +++ b/drivers/staging/media/hantro/hantro_v4l2.c @@ -240,14 +240,30 @@ static int vidioc_try_fmt(struct file *file, void *priv, struct v4l2_format *f, v4l2_fill_pixfmt_mp(pix_mp, fmt->fourcc, pix_mp->width, pix_mp->height); /* + * A decoded 8-bit 4:2:0 NV12 frame may need memory for up to + * 448 bytes per macroblock with additional 32 bytes on + * multi-core variants. + * * The H264 decoder needs extra space on the output buffers * to store motion vectors. This is needed for reference * frames. + * + * Memory layout is as follow: + * + * +---------------------------+ + * | Y-plane 256 bytes x MBs | + * +---------------------------+ + * | UV-plane 128 bytes x MBs | + * +---------------------------+ + * | MV buffer 64 bytes x MBs | + * +---------------------------+ + * | MC sync 32 bytes | + * +---------------------------+ */ if (ctx->vpu_src_fmt->fourcc == V4L2_PIX_FMT_H264_SLICE) pix_mp->plane_fmt[0].sizeimage += - 128 * DIV_ROUND_UP(pix_mp->width, 16) * - DIV_ROUND_UP(pix_mp->height, 16); + 64 * MB_WIDTH(pix_mp->width) * + MB_WIDTH(pix_mp->height) + 32; } else if (!pix_mp->plane_fmt[0].sizeimage) { /* * For coded formats the application can specify