mbox series

[v2,0/6] media: cedrus: h264: Support multi-slice frames

Message ID 20190929200023.215831-1-jernej.skrabec@siol.net (mailing list archive)
Headers show
Series media: cedrus: h264: Support multi-slice frames | expand

Message

Jernej Škrabec Sept. 29, 2019, 8 p.m. UTC
This series adds support for decoding multi-slice H264 frames along with
support for V4L2_DEC_CMD_FLUSH and V4L2_BUF_FLAG_M2M_HOLD_CAPTURE_BUF.

Code was tested by modified ffmpeg, which can be found here:
https://github.com/jernejsk/FFmpeg, branch mainline-test
It has to be configured with at least following options:
--enable-v4l2-request --enable-libudev --enable-libdrm

Samples used for testing:
http://jernej.libreelec.tv/videos/h264/BA1_FT_C.mp4
http://jernej.libreelec.tv/videos/h264/h264.mp4

Command line used for testing:
ffmpeg -hwaccel drm -hwaccel_device /dev/dri/card0 -i h264.mp4 -pix_fmt bgra -f fbdev /dev/fb0

Please note that V4L2_DEC_CMD_FLUSH was not tested because I'm
not sure how. ffmpeg follows exactly which slice is last in frame
and sets hold flag accordingly. Improper usage of hold flag would
corrupt ffmpeg assumptions and it would probably crash. Any ideas
how to test this are welcome!

Thanks to Jonas for adjusting ffmpeg.

Please let me know what you think.

Best regards,
Jernej

Changes from v1:
- added Rb tags
- updated V4L2_DEC_CMD_FLUSH documentation
- updated first slice detection in Cedrus
- hold capture buffer flag is set according to source format
- added v4l m2m stateless_(try_)decoder_cmd ioctl helpers

Hans Verkuil (2):
  vb2: add V4L2_BUF_FLAG_M2M_HOLD_CAPTURE_BUF
  videodev2.h: add V4L2_DEC_CMD_FLUSH

Jernej Skrabec (4):
  media: v4l2-mem2mem: add stateless_(try_)decoder_cmd ioctl helpers
  media: cedrus: Detect first slice of a frame
  media: cedrus: h264: Support multiple slices per frame
  media: cedrus: Add support for holding capture buffer

 Documentation/media/uapi/v4l/buffer.rst       | 13 ++++++
 .../media/uapi/v4l/vidioc-decoder-cmd.rst     | 10 +++-
 .../media/uapi/v4l/vidioc-reqbufs.rst         |  6 +++
 .../media/videodev2.h.rst.exceptions          |  1 +
 .../media/common/videobuf2/videobuf2-v4l2.c   |  8 +++-
 drivers/media/v4l2-core/v4l2-mem2mem.c        | 35 ++++++++++++++
 drivers/staging/media/sunxi/cedrus/cedrus.h   |  1 +
 .../staging/media/sunxi/cedrus/cedrus_dec.c   | 11 +++++
 .../staging/media/sunxi/cedrus/cedrus_h264.c  | 11 ++++-
 .../staging/media/sunxi/cedrus/cedrus_hw.c    |  8 ++--
 .../staging/media/sunxi/cedrus/cedrus_video.c | 14 ++++++
 include/media/v4l2-mem2mem.h                  | 46 +++++++++++++++++++
 include/media/videobuf2-core.h                |  3 ++
 include/media/videobuf2-v4l2.h                |  5 ++
 include/uapi/linux/videodev2.h                | 14 ++++--
 15 files changed, 175 insertions(+), 11 deletions(-)

Comments

Hans Verkuil Sept. 30, 2019, 8:10 a.m. UTC | #1
On 9/29/19 10:00 PM, Jernej Skrabec wrote:
> This series adds support for decoding multi-slice H264 frames along with
> support for V4L2_DEC_CMD_FLUSH and V4L2_BUF_FLAG_M2M_HOLD_CAPTURE_BUF.
> 
> Code was tested by modified ffmpeg, which can be found here:
> https://github.com/jernejsk/FFmpeg, branch mainline-test
> It has to be configured with at least following options:
> --enable-v4l2-request --enable-libudev --enable-libdrm
> 
> Samples used for testing:
> http://jernej.libreelec.tv/videos/h264/BA1_FT_C.mp4
> http://jernej.libreelec.tv/videos/h264/h264.mp4
> 
> Command line used for testing:
> ffmpeg -hwaccel drm -hwaccel_device /dev/dri/card0 -i h264.mp4 -pix_fmt bgra -f fbdev /dev/fb0
> 
> Please note that V4L2_DEC_CMD_FLUSH was not tested because I'm
> not sure how. ffmpeg follows exactly which slice is last in frame
> and sets hold flag accordingly. Improper usage of hold flag would
> corrupt ffmpeg assumptions and it would probably crash. Any ideas
> how to test this are welcome!

It can be tested partially at least: if ffmpeg tells you it is the last
slice, then queue the slice with the HOLD flag set, then call FLUSH afterwards.
This should clear the HOLD flag again. In this case the queued buffer is
probably not yet processed, so the flag is cleared before the decode job starts.

You can also try to add a sleep before calling FLUSH to see what happens
if the last queued buffer is already decoded.

Regards,

	Hans

> 
> Thanks to Jonas for adjusting ffmpeg.
> 
> Please let me know what you think.
> 
> Best regards,
> Jernej
> 
> Changes from v1:
> - added Rb tags
> - updated V4L2_DEC_CMD_FLUSH documentation
> - updated first slice detection in Cedrus
> - hold capture buffer flag is set according to source format
> - added v4l m2m stateless_(try_)decoder_cmd ioctl helpers
> 
> Hans Verkuil (2):
>   vb2: add V4L2_BUF_FLAG_M2M_HOLD_CAPTURE_BUF
>   videodev2.h: add V4L2_DEC_CMD_FLUSH
> 
> Jernej Skrabec (4):
>   media: v4l2-mem2mem: add stateless_(try_)decoder_cmd ioctl helpers
>   media: cedrus: Detect first slice of a frame
>   media: cedrus: h264: Support multiple slices per frame
>   media: cedrus: Add support for holding capture buffer
> 
>  Documentation/media/uapi/v4l/buffer.rst       | 13 ++++++
>  .../media/uapi/v4l/vidioc-decoder-cmd.rst     | 10 +++-
>  .../media/uapi/v4l/vidioc-reqbufs.rst         |  6 +++
>  .../media/videodev2.h.rst.exceptions          |  1 +
>  .../media/common/videobuf2/videobuf2-v4l2.c   |  8 +++-
>  drivers/media/v4l2-core/v4l2-mem2mem.c        | 35 ++++++++++++++
>  drivers/staging/media/sunxi/cedrus/cedrus.h   |  1 +
>  .../staging/media/sunxi/cedrus/cedrus_dec.c   | 11 +++++
>  .../staging/media/sunxi/cedrus/cedrus_h264.c  | 11 ++++-
>  .../staging/media/sunxi/cedrus/cedrus_hw.c    |  8 ++--
>  .../staging/media/sunxi/cedrus/cedrus_video.c | 14 ++++++
>  include/media/v4l2-mem2mem.h                  | 46 +++++++++++++++++++
>  include/media/videobuf2-core.h                |  3 ++
>  include/media/videobuf2-v4l2.h                |  5 ++
>  include/uapi/linux/videodev2.h                | 14 ++++--
>  15 files changed, 175 insertions(+), 11 deletions(-)
>
Jernej Škrabec Sept. 30, 2019, 10:27 p.m. UTC | #2
Dne ponedeljek, 30. september 2019 ob 10:10:48 CEST je Hans Verkuil 
napisal(a):
> On 9/29/19 10:00 PM, Jernej Skrabec wrote:
> > This series adds support for decoding multi-slice H264 frames along with
> > support for V4L2_DEC_CMD_FLUSH and V4L2_BUF_FLAG_M2M_HOLD_CAPTURE_BUF.
> > 
> > Code was tested by modified ffmpeg, which can be found here:
> > https://github.com/jernejsk/FFmpeg, branch mainline-test
> > It has to be configured with at least following options:
> > --enable-v4l2-request --enable-libudev --enable-libdrm
> > 
> > Samples used for testing:
> > http://jernej.libreelec.tv/videos/h264/BA1_FT_C.mp4
> > http://jernej.libreelec.tv/videos/h264/h264.mp4
> > 
> > Command line used for testing:
> > ffmpeg -hwaccel drm -hwaccel_device /dev/dri/card0 -i h264.mp4 -pix_fmt
> > bgra -f fbdev /dev/fb0
> > 
> > Please note that V4L2_DEC_CMD_FLUSH was not tested because I'm
> > not sure how. ffmpeg follows exactly which slice is last in frame
> > and sets hold flag accordingly. Improper usage of hold flag would
> > corrupt ffmpeg assumptions and it would probably crash. Any ideas
> > how to test this are welcome!
> 
> It can be tested partially at least: if ffmpeg tells you it is the last
> slice, then queue the slice with the HOLD flag set, then call FLUSH
> afterwards. This should clear the HOLD flag again. In this case the queued
> buffer is probably not yet processed, so the flag is cleared before the
> decode job starts.
> 
> You can also try to add a sleep before calling FLUSH to see what happens
> if the last queued buffer is already decoded.

Ok, I tried following code:
https://github.com/jernejsk/FFmpeg/blob/flush_test/libavcodec/
v4l2_request.c#L220-L233

But results are not good. It seems that out_vb in flush command is always NULL 
and so it always marks capture buffer as done, which leads to kernel warnings.

dmesg output with debugging messages is here: http://ix.io/1Ks8

Currently I'm not sure what is going on. Shouldn't be output buffer queued and 
waiting to MEDIA_REQUEST_IOC_QUEUE which is executed after flush command as it 
can be seen from ffmpeg code linked above?

Best regards,
Jernej

> 
> Regards,
> 
> 	Hans
> 
> > Thanks to Jonas for adjusting ffmpeg.
> > 
> > Please let me know what you think.
> > 
> > Best regards,
> > Jernej
> > 
> > Changes from v1:
> > - added Rb tags
> > - updated V4L2_DEC_CMD_FLUSH documentation
> > - updated first slice detection in Cedrus
> > - hold capture buffer flag is set according to source format
> > - added v4l m2m stateless_(try_)decoder_cmd ioctl helpers
> > 
> > Hans Verkuil (2):
> >   vb2: add V4L2_BUF_FLAG_M2M_HOLD_CAPTURE_BUF
> >   videodev2.h: add V4L2_DEC_CMD_FLUSH
> > 
> > Jernej Skrabec (4):
> >   media: v4l2-mem2mem: add stateless_(try_)decoder_cmd ioctl helpers
> >   media: cedrus: Detect first slice of a frame
> >   media: cedrus: h264: Support multiple slices per frame
> >   media: cedrus: Add support for holding capture buffer
> >  
> >  Documentation/media/uapi/v4l/buffer.rst       | 13 ++++++
> >  .../media/uapi/v4l/vidioc-decoder-cmd.rst     | 10 +++-
> >  .../media/uapi/v4l/vidioc-reqbufs.rst         |  6 +++
> >  .../media/videodev2.h.rst.exceptions          |  1 +
> >  .../media/common/videobuf2/videobuf2-v4l2.c   |  8 +++-
> >  drivers/media/v4l2-core/v4l2-mem2mem.c        | 35 ++++++++++++++
> >  drivers/staging/media/sunxi/cedrus/cedrus.h   |  1 +
> >  .../staging/media/sunxi/cedrus/cedrus_dec.c   | 11 +++++
> >  .../staging/media/sunxi/cedrus/cedrus_h264.c  | 11 ++++-
> >  .../staging/media/sunxi/cedrus/cedrus_hw.c    |  8 ++--
> >  .../staging/media/sunxi/cedrus/cedrus_video.c | 14 ++++++
> >  include/media/v4l2-mem2mem.h                  | 46 +++++++++++++++++++
> >  include/media/videobuf2-core.h                |  3 ++
> >  include/media/videobuf2-v4l2.h                |  5 ++
> >  include/uapi/linux/videodev2.h                | 14 ++++--
> >  15 files changed, 175 insertions(+), 11 deletions(-)
Hans Verkuil Sept. 30, 2019, 10:43 p.m. UTC | #3
On 10/1/19 12:27 AM, Jernej Škrabec wrote:
> Dne ponedeljek, 30. september 2019 ob 10:10:48 CEST je Hans Verkuil 
> napisal(a):
>> On 9/29/19 10:00 PM, Jernej Skrabec wrote:
>>> This series adds support for decoding multi-slice H264 frames along with
>>> support for V4L2_DEC_CMD_FLUSH and V4L2_BUF_FLAG_M2M_HOLD_CAPTURE_BUF.
>>>
>>> Code was tested by modified ffmpeg, which can be found here:
>>> https://github.com/jernejsk/FFmpeg, branch mainline-test
>>> It has to be configured with at least following options:
>>> --enable-v4l2-request --enable-libudev --enable-libdrm
>>>
>>> Samples used for testing:
>>> http://jernej.libreelec.tv/videos/h264/BA1_FT_C.mp4
>>> http://jernej.libreelec.tv/videos/h264/h264.mp4
>>>
>>> Command line used for testing:
>>> ffmpeg -hwaccel drm -hwaccel_device /dev/dri/card0 -i h264.mp4 -pix_fmt
>>> bgra -f fbdev /dev/fb0
>>>
>>> Please note that V4L2_DEC_CMD_FLUSH was not tested because I'm
>>> not sure how. ffmpeg follows exactly which slice is last in frame
>>> and sets hold flag accordingly. Improper usage of hold flag would
>>> corrupt ffmpeg assumptions and it would probably crash. Any ideas
>>> how to test this are welcome!
>>
>> It can be tested partially at least: if ffmpeg tells you it is the last
>> slice, then queue the slice with the HOLD flag set, then call FLUSH
>> afterwards. This should clear the HOLD flag again. In this case the queued
>> buffer is probably not yet processed, so the flag is cleared before the
>> decode job starts.
>>
>> You can also try to add a sleep before calling FLUSH to see what happens
>> if the last queued buffer is already decoded.
> 
> Ok, I tried following code:
> https://github.com/jernejsk/FFmpeg/blob/flush_test/libavcodec/
> v4l2_request.c#L220-L233
> 
> But results are not good. It seems that out_vb in flush command is always NULL 
> and so it always marks capture buffer as done, which leads to kernel warnings.
> 
> dmesg output with debugging messages is here: http://ix.io/1Ks8
> 
> Currently I'm not sure what is going on. Shouldn't be output buffer queued and 
> waiting to MEDIA_REQUEST_IOC_QUEUE which is executed after flush command as it 
> can be seen from ffmpeg code linked above?

Argh, I forgot about the fact that this uses requests.

The FLUSH should happen *after* the MEDIA_REQUEST_IOC_QUEUE ioctl. Otherwise
it has no effect. As long as the request hasn't been queued, the buffer is also
not queued to the driver, so out_vb will indeed be NULL.

Sorry for the confusion.

Regards,

	Hans

> 
> Best regards,
> Jernej
> 
>>
>> Regards,
>>
>> 	Hans
>>
>>> Thanks to Jonas for adjusting ffmpeg.
>>>
>>> Please let me know what you think.
>>>
>>> Best regards,
>>> Jernej
>>>
>>> Changes from v1:
>>> - added Rb tags
>>> - updated V4L2_DEC_CMD_FLUSH documentation
>>> - updated first slice detection in Cedrus
>>> - hold capture buffer flag is set according to source format
>>> - added v4l m2m stateless_(try_)decoder_cmd ioctl helpers
>>>
>>> Hans Verkuil (2):
>>>   vb2: add V4L2_BUF_FLAG_M2M_HOLD_CAPTURE_BUF
>>>   videodev2.h: add V4L2_DEC_CMD_FLUSH
>>>
>>> Jernej Skrabec (4):
>>>   media: v4l2-mem2mem: add stateless_(try_)decoder_cmd ioctl helpers
>>>   media: cedrus: Detect first slice of a frame
>>>   media: cedrus: h264: Support multiple slices per frame
>>>   media: cedrus: Add support for holding capture buffer
>>>  
>>>  Documentation/media/uapi/v4l/buffer.rst       | 13 ++++++
>>>  .../media/uapi/v4l/vidioc-decoder-cmd.rst     | 10 +++-
>>>  .../media/uapi/v4l/vidioc-reqbufs.rst         |  6 +++
>>>  .../media/videodev2.h.rst.exceptions          |  1 +
>>>  .../media/common/videobuf2/videobuf2-v4l2.c   |  8 +++-
>>>  drivers/media/v4l2-core/v4l2-mem2mem.c        | 35 ++++++++++++++
>>>  drivers/staging/media/sunxi/cedrus/cedrus.h   |  1 +
>>>  .../staging/media/sunxi/cedrus/cedrus_dec.c   | 11 +++++
>>>  .../staging/media/sunxi/cedrus/cedrus_h264.c  | 11 ++++-
>>>  .../staging/media/sunxi/cedrus/cedrus_hw.c    |  8 ++--
>>>  .../staging/media/sunxi/cedrus/cedrus_video.c | 14 ++++++
>>>  include/media/v4l2-mem2mem.h                  | 46 +++++++++++++++++++
>>>  include/media/videobuf2-core.h                |  3 ++
>>>  include/media/videobuf2-v4l2.h                |  5 ++
>>>  include/uapi/linux/videodev2.h                | 14 ++++--
>>>  15 files changed, 175 insertions(+), 11 deletions(-)
> 
> 
> 
>
Jernej Škrabec Sept. 30, 2019, 10:58 p.m. UTC | #4
Dne torek, 01. oktober 2019 ob 00:43:34 CEST je Hans Verkuil napisal(a):
> On 10/1/19 12:27 AM, Jernej Škrabec wrote:
> > Dne ponedeljek, 30. september 2019 ob 10:10:48 CEST je Hans Verkuil
> > 
> > napisal(a):
> >> On 9/29/19 10:00 PM, Jernej Skrabec wrote:
> >>> This series adds support for decoding multi-slice H264 frames along with
> >>> support for V4L2_DEC_CMD_FLUSH and V4L2_BUF_FLAG_M2M_HOLD_CAPTURE_BUF.
> >>> 
> >>> Code was tested by modified ffmpeg, which can be found here:
> >>> https://github.com/jernejsk/FFmpeg, branch mainline-test
> >>> It has to be configured with at least following options:
> >>> --enable-v4l2-request --enable-libudev --enable-libdrm
> >>> 
> >>> Samples used for testing:
> >>> http://jernej.libreelec.tv/videos/h264/BA1_FT_C.mp4
> >>> http://jernej.libreelec.tv/videos/h264/h264.mp4
> >>> 
> >>> Command line used for testing:
> >>> ffmpeg -hwaccel drm -hwaccel_device /dev/dri/card0 -i h264.mp4 -pix_fmt
> >>> bgra -f fbdev /dev/fb0
> >>> 
> >>> Please note that V4L2_DEC_CMD_FLUSH was not tested because I'm
> >>> not sure how. ffmpeg follows exactly which slice is last in frame
> >>> and sets hold flag accordingly. Improper usage of hold flag would
> >>> corrupt ffmpeg assumptions and it would probably crash. Any ideas
> >>> how to test this are welcome!
> >> 
> >> It can be tested partially at least: if ffmpeg tells you it is the last
> >> slice, then queue the slice with the HOLD flag set, then call FLUSH
> >> afterwards. This should clear the HOLD flag again. In this case the
> >> queued
> >> buffer is probably not yet processed, so the flag is cleared before the
> >> decode job starts.
> >> 
> >> You can also try to add a sleep before calling FLUSH to see what happens
> >> if the last queued buffer is already decoded.
> > 
> > Ok, I tried following code:
> > https://github.com/jernejsk/FFmpeg/blob/flush_test/libavcodec/
> > v4l2_request.c#L220-L233
> > 
> > But results are not good. It seems that out_vb in flush command is always
> > NULL and so it always marks capture buffer as done, which leads to kernel
> > warnings.
> > 
> > dmesg output with debugging messages is here: http://ix.io/1Ks8
> > 
> > Currently I'm not sure what is going on. Shouldn't be output buffer queued
> > and waiting to MEDIA_REQUEST_IOC_QUEUE which is executed after flush
> > command as it can be seen from ffmpeg code linked above?
> 
> Argh, I forgot about the fact that this uses requests.
> 
> The FLUSH should happen *after* the MEDIA_REQUEST_IOC_QUEUE ioctl. Otherwise
> it has no effect. As long as the request hasn't been queued, the buffer is
> also not queued to the driver, so out_vb will indeed be NULL.

Well, flush cmd has effect if it is called before MEDIA_REQUEST_IOC_QUEUE ioctl 
as it can be seen from linked dmesg output. I guess there is nothing that we 
can do to prevent wrong usage?

BTW, if capture buffer is marked as done, shouldn't be also removed from the 
queue?

Best regards,
Jernej

> 
> Sorry for the confusion.
> 
> Regards,
> 
> 	Hans
> 
> > Best regards,
> > Jernej
> > 
> >> Regards,
> >> 
> >> 	Hans
> >> 	
> >>> Thanks to Jonas for adjusting ffmpeg.
> >>> 
> >>> Please let me know what you think.
> >>> 
> >>> Best regards,
> >>> Jernej
> >>> 
> >>> Changes from v1:
> >>> - added Rb tags
> >>> - updated V4L2_DEC_CMD_FLUSH documentation
> >>> - updated first slice detection in Cedrus
> >>> - hold capture buffer flag is set according to source format
> >>> - added v4l m2m stateless_(try_)decoder_cmd ioctl helpers
> >>> 
> >>> Hans Verkuil (2):
> >>>   vb2: add V4L2_BUF_FLAG_M2M_HOLD_CAPTURE_BUF
> >>>   videodev2.h: add V4L2_DEC_CMD_FLUSH
> >>> 
> >>> Jernej Skrabec (4):
> >>>   media: v4l2-mem2mem: add stateless_(try_)decoder_cmd ioctl helpers
> >>>   media: cedrus: Detect first slice of a frame
> >>>   media: cedrus: h264: Support multiple slices per frame
> >>>   media: cedrus: Add support for holding capture buffer
> >>>  
> >>>  Documentation/media/uapi/v4l/buffer.rst       | 13 ++++++
> >>>  .../media/uapi/v4l/vidioc-decoder-cmd.rst     | 10 +++-
> >>>  .../media/uapi/v4l/vidioc-reqbufs.rst         |  6 +++
> >>>  .../media/videodev2.h.rst.exceptions          |  1 +
> >>>  .../media/common/videobuf2/videobuf2-v4l2.c   |  8 +++-
> >>>  drivers/media/v4l2-core/v4l2-mem2mem.c        | 35 ++++++++++++++
> >>>  drivers/staging/media/sunxi/cedrus/cedrus.h   |  1 +
> >>>  .../staging/media/sunxi/cedrus/cedrus_dec.c   | 11 +++++
> >>>  .../staging/media/sunxi/cedrus/cedrus_h264.c  | 11 ++++-
> >>>  .../staging/media/sunxi/cedrus/cedrus_hw.c    |  8 ++--
> >>>  .../staging/media/sunxi/cedrus/cedrus_video.c | 14 ++++++
> >>>  include/media/v4l2-mem2mem.h                  | 46 +++++++++++++++++++
> >>>  include/media/videobuf2-core.h                |  3 ++
> >>>  include/media/videobuf2-v4l2.h                |  5 ++
> >>>  include/uapi/linux/videodev2.h                | 14 ++++--
> >>>  15 files changed, 175 insertions(+), 11 deletions(-)
Jernej Škrabec Oct. 1, 2019, 5:33 a.m. UTC | #5
Dne torek, 01. oktober 2019 ob 00:43:34 CEST je Hans Verkuil napisal(a):
> On 10/1/19 12:27 AM, Jernej Škrabec wrote:
> > Dne ponedeljek, 30. september 2019 ob 10:10:48 CEST je Hans Verkuil
> > 
> > napisal(a):
> >> On 9/29/19 10:00 PM, Jernej Skrabec wrote:
> >>> This series adds support for decoding multi-slice H264 frames along with
> >>> support for V4L2_DEC_CMD_FLUSH and V4L2_BUF_FLAG_M2M_HOLD_CAPTURE_BUF.
> >>> 
> >>> Code was tested by modified ffmpeg, which can be found here:
> >>> https://github.com/jernejsk/FFmpeg, branch mainline-test
> >>> It has to be configured with at least following options:
> >>> --enable-v4l2-request --enable-libudev --enable-libdrm
> >>> 
> >>> Samples used for testing:
> >>> http://jernej.libreelec.tv/videos/h264/BA1_FT_C.mp4
> >>> http://jernej.libreelec.tv/videos/h264/h264.mp4
> >>> 
> >>> Command line used for testing:
> >>> ffmpeg -hwaccel drm -hwaccel_device /dev/dri/card0 -i h264.mp4 -pix_fmt
> >>> bgra -f fbdev /dev/fb0
> >>> 
> >>> Please note that V4L2_DEC_CMD_FLUSH was not tested because I'm
> >>> not sure how. ffmpeg follows exactly which slice is last in frame
> >>> and sets hold flag accordingly. Improper usage of hold flag would
> >>> corrupt ffmpeg assumptions and it would probably crash. Any ideas
> >>> how to test this are welcome!
> >> 
> >> It can be tested partially at least: if ffmpeg tells you it is the last
> >> slice, then queue the slice with the HOLD flag set, then call FLUSH
> >> afterwards. This should clear the HOLD flag again. In this case the
> >> queued
> >> buffer is probably not yet processed, so the flag is cleared before the
> >> decode job starts.
> >> 
> >> You can also try to add a sleep before calling FLUSH to see what happens
> >> if the last queued buffer is already decoded.
> > 
> > Ok, I tried following code:
> > https://github.com/jernejsk/FFmpeg/blob/flush_test/libavcodec/
> > v4l2_request.c#L220-L233
> > 
> > But results are not good. It seems that out_vb in flush command is always
> > NULL and so it always marks capture buffer as done, which leads to kernel
> > warnings.
> > 
> > dmesg output with debugging messages is here: http://ix.io/1Ks8
> > 
> > Currently I'm not sure what is going on. Shouldn't be output buffer queued
> > and waiting to MEDIA_REQUEST_IOC_QUEUE which is executed after flush
> > command as it can be seen from ffmpeg code linked above?
> 
> Argh, I forgot about the fact that this uses requests.
> 
> The FLUSH should happen *after* the MEDIA_REQUEST_IOC_QUEUE ioctl. Otherwise
> it has no effect. As long as the request hasn't been queued, the buffer is
> also not queued to the driver, so out_vb will indeed be NULL.

It's better, but still not working. Currently ffmpeg sometimes reports such 
messages: https://pastebin.com/raw/9tVVtc20 This is dmesg output: http://
ix.io/1L1L

It seems to me like a race condition. Sometimes flush works as indendent and 
sometimes it influences next frame.

Best regards,
Jernje

> 
> Sorry for the confusion.
> 
> Regards,
> 
> 	Hans
> 
> > Best regards,
> > Jernej
> > 
> >> Regards,
> >> 
> >> 	Hans
> >> 	
> >>> Thanks to Jonas for adjusting ffmpeg.
> >>> 
> >>> Please let me know what you think.
> >>> 
> >>> Best regards,
> >>> Jernej
> >>> 
> >>> Changes from v1:
> >>> - added Rb tags
> >>> - updated V4L2_DEC_CMD_FLUSH documentation
> >>> - updated first slice detection in Cedrus
> >>> - hold capture buffer flag is set according to source format
> >>> - added v4l m2m stateless_(try_)decoder_cmd ioctl helpers
> >>> 
> >>> Hans Verkuil (2):
> >>>   vb2: add V4L2_BUF_FLAG_M2M_HOLD_CAPTURE_BUF
> >>>   videodev2.h: add V4L2_DEC_CMD_FLUSH
> >>> 
> >>> Jernej Skrabec (4):
> >>>   media: v4l2-mem2mem: add stateless_(try_)decoder_cmd ioctl helpers
> >>>   media: cedrus: Detect first slice of a frame
> >>>   media: cedrus: h264: Support multiple slices per frame
> >>>   media: cedrus: Add support for holding capture buffer
> >>>  
> >>>  Documentation/media/uapi/v4l/buffer.rst       | 13 ++++++
> >>>  .../media/uapi/v4l/vidioc-decoder-cmd.rst     | 10 +++-
> >>>  .../media/uapi/v4l/vidioc-reqbufs.rst         |  6 +++
> >>>  .../media/videodev2.h.rst.exceptions          |  1 +
> >>>  .../media/common/videobuf2/videobuf2-v4l2.c   |  8 +++-
> >>>  drivers/media/v4l2-core/v4l2-mem2mem.c        | 35 ++++++++++++++
> >>>  drivers/staging/media/sunxi/cedrus/cedrus.h   |  1 +
> >>>  .../staging/media/sunxi/cedrus/cedrus_dec.c   | 11 +++++
> >>>  .../staging/media/sunxi/cedrus/cedrus_h264.c  | 11 ++++-
> >>>  .../staging/media/sunxi/cedrus/cedrus_hw.c    |  8 ++--
> >>>  .../staging/media/sunxi/cedrus/cedrus_video.c | 14 ++++++
> >>>  include/media/v4l2-mem2mem.h                  | 46 +++++++++++++++++++
> >>>  include/media/videobuf2-core.h                |  3 ++
> >>>  include/media/videobuf2-v4l2.h                |  5 ++
> >>>  include/uapi/linux/videodev2.h                | 14 ++++--
> >>>  15 files changed, 175 insertions(+), 11 deletions(-)
Hans Verkuil Oct. 7, 2019, 10:44 a.m. UTC | #6
Hi Jernej,

On 9/29/19 10:00 PM, Jernej Skrabec wrote:
> This series adds support for decoding multi-slice H264 frames along with
> support for V4L2_DEC_CMD_FLUSH and V4L2_BUF_FLAG_M2M_HOLD_CAPTURE_BUF.
> 
> Code was tested by modified ffmpeg, which can be found here:
> https://github.com/jernejsk/FFmpeg, branch mainline-test
> It has to be configured with at least following options:
> --enable-v4l2-request --enable-libudev --enable-libdrm
> 
> Samples used for testing:
> http://jernej.libreelec.tv/videos/h264/BA1_FT_C.mp4
> http://jernej.libreelec.tv/videos/h264/h264.mp4
> 
> Command line used for testing:
> ffmpeg -hwaccel drm -hwaccel_device /dev/dri/card0 -i h264.mp4 -pix_fmt bgra -f fbdev /dev/fb0
> 
> Please note that V4L2_DEC_CMD_FLUSH was not tested because I'm
> not sure how. ffmpeg follows exactly which slice is last in frame
> and sets hold flag accordingly. Improper usage of hold flag would
> corrupt ffmpeg assumptions and it would probably crash. Any ideas
> how to test this are welcome!
> 
> Thanks to Jonas for adjusting ffmpeg.
> 
> Please let me know what you think.
> 
> Best regards,
> Jernej
> 
> Changes from v1:
> - added Rb tags
> - updated V4L2_DEC_CMD_FLUSH documentation
> - updated first slice detection in Cedrus
> - hold capture buffer flag is set according to source format
> - added v4l m2m stateless_(try_)decoder_cmd ioctl helpers
> 
> Hans Verkuil (2):
>   vb2: add V4L2_BUF_FLAG_M2M_HOLD_CAPTURE_BUF
>   videodev2.h: add V4L2_DEC_CMD_FLUSH
> 
> Jernej Skrabec (4):
>   media: v4l2-mem2mem: add stateless_(try_)decoder_cmd ioctl helpers
>   media: cedrus: Detect first slice of a frame
>   media: cedrus: h264: Support multiple slices per frame
>   media: cedrus: Add support for holding capture buffer
> 
>  Documentation/media/uapi/v4l/buffer.rst       | 13 ++++++
>  .../media/uapi/v4l/vidioc-decoder-cmd.rst     | 10 +++-
>  .../media/uapi/v4l/vidioc-reqbufs.rst         |  6 +++
>  .../media/videodev2.h.rst.exceptions          |  1 +
>  .../media/common/videobuf2/videobuf2-v4l2.c   |  8 +++-
>  drivers/media/v4l2-core/v4l2-mem2mem.c        | 35 ++++++++++++++
>  drivers/staging/media/sunxi/cedrus/cedrus.h   |  1 +
>  .../staging/media/sunxi/cedrus/cedrus_dec.c   | 11 +++++
>  .../staging/media/sunxi/cedrus/cedrus_h264.c  | 11 ++++-
>  .../staging/media/sunxi/cedrus/cedrus_hw.c    |  8 ++--
>  .../staging/media/sunxi/cedrus/cedrus_video.c | 14 ++++++
>  include/media/v4l2-mem2mem.h                  | 46 +++++++++++++++++++
>  include/media/videobuf2-core.h                |  3 ++
>  include/media/videobuf2-v4l2.h                |  5 ++
>  include/uapi/linux/videodev2.h                | 14 ++++--
>  15 files changed, 175 insertions(+), 11 deletions(-)
> 

I didn't want to make a v3 of this series, instead I hacked this patch that will
hopefully do all the locking right.

Basically I moved all the 'held' related code into v4l2-mem2mem under job_spinlock.
This simplifies the driver code as well.

But this is a hack that sits on top of this series. If your ffmpeg tests are now
successful, then I'll turn this into a proper series with correct documentation
(a lot of the comments are now wrong with this patch, so just ignore that).

Regards,

	Hans

diff --git a/drivers/media/v4l2-core/v4l2-mem2mem.c b/drivers/media/v4l2-core/v4l2-mem2mem.c
index 2677a07e4c9b..f81a8f2465ab 100644
--- a/drivers/media/v4l2-core/v4l2-mem2mem.c
+++ b/drivers/media/v4l2-core/v4l2-mem2mem.c
@@ -412,25 +412,24 @@ static void v4l2_m2m_cancel_job(struct v4l2_m2m_ctx *m2m_ctx)
 	}
 }

-void v4l2_m2m_job_finish(struct v4l2_m2m_dev *m2m_dev,
-			 struct v4l2_m2m_ctx *m2m_ctx)
+static bool _v4l2_m2m_job_finish(struct v4l2_m2m_dev *m2m_dev,
+			  struct v4l2_m2m_ctx *m2m_ctx)
 {
-	unsigned long flags;
-
-	spin_lock_irqsave(&m2m_dev->job_spinlock, flags);
 	if (!m2m_dev->curr_ctx || m2m_dev->curr_ctx != m2m_ctx) {
-		spin_unlock_irqrestore(&m2m_dev->job_spinlock, flags);
 		dprintk("Called by an instance not currently running\n");
-		return;
+		return false;
 	}

 	list_del(&m2m_dev->curr_ctx->queue);
 	m2m_dev->curr_ctx->job_flags &= ~(TRANS_QUEUED | TRANS_RUNNING);
 	wake_up(&m2m_dev->curr_ctx->finished);
 	m2m_dev->curr_ctx = NULL;
+	return true;
+}

-	spin_unlock_irqrestore(&m2m_dev->job_spinlock, flags);
-
+static void v4l2_m2m_job_next(struct v4l2_m2m_dev *m2m_dev,
+		       struct v4l2_m2m_ctx *m2m_ctx)
+{
 	/* This instance might have more buffers ready, but since we do not
 	 * allow more than one job on the job_queue per instance, each has
 	 * to be scheduled separately after the previous one finishes. */
@@ -441,8 +440,113 @@ void v4l2_m2m_job_finish(struct v4l2_m2m_dev *m2m_dev,
 	 */
 	schedule_work(&m2m_dev->job_work);
 }
+
+void v4l2_m2m_job_finish(struct v4l2_m2m_dev *m2m_dev,
+			 struct v4l2_m2m_ctx *m2m_ctx)
+{
+	unsigned long flags;
+
+	spin_lock_irqsave(&m2m_dev->job_spinlock, flags);
+	if (!_v4l2_m2m_job_finish(m2m_dev, m2m_ctx)) {
+		spin_unlock_irqrestore(&m2m_dev->job_spinlock, flags);
+		return;
+	}
+	spin_unlock_irqrestore(&m2m_dev->job_spinlock, flags);
+
+	v4l2_m2m_job_next(m2m_dev, m2m_ctx);
+}
 EXPORT_SYMBOL(v4l2_m2m_job_finish);

+void v4l2_m2m_job_finish_held(struct v4l2_m2m_dev *m2m_dev,
+			 struct v4l2_m2m_ctx *m2m_ctx,
+			 enum vb2_buffer_state state)
+{
+	struct vb2_v4l2_buffer *src_buf, *dst_buf;
+	unsigned long flags;
+
+	spin_lock_irqsave(&m2m_dev->job_spinlock, flags);
+	src_buf = v4l2_m2m_src_buf_remove(m2m_ctx);
+	dst_buf = v4l2_m2m_next_dst_buf(m2m_ctx);
+
+	if (!src_buf || !dst_buf) {
+		pr_err("Missing source and/or destination buffers\n");
+		spin_unlock_irqrestore(&m2m_dev->job_spinlock, flags);
+		return;
+	}
+	v4l2_m2m_buf_done(src_buf, state);
+	if (!dst_buf->is_held) {
+		v4l2_m2m_dst_buf_remove(m2m_ctx);
+		v4l2_m2m_buf_done(dst_buf, state);
+	}
+	if (!_v4l2_m2m_job_finish(m2m_dev, m2m_ctx)) {
+		spin_unlock_irqrestore(&m2m_dev->job_spinlock, flags);
+		return;
+	}
+	spin_unlock_irqrestore(&m2m_dev->job_spinlock, flags);
+
+	v4l2_m2m_job_next(m2m_dev, m2m_ctx);
+}
+EXPORT_SYMBOL(v4l2_m2m_job_finish_held);
+
+/**
+ * v4l2_m2m_release_capture_buf() - check if the capture buffer should be
+ * released
+ *
+ * @out_vb: the output buffer
+ * @cap_vb: the capture buffer
+ *
+ * This helper function returns true if the current capture buffer should
+ * be released to vb2. This is the case if the output buffer specified that
+ * the capture buffer should be held (i.e. not returned to vb2) AND if the
+ * timestamp of the capture buffer differs from the output buffer timestamp.
+ *
+ * This helper is to be called at the start of the device_run callback:
+ *
+ * .. code-block:: c
+ *
+ *	if (v4l2_m2m_release_capture_buf(out_vb, cap_vb)) {
+ *		v4l2_m2m_dst_buf_remove(m2m_ctx);
+ *		v4l2_m2m_buf_done(cap_vb, VB2_BUF_STATE_DONE);
+ *		cap_vb = v4l2_m2m_next_dst_buf(m2m_ctx);
+ *	}
+ *	cap_vb->is_held = out_vb->flags & V4L2_BUF_FLAG_M2M_HOLD_CAPTURE_BUF;
+ *
+ *	...
+ *
+ *	v4l2_m2m_buf_done(out_vb, VB2_BUF_STATE_DONE);
+ *	if (!cap_vb->is_held) {
+ *		v4l2_m2m_dst_buf_remove(m2m_ctx);
+ *		v4l2_m2m_buf_done(cap_vb, VB2_BUF_STATE_DONE);
+ *	}
+ *
+ * This allows for multiple output buffers to be used to fill in a single
+ * capture buffer. This is typically used by stateless decoders where
+ * multiple e.g. H.264 slices contribute to a single decoded frame.
+ */
+struct vb2_v4l2_buffer *v4l2_m2m_release_capture_buf(struct v4l2_m2m_ctx *m2m_ctx)
+{
+	struct v4l2_m2m_dev *m2m_dev = m2m_ctx->m2m_dev;
+	struct vb2_v4l2_buffer *src, *dst;
+	unsigned long flags;
+
+	spin_lock_irqsave(&m2m_dev->job_spinlock, flags);
+	src = v4l2_m2m_next_src_buf(m2m_ctx);
+	dst = v4l2_m2m_next_dst_buf(m2m_ctx);
+
+	if (dst->is_held && dst->vb2_buf.copied_timestamp &&
+	    src->vb2_buf.timestamp != dst->vb2_buf.timestamp) {
+		dst->is_held = false;
+		v4l2_m2m_dst_buf_remove(m2m_ctx);
+		v4l2_m2m_buf_done(dst, VB2_BUF_STATE_DONE);
+		dst = v4l2_m2m_next_dst_buf(m2m_ctx);
+	}
+	dst->is_held = src->flags & V4L2_BUF_FLAG_M2M_HOLD_CAPTURE_BUF;
+	src->flags &= ~V4L2_BUF_FLAG_M2M_HOLD_CAPTURE_BUF;
+	spin_unlock_irqrestore(&m2m_dev->job_spinlock, flags);
+	return dst;
+}
+EXPORT_SYMBOL(v4l2_m2m_release_capture_buf);
+
 int v4l2_m2m_reqbufs(struct file *file, struct v4l2_m2m_ctx *m2m_ctx,
 		     struct v4l2_requestbuffers *reqbufs)
 {
@@ -1171,19 +1275,28 @@ int v4l2_m2m_ioctl_stateless_decoder_cmd(struct file *file, void *priv,
 {
 	struct v4l2_fh *fh = file->private_data;
 	struct vb2_v4l2_buffer *out_vb, *cap_vb;
+	struct v4l2_m2m_dev *m2m_dev = fh->m2m_ctx->m2m_dev;
+	unsigned long flags;
 	int ret;

 	ret = v4l2_m2m_ioctl_stateless_try_decoder_cmd(file, priv, dc);
 	if (ret < 0)
 		return ret;

+	spin_lock_irqsave(&m2m_dev->job_spinlock, flags);
 	out_vb = v4l2_m2m_last_src_buf(fh->m2m_ctx);
 	cap_vb = v4l2_m2m_last_dst_buf(fh->m2m_ctx);

-	if (out_vb)
+	if (out_vb && (out_vb->flags & V4L2_BUF_FLAG_M2M_HOLD_CAPTURE_BUF)) {
 		out_vb->flags &= ~V4L2_BUF_FLAG_M2M_HOLD_CAPTURE_BUF;
-	else if (cap_vb && cap_vb->is_held)
-		v4l2_m2m_buf_done(cap_vb, VB2_BUF_STATE_DONE);
+	} else if (cap_vb && cap_vb->is_held) {
+		cap_vb->is_held = false;
+		if (m2m_dev->curr_ctx) {
+			v4l2_m2m_dst_buf_remove(fh->m2m_ctx);
+			v4l2_m2m_buf_done(cap_vb, VB2_BUF_STATE_DONE);
+		}
+	}
+	spin_unlock_irqrestore(&m2m_dev->job_spinlock, flags);

 	return 0;
 }
diff --git a/drivers/staging/media/sunxi/cedrus/cedrus_dec.c b/drivers/staging/media/sunxi/cedrus/cedrus_dec.c
index 67f7d4326fc1..4e30f263b427 100644
--- a/drivers/staging/media/sunxi/cedrus/cedrus_dec.c
+++ b/drivers/staging/media/sunxi/cedrus/cedrus_dec.c
@@ -30,14 +30,7 @@ void cedrus_device_run(void *priv)
 	struct media_request *src_req;

 	run.src = v4l2_m2m_next_src_buf(ctx->fh.m2m_ctx);
-	run.dst = v4l2_m2m_next_dst_buf(ctx->fh.m2m_ctx);
-
-	if (v4l2_m2m_release_capture_buf(run.src, run.dst)) {
-		v4l2_m2m_dst_buf_remove(ctx->fh.m2m_ctx);
-		v4l2_m2m_buf_done(run.dst, VB2_BUF_STATE_DONE);
-		run.dst = v4l2_m2m_next_dst_buf(ctx->fh.m2m_ctx);
-	}
-	run.dst->is_held = run.src->flags & V4L2_BUF_FLAG_M2M_HOLD_CAPTURE_BUF;
+	run.dst = v4l2_m2m_release_capture_buf(ctx->fh.m2m_ctx);

 	run.first_slice = !run.dst->vb2_buf.copied_timestamp ||
 		run.src->vb2_buf.timestamp != run.dst->vb2_buf.timestamp;
diff --git a/drivers/staging/media/sunxi/cedrus/cedrus_hw.c b/drivers/staging/media/sunxi/cedrus/cedrus_hw.c
index 99fedec80224..242cad82cc8c 100644
--- a/drivers/staging/media/sunxi/cedrus/cedrus_hw.c
+++ b/drivers/staging/media/sunxi/cedrus/cedrus_hw.c
@@ -103,7 +103,6 @@ static irqreturn_t cedrus_irq(int irq, void *data)
 {
 	struct cedrus_dev *dev = data;
 	struct cedrus_ctx *ctx;
-	struct vb2_v4l2_buffer *src_buf, *dst_buf;
 	enum vb2_buffer_state state;
 	enum cedrus_irq_status status;

@@ -121,26 +120,12 @@ static irqreturn_t cedrus_irq(int irq, void *data)
 	dev->dec_ops[ctx->current_codec]->irq_disable(ctx);
 	dev->dec_ops[ctx->current_codec]->irq_clear(ctx);

-	src_buf = v4l2_m2m_src_buf_remove(ctx->fh.m2m_ctx);
-	dst_buf = v4l2_m2m_next_dst_buf(ctx->fh.m2m_ctx);
-
-	if (!src_buf || !dst_buf) {
-		v4l2_err(&dev->v4l2_dev,
-			 "Missing source and/or destination buffers\n");
-		return IRQ_HANDLED;
-	}
-
 	if (status == CEDRUS_IRQ_ERROR)
 		state = VB2_BUF_STATE_ERROR;
 	else
 		state = VB2_BUF_STATE_DONE;

-	v4l2_m2m_buf_done(src_buf, state);
-	if (!dst_buf->is_held) {
-		v4l2_m2m_dst_buf_remove(ctx->fh.m2m_ctx);
-		v4l2_m2m_buf_done(dst_buf, state);
-	}
-	v4l2_m2m_job_finish(ctx->dev->m2m_dev, ctx->fh.m2m_ctx);
+	v4l2_m2m_job_finish_held(ctx->dev->m2m_dev, ctx->fh.m2m_ctx, state);

 	return IRQ_HANDLED;
 }
diff --git a/include/media/v4l2-mem2mem.h b/include/media/v4l2-mem2mem.h
index 8ae2f56c7fa3..48ca7d3eaa3d 100644
--- a/include/media/v4l2-mem2mem.h
+++ b/include/media/v4l2-mem2mem.h
@@ -173,6 +173,10 @@ void v4l2_m2m_try_schedule(struct v4l2_m2m_ctx *m2m_ctx);
 void v4l2_m2m_job_finish(struct v4l2_m2m_dev *m2m_dev,
 			 struct v4l2_m2m_ctx *m2m_ctx);

+void v4l2_m2m_job_finish_held(struct v4l2_m2m_dev *m2m_dev,
+			 struct v4l2_m2m_ctx *m2m_ctx,
+			 enum vb2_buffer_state state);
+
 static inline void
 v4l2_m2m_buf_done(struct vb2_v4l2_buffer *buf, enum vb2_buffer_state state)
 {
@@ -644,47 +648,7 @@ void v4l2_m2m_buf_copy_metadata(const struct vb2_v4l2_buffer *out_vb,
 				struct vb2_v4l2_buffer *cap_vb,
 				bool copy_frame_flags);

-/**
- * v4l2_m2m_release_capture_buf() - check if the capture buffer should be
- * released
- *
- * @out_vb: the output buffer
- * @cap_vb: the capture buffer
- *
- * This helper function returns true if the current capture buffer should
- * be released to vb2. This is the case if the output buffer specified that
- * the capture buffer should be held (i.e. not returned to vb2) AND if the
- * timestamp of the capture buffer differs from the output buffer timestamp.
- *
- * This helper is to be called at the start of the device_run callback:
- *
- * .. code-block:: c
- *
- *	if (v4l2_m2m_release_capture_buf(out_vb, cap_vb)) {
- *		v4l2_m2m_dst_buf_remove(m2m_ctx);
- *		v4l2_m2m_buf_done(cap_vb, VB2_BUF_STATE_DONE);
- *		cap_vb = v4l2_m2m_next_dst_buf(m2m_ctx);
- *	}
- *	cap_vb->is_held = out_vb->flags & V4L2_BUF_FLAG_M2M_HOLD_CAPTURE_BUF;
- *
- *	...
- *
- *	v4l2_m2m_buf_done(out_vb, VB2_BUF_STATE_DONE);
- *	if (!cap_vb->is_held) {
- *		v4l2_m2m_dst_buf_remove(m2m_ctx);
- *		v4l2_m2m_buf_done(cap_vb, VB2_BUF_STATE_DONE);
- *	}
- *
- * This allows for multiple output buffers to be used to fill in a single
- * capture buffer. This is typically used by stateless decoders where
- * multiple e.g. H.264 slices contribute to a single decoded frame.
- */
-static inline bool v4l2_m2m_release_capture_buf(const struct vb2_v4l2_buffer *out_vb,
-						const struct vb2_v4l2_buffer *cap_vb)
-{
-	return cap_vb->is_held && cap_vb->vb2_buf.copied_timestamp &&
-	       out_vb->vb2_buf.timestamp != cap_vb->vb2_buf.timestamp;
-}
+struct vb2_v4l2_buffer *v4l2_m2m_release_capture_buf(struct v4l2_m2m_ctx *m2m_ctx);

 /* v4l2 request helper */
Jernej Škrabec Oct. 7, 2019, 7:01 p.m. UTC | #7
Dne ponedeljek, 07. oktober 2019 ob 12:44:24 CEST je Hans Verkuil napisal(a):
> Hi Jernej,
> 
> On 9/29/19 10:00 PM, Jernej Skrabec wrote:
> > This series adds support for decoding multi-slice H264 frames along with
> > support for V4L2_DEC_CMD_FLUSH and V4L2_BUF_FLAG_M2M_HOLD_CAPTURE_BUF.
> > 
> > Code was tested by modified ffmpeg, which can be found here:
> > https://github.com/jernejsk/FFmpeg, branch mainline-test
> > It has to be configured with at least following options:
> > --enable-v4l2-request --enable-libudev --enable-libdrm
> > 
> > Samples used for testing:
> > http://jernej.libreelec.tv/videos/h264/BA1_FT_C.mp4
> > http://jernej.libreelec.tv/videos/h264/h264.mp4
> > 
> > Command line used for testing:
> > ffmpeg -hwaccel drm -hwaccel_device /dev/dri/card0 -i h264.mp4 -pix_fmt
> > bgra -f fbdev /dev/fb0
> > 
> > Please note that V4L2_DEC_CMD_FLUSH was not tested because I'm
> > not sure how. ffmpeg follows exactly which slice is last in frame
> > and sets hold flag accordingly. Improper usage of hold flag would
> > corrupt ffmpeg assumptions and it would probably crash. Any ideas
> > how to test this are welcome!
> > 
> > Thanks to Jonas for adjusting ffmpeg.
> > 
> > Please let me know what you think.
> > 
> > Best regards,
> > Jernej
> > 
> > Changes from v1:
> > - added Rb tags
> > - updated V4L2_DEC_CMD_FLUSH documentation
> > - updated first slice detection in Cedrus
> > - hold capture buffer flag is set according to source format
> > - added v4l m2m stateless_(try_)decoder_cmd ioctl helpers
> > 
> > Hans Verkuil (2):
> >   vb2: add V4L2_BUF_FLAG_M2M_HOLD_CAPTURE_BUF
> >   videodev2.h: add V4L2_DEC_CMD_FLUSH
> > 
> > Jernej Skrabec (4):
> >   media: v4l2-mem2mem: add stateless_(try_)decoder_cmd ioctl helpers
> >   media: cedrus: Detect first slice of a frame
> >   media: cedrus: h264: Support multiple slices per frame
> >   media: cedrus: Add support for holding capture buffer
> >  
> >  Documentation/media/uapi/v4l/buffer.rst       | 13 ++++++
> >  .../media/uapi/v4l/vidioc-decoder-cmd.rst     | 10 +++-
> >  .../media/uapi/v4l/vidioc-reqbufs.rst         |  6 +++
> >  .../media/videodev2.h.rst.exceptions          |  1 +
> >  .../media/common/videobuf2/videobuf2-v4l2.c   |  8 +++-
> >  drivers/media/v4l2-core/v4l2-mem2mem.c        | 35 ++++++++++++++
> >  drivers/staging/media/sunxi/cedrus/cedrus.h   |  1 +
> >  .../staging/media/sunxi/cedrus/cedrus_dec.c   | 11 +++++
> >  .../staging/media/sunxi/cedrus/cedrus_h264.c  | 11 ++++-
> >  .../staging/media/sunxi/cedrus/cedrus_hw.c    |  8 ++--
> >  .../staging/media/sunxi/cedrus/cedrus_video.c | 14 ++++++
> >  include/media/v4l2-mem2mem.h                  | 46 +++++++++++++++++++
> >  include/media/videobuf2-core.h                |  3 ++
> >  include/media/videobuf2-v4l2.h                |  5 ++
> >  include/uapi/linux/videodev2.h                | 14 ++++--
> >  15 files changed, 175 insertions(+), 11 deletions(-)
> 
> I didn't want to make a v3 of this series, instead I hacked this patch that
> will hopefully do all the locking right.
> 
> Basically I moved all the 'held' related code into v4l2-mem2mem under
> job_spinlock. This simplifies the driver code as well.
> 
> But this is a hack that sits on top of this series. If your ffmpeg tests are
> now successful, then I'll turn this into a proper series with correct
> documentation (a lot of the comments are now wrong with this patch, so just
> ignore that).

Thanks for looking into this! With small fix mentioned below, it works! Note 
that both scenarios I tested (flushing during decoding and flushing after 
decoding is finished) are focused on capture queue. In order to trigger output 
queue flush, ffmpeg would need to queue multiple jobs and call flush before they 
are all processed. This is not something I can do at this time. Maybe Jonas 
can help with modifying ffmpeg appropriately. However, code for case seems 
correct to me.

> 
> Regards,
> 
> 	Hans
> 
> diff --git a/drivers/media/v4l2-core/v4l2-mem2mem.c
> b/drivers/media/v4l2-core/v4l2-mem2mem.c index 2677a07e4c9b..f81a8f2465ab
> 100644
> --- a/drivers/media/v4l2-core/v4l2-mem2mem.c
> +++ b/drivers/media/v4l2-core/v4l2-mem2mem.c
> @@ -412,25 +412,24 @@ static void v4l2_m2m_cancel_job(struct v4l2_m2m_ctx
> *m2m_ctx) }
>  }
> 
> -void v4l2_m2m_job_finish(struct v4l2_m2m_dev *m2m_dev,
> -			 struct v4l2_m2m_ctx *m2m_ctx)
> +static bool _v4l2_m2m_job_finish(struct v4l2_m2m_dev *m2m_dev,
> +			  struct v4l2_m2m_ctx *m2m_ctx)
>  {
> -	unsigned long flags;
> -
> -	spin_lock_irqsave(&m2m_dev->job_spinlock, flags);
>  	if (!m2m_dev->curr_ctx || m2m_dev->curr_ctx != m2m_ctx) {
> -		spin_unlock_irqrestore(&m2m_dev->job_spinlock, flags);
>  		dprintk("Called by an instance not currently 
running\n");
> -		return;
> +		return false;
>  	}
> 
>  	list_del(&m2m_dev->curr_ctx->queue);
>  	m2m_dev->curr_ctx->job_flags &= ~(TRANS_QUEUED | TRANS_RUNNING);
>  	wake_up(&m2m_dev->curr_ctx->finished);
>  	m2m_dev->curr_ctx = NULL;
> +	return true;
> +}
> 
> -	spin_unlock_irqrestore(&m2m_dev->job_spinlock, flags);
> -
> +static void v4l2_m2m_job_next(struct v4l2_m2m_dev *m2m_dev,
> +		       struct v4l2_m2m_ctx *m2m_ctx)
> +{
>  	/* This instance might have more buffers ready, but since we do not
>  	 * allow more than one job on the job_queue per instance, each has
>  	 * to be scheduled separately after the previous one finishes. */
> @@ -441,8 +440,113 @@ void v4l2_m2m_job_finish(struct v4l2_m2m_dev *m2m_dev,
> */
>  	schedule_work(&m2m_dev->job_work);
>  }
> +
> +void v4l2_m2m_job_finish(struct v4l2_m2m_dev *m2m_dev,
> +			 struct v4l2_m2m_ctx *m2m_ctx)
> +{
> +	unsigned long flags;
> +
> +	spin_lock_irqsave(&m2m_dev->job_spinlock, flags);
> +	if (!_v4l2_m2m_job_finish(m2m_dev, m2m_ctx)) {
> +		spin_unlock_irqrestore(&m2m_dev->job_spinlock, flags);
> +		return;
> +	}
> +	spin_unlock_irqrestore(&m2m_dev->job_spinlock, flags);
> +
> +	v4l2_m2m_job_next(m2m_dev, m2m_ctx);
> +}
>  EXPORT_SYMBOL(v4l2_m2m_job_finish);
> 
> +void v4l2_m2m_job_finish_held(struct v4l2_m2m_dev *m2m_dev,
> +			 struct v4l2_m2m_ctx *m2m_ctx,
> +			 enum vb2_buffer_state state)
> +{
> +	struct vb2_v4l2_buffer *src_buf, *dst_buf;
> +	unsigned long flags;
> +
> +	spin_lock_irqsave(&m2m_dev->job_spinlock, flags);
> +	src_buf = v4l2_m2m_src_buf_remove(m2m_ctx);
> +	dst_buf = v4l2_m2m_next_dst_buf(m2m_ctx);
> +
> +	if (!src_buf || !dst_buf) {
> +		pr_err("Missing source and/or destination buffers\n");
> +		spin_unlock_irqrestore(&m2m_dev->job_spinlock, flags);
> +		return;
> +	}
> +	v4l2_m2m_buf_done(src_buf, state);
> +	if (!dst_buf->is_held) {
> +		v4l2_m2m_dst_buf_remove(m2m_ctx);
> +		v4l2_m2m_buf_done(dst_buf, state);
> +	}
> +	if (!_v4l2_m2m_job_finish(m2m_dev, m2m_ctx)) {
> +		spin_unlock_irqrestore(&m2m_dev->job_spinlock, flags);
> +		return;
> +	}
> +	spin_unlock_irqrestore(&m2m_dev->job_spinlock, flags);
> +
> +	v4l2_m2m_job_next(m2m_dev, m2m_ctx);
> +}
> +EXPORT_SYMBOL(v4l2_m2m_job_finish_held);
> +
> +/**
> + * v4l2_m2m_release_capture_buf() - check if the capture buffer should be
> + * released
> + *
> + * @out_vb: the output buffer
> + * @cap_vb: the capture buffer
> + *
> + * This helper function returns true if the current capture buffer should
> + * be released to vb2. This is the case if the output buffer specified that
> + * the capture buffer should be held (i.e. not returned to vb2) AND if the
> + * timestamp of the capture buffer differs from the output buffer
> timestamp. + *
> + * This helper is to be called at the start of the device_run callback:
> + *
> + * .. code-block:: c
> + *
> + *	if (v4l2_m2m_release_capture_buf(out_vb, cap_vb)) {
> + *		v4l2_m2m_dst_buf_remove(m2m_ctx);
> + *		v4l2_m2m_buf_done(cap_vb, VB2_BUF_STATE_DONE);
> + *		cap_vb = v4l2_m2m_next_dst_buf(m2m_ctx);
> + *	}
> + *	cap_vb->is_held = out_vb->flags & 
V4L2_BUF_FLAG_M2M_HOLD_CAPTURE_BUF;
> + *
> + *	...
> + *
> + *	v4l2_m2m_buf_done(out_vb, VB2_BUF_STATE_DONE);
> + *	if (!cap_vb->is_held) {
> + *		v4l2_m2m_dst_buf_remove(m2m_ctx);
> + *		v4l2_m2m_buf_done(cap_vb, VB2_BUF_STATE_DONE);
> + *	}
> + *
> + * This allows for multiple output buffers to be used to fill in a single
> + * capture buffer. This is typically used by stateless decoders where
> + * multiple e.g. H.264 slices contribute to a single decoded frame.
> + */
> +struct vb2_v4l2_buffer *v4l2_m2m_release_capture_buf(struct v4l2_m2m_ctx
> *m2m_ctx) +{
> +	struct v4l2_m2m_dev *m2m_dev = m2m_ctx->m2m_dev;
> +	struct vb2_v4l2_buffer *src, *dst;
> +	unsigned long flags;
> +
> +	spin_lock_irqsave(&m2m_dev->job_spinlock, flags);
> +	src = v4l2_m2m_next_src_buf(m2m_ctx);
> +	dst = v4l2_m2m_next_dst_buf(m2m_ctx);
> +
> +	if (dst->is_held && dst->vb2_buf.copied_timestamp &&
> +	    src->vb2_buf.timestamp != dst->vb2_buf.timestamp) {
> +		dst->is_held = false;
> +		v4l2_m2m_dst_buf_remove(m2m_ctx);
> +		v4l2_m2m_buf_done(dst, VB2_BUF_STATE_DONE);
> +		dst = v4l2_m2m_next_dst_buf(m2m_ctx);
> +	}
> +	dst->is_held = src->flags & V4L2_BUF_FLAG_M2M_HOLD_CAPTURE_BUF;
> +	src->flags &= ~V4L2_BUF_FLAG_M2M_HOLD_CAPTURE_BUF;
> +	spin_unlock_irqrestore(&m2m_dev->job_spinlock, flags);
> +	return dst;
> +}
> +EXPORT_SYMBOL(v4l2_m2m_release_capture_buf);
> +
>  int v4l2_m2m_reqbufs(struct file *file, struct v4l2_m2m_ctx *m2m_ctx,
>  		     struct v4l2_requestbuffers *reqbufs)
>  {
> @@ -1171,19 +1275,28 @@ int v4l2_m2m_ioctl_stateless_decoder_cmd(struct file
> *file, void *priv, {
>  	struct v4l2_fh *fh = file->private_data;
>  	struct vb2_v4l2_buffer *out_vb, *cap_vb;
> +	struct v4l2_m2m_dev *m2m_dev = fh->m2m_ctx->m2m_dev;
> +	unsigned long flags;
>  	int ret;
> 
>  	ret = v4l2_m2m_ioctl_stateless_try_decoder_cmd(file, priv, dc);
>  	if (ret < 0)
>  		return ret;
> 
> +	spin_lock_irqsave(&m2m_dev->job_spinlock, flags);
>  	out_vb = v4l2_m2m_last_src_buf(fh->m2m_ctx);
>  	cap_vb = v4l2_m2m_last_dst_buf(fh->m2m_ctx);
> 
> -	if (out_vb)
> +	if (out_vb && (out_vb->flags & V4L2_BUF_FLAG_M2M_HOLD_CAPTURE_BUF)) 
{
>  		out_vb->flags &= ~V4L2_BUF_FLAG_M2M_HOLD_CAPTURE_BUF;
> -	else if (cap_vb && cap_vb->is_held)
> -		v4l2_m2m_buf_done(cap_vb, VB2_BUF_STATE_DONE);
> +	} else if (cap_vb && cap_vb->is_held) {
> +		cap_vb->is_held = false;
> +		if (m2m_dev->curr_ctx) {

Above condition should be negated.

Best regards,
Jernej

> +			v4l2_m2m_dst_buf_remove(fh->m2m_ctx);
> +			v4l2_m2m_buf_done(cap_vb, 
VB2_BUF_STATE_DONE);
> +		}
> +	}
> +	spin_unlock_irqrestore(&m2m_dev->job_spinlock, flags);
> 
>  	return 0;
>  }
> diff --git a/drivers/staging/media/sunxi/cedrus/cedrus_dec.c
> b/drivers/staging/media/sunxi/cedrus/cedrus_dec.c index
> 67f7d4326fc1..4e30f263b427 100644
> --- a/drivers/staging/media/sunxi/cedrus/cedrus_dec.c
> +++ b/drivers/staging/media/sunxi/cedrus/cedrus_dec.c
> @@ -30,14 +30,7 @@ void cedrus_device_run(void *priv)
>  	struct media_request *src_req;
> 
>  	run.src = v4l2_m2m_next_src_buf(ctx->fh.m2m_ctx);
> -	run.dst = v4l2_m2m_next_dst_buf(ctx->fh.m2m_ctx);
> -
> -	if (v4l2_m2m_release_capture_buf(run.src, run.dst)) {
> -		v4l2_m2m_dst_buf_remove(ctx->fh.m2m_ctx);
> -		v4l2_m2m_buf_done(run.dst, VB2_BUF_STATE_DONE);
> -		run.dst = v4l2_m2m_next_dst_buf(ctx->fh.m2m_ctx);
> -	}
> -	run.dst->is_held = run.src->flags & 
V4L2_BUF_FLAG_M2M_HOLD_CAPTURE_BUF;
> +	run.dst = v4l2_m2m_release_capture_buf(ctx->fh.m2m_ctx);
> 
>  	run.first_slice = !run.dst->vb2_buf.copied_timestamp ||
>  		run.src->vb2_buf.timestamp != run.dst-
>vb2_buf.timestamp;
> diff --git a/drivers/staging/media/sunxi/cedrus/cedrus_hw.c
> b/drivers/staging/media/sunxi/cedrus/cedrus_hw.c index
> 99fedec80224..242cad82cc8c 100644
> --- a/drivers/staging/media/sunxi/cedrus/cedrus_hw.c
> +++ b/drivers/staging/media/sunxi/cedrus/cedrus_hw.c
> @@ -103,7 +103,6 @@ static irqreturn_t cedrus_irq(int irq, void *data)
>  {
>  	struct cedrus_dev *dev = data;
>  	struct cedrus_ctx *ctx;
> -	struct vb2_v4l2_buffer *src_buf, *dst_buf;
>  	enum vb2_buffer_state state;
>  	enum cedrus_irq_status status;
> 
> @@ -121,26 +120,12 @@ static irqreturn_t cedrus_irq(int irq, void *data)
>  	dev->dec_ops[ctx->current_codec]->irq_disable(ctx);
>  	dev->dec_ops[ctx->current_codec]->irq_clear(ctx);
> 
> -	src_buf = v4l2_m2m_src_buf_remove(ctx->fh.m2m_ctx);
> -	dst_buf = v4l2_m2m_next_dst_buf(ctx->fh.m2m_ctx);
> -
> -	if (!src_buf || !dst_buf) {
> -		v4l2_err(&dev->v4l2_dev,
> -			 "Missing source and/or destination 
buffers\n");
> -		return IRQ_HANDLED;
> -	}
> -
>  	if (status == CEDRUS_IRQ_ERROR)
>  		state = VB2_BUF_STATE_ERROR;
>  	else
>  		state = VB2_BUF_STATE_DONE;
> 
> -	v4l2_m2m_buf_done(src_buf, state);
> -	if (!dst_buf->is_held) {
> -		v4l2_m2m_dst_buf_remove(ctx->fh.m2m_ctx);
> -		v4l2_m2m_buf_done(dst_buf, state);
> -	}
> -	v4l2_m2m_job_finish(ctx->dev->m2m_dev, ctx->fh.m2m_ctx);
> +	v4l2_m2m_job_finish_held(ctx->dev->m2m_dev, ctx->fh.m2m_ctx, state);
> 
>  	return IRQ_HANDLED;
>  }
> diff --git a/include/media/v4l2-mem2mem.h b/include/media/v4l2-mem2mem.h
> index 8ae2f56c7fa3..48ca7d3eaa3d 100644
> --- a/include/media/v4l2-mem2mem.h
> +++ b/include/media/v4l2-mem2mem.h
> @@ -173,6 +173,10 @@ void v4l2_m2m_try_schedule(struct v4l2_m2m_ctx
> *m2m_ctx); void v4l2_m2m_job_finish(struct v4l2_m2m_dev *m2m_dev,
>  			 struct v4l2_m2m_ctx *m2m_ctx);
> 
> +void v4l2_m2m_job_finish_held(struct v4l2_m2m_dev *m2m_dev,
> +			 struct v4l2_m2m_ctx *m2m_ctx,
> +			 enum vb2_buffer_state state);
> +
>  static inline void
>  v4l2_m2m_buf_done(struct vb2_v4l2_buffer *buf, enum vb2_buffer_state state)
> {
> @@ -644,47 +648,7 @@ void v4l2_m2m_buf_copy_metadata(const struct
> vb2_v4l2_buffer *out_vb, struct vb2_v4l2_buffer *cap_vb,
>  				bool copy_frame_flags);
> 
> -/**
> - * v4l2_m2m_release_capture_buf() - check if the capture buffer should be
> - * released
> - *
> - * @out_vb: the output buffer
> - * @cap_vb: the capture buffer
> - *
> - * This helper function returns true if the current capture buffer should
> - * be released to vb2. This is the case if the output buffer specified that
> - * the capture buffer should be held (i.e. not returned to vb2) AND if the
> - * timestamp of the capture buffer differs from the output buffer
> timestamp. - *
> - * This helper is to be called at the start of the device_run callback:
> - *
> - * .. code-block:: c
> - *
> - *	if (v4l2_m2m_release_capture_buf(out_vb, cap_vb)) {
> - *		v4l2_m2m_dst_buf_remove(m2m_ctx);
> - *		v4l2_m2m_buf_done(cap_vb, VB2_BUF_STATE_DONE);
> - *		cap_vb = v4l2_m2m_next_dst_buf(m2m_ctx);
> - *	}
> - *	cap_vb->is_held = out_vb->flags & 
V4L2_BUF_FLAG_M2M_HOLD_CAPTURE_BUF;
> - *
> - *	...
> - *
> - *	v4l2_m2m_buf_done(out_vb, VB2_BUF_STATE_DONE);
> - *	if (!cap_vb->is_held) {
> - *		v4l2_m2m_dst_buf_remove(m2m_ctx);
> - *		v4l2_m2m_buf_done(cap_vb, VB2_BUF_STATE_DONE);
> - *	}
> - *
> - * This allows for multiple output buffers to be used to fill in a single
> - * capture buffer. This is typically used by stateless decoders where
> - * multiple e.g. H.264 slices contribute to a single decoded frame.
> - */
> -static inline bool v4l2_m2m_release_capture_buf(const struct
> vb2_v4l2_buffer *out_vb, -					
	const struct vb2_v4l2_buffer *cap_vb)
> -{
> -	return cap_vb->is_held && cap_vb->vb2_buf.copied_timestamp &&
> -	       out_vb->vb2_buf.timestamp != cap_vb->vb2_buf.timestamp;
> -}
> +struct vb2_v4l2_buffer *v4l2_m2m_release_capture_buf(struct v4l2_m2m_ctx
> *m2m_ctx);
> 
>  /* v4l2 request helper */
Hans Verkuil Oct. 9, 2019, 10:18 a.m. UTC | #8
On 10/7/19 9:01 PM, Jernej Škrabec wrote:
> Dne ponedeljek, 07. oktober 2019 ob 12:44:24 CEST je Hans Verkuil napisal(a):
>> Hi Jernej,
>>
>> On 9/29/19 10:00 PM, Jernej Skrabec wrote:
>>> This series adds support for decoding multi-slice H264 frames along with
>>> support for V4L2_DEC_CMD_FLUSH and V4L2_BUF_FLAG_M2M_HOLD_CAPTURE_BUF.
>>>
>>> Code was tested by modified ffmpeg, which can be found here:
>>> https://github.com/jernejsk/FFmpeg, branch mainline-test
>>> It has to be configured with at least following options:
>>> --enable-v4l2-request --enable-libudev --enable-libdrm
>>>
>>> Samples used for testing:
>>> http://jernej.libreelec.tv/videos/h264/BA1_FT_C.mp4
>>> http://jernej.libreelec.tv/videos/h264/h264.mp4
>>>
>>> Command line used for testing:
>>> ffmpeg -hwaccel drm -hwaccel_device /dev/dri/card0 -i h264.mp4 -pix_fmt
>>> bgra -f fbdev /dev/fb0
>>>
>>> Please note that V4L2_DEC_CMD_FLUSH was not tested because I'm
>>> not sure how. ffmpeg follows exactly which slice is last in frame
>>> and sets hold flag accordingly. Improper usage of hold flag would
>>> corrupt ffmpeg assumptions and it would probably crash. Any ideas
>>> how to test this are welcome!
>>>
>>> Thanks to Jonas for adjusting ffmpeg.
>>>
>>> Please let me know what you think.
>>>
>>> Best regards,
>>> Jernej
>>>
>>> Changes from v1:
>>> - added Rb tags
>>> - updated V4L2_DEC_CMD_FLUSH documentation
>>> - updated first slice detection in Cedrus
>>> - hold capture buffer flag is set according to source format
>>> - added v4l m2m stateless_(try_)decoder_cmd ioctl helpers
>>>
>>> Hans Verkuil (2):
>>>   vb2: add V4L2_BUF_FLAG_M2M_HOLD_CAPTURE_BUF
>>>   videodev2.h: add V4L2_DEC_CMD_FLUSH
>>>
>>> Jernej Skrabec (4):
>>>   media: v4l2-mem2mem: add stateless_(try_)decoder_cmd ioctl helpers
>>>   media: cedrus: Detect first slice of a frame
>>>   media: cedrus: h264: Support multiple slices per frame
>>>   media: cedrus: Add support for holding capture buffer
>>>  
>>>  Documentation/media/uapi/v4l/buffer.rst       | 13 ++++++
>>>  .../media/uapi/v4l/vidioc-decoder-cmd.rst     | 10 +++-
>>>  .../media/uapi/v4l/vidioc-reqbufs.rst         |  6 +++
>>>  .../media/videodev2.h.rst.exceptions          |  1 +
>>>  .../media/common/videobuf2/videobuf2-v4l2.c   |  8 +++-
>>>  drivers/media/v4l2-core/v4l2-mem2mem.c        | 35 ++++++++++++++
>>>  drivers/staging/media/sunxi/cedrus/cedrus.h   |  1 +
>>>  .../staging/media/sunxi/cedrus/cedrus_dec.c   | 11 +++++
>>>  .../staging/media/sunxi/cedrus/cedrus_h264.c  | 11 ++++-
>>>  .../staging/media/sunxi/cedrus/cedrus_hw.c    |  8 ++--
>>>  .../staging/media/sunxi/cedrus/cedrus_video.c | 14 ++++++
>>>  include/media/v4l2-mem2mem.h                  | 46 +++++++++++++++++++
>>>  include/media/videobuf2-core.h                |  3 ++
>>>  include/media/videobuf2-v4l2.h                |  5 ++
>>>  include/uapi/linux/videodev2.h                | 14 ++++--
>>>  15 files changed, 175 insertions(+), 11 deletions(-)
>>
>> I didn't want to make a v3 of this series, instead I hacked this patch that
>> will hopefully do all the locking right.
>>
>> Basically I moved all the 'held' related code into v4l2-mem2mem under
>> job_spinlock. This simplifies the driver code as well.
>>
>> But this is a hack that sits on top of this series. If your ffmpeg tests are
>> now successful, then I'll turn this into a proper series with correct
>> documentation (a lot of the comments are now wrong with this patch, so just
>> ignore that).
> 
> Thanks for looking into this! With small fix mentioned below, it works! Note 
> that both scenarios I tested (flushing during decoding and flushing after 
> decoding is finished) are focused on capture queue. In order to trigger output 
> queue flush, ffmpeg would need to queue multiple jobs and call flush before they 
> are all processed. This is not something I can do at this time. Maybe Jonas 
> can help with modifying ffmpeg appropriately. However, code for case seems 
> correct to me.
> 
>>
>> Regards,
>>
>> 	Hans
>>
>> diff --git a/drivers/media/v4l2-core/v4l2-mem2mem.c
>> b/drivers/media/v4l2-core/v4l2-mem2mem.c index 2677a07e4c9b..f81a8f2465ab
>> 100644
>> --- a/drivers/media/v4l2-core/v4l2-mem2mem.c
>> +++ b/drivers/media/v4l2-core/v4l2-mem2mem.c
>> @@ -412,25 +412,24 @@ static void v4l2_m2m_cancel_job(struct v4l2_m2m_ctx
>> *m2m_ctx) }
>>  }
>>
>> -void v4l2_m2m_job_finish(struct v4l2_m2m_dev *m2m_dev,
>> -			 struct v4l2_m2m_ctx *m2m_ctx)
>> +static bool _v4l2_m2m_job_finish(struct v4l2_m2m_dev *m2m_dev,
>> +			  struct v4l2_m2m_ctx *m2m_ctx)
>>  {
>> -	unsigned long flags;
>> -
>> -	spin_lock_irqsave(&m2m_dev->job_spinlock, flags);
>>  	if (!m2m_dev->curr_ctx || m2m_dev->curr_ctx != m2m_ctx) {
>> -		spin_unlock_irqrestore(&m2m_dev->job_spinlock, flags);
>>  		dprintk("Called by an instance not currently 
> running\n");
>> -		return;
>> +		return false;
>>  	}
>>
>>  	list_del(&m2m_dev->curr_ctx->queue);
>>  	m2m_dev->curr_ctx->job_flags &= ~(TRANS_QUEUED | TRANS_RUNNING);
>>  	wake_up(&m2m_dev->curr_ctx->finished);
>>  	m2m_dev->curr_ctx = NULL;
>> +	return true;
>> +}
>>
>> -	spin_unlock_irqrestore(&m2m_dev->job_spinlock, flags);
>> -
>> +static void v4l2_m2m_job_next(struct v4l2_m2m_dev *m2m_dev,
>> +		       struct v4l2_m2m_ctx *m2m_ctx)
>> +{
>>  	/* This instance might have more buffers ready, but since we do not
>>  	 * allow more than one job on the job_queue per instance, each has
>>  	 * to be scheduled separately after the previous one finishes. */
>> @@ -441,8 +440,113 @@ void v4l2_m2m_job_finish(struct v4l2_m2m_dev *m2m_dev,
>> */
>>  	schedule_work(&m2m_dev->job_work);
>>  }
>> +
>> +void v4l2_m2m_job_finish(struct v4l2_m2m_dev *m2m_dev,
>> +			 struct v4l2_m2m_ctx *m2m_ctx)
>> +{
>> +	unsigned long flags;
>> +
>> +	spin_lock_irqsave(&m2m_dev->job_spinlock, flags);
>> +	if (!_v4l2_m2m_job_finish(m2m_dev, m2m_ctx)) {
>> +		spin_unlock_irqrestore(&m2m_dev->job_spinlock, flags);
>> +		return;
>> +	}
>> +	spin_unlock_irqrestore(&m2m_dev->job_spinlock, flags);
>> +
>> +	v4l2_m2m_job_next(m2m_dev, m2m_ctx);
>> +}
>>  EXPORT_SYMBOL(v4l2_m2m_job_finish);
>>
>> +void v4l2_m2m_job_finish_held(struct v4l2_m2m_dev *m2m_dev,
>> +			 struct v4l2_m2m_ctx *m2m_ctx,
>> +			 enum vb2_buffer_state state)
>> +{
>> +	struct vb2_v4l2_buffer *src_buf, *dst_buf;
>> +	unsigned long flags;
>> +
>> +	spin_lock_irqsave(&m2m_dev->job_spinlock, flags);
>> +	src_buf = v4l2_m2m_src_buf_remove(m2m_ctx);
>> +	dst_buf = v4l2_m2m_next_dst_buf(m2m_ctx);
>> +
>> +	if (!src_buf || !dst_buf) {
>> +		pr_err("Missing source and/or destination buffers\n");
>> +		spin_unlock_irqrestore(&m2m_dev->job_spinlock, flags);
>> +		return;
>> +	}
>> +	v4l2_m2m_buf_done(src_buf, state);
>> +	if (!dst_buf->is_held) {
>> +		v4l2_m2m_dst_buf_remove(m2m_ctx);
>> +		v4l2_m2m_buf_done(dst_buf, state);
>> +	}
>> +	if (!_v4l2_m2m_job_finish(m2m_dev, m2m_ctx)) {
>> +		spin_unlock_irqrestore(&m2m_dev->job_spinlock, flags);
>> +		return;
>> +	}
>> +	spin_unlock_irqrestore(&m2m_dev->job_spinlock, flags);
>> +
>> +	v4l2_m2m_job_next(m2m_dev, m2m_ctx);
>> +}
>> +EXPORT_SYMBOL(v4l2_m2m_job_finish_held);
>> +
>> +/**
>> + * v4l2_m2m_release_capture_buf() - check if the capture buffer should be
>> + * released
>> + *
>> + * @out_vb: the output buffer
>> + * @cap_vb: the capture buffer
>> + *
>> + * This helper function returns true if the current capture buffer should
>> + * be released to vb2. This is the case if the output buffer specified that
>> + * the capture buffer should be held (i.e. not returned to vb2) AND if the
>> + * timestamp of the capture buffer differs from the output buffer
>> timestamp. + *
>> + * This helper is to be called at the start of the device_run callback:
>> + *
>> + * .. code-block:: c
>> + *
>> + *	if (v4l2_m2m_release_capture_buf(out_vb, cap_vb)) {
>> + *		v4l2_m2m_dst_buf_remove(m2m_ctx);
>> + *		v4l2_m2m_buf_done(cap_vb, VB2_BUF_STATE_DONE);
>> + *		cap_vb = v4l2_m2m_next_dst_buf(m2m_ctx);
>> + *	}
>> + *	cap_vb->is_held = out_vb->flags & 
> V4L2_BUF_FLAG_M2M_HOLD_CAPTURE_BUF;
>> + *
>> + *	...
>> + *
>> + *	v4l2_m2m_buf_done(out_vb, VB2_BUF_STATE_DONE);
>> + *	if (!cap_vb->is_held) {
>> + *		v4l2_m2m_dst_buf_remove(m2m_ctx);
>> + *		v4l2_m2m_buf_done(cap_vb, VB2_BUF_STATE_DONE);
>> + *	}
>> + *
>> + * This allows for multiple output buffers to be used to fill in a single
>> + * capture buffer. This is typically used by stateless decoders where
>> + * multiple e.g. H.264 slices contribute to a single decoded frame.
>> + */
>> +struct vb2_v4l2_buffer *v4l2_m2m_release_capture_buf(struct v4l2_m2m_ctx
>> *m2m_ctx) +{
>> +	struct v4l2_m2m_dev *m2m_dev = m2m_ctx->m2m_dev;
>> +	struct vb2_v4l2_buffer *src, *dst;
>> +	unsigned long flags;
>> +
>> +	spin_lock_irqsave(&m2m_dev->job_spinlock, flags);
>> +	src = v4l2_m2m_next_src_buf(m2m_ctx);
>> +	dst = v4l2_m2m_next_dst_buf(m2m_ctx);
>> +
>> +	if (dst->is_held && dst->vb2_buf.copied_timestamp &&
>> +	    src->vb2_buf.timestamp != dst->vb2_buf.timestamp) {
>> +		dst->is_held = false;
>> +		v4l2_m2m_dst_buf_remove(m2m_ctx);
>> +		v4l2_m2m_buf_done(dst, VB2_BUF_STATE_DONE);
>> +		dst = v4l2_m2m_next_dst_buf(m2m_ctx);
>> +	}
>> +	dst->is_held = src->flags & V4L2_BUF_FLAG_M2M_HOLD_CAPTURE_BUF;
>> +	src->flags &= ~V4L2_BUF_FLAG_M2M_HOLD_CAPTURE_BUF;
>> +	spin_unlock_irqrestore(&m2m_dev->job_spinlock, flags);
>> +	return dst;
>> +}
>> +EXPORT_SYMBOL(v4l2_m2m_release_capture_buf);
>> +
>>  int v4l2_m2m_reqbufs(struct file *file, struct v4l2_m2m_ctx *m2m_ctx,
>>  		     struct v4l2_requestbuffers *reqbufs)
>>  {
>> @@ -1171,19 +1275,28 @@ int v4l2_m2m_ioctl_stateless_decoder_cmd(struct file
>> *file, void *priv, {
>>  	struct v4l2_fh *fh = file->private_data;
>>  	struct vb2_v4l2_buffer *out_vb, *cap_vb;
>> +	struct v4l2_m2m_dev *m2m_dev = fh->m2m_ctx->m2m_dev;
>> +	unsigned long flags;
>>  	int ret;
>>
>>  	ret = v4l2_m2m_ioctl_stateless_try_decoder_cmd(file, priv, dc);
>>  	if (ret < 0)
>>  		return ret;
>>
>> +	spin_lock_irqsave(&m2m_dev->job_spinlock, flags);
>>  	out_vb = v4l2_m2m_last_src_buf(fh->m2m_ctx);
>>  	cap_vb = v4l2_m2m_last_dst_buf(fh->m2m_ctx);
>>
>> -	if (out_vb)
>> +	if (out_vb && (out_vb->flags & V4L2_BUF_FLAG_M2M_HOLD_CAPTURE_BUF)) 
> {
>>  		out_vb->flags &= ~V4L2_BUF_FLAG_M2M_HOLD_CAPTURE_BUF;
>> -	else if (cap_vb && cap_vb->is_held)
>> -		v4l2_m2m_buf_done(cap_vb, VB2_BUF_STATE_DONE);
>> +	} else if (cap_vb && cap_vb->is_held) {
>> +		cap_vb->is_held = false;
>> +		if (m2m_dev->curr_ctx) {
> 
> Above condition should be negated.

Close. It should check that this buffer isn't currently being processed.
So:

		if (m2m_dev->curr_ctx != fh->m2m_ctx) {

Can you test with this change? If this works, then I'll post a proper
series for this.

Thanks!

	Hans

> 
> Best regards,
> Jernej
> 
>> +			v4l2_m2m_dst_buf_remove(fh->m2m_ctx);
>> +			v4l2_m2m_buf_done(cap_vb, 
> VB2_BUF_STATE_DONE);
>> +		}
>> +	}
>> +	spin_unlock_irqrestore(&m2m_dev->job_spinlock, flags);
>>
>>  	return 0;
>>  }
>> diff --git a/drivers/staging/media/sunxi/cedrus/cedrus_dec.c
>> b/drivers/staging/media/sunxi/cedrus/cedrus_dec.c index
>> 67f7d4326fc1..4e30f263b427 100644
>> --- a/drivers/staging/media/sunxi/cedrus/cedrus_dec.c
>> +++ b/drivers/staging/media/sunxi/cedrus/cedrus_dec.c
>> @@ -30,14 +30,7 @@ void cedrus_device_run(void *priv)
>>  	struct media_request *src_req;
>>
>>  	run.src = v4l2_m2m_next_src_buf(ctx->fh.m2m_ctx);
>> -	run.dst = v4l2_m2m_next_dst_buf(ctx->fh.m2m_ctx);
>> -
>> -	if (v4l2_m2m_release_capture_buf(run.src, run.dst)) {
>> -		v4l2_m2m_dst_buf_remove(ctx->fh.m2m_ctx);
>> -		v4l2_m2m_buf_done(run.dst, VB2_BUF_STATE_DONE);
>> -		run.dst = v4l2_m2m_next_dst_buf(ctx->fh.m2m_ctx);
>> -	}
>> -	run.dst->is_held = run.src->flags & 
> V4L2_BUF_FLAG_M2M_HOLD_CAPTURE_BUF;
>> +	run.dst = v4l2_m2m_release_capture_buf(ctx->fh.m2m_ctx);
>>
>>  	run.first_slice = !run.dst->vb2_buf.copied_timestamp ||
>>  		run.src->vb2_buf.timestamp != run.dst-
>> vb2_buf.timestamp;
>> diff --git a/drivers/staging/media/sunxi/cedrus/cedrus_hw.c
>> b/drivers/staging/media/sunxi/cedrus/cedrus_hw.c index
>> 99fedec80224..242cad82cc8c 100644
>> --- a/drivers/staging/media/sunxi/cedrus/cedrus_hw.c
>> +++ b/drivers/staging/media/sunxi/cedrus/cedrus_hw.c
>> @@ -103,7 +103,6 @@ static irqreturn_t cedrus_irq(int irq, void *data)
>>  {
>>  	struct cedrus_dev *dev = data;
>>  	struct cedrus_ctx *ctx;
>> -	struct vb2_v4l2_buffer *src_buf, *dst_buf;
>>  	enum vb2_buffer_state state;
>>  	enum cedrus_irq_status status;
>>
>> @@ -121,26 +120,12 @@ static irqreturn_t cedrus_irq(int irq, void *data)
>>  	dev->dec_ops[ctx->current_codec]->irq_disable(ctx);
>>  	dev->dec_ops[ctx->current_codec]->irq_clear(ctx);
>>
>> -	src_buf = v4l2_m2m_src_buf_remove(ctx->fh.m2m_ctx);
>> -	dst_buf = v4l2_m2m_next_dst_buf(ctx->fh.m2m_ctx);
>> -
>> -	if (!src_buf || !dst_buf) {
>> -		v4l2_err(&dev->v4l2_dev,
>> -			 "Missing source and/or destination 
> buffers\n");
>> -		return IRQ_HANDLED;
>> -	}
>> -
>>  	if (status == CEDRUS_IRQ_ERROR)
>>  		state = VB2_BUF_STATE_ERROR;
>>  	else
>>  		state = VB2_BUF_STATE_DONE;
>>
>> -	v4l2_m2m_buf_done(src_buf, state);
>> -	if (!dst_buf->is_held) {
>> -		v4l2_m2m_dst_buf_remove(ctx->fh.m2m_ctx);
>> -		v4l2_m2m_buf_done(dst_buf, state);
>> -	}
>> -	v4l2_m2m_job_finish(ctx->dev->m2m_dev, ctx->fh.m2m_ctx);
>> +	v4l2_m2m_job_finish_held(ctx->dev->m2m_dev, ctx->fh.m2m_ctx, state);
>>
>>  	return IRQ_HANDLED;
>>  }
>> diff --git a/include/media/v4l2-mem2mem.h b/include/media/v4l2-mem2mem.h
>> index 8ae2f56c7fa3..48ca7d3eaa3d 100644
>> --- a/include/media/v4l2-mem2mem.h
>> +++ b/include/media/v4l2-mem2mem.h
>> @@ -173,6 +173,10 @@ void v4l2_m2m_try_schedule(struct v4l2_m2m_ctx
>> *m2m_ctx); void v4l2_m2m_job_finish(struct v4l2_m2m_dev *m2m_dev,
>>  			 struct v4l2_m2m_ctx *m2m_ctx);
>>
>> +void v4l2_m2m_job_finish_held(struct v4l2_m2m_dev *m2m_dev,
>> +			 struct v4l2_m2m_ctx *m2m_ctx,
>> +			 enum vb2_buffer_state state);
>> +
>>  static inline void
>>  v4l2_m2m_buf_done(struct vb2_v4l2_buffer *buf, enum vb2_buffer_state state)
>> {
>> @@ -644,47 +648,7 @@ void v4l2_m2m_buf_copy_metadata(const struct
>> vb2_v4l2_buffer *out_vb, struct vb2_v4l2_buffer *cap_vb,
>>  				bool copy_frame_flags);
>>
>> -/**
>> - * v4l2_m2m_release_capture_buf() - check if the capture buffer should be
>> - * released
>> - *
>> - * @out_vb: the output buffer
>> - * @cap_vb: the capture buffer
>> - *
>> - * This helper function returns true if the current capture buffer should
>> - * be released to vb2. This is the case if the output buffer specified that
>> - * the capture buffer should be held (i.e. not returned to vb2) AND if the
>> - * timestamp of the capture buffer differs from the output buffer
>> timestamp. - *
>> - * This helper is to be called at the start of the device_run callback:
>> - *
>> - * .. code-block:: c
>> - *
>> - *	if (v4l2_m2m_release_capture_buf(out_vb, cap_vb)) {
>> - *		v4l2_m2m_dst_buf_remove(m2m_ctx);
>> - *		v4l2_m2m_buf_done(cap_vb, VB2_BUF_STATE_DONE);
>> - *		cap_vb = v4l2_m2m_next_dst_buf(m2m_ctx);
>> - *	}
>> - *	cap_vb->is_held = out_vb->flags & 
> V4L2_BUF_FLAG_M2M_HOLD_CAPTURE_BUF;
>> - *
>> - *	...
>> - *
>> - *	v4l2_m2m_buf_done(out_vb, VB2_BUF_STATE_DONE);
>> - *	if (!cap_vb->is_held) {
>> - *		v4l2_m2m_dst_buf_remove(m2m_ctx);
>> - *		v4l2_m2m_buf_done(cap_vb, VB2_BUF_STATE_DONE);
>> - *	}
>> - *
>> - * This allows for multiple output buffers to be used to fill in a single
>> - * capture buffer. This is typically used by stateless decoders where
>> - * multiple e.g. H.264 slices contribute to a single decoded frame.
>> - */
>> -static inline bool v4l2_m2m_release_capture_buf(const struct
>> vb2_v4l2_buffer *out_vb, -					
> 	const struct vb2_v4l2_buffer *cap_vb)
>> -{
>> -	return cap_vb->is_held && cap_vb->vb2_buf.copied_timestamp &&
>> -	       out_vb->vb2_buf.timestamp != cap_vb->vb2_buf.timestamp;
>> -}
>> +struct vb2_v4l2_buffer *v4l2_m2m_release_capture_buf(struct v4l2_m2m_ctx
>> *m2m_ctx);
>>
>>  /* v4l2 request helper */
> 
> 
> 
>
Jernej Škrabec Oct. 9, 2019, 2:44 p.m. UTC | #9
Dne sreda, 09. oktober 2019 ob 12:18:45 CEST je Hans Verkuil napisal(a):
> On 10/7/19 9:01 PM, Jernej Škrabec wrote:
> > Dne ponedeljek, 07. oktober 2019 ob 12:44:24 CEST je Hans Verkuil 
napisal(a):
> >> Hi Jernej,
> >> 
> >> On 9/29/19 10:00 PM, Jernej Skrabec wrote:
> >>> This series adds support for decoding multi-slice H264 frames along with
> >>> support for V4L2_DEC_CMD_FLUSH and V4L2_BUF_FLAG_M2M_HOLD_CAPTURE_BUF.
> >>> 
> >>> Code was tested by modified ffmpeg, which can be found here:
> >>> https://github.com/jernejsk/FFmpeg, branch mainline-test
> >>> It has to be configured with at least following options:
> >>> --enable-v4l2-request --enable-libudev --enable-libdrm
> >>> 
> >>> Samples used for testing:
> >>> http://jernej.libreelec.tv/videos/h264/BA1_FT_C.mp4
> >>> http://jernej.libreelec.tv/videos/h264/h264.mp4
> >>> 
> >>> Command line used for testing:
> >>> ffmpeg -hwaccel drm -hwaccel_device /dev/dri/card0 -i h264.mp4 -pix_fmt
> >>> bgra -f fbdev /dev/fb0
> >>> 
> >>> Please note that V4L2_DEC_CMD_FLUSH was not tested because I'm
> >>> not sure how. ffmpeg follows exactly which slice is last in frame
> >>> and sets hold flag accordingly. Improper usage of hold flag would
> >>> corrupt ffmpeg assumptions and it would probably crash. Any ideas
> >>> how to test this are welcome!
> >>> 
> >>> Thanks to Jonas for adjusting ffmpeg.
> >>> 
> >>> Please let me know what you think.
> >>> 
> >>> Best regards,
> >>> Jernej
> >>> 
> >>> Changes from v1:
> >>> - added Rb tags
> >>> - updated V4L2_DEC_CMD_FLUSH documentation
> >>> - updated first slice detection in Cedrus
> >>> - hold capture buffer flag is set according to source format
> >>> - added v4l m2m stateless_(try_)decoder_cmd ioctl helpers
> >>> 
> >>> Hans Verkuil (2):
> >>>   vb2: add V4L2_BUF_FLAG_M2M_HOLD_CAPTURE_BUF
> >>>   videodev2.h: add V4L2_DEC_CMD_FLUSH
> >>> 
> >>> Jernej Skrabec (4):
> >>>   media: v4l2-mem2mem: add stateless_(try_)decoder_cmd ioctl helpers
> >>>   media: cedrus: Detect first slice of a frame
> >>>   media: cedrus: h264: Support multiple slices per frame
> >>>   media: cedrus: Add support for holding capture buffer
> >>>  
> >>>  Documentation/media/uapi/v4l/buffer.rst       | 13 ++++++
> >>>  .../media/uapi/v4l/vidioc-decoder-cmd.rst     | 10 +++-
> >>>  .../media/uapi/v4l/vidioc-reqbufs.rst         |  6 +++
> >>>  .../media/videodev2.h.rst.exceptions          |  1 +
> >>>  .../media/common/videobuf2/videobuf2-v4l2.c   |  8 +++-
> >>>  drivers/media/v4l2-core/v4l2-mem2mem.c        | 35 ++++++++++++++
> >>>  drivers/staging/media/sunxi/cedrus/cedrus.h   |  1 +
> >>>  .../staging/media/sunxi/cedrus/cedrus_dec.c   | 11 +++++
> >>>  .../staging/media/sunxi/cedrus/cedrus_h264.c  | 11 ++++-
> >>>  .../staging/media/sunxi/cedrus/cedrus_hw.c    |  8 ++--
> >>>  .../staging/media/sunxi/cedrus/cedrus_video.c | 14 ++++++
> >>>  include/media/v4l2-mem2mem.h                  | 46 +++++++++++++++++++
> >>>  include/media/videobuf2-core.h                |  3 ++
> >>>  include/media/videobuf2-v4l2.h                |  5 ++
> >>>  include/uapi/linux/videodev2.h                | 14 ++++--
> >>>  15 files changed, 175 insertions(+), 11 deletions(-)
> >> 
> >> I didn't want to make a v3 of this series, instead I hacked this patch
> >> that
> >> will hopefully do all the locking right.
> >> 
> >> Basically I moved all the 'held' related code into v4l2-mem2mem under
> >> job_spinlock. This simplifies the driver code as well.
> >> 
> >> But this is a hack that sits on top of this series. If your ffmpeg tests
> >> are now successful, then I'll turn this into a proper series with
> >> correct documentation (a lot of the comments are now wrong with this
> >> patch, so just ignore that).
> > 
> > Thanks for looking into this! With small fix mentioned below, it works!
> > Note that both scenarios I tested (flushing during decoding and flushing
> > after decoding is finished) are focused on capture queue. In order to
> > trigger output queue flush, ffmpeg would need to queue multiple jobs and
> > call flush before they are all processed. This is not something I can do
> > at this time. Maybe Jonas can help with modifying ffmpeg appropriately.
> > However, code for case seems correct to me.
> > 
> >> Regards,
> >> 
> >> 	Hans
> >> 
> >> diff --git a/drivers/media/v4l2-core/v4l2-mem2mem.c
> >> b/drivers/media/v4l2-core/v4l2-mem2mem.c index 2677a07e4c9b..f81a8f2465ab
> >> 100644
> >> --- a/drivers/media/v4l2-core/v4l2-mem2mem.c
> >> +++ b/drivers/media/v4l2-core/v4l2-mem2mem.c
> >> @@ -412,25 +412,24 @@ static void v4l2_m2m_cancel_job(struct v4l2_m2m_ctx
> >> *m2m_ctx) }
> >> 
> >>  }
> >> 
> >> -void v4l2_m2m_job_finish(struct v4l2_m2m_dev *m2m_dev,
> >> -			 struct v4l2_m2m_ctx *m2m_ctx)
> >> +static bool _v4l2_m2m_job_finish(struct v4l2_m2m_dev *m2m_dev,
> >> +			  struct v4l2_m2m_ctx *m2m_ctx)
> >> 
> >>  {
> >> 
> >> -	unsigned long flags;
> >> -
> >> -	spin_lock_irqsave(&m2m_dev->job_spinlock, flags);
> >> 
> >>  	if (!m2m_dev->curr_ctx || m2m_dev->curr_ctx != m2m_ctx) {
> >> 
> >> -		spin_unlock_irqrestore(&m2m_dev->job_spinlock, flags);
> >> 
> >>  		dprintk("Called by an instance not currently
> > 
> > running\n");
> > 
> >> -		return;
> >> +		return false;
> >> 
> >>  	}
> >>  	
> >>  	list_del(&m2m_dev->curr_ctx->queue);
> >>  	m2m_dev->curr_ctx->job_flags &= ~(TRANS_QUEUED | TRANS_RUNNING);
> >>  	wake_up(&m2m_dev->curr_ctx->finished);
> >>  	m2m_dev->curr_ctx = NULL;
> >> 
> >> +	return true;
> >> +}
> >> 
> >> -	spin_unlock_irqrestore(&m2m_dev->job_spinlock, flags);
> >> -
> >> +static void v4l2_m2m_job_next(struct v4l2_m2m_dev *m2m_dev,
> >> +		       struct v4l2_m2m_ctx *m2m_ctx)
> >> +{
> >> 
> >>  	/* This instance might have more buffers ready, but since we do not
> >>  	
> >>  	 * allow more than one job on the job_queue per instance, each has
> >>  	 * to be scheduled separately after the previous one finishes. */
> >> 
> >> @@ -441,8 +440,113 @@ void v4l2_m2m_job_finish(struct v4l2_m2m_dev
> >> *m2m_dev, */
> >> 
> >>  	schedule_work(&m2m_dev->job_work);
> >>  
> >>  }
> >> 
> >> +
> >> +void v4l2_m2m_job_finish(struct v4l2_m2m_dev *m2m_dev,
> >> +			 struct v4l2_m2m_ctx *m2m_ctx)
> >> +{
> >> +	unsigned long flags;
> >> +
> >> +	spin_lock_irqsave(&m2m_dev->job_spinlock, flags);
> >> +	if (!_v4l2_m2m_job_finish(m2m_dev, m2m_ctx)) {
> >> +		spin_unlock_irqrestore(&m2m_dev->job_spinlock, flags);
> >> +		return;
> >> +	}
> >> +	spin_unlock_irqrestore(&m2m_dev->job_spinlock, flags);
> >> +
> >> +	v4l2_m2m_job_next(m2m_dev, m2m_ctx);
> >> +}
> >> 
> >>  EXPORT_SYMBOL(v4l2_m2m_job_finish);
> >> 
> >> +void v4l2_m2m_job_finish_held(struct v4l2_m2m_dev *m2m_dev,
> >> +			 struct v4l2_m2m_ctx *m2m_ctx,
> >> +			 enum vb2_buffer_state state)
> >> +{
> >> +	struct vb2_v4l2_buffer *src_buf, *dst_buf;
> >> +	unsigned long flags;
> >> +
> >> +	spin_lock_irqsave(&m2m_dev->job_spinlock, flags);
> >> +	src_buf = v4l2_m2m_src_buf_remove(m2m_ctx);
> >> +	dst_buf = v4l2_m2m_next_dst_buf(m2m_ctx);
> >> +
> >> +	if (!src_buf || !dst_buf) {
> >> +		pr_err("Missing source and/or destination buffers\n");
> >> +		spin_unlock_irqrestore(&m2m_dev->job_spinlock, flags);
> >> +		return;
> >> +	}
> >> +	v4l2_m2m_buf_done(src_buf, state);
> >> +	if (!dst_buf->is_held) {
> >> +		v4l2_m2m_dst_buf_remove(m2m_ctx);
> >> +		v4l2_m2m_buf_done(dst_buf, state);
> >> +	}
> >> +	if (!_v4l2_m2m_job_finish(m2m_dev, m2m_ctx)) {
> >> +		spin_unlock_irqrestore(&m2m_dev->job_spinlock, flags);
> >> +		return;
> >> +	}
> >> +	spin_unlock_irqrestore(&m2m_dev->job_spinlock, flags);
> >> +
> >> +	v4l2_m2m_job_next(m2m_dev, m2m_ctx);
> >> +}
> >> +EXPORT_SYMBOL(v4l2_m2m_job_finish_held);
> >> +
> >> +/**
> >> + * v4l2_m2m_release_capture_buf() - check if the capture buffer should
> >> be
> >> + * released
> >> + *
> >> + * @out_vb: the output buffer
> >> + * @cap_vb: the capture buffer
> >> + *
> >> + * This helper function returns true if the current capture buffer
> >> should
> >> + * be released to vb2. This is the case if the output buffer specified
> >> that + * the capture buffer should be held (i.e. not returned to vb2)
> >> AND if the + * timestamp of the capture buffer differs from the output
> >> buffer timestamp. + *
> >> + * This helper is to be called at the start of the device_run callback:
> >> + *
> >> + * .. code-block:: c
> >> + *
> >> + *	if (v4l2_m2m_release_capture_buf(out_vb, cap_vb)) {
> >> + *		v4l2_m2m_dst_buf_remove(m2m_ctx);
> >> + *		v4l2_m2m_buf_done(cap_vb, VB2_BUF_STATE_DONE);
> >> + *		cap_vb = v4l2_m2m_next_dst_buf(m2m_ctx);
> >> + *	}
> >> + *	cap_vb->is_held = out_vb->flags &
> > 
> > V4L2_BUF_FLAG_M2M_HOLD_CAPTURE_BUF;
> > 
> >> + *
> >> + *	...
> >> + *
> >> + *	v4l2_m2m_buf_done(out_vb, VB2_BUF_STATE_DONE);
> >> + *	if (!cap_vb->is_held) {
> >> + *		v4l2_m2m_dst_buf_remove(m2m_ctx);
> >> + *		v4l2_m2m_buf_done(cap_vb, VB2_BUF_STATE_DONE);
> >> + *	}
> >> + *
> >> + * This allows for multiple output buffers to be used to fill in a
> >> single
> >> + * capture buffer. This is typically used by stateless decoders where
> >> + * multiple e.g. H.264 slices contribute to a single decoded frame.
> >> + */
> >> +struct vb2_v4l2_buffer *v4l2_m2m_release_capture_buf(struct v4l2_m2m_ctx
> >> *m2m_ctx) +{
> >> +	struct v4l2_m2m_dev *m2m_dev = m2m_ctx->m2m_dev;
> >> +	struct vb2_v4l2_buffer *src, *dst;
> >> +	unsigned long flags;
> >> +
> >> +	spin_lock_irqsave(&m2m_dev->job_spinlock, flags);
> >> +	src = v4l2_m2m_next_src_buf(m2m_ctx);
> >> +	dst = v4l2_m2m_next_dst_buf(m2m_ctx);
> >> +
> >> +	if (dst->is_held && dst->vb2_buf.copied_timestamp &&
> >> +	    src->vb2_buf.timestamp != dst->vb2_buf.timestamp) {
> >> +		dst->is_held = false;
> >> +		v4l2_m2m_dst_buf_remove(m2m_ctx);
> >> +		v4l2_m2m_buf_done(dst, VB2_BUF_STATE_DONE);
> >> +		dst = v4l2_m2m_next_dst_buf(m2m_ctx);
> >> +	}
> >> +	dst->is_held = src->flags & V4L2_BUF_FLAG_M2M_HOLD_CAPTURE_BUF;
> >> +	src->flags &= ~V4L2_BUF_FLAG_M2M_HOLD_CAPTURE_BUF;
> >> +	spin_unlock_irqrestore(&m2m_dev->job_spinlock, flags);
> >> +	return dst;
> >> +}
> >> +EXPORT_SYMBOL(v4l2_m2m_release_capture_buf);
> >> +
> >> 
> >>  int v4l2_m2m_reqbufs(struct file *file, struct v4l2_m2m_ctx *m2m_ctx,
> >>  
> >>  		     struct v4l2_requestbuffers *reqbufs)
> >>  
> >>  {
> >> 
> >> @@ -1171,19 +1275,28 @@ int v4l2_m2m_ioctl_stateless_decoder_cmd(struct
> >> file *file, void *priv, {
> >> 
> >>  	struct v4l2_fh *fh = file->private_data;
> >>  	struct vb2_v4l2_buffer *out_vb, *cap_vb;
> >> 
> >> +	struct v4l2_m2m_dev *m2m_dev = fh->m2m_ctx->m2m_dev;
> >> +	unsigned long flags;
> >> 
> >>  	int ret;
> >>  	
> >>  	ret = v4l2_m2m_ioctl_stateless_try_decoder_cmd(file, priv, dc);
> >>  	if (ret < 0)
> >>  	
> >>  		return ret;
> >> 
> >> +	spin_lock_irqsave(&m2m_dev->job_spinlock, flags);
> >> 
> >>  	out_vb = v4l2_m2m_last_src_buf(fh->m2m_ctx);
> >>  	cap_vb = v4l2_m2m_last_dst_buf(fh->m2m_ctx);
> >> 
> >> -	if (out_vb)
> >> +	if (out_vb && (out_vb->flags & V4L2_BUF_FLAG_M2M_HOLD_CAPTURE_BUF))
> > 
> > {
> > 
> >>  		out_vb->flags &= ~V4L2_BUF_FLAG_M2M_HOLD_CAPTURE_BUF;
> >> 
> >> -	else if (cap_vb && cap_vb->is_held)
> >> -		v4l2_m2m_buf_done(cap_vb, VB2_BUF_STATE_DONE);
> >> +	} else if (cap_vb && cap_vb->is_held) {
> >> +		cap_vb->is_held = false;
> >> +		if (m2m_dev->curr_ctx) {
> > 
> > Above condition should be negated.
> 
> Close. It should check that this buffer isn't currently being processed.
> So:
> 
> 		if (m2m_dev->curr_ctx != fh->m2m_ctx) {
> 
> Can you test with this change? If this works, then I'll post a proper
> series for this.

I confirm that it works.

Best regards,
Jernej

> 
> Thanks!
> 
> 	Hans
> 
> > Best regards,
> > Jernej
> > 
> >> +			v4l2_m2m_dst_buf_remove(fh->m2m_ctx);
> >> +			v4l2_m2m_buf_done(cap_vb,
> > 
> > VB2_BUF_STATE_DONE);
> > 
> >> +		}
> >> +	}
> >> +	spin_unlock_irqrestore(&m2m_dev->job_spinlock, flags);
> >> 
> >>  	return 0;
> >>  
> >>  }
> >> 
> >> diff --git a/drivers/staging/media/sunxi/cedrus/cedrus_dec.c
> >> b/drivers/staging/media/sunxi/cedrus/cedrus_dec.c index
> >> 67f7d4326fc1..4e30f263b427 100644
> >> --- a/drivers/staging/media/sunxi/cedrus/cedrus_dec.c
> >> +++ b/drivers/staging/media/sunxi/cedrus/cedrus_dec.c
> >> @@ -30,14 +30,7 @@ void cedrus_device_run(void *priv)
> >> 
> >>  	struct media_request *src_req;
> >>  	
> >>  	run.src = v4l2_m2m_next_src_buf(ctx->fh.m2m_ctx);
> >> 
> >> -	run.dst = v4l2_m2m_next_dst_buf(ctx->fh.m2m_ctx);
> >> -
> >> -	if (v4l2_m2m_release_capture_buf(run.src, run.dst)) {
> >> -		v4l2_m2m_dst_buf_remove(ctx->fh.m2m_ctx);
> >> -		v4l2_m2m_buf_done(run.dst, VB2_BUF_STATE_DONE);
> >> -		run.dst = v4l2_m2m_next_dst_buf(ctx->fh.m2m_ctx);
> >> -	}
> >> -	run.dst->is_held = run.src->flags &
> > 
> > V4L2_BUF_FLAG_M2M_HOLD_CAPTURE_BUF;
> > 
> >> +	run.dst = v4l2_m2m_release_capture_buf(ctx->fh.m2m_ctx);
> >> 
> >>  	run.first_slice = !run.dst->vb2_buf.copied_timestamp ||
> >>  	
> >>  		run.src->vb2_buf.timestamp != run.dst-
> >> 
> >> vb2_buf.timestamp;
> >> diff --git a/drivers/staging/media/sunxi/cedrus/cedrus_hw.c
> >> b/drivers/staging/media/sunxi/cedrus/cedrus_hw.c index
> >> 99fedec80224..242cad82cc8c 100644
> >> --- a/drivers/staging/media/sunxi/cedrus/cedrus_hw.c
> >> +++ b/drivers/staging/media/sunxi/cedrus/cedrus_hw.c
> >> @@ -103,7 +103,6 @@ static irqreturn_t cedrus_irq(int irq, void *data)
> >> 
> >>  {
> >>  
> >>  	struct cedrus_dev *dev = data;
> >>  	struct cedrus_ctx *ctx;
> >> 
> >> -	struct vb2_v4l2_buffer *src_buf, *dst_buf;
> >> 
> >>  	enum vb2_buffer_state state;
> >>  	enum cedrus_irq_status status;
> >> 
> >> @@ -121,26 +120,12 @@ static irqreturn_t cedrus_irq(int irq, void *data)
> >> 
> >>  	dev->dec_ops[ctx->current_codec]->irq_disable(ctx);
> >>  	dev->dec_ops[ctx->current_codec]->irq_clear(ctx);
> >> 
> >> -	src_buf = v4l2_m2m_src_buf_remove(ctx->fh.m2m_ctx);
> >> -	dst_buf = v4l2_m2m_next_dst_buf(ctx->fh.m2m_ctx);
> >> -
> >> -	if (!src_buf || !dst_buf) {
> >> -		v4l2_err(&dev->v4l2_dev,
> >> -			 "Missing source and/or destination
> > 
> > buffers\n");
> > 
> >> -		return IRQ_HANDLED;
> >> -	}
> >> -
> >> 
> >>  	if (status == CEDRUS_IRQ_ERROR)
> >>  	
> >>  		state = VB2_BUF_STATE_ERROR;
> >>  	
> >>  	else
> >>  	
> >>  		state = VB2_BUF_STATE_DONE;
> >> 
> >> -	v4l2_m2m_buf_done(src_buf, state);
> >> -	if (!dst_buf->is_held) {
> >> -		v4l2_m2m_dst_buf_remove(ctx->fh.m2m_ctx);
> >> -		v4l2_m2m_buf_done(dst_buf, state);
> >> -	}
> >> -	v4l2_m2m_job_finish(ctx->dev->m2m_dev, ctx->fh.m2m_ctx);
> >> +	v4l2_m2m_job_finish_held(ctx->dev->m2m_dev, ctx->fh.m2m_ctx, 
state);
> >> 
> >>  	return IRQ_HANDLED;
> >>  
> >>  }
> >> 
> >> diff --git a/include/media/v4l2-mem2mem.h b/include/media/v4l2-mem2mem.h
> >> index 8ae2f56c7fa3..48ca7d3eaa3d 100644
> >> --- a/include/media/v4l2-mem2mem.h
> >> +++ b/include/media/v4l2-mem2mem.h
> >> @@ -173,6 +173,10 @@ void v4l2_m2m_try_schedule(struct v4l2_m2m_ctx
> >> *m2m_ctx); void v4l2_m2m_job_finish(struct v4l2_m2m_dev *m2m_dev,
> >> 
> >>  			 struct v4l2_m2m_ctx *m2m_ctx);
> >> 
> >> +void v4l2_m2m_job_finish_held(struct v4l2_m2m_dev *m2m_dev,
> >> +			 struct v4l2_m2m_ctx *m2m_ctx,
> >> +			 enum vb2_buffer_state state);
> >> +
> >> 
> >>  static inline void
> >>  v4l2_m2m_buf_done(struct vb2_v4l2_buffer *buf, enum vb2_buffer_state
> >>  state)
> >> 
> >> {
> >> @@ -644,47 +648,7 @@ void v4l2_m2m_buf_copy_metadata(const struct
> >> vb2_v4l2_buffer *out_vb, struct vb2_v4l2_buffer *cap_vb,
> >> 
> >>  				bool copy_frame_flags);
> >> 
> >> -/**
> >> - * v4l2_m2m_release_capture_buf() - check if the capture buffer should
> >> be
> >> - * released
> >> - *
> >> - * @out_vb: the output buffer
> >> - * @cap_vb: the capture buffer
> >> - *
> >> - * This helper function returns true if the current capture buffer
> >> should
> >> - * be released to vb2. This is the case if the output buffer specified
> >> that - * the capture buffer should be held (i.e. not returned to vb2)
> >> AND if the - * timestamp of the capture buffer differs from the output
> >> buffer timestamp. - *
> >> - * This helper is to be called at the start of the device_run callback:
> >> - *
> >> - * .. code-block:: c
> >> - *
> >> - *	if (v4l2_m2m_release_capture_buf(out_vb, cap_vb)) {
> >> - *		v4l2_m2m_dst_buf_remove(m2m_ctx);
> >> - *		v4l2_m2m_buf_done(cap_vb, VB2_BUF_STATE_DONE);
> >> - *		cap_vb = v4l2_m2m_next_dst_buf(m2m_ctx);
> >> - *	}
> >> - *	cap_vb->is_held = out_vb->flags &
> > 
> > V4L2_BUF_FLAG_M2M_HOLD_CAPTURE_BUF;
> > 
> >> - *
> >> - *	...
> >> - *
> >> - *	v4l2_m2m_buf_done(out_vb, VB2_BUF_STATE_DONE);
> >> - *	if (!cap_vb->is_held) {
> >> - *		v4l2_m2m_dst_buf_remove(m2m_ctx);
> >> - *		v4l2_m2m_buf_done(cap_vb, VB2_BUF_STATE_DONE);
> >> - *	}
> >> - *
> >> - * This allows for multiple output buffers to be used to fill in a
> >> single
> >> - * capture buffer. This is typically used by stateless decoders where
> >> - * multiple e.g. H.264 slices contribute to a single decoded frame.
> >> - */
> >> -static inline bool v4l2_m2m_release_capture_buf(const struct
> >> vb2_v4l2_buffer *out_vb, -
> >> 
> > 	const struct vb2_v4l2_buffer *cap_vb)
> > 	
> >> -{
> >> -	return cap_vb->is_held && cap_vb->vb2_buf.copied_timestamp &&
> >> -	       out_vb->vb2_buf.timestamp != cap_vb->vb2_buf.timestamp;
> >> -}
> >> +struct vb2_v4l2_buffer *v4l2_m2m_release_capture_buf(struct v4l2_m2m_ctx
> >> *m2m_ctx);
> >> 
> >>  /* v4l2 request helper */