
[RFC,1/2] media: docs-rst: Add decoder UAPI specification to Codec Interfaces

Message ID 20180605103328.176255-2-tfiga@chromium.org (mailing list archive)
State New, archived

Commit Message

Tomasz Figa June 5, 2018, 10:33 a.m. UTC
Due to the complexity of the video decoding process, the V4L2 drivers of
stateful decoder hardware require specific sequences of V4L2 API calls
to be followed. These include capability enumeration, initialization,
decoding, seek, pause, dynamic resolution change, flush and end of
stream.

Specifics of the above have been discussed during Media Workshops at
LinuxCon Europe 2012 in Barcelona and then later Embedded Linux
Conference Europe 2014 in Düsseldorf. The de facto Codec API that
originated at those events was later implemented by the drivers we already
have merged in mainline, such as s5p-mfc or mtk-vcodec.

The only thing missing was the real specification included as a part of
Linux Media documentation. Fix it now and document the decoder part of
the Codec API.

Signed-off-by: Tomasz Figa <tfiga@chromium.org>
---
 Documentation/media/uapi/v4l/dev-codec.rst | 771 +++++++++++++++++++++
 Documentation/media/uapi/v4l/v4l2.rst      |  14 +-
 2 files changed, 784 insertions(+), 1 deletion(-)

Comments

Philipp Zabel June 5, 2018, 11:41 a.m. UTC | #1
Hi Tomasz,

On Tue, 2018-06-05 at 19:33 +0900, Tomasz Figa wrote:
> Due to the complexity of the video decoding process, the V4L2 drivers of
> stateful decoder hardware require specific sequences of V4L2 API calls
> to be followed. These include capability enumeration, initialization,
> decoding, seek, pause, dynamic resolution change, flush and end of
> stream.
> 
> Specifics of the above have been discussed during Media Workshops at
> LinuxCon Europe 2012 in Barcelona and then later Embedded Linux
> Conference Europe 2014 in Düsseldorf. The de facto Codec API that
> originated at those events was later implemented by the drivers we already
> have merged in mainline, such as s5p-mfc or mtk-vcodec.
> 
> The only thing missing was the real specification included as a part of
> Linux Media documentation. Fix it now and document the decoder part of
> the Codec API.
> 
> Signed-off-by: Tomasz Figa <tfiga@chromium.org>
> ---
>  Documentation/media/uapi/v4l/dev-codec.rst | 771 +++++++++++++++++++++
>  Documentation/media/uapi/v4l/v4l2.rst      |  14 +-
>  2 files changed, 784 insertions(+), 1 deletion(-)
> 
> diff --git a/Documentation/media/uapi/v4l/dev-codec.rst b/Documentation/media/uapi/v4l/dev-codec.rst
> index c61e938bd8dc..0483b10c205e 100644
> --- a/Documentation/media/uapi/v4l/dev-codec.rst
> +++ b/Documentation/media/uapi/v4l/dev-codec.rst
> @@ -34,3 +34,774 @@ the codec and reprogram it whenever another file handler gets access.
>  This is different from the usual video node behavior where the video
>  properties are global to the device (i.e. changing something through one
>  file handle is visible through another file handle).
> +
> +This interface is generally appropriate for hardware that does not
> +require additional software involvement to parse/partially decode/manage
> +the stream before/after processing in hardware.
> +
> +Input data to the Stream API are buffers containing unprocessed video
> +stream (Annex-B H264/H265 stream, raw VP8/9 stream) only. The driver is
> +expected not to require any additional information from the client to
> +process these buffers, and to return decoded frames on the CAPTURE queue
> +in display order.
> +
> +Performing software parsing, processing etc. of the stream in the driver
> +in order to support the Stream API is strongly discouraged. In such a case,
> +use of the Stateless Codec Interface (in development) is preferred.
> +
> +Conventions and notation used in this document
> +==============================================
> +
> +1. The general V4L2 API rules apply if not specified in this document
> +   otherwise.
> +
> +2. The meaning of words “must”, “may”, “should”, etc. is as per RFC
> +   2119.
> +
> +3. All steps not marked “optional” are required.
> +
> +4. :c:func:`VIDIOC_G_EXT_CTRLS`, :c:func:`VIDIOC_S_EXT_CTRLS` may be used interchangeably with
> +   :c:func:`VIDIOC_G_CTRL`, :c:func:`VIDIOC_S_CTRL`, unless specified otherwise.
> +
> +5. Single-plane API (see spec) and applicable structures may be used
> +   interchangeably with Multi-plane API, unless specified otherwise.
> +
> +6. i = [a..b]: sequence of integers from a to b, inclusive, i.e. i =
> +   [0..2]: i = 0, 1, 2.
> +
> +7. For OUTPUT buffer A, A’ represents a buffer on the CAPTURE queue
> +   containing data (decoded or encoded frame/stream) that resulted
> +   from processing buffer A.
> +
> +Glossary
> +========
> +
> +CAPTURE
> +   the destination buffer queue, decoded frames for
> +   decoders, encoded bitstream for encoders;
> +   ``V4L2_BUF_TYPE_VIDEO_CAPTURE`` or
> +   ``V4L2_BUF_TYPE_VIDEO_CAPTURE_MPLANE``
> +
> +client
> +   application client communicating with the driver
> +   implementing this API
> +
> +coded format
> +   encoded/compressed video bitstream format (e.g.
> +   H.264, VP8, etc.); see raw format; this is not equivalent to fourcc
> +   (V4L2 pixelformat), as each coded format may be supported by multiple
> +   fourccs (e.g. ``V4L2_PIX_FMT_H264``, ``V4L2_PIX_FMT_H264_SLICE``, etc.)
> +
> +coded height
> +   height for given coded resolution
> +
> +coded resolution
> +   stream resolution in pixels aligned to codec
> +   format and hardware requirements; see also visible resolution
> +
> +coded width
> +   width for given coded resolution
> +
> +decode order
> +   the order in which frames are decoded; may differ
> +   from display (output) order if frame reordering (B frames) is active in
> +   the stream; OUTPUT buffers must be queued in decode order; for frame
> +   API, CAPTURE buffers must be returned by the driver in decode order;
> +
> +display order
> +   the order in which frames must be displayed
> +   (outputted); for stream API, CAPTURE buffers must be returned by the
> +   driver in display order;
> +
> +EOS
> +   end of stream
> +
> +input height
> +   height in pixels for given input resolution
> +
> +input resolution
> +   resolution in pixels of source frames being input
> +   to the encoder and subject to further cropping to the bounds of visible
> +   resolution
> +
> +input width
> +   width in pixels for given input resolution
> +
> +OUTPUT
> +   the source buffer queue, encoded bitstream for
> +   decoders, raw frames for encoders; ``V4L2_BUF_TYPE_VIDEO_OUTPUT`` or
> +   ``V4L2_BUF_TYPE_VIDEO_OUTPUT_MPLANE``
> +
> +raw format
> +   uncompressed format containing raw pixel data (e.g.
> +   YUV, RGB formats)
> +
> +resume point
> +   a point in the bitstream from which decoding may
> +   start/continue, without any previous state/data present, e.g.: a
> +   keyframe (VPX) or SPS/PPS/IDR sequence (H.264); a resume point is
> +   required to start decode of a new stream, or to resume decoding after a
> +   seek;
> +
> +source buffer
> +   buffers allocated for source queue
> +
> +source queue
> +   queue containing buffers used for source data, i.e.
> +
> +visible height
> +   height for given visible resolution
> +
> +visible resolution
> +   stream resolution of the visible picture, in
> +   pixels, to be used for display purposes; must be smaller than or equal
> +   to the coded resolution;
> +
> +visible width
> +   width for given visible resolution
> +
> +Decoder
> +=======
> +
> +Querying capabilities
> +---------------------
> +
> +1. To enumerate the set of coded formats supported by the driver, the
> +   client uses :c:func:`VIDIOC_ENUM_FMT` for OUTPUT. The driver must always
> +   return the full set of supported formats, irrespective of the
> +   format set on the CAPTURE queue.
> +
> +2. To enumerate the set of supported raw formats, the client uses
> +   :c:func:`VIDIOC_ENUM_FMT` for CAPTURE. The driver must return only the
> +   formats supported for the format currently set on the OUTPUT
> +   queue.
> +   In order to enumerate raw formats supported by a given coded
> +   format, the client must first set that coded format on the
> +   OUTPUT queue and then enumerate the CAPTURE queue.
> +
> +3. The client may use :c:func:`VIDIOC_ENUM_FRAMESIZES` to detect supported
> +   resolutions for a given format, passing its fourcc in
> +   :c:type:`v4l2_frmsizeenum` ``pixel_format``.

Is this a must-implement for drivers? coda currently doesn't implement
enum-framesizes.

> +
> +   a. Values returned from :c:func:`VIDIOC_ENUM_FRAMESIZES` for coded formats
> +      must be maximums for given coded format for all supported raw
> +      formats.

I don't understand what maximums means in this context.

If I have a decoder that can decode from 16x16 up to 1920x1088, should
this return a continuous range from minimum frame size to maximum frame
size?

> +   b. Values returned from :c:func:`VIDIOC_ENUM_FRAMESIZES` for raw formats must
> +      be maximums for given raw format for all supported coded
> +      formats.

Same here, this is unclear to me.

> +   c. The client should derive the supported resolution for a
> +      combination of coded+raw format by calculating the
> +      intersection of resolutions returned from calls to
> +      :c:func:`VIDIOC_ENUM_FRAMESIZES` for the given coded and raw formats.
> +
> +4. Supported profiles and levels for given format, if applicable, may be
> +   queried using their respective controls via :c:func:`VIDIOC_QUERYCTRL`.
> +
> +5. The client may use :c:func:`VIDIOC_ENUM_FRAMEINTERVALS` to enumerate maximum
> +   supported framerates by the driver/hardware for a given
> +   format+resolution combination.

Same as above, is this must-implement for decoder drivers?

> +
> +Initialization sequence
> +-----------------------
> +
> +1. (optional) Enumerate supported OUTPUT formats and resolutions. See
> +   capability enumeration.
> +
> +2. Set a coded format on the source queue via :c:func:`VIDIOC_S_FMT`
> +
> +   a. Required fields:
> +
> +      i.   type = OUTPUT
> +
> +      ii.  fmt.pix_mp.pixelformat set to a coded format
> +
> +      iii. fmt.pix_mp.width, fmt.pix_mp.height only if cannot be
> +           parsed from the stream for the given coded format;
> +           ignored otherwise;

When this is set, does this also update the format on the CAPTURE queue
(i.e. would G_FMT(CAP), S_FMT(OUT), G_FMT(CAP) potentially return
different CAP formats?) I think this should be explained here.

What about colorimetry, does setting colorimetry here overwrite
colorimetry information that may potentially be contained in the stream?

> +   b. Return values:
> +
> +      i.  EINVAL: unsupported format.
> +
> +      ii. Others: per spec
> +
> +   .. note::
> +
> +      The driver must not adjust pixelformat, so if
> +      ``V4L2_PIX_FMT_H264`` is passed but only
> +      ``V4L2_PIX_FMT_H264_SLICE`` is supported, S_FMT will return
> +      -EINVAL. If both are acceptable by client, calling S_FMT for
> +      the other after one gets rejected may be required (or use
> +      :c:func:`VIDIOC_ENUM_FMT` to discover beforehand, see Capability
> +      enumeration).
> +
> +3.  (optional) Get minimum number of buffers required for OUTPUT queue
> +    via :c:func:`VIDIOC_G_CTRL`. This is useful if client intends to use
> +    more buffers than minimum required by hardware/format (see
> +    allocation).
> +
> +    a. Required fields:
> +
> +       i. id = ``V4L2_CID_MIN_BUFFERS_FOR_OUTPUT``
> +
> +    b. Return values: per spec.
> +
> +    c. Return fields:
> +
> +       i. value: required number of OUTPUT buffers for the currently set
> +          format;
> +
> +4.  Allocate source (bitstream) buffers via :c:func:`VIDIOC_REQBUFS` on OUTPUT
> +    queue.
> +
> +    a. Required fields:
> +
> +       i.   count = n, where n > 0.
> +
> +       ii.  type = OUTPUT
> +
> +       iii. memory = as per spec
> +
> +    b. Return values: Per spec.
> +
> +    c. Return fields:
> +
> +       i. count: adjusted to allocated number of buffers
> +
> +    d. The driver must adjust count to the larger of the minimum number
> +       of source buffers required for the given format and the count
> +       passed. The client must check this value after the ioctl returns
> +       to get the number of buffers allocated.
> +
> +    .. note::
> +
> +       Passing count = 1 is useful for letting the driver choose
> +       the minimum according to the selected format/hardware
> +       requirements.
> +
> +    .. note::
> +
> +       To allocate more than minimum number of buffers (for pipeline
> +       depth), use G_CTRL(``V4L2_CID_MIN_BUFFERS_FOR_OUTPUT``) to
> +       get minimum number of buffers required by the driver/format,
> +       and pass the obtained value plus the number of additional
> +       buffers needed in count to :c:func:`VIDIOC_REQBUFS`.
> +
> +5.  Begin parsing the stream for stream metadata via :c:func:`VIDIOC_STREAMON` on
> +    OUTPUT queue. This step allows the driver to parse/decode
> +    initial stream metadata until enough information to allocate
> +    CAPTURE buffers is found. This is indicated by the driver by
> +    sending a ``V4L2_EVENT_SOURCE_CHANGE`` event, which the client
> +    must handle.
> +
> +    a. Required fields: as per spec.
> +
> +    b. Return values: as per spec.
> +
> +    .. note::
> +
> +       Calling :c:func:`VIDIOC_REQBUFS`, :c:func:`VIDIOC_STREAMON`
> +       or :c:func:`VIDIOC_G_FMT` on the CAPTURE queue at this time is not
> +       allowed and must return EINVAL.

What about devices that have a frame buffer registration step before
stream start? For coda I need to know all CAPTURE buffers before I can
start streaming, because there is no way to register them after
STREAMON. Do I have to split the driver internally to do streamoff and
restart when the capture queue is brought up?

> +6.  This step only applies for coded formats that contain resolution
> +    information in the stream.
> +    Continue queuing/dequeuing bitstream buffers to/from the
> +    OUTPUT queue via :c:func:`VIDIOC_QBUF` and :c:func:`VIDIOC_DQBUF`. The driver
> +    must keep processing and returning each buffer to the client
> +    until required metadata to send a ``V4L2_EVENT_SOURCE_CHANGE``
> +    for source change type ``V4L2_EVENT_SRC_CH_RESOLUTION`` is
> +    found. There is no requirement to pass enough data for this to
> +    occur in the first buffer and the driver must be able to
> +    process any number of buffers.
> +
> +    a. Required fields: as per spec.
> +
> +    b. Return values: as per spec.
> +
> +    c. If data in a buffer that triggers the event is required to decode
> +       the first frame, the driver must not return it to the client,
> +       but must retain it for further decoding.
> +
> +    d. Until the resolution source event is sent to the client, calling
> +       :c:func:`VIDIOC_G_FMT` on the CAPTURE queue must return -EINVAL.
> +
> +    .. note::
> +
> +       No decoded frames are produced during this phase.
> +
> +7.  This step only applies for coded formats that contain resolution
> +    information in the stream.
> +    Receive and handle ``V4L2_EVENT_SOURCE_CHANGE`` from the driver
> +    via :c:func:`VIDIOC_DQEVENT`. The driver must send this event once
> +    enough data is obtained from the stream to allocate CAPTURE
> +    buffers and to begin producing decoded frames.
> +
> +    a. Required fields:
> +
> +       i. type = ``V4L2_EVENT_SOURCE_CHANGE``
> +
> +    b. Return values: as per spec.
> +
> +    c. The driver must return u.src_change.changes =
> +       ``V4L2_EVENT_SRC_CH_RESOLUTION``.
> +
> +8.  This step only applies for coded formats that contain resolution
> +    information in the stream.
> +    Call :c:func:`VIDIOC_G_FMT` for CAPTURE queue to get format for the
> +    destination buffers parsed/decoded from the bitstream.
> +
> +    a. Required fields:
> +
> +       i. type = CAPTURE
> +
> +    b. Return values: as per spec.
> +
> +    c. Return fields:
> +
> +       i.   fmt.pix_mp.width, fmt.pix_mp.height: coded resolution
> +            for the decoded frames
> +
> +       ii.  fmt.pix_mp.pixelformat: default/required/preferred by
> +            driver pixelformat for decoded frames.

This text is specific to multiplanar queues, what about singleplanar
drivers?

> +
> +       iii. num_planes: set to number of planes for pixelformat.
> +
> +       iv.  For each plane p = [0, num_planes-1]:
> +            plane_fmt[p].sizeimage, plane_fmt[p].bytesperline as
> +            per spec for coded resolution.
> +
> +    .. note::
> +
> +       Te value of pixelformat may be any pixel format supported,

Typo, "The value ..."

> +       and must
> +       be supported for current stream, based on the information
> +       parsed from the stream and hardware capabilities. It is
> +       suggested that driver chooses the preferred/optimal format
> +       for given configuration. For example, a YUV format may be
> +       preferred over an RGB format, if additional conversion step
> +       would be required.
> +
> +9.  (optional) Enumerate CAPTURE formats via :c:func:`VIDIOC_ENUM_FMT` on
> +    CAPTURE queue.
> +    Once the stream information is parsed and known, the client
> +    may use this ioctl to discover which raw formats are supported
> +    for given stream and select one of them via :c:func:`VIDIOC_S_FMT`.
> +
> +    a. Fields/return values as per spec.
> +
> +    .. note::
> +
> +       The driver must return only formats supported for the
> +       current stream parsed in this initialization sequence, even
> +       if more formats may be supported by the driver in general.
> +       For example, a driver/hardware may support YUV and RGB
> +       formats for resolutions 1920x1088 and lower, but only YUV for
> +       higher resolutions (e.g. due to memory bandwidth
> +       limitations). After parsing a resolution of 1920x1088 or
> +       lower, :c:func:`VIDIOC_ENUM_FMT` may return a set of YUV and RGB
> +       pixelformats, but after parsing resolution higher than
> +       1920x1088, the driver must not return (unsupported for this
> +       resolution) RGB.
> +
> +       However, subsequent resolution change event
> +       triggered after discovering a resolution change within the
> +       same stream may switch the stream into a lower resolution;
> +       :c:func:`VIDIOC_ENUM_FMT` must return RGB formats again in that case.
> +
> +10.  (optional) Choose a different CAPTURE format than suggested via
> +     :c:func:`VIDIOC_S_FMT` on CAPTURE queue. It is possible for the client
> +     to choose a different format than selected/suggested by the
> +     driver in :c:func:`VIDIOC_G_FMT`.
> +
> +     a. Required fields:
> +
> +        i.  type = CAPTURE
> +
> +        ii. fmt.pix_mp.pixelformat set to a raw format
> +
> +     b. Return values:
> +
> +        i. EINVAL: unsupported format.
> +
> +     c. Calling :c:func:`VIDIOC_ENUM_FMT` to discover currently available formats
> +        after receiving ``V4L2_EVENT_SOURCE_CHANGE`` is useful to find
> +        out a set of allowed pixelformats for given configuration,
> +        but not required.

What about colorimetry? Should this and TRY_FMT only allow colorimetry
that is parsed from the stream, if available, or that was set via
S_FMT(OUT) as an override?

> +11. (optional) Acquire visible resolution via :c:func:`VIDIOC_G_SELECTION`.
> +
> +    a. Required fields:
> +
> +       i.  type = CAPTURE
> +
> +       ii. target = ``V4L2_SEL_TGT_CROP``
> +
> +    b. Return values: per spec.
> +
> +    c. Return fields
> +
> +       i. r.left, r.top, r.width, r.height: visible rectangle; this must
> +          fit within coded resolution returned from :c:func:`VIDIOC_G_FMT`.

Isn't CROP supposed to be set on the OUTPUT queue only and COMPOSE on
the CAPTURE queue?
I would expect COMPOSE/COMPOSE_DEFAULT to be set to the visible
rectangle and COMPOSE_PADDED to be set to the rectangle that the
hardware actually overwrites.

> +12. (optional) Get minimum number of buffers required for CAPTURE queue
> +    via :c:func:`VIDIOC_G_CTRL`. This is useful if client intends to use
> +    more buffers than minimum required by hardware/format (see
> +    allocation).
> +
> +    a. Required fields:
> +
> +       i. id = ``V4L2_CID_MIN_BUFFERS_FOR_CAPTURE``
> +
> +    b. Return values: per spec.
> +
> +    c. Return fields:
> +
> +       i. value: minimum number of buffers required to decode the stream
> +          parsed in this initialization sequence.
> +
> +    .. note::
> +
> +       Note that the minimum number of buffers must be at least the
> +       number required to successfully decode the current stream.
> +       This may for example be the required DPB size for an H.264
> +       stream given the parsed stream configuration (resolution,
> +       level).
> +
> +13. Allocate destination (raw format) buffers via :c:func:`VIDIOC_REQBUFS` on the
> +    CAPTURE queue.
> +
> +    a. Required fields:
> +
> +       i.   count = n, where n > 0.
> +
> +       ii.  type = CAPTURE
> +
> +       iii. memory = as per spec
> +
> +    b. Return values: Per spec.
> +
> +    c. Return fields:
> +
> +       i. count: adjusted to allocated number of buffers.
> +
> +    d. The driver must adjust count to the larger of the minimum number
> +       of destination buffers required for the given format and stream
> +       configuration and the count passed. The client must check this
> +       value after the ioctl returns to get the number of buffers allocated.
> +
> +    .. note::
> +
> +       Passing count = 1 is useful for letting the driver choose
> +       the minimum.
> +
> +    .. note::
> +
> +       To allocate more than minimum number of buffers (for pipeline
> +       depth), use G_CTRL(``V4L2_CID_MIN_BUFFERS_FOR_CAPTURE``) to
> +       get minimum number of buffers required, and pass the obtained
> +       value plus the number of additional buffers needed in count
> +       to :c:func:`VIDIOC_REQBUFS`.
> +
> +14. Call :c:func:`VIDIOC_STREAMON` to initiate decoding frames.
> +
> +    a. Required fields: as per spec.
> +
> +    b. Return values: as per spec.
> +
> +Decoding
> +--------
> +
> +This state is reached after a successful initialization sequence. In
> +this state, client queues and dequeues buffers to both queues via
> +:c:func:`VIDIOC_QBUF` and :c:func:`VIDIOC_DQBUF`, as per spec.
> +
> +Both queues operate independently. The client may queue and dequeue
> +buffers to queues in any order and at any rate, also at a rate different
> +for each queue. The client may queue buffers within the same queue in
> +any order (V4L2 index-wise). It is recommended for the client to operate
> +the queues independently for best performance.
> +
> +Source OUTPUT buffers must contain:
> +
> +-  H.264/AVC: one or more complete NALUs of an Annex B elementary
> +   stream; one buffer does not have to contain enough data to decode
> +   a frame;

What if the hardware only supports handling complete frames?

> +-  VP8/VP9: one or more complete frames.
> +
> +No direct relationship between source and destination buffers and the
> +timing of buffers becoming available to dequeue should be assumed in the
> +Stream API. Specifically:
> +
> +-  a buffer queued to OUTPUT queue may result in no buffers being
> +   produced on the CAPTURE queue (e.g. if it does not contain
> +   encoded data, or if only metadata syntax structures are present
> +   in it), or one or more buffers produced on the CAPTURE queue (if
> +   the encoded data contained more than one frame, or if returning a
> +   decoded frame allowed the driver to return a frame that preceded
> +   it in decode, but succeeded it in display order)
> +
> +-  a buffer queued to OUTPUT may result in a buffer being produced on
> +   the CAPTURE queue later into decode process, and/or after
> +   processing further OUTPUT buffers, or be returned out of order,
> +   e.g. if display reordering is used
> +
> +-  buffers may become available on the CAPTURE queue without additional
> +   buffers queued to OUTPUT (e.g. during flush or EOS)
> +
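
Since both queues run independently, a client loop would typically wait on
poll() for either side. This sketch assumes the usual V4L2 mem2mem poll
semantics (POLLIN: a CAPTURE buffer is ready, POLLOUT: an OUTPUT buffer can
be dequeued, POLLPRI: an event is pending), which the draft itself does not
state; the struct and helpers are hypothetical.

```c
#include <poll.h>
#include <stdbool.h>

struct decoder_ready {
	bool capture;	/* a decoded frame can be dequeued (POLLIN) */
	bool output;	/* a consumed bitstream buffer can be dequeued (POLLOUT) */
	bool event;	/* SOURCE_CHANGE or EOS is pending (POLLPRI) */
};

static struct decoder_ready classify_revents(short revents)
{
	struct decoder_ready r;

	r.capture = (revents & POLLIN) != 0;
	r.output = (revents & POLLOUT) != 0;
	r.event = (revents & POLLPRI) != 0;
	return r;
}

/* Wait up to timeout_ms. Returns poll()'s result: >0 ready, 0 timeout,
 * -1 error; *ready is only updated when something is ready. */
static int wait_for_decoder(int fd, int timeout_ms,
			    struct decoder_ready *ready)
{
	struct pollfd pfd = { .fd = fd, .events = POLLIN | POLLOUT | POLLPRI };
	int ret = poll(&pfd, 1, timeout_ms);

	if (ret > 0)
		*ready = classify_revents(pfd.revents);
	return ret;
}
```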
> +Seek
> +----
> +
> +Seek is controlled by the OUTPUT queue, as it is the source of bitstream
> +data. CAPTURE queue remains unchanged/unaffected.

Does this mean that to achieve instantaneous seeks the driver has to
flush its CAPTURE queue internally when a seek is issued?

> +
> +1. Stop the OUTPUT queue to begin the seek sequence via
> +   :c:func:`VIDIOC_STREAMOFF`.
> +
> +   a. Required fields:
> +
> +      i. type = OUTPUT
> +
> +   b. The driver must drop all the pending OUTPUT buffers and they are
> +      treated as returned to the client (as per spec).

What about pending CAPTURE buffers that the client may not yet have
dequeued?

> +
> +2. Restart the OUTPUT queue via :c:func:`VIDIOC_STREAMON`
> +
> +   a. Required fields:
> +
> +      i. type = OUTPUT
> +
> +   b. The driver must enter the post-seek state and be ready to
> +      accept new source bitstream buffers.
> +
> +3. Start queuing buffers to OUTPUT queue containing stream data after
> +   the seek until a suitable resume point is found.
> +
> +   .. note::
> +
> +      There is no requirement to begin queuing stream
> +      starting exactly from a resume point (e.g. SPS or a keyframe).
> +      The driver must handle any data queued and must keep processing
> +      the queued buffers until it finds a suitable resume point.
> +      While looking for a resume point, the driver processes OUTPUT
> +      buffers and returns them to the client without producing any
> +      decoded frames.
> +
> +4. After a resume point is found, the driver will start returning
> +   CAPTURE buffers with decoded frames.
> +
> +   .. note::
> +
> +      There is no precise specification for CAPTURE queue of when it
> +      will start producing buffers containing decoded data from
> +      buffers queued after the seek, as it operates independently
> +      from OUTPUT queue.
> +
> +      -  The driver is allowed to return a number of remaining CAPTURE
> +         buffers containing decoded frames from before the seek after the
> +         seek sequence (STREAMOFF-STREAMON) is performed.

Oh, ok. That answers my last question above.

> +      -  The driver is also allowed to drop decoded frames for buffers
> +         queued but not yet decoded before the seek sequence was initiated.
> +         E.g. for an OUTPUT queue sequence: QBUF(A), QBUF(B),
> +         STREAMOFF(OUT), STREAMON(OUT), QBUF(G), QBUF(H), any of the
> +         following results on the CAPTURE queue is allowed: {A’, B’, G’,
> +         H’}, {A’, G’, H’}, {G’, H’}.
> +
> +Pause
> +-----
> +
> +In order to pause, the client should just cease queuing buffers onto the
> +OUTPUT queue. This is different from the general V4L2 API definition of
> +pause, which involves calling :c:func:`VIDIOC_STREAMOFF` on the queue. Without
> +source bitstream data, there is no data to process and the hardware
> +remains idle. Conversely, using :c:func:`VIDIOC_STREAMOFF` on OUTPUT queue
> +indicates a seek, which 1) drops all buffers in flight and 2) after a

"... 1) drops all OUTPUT buffers in flight ... " ?

> +subsequent :c:func:`VIDIOC_STREAMON` will look for and only continue from a
> +resume point. This is usually undesirable for pause. The
> +STREAMOFF-STREAMON sequence is intended for seeking.
> +
> +Similarly, CAPTURE queue should remain streaming as well, as the
> +STREAMOFF-STREAMON sequence on it is intended solely for changing buffer
> +sets.
> +
> +Dynamic resolution change
> +-------------------------
> +
> +When driver encounters a resolution change in the stream, the dynamic
> +resolution change sequence is started.

Must all drivers support dynamic resolution change?

> +1.  On encountering a resolution change in the stream, the driver must
> +    first process and decode all remaining buffers from before the
> +    resolution change point.
> +
> +2.  After all buffers containing decoded frames from before the
> +    resolution change point are ready to be dequeued on the
> +    CAPTURE queue, the driver sends a ``V4L2_EVENT_SOURCE_CHANGE``
> +    event for source change type ``V4L2_EVENT_SRC_CH_RESOLUTION``.
> +    The last buffer from before the change must be marked with
> +    :c:type:`v4l2_buffer` ``flags`` flag ``V4L2_BUF_FLAG_LAST`` as in the flush
> +    sequence.
> +
> +    .. note::
> +
> +       Any attempts to dequeue more buffers beyond the buffer marked
> +       with ``V4L2_BUF_FLAG_LAST`` will result in a -EPIPE error from
> +       :c:func:`VIDIOC_DQBUF`.
> +
> +3.  After dequeuing all remaining buffers from the CAPTURE queue, the
> +    client must call :c:func:`VIDIOC_STREAMOFF` on the CAPTURE queue. The
> +    OUTPUT queue remains streaming (calling STREAMOFF on it would
> +    trigger a seek).
> +    Until STREAMOFF is called on the CAPTURE queue (acknowledging
> +    the event), the driver operates as if the resolution hasn’t
> +    changed yet, i.e. :c:func:`VIDIOC_G_FMT`, etc. return previous
> +    resolution.

What about the OUTPUT queue resolution, does it change as well?

> +4.  The client frees the buffers on the CAPTURE queue using
> +    :c:func:`VIDIOC_REQBUFS`.
> +
> +    a. Required fields:
> +
> +       i.   count = 0
> +
> +       ii.  type = CAPTURE
> +
> +       iii. memory = as per spec
> +
> +5.  The client calls :c:func:`VIDIOC_G_FMT` for CAPTURE to get the new format
> +    information.
> +    This is identical to calling :c:func:`VIDIOC_G_FMT` after
> +    ``V4L2_EVENT_SRC_CH_RESOLUTION`` in the initialization
> +    sequence and should be handled similarly.
> +
> +    .. note::
> +
> +       It is allowed for the driver not to support the same
> +       pixelformat as previously used (before the resolution change)
> +       for the new resolution. The driver must select a default
> +       supported pixelformat and return it from :c:func:`VIDIOC_G_FMT`, and
> +       client must take note of it.
> +

Can steps 4. and 5. be done in reverse order (i.e. first G_FMT and then
REQBUFS(0))?
If the client already has buffers allocated that are large enough to
contain decoded buffers in the new resolution, it might be preferable to
just keep them instead of reallocating.

> +6.  (optional) The client is allowed to enumerate available formats and
> +    select a different one than currently chosen (returned via
> +    :c:func:`VIDIOC_G_FMT`). This is identical to a corresponding step in
> +    the initialization sequence.
> +
> +7.  (optional) The client acquires visible resolution as in
> +    initialization sequence.
> +
> +8.  (optional) The client acquires minimum number of buffers as in
> +    initialization sequence.
> +
> +9.  The client allocates a new set of buffers for the CAPTURE queue via
> +    :c:func:`VIDIOC_REQBUFS`. This is identical to a corresponding step in
> +    the initialization sequence.
> +
> +10. The client resumes decoding by issuing :c:func:`VIDIOC_STREAMON` on the
> +    CAPTURE queue.
> +
> +During the resolution change sequence, the OUTPUT queue must remain
> +streaming. Calling :c:func:`VIDIOC_STREAMOFF` on OUTPUT queue will initiate seek.
> +
> +The OUTPUT queue operates separately from the CAPTURE queue for the
> +duration of the entire resolution change sequence. It is allowed (and
> +recommended for best performance and simplicity) for the client to keep
> +queuing/dequeuing buffers from/to OUTPUT queue even while processing
> +this sequence.
> +
> +.. note::
> +
> +   It is also possible for this sequence to be triggered without
> +   change in resolution if a different number of CAPTURE buffers is
> +   required in order to continue decoding the stream.
> +
> +Flush
> +-----
> +
> +Flush is the process of draining the CAPTURE queue of any remaining
> +buffers. After the flush sequence is complete, the client has received
> +all decoded frames for all OUTPUT buffers queued before the sequence was
> +started.
> +
> +1. Begin flush by issuing :c:func:`VIDIOC_DECODER_CMD`.
> +
> +   a. Required fields:
> +
> +      i. cmd = ``V4L2_DEC_CMD_STOP``
> +
> +2. The driver must process and decode as normal all OUTPUT buffers
> +   queued by the client before the :c:func:`VIDIOC_DECODER_CMD` was
> +   issued.
> +   Any operations triggered as a result of processing these
> +   buffers (including the initialization and resolution change
> +   sequences) must be processed as normal by both the driver and
> +   the client before proceeding with the flush sequence.
> +
> +3. Once all OUTPUT buffers queued before ``V4L2_DEC_CMD_STOP`` are
> +   processed:
> +
> +   a. If the CAPTURE queue is streaming, once all decoded frames (if
> +      any) are ready to be dequeued on the CAPTURE queue, the
> +      driver must send a ``V4L2_EVENT_EOS``. The driver must also
> +      set ``V4L2_BUF_FLAG_LAST`` in :c:type:`v4l2_buffer` ``flags`` field on the
> +      buffer on the CAPTURE queue containing the last frame (if
> +      any) produced as a result of processing the OUTPUT buffers
> +      queued before ``V4L2_DEC_CMD_STOP``. If no more frames are
> +      left to be returned at the point of handling
> +      ``V4L2_DEC_CMD_STOP``, the driver must return an empty buffer
> +      (with :c:type:`v4l2_buffer` ``bytesused`` = 0) as the last buffer with
> +      ``V4L2_BUF_FLAG_LAST`` set instead.
> +      Any attempts to dequeue more buffers beyond the buffer
> +      marked with ``V4L2_BUF_FLAG_LAST`` will result in a -EPIPE
> +      error from :c:func:`VIDIOC_DQBUF`.
> +
> +   b. If the CAPTURE queue is NOT streaming, no action is necessary for
> +      CAPTURE queue and the driver must send a ``V4L2_EVENT_EOS``
> +      immediately after all OUTPUT buffers in question have been
> +      processed.
> +
> +4. To resume, client may issue ``V4L2_DEC_CMD_START``.
> +
> +End of stream
> +-------------
> +
> +When an explicit end of stream is encountered by the driver in the
> +stream, it must send a ``V4L2_EVENT_EOS`` to the client after all frames
> +are decoded and ready to be dequeued on the CAPTURE queue, with the
> +:c:type:`v4l2_buffer` ``flags`` set to ``V4L2_BUF_FLAG_LAST``. This behavior is
> +identical to the flush sequence as if triggered by the client via
> +``V4L2_DEC_CMD_STOP``.
> +
> +Commit points
> +-------------
> +
> +Setting formats and allocating buffers triggers changes in the behavior
> +of the driver.
> +
> +1. Setting format on OUTPUT queue may change the set of formats
> +   supported/advertised on the CAPTURE queue. If the format currently
> +   selected on the CAPTURE queue is not supported by the newly selected
> +   OUTPUT format, the driver must also change it to a supported one.

Ok. Is the same true about the contained colorimetry? What should happen
if the stream contains colorimetry information that differs from
S_FMT(OUT) colorimetry?

> +2. Enumerating formats on CAPTURE queue must only return CAPTURE formats
> +   supported for the OUTPUT format currently set.
> +
> +3. Setting/changing format on CAPTURE queue does not change formats
> +   available on OUTPUT queue. An attempt to set CAPTURE format that
> +   is not supported for the currently selected OUTPUT format must
> +   result in an error (-EINVAL) from :c:func:`VIDIOC_S_FMT`.

Is this limited to the pixel format? Surely setting out of bounds
width/height or incorrect colorimetry should not result in EINVAL but
still be corrected by the driver?

> +4. Enumerating formats on OUTPUT queue always returns a full set of
> +   supported formats, irrespective of the current format selected on
> +   CAPTURE queue.
> +
> +5. After allocating buffers on the OUTPUT queue, it is not possible to
> +   change format on it.

So even after source change events the OUTPUT queue still keeps the
initial OUTPUT format?

> +To summarize, setting formats and allocation must always start with the
> +OUTPUT queue and the OUTPUT queue is the master that governs the set of
> +supported formats for the CAPTURE queue.
> diff --git a/Documentation/media/uapi/v4l/v4l2.rst b/Documentation/media/uapi/v4l/v4l2.rst
> index b89e5621ae69..563d5b861d1c 100644
> --- a/Documentation/media/uapi/v4l/v4l2.rst
> +++ b/Documentation/media/uapi/v4l/v4l2.rst
> @@ -53,6 +53,10 @@ Authors, in alphabetical order:
>  
>    - Original author of the V4L2 API and documentation.
>  
> +- Figa, Tomasz <tfiga@chromium.org>
> +
> +  - Documented parts of the V4L2 (stateful) Codec Interface. Migrated from Google Docs to kernel documentation.
> +
>  - H Schimek, Michael <mschimek@gmx.at>
>  
>    - Original author of the V4L2 API and documentation.
> @@ -65,6 +69,10 @@ Authors, in alphabetical order:
>  
>    - Designed and documented the multi-planar API.
>  
> +- Osciak, Pawel <posciak@chromium.org>
> +
> +  - Documented the V4L2 (stateful) Codec Interface.
> +
>  - Palosaari, Antti <crope@iki.fi>
>  
>    - SDR API.
> @@ -85,7 +93,7 @@ Authors, in alphabetical order:
>  
>    - Designed and documented the VIDIOC_LOG_STATUS ioctl, the extended control ioctls, major parts of the sliced VBI API, the MPEG encoder and decoder APIs and the DV Timings API.
>  
> -**Copyright** |copy| 1999-2016: Bill Dirks, Michael H. Schimek, Hans Verkuil, Martin Rubli, Andy Walls, Muralidharan Karicheri, Mauro Carvalho Chehab, Pawel Osciak, Sakari Ailus & Antti Palosaari.
> +**Copyright** |copy| 1999-2018: Bill Dirks, Michael H. Schimek, Hans Verkuil, Martin Rubli, Andy Walls, Muralidharan Karicheri, Mauro Carvalho Chehab, Pawel Osciak, Sakari Ailus, Antti Palosaari & Tomasz Figa.
>  
>  Except when explicitly stated as GPL, programming examples within this
>  part can be used and distributed without restrictions.
> @@ -94,6 +102,10 @@ part can be used and distributed without restrictions.
>  Revision History
>  ****************
>  
> +:revision: TBD / TBD (*tf*)
> +
> +Add specification of V4L2 Codec Interface UAPI.
> +
>  :revision: 4.10 / 2016-07-15 (*rr*)
>  
>  Introduce HSV formats.

regards
Philipp
Dave Stevenson June 5, 2018, 1:10 p.m. UTC | #2
Hi Tomasz.

Thanks for formalising this.
I'm working on a stateful V4L2 codec driver on the Raspberry Pi and
was having to deduce various implementation details from other
drivers. I know how much we all tend to hate having to write
documentation, but it is useful to have.

On 5 June 2018 at 11:33, Tomasz Figa <tfiga@chromium.org> wrote:
> Due to complexity of the video decoding process, the V4L2 drivers of
> stateful decoder hardware require specific sequences of V4L2 API calls
> to be followed. These include capability enumeration, initialization,
> decoding, seek, pause, dynamic resolution change, flush and end of
> stream.
>
> Specifics of the above have been discussed during Media Workshops at
> LinuxCon Europe 2012 in Barcelona and then later Embedded Linux
> Conference Europe 2014 in Düsseldorf. The de facto Codec API that
> originated at those events was later implemented by the drivers we already
> have merged in mainline, such as s5p-mfc or mtk-vcodec.
>
> The only thing missing was the real specification included as a part of
> Linux Media documentation. Fix it now and document the decoder part of
> the Codec API.
>
> Signed-off-by: Tomasz Figa <tfiga@chromium.org>
> ---
>  Documentation/media/uapi/v4l/dev-codec.rst | 771 +++++++++++++++++++++
>  Documentation/media/uapi/v4l/v4l2.rst      |  14 +-
>  2 files changed, 784 insertions(+), 1 deletion(-)
>
> diff --git a/Documentation/media/uapi/v4l/dev-codec.rst b/Documentation/media/uapi/v4l/dev-codec.rst
> index c61e938bd8dc..0483b10c205e 100644
> --- a/Documentation/media/uapi/v4l/dev-codec.rst
> +++ b/Documentation/media/uapi/v4l/dev-codec.rst
> @@ -34,3 +34,774 @@ the codec and reprogram it whenever another file handler gets access.
>  This is different from the usual video node behavior where the video
>  properties are global to the device (i.e. changing something through one
>  file handle is visible through another file handle).

I know this isn't part of the changes, but raises a question in
v4l2-compliance (so probably one for Hans).
testUnlimitedOpens tries opening the device 100 times. On a normal
device this isn't a significant overhead, but when you're allocating
resources on a per-instance basis, it quickly adds up.
Internally I have state that has a limit of 64 codec instances (either
encode or decode), so either I allocate at start_streaming and fail on
the 65th one, or I fail on open. I generally take the view that
failing early is a good thing.
Opinions? Is 100 instances of an M2M device really sensible?

> +This interface is generally appropriate for hardware that does not
> +require additional software involvement to parse/partially decode/manage
> +the stream before/after processing in hardware.
> +
> +Input data to the Stream API are buffers containing unprocessed video
> +stream (Annex-B H264/H265 stream, raw VP8/9 stream) only. The driver is
> +expected not to require any additional information from the client to
> +process these buffers, and to return decoded frames on the CAPTURE queue
> +in display order.

This intersects with the question I asked on the list back in April
but got no reply [1].
Is there a requirement or expectation for the encoded data to be
framed as a single encoded frame per buffer, or is feeding in full
buffer-sized chunks from an ES valid? It's not stated for the
description of V4L2_PIX_FMT_H264 etc either.
If not framed then anything assuming one-in one-out fails badly, but
it's likely to fail anyway if the stream has reference frames.

This description is also exclusive to video decode, whereas the top
section states "A V4L2 codec can compress, decompress, transform, or
otherwise convert video data". Should it be in the decoder section
below?

Have I missed a statement of what the Stream API is and how it differs
from any other API?

[1] https://www.spinics.net/lists/linux-media/msg133102.html

> +Performing software parsing, processing etc. of the stream in the driver
> +in order to support the Stream API is strongly discouraged. In such a case,
> +use of the Stateless Codec Interface (in development) is preferred.
> +
> +Conventions and notation used in this document
> +==============================================
> +
> +1. The general V4L2 API rules apply if not specified in this document
> +   otherwise.
> +
> +2. The meaning of words “must”, “may”, “should”, etc. is as per RFC
> +   2119.
> +
> +3. All steps not marked “optional” are required.
> +
> +4. :c:func:`VIDIOC_G_EXT_CTRLS`, :c:func:`VIDIOC_S_EXT_CTRLS` may be used interchangeably with
> +   :c:func:`VIDIOC_G_CTRL`, :c:func:`VIDIOC_S_CTRL`, unless specified otherwise.
> +
> +5. Single-plane API (see spec) and applicable structures may be used
> +   interchangeably with Multi-plane API, unless specified otherwise.
> +
> +6. i = [a..b]: sequence of integers from a to b, inclusive, i.e. i =
> +   [0..2]: i = 0, 1, 2.
> +
> +7. For OUTPUT buffer A, A’ represents a buffer on the CAPTURE queue
> +   containing data (decoded or encoded frame/stream) that resulted
> +   from processing buffer A.
> +
> +Glossary
> +========
> +
> +CAPTURE
> +   the destination buffer queue, decoded frames for
> +   decoders, encoded bitstream for encoders;
> +   ``V4L2_BUF_TYPE_VIDEO_CAPTURE`` or
> +   ``V4L2_BUF_TYPE_VIDEO_CAPTURE_MPLANE``
> +
> +client
> +   application client communicating with the driver
> +   implementing this API
> +
> +coded format
> +   encoded/compressed video bitstream format (e.g.
> +   H.264, VP8, etc.); see raw format; this is not equivalent to fourcc
> +   (V4L2 pixelformat), as each coded format may be supported by multiple
> +   fourccs (e.g. ``V4L2_PIX_FMT_H264``, ``V4L2_PIX_FMT_H264_SLICE``, etc.)
> +
> +coded height
> +   height for given coded resolution
> +
> +coded resolution
> +   stream resolution in pixels aligned to codec
> +   format and hardware requirements; see also visible resolution
> +
> +coded width
> +   width for given coded resolution
> +
> +decode order
> +   the order in which frames are decoded; may differ
> +   from display (output) order if frame reordering (B frames) is active in
> +   the stream; OUTPUT buffers must be queued in decode order; for frame
> +   API, CAPTURE buffers must be returned by the driver in decode order;
> +
> +display order
> +   the order in which frames must be displayed
> +   (outputted); for stream API, CAPTURE buffers must be returned by the
> +   driver in display order;
> +
> +EOS
> +   end of stream
> +
> +input height
> +   height in pixels for given input resolution
> +
> +input resolution
> +   resolution in pixels of source frames being input
> +   to the encoder and subject to further cropping to the bounds of visible
> +   resolution
> +
> +input width
> +   width in pixels for given input resolution
> +
> +OUTPUT
> +   the source buffer queue, encoded bitstream for
> +   decoders, raw frames for encoders; ``V4L2_BUF_TYPE_VIDEO_OUTPUT`` or
> +   ``V4L2_BUF_TYPE_VIDEO_OUTPUT_MPLANE``
> +
> +raw format
> +   uncompressed format containing raw pixel data (e.g.
> +   YUV, RGB formats)
> +
> +resume point
> +   a point in the bitstream from which decoding may
> +   start/continue, without any previous state/data present, e.g.: a
> +   keyframe (VPX) or SPS/PPS/IDR sequence (H.264); a resume point is
> +   required to start decode of a new stream, or to resume decoding after a
> +   seek;
> +
> +source buffer
> +   buffers allocated for source queue
> +
> +source queue
> +   queue containing buffers used for source data, i.e. the OUTPUT queue
> +
> +visible height
> +   height for given visible resolution
> +
> +visible resolution
> +   stream resolution of the visible picture, in
> +   pixels, to be used for display purposes; must be smaller than or equal
> +   to the coded resolution;
> +
> +visible width
> +   width for given visible resolution
> +
> +Decoder
> +=======
> +
> +Querying capabilities
> +---------------------
> +
> +1. To enumerate the set of coded formats supported by the driver, the
> +   client uses :c:func:`VIDIOC_ENUM_FMT` for OUTPUT. The driver must always
> +   return the full set of supported formats, irrespective of the
> +   format set on the CAPTURE queue.
> +
> +2. To enumerate the set of supported raw formats, the client uses
> +   :c:func:`VIDIOC_ENUM_FMT` for CAPTURE. The driver must return only the
> +   formats supported for the format currently set on the OUTPUT
> +   queue.
> +   In order to enumerate raw formats supported by a given coded
> +   format, the client must first set that coded format on the
> +   OUTPUT queue and then enumerate the CAPTURE queue.
> +
> +3. The client may use :c:func:`VIDIOC_ENUM_FRAMESIZES` to detect supported
> +   resolutions for a given format, passing its fourcc in
> +   :c:type:`v4l2_frmsizeenum` ``pixel_format``.
> +
> +   a. Values returned from :c:func:`VIDIOC_ENUM_FRAMESIZES` for coded formats
> +      must be maximums for given coded format for all supported raw
> +      formats.
> +
> +   b. Values returned from :c:func:`VIDIOC_ENUM_FRAMESIZES` for raw formats must
> +      be maximums for given raw format for all supported coded
> +      formats.

So in both these cases you expect index=0 to return a response with
the type V4L2_FRMSIZE_TYPE_DISCRETE, and the maximum resolution?
-EINVAL on any other index value?
And I assume you mean maximum coded resolution, not visible resolution.
Or is V4L2_FRMSIZE_TYPE_STEPWISE more appropriate? In which case the
minimum is presumably a single macroblock, max is the max coded
resolution, and step size is the macroblock size, at least on the
CAPTURE side.

> +   c. The client should derive the supported resolution for a
> +      combination of coded+raw format by calculating the
> +      intersection of resolutions returned from calls to
> +      :c:func:`VIDIOC_ENUM_FRAMESIZES` for the given coded and raw formats.
> +
> +4. Supported profiles and levels for given format, if applicable, may be
> +   queried using their respective controls via :c:func:`VIDIOC_QUERYCTRL`.
> +
> +5. The client may use :c:func:`VIDIOC_ENUM_FRAMEINTERVALS` to enumerate maximum
> +   supported framerates by the driver/hardware for a given
> +   format+resolution combination.
> +
> +Initialization sequence
> +-----------------------
> +
> +1. (optional) Enumerate supported OUTPUT formats and resolutions. See
> +   capability enumeration.
> +
> +2. Set a coded format on the source queue via :c:func:`VIDIOC_S_FMT`
> +
> +   a. Required fields:
> +
> +      i.   type = OUTPUT
> +
> +      ii.  fmt.pix_mp.pixelformat set to a coded format
> +
> +      iii. fmt.pix_mp.width, fmt.pix_mp.height only if they cannot be
> +           parsed from the stream for the given coded format;
> +           ignored otherwise;
> +
> +   b. Return values:
> +
> +      i.  EINVAL: unsupported format.
> +
> +      ii. Others: per spec
> +
> +   .. note::
> +
> +      The driver must not adjust pixelformat, so if
> +      ``V4L2_PIX_FMT_H264`` is passed but only
> +      ``V4L2_PIX_FMT_H264_SLICE`` is supported, S_FMT will return
> +      -EINVAL. If both are acceptable by client, calling S_FMT for
> +      the other after one gets rejected may be required (or use
> +      :c:func:`VIDIOC_ENUM_FMT` to discover beforehand, see Capability
> +      enumeration).

I can't find V4L2_PIX_FMT_H264_SLICE in mainline. From trying to build
Chromium I believe it's a Rockchip special. Is it being upstreamed?
Or use V4L2_PIX_FMT_H264 vs V4L2_PIX_FMT_H264_NO_SC as the example?
(I've just noticed I missed an instance of this further up as well).

> +3.  (optional) Get minimum number of buffers required for OUTPUT queue
> +    via :c:func:`VIDIOC_G_CTRL`. This is useful if client intends to use
> +    more buffers than minimum required by hardware/format (see
> +    allocation).
> +
> +    a. Required fields:
> +
> +       i. id = ``V4L2_CID_MIN_BUFFERS_FOR_OUTPUT``
> +
> +    b. Return values: per spec.
> +
> +    c. Return fields:
> +
> +       i. value: required number of OUTPUT buffers for the currently set
> +          format;
> +
> +4.  Allocate source (bitstream) buffers via :c:func:`VIDIOC_REQBUFS` on OUTPUT
> +    queue.
> +
> +    a. Required fields:
> +
> +       i.   count = n, where n > 0.
> +
> +       ii.  type = OUTPUT
> +
> +       iii. memory = as per spec
> +
> +    b. Return values: Per spec.
> +
> +    c. Return fields:
> +
> +       i. count: adjusted to allocated number of buffers
> +
> +    d. The driver must adjust count to minimum of required number of
> +       source buffers for given format and count passed. The client
> +       must check this value after the ioctl returns to get the
> +       number of buffers allocated.
> +
> +    .. note::
> +
> +       Passing count = 1 is useful for letting the driver choose
> +       the minimum according to the selected format/hardware
> +       requirements.
> +
> +    .. note::
> +
> +       To allocate more than minimum number of buffers (for pipeline
> +       depth), use G_CTRL(``V4L2_CID_MIN_BUFFERS_FOR_OUTPUT``) to
> +       get minimum number of buffers required by the driver/format,
> +       and pass the obtained value plus the number of additional
> +       buffers needed in count to :c:func:`VIDIOC_REQBUFS`.
> +
> +5.  Begin parsing the stream for stream metadata via :c:func:`VIDIOC_STREAMON` on
> +    OUTPUT queue. This step allows the driver to parse/decode
> +    initial stream metadata until enough information to allocate
> +    CAPTURE buffers is found. This is indicated by the driver by
> +    sending a ``V4L2_EVENT_SOURCE_CHANGE`` event, which the client
> +    must handle.
> +
> +    a. Required fields: as per spec.
> +
> +    b. Return values: as per spec.
> +
> +    .. note::
> +
> +       Calling :c:func:`VIDIOC_REQBUFS`, :c:func:`VIDIOC_STREAMON`
> +       or :c:func:`VIDIOC_G_FMT` on the CAPTURE queue at this time is not
> +       allowed and must return EINVAL.

I think you've just broken FFMpeg and Gstreamer with that statement.

Gstreamer certainly doesn't subscribe to V4L2_EVENT_SOURCE_CHANGE but
has already parsed the stream and set the output format to the correct
resolution via S_FMT. IIRC it expects the driver to copy that across
from output to capture which was an interesting niggle to find.
FFMpeg does subscribe to V4L2_EVENT_SOURCE_CHANGE, although it seems
to currently have a bug around coded resolution != visible resolution
when it gets the event.

One has to assume that these have been working quite happily against
various hardware platforms, so it seems a little unfair to just break
them.

So I guess my question is what is the reasoning for rejecting these
calls? If you know the resolution ahead of time, allocate buffers, and
start CAPTURE streaming before the event then should you be wrong
you're just going through the dynamic resolution change path described
later. If you're correct then you've saved some setup time. It also
avoids having to have a special startup case in the driver.

> +6.  This step only applies for coded formats that contain resolution
> +    information in the stream.
> +    Continue queuing/dequeuing bitstream buffers to/from the
> +    OUTPUT queue via :c:func:`VIDIOC_QBUF` and :c:func:`VIDIOC_DQBUF`. The driver
> +    must keep processing and returning each buffer to the client
> +    until required metadata to send a ``V4L2_EVENT_SOURCE_CHANGE``
> +    for source change type ``V4L2_EVENT_SRC_CH_RESOLUTION`` is
> +    found. There is no requirement to pass enough data for this to
> +    occur in the first buffer and the driver must be able to
> +    process any number

So back to my earlier question, we're supporting tiny fragments of
frames here? Or is the thought that you can pick up anywhere in a
stream and the decoder will wait for the required resume point?

> +    a. Required fields: as per spec.
> +
> +    b. Return values: as per spec.
> +
> +    c. If data in a buffer that triggers the event is required to decode
> +       the first frame, the driver must not return it to the client,
> +       but must retain it for further decoding.
> +
> +    d. Until the resolution source event is sent to the client, calling
> +       :c:func:`VIDIOC_G_FMT` on the CAPTURE queue must return -EINVAL.
> +
> +    .. note::
> +
> +       No decoded frames are produced during this phase.
> +
> +7.  This step only applies for coded formats that contain resolution
> +    information in the stream.
> +    Receive and handle ``V4L2_EVENT_SOURCE_CHANGE`` from the driver
> +    via :c:func:`VIDIOC_DQEVENT`. The driver must send this event once
> +    enough data is obtained from the stream to allocate CAPTURE
> +    buffers and to begin producing decoded frames.
> +
> +    a. Required fields:
> +
> +       i. type = ``V4L2_EVENT_SOURCE_CHANGE``
> +
> +    b. Return values: as per spec.
> +
> +    c. The driver must return u.src_change.changes =
> +       ``V4L2_EVENT_SRC_CH_RESOLUTION``.
> +
> +8.  This step only applies for coded formats that contain resolution
> +    information in the stream.
> +    Call :c:func:`VIDIOC_G_FMT` for CAPTURE queue to get format for the
> +    destination buffers parsed/decoded from the bitstream.
> +
> +    a. Required fields:
> +
> +       i. type = CAPTURE
> +
> +    b. Return values: as per spec.
> +
> +    c. Return fields:
> +
> +       i.   fmt.pix_mp.width, fmt.pix_mp.height: coded resolution
> +            for the decoded frames
> +
> +       ii.  fmt.pix_mp.pixelformat: default/required/preferred by
> +            driver pixelformat for decoded frames.
> +
> +       iii. num_planes: set to number of planes for pixelformat.
> +
> +       iv.  For each plane p = [0, num_planes-1]:
> +            plane_fmt[p].sizeimage, plane_fmt[p].bytesperline as
> +            per spec for coded resolution.
> +
> +    .. note::
> +
> +       Te value of pixelformat may be any pixel format supported,

s/Te/The

> +       and must
> +       be supported for current stream, based on the information
> +       parsed from the stream and hardware capabilities. It is
> +       suggested that driver chooses the preferred/optimal format
> +       for given configuration. For example, a YUV format may be
> +       preferred over an RGB format, if additional conversion step
> +       would be required.
> +
> +9.  (optional) Enumerate CAPTURE formats via :c:func:`VIDIOC_ENUM_FMT` on
> +    CAPTURE queue.
> +    Once the stream information is parsed and known, the client
> +    may use this ioctl to discover which raw formats are supported
> +    for given stream and select one of them via :c:func:`VIDIOC_S_FMT`.
> +
> +    a. Fields/return values as per spec.
> +
> +    .. note::
> +
> +       The driver must return only formats supported for the
> +       current stream parsed in this initialization sequence, even
> +       if more formats may be supported by the driver in general.
> +       For example, a driver/hardware may support YUV and RGB
> +       formats for resolutions 1920x1088 and lower, but only YUV for
> +       higher resolutions (e.g. due to memory bandwidth
> +       limitations). After parsing a resolution of 1920x1088 or
> +       lower, :c:func:`VIDIOC_ENUM_FMT` may return a set of YUV and RGB
> +       pixelformats, but after parsing resolution higher than
> +       1920x1088, the driver must not return (unsupported for this
> +       resolution) RGB.

There are some funny cases here then.
Whilst memory bandwidth may limit the resolution that can be decoded
in real-time, for a transcode use case you haven't got a real-time
requirement. Enforcing this means you can never transcode that
resolution to RGB.
Actually I can't see any information related to frame rates being
passed in other than timestamps, therefore the driver hasn't got
sufficient information to make a sensible call based on memory
bandwidth.
Perhaps it's just that the example of memory bandwidth being the
limitation is a bad one.

> +       However, subsequent resolution change event
> +       triggered after discovering a resolution change within the
> +       same stream may switch the stream into a lower resolution;
> +       :c:func:`VIDIOC_ENUM_FMT` must return RGB formats again in that case.
> +
> +10.  (optional) Choose a different CAPTURE format than suggested via
> +     :c:func:`VIDIOC_S_FMT` on CAPTURE queue. It is possible for the client
> +     to choose a different format than selected/suggested by the
> +     driver in :c:func:`VIDIOC_G_FMT`.
> +
> +     a. Required fields:
> +
> +        i.  type = CAPTURE
> +
> +        ii. fmt.pix_mp.pixelformat set to a raw format
> +
> +     b. Return values:
> +
> +        i. EINVAL: unsupported format.
> +
> +     c. Calling :c:func:`VIDIOC_ENUM_FMT` to discover currently available formats
> +        after receiving ``V4L2_EVENT_SOURCE_CHANGE`` is useful to find
> +        out a set of allowed pixelformats for given configuration,
> +        but not required.
> +
> +11. (optional) Acquire visible resolution via :c:func:`VIDIOC_G_SELECTION`.
> +
> +    a. Required fields:
> +
> +       i.  type = CAPTURE
> +
> +       ii. target = ``V4L2_SEL_TGT_CROP``
> +
> +    b. Return values: per spec.
> +
> +    c. Return fields
> +
> +       i. r.left, r.top, r.width, r.height: visible rectangle; this must
> +          fit within coded resolution returned from :c:func:`VIDIOC_G_FMT`.
> +
> +12. (optional) Get minimum number of buffers required for CAPTURE queue
> +    via :c:func:`VIDIOC_G_CTRL`. This is useful if client intends to use
> +    more buffers than minimum required by hardware/format (see
> +    allocation).
> +
> +    a. Required fields:
> +
> +       i. id = ``V4L2_CID_MIN_BUFFERS_FOR_CAPTURE``
> +
> +    b. Return values: per spec.
> +
> +    c. Return fields:
> +
> +       i. value: minimum number of buffers required to decode the stream
> +          parsed in this initialization sequence.
> +
> +    .. note::
> +
> +       Note that the minimum number of buffers must be at least the
> +       number required to successfully decode the current stream.
> +       This may for example be the required DPB size for an H.264
> +       stream given the parsed stream configuration (resolution,
> +       level).
> +
> +13. Allocate destination (raw format) buffers via :c:func:`VIDIOC_REQBUFS` on the
> +    CAPTURE queue.
> +
> +    a. Required fields:
> +
> +       i.   count = n, where n > 0.
> +
> +       ii.  type = CAPTURE
> +
> +       iii. memory = as per spec
> +
> +    b. Return values: Per spec.
> +
> +    c. Return fields:
> +
> +       i. count: adjusted to allocated number of buffers.
> +
> +    d. The driver must adjust count to minimum of required number of
> +       destination buffers for given format and stream configuration
> +       and the count passed. The client must check this value after
> +       the ioctl returns to get the number of buffers allocated.
> +
> +    .. note::
> +
> +       Passing count = 1 is useful for letting the driver choose
> +       the minimum.
> +
> +    .. note::
> +
> +       To allocate more than minimum number of buffers (for pipeline
> +       depth), use G_CTRL(``V4L2_CID_MIN_BUFFERS_FOR_CAPTURE``) to
> +       get minimum number of buffers required, and pass the obtained
> +       value plus the number of additional buffers needed in count
> +       to :c:func:`VIDIOC_REQBUFS`.
> +
> +14. Call :c:func:`VIDIOC_STREAMON` to initiate decoding frames.
> +
> +    a. Required fields: as per spec.
> +
> +    b. Return values: as per spec.
> +
> +Decoding
> +--------
> +
> +This state is reached after a successful initialization sequence. In
> +this state, client queues and dequeues buffers to both queues via
> +:c:func:`VIDIOC_QBUF` and :c:func:`VIDIOC_DQBUF`, as per spec.
> +
> +Both queues operate independently. The client may queue and dequeue
> +buffers to queues in any order and at any rate, also at a rate different
> +for each queue. The client may queue buffers within the same queue in
> +any order (V4L2 index-wise). It is recommended for the client to operate
> +the queues independently for best performance.

Only recommended sounds like a great case for clients to treat codecs
as one-in one-out, and then fall over if you get extra header byte
frames in the stream.

> +Source OUTPUT buffers must contain:
> +
> +-  H.264/AVC: one or more complete NALUs of an Annex B elementary
> +   stream; one buffer does not have to contain enough data to decode
> +   a frame;

This appears to be answering my earlier question, but doesn't it
belong in the definition of V4L2_PIX_FMT_H264 rather than buried in
the codec description?
I'm OK with that choice, but you are closing off the use case of
effectively cat'ing an ES into the codec to be decoded.

There's the other niggle of how to specify sizeimage in the
pixelformat for compressed data. I have never seen a satisfactory
answer in most of the APIs I've encountered (*). How big can an
I-frame be in a random stream? It may be a very badly coded stream,
but if other decoders can cope, then it's the decoder that can't which
will be seen to be buggy.

(*) OpenMAX IL is the exception as you can pass partial frames with
appropriate values in nFlags. Not many other positives one can say
about IL though.

> +-  VP8/VP9: one or more complete frames.
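
A sketch of how a client that reads an Annex B elementary stream in arbitrary chunks could keep each OUTPUT buffer to complete NAL units, as the H.264 rule above requires. find_start_code() and complete_nalu_bytes() are hypothetical helpers. Emulation prevention guarantees that 00 00 01 does not occur inside a NAL unit, so a plain byte scan finds the boundaries. At end of stream, the trailing NAL unit would be submitted as well.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Offset of the first 3-byte start code (00 00 01) at or after 'from', or len. */
static size_t find_start_code(const uint8_t *buf, size_t len, size_t from)
{
    assert(buf != NULL || len == 0);
    for (size_t i = from; i + 2 < len; i++)
        if (buf[i] == 0 && buf[i + 1] == 0 && buf[i + 2] == 1)
            return i;
    return len;
}

/*
 * Number of leading bytes of buf that consist of complete NAL units,
 * i.e. everything before the last start code. The NAL unit after the
 * last start code may be truncated, so it is kept for the next buffer.
 * A 4-byte start code (00 00 00 01) stays whole by backing up over
 * its leading zero byte.
 */
static size_t complete_nalu_bytes(const uint8_t *buf, size_t len)
{
    size_t last = 0;
    size_t pos = find_start_code(buf, len, 0);

    if (pos == len)
        return 0;  /* no start code at all */
    while (pos < len) {
        last = pos;
        pos = find_start_code(buf, len, pos + 3);
    }
    if (last > 0 && buf[last - 1] == 0)
        last--;
    return last;
}
```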
> +
> +No direct relationship between source and destination buffers and the
> +timing of buffers becoming available to dequeue should be assumed in the
> +Stream API. Specifically:
> +
> +-  a buffer queued to OUTPUT queue may result in no buffers being
> +   produced on the CAPTURE queue (e.g. if it does not contain
> +   encoded data, or if only metadata syntax structures are present
> +   in it), or one or more buffers produced on the CAPTURE queue (if
> +   the encoded data contained more than one frame, or if returning a
> +   decoded frame allowed the driver to return a frame that preceded
> +   it in decode, but succeeded it in display order)
> +
> +-  a buffer queued to OUTPUT may result in a buffer being produced on
> +   the CAPTURE queue later into decode process, and/or after
> +   processing further OUTPUT buffers, or be returned out of order,
> +   e.g. if display reordering is used
> +
> +-  buffers may become available on the CAPTURE queue without additional
> +   buffers queued to OUTPUT (e.g. during flush or EOS)
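Because the two queues are decoupled, a client usually waits on both at once rather than pairing each QBUF(OUTPUT) with a DQBUF(CAPTURE). The following is a minimal sketch, assuming the usual V4L2 mem2mem poll() semantics (POLLIN for a dequeueable CAPTURE buffer, POLLOUT for a dequeueable OUTPUT buffer, POLLPRI for a pending event). The names `decoder_ready`, `decoder_classify` and `decoder_wait` are hypothetical.

```c
#include <assert.h>
#include <poll.h>
#include <stdbool.h>

/* Hypothetical summary of what one poll() wakeup allows the client to do. */
struct decoder_ready {
    bool capture; /* a decoded frame can be dequeued (DQBUF on CAPTURE) */
    bool output;  /* a consumed bitstream buffer can be dequeued on OUTPUT */
    bool event;   /* an event (source change, EOS) is pending (DQEVENT) */
    bool error;   /* POLLERR, e.g. a queue is not streaming */
};

/* Map revents to actions. The queues are reported independently:
 * one OUTPUT buffer does not imply one CAPTURE buffer. */
static struct decoder_ready decoder_classify(short revents)
{
    struct decoder_ready r;

    r.capture = (revents & POLLIN) != 0;
    r.output = (revents & POLLOUT) != 0;
    r.event = (revents & POLLPRI) != 0;
    r.error = (revents & POLLERR) != 0;
    return r;
}

/* Wait on the decoder fd; all fields are false on timeout or failure. */
static struct decoder_ready decoder_wait(int fd, int timeout_ms)
{
    struct pollfd pfd = { .fd = fd, .events = POLLIN | POLLOUT | POLLPRI };
    struct decoder_ready none = { false, false, false, false };

    assert(timeout_ms >= -1); /* -1 waits forever, as in poll() */
    if (poll(&pfd, 1, timeout_ms) <= 0)
        return none;
    return decoder_classify(pfd.revents);
}
```

A wakeup may report any combination, for example a CAPTURE frame with no OUTPUT buffer returned, which is exactly the lack of 1:1 correspondence described above.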
> +
> +Seek
> +----
> +
> +Seek is controlled by the OUTPUT queue, as it is the source of bitstream
> +data. CAPTURE queue remains unchanged/unaffected.
> +
> +1. Stop the OUTPUT queue to begin the seek sequence via
> +   :c:func:`VIDIOC_STREAMOFF`.
> +
> +   a. Required fields:
> +
> +      i. type = OUTPUT
> +
> +   b. The driver must drop all the pending OUTPUT buffers and they are
> +      treated as returned to the client (as per spec).
> +
> +2. Restart the OUTPUT queue via :c:func:`VIDIOC_STREAMON`
> +
> +   a. Required fields:
> +
> +      i. type = OUTPUT
> +
> +   b. The driver must be put in a state after seek and be ready to
> +      accept new source bitstream buffers.
> +
> +3. Start queuing buffers to OUTPUT queue containing stream data after
> +   the seek until a suitable resume point is found.
> +
> +   .. note::
> +
> +      There is no requirement to begin queuing stream
> +      starting exactly from a resume point (e.g. SPS or a keyframe).
> +      The driver must handle any data queued and must keep processing
> +      the queued buffers until it finds a suitable resume point.
> +      While looking for a resume point, the driver processes OUTPUT
> +      buffers and returns them to the client without producing any
> +      decoded frames.
> +
> +4. After a resume point is found, the driver will start returning
> +   CAPTURE buffers with decoded frames.
> +
> +   .. note::
> +
> +      There is no precise specification for CAPTURE queue of when it
> +      will start producing buffers containing decoded data from
> +      buffers queued after the seek, as it operates independently
> +      from OUTPUT queue.
> +
> +      -  The driver is allowed to and may return a number of remaining CAPTURE
> +         buffers containing decoded frames from before the seek after the
> +         seek sequence (STREAMOFF-STREAMON) is performed.
> +
> +      -  The driver is also allowed to and may not return all decoded frames
> +         queued but not decoded before the seek sequence was initiated.
> +         E.g. for an OUTPUT queue sequence: QBUF(A), QBUF(B),
> +         STREAMOFF(OUT), STREAMON(OUT), QBUF(G), QBUF(H), any of the
> +         following results on the CAPTURE queue is allowed: {A’, B’, G’,
> +         H’}, {A’, G’, H’}, {G’, H’}.
> +
> +Pause
> +-----
> +
> +In order to pause, the client should just cease queuing buffers onto the
> +OUTPUT queue. This is different from the general V4L2 API definition of
> +pause, which involves calling :c:func:`VIDIOC_STREAMOFF` on the queue. Without
> +source bitstream data, there is not data to process and the hardware

s/not/no

> +remains idle. Conversely, using :c:func:`VIDIOC_STREAMOFF` on OUTPUT queue
> +indicates a seek, which 1) drops all buffers in flight and 2) after a
> +subsequent :c:func:`VIDIOC_STREAMON` will look for and only continue from a
> +resume point. This is usually undesirable for pause. The
> +STREAMOFF-STREAMON sequence is intended for seeking.
> +
> +Similarly, CAPTURE queue should remain streaming as well, as the
> +STREAMOFF-STREAMON sequence on it is intended solely for changing buffer
> +sets.
> +
> +Dynamic resolution change
> +-------------------------
> +
> +When driver encounters a resolution change in the stream, the dynamic
> +resolution change sequence is started.
> +
> +1.  On encountering a resolution change in the stream, the driver must
> +    first process and decode all remaining buffers from before the
> +    resolution change point.
> +
> +2.  After all buffers containing decoded frames from before the
> +    resolution change point are ready to be dequeued on the
> +    CAPTURE queue, the driver sends a ``V4L2_EVENT_SOURCE_CHANGE``
> +    event for source change type ``V4L2_EVENT_SRC_CH_RESOLUTION``.
> +    The last buffer from before the change must be marked with
> +    :c:type:`v4l2_buffer` ``flags`` flag ``V4L2_BUF_FLAG_LAST`` as in the flush
> +    sequence.

How does the driver ensure the last buffer gets that flag? You may not
have had the new header bytes queued to the OUTPUT queue before the
previous frame has been decoded and dequeued on the CAPTURE queue.
Empty buffer with the flag set?

> +    .. note::
> +
> +       Any attempts to dequeue more buffers beyond the buffer marked
> +       with ``V4L2_BUF_FLAG_LAST`` will result in a -EPIPE error from
> +       :c:func:`VIDIOC_DQBUF`.
> +
> +3.  After dequeuing all remaining buffers from the CAPTURE queue, the
> +    client must call :c:func:`VIDIOC_STREAMOFF` on the CAPTURE queue. The
> +    OUTPUT queue remains streaming (calling STREAMOFF on it would
> +    trigger a seek).
> +    Until STREAMOFF is called on the CAPTURE queue (acknowledging
> +    the event), the driver operates as if the resolution hasn’t
> +    changed yet, i.e. :c:func:`VIDIOC_G_FMT`, etc. return previous
> +    resolution.
> +
> +4.  The client frees the buffers on the CAPTURE queue using
> +    :c:func:`VIDIOC_REQBUFS`.
> +
> +    a. Required fields:
> +
> +       i.   count = 0
> +
> +       ii.  type = CAPTURE
> +
> +       iii. memory = as per spec
> +
> +5.  The client calls :c:func:`VIDIOC_G_FMT` for CAPTURE to get the new format
> +    information.
> +    This is identical to calling :c:func:`VIDIOC_G_FMT` after
> +    ``V4L2_EVENT_SRC_CH_RESOLUTION`` in the initialization
> +    sequence and should be handled similarly.
> +
> +    .. note::
> +
> +       It is allowed for the driver not to support the same
> +       pixelformat as previously used (before the resolution change)
> +       for the new resolution. The driver must select a default
> +       supported pixelformat and return it from :c:func:`VIDIOC_G_FMT`, and
> +       client must take note of it.
> +
> +6.  (optional) The client is allowed to enumerate available formats and
> +    select a different one than currently chosen (returned via
> +    :c:func:`VIDIOC_G_FMT`). This is identical to a corresponding step in
> +    the initialization sequence.
> +
> +7.  (optional) The client acquires visible resolution as in
> +    initialization sequence.
> +
> +8.  (optional) The client acquires minimum number of buffers as in
> +    initialization sequence.
> +
> +9.  The client allocates a new set of buffers for the CAPTURE queue via
> +    :c:func:`VIDIOC_REQBUFS`. This is identical to a corresponding step in
> +    the initialization sequence.
> +
> +10. The client resumes decoding by issuing :c:func:`VIDIOC_STREAMON` on the
> +    CAPTURE queue.
> +
> +During the resolution change sequence, the OUTPUT queue must remain
> +streaming. Calling :c:func:`VIDIOC_STREAMOFF` on OUTPUT queue will initiate seek.
> +
> +The OUTPUT queue operates separately from the CAPTURE queue for the
> +duration of the entire resolution change sequence. It is allowed (and
> +recommended for best performance and simplicity) for the client to keep
> +queuing/dequeuing buffers to/from OUTPUT queue even while processing
> +this sequence.
> +
> +.. note::
> +
> +   It is also possible for this sequence to be triggered without
> +   change in resolution if a different number of CAPTURE buffers is
> +   required in order to continue decoding the stream.
> +
> +Flush
> +-----
> +
> +Flush is the process of draining the CAPTURE queue of any remaining
> +buffers. After the flush sequence is complete, the client has received
> +all decoded frames for all OUTPUT buffers queued before the sequence was
> +started.
> +
> +1. Begin flush by issuing :c:func:`VIDIOC_DECODER_CMD`.
> +
> +   a. Required fields:
> +
> +      i. cmd = ``V4L2_DEC_CMD_STOP``
> +
> +2. The driver must process and decode as normal all OUTPUT buffers
> +   queued by the client before the :c:func:`VIDIOC_DECODER_CMD` was
> +   issued.
> +   Any operations triggered as a result of processing these
> +   buffers (including the initialization and resolution change
> +   sequences) must be processed as normal by both the driver and
> +   the client before proceeding with the flush sequence.
> +
> +3. Once all OUTPUT buffers queued before ``V4L2_DEC_CMD_STOP`` are
> +   processed:
> +
> +   a. If the CAPTURE queue is streaming, once all decoded frames (if
> +      any) are ready to be dequeued on the CAPTURE queue, the
> +      driver must send a ``V4L2_EVENT_EOS``. The driver must also
> +      set ``V4L2_BUF_FLAG_LAST`` in :c:type:`v4l2_buffer` ``flags`` field on the
> +      buffer on the CAPTURE queue containing the last frame (if
> +      any) produced as a result of processing the OUTPUT buffers
> +      queued before ``V4L2_DEC_CMD_STOP``. If no more frames are
> +      left to be returned at the point of handling
> +      ``V4L2_DEC_CMD_STOP``, the driver must return an empty buffer
> +      (with :c:type:`v4l2_buffer` ``bytesused`` = 0) as the last buffer with
> +      ``V4L2_BUF_FLAG_LAST`` set instead.
> +      Any attempts to dequeue more buffers beyond the buffer
> +      marked with ``V4L2_BUF_FLAG_LAST`` will result in a -EPIPE
> +      error from :c:func:`VIDIOC_DQBUF`.

I guess that answers my earlier question on resolution change when
there are no CAPTURE buffers left to be delivered.

> +   b. If the CAPTURE queue is NOT streaming, no action is necessary for
> +      CAPTURE queue and the driver must send a ``V4L2_EVENT_EOS``
> +      immediately after all OUTPUT buffers in question have been
> +      processed.
> +
> +4. To resume, client may issue ``V4L2_DEC_CMD_START``.
> +
> +End of stream
> +-------------
> +
> +When an explicit end of stream is encountered by the driver in the
> +stream, it must send a ``V4L2_EVENT_EOS`` to the client after all frames
> +are decoded and ready to be dequeued on the CAPTURE queue, with the
> +:c:type:`v4l2_buffer` ``flags`` set to ``V4L2_BUF_FLAG_LAST``. This behavior is
> +identical to the flush sequence as if triggered by the client via
> +``V4L2_DEC_CMD_STOP``.
> +
> +Commit points
> +-------------
> +
> +Setting formats and allocating buffers triggers changes in the behavior
> +of the driver.
> +
> +1. Setting format on OUTPUT queue may change the set of formats
> +   supported/advertised on the CAPTURE queue. It also must change
> +   the format currently selected on CAPTURE queue if it is not
> +   supported by the newly selected OUTPUT format to a supported one.
> +
> +2. Enumerating formats on CAPTURE queue must only return CAPTURE formats
> +   supported for the OUTPUT format currently set.
> +
> +3. Setting/changing format on CAPTURE queue does not change formats
> +   available on OUTPUT queue. An attempt to set CAPTURE format that
> +   is not supported for the currently selected OUTPUT format must
> +   result in an error (-EINVAL) from :c:func:`VIDIOC_S_FMT`.
> +
> +4. Enumerating formats on OUTPUT queue always returns a full set of
> +   supported formats, irrespective of the current format selected on
> +   CAPTURE queue.
> +
> +5. After allocating buffers on the OUTPUT queue, it is not possible to
> +   change format on it.
> +
> +To summarize, setting formats and allocation must always start with the
> +OUTPUT queue and the OUTPUT queue is the master that governs the set of
> +supported formats for the CAPTURE queue.
> diff --git a/Documentation/media/uapi/v4l/v4l2.rst b/Documentation/media/uapi/v4l/v4l2.rst
> index b89e5621ae69..563d5b861d1c 100644
> --- a/Documentation/media/uapi/v4l/v4l2.rst
> +++ b/Documentation/media/uapi/v4l/v4l2.rst
> @@ -53,6 +53,10 @@ Authors, in alphabetical order:
>
>    - Original author of the V4L2 API and documentation.
>
> +- Figa, Tomasz <tfiga@chromium.org>
> +
> +  - Documented parts of the V4L2 (stateful) Codec Interface. Migrated from Google Docs to kernel documentation.
> +
>  - H Schimek, Michael <mschimek@gmx.at>
>
>    - Original author of the V4L2 API and documentation.
> @@ -65,6 +69,10 @@ Authors, in alphabetical order:
>
>    - Designed and documented the multi-planar API.
>
> +- Osciak, Pawel <posciak@chromium.org>
> +
> +  - Documented the V4L2 (stateful) Codec Interface.
> +
>  - Palosaari, Antti <crope@iki.fi>
>
>    - SDR API.
> @@ -85,7 +93,7 @@ Authors, in alphabetical order:
>
>    - Designed and documented the VIDIOC_LOG_STATUS ioctl, the extended control ioctls, major parts of the sliced VBI API, the MPEG encoder and decoder APIs and the DV Timings API.
>
> -**Copyright** |copy| 1999-2016: Bill Dirks, Michael H. Schimek, Hans Verkuil, Martin Rubli, Andy Walls, Muralidharan Karicheri, Mauro Carvalho Chehab, Pawel Osciak, Sakari Ailus & Antti Palosaari.
> +**Copyright** |copy| 1999-2018: Bill Dirks, Michael H. Schimek, Hans Verkuil, Martin Rubli, Andy Walls, Muralidharan Karicheri, Mauro Carvalho Chehab, Pawel Osciak, Sakari Ailus & Antti Palosaari, Tomasz Figa.
>
>  Except when explicitly stated as GPL, programming examples within this
>  part can be used and distributed without restrictions.
> @@ -94,6 +102,10 @@ part can be used and distributed without restrictions.
>  Revision History
>  ****************
>
> +:revision: TBD / TBD (*tf*)
> +
> +Add specification of V4L2 Codec Interface UAPI.
> +
>  :revision: 4.10 / 2016-07-15 (*rr*)
>
>  Introduce HSV formats.
> --
> 2.17.1.1185.g55be947832-goog

Related to an earlier comment: whilst the driver has to support
multiple instances, there is no arbitration over the overall decode
rate with regard to real-time performance.
I know our hardware is capable of 1080P60, but there's no easy way to
stop someone trying to decode two 1080P60 streams simultaneously. From a
software perspective it'll do it, but not in real time. I'd assume
most other platforms will give similar behaviour.
Is it worth adding a note that real-time performance is not guaranteed
should multiple instances be running simultaneously, or a comment made
somewhere about expected performance? Or enforce it by knowing the max
data rates and analysing the level of each stream (please no)?

Thanks,
  Dave
Tomasz Figa June 5, 2018, 1:42 p.m. UTC | #3
Hi Philipp,

Thanks a lot for review.

On Tue, Jun 5, 2018 at 8:41 PM Philipp Zabel <p.zabel@pengutronix.de> wrote:
>
> Hi Tomasz,
>
> On Tue, 2018-06-05 at 19:33 +0900, Tomasz Figa wrote:
> > Due to complexity of the video decoding process, the V4L2 drivers of
> > stateful decoder hardware require specific sequencies of V4L2 API calls
> > to be followed. These include capability enumeration, initialization,
> > decoding, seek, pause, dynamic resolution change, flush and end of
> > stream.
> >
> > Specifics of the above have been discussed during Media Workshops at
> > LinuxCon Europe 2012 in Barcelona and then later Embedded Linux
> > Conference Europe 2014 in Düsseldorf. The de facto Codec API that
> > originated at those events was later implemented by the drivers we already
> > have merged in mainline, such as s5p-mfc or mtk-vcodec.
> >
> > The only thing missing was the real specification included as a part of
> > Linux Media documentation. Fix it now and document the decoder part of
> > the Codec API.
> >
> > Signed-off-by: Tomasz Figa <tfiga@chromium.org>
> > ---
> >  Documentation/media/uapi/v4l/dev-codec.rst | 771 +++++++++++++++++++++
> >  Documentation/media/uapi/v4l/v4l2.rst      |  14 +-
> >  2 files changed, 784 insertions(+), 1 deletion(-)
> >
> > diff --git a/Documentation/media/uapi/v4l/dev-codec.rst b/Documentation/media/uapi/v4l/dev-codec.rst
> > index c61e938bd8dc..0483b10c205e 100644
> > --- a/Documentation/media/uapi/v4l/dev-codec.rst
> > +++ b/Documentation/media/uapi/v4l/dev-codec.rst
> > @@ -34,3 +34,774 @@ the codec and reprogram it whenever another file handler gets access.
> >  This is different from the usual video node behavior where the video
> >  properties are global to the device (i.e. changing something through one
> >  file handle is visible through another file handle).
> > +
> > +This interface is generally appropriate for hardware that does not
> > +require additional software involvement to parse/partially decode/manage
> > +the stream before/after processing in hardware.
> > +
> > +Input data to the Stream API are buffers containing unprocessed video
> > +stream (Annex-B H264/H265 stream, raw VP8/9 stream) only. The driver is
> > +expected not to require any additional information from the client to
> > +process these buffers, and to return decoded frames on the CAPTURE queue
> > +in display order.
> > +
> > +Performing software parsing, processing etc. of the stream in the driver
> > +in order to support stream API is strongly discouraged. In such case use
> > +of Stateless Codec Interface (in development) is preferred.
> > +
> > +Conventions and notation used in this document
> > +==============================================
> > +
> > +1. The general V4L2 API rules apply if not specified in this document
> > +   otherwise.
> > +
> > +2. The meaning of words “must”, “may”, “should”, etc. is as per RFC
> > +   2119.
> > +
> > +3. All steps not marked “optional” are required.
> > +
> > +4. :c:func:`VIDIOC_G_EXT_CTRLS`, :c:func:`VIDIOC_S_EXT_CTRLS` may be used interchangeably with
> > +   :c:func:`VIDIOC_G_CTRL`, :c:func:`VIDIOC_S_CTRL`, unless specified otherwise.
> > +
> > +5. Single-plane API (see spec) and applicable structures may be used
> > +   interchangeably with Multi-plane API, unless specified otherwise.
> > +
> > +6. i = [a..b]: sequence of integers from a to b, inclusive, i.e. i =
> > +   [0..2]: i = 0, 1, 2.
> > +
> > +7. For OUTPUT buffer A, A’ represents a buffer on the CAPTURE queue
> > +   containing data (decoded or encoded frame/stream) that resulted
> > +   from processing buffer A.
> > +
> > +Glossary
> > +========
> > +
> > +CAPTURE
> > +   the destination buffer queue, decoded frames for
> > +   decoders, encoded bitstream for encoders;
> > +   ``V4L2_BUF_TYPE_VIDEO_CAPTURE`` or
> > +   ``V4L2_BUF_TYPE_VIDEO_CAPTURE_MPLANE``
> > +
> > +client
> > +   application client communicating with the driver
> > +   implementing this API
> > +
> > +coded format
> > +   encoded/compressed video bitstream format (e.g.
> > +   H.264, VP8, etc.); see raw format; this is not equivalent to fourcc
> > +   (V4L2 pixelformat), as each coded format may be supported by multiple
> > +   fourccs (e.g. ``V4L2_PIX_FMT_H264``, ``V4L2_PIX_FMT_H264_SLICE``, etc.)
> > +
> > +coded height
> > +   height for given coded resolution
> > +
> > +coded resolution
> > +   stream resolution in pixels aligned to codec
> > +   format and hardware requirements; see also visible resolution
> > +
> > +coded width
> > +   width for given coded resolution
> > +
> > +decode order
> > +   the order in which frames are decoded; may differ
> > +   from display (output) order if frame reordering (B frames) is active in
> > +   the stream; OUTPUT buffers must be queued in decode order; for frame
> > +   API, CAPTURE buffers must be returned by the driver in decode order;
> > +
> > +display order
> > +   the order in which frames must be displayed
> > +   (outputted); for stream API, CAPTURE buffers must be returned by the
> > +   driver in display order;
> > +
> > +EOS
> > +   end of stream
> > +
> > +input height
> > +   height in pixels for given input resolution
> > +
> > +input resolution
> > +   resolution in pixels of source frames being input
> > +   to the encoder and subject to further cropping to the bounds of visible
> > +   resolution
> > +
> > +input width
> > +   width in pixels for given input resolution
> > +
> > +OUTPUT
> > +   the source buffer queue, encoded bitstream for
> > +   decoders, raw frames for encoders; ``V4L2_BUF_TYPE_VIDEO_OUTPUT`` or
> > +   ``V4L2_BUF_TYPE_VIDEO_OUTPUT_MPLANE``
> > +
> > +raw format
> > +   uncompressed format containing raw pixel data (e.g.
> > +   YUV, RGB formats)
> > +
> > +resume point
> > +   a point in the bitstream from which decoding may
> > +   start/continue, without any previous state/data present, e.g.: a
> > +   keyframe (VPX) or SPS/PPS/IDR sequence (H.264); a resume point is
> > +   required to start decode of a new stream, or to resume decoding after a
> > +   seek;
> > +
> > +source buffer
> > +   buffers allocated for source queue
> > +
> > +source queue
> > +   queue containing buffers used for source data, i.e.
> > +
> > +visible height
> > +   height for given visible resolution
> > +
> > +visible resolution
> > +   stream resolution of the visible picture, in
> > +   pixels, to be used for display purposes; must be smaller than or equal to
> > +   coded resolution;
> > +
> > +visible width
> > +   width for given visible resolution
> > +
> > +Decoder
> > +=======
> > +
> > +Querying capabilities
> > +---------------------
> > +
> > +1. To enumerate the set of coded formats supported by the driver, the
> > +   client uses :c:func:`VIDIOC_ENUM_FMT` for OUTPUT. The driver must always
> > +   return the full set of supported formats, irrespective of the
> > +   format set on the CAPTURE queue.
> > +
> > +2. To enumerate the set of supported raw formats, the client uses
> > +   :c:func:`VIDIOC_ENUM_FMT` for CAPTURE. The driver must return only the
> > +   formats supported for the format currently set on the OUTPUT
> > +   queue.
> > +   In order to enumerate raw formats supported by a given coded
> > +   format, the client must first set that coded format on the
> > +   OUTPUT queue and then enumerate the CAPTURE queue.
> > +
> > +3. The client may use :c:func:`VIDIOC_ENUM_FRAMESIZES` to detect supported
> > +   resolutions for a given format, passing its fourcc in
> > +   :c:type:`v4l2_frmsizeenum` ``pixel_format``.
>
> Is this a must-implement for drivers? coda currently doesn't implement
> enum-framesizes.

I'll leave this to Pawel. This might be one of the things that we
didn't get to implement in upstream in the end.

>
> > +
> > +   a. Values returned from :c:func:`VIDIOC_ENUM_FRAMESIZES` for coded formats
> > +      must be maximums for given coded format for all supported raw
> > +      formats.
>
> I don't understand what maximums means in this context.
>
> If I have a decoder that can decode from 16x16 up to 1920x1088, should
> this return a continuous range from minimum frame size to maximum frame
> size?

Looks like the wording here is a bit off. It should be as you say +/-
alignment requirements, which can be specified by using
v4l2_frmsize_stepwise. Hardware that supports only a fixed set of
resolutions (if such exists), should use v4l2_frmsize_discrete.
Basically this should follow the standard description of
VIDIOC_ENUM_FRAMESIZES.

>
> > +   b. Values returned from :c:func:`VIDIOC_ENUM_FRAMESIZES` for raw formats must
> > +      be maximums for given raw format for all supported coded
> > +      formats.
>
> Same here, this is unclear to me.

Should be as above, i.e. according to standard operation of
VIDIOC_ENUM_FRAMESIZES.

>
> > +   c. The client should derive the supported resolution for a
> > +      combination of coded+raw format by calculating the
> > +      intersection of resolutions returned from calls to
> > +      :c:func:`VIDIOC_ENUM_FRAMESIZES` for the given coded and raw formats.
> > +
> > +4. Supported profiles and levels for given format, if applicable, may be
> > +   queried using their respective controls via :c:func:`VIDIOC_QUERYCTRL`.
> > +
> > +5. The client may use :c:func:`VIDIOC_ENUM_FRAMEINTERVALS` to enumerate maximum
> > +   supported framerates by the driver/hardware for a given
> > +   format+resolution combination.
>
> Same as above, is this must-implement for decoder drivers?

Leaving this to Pawel.

>
> > +
> > +Initialization sequence
> > +-----------------------
> > +
> > +1. (optional) Enumerate supported OUTPUT formats and resolutions. See
> > +   capability enumeration.
> > +
> > +2. Set a coded format on the source queue via :c:func:`VIDIOC_S_FMT`
> > +
> > +   a. Required fields:
> > +
> > +      i.   type = OUTPUT
> > +
> > +      ii.  fmt.pix_mp.pixelformat set to a coded format
> > +
> > +      iii. fmt.pix_mp.width, fmt.pix_mp.height only if cannot be
> > +           parsed from the stream for the given coded format;
> > +           ignored otherwise;
>
> When this is set, does this also update the format on the CAPTURE queue
> (i.e. would G_FMT(CAP), S_FMT(OUT), G_FMT(CAP) potentially return
> different CAP formats?) I think this should be explained here.

Yes, it would. Agreed that it should be explicitly mentioned here.

>
> What about colorimetry, does setting colorimetry here overwrite
> colorimetry information that may potentially be contained in the stream?

I'd say that if the hardware/driver can't report such information,
CAPTURE queue should report V4L2_COLORSPACE_DEFAULT and userspace
should take care of determining the right one (or using a default one)
on its own. This would eliminate the need to set anything on OUTPUT
queue.

Actually, when I think of it now, I wonder if we really should be
setting resolution here for bitstream formats that don't include
resolution, rather than on CAPTURE queue. Pawel, could you clarify
what was the intention here?

>
> > +   b. Return values:
> > +
> > +      i.  EINVAL: unsupported format.
> > +
> > +      ii. Others: per spec
> > +
> > +   .. note::
> > +
> > +      The driver must not adjust pixelformat, so if
> > +      ``V4L2_PIX_FMT_H264`` is passed but only
> > +      ``V4L2_PIX_FMT_H264_SLICE`` is supported, S_FMT will return
> > +      -EINVAL. If both are acceptable by client, calling S_FMT for
> > +      the other after one gets rejected may be required (or use
> > +      :c:func:`VIDIOC_ENUM_FMT` to discover beforehand, see Capability
> > +      enumeration).
> > +
> > +3.  (optional) Get minimum number of buffers required for OUTPUT queue
> > +    via :c:func:`VIDIOC_G_CTRL`. This is useful if client intends to use
> > +    more buffers than minimum required by hardware/format (see
> > +    allocation).
> > +
> > +    a. Required fields:
> > +
> > +       i. id = ``V4L2_CID_MIN_BUFFERS_FOR_OUTPUT``
> > +
> > +    b. Return values: per spec.
> > +
> > +    c. Return fields:
> > +
> > +       i. value: required number of OUTPUT buffers for the currently set
> > +          format;
> > +
> > +4.  Allocate source (bitstream) buffers via :c:func:`VIDIOC_REQBUFS` on OUTPUT
> > +    queue.
> > +
> > +    a. Required fields:
> > +
> > +       i.   count = n, where n > 0.
> > +
> > +       ii.  type = OUTPUT
> > +
> > +       iii. memory = as per spec
> > +
> > +    b. Return values: Per spec.
> > +
> > +    c. Return fields:
> > +
> > +       i. count: adjusted to allocated number of buffers
> > +
> > +    d. The driver must adjust count to the maximum of the number of
> > +       source buffers required for the given format and the count
> > +       passed. The client must check this value after the ioctl
> > +       returns to get the number of buffers allocated.
> > +
> > +    .. note::
> > +
> > +       Passing count = 1 is useful for letting the driver choose
> > +       the minimum according to the selected format/hardware
> > +       requirements.
> > +
> > +    .. note::
> > +
> > +       To allocate more than minimum number of buffers (for pipeline
> > +       depth), use G_CTRL(``V4L2_CID_MIN_BUFFERS_FOR_OUTPUT``) to
> > +       get minimum number of buffers required by the driver/format,
> > +       and pass the obtained value plus the number of additional
> > +       buffers needed in count to :c:func:`VIDIOC_REQBUFS`.
> > +
> > +5.  Begin parsing the stream for stream metadata via :c:func:`VIDIOC_STREAMON` on
> > +    OUTPUT queue. This step allows the driver to parse/decode
> > +    initial stream metadata until enough information to allocate
> > +    CAPTURE buffers is found. This is indicated by the driver by
> > +    sending a ``V4L2_EVENT_SOURCE_CHANGE`` event, which the client
> > +    must handle.
> > +
> > +    a. Required fields: as per spec.
> > +
> > +    b. Return values: as per spec.
> > +
> > +    .. note::
> > +
> > +       Calling :c:func:`VIDIOC_REQBUFS`, :c:func:`VIDIOC_STREAMON`
> > +       or :c:func:`VIDIOC_G_FMT` on the CAPTURE queue at this time is not
> > +       allowed and must return EINVAL.
>
> What about devices that have a frame buffer registration step before
> stream start? For coda I need to know all CAPTURE buffers before I can
> start streaming, because there is no way to register them after
> STREAMON. Do I have to split the driver internally to do streamoff and
> restart when the capture queue is brought up?

Do you mean that the hardware requires registering framebuffers before
the headers are parsed and resolution is detected? That sounds quite
unusual.

Other drivers would:
1) parse the header on STREAMON(OUTPUT),
2) report resolution to userspace,
3) have framebuffers allocated in REQBUFS(CAPTURE),
4) register framebuffers in STREAMON(CAPTURE).

>
> > +6.  This step only applies for coded formats that contain resolution
> > +    information in the stream.
> > +    Continue queuing/dequeuing bitstream buffers to/from the
> > +    OUTPUT queue via :c:func:`VIDIOC_QBUF` and :c:func:`VIDIOC_DQBUF`. The driver
> > +    must keep processing and returning each buffer to the client
> > +    until required metadata to send a ``V4L2_EVENT_SOURCE_CHANGE``
> > +    for source change type ``V4L2_EVENT_SRC_CH_RESOLUTION`` is
> > +    found. There is no requirement to pass enough data for this to
> > +    occur in the first buffer and the driver must be able to
> > +    process any number
> > +
> > +    a. Required fields: as per spec.
> > +
> > +    b. Return values: as per spec.
> > +
> > +    c. If data in a buffer that triggers the event is required to decode
> > +       the first frame, the driver must not return it to the client,
> > +       but must retain it for further decoding.
> > +
> > +    d. Until the resolution source event is sent to the client, calling
> > +       :c:func:`VIDIOC_G_FMT` on the CAPTURE queue must return -EINVAL.
> > +
> > +    .. note::
> > +
> > +       No decoded frames are produced during this phase.
> > +
> > +7.  This step only applies for coded formats that contain resolution
> > +    information in the stream.
> > +    Receive and handle ``V4L2_EVENT_SOURCE_CHANGE`` from the driver
> > +    via :c:func:`VIDIOC_DQEVENT`. The driver must send this event once
> > +    enough data is obtained from the stream to allocate CAPTURE
> > +    buffers and to begin producing decoded frames.
> > +
> > +    a. Required fields:
> > +
> > +       i. type = ``V4L2_EVENT_SOURCE_CHANGE``
> > +
> > +    b. Return values: as per spec.
> > +
> > +    c. The driver must return u.src_change.changes =
> > +       ``V4L2_EVENT_SRC_CH_RESOLUTION``.
> > +
> > +8.  This step only applies for coded formats that contain resolution
> > +    information in the stream.
> > +    Call :c:func:`VIDIOC_G_FMT` for CAPTURE queue to get format for the
> > +    destination buffers parsed/decoded from the bitstream.
> > +
> > +    a. Required fields:
> > +
> > +       i. type = CAPTURE
> > +
> > +    b. Return values: as per spec.
> > +
> > +    c. Return fields:
> > +
> > +       i.   fmt.pix_mp.width, fmt.pix_mp.height: coded resolution
> > +            for the decoded frames
> > +
> > +       ii.  fmt.pix_mp.pixelformat: default/required/preferred by
> > +            driver pixelformat for decoded frames.
>
> This text is specific to multiplanar queues, what about singleplanar
> drivers?

Should be the same. There was "+5. Single-plane API (see spec) and
applicable structures may be used interchangeably with Multi-plane
API, unless specified otherwise." mentioned at the beginning of the
documentation, but I guess we could just make the description generic
instead.

>
> > +
> > +       iii. num_planes: set to number of planes for pixelformat.
> > +
> > +       iv.  For each plane p = [0, num_planes-1]:
> > +            plane_fmt[p].sizeimage, plane_fmt[p].bytesperline as
> > +            per spec for coded resolution.
> > +
> > +    .. note::
> > +
> > +       Te value of pixelformat may be any pixel format supported,
>
> Typo, "The value ..."

Thanks, will fix.

>
> > +       and must
> > +       be supported for current stream, based on the information
> > +       parsed from the stream and hardware capabilities. It is
> > +       suggested that driver chooses the preferred/optimal format
> > +       for given configuration. For example, a YUV format may be
> > +       preferred over an RGB format, if additional conversion step
> > +       would be required.
> > +
> > +9.  (optional) Enumerate CAPTURE formats via :c:func:`VIDIOC_ENUM_FMT` on
> > +    CAPTURE queue.
> > +    Once the stream information is parsed and known, the client
> > +    may use this ioctl to discover which raw formats are supported
> > +    for given stream and select one of them via :c:func:`VIDIOC_S_FMT`.
> > +
> > +    a. Fields/return values as per spec.
> > +
> > +    .. note::
> > +
> > +       The driver must return only formats supported for the
> > +       current stream parsed in this initialization sequence, even
> > +       if more formats may be supported by the driver in general.
> > +       For example, a driver/hardware may support YUV and RGB
> > +       formats for resolutions 1920x1088 and lower, but only YUV for
> > +       higher resolutions (e.g. due to memory bandwidth
> > +       limitations). After parsing a resolution of 1920x1088 or
> > +       lower, :c:func:`VIDIOC_ENUM_FMT` may return a set of YUV and RGB
> > +       pixelformats, but after parsing resolution higher than
> > +       1920x1088, the driver must not return (unsupported for this
> > +       resolution) RGB.
> > +
> > +       However, subsequent resolution change event
> > +       triggered after discovering a resolution change within the
> > +       same stream may switch the stream into a lower resolution;
> > +       :c:func:`VIDIOC_ENUM_FMT` must return RGB formats again in that case.
> > +
> > +10.  (optional) Choose a different CAPTURE format than suggested via
> > +     :c:func:`VIDIOC_S_FMT` on CAPTURE queue. It is possible for the client
> > +     to choose a different format than selected/suggested by the
> > +     driver in :c:func:`VIDIOC_G_FMT`.
> > +
> > +     a. Required fields:
> > +
> > +        i.  type = CAPTURE
> > +
> > +        ii. fmt.pix_mp.pixelformat set to a raw format
> > +
> > +     b. Return values:
> > +
> > +        i. EINVAL: unsupported format.
> > +
> > +     c. Calling :c:func:`VIDIOC_ENUM_FMT` to discover currently available formats
> > +        after receiving ``V4L2_EVENT_SOURCE_CHANGE`` is useful to find
> > +        out a set of allowed pixelformats for given configuration,
> > +        but not required.
>
> What about colorimetry? Should this and TRY_FMT only allow colorimetry
> that is parsed from the stream, if available, or that was set via
> S_FMT(OUT) as an override?

I'd say this depends on the hardware. If it can convert the video into
the desired color space, it could be allowed.

>
> > +11.  (optional) Acquire visible resolution via :c:func:`VIDIOC_G_SELECTION`.
> > +
> > +    a. Required fields:
> > +
> > +       i.  type = CAPTURE
> > +
> > +       ii. target = ``V4L2_SEL_TGT_CROP``
> > +
> > +    b. Return values: per spec.
> > +
> > +    c. Return fields
> > +
> > +       i. r.left, r.top, r.width, r.height: visible rectangle; this must
> > +          fit within coded resolution returned from :c:func:`VIDIOC_G_FMT`.
>
> Isn't CROP supposed to be set on the OUTPUT queue only and COMPOSE on
> the CAPTURE queue?

Why? Both CROP and COMPOSE can be used on any queue, if supported by
given interface.

However, on codecs, since OUTPUT queue is a bitstream, I don't think
selection makes sense there.

> I would expect COMPOSE/COMPOSE_DEFAULT to be set to the visible
> rectangle and COMPOSE_PADDED to be set to the rectangle that the
> hardware actually overwrites.

Yes, that's a good point. I'd also say that CROP/CROP_DEFAULT should
be set to the visible rectangle as well, to allow adding handling for
cases when the hardware can actually do further cropping.

>
> > +12. (optional) Get minimum number of buffers required for CAPTURE queue
> > +    via :c:func:`VIDIOC_G_CTRL`. This is useful if client intends to use
> > +    more buffers than minimum required by hardware/format (see
> > +    allocation).
> > +
> > +    a. Required fields:
> > +
> > +       i. id = ``V4L2_CID_MIN_BUFFERS_FOR_CAPTURE``
> > +
> > +    b. Return values: per spec.
> > +
> > +    c. Return fields:
> > +
> > +       i. value: minimum number of buffers required to decode the stream
> > +          parsed in this initialization sequence.
> > +
> > +    .. note::
> > +
> > +       Note that the minimum number of buffers must be at least the
> > +       number required to successfully decode the current stream.
> > +       This may for example be the required DPB size for an H.264
> > +       stream given the parsed stream configuration (resolution,
> > +       level).
> > +
> > +13. Allocate destination (raw format) buffers via :c:func:`VIDIOC_REQBUFS` on the
> > +    CAPTURE queue.
> > +
> > +    a. Required fields:
> > +
> > +       i.   count = n, where n > 0.
> > +
> > +       ii.  type = CAPTURE
> > +
> > +       iii. memory = as per spec
> > +
> > +    b. Return values: Per spec.
> > +
> > +    c. Return fields:
> > +
> > +       i. count: adjusted to allocated number of buffers.
> > +
> > +    d. The driver must adjust count to the larger of the minimum
> > +       number of destination buffers required for the given format
> > +       and stream configuration and the count passed. The client must
> > +       check this value after the ioctl returns to get the number of
> > +       buffers allocated.
> > +
> > +    .. note::
> > +
> > +       Passing count = 1 is useful for letting the driver choose
> > +       the minimum.
> > +
> > +    .. note::
> > +
> > +       To allocate more than minimum number of buffers (for pipeline
> > +       depth), use G_CTRL(``V4L2_CID_MIN_BUFFERS_FOR_CAPTURE``) to
> > +       get minimum number of buffers required, and pass the obtained
> > +       value plus the number of additional buffers needed in count
> > +       to :c:func:`VIDIOC_REQBUFS`.
> > +
> > +14. Call :c:func:`VIDIOC_STREAMON` to initiate decoding frames.
> > +
> > +    a. Required fields: as per spec.
> > +
> > +    b. Return values: as per spec.
> > +
> > +Decoding
> > +--------
> > +
> > +This state is reached after a successful initialization sequence. In
> > +this state, client queues and dequeues buffers to both queues via
> > +:c:func:`VIDIOC_QBUF` and :c:func:`VIDIOC_DQBUF`, as per spec.
> > +
> > +Both queues operate independently. The client may queue and dequeue
> > +buffers to queues in any order and at any rate, also at a rate different
> > +for each queue. The client may queue buffers within the same queue in
> > +any order (V4L2 index-wise). It is recommended for the client to operate
> > +the queues independently for best performance.
> > +
> > +Source OUTPUT buffers must contain:
> > +
> > +-  H.264/AVC: one or more complete NALUs of an Annex B elementary
> > +   stream; one buffer does not have to contain enough data to decode
> > +   a frame;
>
> What if the hardware only supports handling complete frames?

Pawel, could you help with this?

>
> > +-  VP8/VP9: one or more complete frames.
> > +
> > +No direct relationship between source and destination buffers and the
> > +timing of buffers becoming available to dequeue should be assumed in the
> > +Stream API. Specifically:
> > +
> > +-  a buffer queued to OUTPUT queue may result in no buffers being
> > +   produced on the CAPTURE queue (e.g. if it does not contain
> > +   encoded data, or if only metadata syntax structures are present
> > +   in it), or one or more buffers produced on the CAPTURE queue (if
> > +   the encoded data contained more than one frame, or if returning a
> > +   decoded frame allowed the driver to return a frame that preceded
> > +   it in decode, but succeeded it in display order)
> > +
> > +-  a buffer queued to OUTPUT may result in a buffer being produced on
> > +   the CAPTURE queue later into decode process, and/or after
> > +   processing further OUTPUT buffers, or be returned out of order,
> > +   e.g. if display reordering is used
> > +
> > +-  buffers may become available on the CAPTURE queue without additional
> > +   buffers queued to OUTPUT (e.g. during flush or EOS)
> > +
> > +Seek
> > +----
> > +
> > +Seek is controlled by the OUTPUT queue, as it is the source of bitstream
> > +data. CAPTURE queue remains unchanged/unaffected.
>
> Does this mean that to achieve instantaneous seeks the driver has to
> flush its CAPTURE queue internally when a seek is issued?

That's a good point. I'd say that we might actually want the userspace
to restart the capture queue in such case. Pawel, do you have any
opinion on this?

>
> > +
> > +1. Stop the OUTPUT queue to begin the seek sequence via
> > +   :c:func:`VIDIOC_STREAMOFF`.
> > +
> > +   a. Required fields:
> > +
> > +      i. type = OUTPUT
> > +
> > +   b. The driver must drop all the pending OUTPUT buffers and they are
> > +      treated as returned to the client (as per spec).
>
> What about pending CAPTURE buffers that the client may not yet have
> dequeued?

Just as written here: nothing happens to them, since the "CAPTURE
queue remains unchanged/unaffected". :)

>
> > +
> > +2. Restart the OUTPUT queue via :c:func:`VIDIOC_STREAMON`
> > +
> > +   a. Required fields:
> > +
> > +      i. type = OUTPUT
> > +
> > +   b. The driver must be put in a state after seek and be ready to
> > +      accept new source bitstream buffers.
> > +
> > +3. Start queuing buffers to OUTPUT queue containing stream data after
> > +   the seek until a suitable resume point is found.
> > +
> > +   .. note::
> > +
> > +      There is no requirement to begin queuing stream
> > +      starting exactly from a resume point (e.g. SPS or a keyframe).
> > +      The driver must handle any data queued and must keep processing
> > +      the queued buffers until it finds a suitable resume point.
> > +      While looking for a resume point, the driver processes OUTPUT
> > +      buffers and returns them to the client without producing any
> > +      decoded frames.
> > +
> > +4. After a resume point is found, the driver will start returning
> > +   CAPTURE buffers with decoded frames.
> > +
> > +   .. note::
> > +
> > +      There is no precise specification for CAPTURE queue of when it
> > +      will start producing buffers containing decoded data from
> > +      buffers queued after the seek, as it operates independently
> > +      from OUTPUT queue.
> > +
> > +      -  The driver is allowed to and may return a number of remaining CAPTURE
> > +         buffers containing decoded frames from before the seek after the
> > +         seek sequence (STREAMOFF-STREAMON) is performed.
>
> Oh, ok. That answers my last question above.
>
> > +      -  The driver is also allowed to and may not return all decoded frames
> > +         queued but not decoded before the seek sequence was initiated.
> > +         E.g. for an OUTPUT queue sequence: QBUF(A), QBUF(B),
> > +         STREAMOFF(OUT), STREAMON(OUT), QBUF(G), QBUF(H), any of the
> > +         following results on the CAPTURE queue is allowed: {A’, B’, G’,
> > +         H’}, {A’, G’, H’}, {G’, H’}.
> > +
> > +Pause
> > +-----
> > +
> > +In order to pause, the client should just cease queuing buffers onto the
> > +OUTPUT queue. This is different from the general V4L2 API definition of
> > +pause, which involves calling :c:func:`VIDIOC_STREAMOFF` on the queue. Without
> > +source bitstream data, there is no data to process and the hardware
> > +remains idle. Conversely, using :c:func:`VIDIOC_STREAMOFF` on OUTPUT queue
> > +indicates a seek, which 1) drops all buffers in flight and 2) after a
>
> "... 1) drops all OUTPUT buffers in flight ... " ?

Yeah, although it's kind of inferred from the standard behavior of
VIDIOC_STREAMOFF on given queue.

>
> > +subsequent :c:func:`VIDIOC_STREAMON` will look for and only continue from a
> > +resume point. This is usually undesirable for pause. The
> > +STREAMOFF-STREAMON sequence is intended for seeking.
> > +
> > +Similarly, CAPTURE queue should remain streaming as well, as the
> > +STREAMOFF-STREAMON sequence on it is intended solely for changing buffer
> > +sets.
> > +
> > +Dynamic resolution change
> > +-------------------------
> > +
> > +When driver encounters a resolution change in the stream, the dynamic
> > +resolution change sequence is started.
>
> Must all drivers support dynamic resolution change?

I'd say no, but I guess that would mean that the driver never
encounters it, because hardware wouldn't report it.

I wonder what would happen in such a case, though. Obviously decoding of such
a stream couldn't continue without support in the driver.

>
> > +1.  On encountering a resolution change in the stream, the driver must
> > +    first process and decode all remaining buffers from before the
> > +    resolution change point.
> > +
> > +2.  After all buffers containing decoded frames from before the
> > +    resolution change point are ready to be dequeued on the
> > +    CAPTURE queue, the driver sends a ``V4L2_EVENT_SOURCE_CHANGE``
> > +    event for source change type ``V4L2_EVENT_SRC_CH_RESOLUTION``.
> > +    The last buffer from before the change must be marked with
> > +    :c:type:`v4l2_buffer` ``flags`` flag ``V4L2_BUF_FLAG_LAST`` as in the flush
> > +    sequence.
> > +
> > +    .. note::
> > +
> > +       Any attempts to dequeue more buffers beyond the buffer marked
> > +       with ``V4L2_BUF_FLAG_LAST`` will result in a -EPIPE error from
> > +       :c:func:`VIDIOC_DQBUF`.
> > +
> > +3.  After dequeuing all remaining buffers from the CAPTURE queue, the
> > +    client must call :c:func:`VIDIOC_STREAMOFF` on the CAPTURE queue. The
> > +    OUTPUT queue remains streaming (calling STREAMOFF on it would
> > +    trigger a seek).
> > +    Until STREAMOFF is called on the CAPTURE queue (acknowledging
> > +    the event), the driver operates as if the resolution hasn’t
> > +    changed yet, i.e. :c:func:`VIDIOC_G_FMT`, etc. return previous
> > +    resolution.
>
> What about the OUTPUT queue resolution, does it change as well?

There shouldn't be resolution associated with OUTPUT queue, because
pixel format is bitstream, not raw frame.

>
> > +4.  The client frees the buffers on the CAPTURE queue using
> > +    :c:func:`VIDIOC_REQBUFS`.
> > +
> > +    a. Required fields:
> > +
> > +       i.   count = 0
> > +
> > +       ii.  type = CAPTURE
> > +
> > +       iii. memory = as per spec
> > +
> > +5.  The client calls :c:func:`VIDIOC_G_FMT` for CAPTURE to get the new format
> > +    information.
> > +    This is identical to calling :c:func:`VIDIOC_G_FMT` after
> > +    ``V4L2_EVENT_SRC_CH_RESOLUTION`` in the initialization
> > +    sequence and should be handled similarly.
> > +
> > +    .. note::
> > +
> > +       It is allowed for the driver not to support the same
> > +       pixelformat as previously used (before the resolution change)
> > +       for the new resolution. The driver must select a default
> > +       supported pixelformat and return it from :c:func:`VIDIOC_G_FMT`, and
> > +       client must take note of it.
> > +
>
> Can steps 4. and 5. be done in reverse order (i.e. first G_FMT and then
> REQBUFS(0))?
> If the client already has buffers allocated that are large enough to
> contain decoded buffers in the new resolution, it might be preferable to
> just keep them instead of reallocating.

I think we had some thoughts on similar cases. Pawel, do you recall
what was the problem?

I agree though, that it would make sense to keep the buffers, if they
are big enough.

>
> > +6.  (optional) The client is allowed to enumerate available formats and
> > +    select a different one than currently chosen (returned via
> > +    :c:func:`VIDIOC_G_FMT`). This is identical to a corresponding step in
> > +    the initialization sequence.
> > +
> > +7.  (optional) The client acquires visible resolution as in
> > +    initialization sequence.
> > +
> > +8.  (optional) The client acquires minimum number of buffers as in
> > +    initialization sequence.
> > +
> > +9.  The client allocates a new set of buffers for the CAPTURE queue via
> > +    :c:func:`VIDIOC_REQBUFS`. This is identical to a corresponding step in
> > +    the initialization sequence.
> > +
> > +10. The client resumes decoding by issuing :c:func:`VIDIOC_STREAMON` on the
> > +    CAPTURE queue.
> > +
> > +During the resolution change sequence, the OUTPUT queue must remain
> > +streaming. Calling :c:func:`VIDIOC_STREAMOFF` on OUTPUT queue will initiate seek.
> > +
> > +The OUTPUT queue operates separately from the CAPTURE queue for the
> > +duration of the entire resolution change sequence. It is allowed (and
> > +recommended for best performance and simplicity) for the client to keep
> > +queuing/dequeuing buffers from/to OUTPUT queue even while processing
> > +this sequence.
> > +
> > +.. note::
> > +
> > +   It is also possible for this sequence to be triggered without
> > +   change in resolution if a different number of CAPTURE buffers is
> > +   required in order to continue decoding the stream.
> > +
> > +Flush
> > +-----
> > +
> > +Flush is the process of draining the CAPTURE queue of any remaining
> > +buffers. After the flush sequence is complete, the client has received
> > +all decoded frames for all OUTPUT buffers queued before the sequence was
> > +started.
> > +
> > +1. Begin flush by issuing :c:func:`VIDIOC_DECODER_CMD`.
> > +
> > +   a. Required fields:
> > +
> > +      i. cmd = ``V4L2_DEC_CMD_STOP``
> > +
> > +2. The driver must process and decode as normal all OUTPUT buffers
> > +   queued by the client before the :c:func:`VIDIOC_DECODER_CMD` was
> > +   issued.
> > +   Any operations triggered as a result of processing these
> > +   buffers (including the initialization and resolution change
> > +   sequences) must be processed as normal by both the driver and
> > +   the client before proceeding with the flush sequence.
> > +
> > +3. Once all OUTPUT buffers queued before ``V4L2_DEC_CMD_STOP`` are
> > +   processed:
> > +
> > +   a. If the CAPTURE queue is streaming, once all decoded frames (if
> > +      any) are ready to be dequeued on the CAPTURE queue, the
> > +      driver must send a ``V4L2_EVENT_EOS``. The driver must also
> > +      set ``V4L2_BUF_FLAG_LAST`` in :c:type:`v4l2_buffer` ``flags`` field on the
> > +      buffer on the CAPTURE queue containing the last frame (if
> > +      any) produced as a result of processing the OUTPUT buffers
> > +      queued before ``V4L2_DEC_CMD_STOP``. If no more frames are
> > +      left to be returned at the point of handling
> > +      ``V4L2_DEC_CMD_STOP``, the driver must return an empty buffer
> > +      (with :c:type:`v4l2_buffer` ``bytesused`` = 0) as the last buffer with
> > +      ``V4L2_BUF_FLAG_LAST`` set instead.
> > +      Any attempts to dequeue more buffers beyond the buffer
> > +      marked with ``V4L2_BUF_FLAG_LAST`` will result in a -EPIPE
> > +      error from :c:func:`VIDIOC_DQBUF`.
> > +
> > +   b. If the CAPTURE queue is NOT streaming, no action is necessary for
> > +      CAPTURE queue and the driver must send a ``V4L2_EVENT_EOS``
> > +      immediately after all OUTPUT buffers in question have been
> > +      processed.
> > +
> > +4. To resume, client may issue ``V4L2_DEC_CMD_START``.
> > +
> > +End of stream
> > +-------------
> > +
> > +When an explicit end of stream is encountered by the driver in the
> > +stream, it must send a ``V4L2_EVENT_EOS`` to the client after all frames
> > +are decoded and ready to be dequeued on the CAPTURE queue, with the
> > +:c:type:`v4l2_buffer` ``flags`` set to ``V4L2_BUF_FLAG_LAST``. This behavior is
> > +identical to the flush sequence as if triggered by the client via
> > +``V4L2_DEC_CMD_STOP``.
> > +
> > +Commit points
> > +-------------
> > +
> > +Setting formats and allocating buffers triggers changes in the behavior
> > +of the driver.
> > +
> > +1. Setting format on OUTPUT queue may change the set of formats
> > +   supported/advertised on the CAPTURE queue. It also must change
> > +   the format currently selected on CAPTURE queue if it is not
> > +   supported by the newly selected OUTPUT format to a supported one.
>
> Ok. Is the same true about the contained colorimetry? What should happen
> if the stream contains colorimetry information that differs from
> S_FMT(OUT) colorimetry?

As I explained close to the top, IMHO we shouldn't be setting
colorimetry on OUTPUT queue.

>
> > +2. Enumerating formats on CAPTURE queue must only return CAPTURE formats
> > +   supported for the OUTPUT format currently set.
> > +
> > +3. Setting/changing format on CAPTURE queue does not change formats
> > +   available on OUTPUT queue. An attempt to set CAPTURE format that
> > +   is not supported for the currently selected OUTPUT format must
> > +   result in an error (-EINVAL) from :c:func:`VIDIOC_S_FMT`.
>
> Is this limited to the pixel format? Surely setting out of bounds
> width/height or incorrect colorimetry should not result in EINVAL but
> still be corrected by the driver?

That doesn't sound right to me indeed. The driver should fix up
S_FMT(CAPTURE), including pixel format or anything else. It must only
not alter OUTPUT settings.

>
> > +4. Enumerating formats on OUTPUT queue always returns a full set of
> > +   supported formats, irrespective of the current format selected on
> > +   CAPTURE queue.
> > +
> > +5. After allocating buffers on the OUTPUT queue, it is not possible to
> > +   change format on it.
>
> So even after source change events the OUTPUT queue still keeps the
> initial OUTPUT format?

It would basically only have pixelformat (fourcc) assigned to it,
since bitstream formats are not video frames, but just sequences of
bytes. I don't think it makes sense to change e.g. from H264 to VP8
during streaming.

Best regards,
Tomasz
Tomasz Figa June 6, 2018, 9:03 a.m. UTC | #4
Hi Dave,

Thanks for review! Please see my replies inline.

On Tue, Jun 5, 2018 at 10:10 PM Dave Stevenson
<dave.stevenson@raspberrypi.org> wrote:
>
> Hi Tomasz.
>
> Thanks for formalising this.
> I'm working on a stateful V4L2 codec driver on the Raspberry Pi and
> was having to deduce various implementation details from other
> drivers. I know how much we all tend to hate having to write
> documentation, but it is useful to have.

Agreed. Piles of other work showing up out of nowhere don't help either. :(

A lot of credits go to Pawel, who wrote down most of details discussed
earlier into a document that we used internally to implement Chrome OS
video stack and drivers. He unfortunately got flooded with loads of
other work and ran out of time to finalize it and produce something
usable as kernel documentation (time was needed especially in the old
DocBook xml days).

>
> On 5 June 2018 at 11:33, Tomasz Figa <tfiga@chromium.org> wrote:
> > Due to complexity of the video decoding process, the V4L2 drivers of
> > stateful decoder hardware require specific sequences of V4L2 API calls
> > to be followed. These include capability enumeration, initialization,
> > decoding, seek, pause, dynamic resolution change, flush and end of
> > stream.
> >
> > Specifics of the above have been discussed during Media Workshops at
> > LinuxCon Europe 2012 in Barcelona and then later Embedded Linux
> > Conference Europe 2014 in Düsseldorf. The de facto Codec API that
> > originated at those events was later implemented by the drivers we already
> > have merged in mainline, such as s5p-mfc or mtk-vcodec.
> >
> > The only thing missing was the real specification included as a part of
> > Linux Media documentation. Fix it now and document the decoder part of
> > the Codec API.
> >
> > Signed-off-by: Tomasz Figa <tfiga@chromium.org>
> > ---
> >  Documentation/media/uapi/v4l/dev-codec.rst | 771 +++++++++++++++++++++
> >  Documentation/media/uapi/v4l/v4l2.rst      |  14 +-
> >  2 files changed, 784 insertions(+), 1 deletion(-)
> >
> > diff --git a/Documentation/media/uapi/v4l/dev-codec.rst b/Documentation/media/uapi/v4l/dev-codec.rst
> > index c61e938bd8dc..0483b10c205e 100644
> > --- a/Documentation/media/uapi/v4l/dev-codec.rst
> > +++ b/Documentation/media/uapi/v4l/dev-codec.rst
> > @@ -34,3 +34,774 @@ the codec and reprogram it whenever another file handler gets access.
> >  This is different from the usual video node behavior where the video
> >  properties are global to the device (i.e. changing something through one
> >  file handle is visible through another file handle).
>
> I know this isn't part of the changes, but raises a question in
> v4l2-compliance (so probably one for Hans).
> testUnlimitedOpens tries opening the device 100 times. On a normal
> device this isn't a significant overhead, but when you're allocating
> resources on a per instance basis it quickly adds up.
> Internally I have state that has a limit of 64 codec instances (either
> encode or decode), so either I allocate at start_streaming and fail on
> the 65th one, or I fail on open. I generally take the view that
> failing early is a good thing.
> Opinions? Is 100 instances of an M2M device really sensible?

I don't think we can guarantee opening an arbitrary number of
instances. To add to your point about resource usage, this is
something that can be limited already on hardware or firmware level.
Another aspect is that the hardware is often rated to decode N streams
at resolution X by Y at Z fps, so it might not even make practical
sense to use it to decode M > N streams.

>
> > +This interface is generally appropriate for hardware that does not
> > +require additional software involvement to parse/partially decode/manage
> > +the stream before/after processing in hardware.
> > +
> > +Input data to the Stream API are buffers containing unprocessed video
> > +stream (Annex-B H264/H265 stream, raw VP8/9 stream) only. The driver is
> > +expected not to require any additional information from the client to
> > +process these buffers, and to return decoded frames on the CAPTURE queue
> > +in display order.
>
> This intersects with the question I asked on the list back in April
> but got no reply [1].
> Is there a requirement or expectation for the encoded data to be
> framed as a single encoded frame per buffer, or is feeding in full
> buffer sized chunks from an ES valid? It's not stated for the
> description of V4L2_PIX_FMT_H264 etc either.
> If not framed then anything assuming one-in one-out fails badly, but
> it's likely to fail anyway if the stream has reference frames.

I believe we agreed on the data to be framed. The details are
explained in "Decoding" section, but I guess it could actually belong
to the definition of each specific pixel format.
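The Decoding section asks for one or more complete NALUs per OUTPUT buffer for H.264. A client reading a raw Annex B file therefore has to find NAL unit boundaries itself. Below is a minimal sketch that assumes the whole stream is in memory. It only scans for ``00 00 01`` start codes and treats a preceding zero byte as part of a 4-byte start code. It does not parse NAL headers. The function names are made up for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Offset of the next 00 00 01 start code at or after 'from', or 'len'. */
size_t find_start_code(const uint8_t *data, size_t len, size_t from)
{
    for (size_t i = from; i + 3 <= len; i++) {
        if (data[i] == 0 && data[i + 1] == 0 && data[i + 2] == 1)
            return i;
    }
    return len;
}

/*
 * End offset (exclusive) of the NAL unit whose start code begins at or
 * after 'pos'. The range [pos, result) includes the unit's start code.
 */
size_t nalu_end(const uint8_t *data, size_t len, size_t pos)
{
    size_t sc = find_start_code(data, len, pos);
    size_t next;

    if (sc == len)
        return len;
    next = find_start_code(data, len, sc + 3);
    if (next == len)
        return len;
    /* A 4-byte start code (00 00 00 01) owns the zero byte before it. */
    if (next > sc + 3 && data[next - 1] == 0)
        next--;
    return next;
}

/*
 * End offset of the longest run of whole NAL units, starting at 'pos',
 * that fits into an OUTPUT buffer of 'capacity' bytes.
 */
size_t pack_nalus(const uint8_t *data, size_t len, size_t pos,
                  size_t capacity)
{
    size_t end = pos;

    while (end < len) {
        size_t next = nalu_end(data, len, end);

        if (next - pos > capacity)
            break;
        end = next;
    }
    return end;
}
```

pack_nalus() returns its ``pos`` argument unchanged when even the first NAL unit does not fit. In that case the caller needs a larger buffer.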

>
> This description is also exclusive to video decode, whereas the top
> section states "A V4L2 codec can compress, decompress, transform, or
> otherwise convert video data". Should it be in the decoder section
> below?

Yeah, looks like it should be moved indeed.

>
> Have I missed a statement of what the Stream API is and how it differs
> from any other API?

This is a leftover that I should have removed, since this document
continues to call this interface "Codec Interface".

The other API is the "Stateless Codec Interface" mentioned below. As
opposed to the regular (stateful) Codec Interface, it would target the
hardware that does not store any decoding state for its own use, but
rather expects the software to provide necessary data for each chunk
of framed bitstream, such as headers parsed into predefined structures
(as per codec standard) or reference frame lists. With stateless API,
userspace would have to explicitly manage which buffers are used as
reference frames, reordering to display order and so on. It's a WiP
and is partially blocked by Request API, since it needs extra data to
be given in a per-buffer manner.

>
> [1] https://www.spinics.net/lists/linux-media/msg133102.html
>
> > +Performing software parsing, processing etc. of the stream in the driver
> > +in order to support stream API is strongly discouraged. In such case use
> > +of Stateless Codec Interface (in development) is preferred.
> > +
> > +Conventions and notation used in this document
> > +==============================================
> > +
> > +1. The general V4L2 API rules apply if not specified in this document
> > +   otherwise.
> > +
> > +2. The meaning of words “must”, “may”, “should”, etc. is as per RFC
> > +   2119.
> > +
> > +3. All steps not marked “optional” are required.
> > +
> > +4. :c:func:`VIDIOC_G_EXT_CTRLS`, :c:func:`VIDIOC_S_EXT_CTRLS` may be used interchangeably with
> > +   :c:func:`VIDIOC_G_CTRL`, :c:func:`VIDIOC_S_CTRL`, unless specified otherwise.
> > +
> > +5. Single-plane API (see spec) and applicable structures may be used
> > +   interchangeably with Multi-plane API, unless specified otherwise.
> > +
> > +6. i = [a..b]: sequence of integers from a to b, inclusive, i.e. i =
> > +   [0..2]: i = 0, 1, 2.
> > +
> > +7. For OUTPUT buffer A, A’ represents a buffer on the CAPTURE queue
> > +   containing data (decoded or encoded frame/stream) that resulted
> > +   from processing buffer A.
> > +
> > +Glossary
> > +========
> > +
> > +CAPTURE
> > +   the destination buffer queue, decoded frames for
> > +   decoders, encoded bitstream for encoders;
> > +   ``V4L2_BUF_TYPE_VIDEO_CAPTURE`` or
> > +   ``V4L2_BUF_TYPE_VIDEO_CAPTURE_MPLANE``
> > +
> > +client
> > +   application client communicating with the driver
> > +   implementing this API
> > +
> > +coded format
> > +   encoded/compressed video bitstream format (e.g.
> > +   H.264, VP8, etc.); see raw format; this is not equivalent to fourcc
> > +   (V4L2 pixelformat), as each coded format may be supported by multiple
> > +   fourccs (e.g. ``V4L2_PIX_FMT_H264``, ``V4L2_PIX_FMT_H264_SLICE``, etc.)
> > +
> > +coded height
> > +   height for given coded resolution
> > +
> > +coded resolution
> > +   stream resolution in pixels aligned to codec
> > +   format and hardware requirements; see also visible resolution
> > +
> > +coded width
> > +   width for given coded resolution
> > +
> > +decode order
> > +   the order in which frames are decoded; may differ
> > +   from display (output) order if frame reordering (B frames) is active in
> > +   the stream; OUTPUT buffers must be queued in decode order; for frame
> > +   API, CAPTURE buffers must be returned by the driver in decode order;
> > +
> > +display order
> > +   the order in which frames must be displayed
> > +   (outputted); for stream API, CAPTURE buffers must be returned by the
> > +   driver in display order;
> > +
> > +EOS
> > +   end of stream
> > +
> > +input height
> > +   height in pixels for given input resolution
> > +
> > +input resolution
> > +   resolution in pixels of source frames being input
> > +   to the encoder and subject to further cropping to the bounds of visible
> > +   resolution
> > +
> > +input width
> > +   width in pixels for given input resolution
> > +
> > +OUTPUT
> > +   the source buffer queue, encoded bitstream for
> > +   decoders, raw frames for encoders; ``V4L2_BUF_TYPE_VIDEO_OUTPUT`` or
> > +   ``V4L2_BUF_TYPE_VIDEO_OUTPUT_MPLANE``
> > +
> > +raw format
> > +   uncompressed format containing raw pixel data (e.g.
> > +   YUV, RGB formats)
> > +
> > +resume point
> > +   a point in the bitstream from which decoding may
> > +   start/continue, without any previous state/data present, e.g.: a
> > +   keyframe (VPX) or SPS/PPS/IDR sequence (H.264); a resume point is
> > +   required to start decode of a new stream, or to resume decoding after a
> > +   seek;
> > +
> > +source buffer
> > +   buffers allocated for source queue
> > +
> > +source queue
> > +   queue containing buffers used for source data, i.e.
> > +
> > +visible height
> > +   height for given visible resolution
> > +
> > +visible resolution
> > +   stream resolution of the visible picture, in
> > +   pixels, to be used for display purposes; must be smaller than or equal to
> > +   coded resolution;
> > +
> > +visible width
> > +   width for given visible resolution
> > +
> > +Decoder
> > +=======
> > +
> > +Querying capabilities
> > +---------------------
> > +
> > +1. To enumerate the set of coded formats supported by the driver, the
> > +   client uses :c:func:`VIDIOC_ENUM_FMT` for OUTPUT. The driver must always
> > +   return the full set of supported formats, irrespective of the
> > +   format set on the CAPTURE queue.
> > +
> > +2. To enumerate the set of supported raw formats, the client uses
> > +   :c:func:`VIDIOC_ENUM_FMT` for CAPTURE. The driver must return only the
> > +   formats supported for the format currently set on the OUTPUT
> > +   queue.
> > +   In order to enumerate raw formats supported by a given coded
> > +   format, the client must first set that coded format on the
> > +   OUTPUT queue and then enumerate the CAPTURE queue.
> > +
> > +3. The client may use :c:func:`VIDIOC_ENUM_FRAMESIZES` to detect supported
> > +   resolutions for a given format, passing its fourcc in
> > +   :c:type:`v4l2_frmsizeenum` ``pixel_format``.
> > +
> > +   a. Values returned from :c:func:`VIDIOC_ENUM_FRAMESIZES` for coded formats
> > +      must be maximums for given coded format for all supported raw
> > +      formats.
> > +
> > +   b. Values returned from :c:func:`VIDIOC_ENUM_FRAMESIZES` for raw formats must
> > +      be maximums for given raw format for all supported coded
> > +      formats.
>
> So in both these cases you expect index=0 to return a response with
> the type V4L2_FRMSIZE_TYPE_DISCRETE, and the maximum resolution?
> -EINVAL on any other index value?
> And I assume you mean maximum coded resolution, not visible resolution.
> Or is V4L2_FRMSIZE_TYPE_STEPWISE more appropriate? In which case the
> minimum is presumably a single macroblock, max is the max coded
> resolution, and step size is the macroblock size, at least on the
> CAPTURE side.

Coded size seems to make the most sense here, since that's what
corresponds to the amount of data the decoder needs to process. Let's
have it stated more explicitly.

My understanding is that VIDIOC_ENUM_FRAMESIZES maintains its regular
semantics here and which type of range is used would depend on the
hardware capabilities. This actually matches what we have
implemented in the Chromium video stack [1]. Let's state it more
explicitly as well.

[1] https://cs.chromium.org/chromium/src/media/gpu/v4l2/v4l2_device.cc?q=VIDIOC_ENUM_FRAMESIZES&sq=package:chromium&g=0&l=279
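Since the client can get either kind of range back, here is a minimal sketch of the intersection check from step 3c. It assumes the client has already converted each VIDIOC_ENUM_FRAMESIZES answer into the hypothetical struct size_range below (not a V4L2 type; a discrete size is modeled as min == max with step 1). A coded resolution is usable for a coded+raw pair only if both ranges accept it.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical, simplified form of what VIDIOC_ENUM_FRAMESIZES reports.
 * A V4L2_FRMSIZE_TYPE_DISCRETE entry is modeled as min == max, step 1;
 * stepwise/continuous entries use their min/max/step directly.
 */
struct size_range {
	uint32_t min_w, max_w, step_w;
	uint32_t min_h, max_h, step_h;
};

/* True if w x h lies inside the range and on its step grid. */
bool range_contains(const struct size_range *r, uint32_t w, uint32_t h)
{
	if (w < r->min_w || w > r->max_w || h < r->min_h || h > r->max_h)
		return false;
	if (r->step_w && (w - r->min_w) % r->step_w)
		return false;
	if (r->step_h && (h - r->min_h) % r->step_h)
		return false;
	return true;
}

/* Step 3c: usable only if both the coded and the raw range accept it. */
bool pair_supports(const struct size_range *coded,
		   const struct size_range *raw, uint32_t w, uint32_t h)
{
	return range_contains(coded, w, h) && range_contains(raw, w, h);
}
```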

>
> > +   c. The client should derive the supported resolution for a
> > +      combination of coded+raw format by calculating the
> > +      intersection of resolutions returned from calls to
> > +      :c:func:`VIDIOC_ENUM_FRAMESIZES` for the given coded and raw formats.
> > +
> > +4. Supported profiles and levels for given format, if applicable, may be
> > +   queried using their respective controls via :c:func:`VIDIOC_QUERYCTRL`.
> > +
> > +5. The client may use :c:func:`VIDIOC_ENUM_FRAMEINTERVALS` to enumerate maximum
> > +   supported framerates by the driver/hardware for a given
> > +   format+resolution combination.
> > +
> > +Initialization sequence
> > +-----------------------
> > +
> > +1. (optional) Enumerate supported OUTPUT formats and resolutions. See
> > +   capability enumeration.
> > +
> > +2. Set a coded format on the source queue via :c:func:`VIDIOC_S_FMT`
> > +
> > +   a. Required fields:
> > +
> > +      i.   type = OUTPUT
> > +
> > +      ii.  fmt.pix_mp.pixelformat set to a coded format
> > +
> > +      iii. fmt.pix_mp.width, fmt.pix_mp.height only if they cannot be
> > +           parsed from the stream for the given coded format;
> > +           ignored otherwise;
> > +
> > +   b. Return values:
> > +
> > +      i.  EINVAL: unsupported format.
> > +
> > +      ii. Others: per spec
> > +
> > +   .. note::
> > +
> > +      The driver must not adjust pixelformat, so if
> > +      ``V4L2_PIX_FMT_H264`` is passed but only
> > +      ``V4L2_PIX_FMT_H264_SLICE`` is supported, S_FMT will return
> > +      -EINVAL. If both are acceptable by client, calling S_FMT for
> > +      the other after one gets rejected may be required (or use
> > +      :c:func:`VIDIOC_ENUM_FMT` to discover beforehand, see Capability
> > +      enumeration).
>
> I can't find V4L2_PIX_FMT_H264_SLICE in mainline. From trying to build
> Chromium I believe it's a Rockchip special. Is it being upstreamed?

This is a part of the stateless Codec Interface currently in development.
We used to call it "Slice API" internally, hence the name. It is not
specific to Rockchip, but rather the whole class of stateless codecs,
as I explained in reply to another of your comments.

Any mention of it should be removed from the document for now.

> Or use V4L2_PIX_FMT_H264 vs V4L2_PIX_FMT_H264_NO_SC as the example?
> (I've just noticed I missed an instance of this further up as well).

Yeah, sounds like it would be a better example.
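With that example, a client that can feed either flavour of H.264 could pick the format up front from the VIDIOC_ENUM_FMT results instead of retrying S_FMT after -EINVAL. A sketch only: FOURCC mirrors the packing of v4l2_fourcc(), the FMT_* names are local stand-ins for V4L2_PIX_FMT_H264 and V4L2_PIX_FMT_H264_NO_SC, and pick_coded_format() is a hypothetical helper.

```c
#include <stddef.h>
#include <stdint.h>

/* Same packing as the v4l2_fourcc() macro in <linux/videodev2.h>. */
#define FOURCC(a, b, c, d) \
	((uint32_t)(a) | ((uint32_t)(b) << 8) | \
	 ((uint32_t)(c) << 16) | ((uint32_t)(d) << 24))

#define FMT_H264       FOURCC('H', '2', '6', '4') /* V4L2_PIX_FMT_H264 */
#define FMT_H264_NO_SC FOURCC('A', 'V', 'C', '1') /* V4L2_PIX_FMT_H264_NO_SC */

/*
 * Return the first format in the client's preference list that the driver
 * listed via VIDIOC_ENUM_FMT on OUTPUT, or 0 if none matches. Since the
 * driver must not adjust pixelformat in S_FMT, choosing here avoids a
 * failed S_FMT round trip.
 */
uint32_t pick_coded_format(const uint32_t *enumerated, size_t n_enum,
			   const uint32_t *preferred, size_t n_pref)
{
	for (size_t i = 0; i < n_pref; i++)
		for (size_t j = 0; j < n_enum; j++)
			if (preferred[i] == enumerated[j])
				return preferred[i];
	return 0;
}
```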

>
> > +3.  (optional) Get minimum number of buffers required for OUTPUT queue
> > +    via :c:func:`VIDIOC_G_CTRL`. This is useful if the client intends to use
> > +    more buffers than minimum required by hardware/format (see
> > +    allocation).
> > +
> > +    a. Required fields:
> > +
> > +       i. id = ``V4L2_CID_MIN_BUFFERS_FOR_OUTPUT``
> > +
> > +    b. Return values: per spec.
> > +
> > +    c. Return fields:
> > +
> > +       i. value: required number of OUTPUT buffers for the currently set
> > +          format;
> > +
> > +4.  Allocate source (bitstream) buffers via :c:func:`VIDIOC_REQBUFS` on OUTPUT
> > +    queue.
> > +
> > +    a. Required fields:
> > +
> > +       i.   count = n, where n > 0.
> > +
> > +       ii.  type = OUTPUT
> > +
> > +       iii. memory = as per spec
> > +
> > +    b. Return values: Per spec.
> > +
> > +    c. Return fields:
> > +
> > +       i. count: adjusted to allocated number of buffers
> > +
> > +    d. The driver must adjust count to the larger of the minimum number
> > +       of source buffers required for the given format and the count
> > +       passed. The client must check this value after the ioctl returns
> > +       to get the number of buffers allocated.
> > +
> > +    .. note::
> > +
> > +       Passing count = 1 is useful for letting the driver choose
> > +       the minimum according to the selected format/hardware
> > +       requirements.
> > +
> > +    .. note::
> > +
> > +       To allocate more than minimum number of buffers (for pipeline
> > +       depth), use ``G_CTRL(V4L2_CID_MIN_BUFFERS_FOR_OUTPUT)`` to
> > +       get minimum number of buffers required by the driver/format,
> > +       and pass the obtained value plus the number of additional
> > +       buffers needed in count to :c:func:`VIDIOC_REQBUFS`.
> > +
> > +5.  Begin parsing the stream for stream metadata via :c:func:`VIDIOC_STREAMON` on
> > +    OUTPUT queue. This step allows the driver to parse/decode
> > +    initial stream metadata until enough information to allocate
> > +    CAPTURE buffers is found. This is indicated by the driver by
> > +    sending a ``V4L2_EVENT_SOURCE_CHANGE`` event, which the client
> > +    must handle.
> > +
> > +    a. Required fields: as per spec.
> > +
> > +    b. Return values: as per spec.
> > +
> > +    .. note::
> > +
> > +       Calling :c:func:`VIDIOC_REQBUFS`, :c:func:`VIDIOC_STREAMON`
> > +       or :c:func:`VIDIOC_G_FMT` on the CAPTURE queue at this time is not
> > +       allowed and must return EINVAL.
>
> I think you've just broken FFMpeg and Gstreamer with that statement.
>
> Gstreamer certainly doesn't subscribe to V4L2_EVENT_SOURCE_CHANGE but
> has already parsed the stream and set the output format to the correct
> resolution via S_FMT. IIRC it expects the driver to copy that across
> from output to capture which was an interesting niggle to find.
> FFMpeg does subscribe to V4L2_EVENT_SOURCE_CHANGE, although it seems
> to currently have a bug around coded resolution != visible resolution
> when it gets the event.
>
> One has to assume that these have been working quite happily against
> various hardware platforms, so it seems a little unfair to just break
> them.

That's certainly not what existing drivers do and the examples would be:

- s5p-mfc (the first codec driver in upstream):
    It just ignores width/height set on the OUTPUT queue
      https://elixir.bootlin.com/linux/latest/source/drivers/media/platform/s5p-mfc/s5p_mfc_dec.c#L443
    and reports what the hardware parses from bitstream on CAPTURE:
      https://elixir.bootlin.com/linux/latest/source/drivers/media/platform/s5p-mfc/s5p_mfc_dec.c#L352

- mtk-vcodec (merged quite recently):
    It indeed accepts whatever is set on OUTPUT as some kind of defaults,
      https://elixir.bootlin.com/linux/latest/source/drivers/media/platform/mtk-vcodec/mtk_vcodec_dec.c#L856
    but those are overridden as soon as the headers are parsed
      https://elixir.bootlin.com/linux/latest/source/drivers/media/platform/mtk-vcodec/mtk_vcodec_dec.c#L989

However, the above probably doesn't prevent Gstreamer from working,
because both drivers would allow REQBUFS(CAPTURE) before the parsing
is done and luckily the resolution would match later after parsing.

> So I guess my question is what is the reasoning for rejecting these
> calls? If you know the resolution ahead of time, allocate buffers, and
> start CAPTURE streaming before the event then should you be wrong
> you're just going through the dynamic resolution change path described
> later. If you're correct then you've saved some setup time. It also
> avoids having to have a special startup case in the driver.

We might need Pawel or Hans to comment on this, as I believe it has
been decided to be like this in earlier Media Workshops.

I personally don't see what would go wrong if we allow that and handle
a fallback using the dynamic resolution change flow. Maybe except the
need to rework the s5p-mfc driver.
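For reference, if the early allocation Dave describes were allowed (the current draft forbids it), the client-side check after the event could be as simple as this sketch. It hypothetically assumes the client allocated CAPTURE buffers from its own guess; the parsed values would come from VIDIOC_G_FMT on CAPTURE and V4L2_CID_MIN_BUFFERS_FOR_CAPTURE. If the check fails, the client falls back to the dynamic resolution change sequence.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical client-side record of a CAPTURE configuration. */
struct capture_setup {
	uint32_t pixelformat;
	uint32_t width, height;  /* coded resolution */
	uint32_t num_buffers;
};

/*
 * True if buffers allocated before V4L2_EVENT_SOURCE_CHANGE can be kept:
 * same format and coded size, and enough buffers for the parsed stream.
 * Otherwise the client goes through STREAMOFF, REQBUFS(0) and re-allocation.
 */
bool early_capture_setup_valid(const struct capture_setup *allocated,
			       const struct capture_setup *parsed,
			       uint32_t min_buffers)
{
	return allocated->pixelformat == parsed->pixelformat &&
	       allocated->width == parsed->width &&
	       allocated->height == parsed->height &&
	       allocated->num_buffers >= min_buffers;
}
```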

>
> > +6.  This step only applies to coded formats that contain resolution
> > +    information in the stream.
> > +    Continue queuing/dequeuing bitstream buffers to/from the
> > +    OUTPUT queue via :c:func:`VIDIOC_QBUF` and :c:func:`VIDIOC_DQBUF`. The driver
> > +    must keep processing and returning each buffer to the client
> > +    until required metadata to send a ``V4L2_EVENT_SOURCE_CHANGE``
> > +    for source change type ``V4L2_EVENT_SRC_CH_RESOLUTION`` is
> > +    found. There is no requirement to pass enough data for this to
> > +    occur in the first buffer and the driver must be able to
> > +    process any number
>
> So back to my earlier question, we're supporting tiny fragments of
> frames here? Or is the thought that you can pick up anywhere in a
> stream and the decoder will wait for the required resume point?

I think this is precisely about the hardware/driver discarding
bitstream frames until a frame containing resolution data is found. So
that would be the latter, I believe.
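As an illustration of what discarding data until a resume point means for H.264 Annex B input, here is a sketch that finds the first SPS NAL unit (nal_unit_type 7) following a 00 00 01 start code. find_first_sps() is a hypothetical client- or test-side helper, not how any driver or hardware works, and a full resume point also needs the PPS and an IDR picture.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Return the offset of the first 3-byte start code (00 00 01) whose NAL
 * unit is an SPS (nal_unit_type 7), or -1 if there is none. For a 4-byte
 * start code (00 00 00 01) the offset points at its last three bytes.
 */
long find_first_sps(const uint8_t *buf, size_t len)
{
	for (size_t i = 0; i + 3 < len; i++) {
		if (buf[i] == 0 && buf[i + 1] == 0 && buf[i + 2] == 1 &&
		    (buf[i + 3] & 0x1f) == 7)
			return (long)i;
	}
	return -1;
}
```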

>
> > +    a. Required fields: as per spec.
> > +
> > +    b. Return values: as per spec.
> > +
> > +    c. If data in a buffer that triggers the event is required to decode
> > +       the first frame, the driver must not return it to the client,
> > +       but must retain it for further decoding.
> > +
> > +    d. Until the resolution source event is sent to the client, calling
> > +       :c:func:`VIDIOC_G_FMT` on the CAPTURE queue must return -EINVAL.
> > +
> > +    .. note::
> > +
> > +       No decoded frames are produced during this phase.
> > +
> > +7.  This step only applies to coded formats that contain resolution
> > +    information in the stream.
> > +    Receive and handle ``V4L2_EVENT_SOURCE_CHANGE`` from the driver
> > +    via :c:func:`VIDIOC_DQEVENT`. The driver must send this event once
> > +    enough data is obtained from the stream to allocate CAPTURE
> > +    buffers and to begin producing decoded frames.
> > +
> > +    a. Required fields:
> > +
> > +       i. type = ``V4L2_EVENT_SOURCE_CHANGE``
> > +
> > +    b. Return values: as per spec.
> > +
> > +    c. The driver must return u.src_change.changes =
> > +       ``V4L2_EVENT_SRC_CH_RESOLUTION``.
> > +
> > +8.  This step only applies to coded formats that contain resolution
> > +    information in the stream.
> > +    Call :c:func:`VIDIOC_G_FMT` for CAPTURE queue to get format for the
> > +    destination buffers parsed/decoded from the bitstream.
> > +
> > +    a. Required fields:
> > +
> > +       i. type = CAPTURE
> > +
> > +    b. Return values: as per spec.
> > +
> > +    c. Return fields:
> > +
> > +       i.   fmt.pix_mp.width, fmt.pix_mp.height: coded resolution
> > +            for the decoded frames
> > +
> > +       ii.  fmt.pix_mp.pixelformat: default/required/preferred by
> > +            driver pixelformat for decoded frames.
> > +
> > +       iii. num_planes: set to number of planes for pixelformat.
> > +
> > +       iv.  For each plane p = [0, num_planes-1]:
> > +            plane_fmt[p].sizeimage, plane_fmt[p].bytesperline as
> > +            per spec for coded resolution.
> > +
> > +    .. note::
> > +
> > +       Te value of pixelformat may be any pixel format supported,
>
> s/Te/The

Ack.

>
> > +       and must
> > +       be supported for current stream, based on the information
> > +       parsed from the stream and hardware capabilities. It is
> > +       suggested that the driver chooses the preferred/optimal format
> > +       for the given configuration. For example, a YUV format may be
> > +       preferred over an RGB format if an additional conversion step
> > +       would be required.
> > +
> > +9.  (optional) Enumerate CAPTURE formats via :c:func:`VIDIOC_ENUM_FMT` on
> > +    CAPTURE queue.
> > +    Once the stream information is parsed and known, the client
> > +    may use this ioctl to discover which raw formats are supported
> > +    for the given stream and select one of them via :c:func:`VIDIOC_S_FMT`.
> > +
> > +    a. Fields/return values as per spec.
> > +
> > +    .. note::
> > +
> > +       The driver must return only formats supported for the
> > +       current stream parsed in this initialization sequence, even
> > +       if more formats may be supported by the driver in general.
> > +       For example, a driver/hardware may support YUV and RGB
> > +       formats for resolutions 1920x1088 and lower, but only YUV for
> > +       higher resolutions (e.g. due to memory bandwidth
> > +       limitations). After parsing a resolution of 1920x1088 or
> > +       lower, :c:func:`VIDIOC_ENUM_FMT` may return a set of YUV and RGB
> > +       pixelformats, but after parsing resolution higher than
> > +       1920x1088, the driver must not return (unsupported for this
> > +       resolution) RGB.
>
> There are some funny cases here then.
> Whilst memory bandwidth may limit the resolution that can be decoded
> in real-time, for a transcode use case you haven't got a real-time
> requirement. Enforcing this means you can never transcode that
> resolution to RGB.

I think the above is not about performance, but the general hardware
ability to decode into such format. The bandwidth might be just not
enough to even process one frame leading to some bus timeouts for
example. The history of hardware design knows a lot of funny cases. :)

> Actually I can't see any information related to frame rates being
> passed in other than timestamps, therefore the driver hasn't got
> sufficient information to make a sensible call based on memory
> bandwidth.

Again, I believe this is not about frame rate, but rather one-shot
bandwidth needed to fetch 1 frame data without breaking things.

> Perhaps it's just that the example of memory bandwidth being the
> limitation is a bad one.

Yeah, it might just be a not very good example. It could as well be
just a fixed size static memory inside the codec hardware, which would
obviously be capable of holding fewer pixels for 32-bit RGBx than
12-bit (in average) YUV420.
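Some rough numbers for the fixed-size memory case, as a sketch: NV12 averages 12 bits per pixel and a 32-bit RGBx format uses 32, so the same memory holds about 2.7 times as many NV12 pixels. The 16 MiB limit in the usage below is purely hypothetical; with it, both formats fit at 1920x1088 but only NV12 fits at 3840x2160, which is the kind of restriction the note describes.

```c
#include <stdint.h>

/*
 * NV12: full-resolution Y plane plus a 2x2-subsampled interleaved CbCr
 * plane (12 bits per pixel on average). Assumes even width and height
 * and no stride padding.
 */
uint64_t nv12_frame_bytes(uint32_t w, uint32_t h)
{
	return (uint64_t)w * h + (uint64_t)w * h / 2;
}

/* 32-bit RGBx: 4 bytes per pixel, no stride padding assumed. */
uint64_t rgbx32_frame_bytes(uint32_t w, uint32_t h)
{
	return (uint64_t)w * h * 4;
}

/* Hypothetical check against a fixed-size on-chip memory. */
int fits_in_sram(uint64_t frame_bytes, uint64_t sram_bytes)
{
	return frame_bytes <= sram_bytes;
}
```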

>
> > +       However, subsequent resolution change event
> > +       triggered after discovering a resolution change within the
> > +       same stream may switch the stream into a lower resolution;
> > +       :c:func:`VIDIOC_ENUM_FMT` must return RGB formats again in that case.
> > +
> > +10.  (optional) Choose a different CAPTURE format than suggested via
> > +     :c:func:`VIDIOC_S_FMT` on CAPTURE queue. It is possible for the client
> > +     to choose a different format than selected/suggested by the
> > +     driver in :c:func:`VIDIOC_G_FMT`.
> > +
> > +     a. Required fields:
> > +
> > +        i.  type = CAPTURE
> > +
> > +        ii. fmt.pix_mp.pixelformat set to a raw format
> > +
> > +     b. Return values:
> > +
> > +        i. EINVAL: unsupported format.
> > +
> > +     c. Calling :c:func:`VIDIOC_ENUM_FMT` to discover currently available formats
> > +        after receiving ``V4L2_EVENT_SOURCE_CHANGE`` is useful to find
> > +        out a set of allowed pixelformats for given configuration,
> > +        but not required.
> > +
> > +11.  (optional) Acquire visible resolution via :c:func:`VIDIOC_G_SELECTION`.
> > +
> > +    a. Required fields:
> > +
> > +       i.  type = CAPTURE
> > +
> > +       ii. target = ``V4L2_SEL_TGT_CROP``
> > +
> > +    b. Return values: per spec.
> > +
> > +    c. Return fields:
> > +
> > +       i. r.left, r.top, r.width, r.height: visible rectangle; this must
> > +          fit within coded resolution returned from :c:func:`VIDIOC_G_FMT`.
> > +
> > +12. (optional) Get minimum number of buffers required for CAPTURE queue
> > +    via :c:func:`VIDIOC_G_CTRL`. This is useful if the client intends to use
> > +    more buffers than minimum required by hardware/format (see
> > +    allocation).
> > +
> > +    a. Required fields:
> > +
> > +       i. id = ``V4L2_CID_MIN_BUFFERS_FOR_CAPTURE``
> > +
> > +    b. Return values: per spec.
> > +
> > +    c. Return fields:
> > +
> > +       i. value: minimum number of buffers required to decode the stream
> > +          parsed in this initialization sequence.
> > +
> > +    .. note::
> > +
> > +       The minimum number of buffers must be at least the
> > +       number required to successfully decode the current stream.
> > +       This may for example be the required DPB size for an H.264
> > +       stream given the parsed stream configuration (resolution,
> > +       level).
> > +
> > +13. Allocate destination (raw format) buffers via :c:func:`VIDIOC_REQBUFS` on the
> > +    CAPTURE queue.
> > +
> > +    a. Required fields:
> > +
> > +       i.   count = n, where n > 0.
> > +
> > +       ii.  type = CAPTURE
> > +
> > +       iii. memory = as per spec
> > +
> > +    b. Return values: Per spec.
> > +
> > +    c. Return fields:
> > +
> > +       i. count: adjusted to allocated number of buffers.
> > +
> > +    d. The driver must adjust count to the larger of the minimum number
> > +       of destination buffers required for the given format and stream
> > +       configuration and the count passed. The client must check this
> > +       value after the ioctl returns to get the number of buffers allocated.
> > +
> > +    .. note::
> > +
> > +       Passing count = 1 is useful for letting the driver choose
> > +       the minimum.
> > +
> > +    .. note::
> > +
> > +       To allocate more than minimum number of buffers (for pipeline
> > +       depth), use ``G_CTRL(V4L2_CID_MIN_BUFFERS_FOR_CAPTURE)`` to
> > +       get minimum number of buffers required, and pass the obtained
> > +       value plus the number of additional buffers needed in count
> > +       to :c:func:`VIDIOC_REQBUFS`.
> > +
> > +14. Call :c:func:`VIDIOC_STREAMON` to initiate decoding frames.
> > +
> > +    a. Required fields: as per spec.
> > +
> > +    b. Return values: as per spec.
> > +
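For the DPB example in the note of step 12, a worked sketch of the H.264 limit from Table A-1 of the H.264 specification: MaxDpbFrames = Min(MaxDpbMbs / (PicWidthInMbs * FrameHeightInMbs), 16). Only a few levels are included and progressive frames are assumed; a driver may well report more CAPTURE buffers than this, e.g. for pipeline depth. For 1920x1088 at level 4.1 it gives 32768 / (120 * 68) = 4 frames.

```c
#include <stdint.h>

/*
 * Maximum H.264 DPB size in frames from the level's MaxDpbMbs limit.
 * level_idc uses the usual encoding (e.g. 41 for level 4.1). Only a few
 * levels are listed; returns 0 for unlisted levels or a zero size.
 */
uint32_t h264_max_dpb_frames(uint32_t level_idc, uint32_t width, uint32_t height)
{
	uint32_t max_dpb_mbs;

	switch (level_idc) {
	case 30: max_dpb_mbs = 8100; break;
	case 31: max_dpb_mbs = 18000; break;
	case 40:
	case 41: max_dpb_mbs = 32768; break;
	case 51: max_dpb_mbs = 184320; break;
	default: return 0;
	}

	/* Round the coded size up to whole 16x16 macroblocks. */
	uint32_t mbs = ((width + 15) / 16) * ((height + 15) / 16);
	if (mbs == 0)
		return 0;

	uint32_t frames = max_dpb_mbs / mbs;
	return frames < 16 ? frames : 16;
}
```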
> > +Decoding
> > +--------
> > +
> > +This state is reached after a successful initialization sequence. In
> > +this state, the client queues and dequeues buffers to both queues via
> > +:c:func:`VIDIOC_QBUF` and :c:func:`VIDIOC_DQBUF`, as per spec.
> > +
> > +Both queues operate independently. The client may queue and dequeue
> > +buffers to queues in any order and at any rate, also at a rate different
> > +for each queue. The client may queue buffers within the same queue in
> > +any order (V4L2 index-wise). It is recommended for the client to operate
> > +the queues independently for best performance.
>
> Only recommended sounds like a great case for clients to treat codecs
> as one-in one-out, and then fall over if you get extra header byte
> frames in the stream.

I think the meaning of "operating the queues independently" is a bit
different here, e.g. from separate threads.

But agreed that we need to make sure that the documentation explicitly
says that there is neither one-in one-out guarantee nor 1:1 relation
between OUT and CAP buffers, if it doesn't say it already.
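To illustrate why there is no 1:1 relation, a toy model that is not how any real decoder works: frames arrive in decode order, identified by picture order count (POC), and are held until more than depth frames are pending, at which point the lowest POC is output. A drain, as in the flush sequence, releases the rest. In the usage below, the first OUTPUT buffer produces nothing on CAPTURE and the last frame only appears at drain time. Real H.264 DPB bumping has more rules than this.

```c
#include <stddef.h>

#define TOY_MAX_DEPTH 16

/* Remove and return the smallest POC from pending[]. */
static int pop_smallest(int *pending, size_t *npending)
{
	size_t best = 0;
	for (size_t k = 1; k < *npending; k++)
		if (pending[k] < pending[best])
			best = k;
	int poc = pending[best];
	pending[best] = pending[--*npending];
	return poc;
}

/*
 * out[] receives POCs in display order; per_input[i] counts frames output
 * right after input i. Returns the number of frames output by the drain.
 */
size_t toy_reorder(const int *poc, size_t n, size_t depth,
		   int *out, size_t *per_input)
{
	int pending[TOY_MAX_DEPTH + 1];
	size_t npending = 0, nout = 0, drained = 0;

	if (depth > TOY_MAX_DEPTH)
		depth = TOY_MAX_DEPTH;

	for (size_t i = 0; i < n; i++) {
		pending[npending++] = poc[i];
		per_input[i] = 0;
		if (npending > depth) {
			out[nout++] = pop_smallest(pending, &npending);
			per_input[i]++;
		}
	}
	/* Drain (e.g. after V4L2_DEC_CMD_STOP): everything left comes out. */
	while (npending > 0) {
		out[nout++] = pop_smallest(pending, &npending);
		drained++;
	}
	return drained;
}
```

For example, decode order I0 P6 B2 B4 with depth 1 is displayed as 0, 2, 4, 6.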

>
> > +Source OUTPUT buffers must contain:
> > +
> > +-  H.264/AVC: one or more complete NALUs of an Annex B elementary
> > +   stream; one buffer does not have to contain enough data to decode
> > +   a frame;
>
> This appears to be answering my earlier question, but doesn't it
> belong in the definition of V4L2_PIX_FMT_H264 rather than buried in
> the codec description?
> I'm OK with that choice, but you are closing off the use case of
> effectively cat'ing an ES into the codec to be decoded.

I think it would indeed make sense to make this behavior a part of the
pixel format. Pawel, what do you think?

>
> There's the other niggle of how to specify sizeimage in the
> pixelformat for compressed data. I have never seen a satisfactory
> answer in most of the APIs I've encountered (*). How big can an
> I-frame be in a random stream? It may be a very badly coded stream,
> but if other decoders can cope, then it's the decoder that can't which
> will be seen to be buggy.

That's a very good question. I think we just empirically came up with
some values that seem to work in Chromium:
https://cs.chromium.org/chromium/src/media/gpu/v4l2/v4l2_slice_video_decode_accelerator.h?rcl=eed597a7f14cb03cd7db9d9722820dddd86b4c41&l=102
https://cs.chromium.org/chromium/src/media/gpu/v4l2/v4l2_video_decode_accelerator.cc?rcl=eed597a7f14cb03cd7db9d9722820dddd86b4c41&l=2241

Pawel, any background behind those?

>
> (*) OpenMAX IL is the exception as you can pass partial frames with
> appropriate values in nFlags. Not many other positives one can say
> about IL though.
>
> > +-  VP8/VP9: one or more complete frames.
> > +
> > +No direct relationship between source and destination buffers and the
> > +timing of buffers becoming available to dequeue should be assumed in the
> > +Stream API. Specifically:
> > +
> > +-  a buffer queued to OUTPUT queue may result in no buffers being
> > +   produced on the CAPTURE queue (e.g. if it does not contain
> > +   encoded data, or if only metadata syntax structures are present
> > +   in it), or one or more buffers produced on the CAPTURE queue (if
> > +   the encoded data contained more than one frame, or if returning a
> > +   decoded frame allowed the driver to return a frame that preceded
> > +   it in decode, but succeeded it in display order)
> > +
> > +-  a buffer queued to OUTPUT may result in a buffer being produced on
> > +   the CAPTURE queue later into decode process, and/or after
> > +   processing further OUTPUT buffers, or be returned out of order,
> > +   e.g. if display reordering is used
> > +
> > +-  buffers may become available on the CAPTURE queue without additional
> > +   buffers queued to OUTPUT (e.g. during flush or EOS)
> > +
> > +Seek
> > +----
> > +
> > +Seek is controlled by the OUTPUT queue, as it is the source of bitstream
> > +data. CAPTURE queue remains unchanged/unaffected.
> > +
> > +1. Stop the OUTPUT queue to begin the seek sequence via
> > +   :c:func:`VIDIOC_STREAMOFF`.
> > +
> > +   a. Required fields:
> > +
> > +      i. type = OUTPUT
> > +
> > +   b. The driver must drop all the pending OUTPUT buffers and they are
> > +      treated as returned to the client (as per spec).
> > +
> > +2. Restart the OUTPUT queue via :c:func:`VIDIOC_STREAMON`
> > +
> > +   a. Required fields:
> > +
> > +      i. type = OUTPUT
> > +
> > +   b. The driver must enter the post-seek state and be ready to
> > +      accept new source bitstream buffers.
> > +
> > +3. Start queuing buffers to OUTPUT queue containing stream data after
> > +   the seek until a suitable resume point is found.
> > +
> > +   .. note::
> > +
> > +      There is no requirement to begin queuing stream
> > +      starting exactly from a resume point (e.g. SPS or a keyframe).
> > +      The driver must handle any data queued and must keep processing
> > +      the queued buffers until it finds a suitable resume point.
> > +      While looking for a resume point, the driver processes OUTPUT
> > +      buffers and returns them to the client without producing any
> > +      decoded frames.
> > +
> > +4. After a resume point is found, the driver will start returning
> > +   CAPTURE buffers with decoded frames.
> > +
> > +   .. note::
> > +
> > +      There is no precise specification of when the CAPTURE queue
> > +      will start producing buffers containing decoded data from
> > +      buffers queued after the seek, as it operates independently
> > +      from the OUTPUT queue.
> > +
> > +      -  The driver may return a number of remaining CAPTURE buffers
> > +         containing frames decoded from data queued before the seek, even
> > +         after the seek sequence (STREAMOFF-STREAMON) has been performed.
> > +
> > +      -  The driver is also not required to return all frames that were
> > +         queued but not yet decoded before the seek sequence was initiated.
> > +         E.g. for an OUTPUT queue sequence: QBUF(A), QBUF(B),
> > +         STREAMOFF(OUT), STREAMON(OUT), QBUF(G), QBUF(H), any of the
> > +         following results on the CAPTURE queue is allowed: {A’, B’, G’,
> > +         H’}, {A’, G’, H’}, {G’, H’}.
> > +
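One reading of the seek example above, as a sketch: CAPTURE may still deliver a leading run (in decode order) of the frames queued before STREAMOFF(OUTPUT), followed by the frames queued after STREAMON(OUTPUT). It assumes G is a resume point and every buffer decodes to exactly one frame. seek_result_allowed() is hypothetical; for that example it accepts exactly the three listed outcomes.

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * True if got[] is some prefix of pre[] (frames queued before the seek)
 * followed by all of post[] (frames queued after it), in order.
 */
bool seek_result_allowed(const char *pre, size_t npre,
			 const char *post, size_t npost,
			 const char *got, size_t ngot)
{
	if (ngot < npost || ngot - npost > npre)
		return false;
	size_t kept = ngot - npost;  /* frames surviving from before the seek */
	for (size_t i = 0; i < kept; i++)
		if (got[i] != pre[i])
			return false;
	for (size_t i = 0; i < npost; i++)
		if (got[kept + i] != post[i])
			return false;
	return true;
}
```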
> > +Pause
> > +-----
> > +
> > +In order to pause, the client should just cease queuing buffers onto the
> > +OUTPUT queue. This is different from the general V4L2 API definition of
> > +pause, which involves calling :c:func:`VIDIOC_STREAMOFF` on the queue. Without
> > +source bitstream data, there is not data to process and the hardware
>
> s/not/no

Ack.

>
> > +remains idle. Conversely, using :c:func:`VIDIOC_STREAMOFF` on OUTPUT queue
> > +indicates a seek, which 1) drops all buffers in flight and 2) after a
> > +subsequent :c:func:`VIDIOC_STREAMON` will look for and only continue from a
> > +resume point. This is usually undesirable for pause. The
> > +STREAMOFF-STREAMON sequence is intended for seeking.
> > +
> > +Similarly, the CAPTURE queue should remain streaming as well, as the
> > +STREAMOFF-STREAMON sequence on it is intended solely for changing buffer
> > +sets.
> > +
> > +Dynamic resolution change
> > +-------------------------
> > +
> > +When the driver encounters a resolution change in the stream, the dynamic
> > +resolution change sequence is started.
> > +
> > +1.  On encountering a resolution change in the stream, the driver must
> > +    first process and decode all remaining buffers from before the
> > +    resolution change point.
> > +
> > +2.  After all buffers containing decoded frames from before the
> > +    resolution change point are ready to be dequeued on the
> > +    CAPTURE queue, the driver sends a ``V4L2_EVENT_SOURCE_CHANGE``
> > +    event for source change type ``V4L2_EVENT_SRC_CH_RESOLUTION``.
> > +    The last buffer from before the change must be marked with
> > +    :c:type:`v4l2_buffer` ``flags`` flag ``V4L2_BUF_FLAG_LAST`` as in the flush
> > +    sequence.
>
> How does the driver ensure the last buffer gets that flag? You may not
> have had the new header bytes queued to the OUTPUT queue before the
> previous frame has been decoded and dequeued on the CAPTURE queue.
> Empty buffer with the flag set?

Yes, an empty buffer. I think that is explained in the description of the
general flush sequence later. We should state it here as well.
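A sketch of how a client might tell the cases apart after VIDIOC_DQBUF on CAPTURE, in both the resolution change and the flush sequence. BUF_FLAG_LAST mirrors the value of V4L2_BUF_FLAG_LAST in <linux/videodev2.h>; the enum and classify_capture() are hypothetical, and bytesused is assumed to be that of plane 0 for the multi-planar API.

```c
#include <stdint.h>

/* Value of V4L2_BUF_FLAG_LAST in <linux/videodev2.h>. */
#define BUF_FLAG_LAST 0x00100000u

enum capture_result {
	CAPTURE_FRAME,       /* ordinary decoded frame */
	CAPTURE_LAST_FRAME,  /* last frame before the change/flush point */
	CAPTURE_LAST_EMPTY,  /* no frame left: empty buffer marks the end */
};

/*
 * Classify a dequeued CAPTURE buffer. Per the draft, dequeuing beyond a
 * buffer marked LAST fails with -EPIPE.
 */
enum capture_result classify_capture(uint32_t flags, uint32_t bytesused)
{
	if (!(flags & BUF_FLAG_LAST))
		return CAPTURE_FRAME;
	return bytesused ? CAPTURE_LAST_FRAME : CAPTURE_LAST_EMPTY;
}
```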

>
> > +    .. note::
> > +
> > +       Any attempts to dequeue more buffers beyond the buffer marked
> > +       with ``V4L2_BUF_FLAG_LAST`` will result in a -EPIPE error from
> > +       :c:func:`VIDIOC_DQBUF`.
> > +
> > +3.  After dequeuing all remaining buffers from the CAPTURE queue, the
> > +    client must call :c:func:`VIDIOC_STREAMOFF` on the CAPTURE queue. The
> > +    OUTPUT queue remains streaming (calling STREAMOFF on it would
> > +    trigger a seek).
> > +    Until STREAMOFF is called on the CAPTURE queue (acknowledging
> > +    the event), the driver operates as if the resolution hasn’t
> > +    changed yet, i.e. :c:func:`VIDIOC_G_FMT`, etc. return previous
> > +    resolution.
> > +
> > +4.  The client frees the buffers on the CAPTURE queue using
> > +    :c:func:`VIDIOC_REQBUFS`.
> > +
> > +    a. Required fields:
> > +
> > +       i.   count = 0
> > +
> > +       ii.  type = CAPTURE
> > +
> > +       iii. memory = as per spec
> > +
> > +5.  The client calls :c:func:`VIDIOC_G_FMT` for CAPTURE to get the new format
> > +    information.
> > +    This is identical to calling :c:func:`VIDIOC_G_FMT` after
> > +    ``V4L2_EVENT_SRC_CH_RESOLUTION`` in the initialization
> > +    sequence and should be handled similarly.
> > +
> > +    .. note::
> > +
> > +       It is allowed for the driver not to support the same
> > +       pixelformat as previously used (before the resolution change)
> > +       for the new resolution. The driver must select a default
> > +       supported pixelformat and return it from :c:func:`VIDIOC_G_FMT`, and
> > +       the client must take note of it.
> > +
> > +6.  (optional) The client is allowed to enumerate available formats and
> > +    select a different one than currently chosen (returned via
> > +    :c:func:`VIDIOC_G_FMT`). This is identical to a corresponding step in
> > +    the initialization sequence.
> > +
> > +7.  (optional) The client acquires visible resolution as in
> > +    initialization sequence.
> > +
> > +8.  (optional) The client acquires minimum number of buffers as in
> > +    initialization sequence.
> > +
> > +9.  The client allocates a new set of buffers for the CAPTURE queue via
> > +    :c:func:`VIDIOC_REQBUFS`. This is identical to a corresponding step in
> > +    the initialization sequence.
> > +
> > +10. The client resumes decoding by issuing :c:func:`VIDIOC_STREAMON` on the
> > +    CAPTURE queue.
> > +
> > +During the resolution change sequence, the OUTPUT queue must remain
> > +streaming. Calling :c:func:`VIDIOC_STREAMOFF` on OUTPUT queue will initiate seek.
> > +
> > +The OUTPUT queue operates separately from the CAPTURE queue for the
> > +duration of the entire resolution change sequence. It is allowed (and
> > +recommended for best performance and simplicity) for the client to keep
> > +queuing/dequeuing buffers to/from the OUTPUT queue even while processing
> > +this sequence.
> > +
> > +.. note::
> > +
> > +   It is also possible for this sequence to be triggered without
> > +   change in resolution if a different number of CAPTURE buffers is
> > +   required in order to continue decoding the stream.
> > +
> > +Flush
> > +-----
> > +
> > +Flush is the process of draining the CAPTURE queue of any remaining
> > +buffers. After the flush sequence is complete, the client has received
> > +all decoded frames for all OUTPUT buffers queued before the sequence was
> > +started.
> > +
> > +1. Begin flush by issuing :c:func:`VIDIOC_DECODER_CMD`.
> > +
> > +   a. Required fields:
> > +
> > +      i. cmd = ``V4L2_DEC_CMD_STOP``
> > +
> > +2. The driver must process and decode as normal all OUTPUT buffers
> > +   queued by the client before the :c:func:`VIDIOC_DECODER_CMD` was
> > +   issued.
> > +   Any operations triggered as a result of processing these
> > +   buffers (including the initialization and resolution change
> > +   sequences) must be processed as normal by both the driver and
> > +   the client before proceeding with the flush sequence.
> > +
> > +3. Once all OUTPUT buffers queued before ``V4L2_DEC_CMD_STOP`` are
> > +   processed:
> > +
> > +   a. If the CAPTURE queue is streaming, once all decoded frames (if
> > +      any) are ready to be dequeued on the CAPTURE queue, the
> > +      driver must send a ``V4L2_EVENT_EOS``. The driver must also
> > +      set ``V4L2_BUF_FLAG_LAST`` in :c:type:`v4l2_buffer` ``flags`` field on the
> > +      buffer on the CAPTURE queue containing the last frame (if
> > +      any) produced as a result of processing the OUTPUT buffers
> > +      queued before ``V4L2_DEC_CMD_STOP``. If no more frames are
> > +      left to be returned at the point of handling
> > +      ``V4L2_DEC_CMD_STOP``, the driver must return an empty buffer
> > +      (with :c:type:`v4l2_buffer` ``bytesused`` = 0) as the last buffer with
> > +      ``V4L2_BUF_FLAG_LAST`` set instead.
> > +      Any attempts to dequeue more buffers beyond the buffer
> > +      marked with ``V4L2_BUF_FLAG_LAST`` will result in a -EPIPE
> > +      error from :c:func:`VIDIOC_DQBUF`.
>
> I guess that answers my earlier question on resolution change when
> there are no CAPTURE buffers left to be delivered.
>
> > +   b. If the CAPTURE queue is NOT streaming, no action is necessary for
> > +      CAPTURE queue and the driver must send a ``V4L2_EVENT_EOS``
> > +      immediately after all OUTPUT buffers in question have been
> > +      processed.
> > +
> > +4. To resume, the client may issue ``V4L2_DEC_CMD_START``.
> > +
> > +End of stream
> > +-------------
> > +
> > +When the driver encounters an explicit end of stream in the bitstream,
> > +it must send a ``V4L2_EVENT_EOS`` to the client after all frames are
> > +decoded and ready to be dequeued on the CAPTURE queue, and must set
> > +``V4L2_BUF_FLAG_LAST`` in the :c:type:`v4l2_buffer` ``flags`` field of the
> > +last buffer. This behavior is identical to the flush sequence triggered
> > +by the client via ``V4L2_DEC_CMD_STOP``.
> > +
> > +Commit points
> > +-------------
> > +
> > +Setting formats and allocating buffers triggers changes in the behavior
> > +of the driver.
> > +
> > +1. Setting format on OUTPUT queue may change the set of formats
> > +   supported/advertised on the CAPTURE queue. If the format currently
> > +   selected on the CAPTURE queue is not supported by the newly selected
> > +   OUTPUT format, the driver must also change it to a supported one.
> > +
> > +2. Enumerating formats on CAPTURE queue must only return CAPTURE formats
> > +   supported for the OUTPUT format currently set.
> > +
> > +3. Setting/changing format on CAPTURE queue does not change formats
> > +   available on OUTPUT queue. An attempt to set CAPTURE format that
> > +   is not supported for the currently selected OUTPUT format must
> > +   result in an error (-EINVAL) from :c:func:`VIDIOC_S_FMT`.
> > +
> > +4. Enumerating formats on OUTPUT queue always returns a full set of
> > +   supported formats, irrespective of the current format selected on
> > +   CAPTURE queue.
> > +
> > +5. After allocating buffers on the OUTPUT queue, it is not possible to
> > +   change format on it.
> > +
> > +To summarize, setting formats and allocation must always start with the
> > +OUTPUT queue and the OUTPUT queue is the master that governs the set of
> > +supported formats for the CAPTURE queue.
> > diff --git a/Documentation/media/uapi/v4l/v4l2.rst b/Documentation/media/uapi/v4l/v4l2.rst
> > index b89e5621ae69..563d5b861d1c 100644
> > --- a/Documentation/media/uapi/v4l/v4l2.rst
> > +++ b/Documentation/media/uapi/v4l/v4l2.rst
> > @@ -53,6 +53,10 @@ Authors, in alphabetical order:
> >
> >    - Original author of the V4L2 API and documentation.
> >
> > +- Figa, Tomasz <tfiga@chromium.org>
> > +
> > +  - Documented parts of the V4L2 (stateful) Codec Interface. Migrated from Google Docs to kernel documentation.
> > +
> >  - H Schimek, Michael <mschimek@gmx.at>
> >
> >    - Original author of the V4L2 API and documentation.
> > @@ -65,6 +69,10 @@ Authors, in alphabetical order:
> >
> >    - Designed and documented the multi-planar API.
> >
> > +- Osciak, Pawel <posciak@chromium.org>
> > +
> > +  - Documented the V4L2 (stateful) Codec Interface.
> > +
> >  - Palosaari, Antti <crope@iki.fi>
> >
> >    - SDR API.
> > @@ -85,7 +93,7 @@ Authors, in alphabetical order:
> >
> >    - Designed and documented the VIDIOC_LOG_STATUS ioctl, the extended control ioctls, major parts of the sliced VBI API, the MPEG encoder and decoder APIs and the DV Timings API.
> >
> > -**Copyright** |copy| 1999-2016: Bill Dirks, Michael H. Schimek, Hans Verkuil, Martin Rubli, Andy Walls, Muralidharan Karicheri, Mauro Carvalho Chehab, Pawel Osciak, Sakari Ailus & Antti Palosaari.
> > +**Copyright** |copy| 1999-2018: Bill Dirks, Michael H. Schimek, Hans Verkuil, Martin Rubli, Andy Walls, Muralidharan Karicheri, Mauro Carvalho Chehab, Pawel Osciak, Sakari Ailus & Antti Palosaari, Tomasz Figa.
> >
> >  Except when explicitly stated as GPL, programming examples within this
> >  part can be used and distributed without restrictions.
> > @@ -94,6 +102,10 @@ part can be used and distributed without restrictions.
> >  Revision History
> >  ****************
> >
> > +:revision: TBD / TBD (*tf*)
> > +
> > +Add specification of V4L2 Codec Interface UAPI.
> > +
> >  :revision: 4.10 / 2016-07-15 (*rr*)
> >
> >  Introduce HSV formats.
> > --
> > 2.17.1.1185.g55be947832-goog
>
> Related to an earlier comment, whilst the driver has to support
> multiple instances, there is no arbitration over the overall decode
> rate with regard to real-time performance.
> I know our hardware is capable of 1080P60, but there's no easy way to
> stop someone trying to decode 2 1080P60 streams simultaneously. From a
> software perspective it'll do it, but not in real-time. I'd assume
> most other platforms will give similar behaviour.
> Is it worth adding a note that real-time performance is not guaranteed
> should multiple instances be running simultaneously, or a comment made
> somewhere about expected performance? Or enforce it by knowing the max
> data rates and analysing the level of each stream (please no)?

This is a very interesting problem in general.

I believe we don't really do anything like the latter in Chromium and
if someone tries to play too many videos, they would just start
dropping frames. (Pawel, correct me if I'm wrong.) It's actually
exactly what would happen if one starts too many videos with software
decoder running on CPU (and possibly with fewer instances).

Best regards,
Tomasz
Philipp Zabel June 6, 2018, 10:44 a.m. UTC | #5
On Tue, 2018-06-05 at 22:42 +0900, Tomasz Figa wrote:
[...]
> > > +   a. Values returned from :c:func:`VIDIOC_ENUM_FRAMESIZES` for coded formats
> > > +      must be maximums for given coded format for all supported raw
> > > +      formats.
> > 
> > I don't understand what maximums means in this context.
> > 
> > If I have a decoder that can decode from 16x16 up to 1920x1088, should
> > this return a continuous range from minimum frame size to maximum frame
> > size?
> 
> Looks like the wording here is a bit off. It should be as you say +/-
> alignment requirements, which can be specified by using
> v4l2_frmsize_stepwise. Hardware that supports only a fixed set of
> resolutions (if such exists), should use v4l2_frmsize_discrete.
> Basically this should follow the standard description of
> VIDIOC_ENUM_FRAMESIZES.

Should this contain coded sizes or visible sizes?

> > 
> > > +   b. Values returned from :c:func:`VIDIOC_ENUM_FRAMESIZES` for raw formats must
> > > +      be maximums for given raw format for all supported coded
> > > +      formats.
> > 
> > Same here, this is unclear to me.
> 
> Should be as above, i.e. according to standard operation of
> VIDIOC_ENUM_FRAMESIZES.

How about just:

   a. Values returned from :c:func:`VIDIOC_ENUM_FRAMESIZES` for coded formats
      must contain all possible (coded?) frame sizes for the given coded format
      for all supported raw formats.

   b. Values returned from :c:func:`VIDIOC_ENUM_FRAMESIZES` for raw formats
      must contain all possible coded frame sizes for the given raw format
      for all supported encoded formats.

And then a note somewhere that explains that coded frame sizes are
usually visible frame size rounded up to macro block size, possibly a
link to the coded resolution glossary.

[...]
> Actually, when I think of it now, I wonder if we really should be
> setting resolution here for bitstream formats that don't include
> resolution, rather than on CAPTURE queue. Pawel, could you clarify
> what was the intention here?

Setting the resolution here makes it possible to start streaming,
allocate buffers on both queues etc. without relying on the hardware to
actually parse the headers. If we are given the right information, the
first source change event will just confirm the currently set
resolution.

[...]
> > What about devices that have a frame buffer registration step before
> > stream start? For coda I need to know all CAPTURE buffers before I can
> > start streaming, because there is no way to register them after
> > STREAMON. Do I have to split the driver internally to do streamoff and
> > restart when the capture queue is brought up?
> 
> Do you mean that the hardware requires registering framebuffers before
> the headers are parsed and resolution is detected? That sounds quite
> unusual.

I meant that, but I was mistaken. For coda that is just how the driver
currently works, but it is not required by the hardware.

> Other drivers would:
> 1) parse the header on STREAMON(OUTPUT),

coda has a SEQ_INIT command, which parses the headers, and a
SET_FRAME_BUF command that registers allocated (internal) buffers.
Both are currently done during streamon, but it should be possible to
split this up. SET_FRAME_BUF can be only issued once between SEQ_INIT
and SEQ_END, but it is a separate command.

> 2) report resolution to userspace,
> 3) have framebuffers allocated in REQBUFS(CAPTURE),
> 4) register framebuffers in STREAMON(CAPTURE).

coda has a peculiarity in that the registered frame buffers are internal
only, and another part of the codec (copy/rotator) or another part of
the SoC (VDOA) copies those frames into the CAPTURE buffers that don't
have to be registered at all in advance in a separate step. But it
should still be possible to do the internal buffer allocation and
registration in the right places.

[...]
> Should be the same. There was "+5. Single-plane API (see spec) and
> applicable structures may be used interchangeably with Multi-plane
> API, unless specified otherwise." mentioned at the beginning of the
> documentation, but I guess we could just make the description generic
> instead.

Yes, please. Especially when using this as a reference during driver
development, it would be very helpful to have all relevant information
in place or at least referenced, instead of having to read and memorize
the whole document linearly.

[...]
> > Isn't CROP supposed to be set on the OUTPUT queue only and COMPOSE on
> > the CAPTURE queue?
> 
> Why? Both CROP and COMPOSE can be used on any queue, if supported by
> given interface.
> 
> However, on codecs, since OUTPUT queue is a bitstream, I don't think
> selection makes sense there.
>
> > I would expect COMPOSE/COMPOSE_DEFAULT to be set to the visible
> > rectangle and COMPOSE_PADDED to be set to the rectangle that the
> > hardware actually overwrites.
> 
> Yes, that's a good point. I'd also say that CROP/CROP_DEFAULT should
> be set to the visible rectangle as well, to allow adding handling for
> cases when the hardware can actually do further cropping.

Should CROP_BOUNDS be set to visible rectangle or to the coded
rectangle? This is related to the question of whether coded G/S_FMT should
handle coded sizes or visible sizes.

For video capture devices, the cropping bounds should represent those
pixels that can be sampled. If we can 'sample' the coded pixels beyond
the visible rectangle, should decoders behave the same?

I think Documentation/media/uapi/v4l/selection-api-004.rst is missing a
section about mem2mem devices and/or codecs to clarify this.

> > > +12. (optional) Get minimum number of buffers required for CAPTURE queue
> > > +    via :c:func:`VIDIOC_G_CTRL`. This is useful if client intends to use
> > > +    more buffers than minimum required by hardware/format (see
> > > +    allocation).
> > > +
> > > +    a. Required fields:
> > > +
> > > +       i. id = ``V4L2_CID_MIN_BUFFERS_FOR_CAPTURE``
> > > +
> > > +    b. Return values: per spec.
> > > +
> > > +    c. Return fields:
> > > +
> > > +       i. value: minimum number of buffers required to decode the stream
> > > +          parsed in this initialization sequence.
> > > +
> > > +    .. note::
> > > +
> > > +       Note that the minimum number of buffers must be at least the
> > > +       number required to successfully decode the current stream.
> > > +       This may for example be the required DPB size for an H.264
> > > +       stream given the parsed stream configuration (resolution,
> > > +       level).
> > > +
> > > +13. Allocate destination (raw format) buffers via :c:func:`VIDIOC_REQBUFS` on the
> > > +    CAPTURE queue.
> > > +
> > > +    a. Required fields:
> > > +
> > > +       i.   count = n, where n > 0.
> > > +
> > > +       ii.  type = CAPTURE
> > > +
> > > +       iii. memory = as per spec
> > > +
> > > +    b. Return values: Per spec.
> > > +
> > > +    c. Return fields:
> > > +
> > > +       i. count: adjusted to allocated number of buffers.
> > > +
> > > +    d. The driver must adjust count to the larger of the minimum number
> > > +       of destination buffers required for the given format and stream
> > > +       configuration, and the count passed. The client must check this
> > > +       value after the ioctl returns to get the number of buffers allocated.
> > > +
> > > +    .. note::
> > > +
> > > +       Passing count = 1 is useful for letting the driver choose
> > > +       the minimum.
> > > +
> > > +    .. note::
> > > +
> > > +       To allocate more than minimum number of buffers (for pipeline
> > > +       depth), use G_CTRL(``V4L2_CID_MIN_BUFFERS_FOR_CAPTURE``) to
> > > +       get minimum number of buffers required, and pass the obtained
> > > +       value plus the number of additional buffers needed in count
> > > +       to :c:func:`VIDIOC_REQBUFS`.
> > > +
> > > +14. Call :c:func:`VIDIOC_STREAMON` to initiate decoding frames.
> > > +
> > > +    a. Required fields: as per spec.
> > > +
> > > +    b. Return values: as per spec.
> > > +
> > > +Decoding
> > > +--------
> > > +
> > > +This state is reached after a successful initialization sequence. In
> > > +this state, client queues and dequeues buffers to both queues via
> > > +:c:func:`VIDIOC_QBUF` and :c:func:`VIDIOC_DQBUF`, as per spec.
> > > +
> > > +Both queues operate independently. The client may queue and dequeue
> > > +buffers to queues in any order and at any rate, also at a rate different
> > > +for each queue. The client may queue buffers within the same queue in
> > > +any order (V4L2 index-wise). It is recommended for the client to operate
> > > +the queues independently for best performance.
> > > +
> > > +Source OUTPUT buffers must contain:
> > > +
> > > +-  H.264/AVC: one or more complete NALUs of an Annex B elementary
> > > +   stream; one buffer does not have to contain enough data to decode
> > > +   a frame;
> > 
> > What if the hardware only supports handling complete frames?
> 
> Pawel, could you help with this?
> 
> > 
> > > +-  VP8/VP9: one or more complete frames.
> > > +
> > > +No direct relationship between source and destination buffers and the
> > > +timing of buffers becoming available to dequeue should be assumed in the
> > > +Stream API. Specifically:
> > > +
> > > +-  a buffer queued to OUTPUT queue may result in no buffers being
> > > +   produced on the CAPTURE queue (e.g. if it does not contain
> > > +   encoded data, or if only metadata syntax structures are present
> > > +   in it), or one or more buffers produced on the CAPTURE queue (if
> > > +   the encoded data contained more than one frame, or if returning a
> > > +   decoded frame allowed the driver to return a frame that preceded
> > > +   it in decode, but succeeded it in display order)
> > > +
> > > +-  a buffer queued to OUTPUT may result in a buffer being produced on
> > > +   the CAPTURE queue later into decode process, and/or after
> > > +   processing further OUTPUT buffers, or be returned out of order,
> > > +   e.g. if display reordering is used
> > > +
> > > +-  buffers may become available on the CAPTURE queue without additional
> > > +   buffers queued to OUTPUT (e.g. during flush or EOS)
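Because neither queue waits for the other, a client typically services both from a single ``poll()`` loop. Below is a sketch of the dispatch step only (illustrative, not from the patch). It assumes the usual mem2mem semantics, where ``POLLIN`` signals a dequeueable CAPTURE buffer, ``POLLOUT`` a dequeueable OUTPUT buffer and ``POLLPRI`` a pending event. The ``WORK_*`` names are hypothetical.

```c
#include <poll.h>

/* Work items a decoding loop may perform after one poll() round. */
enum { WORK_DQ_CAPTURE = 1, WORK_DQ_OUTPUT = 2, WORK_EVENT = 4, WORK_ERROR = 8 };

/*
 * Map poll() revents on a mem2mem video node to work items. Each bit is
 * handled independently, since the two queues run at their own pace.
 */
int decode_work(short revents)
{
    int work = 0;

    if (revents & POLLIN)
        work |= WORK_DQ_CAPTURE;  /* a decoded frame can be dequeued */
    if (revents & POLLOUT)
        work |= WORK_DQ_OUTPUT;   /* a consumed bitstream buffer can be refilled */
    if (revents & POLLPRI)
        work |= WORK_EVENT;       /* e.g. V4L2_EVENT_SOURCE_CHANGE or V4L2_EVENT_EOS */
    if (revents & POLLERR)
        work |= WORK_ERROR;       /* treated as fatal by this sketch */
    return work;
}
```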
> > > +
> > > +Seek
> > > +----
> > > +
> > > +Seek is controlled by the OUTPUT queue, as it is the source of bitstream
> > > +data. CAPTURE queue remains unchanged/unaffected.
> > 
> > Does this mean that to achieve instantaneous seeks the driver has to
> > flush its CAPTURE queue internally when a seek is issued?
> 
> That's a good point. I'd say that we might actually want the userspace
> to restart the capture queue in such case. Pawel, do you have any
> opinion on this?
> 
> > 
> > > +
> > > +1. Stop the OUTPUT queue to begin the seek sequence via
> > > +   :c:func:`VIDIOC_STREAMOFF`.
> > > +
> > > +   a. Required fields:
> > > +
> > > +      i. type = OUTPUT
> > > +
> > > +   b. The driver must drop all the pending OUTPUT buffers and they are
> > > +      treated as returned to the client (as per spec).
> > 
> > What about pending CAPTURE buffers that the client may not yet have
> > dequeued?
> 
> Just as written here: nothing happens to them, since the "CAPTURE
> queue remains unchanged/unaffected". :)
> 
> > 
> > > +
> > > +2. Restart the OUTPUT queue via :c:func:`VIDIOC_STREAMON`
> > > +
> > > +   a. Required fields:
> > > +
> > > +      i. type = OUTPUT
> > > +
> > > +   b. The driver must be put in a state after seek and be ready to
> > > +      accept new source bitstream buffers.
> > > +
> > > +3. Start queuing buffers to OUTPUT queue containing stream data after
> > > +   the seek until a suitable resume point is found.
> > > +
> > > +   .. note::
> > > +
> > > +      There is no requirement to begin queuing stream
> > > +      starting exactly from a resume point (e.g. SPS or a keyframe).
> > > +      The driver must handle any data queued and must keep processing
> > > +      the queued buffers until it finds a suitable resume point.
> > > +      While looking for a resume point, the driver processes OUTPUT
> > > +      buffers and returns them to the client without producing any
> > > +      decoded frames.
> > > +
> > > +4. After a resume point is found, the driver will start returning
> > > +   CAPTURE buffers with decoded frames.
> > > +
> > > +   .. note::
> > > +
> > > +      There is no precise specification for CAPTURE queue of when it
> > > +      will start producing buffers containing decoded data from
> > > +      buffers queued after the seek, as it operates independently
> > > +      from OUTPUT queue.
> > > +
> > > +      -  The driver may return a number of remaining CAPTURE buffers
> > > +         containing decoded frames from before the seek after the
> > > +         seek sequence (STREAMOFF-STREAMON) is performed.
> > 
> > Oh, ok. That answers my last question above.
> > 
> > > +      -  The driver may also skip the frames for OUTPUT buffers that were
> > > +         queued, but not yet decoded, before the seek sequence was initiated.
> > > +         E.g. for an OUTPUT queue sequence: QBUF(A), QBUF(B),
> > > +         STREAMOFF(OUT), STREAMON(OUT), QBUF(G), QBUF(H), any of the
> > > +         following results on the CAPTURE queue is allowed: {A’, B’, G’,
> > > +         H’}, {A’, G’, H’}, {G’, H’}.
> > > +
> > > +Pause
> > > +-----
> > > +
> > > +In order to pause, the client should just cease queuing buffers onto the
> > > +OUTPUT queue. This is different from the general V4L2 API definition of
> > > +pause, which involves calling :c:func:`VIDIOC_STREAMOFF` on the queue. Without
> > > +source bitstream data, there is no data to process and the hardware
> > > +remains idle. Conversely, using :c:func:`VIDIOC_STREAMOFF` on OUTPUT queue
> > > +indicates a seek, which 1) drops all buffers in flight and 2) after a
> > 
> > "... 1) drops all OUTPUT buffers in flight ... " ?
> 
> Yeah, although it's kind of inferred from the standard behavior of
> VIDIOC_STREAMOFF on given queue.
> 
> > 
> > > +subsequent :c:func:`VIDIOC_STREAMON` will look for and only continue from a
> > > +resume point. This is usually undesirable for pause. The
> > > +STREAMOFF-STREAMON sequence is intended for seeking.
> > > +
> > > +Similarly, the CAPTURE queue should remain streaming as well, as the
> > > +STREAMOFF-STREAMON sequence on it is intended solely for changing buffer
> > > +sets.
> > > +
> > > +Dynamic resolution change
> > > +-------------------------
> > > +
> > > +When driver encounters a resolution change in the stream, the dynamic
> > > +resolution change sequence is started.
> > 
> > Must all drivers support dynamic resolution change?
> 
> I'd say no, but I guess that would mean that the driver never
> encounters it, because hardware wouldn't report it.
> 
> I wonder what would happen in such a case, though. Obviously decoding of such
> stream couldn't continue without support in the driver.

GStreamer supports decoding of variable resolution streams without
driver support by just stopping and restarting streaming completely.

> > 
> > > +1.  On encountering a resolution change in the stream, the driver must
> > > +    first process and decode all remaining buffers from before the
> > > +    resolution change point.
> > > +
> > > +2.  After all buffers containing decoded frames from before the
> > > +    resolution change point are ready to be dequeued on the
> > > +    CAPTURE queue, the driver sends a ``V4L2_EVENT_SOURCE_CHANGE``
> > > +    event for source change type ``V4L2_EVENT_SRC_CH_RESOLUTION``.
> > > +    The last buffer from before the change must be marked with
> > > +    :c:type:`v4l2_buffer` ``flags`` flag ``V4L2_BUF_FLAG_LAST`` as in the flush
> > > +    sequence.
> > > +
> > > +    .. note::
> > > +
> > > +       Any attempts to dequeue more buffers beyond the buffer marked
> > > +       with ``V4L2_BUF_FLAG_LAST`` will result in a -EPIPE error from
> > > +       :c:func:`VIDIOC_DQBUF`.
> > > +
> > > +3.  After dequeuing all remaining buffers from the CAPTURE queue, the
> > > +    client must call :c:func:`VIDIOC_STREAMOFF` on the CAPTURE queue. The
> > > +    OUTPUT queue remains streaming (calling STREAMOFF on it would
> > > +    trigger a seek).
> > > +    Until STREAMOFF is called on the CAPTURE queue (acknowledging
> > > +    the event), the driver operates as if the resolution hasn’t
> > > +    changed yet, i.e. :c:func:`VIDIOC_G_FMT`, etc. return previous
> > > +    resolution.
> > 
> > What about the OUTPUT queue resolution, does it change as well?
> 
> There shouldn't be resolution associated with OUTPUT queue, because
> pixel format is bitstream, not raw frame.

So the width and height field may just contain bogus values for coded
formats?

[...]
> > Ok. Is the same true about the contained colorimetry? What should happen
> > if the stream contains colorimetry information that differs from
> > S_FMT(OUT) colorimetry?
> 
> As I explained close to the top, IMHO we shouldn't be setting
> colorimetry on OUTPUT queue.

Does that mean that if userspace sets those fields though, we correct to
V4L2_COLORSPACE_DEFAULT and friends? Or just accept anything and ignore
it?

> > > +2. Enumerating formats on CAPTURE queue must only return CAPTURE formats
> > > +   supported for the OUTPUT format currently set.
> > > +
> > > +3. Setting/changing format on CAPTURE queue does not change formats
> > > +   available on OUTPUT queue. An attempt to set CAPTURE format that
> > > +   is not supported for the currently selected OUTPUT format must
> > > +   result in an error (-EINVAL) from :c:func:`VIDIOC_S_FMT`.
> > 
> > Is this limited to the pixel format? Surely setting out of bounds
> > width/height or incorrect colorimetry should not result in EINVAL but
> > still be corrected by the driver?
> 
> That doesn't sound right to me indeed. The driver should fix up
> S_FMT(CAPTURE), including pixel format or anything else. It must only
> not alter OUTPUT settings.

That's what I would have expected as well.

> > 
> > > +4. Enumerating formats on OUTPUT queue always returns a full set of
> > > +   supported formats, irrespective of the current format selected on
> > > +   CAPTURE queue.
> > > +
> > > +5. After allocating buffers on the OUTPUT queue, it is not possible to
> > > +   change format on it.
> > 
> > So even after source change events the OUTPUT queue still keeps the
> > initial OUTPUT format?
> 
> It would basically only have pixelformat (fourcc) assigned to it,
> since bitstream formats are not video frames, but just sequences of
> bytes. I don't think it makes sense to change e.g. from H264 to VP8
> during streaming.

What should the width and height format fields be set to then? Is there
a precedent for this? Capture devices that produce compressed output
usually set width and height to the visible resolution.

regards
Philipp
Alexandre Courbot June 6, 2018, 1:02 p.m. UTC | #6
On Tue, Jun 5, 2018 at 10:42 PM Tomasz Figa <tfiga@chromium.org> wrote:
>
> Hi Philipp,
>
> Thanks a lot for review.
>
> On Tue, Jun 5, 2018 at 8:41 PM Philipp Zabel <p.zabel@pengutronix.de> wrote:
> >
> > Hi Tomasz,
> >
> > On Tue, 2018-06-05 at 19:33 +0900, Tomasz Figa wrote:
> > > Due to complexity of the video decoding process, the V4L2 drivers of
> > > stateful decoder hardware require specific sequences of V4L2 API calls
> > > to be followed. These include capability enumeration, initialization,
> > > decoding, seek, pause, dynamic resolution change, flush and end of
> > > stream.
> > >
> > > Specifics of the above have been discussed during Media Workshops at
> > > LinuxCon Europe 2012 in Barcelona and then later Embedded Linux
> > > Conference Europe 2014 in Düsseldorf. The de facto Codec API that
> > > originated at those events was later implemented by the drivers we already
> > > have merged in mainline, such as s5p-mfc or mtk-vcodec.
> > >
> > > The only thing missing was the real specification included as a part of
> > > Linux Media documentation. Fix it now and document the decoder part of
> > > the Codec API.
> > >
> > > Signed-off-by: Tomasz Figa <tfiga@chromium.org>
> > > ---
> > >  Documentation/media/uapi/v4l/dev-codec.rst | 771 +++++++++++++++++++++
> > >  Documentation/media/uapi/v4l/v4l2.rst      |  14 +-
> > >  2 files changed, 784 insertions(+), 1 deletion(-)
> > >
> > > diff --git a/Documentation/media/uapi/v4l/dev-codec.rst b/Documentation/media/uapi/v4l/dev-codec.rst
> > > index c61e938bd8dc..0483b10c205e 100644
> > > --- a/Documentation/media/uapi/v4l/dev-codec.rst
> > > +++ b/Documentation/media/uapi/v4l/dev-codec.rst
> > > @@ -34,3 +34,774 @@ the codec and reprogram it whenever another file handler gets access.
> > >  This is different from the usual video node behavior where the video
> > >  properties are global to the device (i.e. changing something through one
> > >  file handle is visible through another file handle).
> > > +
> > > +This interface is generally appropriate for hardware that does not
> > > +require additional software involvement to parse/partially decode/manage
> > > +the stream before/after processing in hardware.
> > > +
> > > +Input data to the Stream API are buffers containing unprocessed video
> > > +stream (Annex-B H264/H265 stream, raw VP8/9 stream) only. The driver is
> > > +expected not to require any additional information from the client to
> > > +process these buffers, and to return decoded frames on the CAPTURE queue
> > > +in display order.
> > > +
> > > +Performing software parsing, processing etc. of the stream in the driver
> > > +in order to support the stream API is strongly discouraged. In such a case,
> > > +the Stateless Codec Interface (in development) is preferred.
> > > +
> > > +Conventions and notation used in this document
> > > +==============================================
> > > +
> > > +1. The general V4L2 API rules apply if not specified in this document
> > > +   otherwise.
> > > +
> > > +2. The meaning of words “must”, “may”, “should”, etc. is as per RFC
> > > +   2119.
> > > +
> > > +3. All steps not marked “optional” are required.
> > > +
> > > +4. :c:func:`VIDIOC_G_EXT_CTRLS`, :c:func:`VIDIOC_S_EXT_CTRLS` may be used interchangeably with
> > > +   :c:func:`VIDIOC_G_CTRL`, :c:func:`VIDIOC_S_CTRL`, unless specified otherwise.
> > > +
> > > +5. Single-plane API (see spec) and applicable structures may be used
> > > +   interchangeably with Multi-plane API, unless specified otherwise.
> > > +
> > > +6. i = [a..b]: sequence of integers from a to b, inclusive, i.e. i =
> > > +   [0..2]: i = 0, 1, 2.
> > > +
> > > +7. For OUTPUT buffer A, A’ represents a buffer on the CAPTURE queue
> > > +   containing data (decoded or encoded frame/stream) that resulted
> > > +   from processing buffer A.
> > > +
> > > +Glossary
> > > +========
> > > +
> > > +CAPTURE
> > > +   the destination buffer queue, decoded frames for
> > > +   decoders, encoded bitstream for encoders;
> > > +   ``V4L2_BUF_TYPE_VIDEO_CAPTURE`` or
> > > +   ``V4L2_BUF_TYPE_VIDEO_CAPTURE_MPLANE``
> > > +
> > > +client
> > > +   application client communicating with the driver
> > > +   implementing this API
> > > +
> > > +coded format
> > > +   encoded/compressed video bitstream format (e.g.
> > > +   H.264, VP8, etc.); see raw format; this is not equivalent to fourcc
> > > +   (V4L2 pixelformat), as each coded format may be supported by multiple
> > > +   fourccs (e.g. ``V4L2_PIX_FMT_H264``, ``V4L2_PIX_FMT_H264_SLICE``, etc.)
> > > +
> > > +coded height
> > > +   height for given coded resolution
> > > +
> > > +coded resolution
> > > +   stream resolution in pixels aligned to codec
> > > +   format and hardware requirements; see also visible resolution
> > > +
> > > +coded width
> > > +   width for given coded resolution
> > > +
> > > +decode order
> > > +   the order in which frames are decoded; may differ
> > > +   from display (output) order if frame reordering (B frames) is active in
> > > +   the stream; OUTPUT buffers must be queued in decode order; for frame
> > > +   API, CAPTURE buffers must be returned by the driver in decode order;
> > > +
> > > +display order
> > > +   the order in which frames must be displayed
> > > +   (outputted); for stream API, CAPTURE buffers must be returned by the
> > > +   driver in display order;
> > > +
> > > +EOS
> > > +   end of stream
> > > +
> > > +input height
> > > +   height in pixels for given input resolution
> > > +
> > > +input resolution
> > > +   resolution in pixels of source frames being input
> > > +   to the encoder and subject to further cropping to the bounds of visible
> > > +   resolution
> > > +
> > > +input width
> > > +   width in pixels for given input resolution
> > > +
> > > +OUTPUT
> > > +   the source buffer queue, encoded bitstream for
> > > +   decoders, raw frames for encoders; ``V4L2_BUF_TYPE_VIDEO_OUTPUT`` or
> > > +   ``V4L2_BUF_TYPE_VIDEO_OUTPUT_MPLANE``
> > > +
> > > +raw format
> > > +   uncompressed format containing raw pixel data (e.g.
> > > +   YUV, RGB formats)
> > > +
> > > +resume point
> > > +   a point in the bitstream from which decoding may
> > > +   start/continue, without any previous state/data present, e.g.: a
> > > +   keyframe (VPX) or SPS/PPS/IDR sequence (H.264); a resume point is
> > > +   required to start decode of a new stream, or to resume decoding after a
> > > +   seek;
> > > +
> > > +source buffer
> > > +   buffers allocated for source queue
> > > +
> > > +source queue
> > > +   queue containing buffers used for source data, i.e.
> > > +
> > > +visible height
> > > +   height for given visible resolution
> > > +
> > > +visible resolution
> > > +   stream resolution of the visible picture, in
> > > +   pixels, to be used for display purposes; must be smaller than or equal to
> > > +   the coded resolution;
> > > +
> > > +visible width
> > > +   width for given visible resolution
> > > +
> > > +Decoder
> > > +=======
> > > +
> > > +Querying capabilities
> > > +---------------------
> > > +
> > > +1. To enumerate the set of coded formats supported by the driver, the
> > > +   client uses :c:func:`VIDIOC_ENUM_FMT` for OUTPUT. The driver must always
> > > +   return the full set of supported formats, irrespective of the
> > > +   format set on the CAPTURE queue.
> > > +
> > > +2. To enumerate the set of supported raw formats, the client uses
> > > +   :c:func:`VIDIOC_ENUM_FMT` for CAPTURE. The driver must return only the
> > > +   formats supported for the format currently set on the OUTPUT
> > > +   queue.
> > > +   In order to enumerate raw formats supported by a given coded
> > > +   format, the client must first set that coded format on the
> > > +   OUTPUT queue and then enumerate the CAPTURE queue.
> > > +
> > > +3. The client may use :c:func:`VIDIOC_ENUM_FRAMESIZES` to detect supported
> > > +   resolutions for a given format, passing its fourcc in
> > > +   :c:type:`v4l2_frmsizeenum` ``pixel_format``.
> >
> > Is this a must-implement for drivers? coda currently doesn't implement
> > enum-framesizes.
>
> I'll leave this to Pawel. This might be one of the things that we
> didn't get to implement in upstream in the end.
>
> >
> > > +
> > > +   a. Values returned from :c:func:`VIDIOC_ENUM_FRAMESIZES` for coded formats
> > > +      must be maximums for given coded format for all supported raw
> > > +      formats.
> >
> > I don't understand what maximums means in this context.
> >
> > If I have a decoder that can decode from 16x16 up to 1920x1088, should
> > this return a continuous range from minimum frame size to maximum frame
> > size?
>
> Looks like the wording here is a bit off. It should be as you say +/-
> alignment requirements, which can be specified by using
> v4l2_frmsize_stepwise. Hardware that supports only a fixed set of
> resolutions (if such exists), should use v4l2_frmsize_discrete.
> Basically this should follow the standard description of
> VIDIOC_ENUM_FRAMESIZES.
>
> >
> > > +   b. Values returned from :c:func:`VIDIOC_ENUM_FRAMESIZES` for raw formats must
> > > +      be maximums for given raw format for all supported coded
> > > +      formats.
> >
> > Same here, this is unclear to me.
>
> Should be as above, i.e. according to standard operation of
> VIDIOC_ENUM_FRAMESIZES.
>
> >
> > > +   c. The client should derive the supported resolution for a
> > > +      combination of coded+raw format by calculating the
> > > +      intersection of resolutions returned from calls to
> > > +      :c:func:`VIDIOC_ENUM_FRAMESIZES` for the given coded and raw formats.
> > > +
> > > +4. Supported profiles and levels for given format, if applicable, may be
> > > +   queried using their respective controls via :c:func:`VIDIOC_QUERYCTRL`.
> > > +
> > > +5. The client may use :c:func:`VIDIOC_ENUM_FRAMEINTERVALS` to enumerate maximum
> > > +   supported framerates by the driver/hardware for a given
> > > +   format+resolution combination.
> >
> > Same as above, is this must-implement for decoder drivers?
>
> Leaving this to Pawel.
>
> >
> > > +
> > > +Initialization sequence
> > > +-----------------------
> > > +
> > > +1. (optional) Enumerate supported OUTPUT formats and resolutions. See
> > > +   capability enumeration.
> > > +
> > > +2. Set a coded format on the source queue via :c:func:`VIDIOC_S_FMT`
> > > +
> > > +   a. Required fields:
> > > +
> > > +      i.   type = OUTPUT
> > > +
> > > +      ii.  fmt.pix_mp.pixelformat set to a coded format
> > > +
> > > +      iii. fmt.pix_mp.width, fmt.pix_mp.height only if they cannot be
> > > +           parsed from the stream for the given coded format;
> > > +           ignored otherwise;
> >
> > When this is set, does this also update the format on the CAPTURE queue
> > (i.e. would G_FMT(CAP), S_FMT(OUT), G_FMT(CAP) potentially return
> > different CAP formats?) I think this should be explained here.
>
> Yes, it would. Agreed that it should be explicitly mentioned here.
>
> >
> > What about colorimetry, does setting colorimetry here overwrite
> > colorimetry information that may potentially be contained in the stream?
>
> I'd say that if the hardware/driver can't report such information,
> CAPTURE queue should report V4L2_COLORSPACE_DEFAULT and userspace
> should take care of determining the right one (or using a default one)
> on its own. This would eliminate the need to set anything on OUTPUT
> queue.
>
> Actually, when I think of it now, I wonder if we really should be
> setting resolution here for bitstream formats that don't include
> resolution, rather than on CAPTURE queue. Pawel, could you clarify
> what was the intention here?
>
> >
> > > +   b. Return values:
> > > +
> > > +      i.  EINVAL: unsupported format.
> > > +
> > > +      ii. Others: per spec
> > > +
> > > +   .. note::
> > > +
> > > +      The driver must not adjust pixelformat, so if
> > > +      ``V4L2_PIX_FMT_H264`` is passed but only
> > > +      ``V4L2_PIX_FMT_H264_SLICE`` is supported, S_FMT will return
> > > +      -EINVAL. If both are acceptable to the client, calling S_FMT for
> > > +      the other after one gets rejected may be required (or use
> > > +      :c:func:`VIDIOC_ENUM_FMT` to discover beforehand, see Capability
> > > +      enumeration).
> > > +
> > > +3.  (optional) Get minimum number of buffers required for OUTPUT queue
> > > +    via :c:func:`VIDIOC_G_CTRL`. This is useful if the client intends to use
> > > +    more buffers than minimum required by hardware/format (see
> > > +    allocation).
> > > +
> > > +    a. Required fields:
> > > +
> > > +       i. id = ``V4L2_CID_MIN_BUFFERS_FOR_OUTPUT``
> > > +
> > > +    b. Return values: per spec.
> > > +
> > > +    c. Return fields:
> > > +
> > > +       i. value: required number of OUTPUT buffers for the currently set
> > > +          format;
> > > +
> > > +4.  Allocate source (bitstream) buffers via :c:func:`VIDIOC_REQBUFS` on OUTPUT
> > > +    queue.
> > > +
> > > +    a. Required fields:
> > > +
> > > +       i.   count = n, where n > 0.
> > > +
> > > +       ii.  type = OUTPUT
> > > +
> > > +       iii. memory = as per spec
> > > +
> > > +    b. Return values: Per spec.
> > > +
> > > +    c. Return fields:
> > > +
> > > +       i. count: adjusted to allocated number of buffers
> > > +
> > > +    d. The driver must adjust count to the larger of the number of
> > > +       source buffers required for the given format and the count
> > > +       passed. The client must check this value after the ioctl
> > > +       returns to get the number of buffers allocated.
> > > +
> > > +    .. note::
> > > +
> > > +       Passing count = 1 is useful for letting the driver choose
> > > +       the minimum according to the selected format/hardware
> > > +       requirements.
> > > +
> > > +    .. note::
> > > +
> > > +       To allocate more than minimum number of buffers (for pipeline
> > > +       depth), use G_CTRL(``V4L2_CID_MIN_BUFFERS_FOR_OUTPUT``) to
> > > +       get minimum number of buffers required by the driver/format,
> > > +       and pass the obtained value plus the number of additional
> > > +       buffers needed in count to :c:func:`VIDIOC_REQBUFS`.
> > > +
> > > +5.  Begin parsing the stream for stream metadata via :c:func:`VIDIOC_STREAMON` on
> > > +    OUTPUT queue. This step allows the driver to parse/decode
> > > +    initial stream metadata until enough information to allocate
> > > +    CAPTURE buffers is found. This is indicated by the driver by
> > > +    sending a ``V4L2_EVENT_SOURCE_CHANGE`` event, which the client
> > > +    must handle.
> > > +
> > > +    a. Required fields: as per spec.
> > > +
> > > +    b. Return values: as per spec.
> > > +
> > > +    .. note::
> > > +
> > > +       Calling :c:func:`VIDIOC_REQBUFS`, :c:func:`VIDIOC_STREAMON`
> > > +       or :c:func:`VIDIOC_G_FMT` on the CAPTURE queue at this time is not
> > > +       allowed and must return EINVAL.
> >
> > What about devices that have a frame buffer registration step before
> > stream start? For coda I need to know all CAPTURE buffers before I can
> > start streaming, because there is no way to register them after
> > STREAMON. Do I have to split the driver internally to do streamoff and
> > restart when the capture queue is brought up?
>
> Do you mean that the hardware requires registering framebuffers before
> the headers are parsed and resolution is detected? That sounds quite
> unusual.
>
> Other drivers would:
> 1) parse the header on STREAMON(OUTPUT),
> 2) report resolution to userspace,
> 3) have framebuffers allocated in REQBUFS(CAPTURE),
> 4) register framebuffers in STREAMON(CAPTURE).
>
> >
> > > +6.  This step only applies to coded formats that contain resolution
> > > +    information in the stream.
> > > +    Continue queuing/dequeuing bitstream buffers to/from the
> > > +    OUTPUT queue via :c:func:`VIDIOC_QBUF` and :c:func:`VIDIOC_DQBUF`. The driver
> > > +    must keep processing and returning each buffer to the client
> > > +    until the metadata required to send a ``V4L2_EVENT_SOURCE_CHANGE``
> > > +    for source change type ``V4L2_EVENT_SRC_CH_RESOLUTION`` is
> > > +    found. There is no requirement to pass enough data for this to
> > > +    occur in the first buffer and the driver must be able to
> > > +    process any number of buffers.
> > > +
> > > +    a. Required fields: as per spec.
> > > +
> > > +    b. Return values: as per spec.
> > > +
> > > +    c. If data in a buffer that triggers the event is required to decode
> > > +       the first frame, the driver must not return it to the client,
> > > +       but must retain it for further decoding.
> > > +
> > > +    d. Until the resolution source event is sent to the client, calling
> > > +       :c:func:`VIDIOC_G_FMT` on the CAPTURE queue must return -EINVAL.
> > > +
> > > +    .. note::
> > > +
> > > +       No decoded frames are produced during this phase.
> > > +
> > > +7.  This step only applies to coded formats that contain resolution
> > > +    information in the stream.
> > > +    Receive and handle ``V4L2_EVENT_SOURCE_CHANGE`` from the driver
> > > +    via :c:func:`VIDIOC_DQEVENT`. The driver must send this event once
> > > +    enough data is obtained from the stream to allocate CAPTURE
> > > +    buffers and to begin producing decoded frames.
> > > +
> > > +    a. Required fields:
> > > +
> > > +       i. type = ``V4L2_EVENT_SOURCE_CHANGE``
> > > +
> > > +    b. Return values: as per spec.
> > > +
> > > +    c. The driver must return u.src_change.changes =
> > > +       ``V4L2_EVENT_SRC_CH_RESOLUTION``.
> > > +
> > > +8.  This step only applies to coded formats that contain resolution
> > > +    information in the stream.
> > > +    Call :c:func:`VIDIOC_G_FMT` for the CAPTURE queue to get the format of
> > > +    the destination buffers, as parsed/decoded from the bitstream.
> > > +
> > > +    a. Required fields:
> > > +
> > > +       i. type = CAPTURE
> > > +
> > > +    b. Return values: as per spec.
> > > +
> > > +    c. Return fields:
> > > +
> > > +       i.   fmt.pix_mp.width, fmt.pix_mp.height: coded resolution
> > > +            for the decoded frames
> > > +
> > > +       ii.  fmt.pix_mp.pixelformat: default pixelformat for decoded
> > > +            frames, as required or preferred by the driver.
> >
> > This text is specific to multiplanar queues, what about singleplanar
> > drivers?
>
> Should be the same. There was "+5. Single-plane API (see spec) and
> applicable structures may be used interchangeably with Multi-plane
> API, unless specified otherwise." mentioned at the beginning of the
> documentation, but I guess we could just make the description generic
> instead.
>
> >
> > > +
> > > +       iii. num_planes: set to number of planes for pixelformat.
> > > +
> > > +       iv.  For each plane p = [0, num_planes-1]:
> > > +            plane_fmt[p].sizeimage, plane_fmt[p].bytesperline as
> > > +            per spec for coded resolution.
> > > +
> > > +    .. note::
> > > +
> > > +       Te value of pixelformat may be any pixel format supported,
> >
> > Typo, "The value ..."
>
> Thanks, will fix.
>
> >
> > > +       and must
> > > +       be supported for the current stream, based on the information
> > > +       parsed from the stream and hardware capabilities. It is
> > > +       suggested that the driver choose the preferred/optimal format
> > > +       for the given configuration. For example, a YUV format may be
> > > +       preferred over an RGB format if an additional conversion step
> > > +       would be required for RGB.
> > > +
> > > +9.  (optional) Enumerate CAPTURE formats via :c:func:`VIDIOC_ENUM_FMT` on
> > > +    CAPTURE queue.
> > > +    Once the stream information is parsed and known, the client
> > > +    may use this ioctl to discover which raw formats are supported
> > > +    for the given stream and select one of them via :c:func:`VIDIOC_S_FMT`.
> > > +
> > > +    a. Fields/return values as per spec.
> > > +
> > > +    .. note::
> > > +
> > > +       The driver must return only formats supported for the
> > > +       current stream parsed in this initialization sequence, even
> > > +       if more formats may be supported by the driver in general.
> > > +       For example, a driver/hardware may support YUV and RGB
> > > +       formats for resolutions 1920x1088 and lower, but only YUV for
> > > +       higher resolutions (e.g. due to memory bandwidth
> > > +       limitations). After parsing a resolution of 1920x1088 or
> > > +       lower, :c:func:`VIDIOC_ENUM_FMT` may return a set of YUV and RGB
> > > +       pixelformats, but after parsing a resolution higher than
> > > +       1920x1088, the driver must not return RGB, as it is unsupported
> > > +       for this resolution.
> > > +
> > > +       However, a subsequent resolution change event
> > > +       triggered after discovering a resolution change within the
> > > +       same stream may switch the stream into a lower resolution;
> > > +       :c:func:`VIDIOC_ENUM_FMT` must return RGB formats again in that case.
> > > +
> > > +10.  (optional) Choose a different CAPTURE format than suggested via
> > > +     :c:func:`VIDIOC_S_FMT` on CAPTURE queue. It is possible for the client
> > > +     to choose a different format than selected/suggested by the
> > > +     driver in :c:func:`VIDIOC_G_FMT`.
> > > +
> > > +     a. Required fields:
> > > +
> > > +        i.  type = CAPTURE
> > > +
> > > +        ii. fmt.pix_mp.pixelformat set to a raw format
> > > +
> > > +     b. Return values:
> > > +
> > > +        i. EINVAL: unsupported format.
> > > +
> > > +     c. Calling :c:func:`VIDIOC_ENUM_FMT` to discover currently available formats
> > > +        after receiving ``V4L2_EVENT_SOURCE_CHANGE`` is useful to find
> > > +        out a set of allowed pixelformats for given configuration,
> > > +        but not required.
> >
> > What about colorimetry? Should this and TRY_FMT only allow colorimetry
> > that is parsed from the stream, if available, or that was set via
> > S_FMT(OUT) as an override?
>
> I'd say this depends on the hardware. If it can convert the video into
> desired color space, it could be allowed.
>
> >
> > > +11. (optional) Acquire visible resolution via :c:func:`VIDIOC_G_SELECTION`.
> > > +
> > > +    a. Required fields:
> > > +
> > > +       i.  type = CAPTURE
> > > +
> > > +       ii. target = ``V4L2_SEL_TGT_CROP``
> > > +
> > > +    b. Return values: per spec.
> > > +
> > > +    c. Return fields
> > > +
> > > +       i. r.left, r.top, r.width, r.height: visible rectangle; this must
> > > +          fit within coded resolution returned from :c:func:`VIDIOC_G_FMT`.
> >
> > Isn't CROP supposed to be set on the OUTPUT queue only and COMPOSE on
> > the CAPTURE queue?
>
> Why? Both CROP and COMPOSE can be used on any queue, if supported by
> given interface.
>
> However, on codecs, since OUTPUT queue is a bitstream, I don't think
> selection makes sense there.
>
> > I would expect COMPOSE/COMPOSE_DEFAULT to be set to the visible
> > rectangle and COMPOSE_PADDED to be set to the rectangle that the
> > hardware actually overwrites.
>
> Yes, that's a good point. I'd also say that CROP/CROP_DEFAULT should
> be set to the visible rectangle as well, to allow adding handling for
> cases when the hardware can actually do further cropping.
>
> >
> > > +12. (optional) Get minimum number of buffers required for CAPTURE queue
> > > +    via :c:func:`VIDIOC_G_CTRL`. This is useful if the client intends to use
> > > +    more buffers than minimum required by hardware/format (see
> > > +    allocation).
> > > +
> > > +    a. Required fields:
> > > +
> > > +       i. id = ``V4L2_CID_MIN_BUFFERS_FOR_CAPTURE``
> > > +
> > > +    b. Return values: per spec.
> > > +
> > > +    c. Return fields:
> > > +
> > > +       i. value: minimum number of buffers required to decode the stream
> > > +          parsed in this initialization sequence.
> > > +
> > > +    .. note::
> > > +
> > > +       The minimum number of buffers must be at least the
> > > +       number required to successfully decode the current stream.
> > > +       This may for example be the required DPB size for an H.264
> > > +       stream given the parsed stream configuration (resolution,
> > > +       level).
> > > +
> > > +13. Allocate destination (raw format) buffers via :c:func:`VIDIOC_REQBUFS` on the
> > > +    CAPTURE queue.
> > > +
> > > +    a. Required fields:
> > > +
> > > +       i.   count = n, where n > 0.
> > > +
> > > +       ii.  type = CAPTURE
> > > +
> > > +       iii. memory = as per spec
> > > +
> > > +    b. Return values: Per spec.
> > > +
> > > +    c. Return fields:
> > > +
> > > +       i. count: adjusted to allocated number of buffers.
> > > +
> > > +    d. The driver must adjust count to the larger of the number of
> > > +       destination buffers required for the given format and stream
> > > +       configuration and the count passed. The client must check this
> > > +       value after the ioctl returns to get the number of buffers allocated.
> > > +
> > > +    .. note::
> > > +
> > > +       Passing count = 1 is useful for letting the driver choose
> > > +       the minimum.
> > > +
> > > +    .. note::
> > > +
> > > +       To allocate more than minimum number of buffers (for pipeline
> > > +       depth), use G_CTRL(``V4L2_CID_MIN_BUFFERS_FOR_CAPTURE``) to
> > > +       get minimum number of buffers required, and pass the obtained
> > > +       value plus the number of additional buffers needed in count
> > > +       to :c:func:`VIDIOC_REQBUFS`.
> > > +
> > > +14. Call :c:func:`VIDIOC_STREAMON` to initiate decoding frames.
> > > +
> > > +    a. Required fields: as per spec.
> > > +
> > > +    b. Return values: as per spec.
> > > +
> > > +Decoding
> > > +--------
> > > +
> > > +This state is reached after a successful initialization sequence. In
> > > +this state, the client queues and dequeues buffers to both queues via
> > > +:c:func:`VIDIOC_QBUF` and :c:func:`VIDIOC_DQBUF`, as per spec.
> > > +
> > > +Both queues operate independently. The client may queue and dequeue
> > > +buffers in any order and at any rate, including a different rate for
> > > +each queue. The client may queue buffers within the same queue in
> > > +any order (V4L2 index-wise). It is recommended for the client to operate
> > > +the queues independently for best performance.
> > > +
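Operating the queues independently usually means multiplexing them with poll(). Assuming the usual mem2mem poll() semantics (POLLIN: a CAPTURE buffer can be dequeued, POLLOUT: an OUTPUT buffer can be dequeued, POLLPRI: a subscribed event is pending), one loop iteration could map revents to actions as below. The action bits and function name are hypothetical.

```c
#include <poll.h>

enum {
	ACT_DQBUF_CAPTURE = 1 << 0, /* a decoded frame can be dequeued */
	ACT_DQBUF_OUTPUT  = 1 << 1, /* a bitstream buffer was consumed */
	ACT_DQEVENT       = 1 << 2, /* e.g. source change or EOS */
};

int decode_loop_actions(short revents)
{
	int act = 0;

	if (revents & POLLIN)
		act |= ACT_DQBUF_CAPTURE;
	if (revents & POLLOUT)
		act |= ACT_DQBUF_OUTPUT;
	if (revents & POLLPRI)
		act |= ACT_DQEVENT;
	return act;
}
```

Because each action is handled on its own, the client never has to wait for a CAPTURE buffer before refilling OUTPUT, which matches the "no direct relationship" rules below.
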
> > > +Source OUTPUT buffers must contain:
> > > +
> > > +-  H.264/AVC: one or more complete NALUs of an Annex B elementary
> > > +   stream; one buffer does not have to contain enough data to decode
> > > +   a frame;
> >
> > What if the hardware only supports handling complete frames?
>
> Pawel, could you help with this?

I had a discussion with Pawel about this very topic recently, since I
noticed that not only do some drivers require whole frames, some also
cannot accept some non-VCL NALUs unless a VCL NALU is also included in
the same buffer.

We thought that this could probably be solved by having a read-only
control that explains (using a bitmask of NALU types maybe) how the
encoded input should be split before being sent to the decoder.
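
To make the idea concrete: a client would need to know which NALU types a buffer contains to honour such a control. The control itself is only a proposal, so the mask layout here (bit n set for nal_unit_type n) and all names are assumptions; the parsing relies only on Annex B start codes and on nal_unit_type being the low 5 bits of the NAL header.

```c
#include <stddef.h>
#include <stdint.h>

/* Bitmask of nal_unit_type values present in an Annex B buffer. Searching
 * for 00 00 01 also finds 4-byte start codes. */
uint32_t annexb_nal_type_mask(const uint8_t *buf, size_t len)
{
	uint32_t mask = 0;

	for (size_t i = 0; i + 3 < len; i++) {
		if (buf[i] == 0 && buf[i + 1] == 0 && buf[i + 2] == 1) {
			mask |= UINT32_C(1) << (buf[i + 3] & 0x1f);
			i += 3; /* skip past the start code */
		}
	}
	return mask;
}

/* H.264 VCL NAL unit types are 1..5 (slices and slice data partitions). */
#define H264_VCL_MASK 0x3eu
```

A buffer holding only SPS/PPS (types 7 and 8) yields a mask with no VCL bits, which is exactly the case some drivers reject.
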

>
> >
> > > +-  VP8/VP9: one or more complete frames.
> > > +
> > > +No direct relationship between source and destination buffers and the
> > > +timing of buffers becoming available to dequeue should be assumed in the
> > > +Stream API. Specifically:
> > > +
> > > +-  a buffer queued to OUTPUT queue may result in no buffers being
> > > +   produced on the CAPTURE queue (e.g. if it does not contain
> > > +   encoded data, or if only metadata syntax structures are present
> > > +   in it), or one or more buffers produced on the CAPTURE queue (if
> > > +   the encoded data contained more than one frame, or if returning a
> > > +   decoded frame allowed the driver to return a frame that preceded
> > > +   it in decode, but succeeded it in display order)
> > > +
> > > +-  a buffer queued to OUTPUT may result in a buffer being produced on
> > > +   the CAPTURE queue later in the decode process, and/or after
> > > +   processing further OUTPUT buffers, or be returned out of order,
> > > +   e.g. if display reordering is used
> > > +
> > > +-  buffers may become available on the CAPTURE queue without additional
> > > +   buffers queued to OUTPUT (e.g. during flush or EOS)
> > > +
> > > +Seek
> > > +----
> > > +
> > > +Seek is controlled by the OUTPUT queue, as it is the source of bitstream
> > > +data. CAPTURE queue remains unchanged/unaffected.
> >
> > Does this mean that to achieve instantaneous seeks the driver has to
> > flush its CAPTURE queue internally when a seek is issued?
>
> That's a good point. I'd say that we might actually want the userspace
> to restart the capture queue in such case. Pawel, do you have any
> opinion on this?
>
> >
> > > +
> > > +1. Stop the OUTPUT queue to begin the seek sequence via
> > > +   :c:func:`VIDIOC_STREAMOFF`.
> > > +
> > > +   a. Required fields:
> > > +
> > > +      i. type = OUTPUT
> > > +
> > > +   b. The driver must drop all the pending OUTPUT buffers and they are
> > > +      treated as returned to the client (as per spec).
> >
> > What about pending CAPTURE buffers that the client may not yet have
> > dequeued?
>
> Just as written here: nothing happens to them, since the "CAPTURE
> queue remains unchanged/unaffected". :)
>
> >
> > > +
> > > +2. Restart the OUTPUT queue via :c:func:`VIDIOC_STREAMON`
> > > +
> > > +   a. Required fields:
> > > +
> > > +      i. type = OUTPUT
> > > +
> > > +   b. The driver must be put in a post-seek state and be ready to
> > > +      accept new source bitstream buffers.
> > > +
> > > +3. Start queuing buffers to OUTPUT queue containing stream data after
> > > +   the seek until a suitable resume point is found.
> > > +
> > > +   .. note::
> > > +
> > > +      There is no requirement to begin queuing stream
> > > +      starting exactly from a resume point (e.g. SPS or a keyframe).
> > > +      The driver must handle any data queued and must keep processing
> > > +      the queued buffers until it finds a suitable resume point.
> > > +      While looking for a resume point, the driver processes OUTPUT
> > > +      buffers and returns them to the client without producing any
> > > +      decoded frames.
> > > +
> > > +4. After a resume point is found, the driver will start returning
> > > +   CAPTURE buffers with decoded frames.
> > > +
> > > +   .. note::
> > > +
> > > +      There is no precise specification of when the CAPTURE queue
> > > +      will start producing buffers containing data decoded from
> > > +      buffers queued after the seek, as it operates independently
> > > +      from the OUTPUT queue.
> > > +
> > > +      -  The driver may return a number of remaining CAPTURE buffers
> > > +         containing frames decoded from before the seek, after the
> > > +         seek sequence (STREAMOFF-STREAMON) is performed.
> >
> > Oh, ok. That answers my last question above.
> >
> > > +      -  The driver is also allowed not to return decoded frames for
> > > +         buffers queued, but not yet decoded, before the seek sequence was initiated.
> > > +         E.g. for an OUTPUT queue sequence: QBUF(A), QBUF(B),
> > > +         STREAMOFF(OUT), STREAMON(OUT), QBUF(G), QBUF(H), any of the
> > > +         following results on the CAPTURE queue is allowed: {A’, B’, G’,
> > > +         H’}, {A’, G’, H’}, {G’, H’}.
> > > +
> > > +Pause
> > > +-----
> > > +
> > > +In order to pause, the client should just cease queuing buffers onto the
> > > +OUTPUT queue. This is different from the general V4L2 API definition of
> > > +pause, which involves calling :c:func:`VIDIOC_STREAMOFF` on the queue. Without
> > > +source bitstream data, there is no data to process and the hardware
> > > +remains idle. Conversely, using :c:func:`VIDIOC_STREAMOFF` on OUTPUT queue
> > > +indicates a seek, which 1) drops all buffers in flight and 2) after a
> >
> > "... 1) drops all OUTPUT buffers in flight ... " ?
>
> Yeah, although it's kind of inferred from the standard behavior of
> VIDIOC_STREAMOFF on given queue.
>
> >
> > > +subsequent :c:func:`VIDIOC_STREAMON` will look for and only continue from a
> > > +resume point. This is usually undesirable for pause. The
> > > +STREAMOFF-STREAMON sequence is intended for seeking.
> > > +
> > > +Similarly, the CAPTURE queue should remain streaming as well, as the
> > > +STREAMOFF-STREAMON sequence on it is intended solely for changing buffer
> > > +sets.
> > > +
> > > +Dynamic resolution change
> > > +-------------------------
> > > +
> > > +When the driver encounters a resolution change in the stream, the dynamic
> > > +resolution change sequence is started.
> >
> > Must all drivers support dynamic resolution change?
>
> I'd say no, but I guess that would mean that the driver never
> encounters it, because hardware wouldn't report it.
>
> I wonder what would happen in such a case, though. Obviously decoding of such
> stream couldn't continue without support in the driver.
>
> >
> > > +1.  On encountering a resolution change in the stream, the driver must
> > > +    first process and decode all remaining buffers from before the
> > > +    resolution change point.
> > > +
> > > +2.  After all buffers containing decoded frames from before the
> > > +    resolution change point are ready to be dequeued on the
> > > +    CAPTURE queue, the driver sends a ``V4L2_EVENT_SOURCE_CHANGE``
> > > +    event for source change type ``V4L2_EVENT_SRC_CH_RESOLUTION``.
> > > +    The last buffer from before the change must be marked by setting
> > > +    ``V4L2_BUF_FLAG_LAST`` in :c:type:`v4l2_buffer` ``flags``, as in the flush
> > > +    sequence.
> > > +
> > > +    .. note::
> > > +
> > > +       Any attempts to dequeue more buffers beyond the buffer marked
> > > +       with ``V4L2_BUF_FLAG_LAST`` will result in a -EPIPE error from
> > > +       :c:func:`VIDIOC_DQBUF`.
> > > +
> > > +3.  After dequeuing all remaining buffers from the CAPTURE queue, the
> > > +    client must call :c:func:`VIDIOC_STREAMOFF` on the CAPTURE queue. The
> > > +    OUTPUT queue remains streaming (calling STREAMOFF on it would
> > > +    trigger a seek).
> > > +    Until STREAMOFF is called on the CAPTURE queue (acknowledging
> > > +    the event), the driver operates as if the resolution hasn’t
> > > +    changed yet, i.e. :c:func:`VIDIOC_G_FMT`, etc. return previous
> > > +    resolution.
> >
> > What about the OUTPUT queue resolution, does it change as well?
>
> There shouldn't be resolution associated with OUTPUT queue, because
> pixel format is bitstream, not raw frame.
>
> >
> > > +4.  The client frees the buffers on the CAPTURE queue using
> > > +    :c:func:`VIDIOC_REQBUFS`.
> > > +
> > > +    a. Required fields:
> > > +
> > > +       i.   count = 0
> > > +
> > > +       ii.  type = CAPTURE
> > > +
> > > +       iii. memory = as per spec
> > > +
> > > +5.  The client calls :c:func:`VIDIOC_G_FMT` for CAPTURE to get the new format
> > > +    information.
> > > +    This is identical to calling :c:func:`VIDIOC_G_FMT` after
> > > +    ``V4L2_EVENT_SRC_CH_RESOLUTION`` in the initialization
> > > +    sequence and should be handled similarly.
> > > +
> > > +    .. note::
> > > +
> > > +       It is allowed for the driver not to support the same
> > > +       pixelformat as previously used (before the resolution change)
> > > +       for the new resolution. The driver must select a default
> > > +       supported pixelformat and return it from :c:func:`VIDIOC_G_FMT`, and
> > > +       the client must take note of it.
> > > +
> >
> > Can steps 4. and 5. be done in reverse order (i.e. first G_FMT and then
> > REQBUFS(0))?
> > If the client already has buffers allocated that are large enough to
> > contain decoded buffers in the new resolution, it might be preferable to
> > just keep them instead of reallocating.
>
> I think we had some thoughts on similar cases. Pawel, do you recall
> what was the problem?
>
> I agree though, that it would make sense to keep the buffers, if they
> are big enough.
>
> >
> > > +6.  (optional) The client is allowed to enumerate available formats and
> > > +    select a different one than currently chosen (returned via
> > > +    :c:func:`VIDIOC_G_FMT`). This is identical to a corresponding step in
> > > +    the initialization sequence.
> > > +
> > > +7.  (optional) The client acquires visible resolution as in
> > > +    initialization sequence.
> > > +
> > > +8.  (optional) The client acquires minimum number of buffers as in
> > > +    initialization sequence.
> > > +
> > > +9.  The client allocates a new set of buffers for the CAPTURE queue via
> > > +    :c:func:`VIDIOC_REQBUFS`. This is identical to a corresponding step in
> > > +    the initialization sequence.
> > > +
> > > +10. The client resumes decoding by issuing :c:func:`VIDIOC_STREAMON` on the
> > > +    CAPTURE queue.
> > > +
> > > +During the resolution change sequence, the OUTPUT queue must remain
> > > +streaming. Calling :c:func:`VIDIOC_STREAMOFF` on OUTPUT queue will initiate seek.
> > > +
> > > +The OUTPUT queue operates separately from the CAPTURE queue for the
> > > +duration of the entire resolution change sequence. It is allowed (and
> > > +recommended for best performance and simplicity) for the client to keep
> > > +queuing/dequeuing buffers to/from the OUTPUT queue even while processing
> > > +this sequence.
> > > +
> > > +.. note::
> > > +
> > > +   It is also possible for this sequence to be triggered without
> > > +   change in resolution if a different number of CAPTURE buffers is
> > > +   required in order to continue decoding the stream.
> > > +
> > > +Flush
> > > +-----
> > > +
> > > +Flush is the process of draining the CAPTURE queue of any remaining
> > > +buffers. After the flush sequence is complete, the client has received
> > > +all decoded frames for all OUTPUT buffers queued before the sequence was
> > > +started.
> > > +
> > > +1. Begin flush by issuing :c:func:`VIDIOC_DECODER_CMD`.
> > > +
> > > +   a. Required fields:
> > > +
> > > +      i. cmd = ``V4L2_DEC_CMD_STOP``
> > > +
> > > +2. The driver must process and decode as normal all OUTPUT buffers
> > > +   queued by the client before the :c:func:`VIDIOC_DECODER_CMD` was
> > > +   issued.
> > > +   Any operations triggered as a result of processing these
> > > +   buffers (including the initialization and resolution change
> > > +   sequences) must be processed as normal by both the driver and
> > > +   the client before proceeding with the flush sequence.
> > > +
> > > +3. Once all OUTPUT buffers queued before ``V4L2_DEC_CMD_STOP`` are
> > > +   processed:
> > > +
> > > +   a. If the CAPTURE queue is streaming, once all decoded frames (if
> > > +      any) are ready to be dequeued on the CAPTURE queue, the
> > > +      driver must send a ``V4L2_EVENT_EOS``. The driver must also
> > > +      set ``V4L2_BUF_FLAG_LAST`` in :c:type:`v4l2_buffer` ``flags`` field on the
> > > +      buffer on the CAPTURE queue containing the last frame (if
> > > +      any) produced as a result of processing the OUTPUT buffers
> > > +      queued before ``V4L2_DEC_CMD_STOP``. If no more frames are
> > > +      left to be returned at the point of handling
> > > +      ``V4L2_DEC_CMD_STOP``, the driver must return an empty buffer
> > > +      (with :c:type:`v4l2_buffer` ``bytesused`` = 0) as the last buffer with
> > > +      ``V4L2_BUF_FLAG_LAST`` set instead.
> > > +      Any attempts to dequeue more buffers beyond the buffer
> > > +      marked with ``V4L2_BUF_FLAG_LAST`` will result in a -EPIPE
> > > +      error from :c:func:`VIDIOC_DQBUF`.
> > > +
> > > +   b. If the CAPTURE queue is NOT streaming, no action is necessary for
> > > +      the CAPTURE queue and the driver must send a ``V4L2_EVENT_EOS``
> > > +      immediately after all OUTPUT buffers in question have been
> > > +      processed.
> > > +
> > > +4. To resume, the client may issue ``V4L2_DEC_CMD_START``.
> > > +
> > > +End of stream
> > > +-------------
> > > +
> > > +When the driver encounters an explicit end of stream in the bitstream,
> > > +it must send a ``V4L2_EVENT_EOS`` to the client after all frames are
> > > +decoded and ready to be dequeued on the CAPTURE queue, and must set
> > > +``V4L2_BUF_FLAG_LAST`` in the :c:type:`v4l2_buffer` ``flags`` field of the
> > > +last CAPTURE buffer. This behavior is identical to the flush sequence
> > > +triggered by the client via
> > > +``V4L2_DEC_CMD_STOP``.
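The flush and end-of-stream rules above can be modeled in a few lines. This is a minimal, self-contained sketch and does not call any real ioctl: `model_capture_buf`, `model_dqbuf()` and `model_drain()` are hypothetical names, the `last` field stands in for `V4L2_BUF_FLAG_LAST`, and `-EAGAIN` models a non-blocking queue with nothing ready yet.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a CAPTURE buffer; not part of the V4L2 API. */
struct model_capture_buf {
	size_t bytesused; /* 0 for the empty buffer returned when nothing is left */
	bool last;        /* stands in for V4L2_BUF_FLAG_LAST */
};

/* Hypothetical CAPTURE queue holding decoded buffers in dequeue order. */
struct model_capture_queue {
	const struct model_capture_buf *bufs;
	size_t count;
	size_t next;
	bool last_dequeued;
};

/* Mimics VIDIOC_DQBUF: once the LAST buffer was dequeued, fail with -EPIPE. */
static int model_dqbuf(struct model_capture_queue *q, struct model_capture_buf *out)
{
	if (q->last_dequeued)
		return -EPIPE;
	if (q->next >= q->count)
		return -EAGAIN; /* nothing ready yet (non-blocking model) */
	*out = q->bufs[q->next++];
	if (out->last)
		q->last_dequeued = true;
	return 0;
}

/* Client-side drain after V4L2_DEC_CMD_STOP: dequeue until the LAST buffer.
 * Returns the number of buffers carrying data, or a negative errno if the
 * queue ran dry before LAST was seen. */
static int model_drain(struct model_capture_queue *q)
{
	struct model_capture_buf b;
	int frames = 0;
	int ret;

	while ((ret = model_dqbuf(q, &b)) == 0) {
		if (b.bytesused > 0)
			frames++;
		if (b.last)
			return frames;
	}
	return ret;
}
```

A real client would first issue `VIDIOC_DECODER_CMD` with `V4L2_DEC_CMD_STOP` and then loop on `VIDIOC_DQBUF` until it sees `V4L2_BUF_FLAG_LAST`; an empty LAST buffer (`bytesused` = 0) is not counted as a frame.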
> > > +
> > > +Commit points
> > > +-------------
> > > +
> > > +Setting formats and allocating buffers triggers changes in the behavior
> > > +of the driver.
> > > +
> > > +1. Setting format on OUTPUT queue may change the set of formats
> > > +   supported/advertised on the CAPTURE queue. If the format currently
> > > +   selected on the CAPTURE queue is not supported by the newly selected
> > > +   OUTPUT format, the driver must also change it to a supported one.
> >
> > Ok. Is the same true about the contained colorimetry? What should happen
> > if the stream contains colorimetry information that differs from
> > S_FMT(OUT) colorimetry?
>
> As I explained close to the top, IMHO we shouldn't be setting
> colorimetry on OUTPUT queue.
>
> >
> > > +2. Enumerating formats on CAPTURE queue must only return CAPTURE formats
> > > +   supported for the OUTPUT format currently set.
> > > +
> > > +3. Setting/changing format on CAPTURE queue does not change formats
> > > +   available on OUTPUT queue. An attempt to set CAPTURE format that
> > > +   is not supported for the currently selected OUTPUT format must
> > > +   result in an error (-EINVAL) from :c:func:`VIDIOC_S_FMT`.
> >
> > Is this limited to the pixel format? Surely setting out of bounds
> > width/height or incorrect colorimetry should not result in EINVAL but
> > still be corrected by the driver?
>
> That doesn't sound right to me indeed. The driver should fix up
> S_FMT(CAPTURE), including pixel format or anything else. It must only
> not alter OUTPUT settings.
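To illustrate the fix-up behavior suggested above, here is a sketch of how a driver might correct a requested CAPTURE width/height instead of failing with -EINVAL. The limits (`MODEL_MIN_DIM`, `MODEL_MAX_W`, `MODEL_MAX_H`, 16-pixel alignment) and the function names are hypothetical; real constraints come from the hardware and from the OUTPUT format currently set.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical hardware limits for decoded frames. */
#define MODEL_MIN_DIM 16u
#define MODEL_MAX_W   4096u
#define MODEL_MAX_H   2304u
#define MODEL_ALIGN   16u

static uint32_t clamp_u32(uint32_t v, uint32_t lo, uint32_t hi)
{
	return v < lo ? lo : (v > hi ? hi : v);
}

/* Values above max are clamped to max; others are rounded up to
 * MODEL_ALIGN. Because all limits are multiples of MODEL_ALIGN, the
 * result is both aligned and within [MODEL_MIN_DIM, max]. */
static uint32_t fixup_dim(uint32_t v, uint32_t max)
{
	uint32_t aligned = (v > max) ? max
		: (v + MODEL_ALIGN - 1) / MODEL_ALIGN * MODEL_ALIGN;

	return clamp_u32(aligned, MODEL_MIN_DIM, max);
}

/* Adjusts the requested CAPTURE size in place, correcting rather than
 * rejecting; OUTPUT-side settings are never touched. */
static void fixup_capture_size(uint32_t *width, uint32_t *height)
{
	*width = fixup_dim(*width, MODEL_MAX_W);
	*height = fixup_dim(*height, MODEL_MAX_H);
}
```

Under these assumed limits, a request for 1921x1080 would come back as 1936x1088.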
>
> >
> > > +4. Enumerating formats on OUTPUT queue always returns a full set of
> > > +   supported formats, irrespective of the current format selected on
> > > +   CAPTURE queue.
> > > +
> > > +5. After allocating buffers on the OUTPUT queue, it is not possible to
> > > +   change format on it.
> >
> > So even after source change events the OUTPUT queue still keeps the
> > initial OUTPUT format?
>
> It would basically only have pixelformat (fourcc) assigned to it,
> since bitstream formats are not video frames, but just sequences of
> bytes. I don't think it makes sense to change e.g. from H264 to VP8
> during streaming.
>
> Best regards,
> Tomasz
Alexandre Courbot June 6, 2018, 1:13 p.m. UTC | #7
On Wed, Jun 6, 2018 at 6:04 PM Tomasz Figa <tfiga@chromium.org> wrote:
>
> Hi Dave,
>
> Thanks for review! Please see my replies inline.
>
> On Tue, Jun 5, 2018 at 10:10 PM Dave Stevenson
> <dave.stevenson@raspberrypi.org> wrote:
> >
> > Hi Tomasz.
> >
> > Thanks for formalising this.
> > I'm working on a stateful V4L2 codec driver on the Raspberry Pi and
> > was having to deduce various implementation details from other
> > drivers. I know how much we all tend to hate having to write
> > documentation, but it is useful to have.
>
> Agreed. Piles of other work showing up out of nowhere don't help either. :(
>
> A lot of credit goes to Pawel, who wrote down most of the details discussed
> earlier into a document that we used internally to implement Chrome OS
> video stack and drivers. He unfortunately got flooded with loads of
> other work and ran out of time to finalize it and produce something
> usable as kernel documentation (time was needed especially in the old
> DocBook xml days).
>
> >
> > On 5 June 2018 at 11:33, Tomasz Figa <tfiga@chromium.org> wrote:
> > > Due to complexity of the video decoding process, the V4L2 drivers of
> > > stateful decoder hardware require specific sequences of V4L2 API calls
> > > to be followed. These include capability enumeration, initialization,
> > > decoding, seek, pause, dynamic resolution change, flush and end of
> > > stream.
> > >
> > > Specifics of the above have been discussed during Media Workshops at
> > > LinuxCon Europe 2012 in Barcelona and then later Embedded Linux
> > > Conference Europe 2014 in Düsseldorf. The de facto Codec API that
> > > originated at those events was later implemented by the drivers we already
> > > have merged in mainline, such as s5p-mfc or mtk-vcodec.
> > >
> > > The only thing missing was the real specification included as a part of
> > > Linux Media documentation. Fix it now and document the decoder part of
> > > the Codec API.
> > >
> > > Signed-off-by: Tomasz Figa <tfiga@chromium.org>
> > > ---
> > >  Documentation/media/uapi/v4l/dev-codec.rst | 771 +++++++++++++++++++++
> > >  Documentation/media/uapi/v4l/v4l2.rst      |  14 +-
> > >  2 files changed, 784 insertions(+), 1 deletion(-)
> > >
> > > diff --git a/Documentation/media/uapi/v4l/dev-codec.rst b/Documentation/media/uapi/v4l/dev-codec.rst
> > > index c61e938bd8dc..0483b10c205e 100644
> > > --- a/Documentation/media/uapi/v4l/dev-codec.rst
> > > +++ b/Documentation/media/uapi/v4l/dev-codec.rst
> > > @@ -34,3 +34,774 @@ the codec and reprogram it whenever another file handler gets access.
> > >  This is different from the usual video node behavior where the video
> > >  properties are global to the device (i.e. changing something through one
> > >  file handle is visible through another file handle).
> >
> > I know this isn't part of the changes, but raises a question in
> > v4l2-compliance (so probably one for Hans).
> > testUnlimitedOpens tries opening the device 100 times. On a normal
> > device this isn't a significant overhead, but when you're allocating
> > resources on a per instance basis it quickly adds up.
> > Internally I have state that has a limit of 64 codec instances (either
> > encode or decode), so either I allocate at start_streaming and fail on
> > the 65th one, or I fail on open. I generally take the view that
> > failing early is a good thing.
> > Opinions? Is 100 instances of an M2M device really sensible?
>
> I don't think we can guarantee opening an arbitrary number of
> instances. To add to your point about resource usage, this is
> something that can be limited already on hardware or firmware level.
> Another aspect is that the hardware is often rated to decode N streams
> at resolution X by Y at Z fps, so it might not even make practical
> sense to use it to decode M > N streams.
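As a sketch of the "fail early" option discussed above, a driver could count open instances and refuse new ones once a hardware/firmware limit is reached. The limit of 64 follows Dave's example. The structure, the function names and the choice of -EBUSY are assumptions for illustration, and this is user-space C11, not kernel code (a driver would use its own atomic or locking primitives).

```c
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>

#define MODEL_MAX_INSTANCES 64 /* e.g. a firmware limit */

/* Hypothetical per-device state. */
struct model_dev {
	atomic_int instances;
};

/* Reserves an instance slot at open() time, so the failure is reported
 * immediately rather than later at start_streaming. */
static int model_open(struct model_dev *dev)
{
	int cur = atomic_load(&dev->instances);

	do {
		if (cur >= MODEL_MAX_INSTANCES)
			return -EBUSY;
		/* on failure, cur is reloaded with the current value */
	} while (!atomic_compare_exchange_weak(&dev->instances, &cur, cur + 1));
	return 0;
}

/* Releases the slot when the file handle is closed. */
static void model_release(struct model_dev *dev)
{
	atomic_fetch_sub(&dev->instances, 1);
}
```

With this approach, a compliance test that opens the node 100 times would see the 65th open fail, which is the trade-off Dave describes.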
>
> >
> > > +This interface is generally appropriate for hardware that does not
> > > +require additional software involvement to parse/partially decode/manage
> > > +the stream before/after processing in hardware.
> > > +
> > > +Input data to the Stream API are buffers containing unprocessed video
> > > +stream (Annex-B H264/H265 stream, raw VP8/9 stream) only. The driver is
> > > +expected not to require any additional information from the client to
> > > +process these buffers, and to return decoded frames on the CAPTURE queue
> > > +in display order.
> >
> > This intersects with the question I asked on the list back in April
> > but got no reply [1].
> > Is there a requirement or expectation for the encoded data to be
> > framed as a single encoded frame per buffer, or is feeding in full
> > buffer-sized chunks from an ES valid? It's not stated for the
> > description of V4L2_PIX_FMT_H264 etc either.
> > If not framed then anything assuming one-in one-out fails badly, but
> > it's likely to fail anyway if the stream has reference frames.
>
> I believe we agreed on the data to be framed. The details are
> explained in "Decoding" section, but I guess it could actually belong
> to the definition of each specific pixel format.
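On the framing question: splitting an Annex-B H.264 elementary stream into frames starts with locating start codes. The sketch below only finds NAL unit boundaries. It matches the 3-byte 00 00 01 prefix, which also covers the tail of the 4-byte 00 00 00 01 form. Grouping NAL units into complete frames would additionally require inspecting NAL unit types and slice headers, which is omitted. The function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Returns the offset just past the next 00 00 01 start code at or after
 * 'from' (i.e. the first byte of the NAL unit header), or len if there is
 * no further start code. */
static size_t next_nal(const uint8_t *buf, size_t len, size_t from)
{
	for (size_t i = from; i + 3 <= len; i++)
		if (buf[i] == 0 && buf[i + 1] == 0 && buf[i + 2] == 1)
			return i + 3;
	return len;
}

/* Counts NAL units that have at least one byte after their start code. */
static size_t count_nals(const uint8_t *buf, size_t len)
{
	size_t n = 0;

	for (size_t p = next_nal(buf, len, 0); p < len; p = next_nal(buf, len, p))
		n++;
	return n;
}
```

For example, a buffer holding an SPS (header byte 0x67), a PPS (0x68) and an IDR slice (0x65), each behind a start code, yields three NAL units. All three would typically go into one framed OUTPUT buffer together with the picture they describe.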
>
> >
> > This description is also exclusive to video decode, whereas the top
> > section states "A V4L2 codec can compress, decompress, transform, or
> > otherwise convert video data". Should it be in the decoder section
> > below?
>
> Yeah, looks like it should be moved indeed.
>
> >
> > Have I missed a statement of what the Stream API is and how it differs
> > from any other API?
>
> This is a leftover that I should have removed, since this document
> continues to call this interface "Codec Interface".
>
> The other API is the "Stateless Codec Interface" mentioned below. As
> opposed to the regular (stateful) Codec Interface, it would target
> hardware that does not store any decoding state for its own use, but
> rather expects the software to provide the necessary data for each chunk
> of framed bitstream, such as headers parsed into predefined structures
> (as per codec standard) or reference frame lists. With stateless API,
> userspace would have to explicitly manage which buffers are used as
> reference frames, reordering to display order and so on. It's a WiP
> and is partially blocked by Request API, since it needs extra data to
> be given in a per-buffer manner.
>
> >
> > [1] https://www.spinics.net/lists/linux-media/msg133102.html
> >
> > > +Performing software parsing, processing etc. of the stream in the driver
> > > +in order to support stream API is strongly discouraged. In such a case,
> > > +use of the Stateless Codec Interface (in development) is preferred.
> > > +
> > > +Conventions and notation used in this document
> > > +==============================================
> > > +
> > > +1. The general V4L2 API rules apply if not specified in this document
> > > +   otherwise.
> > > +
> > > +2. The meaning of words “must”, “may”, “should”, etc. is as per RFC
> > > +   2119.
> > > +
> > > +3. All steps not marked “optional” are required.
> > > +
> > > +4. :c:func:`VIDIOC_G_EXT_CTRLS`, :c:func:`VIDIOC_S_EXT_CTRLS` may be used interchangeably with
> > > +   :c:func:`VIDIOC_G_CTRL`, :c:func:`VIDIOC_S_CTRL`, unless specified otherwise.
> > > +
> > > +5. Single-plane API (see spec) and applicable structures may be used
> > > +   interchangeably with Multi-plane API, unless specified otherwise.
> > > +
> > > +6. i = [a..b]: sequence of integers from a to b, inclusive, i.e. i =
> > > +   [0..2]: i = 0, 1, 2.
> > > +
> > > +7. For OUTPUT buffer A, A’ represents a buffer on the CAPTURE queue
> > > +   containing data (decoded or encoded frame/stream) that resulted
> > > +   from processing buffer A.
> > > +
> > > +Glossary
> > > +========
> > > +
> > > +CAPTURE
> > > +   the destination buffer queue, decoded frames for
> > > +   decoders, encoded bitstream for encoders;
> > > +   ``V4L2_BUF_TYPE_VIDEO_CAPTURE`` or
> > > +   ``V4L2_BUF_TYPE_VIDEO_CAPTURE_MPLANE``
> > > +
> > > +client
> > > +   application client communicating with the driver
> > > +   implementing this API
> > > +
> > > +coded format
> > > +   encoded/compressed video bitstream format (e.g.
> > > +   H.264, VP8, etc.); see raw format; this is not equivalent to fourcc
> > > +   (V4L2 pixelformat), as each coded format may be supported by multiple
> > > +   fourccs (e.g. ``V4L2_PIX_FMT_H264``, ``V4L2_PIX_FMT_H264_SLICE``, etc.)
> > > +
> > > +coded height
> > > +   height for given coded resolution
> > > +
> > > +coded resolution
> > > +   stream resolution in pixels aligned to codec
> > > +   format and hardware requirements; see also visible resolution
> > > +
> > > +coded width
> > > +   width for given coded resolution
> > > +
> > > +decode order
> > > +   the order in which frames are decoded; may differ
> > > +   from display (output) order if frame reordering (B frames) is active in
> > > +   the stream; OUTPUT buffers must be queued in decode order; for frame
> > > +   API, CAPTURE buffers must be returned by the driver in decode order;
> > > +
> > > +display order
> > > +   the order in which frames must be displayed
> > > +   (outputted); for stream API, CAPTURE buffers must be returned by the
> > > +   driver in display order;
> > > +
> > > +EOS
> > > +   end of stream
> > > +
> > > +input height
> > > +   height in pixels for given input resolution
> > > +
> > > +input resolution
> > > +   resolution in pixels of source frames being input
> > > +   to the encoder and subject to further cropping to the bounds of visible
> > > +   resolution
> > > +
> > > +input width
> > > +   width in pixels for given input resolution
> > > +
> > > +OUTPUT
> > > +   the source buffer queue, encoded bitstream for
> > > +   decoders, raw frames for encoders; ``V4L2_BUF_TYPE_VIDEO_OUTPUT`` or
> > > +   ``V4L2_BUF_TYPE_VIDEO_OUTPUT_MPLANE``
> > > +
> > > +raw format
> > > +   uncompressed format containing raw pixel data (e.g.
> > > +   YUV, RGB formats)
> > > +
> > > +resume point
> > > +   a point in the bitstream from which decoding may
> > > +   start/continue, without any previous state/data present, e.g.: a
> > > +   keyframe (VPX) or SPS/PPS/IDR sequence (H.264); a resume point is
> > > +   required to start decode of a new stream, or to resume decoding after a
> > > +   seek;
> > > +
> > > +source buffer
> > > +   buffers allocated for source queue
> > > +
> > > +source queue
> > > +   queue containing buffers used for source data, i.e.
> > > +
> > > +visible height
> > > +   height for given visible resolution
> > > +
> > > +visible resolution
> > > +   stream resolution of the visible picture, in
> > > +   pixels, to be used for display purposes; must be smaller than or equal to
> > > +   coded resolution;
> > > +
> > > +visible width
> > > +   width for given visible resolution
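To make the coded/visible distinction in the glossary concrete: with 16x16 macroblocks (as in H.264), a 1920x1080 visible picture has a 1920x1088 coded resolution. This is a minimal sketch that assumes the alignment is purely the macroblock size. Real drivers may impose additional hardware alignment, and the names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumption: 16x16 macroblocks, no extra hardware alignment. */
#define MODEL_MB_SIZE 16u

static uint32_t align_up(uint32_t v, uint32_t a)
{
	return (v + a - 1) / a * a;
}

/* Hypothetical rectangle, analogous to the visible (crop) rectangle. */
struct model_rect {
	uint32_t left, top, width, height;
};

/* Derives the coded resolution from the visible one. */
static void coded_from_visible(uint32_t vis_w, uint32_t vis_h,
			       uint32_t *coded_w, uint32_t *coded_h)
{
	*coded_w = align_up(vis_w, MODEL_MB_SIZE);
	*coded_h = align_up(vis_h, MODEL_MB_SIZE);
}

/* The visible rectangle must fit within the coded resolution. */
static bool visible_fits(const struct model_rect *r, uint32_t coded_w, uint32_t coded_h)
{
	return r->left + r->width <= coded_w && r->top + r->height <= coded_h;
}
```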
> > > +
> > > +Decoder
> > > +=======
> > > +
> > > +Querying capabilities
> > > +---------------------
> > > +
> > > +1. To enumerate the set of coded formats supported by the driver, the
> > > +   client uses :c:func:`VIDIOC_ENUM_FMT` for OUTPUT. The driver must always
> > > +   return the full set of supported formats, irrespective of the
> > > +   format set on the CAPTURE queue.
> > > +
> > > +2. To enumerate the set of supported raw formats, the client uses
> > > +   :c:func:`VIDIOC_ENUM_FMT` for CAPTURE. The driver must return only the
> > > +   formats supported for the format currently set on the OUTPUT
> > > +   queue.
> > > +   In order to enumerate raw formats supported by a given coded
> > > +   format, the client must first set that coded format on the
> > > +   OUTPUT queue and then enumerate the CAPTURE queue.
> > > +
> > > +3. The client may use :c:func:`VIDIOC_ENUM_FRAMESIZES` to detect supported
> > > +   resolutions for a given format, passing its fourcc in
> > > +   :c:type:`v4l2_frmsizeenum` ``pixel_format``.
> > > +
> > > +   a. Values returned from :c:func:`VIDIOC_ENUM_FRAMESIZES` for coded formats
> > > +      must be maximums for given coded format for all supported raw
> > > +      formats.
> > > +
> > > +   b. Values returned from :c:func:`VIDIOC_ENUM_FRAMESIZES` for raw formats must
> > > +      be maximums for given raw format for all supported coded
> > > +      formats.
> >
> > So in both these cases you expect index=0 to return a response with
> > the type V4L2_FRMSIZE_TYPE_DISCRETE, and the maximum resolution?
> > -EINVAL on any other index value?
> > And I assume you mean maximum coded resolution, not visible resolution.
> > Or is V4L2_FRMSIZE_TYPE_STEPWISE more appropriate? In which case the
> > minimum is presumably a single macroblock, max is the max coded
> > resolution, and step size is the macroblock size, at least on the
> > CAPTURE side.
>
> Coded size seems to make the most sense here, since that's what
> corresponds to the amount of data the decoder needs to process. Let's
> have it stated more explicitly.
>
> My understanding is that VIDIOC_ENUM_FRAMESIZES maintains its regular
> semantics here and which type of range is used would depend on the
> hardware capabilities. This actually matches what we have
> implemented in the Chromium video stack [1]. Let's state it more
> explicitly as well.
>
> [1] https://cs.chromium.org/chromium/src/media/gpu/v4l2/v4l2_device.cc?q=VIDIOC_ENUM_FRAMESIZES&sq=package:chromium&g=0&l=279
>
> >
> > > +   c. The client should derive the supported resolution for a
> > > +      combination of coded+raw format by calculating the
> > > +      intersection of resolutions returned from calls to
> > > +      :c:func:`VIDIOC_ENUM_FRAMESIZES` for the given coded and raw formats.
> > > +
> > > +4. Supported profiles and levels for given format, if applicable, may be
> > > +   queried using their respective controls via :c:func:`VIDIOC_QUERYCTRL`.
> > > +
> > > +5. The client may use :c:func:`VIDIOC_ENUM_FRAMEINTERVALS` to enumerate maximum
> > > +   supported framerates by the driver/hardware for a given
> > > +   format+resolution combination.
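Step 3c can be expressed as a membership test: a size is usable for a coded+raw pair only if it lies in the range reported for each format. This sketch assumes both formats report `V4L2_FRMSIZE_TYPE_STEPWISE` ranges (a discrete size can be modeled as min == max). `struct model_stepwise` mirrors the field names of `struct v4l2_frmsize_stepwise` but is a local, hypothetical type.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Local mirror of the stepwise frame size fields. */
struct model_stepwise {
	uint32_t min_width, max_width, step_width;
	uint32_t min_height, max_height, step_height;
};

/* Stepwise values are min + k * step, up to max. */
static bool in_range(uint32_t v, uint32_t min, uint32_t max, uint32_t step)
{
	if (v < min || v > max)
		return false;
	return step == 0 || (v - min) % step == 0;
}

static bool size_supported(const struct model_stepwise *s, uint32_t w, uint32_t h)
{
	return in_range(w, s->min_width, s->max_width, s->step_width) &&
	       in_range(h, s->min_height, s->max_height, s->step_height);
}

/* Intersection: both the coded and the raw format must accept the size. */
static bool size_supported_for_pair(const struct model_stepwise *coded,
				    const struct model_stepwise *raw,
				    uint32_t w, uint32_t h)
{
	return size_supported(coded, w, h) && size_supported(raw, w, h);
}
```

For instance, with a made-up coded range of 16..4096 x 16..2304 and a raw range of 16..1920 x 16..1088 (both in 16-pixel steps), 1920x1088 is accepted, while 3840x2160 and the non-aligned 1920x1080 are not.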
> > > +
> > > +Initialization sequence
> > > +-----------------------
> > > +
> > > +1. (optional) Enumerate supported OUTPUT formats and resolutions. See
> > > +   capability enumeration.
> > > +
> > > +2. Set a coded format on the source queue via :c:func:`VIDIOC_S_FMT`.
> > > +
> > > +   a. Required fields:
> > > +
> > > +      i.   type = OUTPUT
> > > +
> > > +      ii.  fmt.pix_mp.pixelformat set to a coded format
> > > +
> > > +      iii. fmt.pix_mp.width, fmt.pix_mp.height only if they cannot be
> > > +           parsed from the stream for the given coded format;
> > > +           ignored otherwise;
> > > +
> > > +   b. Return values:
> > > +
> > > +      i.  EINVAL: unsupported format.
> > > +
> > > +      ii. Others: per spec
> > > +
> > > +   .. note::
> > > +
> > > +      The driver must not adjust pixelformat, so if
> > > +      ``V4L2_PIX_FMT_H264`` is passed but only
> > > +      ``V4L2_PIX_FMT_H264_SLICE`` is supported, S_FMT will return
> > > +      -EINVAL. If both are acceptable by client, calling S_FMT for
> > > +      the other after one gets rejected may be required (or use
> > > +      :c:func:`VIDIOC_ENUM_FMT` to discover beforehand, see Capability
> > > +      enumeration).
> >
> > I can't find V4L2_PIX_FMT_H264_SLICE in mainline. From trying to build
> > Chromium I believe it's a Rockchip special. Is it being upstreamed?
>
> This is a part of the stateless Codec Interface being in development.
> We used to call it "Slice API" internally and so the name. It is not
> specific to Rockchip, but rather the whole class of stateless codecs,
> as I explained in my reply to another of your comments.
>
> Any mention of it should be removed from the document for now.
>
> > Or use V4L2_PIX_FMT_H264 vs V4L2_PIX_FMT_H264_NO_SC as the example?
> > (I've just noticed I missed an instance of this further up as well).
>
> Yeah, sounds like it would be a better example.
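The fallback described in the quoted note works like this: the driver never adjusts pixelformat, so the client retries S_FMT with its next acceptable coded format. It can be sketched with the H264 / H264_NO_SC pair suggested above (fourccs 'H264' and 'AVC1'). `model_s_fmt_output()` is a hypothetical stand-in for `VIDIOC_S_FMT` on the OUTPUT queue, and `MODEL_FOURCC` packs characters the same way as `v4l2_fourcc()`.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Same byte packing as v4l2_fourcc(). */
#define MODEL_FOURCC(a, b, c, d) \
	((uint32_t)(a) | ((uint32_t)(b) << 8) | ((uint32_t)(c) << 16) | ((uint32_t)(d) << 24))

/* Hypothetical S_FMT(OUTPUT): never adjusts pixelformat, rejects unknown ones. */
static int model_s_fmt_output(const uint32_t *supported, size_t n, uint32_t fourcc)
{
	for (size_t i = 0; i < n; i++)
		if (supported[i] == fourcc)
			return 0;
	return -EINVAL;
}

/* Client side: try each acceptable coded format in order of preference. */
static int pick_coded_format(const uint32_t *supported, size_t n_supported,
			     const uint32_t *wanted, size_t n_wanted,
			     uint32_t *chosen)
{
	for (size_t i = 0; i < n_wanted; i++) {
		if (model_s_fmt_output(supported, n_supported, wanted[i]) == 0) {
			*chosen = wanted[i];
			return 0;
		}
	}
	return -EINVAL;
}
```

Enumerating OUTPUT formats with `VIDIOC_ENUM_FMT` first would let the client skip the failing S_FMT calls entirely.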
>
> >
> > > +3.  (optional) Get minimum number of buffers required for OUTPUT queue
> > > +    via :c:func:`VIDIOC_G_CTRL`. This is useful if client intends to use
> > > +    more buffers than minimum required by hardware/format (see
> > > +    allocation).
> > > +
> > > +    a. Required fields:
> > > +
> > > +       i. id = ``V4L2_CID_MIN_BUFFERS_FOR_OUTPUT``
> > > +
> > > +    b. Return values: per spec.
> > > +
> > > +    c. Return fields:
> > > +
> > > +       i. value: required number of OUTPUT buffers for the currently set
> > > +          format;
> > > +
> > > +4.  Allocate source (bitstream) buffers via :c:func:`VIDIOC_REQBUFS` on OUTPUT
> > > +    queue.
> > > +
> > > +    a. Required fields:
> > > +
> > > +       i.   count = n, where n > 0.
> > > +
> > > +       ii.  type = OUTPUT
> > > +
> > > +       iii. memory = as per spec
> > > +
> > > +    b. Return values: Per spec.
> > > +
> > > +    c. Return fields:
> > > +
> > > +       i. count: adjusted to allocated number of buffers
> > > +
> > > +    d. The driver must adjust count to the larger of the minimum number
> > > +       of source buffers required for the given format and the count
> > > +       passed. The client must check this value after the ioctl returns
> > > +       to get the number of buffers allocated.
> > > +
> > > +    .. note::
> > > +
> > > +       Passing count = 1 is useful for letting the driver choose
> > > +       the minimum according to the selected format/hardware
> > > +       requirements.
> > > +
> > > +    .. note::
> > > +
> > > +       To allocate more than minimum number of buffers (for pipeline
> > > +       depth), use G_CTRL(``V4L2_CID_MIN_BUFFERS_FOR_OUTPUT``) to
> > > +       get minimum number of buffers required by the driver/format,
> > > +       and pass the obtained value plus the number of additional
> > > +       buffers needed in count to :c:func:`VIDIOC_REQBUFS`.
> > > +
> > > +5.  Begin parsing the stream for stream metadata via :c:func:`VIDIOC_STREAMON` on
> > > +    OUTPUT queue. This step allows the driver to parse/decode
> > > +    initial stream metadata until enough information to allocate
> > > +    CAPTURE buffers is found. This is indicated by the driver by
> > > +    sending a ``V4L2_EVENT_SOURCE_CHANGE`` event, which the client
> > > +    must handle.
> > > +
> > > +    a. Required fields: as per spec.
> > > +
> > > +    b. Return values: as per spec.
> > > +
> > > +    .. note::
> > > +
> > > +       Calling :c:func:`VIDIOC_REQBUFS`, :c:func:`VIDIOC_STREAMON`
> > > +       or :c:func:`VIDIOC_G_FMT` on the CAPTURE queue at this time is not
> > > +       allowed and must return EINVAL.
> >
> > I think you've just broken FFMpeg and Gstreamer with that statement.
> >
> > Gstreamer certainly doesn't subscribe to V4L2_EVENT_SOURCE_CHANGE but
> > has already parsed the stream and set the output format to the correct
> > resolution via S_FMT. IIRC it expects the driver to copy that across
> > from output to capture which was an interesting niggle to find.
> > FFMpeg does subscribe to V4L2_EVENT_SOURCE_CHANGE, although it seems
> > to currently have a bug around coded resolution != visible resolution
> > when it gets the event.
> >
> > One has to assume that these have been working quite happily against
> > various hardware platforms, so it seems a little unfair to just break
> > them.
>
> That's certainly not what existing drivers do and the examples would be:
>
> - s5p-mfc (the first codec driver in upstream):
>     It just ignores width/height set on the OUTPUT queue
>       https://elixir.bootlin.com/linux/latest/source/drivers/media/platform/s5p-mfc/s5p_mfc_dec.c#L443
>     and reports what the hardware parses from bitstream on CAPTURE:
>       https://elixir.bootlin.com/linux/latest/source/drivers/media/platform/s5p-mfc/s5p_mfc_dec.c#L352
>
> - mtk-vcodec (merged quite recently):
>     It indeed accepts whatever is set on OUTPUT as some kind of defaults,
>       https://elixir.bootlin.com/linux/latest/source/drivers/media/platform/mtk-vcodec/mtk_vcodec_dec.c#L856
>     but those are overridden as soon as the headers are parsed
>       https://elixir.bootlin.com/linux/latest/source/drivers/media/platform/mtk-vcodec/mtk_vcodec_dec.c#L989
>
> However, the above probably doesn't prevent Gstreamer from working,
> because both drivers would allow REQBUFS(CAPTURE) before the parsing
> is done and luckily the resolution would match later after parsing.
>
> > So I guess my question is what is the reasoning for rejecting these
> > calls? If you know the resolution ahead of time, allocate buffers, and
> > start CAPTURE streaming before the event then should you be wrong
> > you're just going through the dynamic resolution change path described
> > later. If you're correct then you've saved some setup time. It also
> > avoids having to have a special startup case in the driver.
>
> We might need Pawel or Hans to comment on this, as I believe it has
> been decided to be like this in earlier Media Workshops.

I also don't see any hard reason to not let user-space configure the
CAPTURE queue itself if it has parsed the stream and decided to go
that way. I think of it also as a guaranteed to work, fallback
solution for devices that may not support the source change event - do
we know for sure that *all* stateful devices support this?

Supporting both flows would complicate the initialization protocol
quite a bit. There is a rather large and complex state machine that
all drivers need to maintain here. Maybe we could come up with a "codec
framework" that would take care of this, with specific callbacks to be
implemented by drivers à la M2M?
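A tiny version of such a state machine, limited to the rule in the quoted note of step 5. Under that rule, CAPTURE-side REQBUFS/STREAMON/G_FMT are rejected with -EINVAL until the driver has parsed enough of the stream to emit `V4L2_EVENT_SOURCE_CHANGE`. The state, event and function names are hypothetical, and mid-stream resolution changes are out of scope. Allowing the client to configure CAPTURE early, as discussed above, would amount to relaxing `capture_call_allowed()`.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical decoder states during initialization. */
enum model_dec_state {
	MODEL_IDLE,            /* OUTPUT not streaming yet */
	MODEL_PARSING_HEADERS, /* OUTPUT streaming, stream info not known */
	MODEL_CAPTURE_SETUP,   /* SOURCE_CHANGE sent, CAPTURE may be configured */
	MODEL_DECODING,        /* CAPTURE streaming */
};

enum model_event {
	MODEL_EV_OUTPUT_STREAMON,
	MODEL_EV_SOURCE_CHANGE,
	MODEL_EV_CAPTURE_STREAMON,
};

/* CAPTURE REQBUFS/STREAMON/G_FMT: 0 if allowed, -EINVAL otherwise. */
static int capture_call_allowed(enum model_dec_state s)
{
	return (s == MODEL_CAPTURE_SETUP || s == MODEL_DECODING) ? 0 : -EINVAL;
}

/* Returns the next state; events that do not apply leave the state as-is. */
static enum model_dec_state model_next(enum model_dec_state s, enum model_event ev)
{
	switch (ev) {
	case MODEL_EV_OUTPUT_STREAMON:
		return s == MODEL_IDLE ? MODEL_PARSING_HEADERS : s;
	case MODEL_EV_SOURCE_CHANGE:
		return s == MODEL_PARSING_HEADERS ? MODEL_CAPTURE_SETUP : s;
	case MODEL_EV_CAPTURE_STREAMON:
		return s == MODEL_CAPTURE_SETUP ? MODEL_DECODING : s;
	}
	return s;
}
```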

>
> I personally don't see what would go wrong if we allow that and handle
> a fallback using the dynamic resolution change flow. Maybe except the
> need to rework the s5p-mfc driver.
>
> >
> > > +6.  This step only applies for coded formats that contain resolution
> > > +    information in the stream.
> > > +    Continue queuing/dequeuing bitstream buffers to/from the
> > > +    OUTPUT queue via :c:func:`VIDIOC_QBUF` and :c:func:`VIDIOC_DQBUF`. The driver
> > > +    must keep processing and returning each buffer to the client
> > > +    until required metadata to send a ``V4L2_EVENT_SOURCE_CHANGE``
> > > +    for source change type ``V4L2_EVENT_SRC_CH_RESOLUTION`` is
> > > +    found. There is no requirement to pass enough data for this to
> > > +    occur in the first buffer and the driver must be able to
> > > +    process any number
> >
> > So back to my earlier question, we're supporting tiny fragments of
> > frames here? Or is the thought that you can pick up anywhere in a
> > stream and the decoder will wait for the required resume point?
>
> I think this is precisely about the hardware/driver discarding
> bitstream frames until a frame containing resolution data is found. So
> that would be the latter, I believe.
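The behavior described in step 6 and its item c can be sketched as a scan over queued OUTPUT buffers. Buffers before the stream information are returned without producing frames. The buffer that triggers `V4L2_EVENT_SOURCE_CHANGE` is retained because it is needed to decode the first frame. The `has_stream_info` flag stands in for whatever header parsing the hardware does, and all names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical view of a queued OUTPUT (bitstream) buffer. */
struct model_bitstream_buf {
	bool has_stream_info; /* e.g. SPS/PPS for H.264, or a VP8 keyframe */
};

/* Scans buffers in queue order. Buffers before the first one carrying
 * stream information are returned to the client undecoded (counted in
 * *returned); the triggering buffer is kept for decoding. Returns its
 * index, or -1 if none of the buffers allows sending SOURCE_CHANGE yet. */
static long find_source_change(const struct model_bitstream_buf *bufs, size_t n,
			       size_t *returned)
{
	*returned = 0;
	for (size_t i = 0; i < n; i++) {
		if (bufs[i].has_stream_info)
			return (long)i;
		(*returned)++;
	}
	return -1;
}
```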
>
> >
> > > +    a. Required fields: as per spec.
> > > +
> > > +    b. Return values: as per spec.
> > > +
> > > +    c. If data in a buffer that triggers the event is required to decode
> > > +       the first frame, the driver must not return it to the client,
> > > +       but must retain it for further decoding.
> > > +
> > > +    d. Until the resolution source event is sent to the client, calling
> > > +       :c:func:`VIDIOC_G_FMT` on the CAPTURE queue must return -EINVAL.
> > > +
> > > +    .. note::
> > > +
> > > +       No decoded frames are produced during this phase.
> > > +
> > > +7.  This step only applies for coded formats that contain resolution
> > > +    information in the stream.
> > > +    Receive and handle ``V4L2_EVENT_SOURCE_CHANGE`` from the driver
> > > +    via :c:func:`VIDIOC_DQEVENT`. The driver must send this event once
> > > +    enough data is obtained from the stream to allocate CAPTURE
> > > +    buffers and to begin producing decoded frames.
> > > +
> > > +    a. Required fields:
> > > +
> > > +       i. type = ``V4L2_EVENT_SOURCE_CHANGE``
> > > +
> > > +    b. Return values: as per spec.
> > > +
> > > +    c. The driver must return u.src_change.changes =
> > > +       ``V4L2_EVENT_SRC_CH_RESOLUTION``.
> > > +
> > > +8.  This step only applies for coded formats that contain resolution
> > > +    information in the stream.
> > > +    Call :c:func:`VIDIOC_G_FMT` for CAPTURE queue to get format for the
> > > +    destination buffers parsed/decoded from the bitstream.
> > > +
> > > +    a. Required fields:
> > > +
> > > +       i. type = CAPTURE
> > > +
> > > +    b. Return values: as per spec.
> > > +
> > > +    c. Return fields:
> > > +
> > > +       i.   fmt.pix_mp.width, fmt.pix_mp.height: coded resolution
> > > +            for the decoded frames
> > > +
> > > +       ii.  fmt.pix_mp.pixelformat: default/required/preferred by
> > > +            driver pixelformat for decoded frames.
> > > +
> > > +       iii. num_planes: set to number of planes for pixelformat.
> > > +
> > > +       iv.  For each plane p = [0..num_planes-1]:
> > > +            plane_fmt[p].sizeimage, plane_fmt[p].bytesperline as
> > > +            per spec for coded resolution.
> > > +
> > > +    .. note::
> > > +
> > > +       Te value of pixelformat may be any pixel format supported,
> >
> > s/Te/The
>
> Ack.
>
> >
> > > +       and must
> > > +       be supported for current stream, based on the information
> > > +       parsed from the stream and hardware capabilities. It is
> > > +       suggested that the driver choose the preferred/optimal format
> > > +       for the given configuration. For example, a YUV format may be
> > > +       preferred over an RGB format, if additional conversion step
> > > +       would be required.
> > > +
> > > +9.  (optional) Enumerate CAPTURE formats via :c:func:`VIDIOC_ENUM_FMT` on
> > > +    CAPTURE queue.
> > > +    Once the stream information is parsed and known, the client
> > > +    may use this ioctl to discover which raw formats are supported
> > > +    for the given stream and select one of them via :c:func:`VIDIOC_S_FMT`.
> > > +
> > > +    a. Fields/return values as per spec.
> > > +
> > > +    .. note::
> > > +
> > > +       The driver must return only formats supported for the
> > > +       current stream parsed in this initialization sequence, even
> > > +       if more formats may be supported by the driver in general.
> > > +       For example, a driver/hardware may support YUV and RGB
> > > +       formats for resolutions 1920x1088 and lower, but only YUV for
> > > +       higher resolutions (e.g. due to memory bandwidth
> > > +       limitations). After parsing a resolution of 1920x1088 or
> > > +       lower, :c:func:`VIDIOC_ENUM_FMT` may return a set of YUV and RGB
> > > +       pixelformats, but after parsing resolution higher than
> > > +       1920x1088, the driver must not return (unsupported for this
> > > +       resolution) RGB.
> >
> > There are some funny cases here then.
> > Whilst memory bandwidth may limit the resolution that can be decoded
> > in real-time, for a transcode use case you haven't got a real-time
> > requirement. Enforcing this means you can never transcode that
> > resolution to RGB.
>
> I think the above is not about performance, but the general hardware
> ability to decode into such format. The bandwidth might be just not
> enough to even process one frame leading to some bus timeouts for
> example. The history of hardware design knows a lot of funny cases. :)
>
> > Actually I can't see any information related to frame rates being
> > passed in other than timestamps, therefore the driver hasn't got
> > sufficient information to make a sensible call based on memory
> > bandwidth.
>
> Again, I believe this is not about frame rate, but rather one-shot
> bandwidth needed to fetch 1 frame data without breaking things.
>
> > Perhaps it's just that the example of memory bandwidth being the
> > limitation is a bad one.
>
> Yeah, it might just be a not very good example. It could as well be
> just a fixed size static memory inside the codec hardware, which would
> obviously be capable of holding fewer pixels for 32-bit RGBx than
> 12-bit (in average) YUV420.
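The fixed-size on-chip memory example can be checked with simple arithmetic: NV12 (YUV 4:2:0) needs 1.5 bytes per pixel on average and RGBx needs 4. The 16 MiB budget below is a made-up figure, chosen only to reproduce the behavior of the quoted note (RGB allowed at 1920x1088, only YUV above that). The function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Made-up on-chip frame memory size; real limits are hardware specific. */
#define MODEL_FRAME_BUDGET (16u * 1024u * 1024u)

/* NV12: full-resolution Y plane plus interleaved CbCr at quarter
 * resolution, i.e. 12 bits (1.5 bytes) per pixel on average. */
static uint64_t nv12_bytes(uint32_t w, uint32_t h)
{
	return (uint64_t)w * h * 3 / 2;
}

/* RGBx: 32 bits per pixel. */
static uint64_t rgbx_bytes(uint32_t w, uint32_t h)
{
	return (uint64_t)w * h * 4;
}

static bool fits(uint64_t bytes)
{
	return bytes <= MODEL_FRAME_BUDGET;
}
```

For 1920x1088 this gives 3,133,440 bytes (NV12) and 8,355,840 bytes (RGBx), both within the budget. At 3840x2160, RGBx needs 33,177,600 bytes and no longer fits, while NV12 (12,441,600 bytes) still does. That is exactly the "YUV only above 1920x1088" case, with no frame-rate consideration involved.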
>
> >
> > > +       However, subsequent resolution change event
> > > +       triggered after discovering a resolution change within the
> > > +       same stream may switch the stream into a lower resolution;
> > > +       :c:func:`VIDIOC_ENUM_FMT` must return RGB formats again in that case.
> > > +
> > > +10.  (optional) Choose a different CAPTURE format than suggested via
> > > +     :c:func:`VIDIOC_S_FMT` on CAPTURE queue. It is possible for the client
> > > +     to choose a different format than selected/suggested by the
> > > +     driver in :c:func:`VIDIOC_G_FMT`.
> > > +
> > > +     a. Required fields:
> > > +
> > > +        i.  type = CAPTURE
> > > +
> > > +        ii. fmt.pix_mp.pixelformat set to a raw (decoded) format
> > > +
> > > +     b. Return values:
> > > +
> > > +        i. EINVAL: unsupported format.
> > > +
> > > +     c. Calling :c:func:`VIDIOC_ENUM_FMT` to discover currently available formats
> > > +        after receiving ``V4L2_EVENT_SOURCE_CHANGE`` is useful to find
> > > +        out a set of allowed pixelformats for given configuration,
> > > +        but not required.
> > > +
> > > +11.  (optional) Acquire visible resolution via :c:func:`VIDIOC_G_SELECTION`.
> > > +
> > > +    a. Required fields:
> > > +
> > > +       i.  type = CAPTURE
> > > +
> > > +       ii. target = ``V4L2_SEL_TGT_CROP``
> > > +
> > > +    b. Return values: per spec.
> > > +
> > > +    c. Return fields:
> > > +
> > > +       i. r.left, r.top, r.width, r.height: visible rectangle; this must
> > > +          fit within coded resolution returned from :c:func:`VIDIOC_G_FMT`.
> > > +
> > > +12. (optional) Get the minimum number of buffers required for the CAPTURE
> > > +    queue via :c:func:`VIDIOC_G_CTRL`. This is useful if the client intends
> > > +    to use more buffers than the minimum required by hardware/format (see
> > > +    allocation).
> > > +
> > > +    a. Required fields:
> > > +
> > > +       i. id = ``V4L2_CID_MIN_BUFFERS_FOR_CAPTURE``
> > > +
> > > +    b. Return values: per spec.
> > > +
> > > +    c. Return fields:
> > > +
> > > +       i. value: minimum number of buffers required to decode the stream
> > > +          parsed in this initialization sequence.
> > > +
> > > +    .. note::
> > > +
> > > +       The minimum number of buffers must be at least the
> > > +       number required to successfully decode the current stream.
> > > +       This may for example be the required DPB size for an H.264
> > > +       stream given the parsed stream configuration (resolution,
> > > +       level).
> > > +
> > > +13. Allocate destination (raw format) buffers via :c:func:`VIDIOC_REQBUFS` on the
> > > +    CAPTURE queue.
> > > +
> > > +    a. Required fields:
> > > +
> > > +       i.   count = n, where n > 0.
> > > +
> > > +       ii.  type = CAPTURE
> > > +
> > > +       iii. memory = as per spec
> > > +
> > > +    b. Return values: Per spec.
> > > +
> > > +    c. Return fields:
> > > +
> > > +       i. count: adjusted to allocated number of buffers.
> > > +
> > > +    d. The driver must adjust count to the larger of the minimum number
> > > +       of destination buffers required for the given format and stream
> > > +       configuration and the count passed. The client must check this
> > > +       value after the ioctl returns to get the number of buffers allocated.
> > > +
> > > +    .. note::
> > > +
> > > +       Passing count = 1 is useful for letting the driver choose
> > > +       the minimum.
> > > +
> > > +    .. note::
> > > +
> > > +       To allocate more than the minimum number of buffers (for pipeline
> > > +       depth), use G_CTRL(``V4L2_CID_MIN_BUFFERS_FOR_CAPTURE``) to
> > > +       get minimum number of buffers required, and pass the obtained
> > > +       value plus the number of additional buffers needed in count
> > > +       to :c:func:`VIDIOC_REQBUFS`.
> > > +
> > > +14. Call :c:func:`VIDIOC_STREAMON` to initiate decoding frames.
> > > +
> > > +    a. Required fields: as per spec.
> > > +
> > > +    b. Return values: as per spec.
> > > +
> > > +Decoding
> > > +--------
> > > +
> > > +This state is reached after a successful initialization sequence. In
> > > +this state, the client queues and dequeues buffers to both queues via
> > > +:c:func:`VIDIOC_QBUF` and :c:func:`VIDIOC_DQBUF`, as per spec.
> > > +
> > > +Both queues operate independently. The client may queue and dequeue
> > > +buffers to queues in any order and at any rate, also at a rate different
> > > +for each queue. The client may queue buffers within the same queue in
> > > +any order (V4L2 index-wise). It is recommended for the client to operate
> > > +the queues independently for best performance.
> >
> > Only recommended sounds like a great case for clients to treat codecs
> > as one-in one-out, and then fall over if you get extra header byte
> > frames in the stream.
>
> I think the meaning of "operating the queues independently" is a bit
> different here, e.g. from separate threads.
>
> But agreed that we need to make sure that the documentation explicitly
> says that there is neither one-in one-out guarantee nor 1:1 relation
> between OUT and CAP buffers, if it doesn't say it already.
>
> >
> > > +Source OUTPUT buffers must contain:
> > > +
> > > +-  H.264/AVC: one or more complete NALUs of an Annex B elementary
> > > +   stream; one buffer does not have to contain enough data to decode
> > > +   a frame;
> >
> > This appears to be answering my earlier question, but doesn't it
> > belong in the definition of V4L2_PIX_FMT_H264 rather than buried in
> > the codec description?
> > I'm OK with that choice, but you are closing off the use case of
> > effectively cat'ing an ES into the codec to be decoded.
>
> I think it would indeed make sense to make this behavior a part of the
> pixel format. Pawel, what do you think?
>
> >
> > There's the other niggle of how to specify sizeimage in the
> > pixelformat for compressed data. I have never seen a satisfactory
> > answer in most of the APIs I've encountered (*). How big can an
> > I-frame be in a random stream? It may be a very badly coded stream,
> > but if other decoders can cope, then it's the decoder that can't which
> > will be seen to be buggy.
>
> That's a very good question. I think we just empirically came up with
> some values that seem to work in Chromium:
> https://cs.chromium.org/chromium/src/media/gpu/v4l2/v4l2_slice_video_decode_accelerator.h?rcl=eed597a7f14cb03cd7db9d9722820dddd86b4c41&l=102
> https://cs.chromium.org/chromium/src/media/gpu/v4l2/v4l2_video_decode_accelerator.cc?rcl=eed597a7f14cb03cd7db9d9722820dddd86b4c41&l=2241
>
> Pawel, any background behind those?
>
> >
> > (*) OpenMAX IL is the exception as you can pass partial frames with
> > appropriate values in nFlags. Not many other positives one can say
> > about IL though.
> >
> > > +-  VP8/VP9: one or more complete frames.
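With the H.264 rule above, a client holding a raw Annex B elementary stream cannot hand arbitrary chunks to the OUTPUT queue; it has to cut at NAL unit boundaries. A sketch of that splitting, assuming the whole stream (or a complete portion of it) is in memory; names are hypothetical. The zero byte of a 4-byte start code mid-stream stays attached to the previous unit, which Annex B permits as trailing zero bytes.

```c
#include <stddef.h>
#include <stdint.h>

/* Offset of the next 00 00 01 start code at or after `from`, or `len`. */
static size_t annexb_find_start(const uint8_t *buf, size_t len, size_t from)
{
    for (size_t i = from; i + 3 <= len; i++)
        if (buf[i] == 0 && buf[i + 1] == 0 && buf[i + 2] == 1)
            return i;
    return len;
}

/* Splits buf into NAL units, each beginning with its start code and
 * ending just before the next one. Stores up to max_units (offset, size)
 * pairs and returns the number found. Bytes before the first start code
 * are skipped. Each unit can be copied whole into an OUTPUT buffer. */
static size_t annexb_split(const uint8_t *buf, size_t len,
                           size_t *offsets, size_t *sizes, size_t max_units)
{
    size_t n = 0;
    size_t start = annexb_find_start(buf, len, 0);

    while (start < len && n < max_units) {
        /* Emulation prevention guarantees no 00 00 01 inside a NAL unit. */
        size_t next = annexb_find_start(buf, len, start + 3);
        offsets[n] = start;
        sizes[n] = next - start;
        n++;
        start = next;
    }
    return n;
}
```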
> > > +
> > > +No direct relationship between source and destination buffers and the
> > > +timing of buffers becoming available to dequeue should be assumed in the
> > > +Stream API. Specifically:
> > > +
> > > +-  a buffer queued to OUTPUT queue may result in no buffers being
> > > +   produced on the CAPTURE queue (e.g. if it does not contain
> > > +   encoded data, or if only metadata syntax structures are present
> > > +   in it), or one or more buffers produced on the CAPTURE queue (if
> > > +   the encoded data contained more than one frame, or if returning a
> > > +   decoded frame allowed the driver to return a frame that preceded
> > > +   it in decode, but succeeded it in display order)
> > > +
> > > +-  a buffer queued to OUTPUT may result in a buffer being produced on
> > > +   the CAPTURE queue later into decode process, and/or after
> > > +   processing further OUTPUT buffers, or be returned out of order,
> > > +   e.g. if display reordering is used
> > > +
> > > +-  buffers may become available on the CAPTURE queue without additional
> > > +   buffers queued to OUTPUT (e.g. during flush or EOS)
> > > +
> > > +Seek
> > > +----
> > > +
> > > +Seek is controlled by the OUTPUT queue, as it is the source of bitstream
> > > +data. CAPTURE queue remains unchanged/unaffected.
> > > +
> > > +1. Stop the OUTPUT queue to begin the seek sequence via
> > > +   :c:func:`VIDIOC_STREAMOFF`.
> > > +
> > > +   a. Required fields:
> > > +
> > > +      i. type = OUTPUT
> > > +
> > > +   b. The driver must drop all the pending OUTPUT buffers and they are
> > > +      treated as returned to the client (as per spec).
> > > +
> > > +2. Restart the OUTPUT queue via :c:func:`VIDIOC_STREAMON`
> > > +
> > > +   a. Required fields:
> > > +
> > > +      i. type = OUTPUT
> > > +
> > > +   b. The driver must now be in the post-seek state, ready to
> > > +      accept new source bitstream buffers.
> > > +
> > > +3. Start queuing buffers to OUTPUT queue containing stream data after
> > > +   the seek until a suitable resume point is found.
> > > +
> > > +   .. note::
> > > +
> > > +      There is no requirement to begin queuing stream
> > > +      starting exactly from a resume point (e.g. SPS or a keyframe).
> > > +      The driver must handle any data queued and must keep processing
> > > +      the queued buffers until it finds a suitable resume point.
> > > +      While looking for a resume point, the driver processes OUTPUT
> > > +      buffers and returns them to the client without producing any
> > > +      decoded frames.
> > > +
> > > +4. After a resume point is found, the driver will start returning
> > > +   CAPTURE buffers with decoded frames.
> > > +
> > > +   .. note::
> > > +
> > > +      There is no precise specification for CAPTURE queue of when it
> > > +      will start producing buffers containing decoded data from
> > > +      buffers queued after the seek, as it operates independently
> > > +      from OUTPUT queue.
> > > +
> > > +      -  The driver may return a number of remaining CAPTURE buffers
> > > +         containing decoded frames from before the seek after the
> > > +         seek sequence (STREAMOFF-STREAMON) is performed.
> > > +
> > > +      -  The driver is also allowed not to return decoded frames for
> > > +         OUTPUT buffers queued, but not yet decoded, before the seek
> > > +         sequence was initiated.
> > > +         E.g. for an OUTPUT queue sequence: QBUF(A), QBUF(B),
> > > +         STREAMOFF(OUT), STREAMON(OUT), QBUF(G), QBUF(H), any of the
> > > +         following results on the CAPTURE queue is allowed: {A’, B’, G’,
> > > +         H’}, {A’, G’, H’}, {G’, H’}.
> > > +
> > > +Pause
> > > +-----
> > > +
> > > +In order to pause, the client should just cease queuing buffers onto the
> > > +OUTPUT queue. This is different from the general V4L2 API definition of
> > > +pause, which involves calling :c:func:`VIDIOC_STREAMOFF` on the queue. Without
> > > +source bitstream data, there is not data to process and the hardware
> >
> > s/not/no
>
> Ack.
>
> >
> > > +remains idle. Conversely, using :c:func:`VIDIOC_STREAMOFF` on OUTPUT queue
> > > +indicates a seek, which 1) drops all buffers in flight and 2) after a
> > > +subsequent :c:func:`VIDIOC_STREAMON` will look for and only continue from a
> > > +resume point. This is usually undesirable for pause. The
> > > +STREAMOFF-STREAMON sequence is intended for seeking.
> > > +
> > > +Similarly, the CAPTURE queue should remain streaming as well, as the
> > > +STREAMOFF-STREAMON sequence on it is intended solely for changing buffer
> > > +sets.
> > > +
> > > +Dynamic resolution change
> > > +-------------------------
> > > +
> > > +When the driver encounters a resolution change in the stream, the dynamic
> > > +resolution change sequence is started.
> > > +
> > > +1.  On encountering a resolution change in the stream, the driver must
> > > +    first process and decode all remaining buffers from before the
> > > +    resolution change point.
> > > +
> > > +2.  After all buffers containing decoded frames from before the
> > > +    resolution change point are ready to be dequeued on the
> > > +    CAPTURE queue, the driver sends a ``V4L2_EVENT_SOURCE_CHANGE``
> > > +    event for source change type ``V4L2_EVENT_SRC_CH_RESOLUTION``.
> > > +    The last buffer from before the change must be marked with
> > > +    :c:type:`v4l2_buffer` ``flags`` flag ``V4L2_BUF_FLAG_LAST`` as in the flush
> > > +    sequence.
> >
> > How does the driver ensure the last buffer gets that flag? You may not
> > have had the new header bytes queued to the OUTPUT queue before the
> > previous frame has been decoded and dequeued on the CAPTURE queue.
> > Empty buffer with the flag set?
>
> Yes, an empty buffer. I think that was explained by the way of the
> general flush sequence later. We should state it here as well.
>
> >
> > > +    .. note::
> > > +
> > > +       Any attempts to dequeue more buffers beyond the buffer marked
> > > +       with ``V4L2_BUF_FLAG_LAST`` will result in a -EPIPE error from
> > > +       :c:func:`VIDIOC_DQBUF`.
> > > +
> > > +3.  After dequeuing all remaining buffers from the CAPTURE queue, the
> > > +    client must call :c:func:`VIDIOC_STREAMOFF` on the CAPTURE queue. The
> > > +    OUTPUT queue remains streaming (calling STREAMOFF on it would
> > > +    trigger a seek).
> > > +    Until STREAMOFF is called on the CAPTURE queue (acknowledging
> > > +    the event), the driver operates as if the resolution hasn’t
> > > +    changed yet, i.e. :c:func:`VIDIOC_G_FMT`, etc. return previous
> > > +    resolution.
> > > +
> > > +4.  The client frees the buffers on the CAPTURE queue using
> > > +    :c:func:`VIDIOC_REQBUFS`.
> > > +
> > > +    a. Required fields:
> > > +
> > > +       i.   count = 0
> > > +
> > > +       ii.  type = CAPTURE
> > > +
> > > +       iii. memory = as per spec
> > > +
> > > +5.  The client calls :c:func:`VIDIOC_G_FMT` for CAPTURE to get the new format
> > > +    information.
> > > +    This is identical to calling :c:func:`VIDIOC_G_FMT` after
> > > +    ``V4L2_EVENT_SRC_CH_RESOLUTION`` in the initialization
> > > +    sequence and should be handled similarly.
> > > +
> > > +    .. note::
> > > +
> > > +       It is allowed for the driver not to support the same
> > > +       pixelformat as previously used (before the resolution change)
> > > +       for the new resolution. The driver must select a default
> > > +       supported pixelformat and return it from :c:func:`VIDIOC_G_FMT`, and
> > > +       client must take note of it.
> > > +
> > > +6.  (optional) The client is allowed to enumerate available formats and
> > > +    select a different one than currently chosen (returned via
> > > +    :c:func:`VIDIOC_G_FMT`). This is identical to a corresponding step in
> > > +    the initialization sequence.
> > > +
> > > +7.  (optional) The client acquires visible resolution as in
> > > +    initialization sequence.
> > > +
> > > +8.  (optional) The client acquires minimum number of buffers as in
> > > +    initialization sequence.
> > > +
> > > +9.  The client allocates a new set of buffers for the CAPTURE queue via
> > > +    :c:func:`VIDIOC_REQBUFS`. This is identical to a corresponding step in
> > > +    the initialization sequence.
> > > +
> > > +10. The client resumes decoding by issuing :c:func:`VIDIOC_STREAMON` on the
> > > +    CAPTURE queue.
> > > +
> > > +During the resolution change sequence, the OUTPUT queue must remain
> > > +streaming. Calling :c:func:`VIDIOC_STREAMOFF` on the OUTPUT queue would initiate a seek.
> > > +
> > > +The OUTPUT queue operates separately from the CAPTURE queue for the
> > > +duration of the entire resolution change sequence. It is allowed (and
> > > +recommended for best performance and simplicity) for the client to keep
> > > +queuing/dequeuing buffers from/to OUTPUT queue even while processing
> > > +this sequence.
> > > +
> > > +.. note::
> > > +
> > > +   It is also possible for this sequence to be triggered without
> > > +   change in resolution if a different number of CAPTURE buffers is
> > > +   required in order to continue decoding the stream.
> > > +
> > > +Flush
> > > +-----
> > > +
> > > +Flush is the process of draining the CAPTURE queue of any remaining
> > > +buffers. After the flush sequence is complete, the client has received
> > > +all decoded frames for all OUTPUT buffers queued before the sequence was
> > > +started.
> > > +
> > > +1. Begin flush by issuing :c:func:`VIDIOC_DECODER_CMD`.
> > > +
> > > +   a. Required fields:
> > > +
> > > +      i. cmd = ``V4L2_DEC_CMD_STOP``
> > > +
> > > +2. The driver must process and decode as normal all OUTPUT buffers
> > > +   queued by the client before the :c:func:`VIDIOC_DECODER_CMD` was
> > > +   issued.
> > > +   Any operations triggered as a result of processing these
> > > +   buffers (including the initialization and resolution change
> > > +   sequences) must be processed as normal by both the driver and
> > > +   the client before proceeding with the flush sequence.
> > > +
> > > +3. Once all OUTPUT buffers queued before ``V4L2_DEC_CMD_STOP`` are
> > > +   processed:
> > > +
> > > +   a. If the CAPTURE queue is streaming, once all decoded frames (if
> > > +      any) are ready to be dequeued on the CAPTURE queue, the
> > > +      driver must send a ``V4L2_EVENT_EOS``. The driver must also
> > > +      set ``V4L2_BUF_FLAG_LAST`` in :c:type:`v4l2_buffer` ``flags`` field on the
> > > +      buffer on the CAPTURE queue containing the last frame (if
> > > +      any) produced as a result of processing the OUTPUT buffers
> > > +      queued before ``V4L2_DEC_CMD_STOP``. If no more frames are
> > > +      left to be returned at the point of handling
> > > +      ``V4L2_DEC_CMD_STOP``, the driver must return an empty buffer
> > > +      (with :c:type:`v4l2_buffer` ``bytesused`` = 0) as the last buffer with
> > > +      ``V4L2_BUF_FLAG_LAST`` set instead.
> > > +      Any attempts to dequeue more buffers beyond the buffer
> > > +      marked with ``V4L2_BUF_FLAG_LAST`` will result in a -EPIPE
> > > +      error from :c:func:`VIDIOC_DQBUF`.
> >
> > I guess that answers my earlier question on resolution change when
> > there are no CAPTURE buffers left to be delivered.
> >
> > > +   b. If the CAPTURE queue is NOT streaming, no action is necessary for
> > > +      CAPTURE queue and the driver must send a ``V4L2_EVENT_EOS``
> > > +      immediately after all OUTPUT buffers in question have been
> > > +      processed.
> > > +
> > > +4. To resume, the client may issue ``V4L2_DEC_CMD_START``.
> > > +
> > > +End of stream
> > > +-------------
> > > +
> > > +When an explicit end of stream is encountered by the driver in the
> > > +stream, it must send a ``V4L2_EVENT_EOS`` to the client after all frames
> > > +are decoded and ready to be dequeued on the CAPTURE queue, and must set
> > > +``V4L2_BUF_FLAG_LAST`` in the :c:type:`v4l2_buffer` ``flags`` of the last
> > > +buffer. This behavior is identical to the flush sequence as if triggered
> > > +by the client via ``V4L2_DEC_CMD_STOP``.
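Both the resolution change and end of stream are announced through events, which the client must subscribe to first. A sketch of subscribing and classifying a dequeued event; the helper and enum names are hypothetical, and the dequeue would normally follow a POLLPRI from poll().

```c
#include <string.h>
#include <time.h>
#include <sys/ioctl.h>
#include <linux/videodev2.h>

enum decoder_event_kind { EV_OTHER, EV_RESOLUTION_CHANGE, EV_END_OF_STREAM };

/* Subscribes to the two events the decoder sequences rely on. */
static int subscribe_decoder_events(int fd)
{
    struct v4l2_event_subscription sub;

    memset(&sub, 0, sizeof(sub));
    sub.type = V4L2_EVENT_SOURCE_CHANGE;
    if (ioctl(fd, VIDIOC_SUBSCRIBE_EVENT, &sub) < 0)
        return -1;
    sub.type = V4L2_EVENT_EOS;
    return ioctl(fd, VIDIOC_SUBSCRIBE_EVENT, &sub);
}

static enum decoder_event_kind classify_event(const struct v4l2_event *ev)
{
    if (ev->type == V4L2_EVENT_EOS)
        return EV_END_OF_STREAM;
    if (ev->type == V4L2_EVENT_SOURCE_CHANGE &&
        (ev->u.src_change.changes & V4L2_EVENT_SRC_CH_RESOLUTION))
        return EV_RESOLUTION_CHANGE;
    return EV_OTHER;
}

/* Dequeues one pending event and classifies it; -1 if none or on error. */
static int next_decoder_event(int fd, enum decoder_event_kind *kind)
{
    struct v4l2_event ev;

    memset(&ev, 0, sizeof(ev));
    if (ioctl(fd, VIDIOC_DQEVENT, &ev) < 0)
        return -1;
    *kind = classify_event(&ev);
    return 0;
}
```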
> > > +
> > > +Commit points
> > > +-------------
> > > +
> > > +Setting formats and allocating buffers triggers changes in the behavior
> > > +of the driver.
> > > +
> > > +1. Setting the format on the OUTPUT queue may change the set of formats
> > > +   supported/advertised on the CAPTURE queue. If the format currently
> > > +   selected on the CAPTURE queue is not supported by the newly selected
> > > +   OUTPUT format, the driver must also change it to a supported one.
> > > +
> > > +2. Enumerating formats on CAPTURE queue must only return CAPTURE formats
> > > +   supported for the OUTPUT format currently set.
> > > +
> > > +3. Setting/changing format on CAPTURE queue does not change formats
> > > +   available on OUTPUT queue. An attempt to set CAPTURE format that
> > > +   is not supported for the currently selected OUTPUT format must
> > > +   result in an error (-EINVAL) from :c:func:`VIDIOC_S_FMT`.
> > > +
> > > +4. Enumerating formats on OUTPUT queue always returns a full set of
> > > +   supported formats, irrespective of the current format selected on
> > > +   CAPTURE queue.
> > > +
> > > +5. After allocating buffers on the OUTPUT queue, it is not possible to
> > > +   change format on it.
> > > +
> > > +To summarize, setting formats and allocation must always start with the
> > > +OUTPUT queue and the OUTPUT queue is the master that governs the set of
> > > +supported formats for the CAPTURE queue.
> > > diff --git a/Documentation/media/uapi/v4l/v4l2.rst b/Documentation/media/uapi/v4l/v4l2.rst
> > > index b89e5621ae69..563d5b861d1c 100644
> > > --- a/Documentation/media/uapi/v4l/v4l2.rst
> > > +++ b/Documentation/media/uapi/v4l/v4l2.rst
> > > @@ -53,6 +53,10 @@ Authors, in alphabetical order:
> > >
> > >    - Original author of the V4L2 API and documentation.
> > >
> > > +- Figa, Tomasz <tfiga@chromium.org>
> > > +
> > > +  - Documented parts of the V4L2 (stateful) Codec Interface. Migrated from Google Docs to kernel documentation.
> > > +
> > >  - H Schimek, Michael <mschimek@gmx.at>
> > >
> > >    - Original author of the V4L2 API and documentation.
> > > @@ -65,6 +69,10 @@ Authors, in alphabetical order:
> > >
> > >    - Designed and documented the multi-planar API.
> > >
> > > +- Osciak, Pawel <posciak@chromium.org>
> > > +
> > > +  - Documented the V4L2 (stateful) Codec Interface.
> > > +
> > >  - Palosaari, Antti <crope@iki.fi>
> > >
> > >    - SDR API.
> > > @@ -85,7 +93,7 @@ Authors, in alphabetical order:
> > >
> > >    - Designed and documented the VIDIOC_LOG_STATUS ioctl, the extended control ioctls, major parts of the sliced VBI API, the MPEG encoder and decoder APIs and the DV Timings API.
> > >
> > > -**Copyright** |copy| 1999-2016: Bill Dirks, Michael H. Schimek, Hans Verkuil, Martin Rubli, Andy Walls, Muralidharan Karicheri, Mauro Carvalho Chehab, Pawel Osciak, Sakari Ailus & Antti Palosaari.
> > > +**Copyright** |copy| 1999-2018: Bill Dirks, Michael H. Schimek, Hans Verkuil, Martin Rubli, Andy Walls, Muralidharan Karicheri, Mauro Carvalho Chehab, Pawel Osciak, Sakari Ailus & Antti Palosaari, Tomasz Figa.
> > >
> > >  Except when explicitly stated as GPL, programming examples within this
> > >  part can be used and distributed without restrictions.
> > > @@ -94,6 +102,10 @@ part can be used and distributed without restrictions.
> > >  Revision History
> > >  ****************
> > >
> > > +:revision: TBD / TBD (*tf*)
> > > +
> > > +Add specification of V4L2 Codec Interface UAPI.
> > > +
> > >  :revision: 4.10 / 2016-07-15 (*rr*)
> > >
> > >  Introduce HSV formats.
> > > --
> > > 2.17.1.1185.g55be947832-goog
> >
> > Related to an earlier comment, whilst the driver has to support
> > multiple instances, there is no arbitration over the overall decode
> > rate with regard to real-time performance.
> > I know our hardware is capable of 1080P60, but there's no easy way to
> > stop someone trying to decode 2 1080P60 streams simultaneously. From a
> > software perspective it'll do it, but not in real-time. I'd assume
> > most other platforms will give similar behaviour.
> > Is it worth adding a note that real-time performance is not guaranteed
> > should multiple instances be running simultaneously, or a comment made
> > somewhere about expected performance? Or enforce it by knowing the max
> > data rates and analysing the level of each stream (please no)?
>
> This is a very interesting problem in general.
>
> I believe we don't really do anything like the latter in Chromium and
> if someone tries to play too many videos, they would just start
> dropping frames. (Pawel, correct me if I'm wrong.) It's actually
> exactly what would happen if one starts too many videos with a software
> decoder running on the CPU (and possibly with fewer instances).
>
> Best regards,
> Tomasz
Hans Verkuil June 7, 2018, 7:21 a.m. UTC | #8
On 06/05/2018 03:10 PM, Dave Stevenson wrote:
> Hi Tomasz.
> 
> Thanks for formalising this.
> I'm working on a stateful V4L2 codec driver on the Raspberry Pi and
> was having to deduce various implementation details from other
> drivers. I know how much we all tend to hate having to write
> documentation, but it is useful to have.
> 
> On 5 June 2018 at 11:33, Tomasz Figa <tfiga@chromium.org> wrote:
>> Due to complexity of the video decoding process, the V4L2 drivers of
>> stateful decoder hardware require specific sequencies of V4L2 API calls
>> to be followed. These include capability enumeration, initialization,
>> decoding, seek, pause, dynamic resolution change, flush and end of
>> stream.
>>
>> Specifics of the above have been discussed during Media Workshops at
>> LinuxCon Europe 2012 in Barcelona and then later Embedded Linux
>> Conference Europe 2014 in Düsseldorf. The de facto Codec API that
>> originated at those events was later implemented by the drivers we already
>> have merged in mainline, such as s5p-mfc or mtk-vcodec.
>>
>> The only thing missing was the real specification included as a part of
>> Linux Media documentation. Fix it now and document the decoder part of
>> the Codec API.
>>
>> Signed-off-by: Tomasz Figa <tfiga@chromium.org>
>> ---
>>  Documentation/media/uapi/v4l/dev-codec.rst | 771 +++++++++++++++++++++
>>  Documentation/media/uapi/v4l/v4l2.rst      |  14 +-
>>  2 files changed, 784 insertions(+), 1 deletion(-)
>>
>> diff --git a/Documentation/media/uapi/v4l/dev-codec.rst b/Documentation/media/uapi/v4l/dev-codec.rst
>> index c61e938bd8dc..0483b10c205e 100644
>> --- a/Documentation/media/uapi/v4l/dev-codec.rst
>> +++ b/Documentation/media/uapi/v4l/dev-codec.rst
>> @@ -34,3 +34,774 @@ the codec and reprogram it whenever another file handler gets access.
>>  This is different from the usual video node behavior where the video
>>  properties are global to the device (i.e. changing something through one
>>  file handle is visible through another file handle).
> 
> I know this isn't part of the changes, but raises a question in
> v4l2-compliance (so probably one for Hans).
> testUnlimitedOpens tries opening the device 100 times. On a normal
> device this isn't a significant overhead, but when you're allocating
> resources on a per instance basis it quickly adds up.
> Internally I have state that has a limit of 64 codec instances (either
> encode or decode), so either I allocate at start_streaming and fail on
> the 65th one, or I fail on open. I generally take the view that
> failing early is a good thing.
> Opinions? Is 100 instances of an M2M device really sensible?

Resources should not be allocated by the driver until needed (i.e. the
queue_setup op is a good place for that).

It is perfectly legal to open a video node just to call QUERYCAP to
see what it is, and I don't expect that to allocate any hardware resources.
And if I want to open it 100 times, then that should just work.

It is *always* wrong to limit the number of opens arbitrarily.

Regards,

	Hans
Tomasz Figa June 7, 2018, 7:27 a.m. UTC | #9
On Wed, Jun 6, 2018 at 7:45 PM Philipp Zabel <p.zabel@pengutronix.de> wrote:
>
> On Tue, 2018-06-05 at 22:42 +0900, Tomasz Figa wrote:
> [...]
> > > > +   a. Values returned from :c:func:`VIDIOC_ENUM_FRAMESIZES` for coded formats
> > > > +      must be maximums for given coded format for all supported raw
> > > > +      formats.
> > >
> > > I don't understand what maximums means in this context.
> > >
> > > If I have a decoder that can decode from 16x16 up to 1920x1088, should
> > > this return a continuous range from minimum frame size to maximum frame
> > > size?
> >
> > Looks like the wording here is a bit off. It should be as you say +/-
> > alignment requirements, which can be specified by using
> > v4l2_frmsize_stepwise. Hardware that supports only a fixed set of
> > resolutions (if such exists), should use v4l2_frmsize_discrete.
> > Basically this should follow the standard description of
> > VIDIOC_ENUM_FRAMESIZES.
>
> Should this contain coded sizes or visible sizes?

Since it relates to the format of the queue and we are considering
setting coded size there for formats/hardware for which it can't be
obtained from the stream, I'd say this one should be coded as well.
This is also how we interpret them in Chromium video stack.

>
> > >
> > > > +   b. Values returned from :c:func:`VIDIOC_ENUM_FRAMESIZES` for raw formats must
> > > > +      be maximums for given raw format for all supported coded
> > > > +      formats.
> > >
> > > Same here, this is unclear to me.
> >
> > Should be as above, i.e. according to standard operation of
> > VIDIOC_ENUM_FRAMESIZES.
>
> How about just:
>
>    a. Values returned from :c:func:`VIDIOC_ENUM_FRAMESIZES` for coded formats
>       must contain all possible (coded?) frame sizes for the given coded format
>       for all supported raw formats.

I wouldn't mention raw formats here, since what's supported will
actually depend on coded format.

>
>    b. Values returned from :c:func:`VIDIOC_ENUM_FRAMESIZES` for raw formats
>       must contain all possible coded frame sizes for the given raw format
>       for all supported encoded formats.

I'd say that this should be "for currently set coded format", because
otherwise userspace would have no way to find the real supported range
for the codec it wants to decode.

>
> And then a note somewhere that explains that coded frame sizes are
> usually visible frame size rounded up to macro block size, possibly a
> link to the coded resolution glossary.

Agreed on the note.

I'm not yet sure about the link, because it might just clutter the
source text. I think the reader would be looking through the document
from the top anyway, so should be able to notice the glossary and
scroll back to it, if necessary.

>
> [...]
> > Actually, when I think of it now, I wonder if we really should be
> > setting resolution here for bitstream formats that don't include
> > resolution, rather than on CAPTURE queue. Pawel, could you clarify
> > what was the intention here?
>
> Setting the resolution here makes it possible to start streaming,
> allocate buffers on both queues etc. without relying on the hardware to
> actually parse the headers. If we are given the right information, the
> first source change event will just confirm the currently set
> resolution.

I think the same could be achievable by userspace setting the format
on CAPTURE, rather than OUTPUT.

However, I guess it just depends on the convention we agree on. If we
decide that coded formats are characterized by width/height that
represent coded size, I guess we might just set it on OUTPUT and make
the format on CAPTURE read-only (unless the hardware supports some
kind of transformations). If we decide so, then we would also have to:
 - update OUTPUT format on initial bitstream parse and dynamic
resolution change,
 - for encoder, make CAPTURE format correspond to coded size of
encoded bitstream.

It sounds quite reasonable to me and it might not even conflict (too
much) with what existing drivers and userspace do.

>
> [...]
> > > What about devices that have a frame buffer registration step before
> > > stream start? For coda I need to know all CAPTURE buffers before I can
> > > start streaming, because there is no way to register them after
> > > STREAMON. Do I have to split the driver internally to do streamoff and
> > > restart when the capture queue is brought up?
> >
> > Do you mean that the hardware requires registering framebuffers before
> > the headers are parsed and resolution is detected? That sounds quite
> > unusual.
>
> I meant that, but I was mistaken. For coda that is just how the driver
> currently works, but it is not required by the hardware.
>
> > Other drivers would:
> > 1) parse the header on STREAMON(OUTPUT),
>
> coda has a SEQ_INIT command, which parses the headers, and a
> SET_FRAME_BUF command that registers allocated (internal) buffers.
> Both are currently done during streamon, but it should be possible to
> split this up. SET_FRAME_BUF can be only issued once between SEQ_INIT
> and SEQ_END, but it is a separate command.
>
> > 2) report resolution to userspace,
> > 3) have framebuffers allocated in REQBUFS(CAPTURE),
> > 4) register framebuffers in STREAMON(CAPTURE).
>
> coda has a peculiarity in that the registered frame buffers are internal
> only, and another part of the codec (copy/rotator) or another part of
> the SoC (VDOA) copies those frames into the CAPTURE buffers that don't
> have to be registered at all in advance in a separate step. But it
> should still be possible to do the internal buffer allocation and
> registration in the right places.

Out of curiosity, why is that? Couldn't the internal frame buffers just be
exposed directly to userspace, with the agreement that userspace doesn't
write to them, as s5p-mfc does? Actually, the s5p-mfc hardware seems to
include a similar mode with a copy/rotate step, but using it only imposes
higher bandwidth requirements.

>
> [...]
> > Should be the same. There was "+5. Single-plane API (see spec) and
> > applicable structures may be used interchangeably with Multi-plane
> > API, unless specified otherwise." mentioned at the beginning of the
> > documentation, but I guess we could just make the description generic
> > instead.
>
> Yes, please. Especially when using this as a reference during driver
> development, it would be very helpful to have all relevant information
> in place or at least referenced, instead of having to read and memorize
> the whole document linearly.

Ack.

>
> [...]
> > > Isn't CROP supposed to be set on the OUTPUT queue only and COMPOSE on
> > > the CAPTURE queue?
> >
> > Why? Both CROP and COMPOSE can be used on any queue, if supported by
> > given interface.
> >
> > However, on codecs, since OUTPUT queue is a bitstream, I don't think
> > selection makes sense there.
> >
> > > I would expect COMPOSE/COMPOSE_DEFAULT to be set to the visible
> > > rectangle and COMPOSE_PADDED to be set to the rectangle that the
> > > hardware actually overwrites.
> >
> > Yes, that's a good point. I'd also say that CROP/CROP_DEFAULT should
> > be set to the visible rectangle as well, to allow adding handling for
> > cases when the hardware can actually do further cropping.
>
> Should CROP_BOUNDS be set to visible rectangle or to the coded
> rectangle? This is related to the question of whether coded G/S_FMT should
> handle coded sizes or visible sizes.

I'd say that the format on CAPTURE should represent framebuffer size,
which might be hardware specific and not necessarily equal to coded
size. This would also enable allocating bigger framebuffers beforehand
to avoid reallocation for resolution changes.

If we want to make selection consistent with CAPTURE format, we should
probably have CROP_BOUNDS equal to framebuffer resolution. I'm not
sure how it would work for hardware that can't do any transformations,
e.g. those where CROP == CROP_DEFAULT == COMPOSE == COMPOSE_DEFAULT ==
visible size. I couldn't find in the spec whether it is allowed for
CROP_BOUNDS to report a rectangle unsupported by CROP.
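A minimal sketch of the convention proposed above, assuming a decoder that can neither crop nor scale: CROP_BOUNDS reports the framebuffer (CAPTURE format) size and every other target reports the visible rectangle. The `rect` struct only mirrors `struct v4l2_rect`; the enum and function names are hypothetical and not part of the V4L2 API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Local mirror of struct v4l2_rect from <linux/videodev2.h>. */
struct rect {
    int32_t left, top;
    uint32_t width, height;
};

/* Hypothetical names for the selection targets discussed above. */
enum sel_target {
    SEL_CROP, SEL_CROP_DEFAULT, SEL_CROP_BOUNDS,
    SEL_COMPOSE, SEL_COMPOSE_DEFAULT,
};

/* True if 'inner' lies entirely inside 'outer'. */
bool rect_within(struct rect inner, struct rect outer)
{
    return inner.left >= outer.left && inner.top >= outer.top &&
           (int64_t)inner.left + inner.width <=
               (int64_t)outer.left + outer.width &&
           (int64_t)inner.top + inner.height <=
               (int64_t)outer.top + outer.height;
}

/*
 * Selection rectangle reported by a decoder without transformations:
 * CROP_BOUNDS is the framebuffer size, all other targets are the
 * visible rectangle parsed from the stream.
 */
struct rect decoder_selection(enum sel_target target, struct rect visible,
                              uint32_t fb_width, uint32_t fb_height)
{
    struct rect fb = { 0, 0, fb_width, fb_height };

    assert(rect_within(visible, fb)); /* visible area must fit the buffer */
    if (target == SEL_CROP_BOUNDS)
        return fb;
    return visible;
}
```

With this reading, a 1920x1080 stream decoded into 1920x1088 buffers reports 1920x1088 for CROP_BOUNDS and 1920x1080 everywhere else, so CROP_BOUNDS may indeed describe a rectangle that CROP itself never accepts.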

>
> For video capture devices, the cropping bounds should represent those
> pixels that can be sampled. If we can 'sample' the coded pixels beyond
> the visible rectangle, should decoders behave the same?
>
> I think Documentation/media/uapi/v4l/selection-api-004.rst is missing a
> section about mem2mem devices and/or codecs to clarify this.

Ack. I guess we should add something there.

[snip]
> > >
> > > > +subsequent :c:func:`VIDIOC_STREAMON` will look for and only continue from a
> > > > +resume point. This is usually undesirable for pause. The
> > > > +STREAMOFF-STREAMON sequence is intended for seeking.
> > > > +
> > > > +Similarly, CAPTURE queue should remain streaming as well, as the
> > > > +STREAMOFF-STREAMON sequence on it is intended solely for changing buffer
> > > > +sets
> > > > +
> > > > +Dynamic resolution change
> > > > +-------------------------
> > > > +
> > > > +When driver encounters a resolution change in the stream, the dynamic
> > > > +resolution change sequence is started.
> > >
> > > Must all drivers support dynamic resolution change?
> >
> > I'd say no, but I guess that would mean that the driver never
> > encounters it, because hardware wouldn't report it.
> >
> > I wonder what would happen in such a case, though. Obviously decoding of such
> > stream couldn't continue without support in the driver.
>
> GStreamer supports decoding of variable resolution streams without
> driver support by just stopping and restarting streaming completely.

What about userspace that doesn't parse the stream on its own? Do we
want to impose the requirement of full bitstream parsing even for
hardware that can just do it itself?

>
> > >
> > > > +1.  On encountering a resolution change in the stream. The driver must
> > > > +    first process and decode all remaining buffers from before the
> > > > +    resolution change point.
> > > > +
> > > > +2.  After all buffers containing decoded frames from before the
> > > > +    resolution change point are ready to be dequeued on the
> > > > +    CAPTURE queue, the driver sends a ``V4L2_EVENT_SOURCE_CHANGE``
> > > > +    event for source change type ``V4L2_EVENT_SRC_CH_RESOLUTION``.
> > > > +    The last buffer from before the change must be marked with
> > > > +    :c:type:`v4l2_buffer` ``flags`` flag ``V4L2_BUF_FLAG_LAST`` as in the flush
> > > > +    sequence.
> > > > +
> > > > +    .. note::
> > > > +
> > > > +       Any attempts to dequeue more buffers beyond the buffer marked
> > > > +       with ``V4L2_BUF_FLAG_LAST`` will result in a -EPIPE error from
> > > > +       :c:func:`VIDIOC_DQBUF`.
> > > > +
> > > > +3.  After dequeuing all remaining buffers from the CAPTURE queue, the
> > > > +    client must call :c:func:`VIDIOC_STREAMOFF` on the CAPTURE queue. The
> > > > +    OUTPUT queue remains streaming (calling STREAMOFF on it would
> > > > +    trigger a seek).
> > > > +    Until STREAMOFF is called on the CAPTURE queue (acknowledging
> > > > +    the event), the driver operates as if the resolution hasn’t
> > > > +    changed yet, i.e. :c:func:`VIDIOC_G_FMT`, etc. return previous
> > > > +    resolution.
> > >
> > > What about the OUTPUT queue resolution, does it change as well?
> >
> > There shouldn't be resolution associated with OUTPUT queue, because
> > pixel format is bitstream, not raw frame.
>
> So the width and height field may just contain bogus values for coded
> formats?

This is probably as per the convention we agree on, as I mentioned
above. If we assume that coded formats are characterized by coded
size, then width and height would indeed have to always contain the
coded size (and it would change after dynamic resolution change).
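For reference, the client side of the dynamic resolution change steps quoted above can be sketched as a small decision function. This is only an illustration under stated assumptions: `SKETCH_BUF_FLAG_LAST` is a stand-in for `V4L2_BUF_FLAG_LAST`, and the enum and function names are hypothetical. Since the LAST flag is shared with the flush sequence, the sketch relies on the source change event to tell a resolution change from end of stream.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Stand-in for V4L2_BUF_FLAG_LAST; real code should use the macro
 * from <linux/videodev2.h>. */
#define SKETCH_BUF_FLAG_LAST (1u << 20)

enum drc_action {
    DRC_KEEP_DEQUEUING, /* normal decoding continues */
    DRC_REINIT_CAPTURE, /* STREAMOFF(CAPTURE), G_FMT, REQBUFS, STREAMON */
    DRC_END_OF_STREAM,  /* LAST buffer without a resolution change event */
};

/*
 * Decide what to do after one VIDIOC_DQBUF on the CAPTURE queue.
 * dqbuf_errno: 0 on success, otherwise errno from the ioctl.
 * flags: v4l2_buffer.flags of the dequeued buffer (ignored on error).
 * res_change_seen: a V4L2_EVENT_SOURCE_CHANGE event with
 *                  V4L2_EVENT_SRC_CH_RESOLUTION has been dequeued.
 */
enum drc_action after_capture_dqbuf(int dqbuf_errno, uint32_t flags,
                                    bool res_change_seen)
{
    bool last;

    assert(dqbuf_errno >= 0);
    /* EPIPE means the buffer marked LAST has already been dequeued. */
    last = dqbuf_errno == EPIPE ||
           (dqbuf_errno == 0 && (flags & SKETCH_BUF_FLAG_LAST));
    if (!last)
        return DRC_KEEP_DEQUEUING; /* e.g. EAGAIN; caller checks others */
    return res_change_seen ? DRC_REINIT_CAPTURE : DRC_END_OF_STREAM;
}
```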

>
> [...]
> > > Ok. Is the same true about the contained colorimetry? What should happen
> > > if the stream contains colorimetry information that differs from
> > > S_FMT(OUT) colorimetry?
> >
> > As I explained close to the top, IMHO we shouldn't be setting
> > colorimetry on OUTPUT queue.
>
> Does that mean that if userspace sets those fields though, we correct to
> V4L2_COLORSPACE_DEFAULT and friends? Or just accept anything and ignore
> it?

As I mentioned in other comments, I rethought this and we should be
okay with having colorimetry (and properties) set on OUTPUT queue.

>
> > > > +2. Enumerating formats on CAPTURE queue must only return CAPTURE formats
> > > > +   supported for the OUTPUT format currently set.
> > > > +
> > > > +3. Setting/changing format on CAPTURE queue does not change formats
> > > > +   available on OUTPUT queue. An attempt to set CAPTURE format that
> > > > +   is not supported for the currently selected OUTPUT format must
> > > > +   result in an error (-EINVAL) from :c:func:`VIDIOC_S_FMT`.
> > >
> > > Is this limited to the pixel format? Surely setting out of bounds
> > > width/height or incorrect colorimetry should not result in EINVAL but
> > > still be corrected by the driver?
> >
> > That doesn't sound right to me indeed. The driver should fix up
> > S_FMT(CAPTURE), including pixel format or anything else. It must only
> > not alter OUTPUT settings.
>
> That's what I would have expected as well.
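A minimal sketch of the "fix up instead of -EINVAL" behaviour agreed on here, applied to the CAPTURE width and height. The limits struct and helper names are hypothetical, and the sketch assumes the minimum and maximum sizes are already multiples of a power-of-two alignment.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical CAPTURE size limits; min/max must be multiples of align. */
struct hw_limits {
    uint32_t min_w, max_w, min_h, max_h;
    uint32_t align; /* power of two, e.g. 16 */
};

uint32_t clamp_u32(uint32_t v, uint32_t lo, uint32_t hi)
{
    return v < lo ? lo : (v > hi ? hi : v);
}

uint32_t round_up_pow2(uint32_t v, uint32_t a)
{
    return (v + a - 1) & ~(a - 1);
}

/* S_FMT/TRY_FMT(CAPTURE): correct the size instead of returning -EINVAL. */
void fixup_capture_size(const struct hw_limits *lim, uint32_t *w, uint32_t *h)
{
    assert(lim->align != 0 && (lim->align & (lim->align - 1)) == 0);
    assert(lim->max_w % lim->align == 0 && lim->max_h % lim->align == 0);
    /* Clamp first; rounding up then cannot exceed an aligned maximum. */
    *w = round_up_pow2(clamp_u32(*w, lim->min_w, lim->max_w), lim->align);
    *h = round_up_pow2(clamp_u32(*h, lim->min_h, lim->max_h), lim->align);
}
```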
>
> > >
> > > > +4. Enumerating formats on OUTPUT queue always returns a full set of
> > > > +   supported formats, irrespective of the current format selected on
> > > > +   CAPTURE queue.
> > > > +
> > > > +5. After allocating buffers on the OUTPUT queue, it is not possible to
> > > > +   change format on it.
> > >
> > > So even after source change events the OUTPUT queue still keeps the
> > > initial OUTPUT format?
> >
> > It would basically only have pixelformat (fourcc) assigned to it,
> > since bitstream formats are not video frames, but just sequences of
> > bytes. I don't think it makes sense to change e.g. from H264 to VP8
> > during streaming.
>
> What should the width and height format fields be set to then? Is there
> a precedent for this? Capture devices that produce compressed output
> usually set width and height to the visible resolution.

s5p-mfc (the first upstream codec driver) always sets them to 0. That
might be fixed if we agree on a consistent convention, though.

Best regards,
Tomasz
Tomasz Figa June 7, 2018, 7:30 a.m. UTC | #10
On Thu, Jun 7, 2018 at 4:22 PM Hans Verkuil <hverkuil@xs4all.nl> wrote:
>
> On 06/05/2018 03:10 PM, Dave Stevenson wrote:
> > Hi Tomasz.
> >
> > Thanks for formalising this.
> > I'm working on a stateful V4L2 codec driver on the Raspberry Pi and
> > was having to deduce various implementation details from other
> > drivers. I know how much we all tend to hate having to write
> > documentation, but it is useful to have.
> >
> > On 5 June 2018 at 11:33, Tomasz Figa <tfiga@chromium.org> wrote:
> >> Due to complexity of the video decoding process, the V4L2 drivers of
> >> stateful decoder hardware require specific sequencies of V4L2 API calls
> >> to be followed. These include capability enumeration, initialization,
> >> decoding, seek, pause, dynamic resolution change, flush and end of
> >> stream.
> >>
> >> Specifics of the above have been discussed during Media Workshops at
> >> LinuxCon Europe 2012 in Barcelona and then later Embedded Linux
> >> Conference Europe 2014 in Düsseldorf. The de facto Codec API that
> >> originated at those events was later implemented by the drivers we already
> >> have merged in mainline, such as s5p-mfc or mtk-vcodec.
> >>
> >> The only thing missing was the real specification included as a part of
> >> Linux Media documentation. Fix it now and document the decoder part of
> >> the Codec API.
> >>
> >> Signed-off-by: Tomasz Figa <tfiga@chromium.org>
> >> ---
> >>  Documentation/media/uapi/v4l/dev-codec.rst | 771 +++++++++++++++++++++
> >>  Documentation/media/uapi/v4l/v4l2.rst      |  14 +-
> >>  2 files changed, 784 insertions(+), 1 deletion(-)
> >>
> >> diff --git a/Documentation/media/uapi/v4l/dev-codec.rst b/Documentation/media/uapi/v4l/dev-codec.rst
> >> index c61e938bd8dc..0483b10c205e 100644
> >> --- a/Documentation/media/uapi/v4l/dev-codec.rst
> >> +++ b/Documentation/media/uapi/v4l/dev-codec.rst
> >> @@ -34,3 +34,774 @@ the codec and reprogram it whenever another file handler gets access.
> >>  This is different from the usual video node behavior where the video
> >>  properties are global to the device (i.e. changing something through one
> >>  file handle is visible through another file handle).
> >
> > I know this isn't part of the changes, but it raises a question in
> > v4l2-compliance (so probably one for Hans).
> > testUnlimitedOpens tries opening the device 100 times. On a normal
> > device this isn't a significant overhead, but when you're allocating
> > resources on a per instance basis it quickly adds up.
> > Internally I have state that has a limit of 64 codec instances (either
> > encode or decode), so either I allocate at start_streaming and fail on
> > the 65th one, or I fail on open. I generally take the view that
> > failing early is a good thing.
> > Opinions? Is 100 instances of an M2M device really sensible?
>
> Resources should not be allocated by the driver until needed (i.e. the
> queue_setup op is a good place for that).
>
> It is perfectly legal to open a video node just to call QUERYCAP to
> see what it is, and I don't expect that to allocate any hardware resources.
> And if I want to open it 100 times, then that should just work.
>
> It is *always* wrong to limit the number of open arbitrarily.

That's a valid point indeed. Besides the querying use case, userspace
might just want to pre-open a bigger number of instances, which doesn't
mean that they would all be streaming at the same time.
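One way to follow Hans's advice can be sketched as below: open() only sets up software state, and a hardware instance is claimed from queue_setup on first use. Everything here is hypothetical (names, the `-EBUSY` error, the 64-instance limit taken from Dave's example), and a real driver would also need locking around the counter.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Firmware limit from the example above; real drivers also need locking. */
#define MAX_HW_INSTANCES 64

struct codec_dev { int hw_in_use; };
struct codec_ctx { struct codec_dev *dev; bool has_hw; };

/* open(): only sets up software state, so it never fails for lack of HW. */
void ctx_open(struct codec_dev *dev, struct codec_ctx *ctx)
{
    ctx->dev = dev;
    ctx->has_hw = false;
}

/* queue_setup(): claim a hardware instance the first time it is needed. */
int ctx_claim_hw(struct codec_ctx *ctx)
{
    if (ctx->has_hw)
        return 0;
    if (ctx->dev->hw_in_use >= MAX_HW_INSTANCES)
        return -EBUSY; /* hypothetical choice of error code */
    ctx->dev->hw_in_use++;
    ctx->has_hw = true;
    return 0;
}

/* release(): return the instance, if one was claimed. */
void ctx_release(struct codec_ctx *ctx)
{
    if (!ctx->has_hw)
        return;
    assert(ctx->dev->hw_in_use > 0);
    ctx->dev->hw_in_use--;
    ctx->has_hw = false;
}
```

With this split, the 100 opens done by v4l2-compliance succeed, and only the 65th context that actually tries to set up its queues is refused.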

Best regards,
Tomasz
Hans Verkuil June 7, 2018, 8:47 a.m. UTC | #11
Hi Tomasz,

First of all: thank you very much for working on this. It's a big missing piece of
information, so filling this in is very helpful.

On 06/05/2018 12:33 PM, Tomasz Figa wrote:
> Due to complexity of the video decoding process, the V4L2 drivers of
> stateful decoder hardware require specific sequencies of V4L2 API calls
> to be followed. These include capability enumeration, initialization,
> decoding, seek, pause, dynamic resolution change, flush and end of
> stream.
> 
> Specifics of the above have been discussed during Media Workshops at
> LinuxCon Europe 2012 in Barcelona and then later Embedded Linux
> Conference Europe 2014 in Düsseldorf. The de facto Codec API that
> originated at those events was later implemented by the drivers we already
> have merged in mainline, such as s5p-mfc or mtk-vcodec.
> 
> The only thing missing was the real specification included as a part of
> Linux Media documentation. Fix it now and document the decoder part of
> the Codec API.
> 
> Signed-off-by: Tomasz Figa <tfiga@chromium.org>
> ---
>  Documentation/media/uapi/v4l/dev-codec.rst | 771 +++++++++++++++++++++
>  Documentation/media/uapi/v4l/v4l2.rst      |  14 +-
>  2 files changed, 784 insertions(+), 1 deletion(-)
> 
> diff --git a/Documentation/media/uapi/v4l/dev-codec.rst b/Documentation/media/uapi/v4l/dev-codec.rst
> index c61e938bd8dc..0483b10c205e 100644
> --- a/Documentation/media/uapi/v4l/dev-codec.rst
> +++ b/Documentation/media/uapi/v4l/dev-codec.rst
> @@ -34,3 +34,774 @@ the codec and reprogram it whenever another file handler gets access.
>  This is different from the usual video node behavior where the video
>  properties are global to the device (i.e. changing something through one
>  file handle is visible through another file handle).

To what extent does the information in this patch series apply specifically to
video (de)compression hardware, and to what extent is it applicable to any m2m
device? It looks like most, if not all, of it is specific to video (de)compression
hardware and not to, e.g., a simple deinterlacer.

Ideally there would be a common section first describing the requirements for
all m2m devices, followed by an encoder and decoder section going into details
for those specific devices.

I also think that we need an additional paragraph somewhere at the beginning
of the Codec Interface chapter that explains more clearly that OUTPUT buffers
send data to the hardware to be processed and that CAPTURE buffers contain
the processed data. Newcomers are always confused by the fact that in V4L2
the buffer direction is seen from the point of view of the CPU.

> +
> +This interface is generally appropriate for hardware that does not
> +require additional software involvement to parse/partially decode/manage
> +the stream before/after processing in hardware.
> +
> +Input data to the Stream API are buffers containing unprocessed video
> +stream (Annex-B H264/H265 stream, raw VP8/9 stream) only. The driver is
> +expected not to require any additional information from the client to
> +process these buffers, and to return decoded frames on the CAPTURE queue
> +in display order.
> +
> +Performing software parsing, processing etc. of the stream in the driver
> +in order to support stream API is strongly discouraged. In such case use
> +of Stateless Codec Interface (in development) is preferred.
> +
> +Conventions and notation used in this document
> +==============================================
> +
> +1. The general V4L2 API rules apply if not specified in this document
> +   otherwise.
> +
> +2. The meaning of words “must”, “may”, “should”, etc. is as per RFC
> +   2119.
> +
> +3. All steps not marked “optional” are required.
> +
> +4. :c:func:`VIDIOC_G_EXT_CTRLS`, :c:func:`VIDIOC_S_EXT_CTRLS` may be used interchangeably with
> +   :c:func:`VIDIOC_G_CTRL`, :c:func:`VIDIOC_S_CTRL`, unless specified otherwise.
> +
> +5. Single-plane API (see spec) and applicable structures may be used
> +   interchangeably with Multi-plane API, unless specified otherwise.
> +
> +6. i = [a..b]: sequence of integers from a to b, inclusive, i.e. i =
> +   [0..2]: i = 0, 1, 2.
> +
> +7. For OUTPUT buffer A, A’ represents a buffer on the CAPTURE queue
> +   containing data (decoded or encoded frame/stream) that resulted
> +   from processing buffer A.
> +
> +Glossary
> +========
> +
> +CAPTURE
> +   the destination buffer queue, decoded frames for
> +   decoders, encoded bitstream for encoders;
> +   ``V4L2_BUF_TYPE_VIDEO_CAPTURE`` or
> +   ``V4L2_BUF_TYPE_VIDEO_CAPTURE_MPLANE``
> +
> +client
> +   application client communicating with the driver
> +   implementing this API
> +
> +coded format
> +   encoded/compressed video bitstream format (e.g.
> +   H.264, VP8, etc.); see raw format; this is not equivalent to fourcc
> +   (V4L2 pixelformat), as each coded format may be supported by multiple
> +   fourccs (e.g. ``V4L2_PIX_FMT_H264``, ``V4L2_PIX_FMT_H264_SLICE``, etc.)
> +
> +coded height
> +   height for given coded resolution
> +
> +coded resolution
> +   stream resolution in pixels aligned to codec
> +   format and hardware requirements; see also visible resolution
> +
> +coded width
> +   width for given coded resolution
> +
> +decode order
> +   the order in which frames are decoded; may differ
> +   from display (output) order if frame reordering (B frames) is active in
> +   the stream; OUTPUT buffers must be queued in decode order; for frame
> +   API, CAPTURE buffers must be returned by the driver in decode order;

"frame API"

> +
> +display order
> +   the order in which frames must be displayed
> +   (outputted); for stream API, CAPTURE buffers must be returned by the
> +   driver in display order;

"stream API"

These are old terms and need to be replaced. They also need to be defined in this glossary.

> +
> +EOS
> +   end of stream
> +
> +input height
> +   height in pixels for given input resolution

'input' is a confusing name, because I think this refers to the resolution
set for the OUTPUT buffer. How about renaming this to 'source'?

I.e.: an OUTPUT buffer contains the source data for the hardware. The capture
buffer contains the sink data from the hardware.

> +
> +input resolution
> +   resolution in pixels of source frames being input

"source resolution
	resolution in pixels of source frames passed"

> +   to the encoder and subject to further cropping to the bounds of visible
> +   resolution
> +
> +input width
> +   width in pixels for given input resolution
> +
> +OUTPUT
> +   the source buffer queue, encoded bitstream for
> +   decoders, raw frames for encoders; ``V4L2_BUF_TYPE_VIDEO_OUTPUT`` or
> +   ``V4L2_BUF_TYPE_VIDEO_OUTPUT_MPLANE``
> +
> +raw format
> +   uncompressed format containing raw pixel data (e.g.
> +   YUV, RGB formats)
> +
> +resume point
> +   a point in the bitstream from which decoding may
> +   start/continue, without any previous state/data present, e.g.: a
> +   keyframe (VPX) or SPS/PPS/IDR sequence (H.264); a resume point is
> +   required to start decode of a new stream, or to resume decoding after a
> +   seek;
> +
> +source buffer
> +   buffers allocated for source queue

"OUTPUT buffers allocated..."

> +
> +source queue
> +   queue containing buffers used for source data, i.e.

Line suddenly ends.

I'd say: "queue containing OUTPUT buffers"

> +
> +visible height
> +   height for given visible resolution
> +
> +visible resolution
> +   stream resolution of the visible picture, in
> +   pixels, to be used for display purposes; must be smaller or equal to
> +   coded resolution;
> +
> +visible width
> +   width for given visible resolution
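To make the coded/visible distinction in the glossary concrete, here is a minimal sketch assuming a codec that works on square blocks (16x16 macroblocks for H.264). Real hardware may impose stricter alignment, so treat the result as a lower bound; the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Round v up to the next multiple of a. */
uint32_t align_up(uint32_t v, uint32_t a)
{
    assert(a > 0);
    return (v + a - 1) / a * a;
}

/*
 * Coded size for a codec that works on square blocks of 'mb' pixels
 * (16 for H.264 macroblocks). Hardware may require stricter alignment.
 */
void coded_size(uint32_t vis_w, uint32_t vis_h, uint32_t mb,
                uint32_t *coded_w, uint32_t *coded_h)
{
    *coded_w = align_up(vis_w, mb);
    *coded_h = align_up(vis_h, mb);
}
```

A 1920x1080 visible picture thus has a 1920x1088 coded resolution, which is where the 1920x1088 figure used later in the document comes from.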
> +
> +Decoder
> +=======
> +
> +Querying capabilities
> +---------------------
> +
> +1. To enumerate the set of coded formats supported by the driver, the
> +   client uses :c:func:`VIDIOC_ENUM_FMT` for OUTPUT. The driver must always
> +   return the full set of supported formats, irrespective of the
> +   format set on the CAPTURE queue.
> +
> +2. To enumerate the set of supported raw formats, the client uses
> +   :c:func:`VIDIOC_ENUM_FMT` for CAPTURE. The driver must return only the
> +   formats supported for the format currently set on the OUTPUT
> +   queue.
> +   In order to enumerate raw formats supported by a given coded
> +   format, the client must first set that coded format on the
> +   OUTPUT queue and then enumerate the CAPTURE queue.
> +
> +3. The client may use :c:func:`VIDIOC_ENUM_FRAMESIZES` to detect supported
> +   resolutions for a given format, passing its fourcc in
> +   :c:type:`v4l2_frmivalenum` ``pixel_format``.
> +
> +   a. Values returned from :c:func:`VIDIOC_ENUM_FRAMESIZES` for coded formats
> +      must be maximums for given coded format for all supported raw
> +      formats.
> +
> +   b. Values returned from :c:func:`VIDIOC_ENUM_FRAMESIZES` for raw formats must
> +      be maximums for given raw format for all supported coded
> +      formats.
> +
> +   c. The client should derive the supported resolution for a
> +      combination of coded+raw format by calculating the
> +      intersection of resolutions returned from calls to
> +      :c:func:`VIDIOC_ENUM_FRAMESIZES` for the given coded and raw formats.
> +
> +4. Supported profiles and levels for given format, if applicable, may be
> +   queried using their respective controls via :c:func:`VIDIOC_QUERYCTRL`.
> +
> +5. The client may use :c:func:`VIDIOC_ENUM_FRAMEINTERVALS` to enumerate maximum
> +   supported framerates by the driver/hardware for a given
> +   format+resolution combination.
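Step 3c (intersecting the ENUM_FRAMESIZES results for a coded and a raw format) can be sketched as a membership test against both stepwise ranges. The struct mirrors `struct v4l2_frmsize_stepwise` (for the continuous type the steps are 1); the helper names and example values are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Local mirror of struct v4l2_frmsize_stepwise from <linux/videodev2.h>. */
struct frmsize_stepwise {
    uint32_t min_width, max_width, step_width;
    uint32_t min_height, max_height, step_height;
};

bool dim_ok(uint32_t v, uint32_t min, uint32_t max, uint32_t step)
{
    assert(step > 0);
    return v >= min && v <= max && (v - min) % step == 0;
}

bool size_in_range(const struct frmsize_stepwise *r, uint32_t w, uint32_t h)
{
    return dim_ok(w, r->min_width, r->max_width, r->step_width) &&
           dim_ok(h, r->min_height, r->max_height, r->step_height);
}

/* A size is usable for a coded+raw pair only if both ranges accept it. */
bool size_supported(const struct frmsize_stepwise *coded,
                    const struct frmsize_stepwise *raw,
                    uint32_t w, uint32_t h)
{
    return size_in_range(coded, w, h) && size_in_range(raw, w, h);
}
```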
> +
> +Initialization sequence
> +-----------------------
> +
> +1. (optional) Enumerate supported OUTPUT formats and resolutions. See
> +   capability enumeration.
> +
> +2. Set a coded format on the source queue via :c:func:`VIDIOC_S_FMT`
> +
> +   a. Required fields:
> +
> +      i.   type = OUTPUT
> +
> +      ii.  fmt.pix_mp.pixelformat set to a coded format
> +
> +      iii. fmt.pix_mp.width, fmt.pix_mp.height only if cannot be
> +           parsed from the stream for the given coded format;
> +           ignored otherwise;
> +
> +   b. Return values:
> +
> +      i.  EINVAL: unsupported format.
> +
> +      ii. Others: per spec
> +
> +   .. note::
> +
> +      The driver must not adjust pixelformat, so if
> +      ``V4L2_PIX_FMT_H264`` is passed but only
> +      ``V4L2_PIX_FMT_H264_SLICE`` is supported, S_FMT will return
> +      -EINVAL. If both are acceptable by client, calling S_FMT for
> +      the other after one gets rejected may be required (or use
> +      :c:func:`VIDIOC_ENUM_FMT` to discover beforehand, see Capability
> +      enumeration).

This needs to be documented in S_FMT as well.

What will TRY_FMT do? Return EINVAL as well, or replace the pixelformat?

Should this be a general rule for output devices that S_FMT (and perhaps TRY_FMT)
fail with EINVAL if the pixelformat is not supported? There is something to be
said for that.
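If S_FMT keeps rejecting unsupported pixelformats, clients can avoid the retry loop by choosing from the VIDIOC_ENUM_FMT results up front. A minimal sketch follows; `FOURCC` reproduces the packing of `v4l2_fourcc()`, and the `'S','2','6','4'` code in the usage below is just a hypothetical alternative format, not a claim about any real define.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Same packing as the v4l2_fourcc() macro in <linux/videodev2.h>. */
#define FOURCC(a, b, c, d) \
    ((uint32_t)(a) | ((uint32_t)(b) << 8) | ((uint32_t)(c) << 16) | \
     ((uint32_t)(d) << 24))

/*
 * Return the first fourcc from 'prefs' that appears in 'supported'
 * (as collected with VIDIOC_ENUM_FMT on OUTPUT), or 0 if none does.
 */
uint32_t pick_coded_format(const uint32_t *supported, size_t n_supported,
                           const uint32_t *prefs, size_t n_prefs)
{
    assert(supported != NULL || n_supported == 0);
    for (size_t i = 0; i < n_prefs; i++)
        for (size_t j = 0; j < n_supported; j++)
            if (prefs[i] == supported[j])
                return prefs[i];
    return 0;
}
```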

> +
> +3.  (optional) Get minimum number of buffers required for OUTPUT queue
> +    via :c:func:`VIDIOC_G_CTRL`. This is useful if client intends to use
> +    more buffers than minimum required by hardware/format (see
> +    allocation).
> +
> +    a. Required fields:
> +
> +       i. id = ``V4L2_CID_MIN_BUFFERS_FOR_OUTPUT``
> +
> +    b. Return values: per spec.
> +
> +    c. Return fields:
> +
> +       i. value: required number of OUTPUT buffers for the currently set
> +          format;
> +
> +4.  Allocate source (bitstream) buffers via :c:func:`VIDIOC_REQBUFS` on OUTPUT
> +    queue.
> +
> +    a. Required fields:
> +
> +       i.   count = n, where n > 0.
> +
> +       ii.  type = OUTPUT
> +
> +       iii. memory = as per spec
> +
> +    b. Return values: Per spec.
> +
> +    c. Return fields:
> +
> +       i. count: adjusted to allocated number of buffers
> +
> +    d. The driver must adjust count to minimum of required number of
> +       source buffers for given format and count passed. The client
> +       must check this value after the ioctl returns to get the
> +       number of buffers allocated.
> +
> +    .. note::
> +
> +       Passing count = 1 is useful for letting the driver choose
> +       the minimum according to the selected format/hardware
> +       requirements.
> +
> +    .. note::
> +
> +       To allocate more than minimum number of buffers (for pipeline
> +       depth), use G_CTRL(``V4L2_CID_MIN_BUFFERS_FOR_OUTPUT)`` to
> +       get minimum number of buffers required by the driver/format,
> +       and pass the obtained value plus the number of additional
> +       buffers needed in count to :c:func:`VIDIOC_REQBUFS`.
> +
> +5.  Begin parsing the stream for stream metadata via :c:func:`VIDIOC_STREAMON` on
> +    OUTPUT queue. This step allows the driver to parse/decode
> +    initial stream metadata until enough information to allocate
> +    CAPTURE buffers is found. This is indicated by the driver by
> +    sending a ``V4L2_EVENT_SOURCE_CHANGE`` event, which the client
> +    must handle.
> +
> +    a. Required fields: as per spec.
> +
> +    b. Return values: as per spec.
> +
> +    .. note::
> +
> +       Calling :c:func:`VIDIOC_REQBUFS`, :c:func:`VIDIOC_STREAMON`
> +       or :c:func:`VIDIOC_G_FMT` on the CAPTURE queue at this time is not
> +       allowed and must return EINVAL.

I dislike EINVAL here. It is too generic. Also, the passed arguments can be
perfectly valid, you just aren't in the right state. EPERM might be better.
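A minimal sketch of the state check behind this note, using -EPERM as suggested here (the patch text says -EINVAL). The state names are hypothetical and follow the initialization sequence; formats whose resolution is not in the stream get it from S_FMT(OUTPUT), so they are never gated.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical decoder states, following the initialization sequence. */
enum dec_state {
    DEC_IDLE,          /* OUTPUT format set, not streaming yet */
    DEC_PARSING,       /* STREAMON(OUTPUT), waiting for stream metadata */
    DEC_CAPTURE_SETUP, /* source change event sent, CAPTURE configurable */
    DEC_DECODING,      /* both queues streaming */
};

/* Gate REQBUFS/STREAMON/G_FMT on CAPTURE: 0 if allowed, else -EPERM. */
int check_capture_op(enum dec_state s, bool res_in_stream)
{
    assert(s <= DEC_DECODING);
    if (!res_in_stream)
        return 0;
    return (s == DEC_IDLE || s == DEC_PARSING) ? -EPERM : 0;
}
```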

> +
> +6.  This step only applies for coded formats that contain resolution
> +    information in the stream.
> +    Continue queuing/dequeuing bitstream buffers to/from the
> +    OUTPUT queue via :c:func:`VIDIOC_QBUF` and :c:func:`VIDIOC_DQBUF`. The driver
> +    must keep processing and returning each buffer to the client
> +    until required metadata to send a ``V4L2_EVENT_SOURCE_CHANGE``
> +    for source change type ``V4L2_EVENT_SRC_CH_RESOLUTION`` is
> +    found.

This sentence is confusing. It's not clear what you mean here.

> +    There is no requirement to pass enough data for this to
> +    occur in the first buffer and the driver must be able to
> +    process any number

Missing period at the end of the sentence.
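To illustrate why the driver must cope with metadata spread over any number of OUTPUT buffers, here is a minimal sketch (not taken from any driver) of an H.264 Annex-B scanner that keeps its state between buffers and reports when a sequence parameter set (nal_unit_type 7) starts. The patch discourages software parsing in stateful drivers; the sketch only shows that a start code can straddle a buffer boundary.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Scanner state carried from one OUTPUT buffer to the next. */
struct annexb_scanner {
    unsigned zeros; /* consecutive 0x00 bytes seen so far */
    bool at_header; /* next byte is a NAL unit header */
};

/*
 * Feed one buffer; returns true once a NAL unit with nal_unit_type 7
 * starts. A 00 00 01 start code may be split across buffers, which is
 * why the state lives outside the call.
 */
bool annexb_find_sps(struct annexb_scanner *s, const uint8_t *buf, size_t len)
{
    assert(s != NULL && (buf != NULL || len == 0));
    for (size_t i = 0; i < len; i++) {
        uint8_t b = buf[i];

        if (s->at_header) {
            s->at_header = false;
            s->zeros = (b == 0) ? 1 : 0;
            if ((b & 0x1F) == 7)
                return true;
        } else if (b == 0) {
            s->zeros++;
        } else if (b == 1 && s->zeros >= 2) {
            s->at_header = true;
            s->zeros = 0;
        } else {
            s->zeros = 0;
        }
    }
    return false;
}
```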

> +
> +    a. Required fields: as per spec.
> +
> +    b. Return values: as per spec.
> +
> +    c. If data in a buffer that triggers the event is required to decode
> +       the first frame, the driver must not return it to the client,
> +       but must retain it for further decoding.
> +
> +    d. Until the resolution source event is sent to the client, calling
> +       :c:func:`VIDIOC_G_FMT` on the CAPTURE queue must return -EINVAL.

EPERM?

> +
> +    .. note::
> +
> +       No decoded frames are produced during this phase.
> +
> +7.  This step only applies for coded formats that contain resolution

applies to  (same elsewhere)

> +    information in the stream.
> +    Receive and handle ``V4L2_EVENT_SOURCE_CHANGE`` from the driver
> +    via :c:func:`VIDIOC_DQEVENT`. The driver must send this event once
> +    enough data is obtained from the stream to allocate CAPTURE
> +    buffers and to begin producing decoded frames.
> +
> +    a. Required fields:
> +
> +       i. type = ``V4L2_EVENT_SOURCE_CHANGE``
> +
> +    b. Return values: as per spec.
> +
> +    c. The driver must return u.src_change.changes =
> +       ``V4L2_EVENT_SRC_CH_RESOLUTION``.
> +
> +8.  This step only applies for coded formats that contain resolution
> +    information in the stream.
> +    Call :c:func:`VIDIOC_G_FMT` for CAPTURE queue to get format for the
> +    destination buffers parsed/decoded from the bitstream.
> +
> +    a. Required fields:
> +
> +       i. type = CAPTURE
> +
> +    b. Return values: as per spec.
> +
> +    c. Return fields:
> +
> +       i.   fmt.pix_mp.width, fmt.pix_mp.height: coded resolution
> +            for the decoded frames
> +
> +       ii.  fmt.pix_mp.pixelformat: default/required/preferred by
> +            driver pixelformat for decoded frames.
> +
> +       iii. num_planes: set to number of planes for pixelformat.
> +
> +       iv.  For each plane p = [0, num_planes-1]:
> +            plane_fmt[p].sizeimage, plane_fmt[p].bytesperline as
> +            per spec for coded resolution.
> +
> +    .. note::
> +
> +       The value of pixelformat may be any pixel format supported,
> +       and must
> +       be supported for current stream, based on the information
> +       parsed from the stream and hardware capabilities. It is
> +       suggested that driver chooses the preferred/optimal format
> +       for given configuration. For example, a YUV format may be
> +       preferred over an RGB format, if additional conversion step
> +       would be required.
> +
> +9.  (optional) Enumerate CAPTURE formats via :c:func:`VIDIOC_ENUM_FMT` on
> +    CAPTURE queue.
> +    Once the stream information is parsed and known, the client
> +    may use this ioctl to discover which raw formats are supported
> +    for given stream and select one of them via :c:func:`VIDIOC_S_FMT`.
> +
> +    a. Fields/return values as per spec.
> +
> +    .. note::
> +
> +       The driver must return only formats supported for the
> +       current stream parsed in this initialization sequence, even
> +       if more formats may be supported by the driver in general.
> +       For example, a driver/hardware may support YUV and RGB
> +       formats for resolutions 1920x1088 and lower, but only YUV for
> +       higher resolutions (e.g. due to memory bandwidth
> +       limitations). After parsing a resolution of 1920x1088 or
> +       lower, :c:func:`VIDIOC_ENUM_FMT` may return a set of YUV and RGB
> +       pixelformats, but after parsing resolution higher than
> +       1920x1088, the driver must not return (unsupported for this
> +       resolution) RGB.
> +
> +       However, subsequent resolution change event
> +       triggered after discovering a resolution change within the
> +       same stream may switch the stream into a lower resolution;
> +       :c:func:`VIDIOC_ENUM_FMT` must return RGB formats again in that case.
> +
> +10.  (optional) Choose a different CAPTURE format than suggested via
> +     :c:func:`VIDIOC_S_FMT` on CAPTURE queue. It is possible for the client
> +     to choose a different format than selected/suggested by the
> +     driver in :c:func:`VIDIOC_G_FMT`.
> +
> +     a. Required fields:
> +
> +        i.  type = CAPTURE
> +
> +        ii. fmt.pix_mp.pixelformat set to a coded format
> +
> +     b. Return values:
> +
> +        i. EINVAL: unsupported format.

Or replace it with a supported format. I'm inclined to do that instead of
returning EINVAL.
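Combining the ENUM_FMT note in step 9 (RGB only up to 1920x1088) with this suggestion, S_FMT(CAPTURE) could keep a usable request and otherwise substitute a supported format. A minimal sketch; the capability table is hypothetical, and the fourccs are only meant to correspond to NV12 and an RGB format.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Same packing as the v4l2_fourcc() macro in <linux/videodev2.h>. */
#define FOURCC(a, b, c, d) \
    ((uint32_t)(a) | ((uint32_t)(b) << 8) | ((uint32_t)(c) << 16) | \
     ((uint32_t)(d) << 24))

/* Hypothetical capability entry: a raw format and the largest frame
 * area, in pixels, for which the hardware can produce it. */
struct raw_caps {
    uint32_t fourcc;
    uint64_t max_pixels;
};

/* Would ENUM_FMT(CAPTURE) list this entry for a w x h stream? */
int raw_caps_ok(const struct raw_caps *c, uint32_t w, uint32_t h)
{
    return (uint64_t)w * h <= c->max_pixels;
}

/*
 * Keep the requested format if it is usable for the parsed stream,
 * otherwise substitute the first usable entry (caps[] is in the
 * driver's order of preference). Returns 0 if nothing fits.
 */
uint32_t fixup_pixelformat(const struct raw_caps *caps, size_t n,
                           uint32_t requested, uint32_t w, uint32_t h)
{
    uint32_t fallback = 0;

    assert(caps != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (!raw_caps_ok(&caps[i], w, h))
            continue;
        if (caps[i].fourcc == requested)
            return requested;
        if (fallback == 0)
            fallback = caps[i].fourcc;
    }
    return fallback;
}
```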

> +
> +     c. Calling :c:func:`VIDIOC_ENUM_FMT` to discover currently available formats
> +        after receiving ``V4L2_EVENT_SOURCE_CHANGE`` is useful to find
> +        out a set of allowed pixelformats for given configuration,
> +        but not required.
> +
> +11.  (optional) Acquire visible resolution via :c:func:`VIDIOC_G_SELECTION`.
> +
> +    a. Required fields:
> +
> +       i.  type = CAPTURE
> +
> +       ii. target = ``V4L2_SEL_TGT_CROP``

I don't think this is the right selection target to use, but I think others
commented on that already.

> +
> +    b. Return values: per spec.
> +
> +    c. Return fields
> +
> +       i. r.left, r.top, r.width, r.height: visible rectangle; this must
> +          fit within coded resolution returned from :c:func:`VIDIOC_G_FMT`.
> +
> +12. (optional) Get minimum number of buffers required for CAPTURE queue
> +    via :c:func:`VIDIOC_G_CTRL`. This is useful if client intends to use
> +    more buffers than minimum required by hardware/format (see
> +    allocation).
> +
> +    a. Required fields:
> +
> +       i. id = ``V4L2_CID_MIN_BUFFERS_FOR_CAPTURE``
> +
> +    b. Return values: per spec.
> +
> +    c. Return fields:
> +
> +       i. value: minimum number of buffers required to decode the stream
> +          parsed in this initialization sequence.
> +
> +    .. note::
> +
> +       Note that the minimum number of buffers must be at least the
> +       number required to successfully decode the current stream.
> +       This may for example be the required DPB size for an H.264

Is DPB in the glossary?

> +       stream given the parsed stream configuration (resolution,
> +       level).
> +
> +13. Allocate destination (raw format) buffers via :c:func:`VIDIOC_REQBUFS` on the
> +    CAPTURE queue.
> +
> +    a. Required fields:
> +
> +       i.   count = n, where n > 0.
> +
> +       ii.  type = CAPTURE
> +
> +       iii. memory = as per spec
> +
> +    b. Return values: Per spec.
> +
> +    c. Return fields:
> +
> +       i. count: adjusted to allocated number of buffers.
> +
> +    d. The driver must adjust count to at least the minimum number of
> +       destination buffers required for the given format and stream
> +       configuration. The client must check this value after the ioctl
> +       returns to get the number of buffers allocated.
> +
> +    .. note::
> +
> +       Passing count = 1 is useful for letting the driver choose
> +       the minimum.
> +
> +    .. note::
> +
> +       To allocate more than minimum number of buffers (for pipeline
> +       depth), use G_CTRL(``V4L2_CID_MIN_BUFFERS_FOR_CAPTURE``) to
> +       get minimum number of buffers required, and pass the obtained
> +       value plus the number of additional buffers needed in count
> +       to :c:func:`VIDIOC_REQBUFS`.
> +
> +14. Call :c:func:`VIDIOC_STREAMON` to initiate decoding frames.
> +
> +    a. Required fields: as per spec.
> +
> +    b. Return values: as per spec.
> +
> +Decoding
> +--------
> +
> +This state is reached after a successful initialization sequence. In
> +this state, the client queues and dequeues buffers to both queues via
> +:c:func:`VIDIOC_QBUF` and :c:func:`VIDIOC_DQBUF`, as per spec.
> +
> +Both queues operate independently. The client may queue and dequeue
> +buffers to queues in any order and at any rate, also at a rate different
> +for each queue. The client may queue buffers within the same queue in
> +any order (V4L2 index-wise). It is recommended for the client to operate
> +the queues independently for best performance.
> +
> +Source OUTPUT buffers must contain:
> +
> +-  H.264/AVC: one or more complete NALUs of an Annex B elementary
> +   stream; one buffer does not have to contain enough data to decode
> +   a frame;
> +
> +-  VP8/VP9: one or more complete frames.
> +
> +No direct relationship between source and destination buffers and the
> +timing of buffers becoming available to dequeue should be assumed in the
> +Stream API. Specifically:
> +
> +-  a buffer queued to OUTPUT queue may result in no buffers being
> +   produced on the CAPTURE queue (e.g. if it does not contain
> +   encoded data, or if only metadata syntax structures are present
> +   in it), or one or more buffers produced on the CAPTURE queue (if
> +   the encoded data contained more than one frame, or if returning a
> +   decoded frame allowed the driver to return a frame that preceded
> +   it in decode, but succeeded it in display order)
> +
> +-  a buffer queued to OUTPUT may result in a buffer being produced on
> +   the CAPTURE queue later into decode process, and/or after
> +   processing further OUTPUT buffers, or be returned out of order,
> +   e.g. if display reordering is used
> +
> +-  buffers may become available on the CAPTURE queue without additional
> +   buffers queued to OUTPUT (e.g. during flush or EOS)
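
Because the two queues run independently, a client typically multiplexes them with poll(). A minimal sketch, assuming the usual mem2mem convention that POLLIN signals a dequeueable CAPTURE buffer, POLLOUT a dequeueable OUTPUT buffer and POLLPRI a pending event. The enum and helper names are hypothetical. Nothing here pairs OUTPUT and CAPTURE buffers, consistent with the list above.

```c
#include <poll.h>

/* Hypothetical action bits derived from poll() on a decoder file descriptor. */
enum dec_action {
    DEC_DQ_CAPTURE = 1 << 0, /* decoded frame ready: VIDIOC_DQBUF on CAPTURE */
    DEC_DQ_OUTPUT  = 1 << 1, /* bitstream buffer done: VIDIOC_DQBUF on OUTPUT */
    DEC_DQ_EVENT   = 1 << 2, /* event pending: VIDIOC_DQEVENT */
    DEC_ERROR      = 1 << 3, /* POLLERR/POLLNVAL reported */
};

static unsigned int decoder_actions(short revents)
{
    unsigned int actions = 0;

    if (revents & POLLIN)
        actions |= DEC_DQ_CAPTURE;
    if (revents & POLLOUT)
        actions |= DEC_DQ_OUTPUT;
    if (revents & POLLPRI)
        actions |= DEC_DQ_EVENT;
    if (revents & (POLLERR | POLLNVAL))
        actions |= DEC_ERROR;
    return actions;
}

/* Wait up to timeout_ms; returns poll()'s result and the derived actions. */
static int wait_decoder(int fd, int timeout_ms, unsigned int *actions)
{
    struct pollfd pfd = { .fd = fd, .events = POLLIN | POLLOUT | POLLPRI };
    int ret = poll(&pfd, 1, timeout_ms);

    *actions = ret > 0 ? decoder_actions(pfd.revents) : 0;
    return ret;
}
```
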
> +
> +Seek
> +----
> +
> +Seek is controlled by the OUTPUT queue, as it is the source of bitstream
> +data. CAPTURE queue remains unchanged/unaffected.
> +
> +1. Stop the OUTPUT queue to begin the seek sequence via
> +   :c:func:`VIDIOC_STREAMOFF`.
> +
> +   a. Required fields:
> +
> +      i. type = OUTPUT
> +
> +   b. The driver must drop all the pending OUTPUT buffers and they are
> +      treated as returned to the client (as per spec).
> +
> +2. Restart the OUTPUT queue via :c:func:`VIDIOC_STREAMON`
> +
> +   a. Required fields:
> +
> +      i. type = OUTPUT
> +
> +   b. After this call, the driver must be in the post-seek state and
> +      ready to accept new source bitstream buffers.
> +
> +3. Start queuing buffers to OUTPUT queue containing stream data after
> +   the seek until a suitable resume point is found.
> +
> +   .. note::
> +
> +      There is no requirement to begin queuing stream
> +      starting exactly from a resume point (e.g. SPS or a keyframe).

SPS, keyframe: are they in the glossary?

> +      The driver must handle any data queued and must keep processing
> +      the queued buffers until it finds a suitable resume point.
> +      While looking for a resume point, the driver processes OUTPUT
> +      buffers and returns them to the client without producing any
> +      decoded frames.
> +
> +4. After a resume point is found, the driver will start returning
> +   CAPTURE buffers with decoded frames.
> +
> +   .. note::
> +
> +      There is no precise specification for CAPTURE queue of when it
> +      will start producing buffers containing decoded data from
> +      buffers queued after the seek, as it operates independently
> +      from OUTPUT queue.
> +
> +      -  The driver may return a number of remaining CAPTURE buffers
> +         containing decoded frames from before the seek after the seek
> +         sequence (STREAMOFF-STREAMON) is performed.
> +
> +      -  The driver may also drop some or all frames for buffers that
> +         were queued, but not yet decoded, before the seek sequence was
> +         initiated. E.g. for an OUTPUT queue sequence: QBUF(A), QBUF(B),
> +         STREAMOFF(OUT), STREAMON(OUT), QBUF(G), QBUF(H), any of the
> +         following results on the CAPTURE queue is allowed: {A’, B’, G’,
> +         H’}, {A’, G’, H’}, {G’, H’}.
> +
> +Pause
> +-----
> +
> +In order to pause, the client should just cease queuing buffers onto the
> +OUTPUT queue. This is different from the general V4L2 API definition of
> +pause, which involves calling :c:func:`VIDIOC_STREAMOFF` on the queue. Without
> +source bitstream data, there is not data to process and the hardware

not data -> no data

> +remains idle. Conversely, using :c:func:`VIDIOC_STREAMOFF` on OUTPUT queue
> +indicates a seek, which 1) drops all buffers in flight and 2) after a
> +subsequent :c:func:`VIDIOC_STREAMON` will look for and only continue from a
> +resume point. This is usually undesirable for pause. The
> +STREAMOFF-STREAMON sequence is intended for seeking.
> +
> +Similarly, the CAPTURE queue should remain streaming as well, as the
> +STREAMOFF-STREAMON sequence on it is intended solely for changing buffer
> +sets.
> +
> +Dynamic resolution change
> +-------------------------
> +
> +When the driver encounters a resolution change in the stream, the dynamic
> +resolution change sequence is started.
> +
> +1.  On encountering a resolution change in the stream. The driver must

. The -> , the

> +    first process and decode all remaining buffers from before the
> +    resolution change point.
> +
> +2.  After all buffers containing decoded frames from before the
> +    resolution change point are ready to be dequeued on the
> +    CAPTURE queue, the driver sends a ``V4L2_EVENT_SOURCE_CHANGE``
> +    event for source change type ``V4L2_EVENT_SRC_CH_RESOLUTION``.
> +    The last buffer from before the change must be marked with
> +    :c:type:`v4l2_buffer` ``flags`` flag ``V4L2_BUF_FLAG_LAST`` as in the flush
> +    sequence.
> +
> +    .. note::
> +
> +       Any attempts to dequeue more buffers beyond the buffer marked
> +       with ``V4L2_BUF_FLAG_LAST`` will result in a -EPIPE error from
> +       :c:func:`VIDIOC_DQBUF`.
> +
> +3.  After dequeuing all remaining buffers from the CAPTURE queue, the
> +    client must call :c:func:`VIDIOC_STREAMOFF` on the CAPTURE queue. The
> +    OUTPUT queue remains streaming (calling STREAMOFF on it would
> +    trigger a seek).
> +    Until STREAMOFF is called on the CAPTURE queue (acknowledging
> +    the event), the driver operates as if the resolution hasn’t
> +    changed yet, i.e. :c:func:`VIDIOC_G_FMT`, etc. return previous
> +    resolution.
> +
> +4.  The client frees the buffers on the CAPTURE queue using
> +    :c:func:`VIDIOC_REQBUFS`.
> +
> +    a. Required fields:
> +
> +       i.   count = 0
> +
> +       ii.  type = CAPTURE
> +
> +       iii. memory = as per spec
> +
> +5.  The client calls :c:func:`VIDIOC_G_FMT` for CAPTURE to get the new format
> +    information.
> +    This is identical to calling :c:func:`VIDIOC_G_FMT` after
> +    ``V4L2_EVENT_SRC_CH_RESOLUTION`` in the initialization
> +    sequence and should be handled similarly.
> +
> +    .. note::
> +
> +       It is allowed for the driver not to support the same
> +       pixelformat as previously used (before the resolution change)
> +       for the new resolution. The driver must select a default
> +       supported pixelformat and return it from :c:func:`VIDIOC_G_FMT`, and
> +       the client must take note of it.
> +
> +6.  (optional) The client is allowed to enumerate available formats and
> +    select a different one than currently chosen (returned via
> +    :c:func:`VIDIOC_G_FMT`). This is identical to a corresponding step in
> +    the initialization sequence.
> +
> +7.  (optional) The client acquires visible resolution as in
> +    initialization sequence.
> +
> +8.  (optional) The client acquires minimum number of buffers as in
> +    initialization sequence.
> +
> +9.  The client allocates a new set of buffers for the CAPTURE queue via
> +    :c:func:`VIDIOC_REQBUFS`. This is identical to a corresponding step in
> +    the initialization sequence.
> +
> +10. The client resumes decoding by issuing :c:func:`VIDIOC_STREAMON` on the
> +    CAPTURE queue.
> +
> +During the resolution change sequence, the OUTPUT queue must remain
> +streaming. Calling :c:func:`VIDIOC_STREAMOFF` on OUTPUT queue will initiate a seek.
> +
> +The OUTPUT queue operates separately from the CAPTURE queue for the
> +duration of the entire resolution change sequence. It is allowed (and
> +recommended for best performance and simplcity) for the client to keep

simplcity -> simplicity

> +queuing/dequeuing buffers from/to OUTPUT queue even while processing
> +this sequence.
> +
> +.. note::
> +
> +   It is also possible for this sequence to be triggered without
> +   change in resolution if a different number of CAPTURE buffers is
> +   required in order to continue decoding the stream.
> +
> +Flush
> +-----
> +
> +Flush is the process of draining the CAPTURE queue of any remaining
> +buffers. After the flush sequence is complete, the client has received
> +all decoded frames for all OUTPUT buffers queued before the sequence was
> +started.
> +
> +1. Begin flush by issuing :c:func:`VIDIOC_DECODER_CMD`.
> +
> +   a. Required fields:
> +
> +      i. cmd = ``V4L2_DEC_CMD_STOP``

Drivers should set the V4L2_DEC_CMD_STOP_IMMEDIATELY flag since I doubt any
m2m driver supports stopping at a specific pts.

They should also support VIDIOC_TRY_DECODER_CMD!

You can probably make default implementations in v4l2-mem2mem.c since the only
thing that I expect is supported is the STOP command with the STOP_IMMEDIATELY
flag set.

> +
> +2. The driver must process and decode as normal all OUTPUT buffers
> +   queued by the client before the :c:func:`VIDIOC_DECODER_CMD` was
> +   issued.
> +   Any operations triggered as a result of processing these
> +   buffers (including the initialization and resolution change
> +   sequences) must be processed as normal by both the driver and
> +   the client before proceeding with the flush sequence.
> +
> +3. Once all OUTPUT buffers queued before ``V4L2_DEC_CMD_STOP`` are
> +   processed:
> +
> +   a. If the CAPTURE queue is streaming, once all decoded frames (if
> +      any) are ready to be dequeued on the CAPTURE queue, the
> +      driver must send a ``V4L2_EVENT_EOS``. The driver must also
> +      set ``V4L2_BUF_FLAG_LAST`` in :c:type:`v4l2_buffer` ``flags`` field on the
> +      buffer on the CAPTURE queue containing the last frame (if
> +      any) produced as a result of processing the OUTPUT buffers
> +      queued before ``V4L2_DEC_CMD_STOP``. If no more frames are
> +      left to be returned at the point of handling
> +      ``V4L2_DEC_CMD_STOP``, the driver must return an empty buffer
> +      (with :c:type:`v4l2_buffer` ``bytesused`` = 0) as the last buffer with
> +      ``V4L2_BUF_FLAG_LAST`` set instead.
> +      Any attempts to dequeue more buffers beyond the buffer
> +      marked with ``V4L2_BUF_FLAG_LAST`` will result in a -EPIPE
> +      error from :c:func:`VIDIOC_DQBUF`.
> +
> +   b. If the CAPTURE queue is NOT streaming, no action is necessary for
> +      CAPTURE queue and the driver must send a ``V4L2_EVENT_EOS``
> +      immediately after all OUTPUT buffers in question have been
> +      processed.
> +
> +4. To resume, the client may issue ``V4L2_DEC_CMD_START``.
> +
> +End of stream
> +-------------
> +
> +When an explicit end of stream is encountered by the driver in the
> +stream, it must send a ``V4L2_EVENT_EOS`` to the client after all frames
> +are decoded and ready to be dequeued on the CAPTURE queue, and must set
> +``V4L2_BUF_FLAG_LAST`` in the :c:type:`v4l2_buffer` ``flags`` of the last
> +CAPTURE buffer. This behavior is identical to the flush sequence as if
> +triggered by the client via ``V4L2_DEC_CMD_STOP``.
> +
> +Commit points
> +-------------
> +
> +Setting formats and allocating buffers triggers changes in the behavior
> +of the driver.
> +
> +1. Setting format on OUTPUT queue may change the set of formats
> +   supported/advertised on the CAPTURE queue. If the format currently
> +   selected on the CAPTURE queue is not supported for the newly
> +   selected OUTPUT format, the driver must also change it to a
> +   supported one.
> +
> +2. Enumerating formats on CAPTURE queue must only return CAPTURE formats
> +   supported for the OUTPUT format currently set.
> +
> +3. Setting/changing format on CAPTURE queue does not change formats
> +   available on OUTPUT queue.

True.

> +   An attempt to set CAPTURE format that
> +   is not supported for the currently selected OUTPUT format must
> +   result in an error (-EINVAL) from :c:func:`VIDIOC_S_FMT`.

I'm not sure about that. I believe it is valid to replace it with the
first supported pixelformat. TRY_FMT certainly should do that.

> +
> +4. Enumerating formats on OUTPUT queue always returns a full set of
> +   supported formats, irrespective of the current format selected on
> +   CAPTURE queue.
> +
> +5. After allocating buffers on the OUTPUT queue, it is not possible to
> +   change format on it.
> +
> +To summarize, setting formats and allocation must always start with the
> +OUTPUT queue and the OUTPUT queue is the master that governs the set of
> +supported formats for the CAPTURE queue.
> diff --git a/Documentation/media/uapi/v4l/v4l2.rst b/Documentation/media/uapi/v4l/v4l2.rst
> index b89e5621ae69..563d5b861d1c 100644
> --- a/Documentation/media/uapi/v4l/v4l2.rst
> +++ b/Documentation/media/uapi/v4l/v4l2.rst
> @@ -53,6 +53,10 @@ Authors, in alphabetical order:
>  
>    - Original author of the V4L2 API and documentation.
>  
> +- Figa, Tomasz <tfiga@chromium.org>
> +
> +  - Documented parts of the V4L2 (stateful) Codec Interface. Migrated from Google Docs to kernel documentation.
> +
>  - H Schimek, Michael <mschimek@gmx.at>
>  
>    - Original author of the V4L2 API and documentation.
> @@ -65,6 +69,10 @@ Authors, in alphabetical order:
>  
>    - Designed and documented the multi-planar API.
>  
> +- Osciak, Pawel <posciak@chromium.org>
> +
> +  - Documented the V4L2 (stateful) Codec Interface.
> +
>  - Palosaari, Antti <crope@iki.fi>
>  
>    - SDR API.
> @@ -85,7 +93,7 @@ Authors, in alphabetical order:
>  
>    - Designed and documented the VIDIOC_LOG_STATUS ioctl, the extended control ioctls, major parts of the sliced VBI API, the MPEG encoder and decoder APIs and the DV Timings API.
>  
> -**Copyright** |copy| 1999-2016: Bill Dirks, Michael H. Schimek, Hans Verkuil, Martin Rubli, Andy Walls, Muralidharan Karicheri, Mauro Carvalho Chehab, Pawel Osciak, Sakari Ailus & Antti Palosaari.
> +**Copyright** |copy| 1999-2018: Bill Dirks, Michael H. Schimek, Hans Verkuil, Martin Rubli, Andy Walls, Muralidharan Karicheri, Mauro Carvalho Chehab, Pawel Osciak, Sakari Ailus, Antti Palosaari & Tomasz Figa.
>  
>  Except when explicitly stated as GPL, programming examples within this
>  part can be used and distributed without restrictions.
> @@ -94,6 +102,10 @@ part can be used and distributed without restrictions.
>  Revision History
>  ****************
>  
> +:revision: TBD / TBD (*tf*)
> +
> +Add specification of V4L2 Codec Interface UAPI.
> +
>  :revision: 4.10 / 2016-07-15 (*rr*)
>  
>  Introduce HSV formats.
> 

Regards,

	Hans
Philipp Zabel June 7, 2018, 11:01 a.m. UTC | #12
On Thu, 2018-06-07 at 10:47 +0200, Hans Verkuil wrote:
> Hi Tomasz,
> 
> First of all: thank you very much for working on this. It's a big missing piece of
> information, so filling this in is very helpful.
[...]
> > diff --git a/Documentation/media/uapi/v4l/dev-codec.rst b/Documentation/media/uapi/v4l/dev-codec.rst
> > index c61e938bd8dc..0483b10c205e 100644
> > --- a/Documentation/media/uapi/v4l/dev-codec.rst
> > +++ b/Documentation/media/uapi/v4l/dev-codec.rst
> > @@ -34,3 +34,774 @@ the codec and reprogram it whenever another file handler gets access.
> >  This is different from the usual video node behavior where the video
> >  properties are global to the device (i.e. changing something through one
> >  file handle is visible through another file handle).
> 
> To what extent does the information in this patch series apply specifically to
> video (de)compression hardware and to what extent it is applicable for any m2m
> device? It looks like most if not all is specific to video (de)compression hw
> and not to e.g. a simple deinterlacer.

Most of this is specific to codecs, or at least to mem2mem devices that
are not simple 1 input frame -> 1 output frame converters, and where
some parameters (resolution, colorspace) can be unknown in advance.

> Ideally there would be a common section first describing the requirements for
> all m2m devices, followed by an encoder and decoder section going into details
> for those specific devices.
> 
> I also think that we need an additional paragraph somewhere at the beginning
> of the Codec Interface chapter that explains more clearly that OUTPUT buffers
> send data to the hardware to be processed and that CAPTURE buffers contain
> the processed data. It is always confusing for newcomers to understand that
> in V4L2 this is seen from the point of view of the CPU.

Yes, please!

> > +EOS
> > +   end of stream
> > +
> > +input height
> > +   height in pixels for given input resolution
> 
> 'input' is a confusing name, because I think this refers to the resolution
> set for the OUTPUT buffer. How about renaming this to 'source'?
>
> I.e.: an OUTPUT buffer contains the source data for the hardware. The capture
> buffer contains the sink data from the hardware.

This could be confusing as well to people used to the v4l2_subdev
sink/source pad model.

[...]
> > +Initialization sequence
> > +-----------------------
> > +
> > +1. (optional) Enumerate supported OUTPUT formats and resolutions. See
> > +   capability enumeration.
> > +
> > +2. Set a coded format on the source queue via :c:func:`VIDIOC_S_FMT`
> > +
> > +   a. Required fields:
> > +
> > +      i.   type = OUTPUT
> > +
> > +      ii.  fmt.pix_mp.pixelformat set to a coded format
> > +
> > +      iii. fmt.pix_mp.width, fmt.pix_mp.height only if they cannot be
> > +           parsed from the stream for the given coded format;
> > +           ignored otherwise;
> > +
> > +   b. Return values:
> > +
> > +      i.  EINVAL: unsupported format.
> > +
> > +      ii. Others: per spec
> > +
> > +   .. note::
> > +
> > +      The driver must not adjust pixelformat, so if
> > +      ``V4L2_PIX_FMT_H264`` is passed but only
> > +      ``V4L2_PIX_FMT_H264_SLICE`` is supported, S_FMT will return
> > +      -EINVAL. If both are acceptable by client, calling S_FMT for
> > +      the other after one gets rejected may be required (or use
> > +      :c:func:`VIDIOC_ENUM_FMT` to discover beforehand, see Capability
> > +      enumeration).
> 
> This needs to be documented in S_FMT as well.
> 
> What will TRY_FMT do? Return EINVAL as well, or replace the pixelformat?
> 
> Should this be a general rule for output devices that S_FMT (and perhaps TRY_FMT)
> fail with EINVAL if the pixelformat is not supported? There is something to be
> said for that.

Why is trying to set an unsupported pixelformat an error, but trying to
set an unsupported resolution is not?

[...]
> > +5.  Begin parsing the stream for stream metadata via :c:func:`VIDIOC_STREAMON` on
> > +    OUTPUT queue. This step allows the driver to parse/decode
> > +    initial stream metadata until enough information to allocate
> > +    CAPTURE buffers is found. This is indicated by the driver by
> > +    sending a ``V4L2_EVENT_SOURCE_CHANGE`` event, which the client
> > +    must handle.
> > +
> > +    a. Required fields: as per spec.
> > +
> > +    b. Return values: as per spec.
> > +
> > +    .. note::
> > +
> > +       Calling :c:func:`VIDIOC_REQBUFS`, :c:func:`VIDIOC_STREAMON`
> > +       or :c:func:`VIDIOC_G_FMT` on the CAPTURE queue at this time is not
> > +       allowed and must return EINVAL.
> 
> I dislike EINVAL here. It is too generic. Also, the passed arguments can be
> perfectly valid, you just aren't in the right state. EPERM might be better.
[...]
> > +
> > +    a. Required fields: as per spec.
> > +
> > +    b. Return values: as per spec.
> > +
> > +    c. If data in a buffer that triggers the event is required to decode
> > +       the first frame, the driver must not return it to the client,
> > +       but must retain it for further decoding.
> > +
> > +    d. Until the resolution source event is sent to the client, calling
> > +       :c:func:`VIDIOC_G_FMT` on the CAPTURE queue must return -EINVAL.
> 
> EPERM?

I dislike returning an error here at all. If the queues are independent,
why can't the capture queue STREAMON and just not produce any frames?

If we guessed format and frame size correctly (maybe we know something
about the stream from container metadata), the capture queue can just
start producing images. If not, we get a source change event.

regards
Philipp
Nicolas Dufresne June 7, 2018, 5:30 p.m. UTC | #13
Thanks Tomasz for this work.

The following is my first-read review; please ignore my comments if
they have already been mentioned by others or discussed. I'll catch up
on the appropriate threads later on.

Le mardi 05 juin 2018 à 19:33 +0900, Tomasz Figa a écrit :
> Due to complexity of the video decoding process, the V4L2 drivers of
> stateful decoder hardware require specific sequences of V4L2 API calls
> to be followed. These include capability enumeration, initialization,
> decoding, seek, pause, dynamic resolution change, flush and end of
> stream.
> 
> Specifics of the above have been discussed during Media Workshops at
> LinuxCon Europe 2012 in Barcelona and then later Embedded Linux
> Conference Europe 2014 in Düsseldorf. The de facto Codec API that
> originated at those events was later implemented by the drivers we already
> have merged in mainline, such as s5p-mfc or mtk-vcodec.
> 
> The only thing missing was the real specification included as a part of
> Linux Media documentation. Fix it now and document the decoder part of
> the Codec API.
> 
> Signed-off-by: Tomasz Figa <tfiga@chromium.org>
> ---
>  Documentation/media/uapi/v4l/dev-codec.rst | 771 +++++++++++++++++++++
>  Documentation/media/uapi/v4l/v4l2.rst      |  14 +-
>  2 files changed, 784 insertions(+), 1 deletion(-)
> 
> diff --git a/Documentation/media/uapi/v4l/dev-codec.rst b/Documentation/media/uapi/v4l/dev-codec.rst
> index c61e938bd8dc..0483b10c205e 100644
> --- a/Documentation/media/uapi/v4l/dev-codec.rst
> +++ b/Documentation/media/uapi/v4l/dev-codec.rst
> @@ -34,3 +34,774 @@ the codec and reprogram it whenever another file handler gets access.
>  This is different from the usual video node behavior where the video
>  properties are global to the device (i.e. changing something through one
>  file handle is visible through another file handle).
> +
> +This interface is generally appropriate for hardware that does not
> +require additional software involvement to parse/partially decode/manage
> +the stream before/after processing in hardware.
> +
> +Input data to the Stream API are buffers containing unprocessed video
> +stream (Annex-B H264/H265 stream, raw VP8/9 stream) only. The driver is

We should probably use HEVC instead of H265, as this is the name we
have picked for that format.

> +expected not to require any additional information from the client to
> +process these buffers, and to return decoded frames on the CAPTURE queue
> +in display order.

It might confuse some users, given that the first buffer for
non-bytestream formats is special and must contain only the headers
(VP8/9 and H264_NO_SC, which is also known as H264 AVC, the format used
in ISOMP4). Also, these formats must be framed by userspace, as it's not
possible to divide the frames/NALs later on. I would suggest being a bit
less strict in the introduction here.

> +
> +Performing software parsing, processing etc. of the stream in the driver
> +in order to support the Stream API is strongly discouraged. In such a case,
> +use of the Stateless Codec Interface (in development) is preferred.
> +
> +Conventions and notation used in this document
> +==============================================
> +
> +1. The general V4L2 API rules apply if not specified in this document
> +   otherwise.
> +
> +2. The meaning of words “must”, “may”, “should”, etc. is as per RFC
> +   2119.
> +
> +3. All steps not marked “optional” are required.
> +
> +4. :c:func:`VIDIOC_G_EXT_CTRLS`, :c:func:`VIDIOC_S_EXT_CTRLS` may be used interchangeably with
> +   :c:func:`VIDIOC_G_CTRL`, :c:func:`VIDIOC_S_CTRL`, unless specified otherwise.
> +
> +5. Single-plane API (see spec) and applicable structures may be used
> +   interchangeably with Multi-plane API, unless specified otherwise.
> +
> +6. i = [a..b]: sequence of integers from a to b, inclusive, i.e. i =
> +   [0..2]: i = 0, 1, 2.
> +
> +7. For OUTPUT buffer A, A’ represents a buffer on the CAPTURE queue
> +   containing data (decoded or encoded frame/stream) that resulted
> +   from processing buffer A.
> +
> +Glossary
> +========
> +
> +CAPTURE
> +   the destination buffer queue, decoded frames for
> +   decoders, encoded bitstream for encoders;
> +   ``V4L2_BUF_TYPE_VIDEO_CAPTURE`` or
> +   ``V4L2_BUF_TYPE_VIDEO_CAPTURE_MPLANE``
> +
> +client
> +   application client communicating with the driver
> +   implementing this API
> +
> +coded format
> +   encoded/compressed video bitstream format (e.g.
> +   H.264, VP8, etc.); see raw format; this is not equivalent to fourcc
> +   (V4L2 pixelformat), as each coded format may be supported by multiple
> +   fourccs (e.g. ``V4L2_PIX_FMT_H264``, ``V4L2_PIX_FMT_H264_SLICE``, etc.)
> +
> +coded height
> +   height for given coded resolution
> +
> +coded resolution
> +   stream resolution in pixels aligned to codec
> +   format and hardware requirements; see also visible resolution
> +
> +coded width
> +   width for given coded resolution
> +
> +decode order
> +   the order in which frames are decoded; may differ
> +   from display (output) order if frame reordering (B frames) is active in
> +   the stream; OUTPUT buffers must be queued in decode order; for frame
> +   API, CAPTURE buffers must be returned by the driver in decode order;
> +
> +display order
> +   the order in which frames must be displayed
> +   (outputted); for stream API, CAPTURE buffers must be returned by the
> +   driver in display order;
> +
> +EOS
> +   end of stream
> +
> +input height
> +   height in pixels for given input resolution
> +
> +input resolution
> +   resolution in pixels of source frames being input
> +   to the encoder and subject to further cropping to the bounds of visible
> +   resolution
> +
> +input width
> +   width in pixels for given input resolution
> +
> +OUTPUT
> +   the source buffer queue, encoded bitstream for
> +   decoders, raw frames for encoders; ``V4L2_BUF_TYPE_VIDEO_OUTPUT`` or
> +   ``V4L2_BUF_TYPE_VIDEO_OUTPUT_MPLANE``
> +
> +raw format
> +   uncompressed format containing raw pixel data (e.g.
> +   YUV, RGB formats)
> +
> +resume point
> +   a point in the bitstream from which decoding may
> +   start/continue, without any previous state/data present, e.g.: a
> +   keyframe (VPX) or SPS/PPS/IDR sequence (H.264); a resume point is
> +   required to start decode of a new stream, or to resume decoding after a
> +   seek;

I would prefer synchronisation point, but resume point is also good.
The description makes it obvious, thanks.

> +
> +source buffer
> +   buffers allocated for source queue
> +
> +source queue
> +   queue containing buffers used for source data, i.e. the OUTPUT queue
> +
> +visible height
> +   height for given visible resolution

I do believe 'display width/height/resolution' is more common.

> +
> +visible resolution
> +   stream resolution of the visible picture, in
> +   pixels, to be used for display purposes; must be smaller than or equal
> +   to the coded resolution;
> +
> +visible width
> +   width for given visible resolution
> +
> +Decoder
> +=======
> +
> +Querying capabilities
> +---------------------
> +
> +1. To enumerate the set of coded formats supported by the driver, the
> +   client uses :c:func:`VIDIOC_ENUM_FMT` for OUTPUT. The driver must always
> +   return the full set of supported formats, irrespective of the
> +   format set on the CAPTURE queue.
> +
> +2. To enumerate the set of supported raw formats, the client uses
> +   :c:func:`VIDIOC_ENUM_FMT` for CAPTURE. The driver must return only the
> +   formats supported for the format currently set on the OUTPUT
> +   queue.
> +   In order to enumerate raw formats supported by a given coded
> +   format, the client must first set that coded format on the
> +   OUTPUT queue and then enumerate the CAPTURE queue.

As of today, GStreamer expects an initial state, before the first
S_FMT(OUTPUT), in which all possible formats are returned regardless.
Later on, after S_FMT(OUTPUT) and the header buffers have been passed, a
new enumeration is done, and it is expected to return a subset (or the
same list). If a better output format than the one chosen by the driver
is found, it will be tried; if it is not supported, GStreamer simply
keeps the driver-selected output format. This way, drivers don't need to
do extra work if their output format is completely fixed by the
input/headers. The only upstream driver that has this flexibility is
CODA. To be fair, in GStreamer we don't need to know about the output
format; it's simply exposed to fail earlier if users try to connect to
elements that are incompatible by nature. We could just remove that
initial probing and it would still work as expected. I think probing all
the output formats is not that good an idea; with profiles and levels it
all becomes very complex.

> +
> +3. The client may use :c:func:`VIDIOC_ENUM_FRAMESIZES` to detect supported
> +   resolutions for a given format, passing its fourcc in
> +   :c:type:`v4l2_frmsizeenum` ``pixel_format``.

Good thing this is a "may", since it's all very complex and not that
useful with the levels and profiles. Userspace can really figure it out
if needed.

> +
> +   a. Values returned from :c:func:`VIDIOC_ENUM_FRAMESIZES` for coded formats
> +      must be maximums for given coded format for all supported raw
> +      formats.
> +
> +   b. Values returned from :c:func:`VIDIOC_ENUM_FRAMESIZES` for raw formats must
> +      be maximums for given raw format for all supported coded
> +      formats.
> +
> +   c. The client should derive the supported resolution for a
> +      combination of coded+raw format by calculating the
> +      intersection of resolutions returned from calls to
> +      :c:func:`VIDIOC_ENUM_FRAMESIZES` for the given coded and raw formats.
> +
> +4. Supported profiles and levels for given format, if applicable, may be
> +   queried using their respective controls via :c:func:`VIDIOC_QUERYCTRL`.
> +
> +5. The client may use :c:func:`VIDIOC_ENUM_FRAMEINTERVALS` to enumerate maximum
> +   supported framerates by the driver/hardware for a given
> +   format+resolution combination.

I think we'll need to add a section to help with that one. All drivers
support ranges in fps. Venus has this bug where it sets a range with a
step of 1/1, but because we expose frame intervals instead of frame
rates, the result is not as expected. If you want an interval between
1 and 60 fps, that would be from 1/60s to 1/1s; there is no valid step
that can be used, so you are forced to use CONTINUOUS or DISCRETE.

> +
> +Initialization sequence
> +-----------------------
> +
> +1. (optional) Enumerate supported OUTPUT formats and resolutions. See
> +   capability enumeration.
> +
> +2. Set a coded format on the source queue via :c:func:`VIDIOC_S_FMT`
> +
> +   a. Required fields:
> +
> +      i.   type = OUTPUT

In the introduction, maybe we could say that we use OUTPUT and CAPTURE
to mean both variants (with and without MPLANE)?

> +
> +      ii.  fmt.pix_mp.pixelformat set to a coded format
> +
> +      iii. fmt.pix_mp.width, fmt.pix_mp.height only if they cannot be
> +           parsed from the stream for the given coded format;
> +           ignored otherwise;

GStreamer passes the display size, as the display size found in the
bitstream may not match the display size selected by the container
(e.g. ISOMP4/Matroska). I'm not sure what drivers end up doing; it was
not really thought through, and we later query the selection to know the
display size. We could follow this new rule by not passing anything and
then simply picking the smaller of the bitstream display size and the
container display size. I'm just giving a reference for what existing
userspace may be doing at the moment, as we'll have to care about not
breaking existing software when implementing this.

> +
> +   b. Return values:
> +
> +      i.  EINVAL: unsupported format.
> +
> +      ii. Others: per spec
> +
> +   .. note::
> +
> +      The driver must not adjust pixelformat, so if
> +      ``V4L2_PIX_FMT_H264`` is passed but only
> +      ``V4L2_PIX_FMT_H264_SLICE`` is supported, S_FMT will return
> +      -EINVAL. If both are acceptable to the client, calling S_FMT for
> +      the other after one gets rejected may be required (or use
> +      :c:func:`VIDIOC_ENUM_FMT` to discover beforehand, see Capability
> +      enumeration).

OK, that's new; in GStreamer we validate that the format hasn't been
changed. It should be backward compatible, though. What we don't do is
check the OUTPUT format again after setting the CAPTURE format; that
would seem totally invalid. You mention later on that this isn't
allowed, so that's great.

> +
> +3.  (optional) Get minimum number of buffers required for OUTPUT queue
> +    via :c:func:`VIDIOC_G_CTRL`. This is useful if the client intends to use
> +    more buffers than the minimum required by hardware/format (see
> +    allocation).

I have never seen such a restriction on a decoder, though it's optional
here, so probably fine.

> +
> +    a. Required fields:
> +
> +       i. id = ``V4L2_CID_MIN_BUFFERS_FOR_OUTPUT``
> +
> +    b. Return values: per spec.
> +
> +    c. Return fields:
> +
> +       i. value: required number of OUTPUT buffers for the currently set
> +          format;
> +
> +4.  Allocate source (bitstream) buffers via :c:func:`VIDIOC_REQBUFS` on OUTPUT
> +    queue.
> +
> +    a. Required fields:
> +
> +       i.   count = n, where n > 0.
> +
> +       ii.  type = OUTPUT
> +
> +       iii. memory = as per spec
> +
> +    b. Return values: Per spec.
> +
> +    c. Return fields:
> +
> +       i. count: adjusted to allocated number of buffers
> +
> +    d. The driver must adjust count to the larger of the minimum number of
> +       source buffers required for the given format and the count passed.
> +       The client must check this value after the ioctl returns to get the
> +       number of buffers allocated.
> +
> +    .. note::
> +
> +       Passing count = 1 is useful for letting the driver choose
> +       the minimum according to the selected format/hardware
> +       requirements.

This raises a question: should V4L2_CID_MIN_BUFFERS_FOR_OUTPUT really
be the minimum, or min+1, since REQBUFS is likely to allocate min+1 to
be efficient? Allocating just the minimum means that the decoder will
always be idle while userspace is handling an output buffer.

> +
> +    .. note::
> +
> +       To allocate more than the minimum number of buffers (for pipeline
> +       depth), use G_CTRL(``V4L2_CID_MIN_BUFFERS_FOR_OUTPUT``) to
> +       get the minimum number of buffers required by the driver/format,
> +       and pass the obtained value plus the number of additional
> +       buffers needed in count to :c:func:`VIDIOC_REQBUFS`.
> +
> +5.  Begin parsing the stream for stream metadata via :c:func:`VIDIOC_STREAMON` on
> +    OUTPUT queue. This step allows the driver to parse/decode
> +    initial stream metadata until enough information to allocate
> +    CAPTURE buffers is found. This is indicated by the driver by
> +    sending a ``V4L2_EVENT_SOURCE_CHANGE`` event, which the client
> +    must handle.

GStreamer still uses the legacy path, expecting G_FMT to block if there
are headers in the queue. Do we want to document this legacy method or
not?

> +
> +    a. Required fields: as per spec.
> +
> +    b. Return values: as per spec.
> +
> +    .. note::
> +
> +       Calling :c:func:`VIDIOC_REQBUFS`, :c:func:`VIDIOC_STREAMON`
> +       or :c:func:`VIDIOC_G_FMT` on the CAPTURE queue at this time is not
> +       allowed and must return EINVAL.
> +
> +6.  This step only applies to coded formats that contain resolution
> +    information in the stream.
> +    Continue queuing/dequeuing bitstream buffers to/from the
> +    OUTPUT queue via :c:func:`VIDIOC_QBUF` and :c:func:`VIDIOC_DQBUF`. The driver
> +    must keep processing and returning each buffer to the client
> +    until the metadata required to send a ``V4L2_EVENT_SOURCE_CHANGE``
> +    for source change type ``V4L2_EVENT_SRC_CH_RESOLUTION`` is
> +    found. There is no requirement to pass enough data for this to
> +    occur in the first buffer and the driver must be able to
> +    process any number of buffers.
> +
> +    a. Required fields: as per spec.
> +
> +    b. Return values: as per spec.
> +
> +    c. If data in a buffer that triggers the event is required to decode
> +       the first frame, the driver must not return it to the client,
> +       but must retain it for further decoding.
> +
> +    d. Until the resolution source event is sent to the client, calling
> +       :c:func:`VIDIOC_G_FMT` on the CAPTURE queue must return -EINVAL.
> +
> +    .. note::
> +
> +       No decoded frames are produced during this phase.
> +
> +7.  This step only applies to coded formats that contain resolution
> +    information in the stream.
> +    Receive and handle ``V4L2_EVENT_SOURCE_CHANGE`` from the driver
> +    via :c:func:`VIDIOC_DQEVENT`. The driver must send this event once
> +    enough data is obtained from the stream to allocate CAPTURE
> +    buffers and to begin producing decoded frames.
> +
> +    a. Required fields:
> +
> +       i. type = ``V4L2_EVENT_SOURCE_CHANGE``
> +
> +    b. Return values: as per spec.
> +
> +    c. The driver must return u.src_change.changes =
> +       ``V4L2_EVENT_SRC_CH_RESOLUTION``.
> +
> +8.  This step only applies to coded formats that contain resolution
> +    information in the stream.
> +    Call :c:func:`VIDIOC_G_FMT` for CAPTURE queue to get format for the
> +    destination buffers parsed/decoded from the bitstream.
> +
> +    a. Required fields:
> +
> +       i. type = CAPTURE
> +
> +    b. Return values: as per spec.
> +
> +    c. Return fields:
> +
> +       i.   fmt.pix_mp.width, fmt.pix_mp.height: coded resolution
> +            for the decoded frames
> +
> +       ii.  fmt.pix_mp.pixelformat: default/required/preferred by
> +            driver pixelformat for decoded frames.
> +
> +       iii. num_planes: set to number of planes for pixelformat.
> +
> +       iv.  For each plane p = [0, num_planes-1]:
> +            plane_fmt[p].sizeimage, plane_fmt[p].bytesperline as
> +            per spec for coded resolution.
> +
> +    .. note::
> +
> +       The value of pixelformat may be any supported pixel format,
> +       and it must
> +       be supported for the current stream, based on the information
> +       parsed from the stream and hardware capabilities. It is
> +       suggested that the driver chooses the preferred/optimal format
> +       for the given configuration. For example, a YUV format may be
> +       preferred over an RGB format if an additional conversion step
> +       would be required.
> +
> +9.  (optional) Enumerate CAPTURE formats via :c:func:`VIDIOC_ENUM_FMT` on
> +    CAPTURE queue.
> +    Once the stream information is parsed and known, the client
> +    may use this ioctl to discover which raw formats are supported
> +    for the given stream and select one of them via :c:func:`VIDIOC_S_FMT`.
> +
> +    a. Fields/return values as per spec.
> +
> +    .. note::
> +
> +       The driver must return only formats supported for the
> +       current stream parsed in this initialization sequence, even
> +       if more formats may be supported by the driver in general.
> +       For example, a driver/hardware may support YUV and RGB
> +       formats for resolutions 1920x1088 and lower, but only YUV for
> +       higher resolutions (e.g. due to memory bandwidth
> +       limitations). After parsing a resolution of 1920x1088 or
> +       lower, :c:func:`VIDIOC_ENUM_FMT` may return a set of YUV and RGB
> +       pixelformats, but after parsing resolution higher than
> +       1920x1088, the driver must not return (unsupported for this
> +       resolution) RGB.
> +
> +       However, a subsequent resolution change event
> +       triggered after discovering a resolution change within the
> +       same stream may switch the stream into a lower resolution;
> +       :c:func:`VIDIOC_ENUM_FMT` must return RGB formats again in that case.
> +
> +10.  (optional) Choose a different CAPTURE format than suggested via
> +     :c:func:`VIDIOC_S_FMT` on CAPTURE queue. It is possible for the client
> +     to choose a different format than selected/suggested by the
> +     driver in :c:func:`VIDIOC_G_FMT`.
> +
> +     a. Required fields:
> +
> +        i.  type = CAPTURE
> +
> +        ii. fmt.pix_mp.pixelformat set to a raw format
> +
> +     b. Return values:
> +
> +        i. EINVAL: unsupported format.
> +
> +     c. Calling :c:func:`VIDIOC_ENUM_FMT` to discover currently available formats
> +        after receiving ``V4L2_EVENT_SOURCE_CHANGE`` is useful to find
> +        out a set of allowed pixelformats for given configuration,
> +        but not required.
> +
> +11.  (optional) Acquire visible resolution via :c:func:`VIDIOC_G_SELECTION`.
> +
> +    a. Required fields:
> +
> +       i.  type = CAPTURE
> +
> +       ii. target = ``V4L2_SEL_TGT_CROP``
> +
> +    b. Return values: per spec.
> +
> +    c. Return fields:
> +
> +       i. r.left, r.top, r.width, r.height: visible rectangle; this must
> +          fit within coded resolution returned from :c:func:`VIDIOC_G_FMT`.
> +
> +12. (optional) Get minimum number of buffers required for CAPTURE queue
> +    via :c:func:`VIDIOC_G_CTRL`. This is useful if the client intends to use
> +    more buffers than the minimum required by hardware/format (see
> +    allocation).

Should not be optional if the driver has this restriction.

> +
> +    a. Required fields:
> +
> +       i. id = ``V4L2_CID_MIN_BUFFERS_FOR_CAPTURE``
> +
> +    b. Return values: per spec.
> +
> +    c. Return fields:
> +
> +       i. value: minimum number of buffers required to decode the stream
> +          parsed in this initialization sequence.
> +
> +    .. note::
> +
> +       Note that the minimum number of buffers must be at least the
> +       number required to successfully decode the current stream.
> +       This may for example be the required DPB size for an H.264
> +       stream given the parsed stream configuration (resolution,
> +       level).
> +
> +13. Allocate destination (raw format) buffers via :c:func:`VIDIOC_REQBUFS` on the
> +    CAPTURE queue.
> +
> +    a. Required fields:
> +
> +       i.   count = n, where n > 0.
> +
> +       ii.  type = CAPTURE
> +
> +       iii. memory = as per spec
> +
> +    b. Return values: Per spec.
> +
> +    c. Return fields:
> +
> +       i. count: adjusted to allocated number of buffers.
> +
> +    d. The driver must adjust count to the larger of the minimum number of
> +       destination buffers required for the given format and stream
> +       configuration and the count passed. The client must check this
> +       value after the ioctl returns to get the number of buffers allocated.
> +
> +    .. note::
> +
> +       Passing count = 1 is useful for letting the driver choose
> +       the minimum.
> +
> +    .. note::
> +
> +       To allocate more than the minimum number of buffers (for pipeline
> +       depth), use G_CTRL(``V4L2_CID_MIN_BUFFERS_FOR_CAPTURE``) to
> +       get the minimum number of buffers required, and pass the obtained
> +       value plus the number of additional buffers needed in count
> +       to :c:func:`VIDIOC_REQBUFS`.
> +
> +14. Call :c:func:`VIDIOC_STREAMON` to initiate decoding frames.
> +
> +    a. Required fields: as per spec.
> +
> +    b. Return values: as per spec.
> +
> +Decoding
> +--------
> +
> +This state is reached after a successful initialization sequence. In
> +this state, the client queues and dequeues buffers to both queues via
> +:c:func:`VIDIOC_QBUF` and :c:func:`VIDIOC_DQBUF`, as per spec.
> +
> +Both queues operate independently. The client may queue and dequeue
> +buffers to queues in any order and at any rate, also at a rate different
> +for each queue. The client may queue buffers within the same queue in
> +any order (V4L2 index-wise). It is recommended for the client to operate
> +the queues independently for best performance.
> +
> +Source OUTPUT buffers must contain:
> +
> +-  H.264/AVC: one or more complete NALUs of an Annex B elementary
> +   stream; one buffer does not have to contain enough data to decode
> +   a frame;
> +
> +-  VP8/VP9: one or more complete frames.
> +
> +No direct relationship between source and destination buffers and the
> +timing of buffers becoming available to dequeue should be assumed in the
> +Stream API. Specifically:
> +
> +-  a buffer queued to OUTPUT queue may result in no buffers being
> +   produced on the CAPTURE queue (e.g. if it does not contain
> +   encoded data, or if only metadata syntax structures are present
> +   in it), or one or more buffers produced on the CAPTURE queue (if
> +   the encoded data contained more than one frame, or if returning a
> +   decoded frame allowed the driver to return a frame that preceded
> +   it in decode, but succeeded it in display order)
> +
> +-  a buffer queued to OUTPUT may result in a buffer being produced on
> +   the CAPTURE queue later into decode process, and/or after
> +   processing further OUTPUT buffers, or be returned out of order,
> +   e.g. if display reordering is used
> +
> +-  buffers may become available on the CAPTURE queue without additional
> +   buffers queued to OUTPUT (e.g. during flush or EOS)

There is no mention of timestamp passing and
V4L2_BUF_FLAG_TIMESTAMP_COPY. These are, though, rather important,
respectively to match decoded frames with the appropriate metadata and
to discard stored metadata from the userspace queue.

Unlike what is suggested here, most decoders are frame-based; it would
be nice to check whether this is an actual firmware limitation in
certain cases.

> +
> +Seek
> +----
> +
> +Seek is controlled by the OUTPUT queue, as it is the source of bitstream
> +data. CAPTURE queue remains unchanged/unaffected.
> +
> +1. Stop the OUTPUT queue to begin the seek sequence via
> +   :c:func:`VIDIOC_STREAMOFF`.
> +
> +   a. Required fields:
> +
> +      i. type = OUTPUT
> +
> +   b. The driver must drop all the pending OUTPUT buffers and they are
> +      treated as returned to the client (as per spec).
> +
> +2. Restart the OUTPUT queue via :c:func:`VIDIOC_STREAMON`
> +
> +   a. Required fields:
> +
> +      i. type = OUTPUT
> +
> +   b. The driver must be put in a state after seek and be ready to
> +      accept new source bitstream buffers.
> +
> +3. Start queuing buffers to OUTPUT queue containing stream data after
> +   the seek until a suitable resume point is found.
> +
> +   .. note::
> +
> +      There is no requirement to begin queuing stream
> +      starting exactly from a resume point (e.g. SPS or a keyframe).
> +      The driver must handle any data queued and must keep processing
> +      the queued buffers until it finds a suitable resume point.
> +      While looking for a resume point, the driver processes OUTPUT
> +      buffers and returns them to the client without producing any
> +      decoded frames.

I have some doubts that this actually works. What you describe here is
a flush/reset sequence. The drivers I have worked with totally forget
about their state after STREAMOFF on either queue. The initialization
process needs to happen again. Though, adding this support with just
resetting the OUTPUT queue should be backward compatible. GStreamer
always calls STREAMOFF on both sides.

> +
> +4. After a resume point is found, the driver will start returning
> +   CAPTURE buffers with decoded frames.
> +
> +   .. note::
> +
> +      There is no precise specification for CAPTURE queue of when it
> +      will start producing buffers containing decoded data from
> +      buffers queued after the seek, as it operates independently
> +      from OUTPUT queue.

Also, in practice it is totally unreliable to start from a random point.
Some decoders will produce corrupted frames, some will wait; you never
know. Seek code in FFmpeg, GStreamer, VLC, etc. always picks a good
sync point, then marks the extra frames as "decode only" (hence the need
for matching input/output for metadata) and drops them.
> +
> +      -  The driver is allowed to and may return a number of remaining CAPTURE
> +         buffers containing decoded frames from before the seek after the
> +         seek sequence (STREAMOFF-STREAMON) is performed.

This is not a proper seek. That's probably why we also STREAMOFF the
CAPTURE queue, to get rid of these ancient buffers. This seems only
useful if you are trying to do seamless seeking (aka non-flushing
seek), which is a very niche use case.

> +
> +      -  The driver is also allowed not to return all decoded frames for
> +         buffers queued, but not yet decoded, before the seek sequence was initiated.
> +         E.g. for an OUTPUT queue sequence: QBUF(A), QBUF(B),
> +         STREAMOFF(OUT), STREAMON(OUT), QBUF(G), QBUF(H), any of the
> +         following results on the CAPTURE queue is allowed: {A’, B’, G’,
> +         H’}, {A’, G’, H’}, {G’, H’}.
> +
> +Pause
> +-----
> +
> +In order to pause, the client should just cease queuing buffers onto the
> +OUTPUT queue. This is different from the general V4L2 API definition of
> +pause, which involves calling :c:func:`VIDIOC_STREAMOFF` on the queue. Without
> +source bitstream data, there is no data to process and the hardware
> +remains idle. Conversely, using :c:func:`VIDIOC_STREAMOFF` on OUTPUT queue
> +indicates a seek, which 1) drops all buffers in flight and 2) after a
> +subsequent :c:func:`VIDIOC_STREAMON` will look for and only continue from a
> +resume point. This is usually undesirable for pause. The
> +STREAMOFF-STREAMON sequence is intended for seeking.
> +
> +Similarly, the CAPTURE queue should remain streaming as well, as the
> +STREAMOFF-STREAMON sequence on it is intended solely for changing buffer
> +sets.
> +
> +Dynamic resolution change
> +-------------------------
> +
> +When the driver encounters a resolution change in the stream, the dynamic
> +resolution change sequence is started.
> +
> +1.  On encountering a resolution change in the stream, the driver must
> +    first process and decode all remaining buffers from before the
> +    resolution change point.
> +
> +2.  After all buffers containing decoded frames from before the
> +    resolution change point are ready to be dequeued on the
> +    CAPTURE queue, the driver sends a ``V4L2_EVENT_SOURCE_CHANGE``
> +    event for source change type ``V4L2_EVENT_SRC_CH_RESOLUTION``.
> +    The last buffer from before the change must be marked with
> +    :c:type:`v4l2_buffer` ``flags`` flag ``V4L2_BUF_FLAG_LAST`` as in the flush
> +    sequence.
> +
> +    .. note::
> +
> +       Any attempts to dequeue more buffers beyond the buffer marked
> +       with ``V4L2_BUF_FLAG_LAST`` will result in a -EPIPE error from
> +       :c:func:`VIDIOC_DQBUF`.
> +
> +3.  After dequeuing all remaining buffers from the CAPTURE queue, the
> +    client must call :c:func:`VIDIOC_STREAMOFF` on the CAPTURE queue. The
> +    OUTPUT queue remains streaming (calling STREAMOFF on it would
> +    trigger a seek).
> +    Until STREAMOFF is called on the CAPTURE queue (acknowledging
> +    the event), the driver operates as if the resolution hasn’t
> +    changed yet, i.e. :c:func:`VIDIOC_G_FMT`, etc. return previous
> +    resolution.

It's a bit more complicated: if the resolution gets bigger, the encoded
buffer size needed to fit a full frame may be bigger too. Reallocation
of the OUTPUT queue may be needed. On some memory-constrained devices,
we'll also want to reallocate if it gets smaller. FFmpeg implements
something clever for selecting the size; the CODA driver does the same,
but in the driver. It's a bit of a mess.

In the long term, for gapless changes (especially with CMA) we might
want to support using larger buffers in the CAPTURE queue to avoid
reallocation. OMX supports this.

> +
> +4.  The client frees the buffers on the CAPTURE queue using
> +    :c:func:`VIDIOC_REQBUFS`.
> +
> +    a. Required fields:
> +
> +       i.   count = 0
> +
> +       ii.  type = CAPTURE
> +
> +       iii. memory = as per spec
> +
> +5.  The client calls :c:func:`VIDIOC_G_FMT` for CAPTURE to get the new format
> +    information.
> +    This is identical to calling :c:func:`VIDIOC_G_FMT` after
> +    ``V4L2_EVENT_SRC_CH_RESOLUTION`` in the initialization
> +    sequence and should be handled similarly.
> +
> +    .. note::
> +
> +       It is allowed for the driver not to support the same
> +       pixelformat as previously used (before the resolution change)
> +       for the new resolution. The driver must select a default
> +       supported pixelformat and return it from :c:func:`VIDIOC_G_FMT`, and
> +       the client must take note of it.
> +
> +6.  (optional) The client is allowed to enumerate available formats and
> +    select a different one than currently chosen (returned via
> +    :c:func:`VIDIOC_G_FMT`). This is identical to a corresponding step in
> +    the initialization sequence.
> +
> +7.  (optional) The client acquires visible resolution as in
> +    initialization sequence.
> +
> +8.  (optional) The client acquires minimum number of buffers as in
> +    initialization sequence.
> +
> +9.  The client allocates a new set of buffers for the CAPTURE queue via
> +    :c:func:`VIDIOC_REQBUFS`. This is identical to a corresponding step in
> +    the initialization sequence.
> +
> +10. The client resumes decoding by issuing :c:func:`VIDIOC_STREAMON` on the
> +    CAPTURE queue.
> +
> +During the resolution change sequence, the OUTPUT queue must remain
> +streaming. Calling :c:func:`VIDIOC_STREAMOFF` on OUTPUT queue will initiate seek.
> +
> +The OUTPUT queue operates separately from the CAPTURE queue for the
> +duration of the entire resolution change sequence. It is allowed (and
> +recommended for best performance and simplicity) for the client to keep
> +queuing/dequeuing buffers to/from the OUTPUT queue even while processing
> +this sequence.
> +
> +.. note::
> +
> +   It is also possible for this sequence to be triggered without
> +   change in resolution if a different number of CAPTURE buffers is
> +   required in order to continue decoding the stream.

Perhaps the driver should be queried for the new display resolution
through G_SELECTION ?

> +
> +Flush
> +-----
> +
> +Flush is the process of draining the CAPTURE queue of any remaining

OK, call this Drain if it's the process of draining; it's really
confusing, as GStreamer makes a distinction between flush (getting rid
of data, like a reset) and draining (which involves displaying the
remaining frames, but stopping the production of new data).

> +buffers. After the flush sequence is complete, the client has received
> +all decoded frames for all OUTPUT buffers queued before the sequence was
> +started.
> +
> +1. Begin flush by issuing :c:func:`VIDIOC_DECODER_CMD`.
> +
> +   a. Required fields:
> +
> +      i. cmd = ``V4L2_DEC_CMD_STOP``
> +
> +2. The driver must process and decode as normal all OUTPUT buffers
> +   queued by the client before the :c:func:`VIDIOC_DECODER_CMD` was
> +   issued.
> +   Any operations triggered as a result of processing these
> +   buffers (including the initialization and resolution change
> +   sequences) must be processed as normal by both the driver and
> +   the client before proceeding with the flush sequence.
> +
> +3. Once all OUTPUT buffers queued before ``V4L2_DEC_CMD_STOP`` are
> +   processed:
> +
> +   a. If the CAPTURE queue is streaming, once all decoded frames (if
> +      any) are ready to be dequeued on the CAPTURE queue, the
> +      driver must send a ``V4L2_EVENT_EOS``. The driver must also

I have never used the EOS event, and I bet many drivers don't implement
it; why is that a must?

> +      set ``V4L2_BUF_FLAG_LAST`` in :c:type:`v4l2_buffer` ``flags`` field on the

In MFC we don't know, so we use the other method, EPIPE; there will be
no FLAG_LAST on MFC, it's just not possible with that firmware. So
FLAG_LAST is preferred, and EPIPE is the fallback.

> +      buffer on the CAPTURE queue containing the last frame (if
> +      any) produced as a result of processing the OUTPUT buffers
> +      queued before ``V4L2_DEC_CMD_STOP``. If no more frames are
> +      left to be returned at the point of handling
> +      ``V4L2_DEC_CMD_STOP``, the driver must return an empty buffer
> +      (with :c:type:`v4l2_buffer` ``bytesused`` = 0) as the last buffer with
> +      ``V4L2_BUF_FLAG_LAST`` set instead.
> +      Any attempts to dequeue more buffers beyond the buffer
> +      marked with ``V4L2_BUF_FLAG_LAST`` will result in a -EPIPE
> +      error from :c:func:`VIDIOC_DQBUF`.
> +
> +   b. If the CAPTURE queue is NOT streaming, no action is necessary for
> +      CAPTURE queue and the driver must send a ``V4L2_EVENT_EOS``
> +      immediately after all OUTPUT buffers in question have been
> +      processed.
> +
> +4. To resume, the client may issue ``V4L2_DEC_CMD_START``.
> +
> +End of stream
> +-------------
> +
> +When an explicit end of stream is encountered by the driver in the
> +stream, it must send a ``V4L2_EVENT_EOS`` to the client after all frames
> +are decoded and ready to be dequeued on the CAPTURE queue, with the
> +:c:type:`v4l2_buffer` ``flags`` set to ``V4L2_BUF_FLAG_LAST``. This behavior is
> +identical to the flush sequence as if triggered by the client via
> +``V4L2_DEC_CMD_STOP``.

I have never heard of such a thing as an implicit EOS; can you elaborate?

> +
> +Commit points
> +-------------
> +
> +Setting formats and allocating buffers triggers changes in the behavior
> +of the driver.
> +
> +1. Setting format on OUTPUT queue may change the set of formats
> +   supported/advertised on the CAPTURE queue. If the format currently
> +   selected on the CAPTURE queue is not supported by the newly selected
> +   OUTPUT format, the driver must also change it to a supported one.
> +
> +2. Enumerating formats on CAPTURE queue must only return CAPTURE formats
> +   supported for the OUTPUT format currently set.
> +
> +3. Setting/changing format on CAPTURE queue does not change formats
> +   available on OUTPUT queue. An attempt to set CAPTURE format that
> +   is not supported for the currently selected OUTPUT format must
> +   result in an error (-EINVAL) from :c:func:`VIDIOC_S_FMT`.

That's great clarification !

> +
> +4. Enumerating formats on OUTPUT queue always returns a full set of
> +   supported formats, irrespective of the current format selected on
> +   CAPTURE queue.
> +
> +5. After allocating buffers on the OUTPUT queue, it is not possible to
> +   change format on it.
> +
> +To summarize, setting formats and allocation must always start with the
> +OUTPUT queue and the OUTPUT queue is the master that governs the set of
> +supported formats for the CAPTURE queue.
> diff --git a/Documentation/media/uapi/v4l/v4l2.rst b/Documentation/media/uapi/v4l/v4l2.rst
> index b89e5621ae69..563d5b861d1c 100644
> --- a/Documentation/media/uapi/v4l/v4l2.rst
> +++ b/Documentation/media/uapi/v4l/v4l2.rst
> @@ -53,6 +53,10 @@ Authors, in alphabetical order:
>  
>    - Original author of the V4L2 API and documentation.
>  
> +- Figa, Tomasz <tfiga@chromium.org>
> +
> +  - Documented parts of the V4L2 (stateful) Codec Interface. Migrated from Google Docs to kernel documentation.
> +
>  - H Schimek, Michael <mschimek@gmx.at>
>  
>    - Original author of the V4L2 API and documentation.
> @@ -65,6 +69,10 @@ Authors, in alphabetical order:
>  
>    - Designed and documented the multi-planar API.
>  
> +- Osciak, Pawel <posciak@chromium.org>
> +
> +  - Documented the V4L2 (stateful) Codec Interface.
> +
>  - Palosaari, Antti <crope@iki.fi>
>  
>    - SDR API.
> @@ -85,7 +93,7 @@ Authors, in alphabetical order:
>  
>    - Designed and documented the VIDIOC_LOG_STATUS ioctl, the extended control ioctls, major parts of the sliced VBI API, the MPEG encoder and decoder APIs and the DV Timings API.
>  
> -**Copyright** |copy| 1999-2016: Bill Dirks, Michael H. Schimek, Hans Verkuil, Martin Rubli, Andy Walls, Muralidharan Karicheri, Mauro Carvalho Chehab, Pawel Osciak, Sakari Ailus & Antti Palosaari.
> +**Copyright** |copy| 1999-2018: Bill Dirks, Michael H. Schimek, Hans Verkuil, Martin Rubli, Andy Walls, Muralidharan Karicheri, Mauro Carvalho Chehab, Pawel Osciak, Sakari Ailus & Antti Palosaari, Tomasz Figa.
>  
>  Except when explicitly stated as GPL, programming examples within this
>  part can be used and distributed without restrictions.
> @@ -94,6 +102,10 @@ part can be used and distributed without restrictions.
>  Revision History
>  ****************
>  
> +:revision: TBD / TBD (*tf*)
> +
> +Add specification of V4L2 Codec Interface UAPI.
> +
>  :revision: 4.10 / 2016-07-15 (*rr*)
>  
>  Introduce HSV formats.
Nicolas Dufresne June 7, 2018, 5:49 p.m. UTC | #14
On Thursday, June 7, 2018 at 16:27 +0900, Tomasz Figa wrote:
> > > I'd say no, but I guess that would mean that the driver never
> > > encounters it, because hardware wouldn't report it.
> > > 
> > > I wonder what would happen in such case, though. Obviously decoding of such
> > > stream couldn't continue without support in the driver.
> > 
> > GStreamer supports decoding of variable resolution streams without
> > driver support by just stopping and restarting streaming completely.
> 
> What about userspace that doesn't parse the stream on its own? Do we
> want to impose the requirement of full bitstream parsing even for
> hardware that can just do it itself?

We do it this way in GStreamer because we can, and it is more reliable
with existing drivers. I do think that driver-driven renegotiation is
superior, as it allows a lot more optimization. A full reset is just the
slowest possible method of renegotiating. It is not visually fantastic
with dynamic streams, like DASH and HLS. Though, we should think of a
way for drivers to signal that this renegotiation is supported.

Nicolas
Nicolas Dufresne June 7, 2018, 5:53 p.m. UTC | #15
Le jeudi 07 juin 2018 à 16:30 +0900, Tomasz Figa a écrit :
> > > v4l2-compliance (so probably one for Hans).
> > > testUnlimitedOpens tries opening the device 100 times. On a normal
> > > device this isn't a significant overhead, but when you're allocating
> > > resources on a per instance basis it quickly adds up.
> > > Internally I have state that has a limit of 64 codec instances (either
> > > encode or decode), so either I allocate at start_streaming and fail on
> > > the 65th one, or I fail on open. I generally take the view that
> > > failing early is a good thing.
> > > Opinions? Is 100 instances of an M2M device really sensible?
> > 
> > Resources should not be allocated by the driver until needed (i.e. the
> > queue_setup op is a good place for that).
> > 
> > It is perfectly legal to open a video node just to call QUERYCAP to
> > see what it is, and I don't expect that to allocate any hardware resources.
> > And if I want to open it 100 times, then that should just work.
> > 
> > It is *always* wrong to limit the number of open arbitrarily.
> 
> That's a valid point indeed. Besides the querying use case, userspace
> might just want to pre-open a bigger number of instances, but it
> doesn't mean that they would be streaming all at the same time indeed.

We have used the open() failure in GStreamer to be able to fall back to
software when the instances are exhausted. The advantage was that it fails
really early, so falling back is easy. If you remove this, it might not fail
before STREAMON. At least in GStreamer, that is too late to fall back to
software. So I don't have a better idea than limiting on open() calls.

Nicolas
Hans Verkuil June 7, 2018, 7:36 p.m. UTC | #16
On 06/07/2018 07:53 PM, Nicolas Dufresne wrote:
> Le jeudi 07 juin 2018 à 16:30 +0900, Tomasz Figa a écrit :
>>>> v4l2-compliance (so probably one for Hans).
>>>> testUnlimitedOpens tries opening the device 100 times. On a normal
>>>> device this isn't a significant overhead, but when you're allocating
>>>> resources on a per instance basis it quickly adds up.
>>>> Internally I have state that has a limit of 64 codec instances (either
>>>> encode or decode), so either I allocate at start_streaming and fail on
>>>> the 65th one, or I fail on open. I generally take the view that
>>>> failing early is a good thing.
>>>> Opinions? Is 100 instances of an M2M device really sensible?
>>>
>>> Resources should not be allocated by the driver until needed (i.e. the
>>> queue_setup op is a good place for that).
>>>
>>> It is perfectly legal to open a video node just to call QUERYCAP to
>>> see what it is, and I don't expect that to allocate any hardware resources.
>>> And if I want to open it 100 times, then that should just work.
>>>
>>> It is *always* wrong to limit the number of open arbitrarily.
>>
>> That's a valid point indeed. Besides the querying use case, userspace
>> might just want to pre-open a bigger number of instances, but it
>> doesn't mean that they would be streaming all at the same time indeed.
> 
> We have used the open() failure in GStreamer to be able to fall back to
> software when the instances are exhausted. The advantage was that it fails
> really early, so falling back is easy. If you remove this, it might not fail
> before STREAMON. At least in GStreamer, that is too late to fall back to
> software. So I don't have a better idea than limiting on open() calls.

It should fail when you call REQBUFS. That's the point at which you commit
to allocating resources. Everything before that is just querying things.

STREAMON is way too late, but REQBUFS/CREATE_BUFS (i.e. when queue_setup
is called) is a good point. You already allocate memory there, you can
also claim the m2m hw resource(s) you need.

Regards,

	Hans
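A minimal sketch of the driver-side approach Hans describes, written as plain C rather than against the real vb2/v4l2-mem2mem API. The names `hw_pool`, `codec_inst`, `inst_open()`, `inst_claim_on_reqbufs()` and `inst_release()` are hypothetical, and returning -EBUSY when the contexts run out is an assumption (the thread does not settle on an error code). The 64-context limit and the 100 opens mirror the numbers quoted above.

```c
#include <errno.h>
#include <stdbool.h>

/*
 * Hypothetical model of a codec with a fixed number of hardware contexts.
 * open() never claims one; the claim happens at buffer allocation time
 * (what the queue_setup op would do in a vb2-based driver), so handles
 * that only call QUERYCAP, or sit idle, cost nothing.
 */
struct hw_pool {
    int max_contexts;
    int in_use;
};

struct codec_inst {
    struct hw_pool *pool;
    bool has_context;
};

/* Opening always succeeds: no hardware is reserved yet. */
int inst_open(struct codec_inst *inst, struct hw_pool *pool)
{
    inst->pool = pool;
    inst->has_context = false;
    return 0;
}

/* Called on the first REQBUFS/CREATE_BUFS; -EBUSY (assumed) when all contexts are taken. */
int inst_claim_on_reqbufs(struct codec_inst *inst)
{
    if (inst->has_context)
        return 0;
    if (inst->pool->in_use >= inst->pool->max_contexts)
        return -EBUSY;
    inst->pool->in_use++;
    inst->has_context = true;
    return 0;
}

/* Closing the handle (or, by assumption, REQBUFS with count 0) returns the context. */
void inst_release(struct codec_inst *inst)
{
    if (inst->has_context) {
        inst->pool->in_use--;
        inst->has_context = false;
    }
}
```

With this split, a test that opens the node 100 times succeeds, while a client such as GStreamer still gets a well-defined failure at REQBUFS time, before STREAMON, from which it can fall back to software.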
Tomasz Figa June 8, 2018, 9:03 a.m. UTC | #17
Hi Hans,

On Thu, Jun 7, 2018 at 5:48 PM Hans Verkuil <hverkuil@xs4all.nl> wrote:
>
> Hi Tomasz,
>
> First of all: thank you very much for working on this. It's a big missing piece of
> information, so filling this in is very helpful.

Thanks for review!

>
> On 06/05/2018 12:33 PM, Tomasz Figa wrote:
> > Due to complexity of the video decoding process, the V4L2 drivers of
> > stateful decoder hardware require specific sequences of V4L2 API calls
> > to be followed. These include capability enumeration, initialization,
> > decoding, seek, pause, dynamic resolution change, flush and end of
> > stream.
> >
> > Specifics of the above have been discussed during Media Workshops at
> > LinuxCon Europe 2012 in Barcelona and then later Embedded Linux
> > Conference Europe 2014 in Düsseldorf. The de facto Codec API that
> > originated at those events was later implemented by the drivers we already
> > have merged in mainline, such as s5p-mfc or mtk-vcodec.
> >
> > The only thing missing was the real specification included as a part of
> > Linux Media documentation. Fix it now and document the decoder part of
> > the Codec API.
> >
> > Signed-off-by: Tomasz Figa <tfiga@chromium.org>
> > ---
> >  Documentation/media/uapi/v4l/dev-codec.rst | 771 +++++++++++++++++++++
> >  Documentation/media/uapi/v4l/v4l2.rst      |  14 +-
> >  2 files changed, 784 insertions(+), 1 deletion(-)
> >
> > diff --git a/Documentation/media/uapi/v4l/dev-codec.rst b/Documentation/media/uapi/v4l/dev-codec.rst
> > index c61e938bd8dc..0483b10c205e 100644
> > --- a/Documentation/media/uapi/v4l/dev-codec.rst
> > +++ b/Documentation/media/uapi/v4l/dev-codec.rst
> > @@ -34,3 +34,774 @@ the codec and reprogram it whenever another file handler gets access.
> >  This is different from the usual video node behavior where the video
> >  properties are global to the device (i.e. changing something through one
> >  file handle is visible through another file handle).
>
> To what extent does the information in this patch series apply specifically to
> video (de)compression hardware and to what extent it is applicable for any m2m
> device? It looks like most if not all is specific to video (de)compression hw
> and not to e.g. a simple deinterlacer.

It is specifically written for stateful codecs, i.e. those that can
work on bitstream directly.

>
> Ideally there would be a common section first describing the requirements for
> all m2m devices, followed by an encoder and decoder section going into details
> for those specific devices.

I wonder if there is much we can say in general about "all m2m devices".
The simple m2m devices (scalers, deinterlacers) do not have much in
common with codecs that operate in a quite complicated manner (and so
need all the things defined below).

This raises quite an interesting question of whether we can really
call such a simple m2m device "a V4L2 codec" as the original text of
dev-codec.rst does. I guess it depends on the convention we agree on,
but I personally have only heard the term "codec" in the context of
audio/video/etc. compression.

>
> I also think that we need an additional paragraph somewhere at the beginning
> of the Codec Interface chapter that explains more clearly that OUTPUT buffers
> send data to the hardware to be processed and that CAPTURE buffers contain
> the processed data. It is always confusing for newcomers to understand that
> in V4L2 this is seen from the point of view of the CPU.

I believe this is included in the glossary below, although using a
slightly different wording that doesn't involve CPU.

[snip]
> > +
> > +EOS
> > +   end of stream
> > +
> > +input height
> > +   height in pixels for given input resolution
>
> 'input' is a confusing name. Because I think this refers to the resolution
> set for the OUTPUT buffer. How about renaming this to 'source'?
>
> I.e.: an OUTPUT buffer contains the source data for the hardware. The capture
> buffer contains the sink data from the hardware.

Yes, indeed, "source" sounds more logical.

>
> > +
> > +input resolution
> > +   resolution in pixels of source frames being input
>
> "source resolution
>         resolution in pixels of source frames passed"
>
> > +   to the encoder and subject to further cropping to the bounds of visible
> > +   resolution
> > +
> > +input width
> > +   width in pixels for given input resolution
> > +
> > +OUTPUT
> > +   the source buffer queue, encoded bitstream for
> > +   decoders, raw frames for encoders; ``V4L2_BUF_TYPE_VIDEO_OUTPUT`` or
> > +   ``V4L2_BUF_TYPE_VIDEO_OUTPUT_MPLANE``
> > +
> > +raw format
> > +   uncompressed format containing raw pixel data (e.g.
> > +   YUV, RGB formats)
> > +
> > +resume point
> > +   a point in the bitstream from which decoding may
> > +   start/continue, without any previous state/data present, e.g.: a
> > +   keyframe (VPX) or SPS/PPS/IDR sequence (H.264); a resume point is
> > +   required to start decode of a new stream, or to resume decoding after a
> > +   seek;
> > +
> > +source buffer
> > +   buffers allocated for source queue
>
> "OUTPUT buffers allocated..."

Ack.

>
> > +
> > +source queue
> > +   queue containing buffers used for source data, i.e.
>
> Line suddenly ends.
>
> I'd say: "queue containing OUTPUT buffers"

Ack.

[snip]
> > +Initialization sequence
> > +-----------------------
> > +
> > +1. (optional) Enumerate supported OUTPUT formats and resolutions. See
> > +   capability enumeration.
> > +
> > +2. Set a coded format on the source queue via :c:func:`VIDIOC_S_FMT`
> > +
> > +   a. Required fields:
> > +
> > +      i.   type = OUTPUT
> > +
> > +      ii.  fmt.pix_mp.pixelformat set to a coded format
> > +
> > +      iii. fmt.pix_mp.width, fmt.pix_mp.height only if cannot be
> > +           parsed from the stream for the given coded format;
> > +           ignored otherwise;
> > +
> > +   b. Return values:
> > +
> > +      i.  EINVAL: unsupported format.
> > +
> > +      ii. Others: per spec
> > +
> > +   .. note::
> > +
> > +      The driver must not adjust pixelformat, so if
> > +      ``V4L2_PIX_FMT_H264`` is passed but only
> > +      ``V4L2_PIX_FMT_H264_SLICE`` is supported, S_FMT will return
> > +      -EINVAL. If both are acceptable by client, calling S_FMT for
> > +      the other after one gets rejected may be required (or use
> > +      :c:func:`VIDIOC_ENUM_FMT` to discover beforehand, see Capability
> > +      enumeration).
>
> This needs to be documented in S_FMT as well.
>
> What will TRY_FMT do? Return EINVAL as well, or replace the pixelformat?
>
> Should this be a general rule for output devices that S_FMT (and perhaps TRY_FMT)
> fail with EINVAL if the pixelformat is not supported? There is something to be
> said for that.

I think this was covered by other reviewers already and I believe we
should stick to the general semantics of TRY_/S_FMT, which are
specified to never return error if unsupported values are given (and
silently adjust to supported ones). I don't see any reason to make
codecs different from that - userspace can just check if the pixel
format matches what was set.

[snip]
> > +5.  Begin parsing the stream for stream metadata via :c:func:`VIDIOC_STREAMON` on
> > +    OUTPUT queue. This step allows the driver to parse/decode
> > +    initial stream metadata until enough information to allocate
> > +    CAPTURE buffers is found. This is indicated by the driver by
> > +    sending a ``V4L2_EVENT_SOURCE_CHANGE`` event, which the client
> > +    must handle.
> > +
> > +    a. Required fields: as per spec.
> > +
> > +    b. Return values: as per spec.
> > +
> > +    .. note::
> > +
> > +       Calling :c:func:`VIDIOC_REQBUFS`, :c:func:`VIDIOC_STREAMON`
> > +       or :c:func:`VIDIOC_G_FMT` on the CAPTURE queue at this time is not
> > +       allowed and must return EINVAL.
>
> I dislike EINVAL here. It is too generic. Also, the passed arguments can be
> perfectly valid, you just aren't in the right state. EPERM might be better.

The problem of hardware that can't parse the resolution or software
that wants to pre-allocate buffers was brought up in different
replies. I think we might want to revise this in general, but I agree
that EPERM sounds better than EINVAL in this context.

>
> > +
> > +6.  This step only applies for coded formats that contain resolution
> > +    information in the stream.
> > +    Continue queuing/dequeuing bitstream buffers to/from the
> > +    OUTPUT queue via :c:func:`VIDIOC_QBUF` and :c:func:`VIDIOC_DQBUF`. The driver
> > +    must keep processing and returning each buffer to the client
> > +    until required metadata to send a ``V4L2_EVENT_SOURCE_CHANGE``
> > +    for source change type ``V4L2_EVENT_SRC_CH_RESOLUTION`` is
> > +    found.
>
> This sentence is confusing. It's not clear what you mean here.

The point is that userspace needs to keep providing new bitstream data
until the header can be parsed.

>
>  There is no requirement to pass enough data for this to
> > +    occur in the first buffer and the driver must be able to
> > +    process any number
>
> Missing period at the end of the sentence.
>
> > +
> > +    a. Required fields: as per spec.
> > +
> > +    b. Return values: as per spec.
> > +
> > +    c. If data in a buffer that triggers the event is required to decode
> > +       the first frame, the driver must not return it to the client,
> > +       but must retain it for further decoding.
> > +
> > +    d. Until the resolution source event is sent to the client, calling
> > +       :c:func:`VIDIOC_G_FMT` on the CAPTURE queue must return -EINVAL.
>
> EPERM?

Ack. +/- the problem of userspace that wants to pre-allocate CAPTURE queue.

Although, when I think of it now, such userspace would set the coded
resolution on the OUTPUT queue and then the driver could instantly signal
a source change event on the CAPTURE queue even before the hardware finishes
parsing. If what the hardware parses doesn't match what
userspace set, yet another event would be signaled.

>
> > +
> > +    .. note::
> > +
> > +       No decoded frames are produced during this phase.
> > +
> > +7.  This step only applies for coded formats that contain resolution
>
> applies to  (same elsewhere)
>
> > +    information in the stream.
> > +    Receive and handle ``V4L2_EVENT_SOURCE_CHANGE`` from the driver
> > +    via :c:func:`VIDIOC_DQEVENT`. The driver must send this event once
> > +    enough data is obtained from the stream to allocate CAPTURE
> > +    buffers and to begin producing decoded frames.
> > +
> > +    a. Required fields:
> > +
> > +       i. type = ``V4L2_EVENT_SOURCE_CHANGE``
> > +
> > +    b. Return values: as per spec.
> > +
> > +    c. The driver must return u.src_change.changes =
> > +       ``V4L2_EVENT_SRC_CH_RESOLUTION``.
> > +
> > +8.  This step only applies for coded formats that contain resolution
> > +    information in the stream.
> > +    Call :c:func:`VIDIOC_G_FMT` for CAPTURE queue to get format for the
> > +    destination buffers parsed/decoded from the bitstream.
> > +
> > +    a. Required fields:
> > +
> > +       i. type = CAPTURE
> > +
> > +    b. Return values: as per spec.
> > +
> > +    c. Return fields:
> > +
> > +       i.   fmt.pix_mp.width, fmt.pix_mp.height: coded resolution
> > +            for the decoded frames
> > +
> > +       ii.  fmt.pix_mp.pixelformat: default/required/preferred by
> > +            driver pixelformat for decoded frames.
> > +
> > +       iii. num_planes: set to number of planes for pixelformat.
> > +
> > +       iv.  For each plane p = [0, num_planes-1]:
> > +            plane_fmt[p].sizeimage, plane_fmt[p].bytesperline as
> > +            per spec for coded resolution.
> > +
> > +    .. note::
> > +
> > +       The value of pixelformat may be any supported pixel format,
> > +       but it must be supported for the current stream, based on the
> > +       information parsed from the stream and hardware capabilities.
> > +       It is suggested that the driver choose the preferred/optimal
> > +       format for the given configuration. For example, a YUV format
> > +       may be preferred over an RGB format if an additional
> > +       conversion step would be required.
> > +
> > +9.  (optional) Enumerate CAPTURE formats via :c:func:`VIDIOC_ENUM_FMT` on
> > +    CAPTURE queue.
> > +    Once the stream information is parsed and known, the client
> > +    may use this ioctl to discover which raw formats are supported
> > +    for given stream and select one of them via :c:func:`VIDIOC_S_FMT`.
> > +
> > +    a. Fields/return values as per spec.
> > +
> > +    .. note::
> > +
> > +       The driver must return only formats supported for the
> > +       current stream parsed in this initialization sequence, even
> > +       if more formats may be supported by the driver in general.
> > +       For example, a driver/hardware may support YUV and RGB
> > +       formats for resolutions 1920x1088 and lower, but only YUV for
> > +       higher resolutions (e.g. due to memory bandwidth
> > +       limitations). After parsing a resolution of 1920x1088 or
> > +       lower, :c:func:`VIDIOC_ENUM_FMT` may return a set of YUV and RGB
> > +       pixelformats, but after parsing resolution higher than
> > +       1920x1088, the driver must not return (unsupported for this
> > +       resolution) RGB.
> > +
> > +       However, subsequent resolution change event
> > +       triggered after discovering a resolution change within the
> > +       same stream may switch the stream into a lower resolution;
> > +       :c:func:`VIDIOC_ENUM_FMT` must return RGB formats again in that case.
> > +
> > +10.  (optional) Choose a different CAPTURE format than suggested via
> > +     :c:func:`VIDIOC_S_FMT` on CAPTURE queue. It is possible for the client
> > +     to choose a different format than selected/suggested by the
> > +     driver in :c:func:`VIDIOC_G_FMT`.
> > +
> > +     a. Required fields:
> > +
> > +        i.  type = CAPTURE
> > +
> > +        ii. fmt.pix_mp.pixelformat set to a coded format
> > +
> > +     b. Return values:
> > +
> > +        i. EINVAL: unsupported format.
>
> Or replace it with a supported format. I'm inclined to do that instead of
> returning EINVAL.

Agreed.

>
> > +
> > +     c. Calling :c:func:`VIDIOC_ENUM_FMT` to discover currently available formats
> > +        after receiving ``V4L2_EVENT_SOURCE_CHANGE`` is useful to find
> > +        out a set of allowed pixelformats for given configuration,
> > +        but not required.
> > +
> > +11.  (optional) Acquire visible resolution via :c:func:`VIDIOC_G_SELECTION`.
> > +
> > +    a. Required fields:
> > +
> > +       i.  type = CAPTURE
> > +
> > +       ii. target = ``V4L2_SEL_TGT_CROP``
>
> I don't think this is the right selection target to use, but I think others
> commented on that already.

Yes, Philipp brought up this topic before and we had some further
exchange on it, which I think would benefit from you taking a look. :)

>
> > +
> > +    b. Return values: per spec.
> > +
> > +    c. Return fields
> > +
> > +       i. r.left, r.top, r.width, r.height: visible rectangle; this must
> > +          fit within coded resolution returned from :c:func:`VIDIOC_G_FMT`.
> > +
> > +12. (optional) Get minimum number of buffers required for CAPTURE queue
> > +    via :c:func:`VIDIOC_G_CTRL`. This is useful if client intends to use
> > +    more buffers than minimum required by hardware/format (see
> > +    allocation).
> > +
> > +    a. Required fields:
> > +
> > +       i. id = ``V4L2_CID_MIN_BUFFERS_FOR_CAPTURE``
> > +
> > +    b. Return values: per spec.
> > +
> > +    c. Return fields:
> > +
> > +       i. value: minimum number of buffers required to decode the stream
> > +          parsed in this initialization sequence.
> > +
> > +    .. note::
> > +
> > +       Note that the minimum number of buffers must be at least the
> > +       number required to successfully decode the current stream.
> > +       This may for example be the required DPB size for an H.264
>
> Is DPB in the glossary?

Need to add indeed.

[snip]
> > +Seek
> > +----
> > +
> > +Seek is controlled by the OUTPUT queue, as it is the source of bitstream
> > +data. CAPTURE queue remains unchanged/unaffected.
> > +
> > +1. Stop the OUTPUT queue to begin the seek sequence via
> > +   :c:func:`VIDIOC_STREAMOFF`.
> > +
> > +   a. Required fields:
> > +
> > +      i. type = OUTPUT
> > +
> > +   b. The driver must drop all the pending OUTPUT buffers and they are
> > +      treated as returned to the client (as per spec).
> > +
> > +2. Restart the OUTPUT queue via :c:func:`VIDIOC_STREAMON`
> > +
> > +   a. Required fields:
> > +
> > +      i. type = OUTPUT
> > +
> > +   b. The driver must be put in a state after seek and be ready to
> > +      accept new source bitstream buffers.
> > +
> > +3. Start queuing buffers to OUTPUT queue containing stream data after
> > +   the seek until a suitable resume point is found.
> > +
> > +   .. note::
> > +
> > +      There is no requirement to begin queuing stream
> > +      starting exactly from a resume point (e.g. SPS or a keyframe).
>
> SPS, keyframe: are they in the glossary?

Will add.

[snip]
> > +Flush
> > +-----
> > +
> > +Flush is the process of draining the CAPTURE queue of any remaining
> > +buffers. After the flush sequence is complete, the client has received
> > +all decoded frames for all OUTPUT buffers queued before the sequence was
> > +started.
> > +
> > +1. Begin flush by issuing :c:func:`VIDIOC_DECODER_CMD`.
> > +
> > +   a. Required fields:
> > +
> > +      i. cmd = ``V4L2_DEC_CMD_STOP``
>
> Drivers should set the V4L2_DEC_CMD_STOP_IMMEDIATELY flag since I doubt any
> m2m driver supports stopping at a specific pts.

The documentation says:

"If V4L2_DEC_CMD_STOP_IMMEDIATELY is set, then the decoder stops
immediately (ignoring the pts value), otherwise it will keep decoding
until timestamp >= pts or until the last of the pending data from its
internal buffers was decoded."

also for the pts field:

"Stop playback at this pts or immediately if the playback is already
past that timestamp. Leave to 0 if you want to stop after the last
frame was decoded."

What we want the decoder to do here is to "keep decoding [...] until
the last of the pending data from its internal buffers was decoded",
which is exactly what happens without
V4L2_DEC_CMD_STOP_IMMEDIATELY when pts is set to 0.

>
> They should also support VIDIOC_DECODER_CMD_TRY!

Agreed.

>
> You can probably make default implementations in v4l2-mem2mem.c since the only
> thing that I expect is supported is the STOP command with the STOP_IMMEDIATELY
> flag set.

Is there any useful case for STOP_IMMEDIATELY with m2m decoders? The
only thing I can think of is some kind of power management trick that
could stop the decoder until the client collects enough OUTPUT buffers
to make it process them in a longer batch. Still, that could be done by
the client just holding off on QBUF(OUTPUT) until enough data to fill
the desired number of buffers is collected.

[snip]
> > +Commit points
> > +-------------
> > +
> > +Setting formats and allocating buffers triggers changes in the behavior
> > +of the driver.
> > +
> > +1. Setting format on OUTPUT queue may change the set of formats
> > +   supported/advertised on the CAPTURE queue. It also must change
> > +   the format currently selected on CAPTURE queue if it is not
> > +   supported by the newly selected OUTPUT format to a supported one.
> > +
> > +2. Enumerating formats on CAPTURE queue must only return CAPTURE formats
> > +   supported for the OUTPUT format currently set.
> > +
> > +3. Setting/changing format on CAPTURE queue does not change formats
> > +   available on OUTPUT queue.
>
> True.
>
>  An attempt to set CAPTURE format that
> > +   is not supported for the currently selected OUTPUT format must
> > +   result in an error (-EINVAL) from :c:func:`VIDIOC_S_FMT`.
>
> I'm not sure about that. I believe it is valid to replace it with the
> first supported pixelformat. TRY_FMT certainly should do that.

As above, we probably should stay consistent with general semantics.

Best regards,
Tomasz
Hans Verkuil June 8, 2018, 10:13 a.m. UTC | #18
On 06/08/2018 11:03 AM, Tomasz Figa wrote:
> Hi Hans,
> 
> On Thu, Jun 7, 2018 at 5:48 PM Hans Verkuil <hverkuil@xs4all.nl> wrote:
>>
>> Hi Tomasz,
>>
>> First of all: thank you very much for working on this. It's a big missing piece of
>> information, so filling this in is very helpful.
> 
> Thanks for review!
> 
>>
>> On 06/05/2018 12:33 PM, Tomasz Figa wrote:
>>> Due to complexity of the video decoding process, the V4L2 drivers of
>>> stateful decoder hardware require specific sequences of V4L2 API calls
>>> to be followed. These include capability enumeration, initialization,
>>> decoding, seek, pause, dynamic resolution change, flush and end of
>>> stream.
>>>
>>> Specifics of the above have been discussed during Media Workshops at
>>> LinuxCon Europe 2012 in Barcelona and then later Embedded Linux
>>> Conference Europe 2014 in Düsseldorf. The de facto Codec API that
>>> originated at those events was later implemented by the drivers we already
>>> have merged in mainline, such as s5p-mfc or mtk-vcodec.
>>>
>>> The only thing missing was the real specification included as a part of
>>> Linux Media documentation. Fix it now and document the decoder part of
>>> the Codec API.
>>>
>>> Signed-off-by: Tomasz Figa <tfiga@chromium.org>
>>> ---
>>>  Documentation/media/uapi/v4l/dev-codec.rst | 771 +++++++++++++++++++++
>>>  Documentation/media/uapi/v4l/v4l2.rst      |  14 +-
>>>  2 files changed, 784 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/Documentation/media/uapi/v4l/dev-codec.rst b/Documentation/media/uapi/v4l/dev-codec.rst
>>> index c61e938bd8dc..0483b10c205e 100644
>>> --- a/Documentation/media/uapi/v4l/dev-codec.rst
>>> +++ b/Documentation/media/uapi/v4l/dev-codec.rst
>>> @@ -34,3 +34,774 @@ the codec and reprogram it whenever another file handler gets access.
>>>  This is different from the usual video node behavior where the video
>>>  properties are global to the device (i.e. changing something through one
>>>  file handle is visible through another file handle).
>>
>> To what extent does the information in this patch series apply specifically to
>> video (de)compression hardware and to what extent it is applicable for any m2m
>> device? It looks like most if not all is specific to video (de)compression hw
>> and not to e.g. a simple deinterlacer.
> 
> It is specifically written for stateful codecs, i.e. those that can
> work on bitstream directly.
> 
>>
>> Ideally there would be a common section first describing the requirements for
>> all m2m devices, followed by an encoder and decoder section going into details
>> for those specific devices.
> 
> I wonder if there is much we can say in general about "all m2m devices".
> The simple m2m devices (scalers, deinterlacers) do not have much in
> common with codecs that operate in a quite complicated manner (and so
> need all the things defined below).
> 
> This raises quite an interesting question of whether we can really
> call such a simple m2m device "a V4L2 codec" as the original text of
> dev-codec.rst does. I guess it depends on the convention we agree on,
> but I personally have only heard the term "codec" in the context of
> audio/video/etc. compression.

It's for historical reasons that this is called the "Codec Interface" in the
spec. I wouldn't mind at all if it was renamed to "Memory-to-Memory Interface".

But perhaps that's better done as a separate final patch.

> 
>>
>> I also think that we need an additional paragraph somewhere at the beginning
>> of the Codec Interface chapter that explains more clearly that OUTPUT buffers
>> send data to the hardware to be processed and that CAPTURE buffers contain
>> the processed data. It is always confusing for newcomers to understand that
>> in V4L2 this is seen from the point of view of the CPU.
> 
> I believe this is included in the glossary below, although using a
> slightly different wording that doesn't involve CPU.
> 
> [snip]
>>> +
>>> +EOS
>>> +   end of stream
>>> +
>>> +input height
>>> +   height in pixels for given input resolution
>>
>> 'input' is a confusing name. Because I think this refers to the resolution
>> set for the OUTPUT buffer. How about renaming this to 'source'?
>>
>> I.e.: an OUTPUT buffer contains the source data for the hardware. The capture
>> buffer contains the sink data from the hardware.
> 
> Yes, indeed, "source" sounds more logical.
> 
>>
>>> +
>>> +input resolution
>>> +   resolution in pixels of source frames being input
>>
>> "source resolution
>>         resolution in pixels of source frames passed"
>>
>>> +   to the encoder and subject to further cropping to the bounds of visible
>>> +   resolution
>>> +
>>> +input width
>>> +   width in pixels for given input resolution
>>> +
>>> +OUTPUT
>>> +   the source buffer queue, encoded bitstream for
>>> +   decoders, raw frames for encoders; ``V4L2_BUF_TYPE_VIDEO_OUTPUT`` or
>>> +   ``V4L2_BUF_TYPE_VIDEO_OUTPUT_MPLANE``
>>> +
>>> +raw format
>>> +   uncompressed format containing raw pixel data (e.g.
>>> +   YUV, RGB formats)
>>> +
>>> +resume point
>>> +   a point in the bitstream from which decoding may
>>> +   start/continue, without any previous state/data present, e.g.: a
>>> +   keyframe (VPX) or SPS/PPS/IDR sequence (H.264); a resume point is
>>> +   required to start decode of a new stream, or to resume decoding after a
>>> +   seek;
>>> +
>>> +source buffer
>>> +   buffers allocated for source queue
>>
>> "OUTPUT buffers allocated..."
> 
> Ack.
> 
>>
>>> +
>>> +source queue
>>> +   queue containing buffers used for source data, i.e.
>>
>> Line suddenly ends.
>>
>> I'd say: "queue containing OUTPUT buffers"
> 
> Ack.
> 
> [snip]
>>> +Initialization sequence
>>> +-----------------------
>>> +
>>> +1. (optional) Enumerate supported OUTPUT formats and resolutions. See
>>> +   capability enumeration.
>>> +
>>> +2. Set a coded format on the source queue via :c:func:`VIDIOC_S_FMT`
>>> +
>>> +   a. Required fields:
>>> +
>>> +      i.   type = OUTPUT
>>> +
>>> +      ii.  fmt.pix_mp.pixelformat set to a coded format
>>> +
>>> +      iii. fmt.pix_mp.width, fmt.pix_mp.height only if cannot be
>>> +           parsed from the stream for the given coded format;
>>> +           ignored otherwise;
>>> +
>>> +   b. Return values:
>>> +
>>> +      i.  EINVAL: unsupported format.
>>> +
>>> +      ii. Others: per spec
>>> +
>>> +   .. note::
>>> +
>>> +      The driver must not adjust pixelformat, so if
>>> +      ``V4L2_PIX_FMT_H264`` is passed but only
>>> +      ``V4L2_PIX_FMT_H264_SLICE`` is supported, S_FMT will return
>>> +      -EINVAL. If both are acceptable by client, calling S_FMT for
>>> +      the other after one gets rejected may be required (or use
>>> +      :c:func:`VIDIOC_ENUM_FMT` to discover beforehand, see Capability
>>> +      enumeration).
>>
>> This needs to be documented in S_FMT as well.
>>
>> What will TRY_FMT do? Return EINVAL as well, or replace the pixelformat?
>>
>> Should this be a general rule for output devices that S_FMT (and perhaps TRY_FMT)
>> fail with EINVAL if the pixelformat is not supported? There is something to be
>> said for that.
> 
> I think this was covered by other reviewers already and I believe we
> should stick to the general semantics of TRY_/S_FMT, which are
> specified to never return error if unsupported values are given (and
> silently adjust to supported ones). I don't see any reason to make
> codecs different from that - userspace can just check if the pixel
> format matches what was set.
> 
> [snip]
>>> +5.  Begin parsing the stream for stream metadata via :c:func:`VIDIOC_STREAMON` on
>>> +    OUTPUT queue. This step allows the driver to parse/decode
>>> +    initial stream metadata until enough information to allocate
>>> +    CAPTURE buffers is found. This is indicated by the driver by
>>> +    sending a ``V4L2_EVENT_SOURCE_CHANGE`` event, which the client
>>> +    must handle.
>>> +
>>> +    a. Required fields: as per spec.
>>> +
>>> +    b. Return values: as per spec.
>>> +
>>> +    .. note::
>>> +
>>> +       Calling :c:func:`VIDIOC_REQBUFS`, :c:func:`VIDIOC_STREAMON`
>>> +       or :c:func:`VIDIOC_G_FMT` on the CAPTURE queue at this time is not
>>> +       allowed and must return EINVAL.
>>
>> I dislike EINVAL here. It is too generic. Also, the passed arguments can be
>> perfectly valid, you just aren't in the right state. EPERM might be better.
> 
> The problem of hardware that can't parse the resolution or software
> that wants to pre-allocate buffers was brought up in different
> replies. I think we might want to revise this in general, but I agree
> that EPERM sounds better than EINVAL in this context.
> 
>>
>>> +
>>> +6.  This step only applies for coded formats that contain resolution
>>> +    information in the stream.
>>> +    Continue queuing/dequeuing bitstream buffers to/from the
>>> +    OUTPUT queue via :c:func:`VIDIOC_QBUF` and :c:func:`VIDIOC_DQBUF`. The driver
>>> +    must keep processing and returning each buffer to the client
>>> +    until required metadata to send a ``V4L2_EVENT_SOURCE_CHANGE``
>>> +    for source change type ``V4L2_EVENT_SRC_CH_RESOLUTION`` is
>>> +    found.
>>
>> This sentence is confusing. It's not clear what you mean here.
> 
> The point is that userspace needs to keep providing new bitstream data
> until the header can be parsed.
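The requirement that userspace keep providing data until the header can be parsed amounts to a simple loop. In this sketch, the `feed_hooks` callbacks are hypothetical stand-ins for the real `VIDIOC_QBUF`/`VIDIOC_DQBUF`/`VIDIOC_DQEVENT` plumbing. The small `sim` model exists only to make the loop testable; none of it is a real driver interface.

```c
#include <stdbool.h>

/* Hypothetical hooks standing in for the real ioctl plumbing. */
struct feed_hooks {
	bool (*queue_next_chunk)(void *ctx);         /* QBUF on OUTPUT; false when out of data */
	bool (*resolution_event_pending)(void *ctx); /* DQEVENT saw SRC_CH_RESOLUTION */
	void *ctx;
};

/*
 * Keep feeding bitstream until the driver reports that the header was
 * parsed. No assumption is made about how much data one chunk holds.
 * Returns the number of chunks queued, or -1 if the data ran out first.
 */
long feed_until_resolution_known(const struct feed_hooks *h)
{
	long queued = 0;

	while (!h->resolution_event_pending(h->ctx)) {
		if (!h->queue_next_chunk(h->ctx))
			return -1;
		queued++;
	}
	return queued;
}

/* Toy model for illustration: the header is parsed after `needed` chunks. */
struct sim { int queued, needed, available; };

static bool sim_queue(void *ctx)
{
	struct sim *s = ctx;

	if (s->queued >= s->available)
		return false;
	s->queued++;
	return true;
}

static bool sim_event(void *ctx)
{
	struct sim *s = ctx;

	return s->queued >= s->needed;
}
```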
> 
>>
>>  There is no requirement to pass enough data for this to
>>> +    occur in the first buffer and the driver must be able to
>>> +    process any number
>>
>> Missing period at the end of the sentence.
>>
>>> +
>>> +    a. Required fields: as per spec.
>>> +
>>> +    b. Return values: as per spec.
>>> +
>>> +    c. If data in a buffer that triggers the event is required to decode
>>> +       the first frame, the driver must not return it to the client,
>>> +       but must retain it for further decoding.
>>> +
>>> +    d. Until the resolution source event is sent to the client, calling
>>> +       :c:func:`VIDIOC_G_FMT` on the CAPTURE queue must return -EINVAL.
>>
>> EPERM?
> 
> Ack, modulo the problem of userspace that wants to pre-allocate the CAPTURE queue.
> 
> Although, now that I think of it, such userspace would set the coded
> resolution on the OUTPUT queue and then the driver could instantly signal
> a source change event on the CAPTURE queue, even before the hardware finishes
> parsing. If what the hardware parses doesn't match what
> userspace set, another event would be signaled.
> 
>>
>>> +
>>> +    .. note::
>>> +
>>> +       No decoded frames are produced during this phase.
>>> +
>>> +7.  This step only applies for coded formats that contain resolution
>>
>> applies to  (same elsewhere)
>>
>>> +    information in the stream.
>>> +    Receive and handle ``V4L2_EVENT_SOURCE_CHANGE`` from the driver
>>> +    via :c:func:`VIDIOC_DQEVENT`. The driver must send this event once
>>> +    enough data is obtained from the stream to allocate CAPTURE
>>> +    buffers and to begin producing decoded frames.
>>> +
>>> +    a. Required fields:
>>> +
>>> +       i. type = ``V4L2_EVENT_SOURCE_CHANGE``
>>> +
>>> +    b. Return values: as per spec.
>>> +
>>> +    c. The driver must return u.src_change.changes =
>>> +       ``V4L2_EVENT_SRC_CH_RESOLUTION``.
>>> +
>>> +8.  This step only applies for coded formats that contain resolution
>>> +    information in the stream.
>>> +    Call :c:func:`VIDIOC_G_FMT` for CAPTURE queue to get format for the
>>> +    destination buffers parsed/decoded from the bitstream.
>>> +
>>> +    a. Required fields:
>>> +
>>> +       i. type = CAPTURE
>>> +
>>> +    b. Return values: as per spec.
>>> +
>>> +    c. Return fields:
>>> +
>>> +       i.   fmt.pix_mp.width, fmt.pix_mp.height: coded resolution
>>> +            for the decoded frames
>>> +
>>> +       ii.  fmt.pix_mp.pixelformat: default/required/preferred by
>>> +            driver pixelformat for decoded frames.
>>> +
>>> +       iii. num_planes: set to number of planes for pixelformat.
>>> +
>>> +       iv.  For each plane p = [0, num_planes-1]:
>>> +            plane_fmt[p].sizeimage, plane_fmt[p].bytesperline as
>>> +            per spec for coded resolution.
>>> +
>>> +    .. note::
>>> +
>>> +       The value of pixelformat may be any pixel format supported,
>>> +       and must
>>> +       be supported for current stream, based on the information
>>> +       parsed from the stream and hardware capabilities. It is
>>> +       suggested that driver chooses the preferred/optimal format
>>> +       for given configuration. For example, a YUV format may be
>>> +       preferred over an RGB format, if additional conversion step
>>> +       would be required.
>>> +
>>> +9.  (optional) Enumerate CAPTURE formats via :c:func:`VIDIOC_ENUM_FMT` on
>>> +    CAPTURE queue.
>>> +    Once the stream information is parsed and known, the client
>>> +    may use this ioctl to discover which raw formats are supported
>>> +    for given stream and select one of them via :c:func:`VIDIOC_S_FMT`.
>>> +
>>> +    a. Fields/return values as per spec.
>>> +
>>> +    .. note::
>>> +
>>> +       The driver must return only formats supported for the
>>> +       current stream parsed in this initialization sequence, even
>>> +       if more formats may be supported by the driver in general.
>>> +       For example, a driver/hardware may support YUV and RGB
>>> +       formats for resolutions 1920x1088 and lower, but only YUV for
>>> +       higher resolutions (e.g. due to memory bandwidth
>>> +       limitations). After parsing a resolution of 1920x1088 or
>>> +       lower, :c:func:`VIDIOC_ENUM_FMT` may return a set of YUV and RGB
>>> +       pixelformats, but after parsing resolution higher than
>>> +       1920x1088, the driver must not return (unsupported for this
>>> +       resolution) RGB.
>>> +
>>> +       However, subsequent resolution change event
>>> +       triggered after discovering a resolution change within the
>>> +       same stream may switch the stream into a lower resolution;
>>> +       :c:func:`VIDIOC_ENUM_FMT` must return RGB formats again in that case.
>>> +
>>> +10.  (optional) Choose a different CAPTURE format than suggested via
>>> +     :c:func:`VIDIOC_S_FMT` on CAPTURE queue. It is possible for the client
>>> +     to choose a different format than selected/suggested by the
>>> +     driver in :c:func:`VIDIOC_G_FMT`.
>>> +
>>> +     a. Required fields:
>>> +
>>> +        i.  type = CAPTURE
>>> +
>>> +        ii. fmt.pix_mp.pixelformat set to a coded format
>>> +
>>> +     b. Return values:
>>> +
>>> +        i. EINVAL: unsupported format.
>>
>> Or replace it with a supported format. I'm inclined to do that instead of
>> returning EINVAL.
> 
> Agreed.
> 
>>
>>> +
>>> +     c. Calling :c:func:`VIDIOC_ENUM_FMT` to discover currently available formats
>>> +        after receiving ``V4L2_EVENT_SOURCE_CHANGE`` is useful to find
>>> +        out a set of allowed pixelformats for given configuration,
>>> +        but not required.
>>> +
>>> +11.  (optional) Acquire visible resolution via :c:func:`VIDIOC_G_SELECTION`.
>>> +
>>> +    a. Required fields:
>>> +
>>> +       i.  type = CAPTURE
>>> +
>>> +       ii. target = ``V4L2_SEL_TGT_CROP``
>>
>> I don't think this is the right selection target to use, but I think others
>> commented on that already.
> 
> Yes, Philipp brought this topic before and we had some further
> exchange on it, which I think would benefit from you taking a look. :)
> 
>>
>>> +
>>> +    b. Return values: per spec.
>>> +
>>> +    c. Return fields
>>> +
>>> +       i. r.left, r.top, r.width, r.height: visible rectangle; this must
>>> +          fit within coded resolution returned from :c:func:`VIDIOC_G_FMT`.
>>> +
>>> +12. (optional) Get minimum number of buffers required for CAPTURE queue
>>> +    via :c:func:`VIDIOC_G_CTRL`. This is useful if client intends to use
>>> +    more buffers than minimum required by hardware/format (see
>>> +    allocation).
>>> +
>>> +    a. Required fields:
>>> +
>>> +       i. id = ``V4L2_CID_MIN_BUFFERS_FOR_CAPTURE``
>>> +
>>> +    b. Return values: per spec.
>>> +
>>> +    c. Return fields:
>>> +
>>> +       i. value: minimum number of buffers required to decode the stream
>>> +          parsed in this initialization sequence.
>>> +
>>> +    .. note::
>>> +
>>> +       Note that the minimum number of buffers must be at least the
>>> +       number required to successfully decode the current stream.
>>> +       This may for example be the required DPB size for an H.264
>>
>> Is DPB in the glossary?
> 
> Indeed, it needs to be added.
> 
> [snip]
>>> +Seek
>>> +----
>>> +
>>> +Seek is controlled by the OUTPUT queue, as it is the source of bitstream
>>> +data. CAPTURE queue remains unchanged/unaffected.
>>> +
>>> +1. Stop the OUTPUT queue to begin the seek sequence via
>>> +   :c:func:`VIDIOC_STREAMOFF`.
>>> +
>>> +   a. Required fields:
>>> +
>>> +      i. type = OUTPUT
>>> +
>>> +   b. The driver must drop all the pending OUTPUT buffers and they are
>>> +      treated as returned to the client (as per spec).
>>> +
>>> +2. Restart the OUTPUT queue via :c:func:`VIDIOC_STREAMON`
>>> +
>>> +   a. Required fields:
>>> +
>>> +      i. type = OUTPUT
>>> +
>>> +   b. The driver must be put in a state after seek and be ready to
>>> +      accept new source bitstream buffers.
>>> +
>>> +3. Start queuing buffers to OUTPUT queue containing stream data after
>>> +   the seek until a suitable resume point is found.
>>> +
>>> +   .. note::
>>> +
>>> +      There is no requirement to begin queuing stream
>>> +      starting exactly from a resume point (e.g. SPS or a keyframe).
>>
>> SPS, keyframe: are they in the glossary?
> 
> Will add.
> 
> [snip]
>>> +Flush
>>> +-----
>>> +
>>> +Flush is the process of draining the CAPTURE queue of any remaining
>>> +buffers. After the flush sequence is complete, the client has received
>>> +all decoded frames for all OUTPUT buffers queued before the sequence was
>>> +started.
>>> +
>>> +1. Begin flush by issuing :c:func:`VIDIOC_DECODER_CMD`.
>>> +
>>> +   a. Required fields:
>>> +
>>> +      i. cmd = ``V4L2_DEC_CMD_STOP``
>>
>> Drivers should set the V4L2_DEC_CMD_STOP_IMMEDIATELY flag since I doubt any
>> m2m driver supports stopping at a specific pts.
> 
> The documentation says:
> 
> "If V4L2_DEC_CMD_STOP_IMMEDIATELY is set, then the decoder stops
> immediately (ignoring the pts value), otherwise it will keep decoding
> until timestamp >= pts or until the last of the pending data from its
> internal buffers was decoded."
> 
> also for the pts field:
> 
> "Stop playback at this pts or immediately if the playback is already
> past that timestamp. Leave to 0 if you want to stop after the last
> frame was decoded."
> 
> What we want the decoder to do here is to "keep decoding [...] until
> the last of the pending data from its internal buffers was decoded",
> which is exactly what happens without
> V4L2_DEC_CMD_STOP_IMMEDIATELY when pts is set to 0.
> 
>>
>> They should also support VIDIOC_DECODER_CMD_TRY!
> 
> Agreed.
> 
>>
>> You can probably make default implementations in v4l2-mem2mem.c since the only
>> thing that I expect is supported is the STOP command with the STOP_IMMEDIATELY
>> flag set.
> 
> Is there any useful case for STOP_IMMEDIATELY with m2m decoders? The
> only thing I can think of is some kind of power management trick that
> could stop the decoder until the client collects enough OUTPUT buffers
> for it to process them in a longer batch. Still, that could be done by the
> client simply holding off on QBUF(OUTPUT) until enough data to fill the
> desired number of buffers is collected.

I don't follow. The idea of STOP_IMMEDIATELY is basically to stop by discarding any
pending OUTPUT buffers and signaling an immediate EOS on the CAPTURE queue.

That's the idea, at least. But this can of course also be done by just calling
STREAMOFF on both the capture and output queues, so I agree that I am not sure
it makes sense. However, you should specify that besides setting cmd to
V4L2_DEC_CMD_STOP, the client should also set pts to 0.

> 
> [snip]
>>> +Commit points
>>> +-------------
>>> +
>>> +Setting formats and allocating buffers triggers changes in the behavior
>>> +of the driver.
>>> +
>>> +1. Setting format on OUTPUT queue may change the set of formats
>>> +   supported/advertised on the CAPTURE queue. It also must change
>>> +   the format currently selected on CAPTURE queue if it is not
>>> +   supported by the newly selected OUTPUT format to a supported one.
>>> +
>>> +2. Enumerating formats on CAPTURE queue must only return CAPTURE formats
>>> +   supported for the OUTPUT format currently set.
>>> +
>>> +3. Setting/changing format on CAPTURE queue does not change formats
>>> +   available on OUTPUT queue.
>>
>> True.
>>
>>  An attempt to set CAPTURE format that
>>> +   is not supported for the currently selected OUTPUT format must
>>> +   result in an error (-EINVAL) from :c:func:`VIDIOC_S_FMT`.
>>
>> I'm not sure about that. I believe it is valid to replace it with the
>> first supported pixelformat. TRY_FMT certainly should do that.
> 
> As above, we probably should stay consistent with general semantics.
> 
> Best regards,
> Tomasz
> 

Regards,

	Hans
Tomasz Figa June 8, 2018, 10:42 a.m. UTC | #19
Hi Nicolas,

On Fri, Jun 8, 2018 at 2:31 AM Nicolas Dufresne <nicolas@ndufresne.ca> wrote:
>
> Thanks Tomasz for this work.
>
> The following is my first read review, please ignore my comments if
> they already have been mentioned by others or discussed, I'll catchup
> on the appropriate threads later on.

Thanks for review!

>
> Le mardi 05 juin 2018 à 19:33 +0900, Tomasz Figa a écrit :
> > Due to complexity of the video decoding process, the V4L2 drivers of
> > stateful decoder hardware require specific sequences of V4L2 API calls
> > to be followed. These include capability enumeration, initialization,
> > decoding, seek, pause, dynamic resolution change, flush and end of
> > stream.
> >
> > Specifics of the above have been discussed during Media Workshops at
> > LinuxCon Europe 2012 in Barcelona and then later Embedded Linux
> > Conference Europe 2014 in Düsseldorf. The de facto Codec API that
> > originated at those events was later implemented by the drivers we already
> > have merged in mainline, such as s5p-mfc or mtk-vcodec.
> >
> > The only thing missing was the real specification included as a part of
> > Linux Media documentation. Fix it now and document the decoder part of
> > the Codec API.
> >
> > Signed-off-by: Tomasz Figa <tfiga@chromium.org>
> > ---
> >  Documentation/media/uapi/v4l/dev-codec.rst | 771 +++++++++++++++++++++
> >  Documentation/media/uapi/v4l/v4l2.rst      |  14 +-
> >  2 files changed, 784 insertions(+), 1 deletion(-)
> >
> > diff --git a/Documentation/media/uapi/v4l/dev-codec.rst b/Documentation/media/uapi/v4l/dev-codec.rst
> > index c61e938bd8dc..0483b10c205e 100644
> > --- a/Documentation/media/uapi/v4l/dev-codec.rst
> > +++ b/Documentation/media/uapi/v4l/dev-codec.rst
> > @@ -34,3 +34,774 @@ the codec and reprogram it whenever another file handler gets access.
> >  This is different from the usual video node behavior where the video
> >  properties are global to the device (i.e. changing something through one
> >  file handle is visible through another file handle).
> > +
> > +This interface is generally appropriate for hardware that does not
> > +require additional software involvement to parse/partially decode/manage
> > +the stream before/after processing in hardware.
> > +
> > +Input data to the Stream API are buffers containing unprocessed video
> > +stream (Annex-B H264/H265 stream, raw VP8/9 stream) only. The driver is
>
> We should probably use HEVC instead of H265, as this is the name we
> have picked for that format.

Ack.

>
> > +expected not to require any additional information from the client to
> > +process these buffers, and to return decoded frames on the CAPTURE queue
> > +in display order.
>
> It might confuse some users, given that the first buffer for non-
> bytestream formats is special and must contain only the headers (VP8/9
> and H264_NO_SC, which is also known as H264 AVC, the format used in
> ISOMP4). Also, these formats must be framed by userspace, as it's not
> possible to divide the frames/NALs later on. I would suggest being a bit
> less strict in the introduction here.

I think we need to make a clear boundary between this stateful API and
the to-be-created stateless API. I agree, though, that the wording
is a bit unfortunate, as it suggests that userspace doesn't have
to do anything at all beyond feeding the buffers with bytes of bitstream.

As for VP8/9, I don't think it is true that the first buffer must
contain only the headers. As far as I can see, mtk-vcodec just keeps
the buffer in the queue for the next run and s5p-mfc does the same for
H264 and H264_MVC, but currently has a bug (a missing || clause in an if
condition) that keeps it from behaving this way for VP8/9.

Pawel, any thoughts on this?

[snip]
> > +visible height
> > +   height for given visible resolution
>
> I do believe 'display width/height/resolution' is more common.
>

"Visible" sounds more common for me. :)

I guess we could cross reference one with another.

> > +
> > +visible resolution
> > +   stream resolution of the visible picture, in
> > +   pixels, to be used for display purposes; must be smaller or equal to
> > +   coded resolution;
> > +
> > +visible width
> > +   width for given visible resolution
> > +
> > +Decoder
> > +=======
> > +
> > +Querying capabilities
> > +---------------------
> > +
> > +1. To enumerate the set of coded formats supported by the driver, the
> > +   client uses :c:func:`VIDIOC_ENUM_FMT` for OUTPUT. The driver must always
> > +   return the full set of supported formats, irrespective of the
> > +   format set on the CAPTURE queue.
> > +
> > +2. To enumerate the set of supported raw formats, the client uses
> > +   :c:func:`VIDIOC_ENUM_FMT` for CAPTURE. The driver must return only the
> > +   formats supported for the format currently set on the OUTPUT
> > +   queue.
> > +   In order to enumerate raw formats supported by a given coded
> > +   format, the client must first set that coded format on the
> > +   OUTPUT queue and then enumerate the CAPTURE queue.
>
> As of today, GStreamer expects an initial state, before the first
> S_FMT(OUTPUT) that results in all possible formats regardless. Later
> on, after S_FMT(OUTPUT) + header buffers has been passed, a new
> enumeration is done, and is expected to return a subset (or the same
> list).

This somewhat contradicts the general principle of V4L2 that
some format is always set by default. With the above, e0 and e1
in the example below could be different.

e0 = VIDIOC_ENUM_FMT(CAPTURE)

G_FMT(OUTPUT, &x);
S_FMT(OUTPUT, &x);

e1 = VIDIOC_ENUM_FMT(CAPTURE)

> If a better output format than the one chosen by the driver is
> found, it will be tried; if not supported, GStreamer simply keeps the
> driver-selected output format. This way, drivers don't need to do extra
> work if their output format is completely fixed by the input/headers.
> The only upstream driver that has this flexibility is CODA. To be
> fair, in GStreamer we don't need to know about the output format; it's
> simply exposed to fail earlier if users try to connect to elements
> that are incompatible by nature. We could just remove that initial
> probing and it would still work as expected. I think probing all the
> output formats is not that good an idea; with the profiles and levels
> it all becomes very complex.

It would simplify things a lot if we could agree on removing that
initial probing and making VIDIOC_ENUM_FMT(CAPTURE) return only the
formats supported for the bitstream format once it has been determined.

>
> > +
> > +3. The client may use :c:func:`VIDIOC_ENUM_FRAMESIZES` to detect supported
> > +   resolutions for a given format, passing its fourcc in
> > +   :c:type:`v4l2_frmivalenum` ``pixel_format``.
>
> Good thing this is a "may", since it's all very complex and not that
> useful with the levels and profiles. Userspace can really figure it out if
> needed.

Right...

>
> > +
> > +   a. Values returned from :c:func:`VIDIOC_ENUM_FRAMESIZES` for coded formats
> > +      must be maximums for given coded format for all supported raw
> > +      formats.
> > +
> > +   b. Values returned from :c:func:`VIDIOC_ENUM_FRAMESIZES` for raw formats must
> > +      be maximums for given raw format for all supported coded
> > +      formats.
> > +
> > +   c. The client should derive the supported resolution for a
> > +      combination of coded+raw format by calculating the
> > +      intersection of resolutions returned from calls to
> > +      :c:func:`VIDIOC_ENUM_FRAMESIZES` for the given coded and raw formats.
> > +
> > +4. Supported profiles and levels for given format, if applicable, may be
> > +   queried using their respective controls via :c:func:`VIDIOC_QUERYCTRL`.
> > +
> > +5. The client may use :c:func:`VIDIOC_ENUM_FRAMEINTERVALS` to enumerate maximum
> > +   supported framerates by the driver/hardware for a given
> > +   format+resolution combination.
>
> I think we'll need to add a section to help with that one. All drivers
> support ranges in fps. Venus has this bug where it sets a range with a
> step of 1/1, but because we expose frame intervals instead of
> frame rates, the result is not as expected. If you want an interval
> between 1 and 60 fps, that would be from 1/60s to 1/1s; there is no
> valid step that can be used, so you are forced to use CONTINUOUS or
> DISCRETE.

To be honest, I'm not sure what frame rate means in the case
of an m2m decoder. One thing that comes to mind is a performance
rating, but I wonder if this is something that we should be exposing
here, given that it would only apply when there is only one
decode instance running (since any further instances would degrade the
performance)...

>
> > +
> > +Initialization sequence
> > +-----------------------
> > +
> > +1. (optional) Enumerate supported OUTPUT formats and resolutions. See
> > +   capability enumeration.
> > +
> > +2. Set a coded format on the source queue via :c:func:`VIDIOC_S_FMT`
> > +
> > +   a. Required fields:
> > +
> > +      i.   type = OUTPUT
>
> In the introduction, maybe we could say that we use OUTPUT and CAPTURE
> to mean both variants (with and without MPLANE)?
>
> > +
> > +      ii.  fmt.pix_mp.pixelformat set to a coded format
> > +
> > +      iii. fmt.pix_mp.width, fmt.pix_mp.height only if cannot be
> > +           parsed from the stream for the given coded format;
> > +           ignored otherwise;
>
> GStreamer passes the display size, as the display size found in the
> bitstream may not match the display size selected by the container
> (e.g. ISOMP4/Matroska). I'm not sure what drivers end up doing; it was
> not really thought through, and we later query the selection to know the
> display size. We could follow this new rule by not passing anything and
> then simply picking the smaller of the bitstream display size and the
> container display size. I'm just giving a reference of what existing
> userspace may be doing at the moment, as we'll have to care about
> breaking existing software when implementing this.

At least in the case of H264, it's possible to have a display size that,
rounded up to the nearest macroblock, doesn't give the coded size, i.e. there
are some fully non-displayed macroblocks. In such a case, it wouldn't be
possible to recover the coded size from the width/height given on OUTPUT, so
I'd suggest defining them as the coded size.
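The macroblock argument can be checked with a few lines of arithmetic. Assume a progressive H.264 stream, where the coded size is a whole number of 16x16 macroblocks. Rounding a 1080-line display height up gives 1088. Yet a stream may be coded at 1104 lines with the same 1080-line visible area, so the display size alone cannot identify the coded size. The helper names are hypothetical.

```c
/* H.264 macroblocks cover 16x16 luma samples (progressive case assumed). */
#define MB_SIZE 16u

/* Round a dimension up to a whole number of macroblocks. */
unsigned int align_to_mb(unsigned int v)
{
	return (v + MB_SIZE - 1) / MB_SIZE * MB_SIZE;
}

/*
 * Nonzero if the coded size follows from the display size by macroblock
 * rounding alone; zero when the stream has extra, fully hidden
 * macroblock rows or columns.
 */
int coded_size_recoverable(unsigned int display, unsigned int coded)
{
	return align_to_mb(display) == coded;
}
```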

>
> > +
> > +   b. Return values:
> > +
> > +      i.  EINVAL: unsupported format.
> > +
> > +      ii. Others: per spec
> > +
> > +   .. note::
> > +
> > +      The driver must not adjust pixelformat, so if
> > +      ``V4L2_PIX_FMT_H264`` is passed but only
> > +      ``V4L2_PIX_FMT_H264_SLICE`` is supported, S_FMT will return
> > +      -EINVAL. If both are acceptable by client, calling S_FMT for
> > +      the other after one gets rejected may be required (or use
> > +      :c:func:`VIDIOC_ENUM_FMT` to discover beforehand, see Capability
> > +      enumeration).
>
> OK, that's new; in GStreamer we validate that the format hasn't been
> changed. It should be backward compatible, though. What we don't do
> is check back the OUTPUT format after setting the CAPTURE format; that
> would seem totally invalid. You mention that this isn't allowed later
> on, so that's great.
>
> > +
> > +3.  (optional) Get minimum number of buffers required for OUTPUT queue
> > +    via :c:func:`VIDIOC_G_CTRL`. This is useful if client intends to use
> > +    more buffers than minimum required by hardware/format (see
> > +    allocation).
>
> I have never seen such a restriction on a decoder, though it's optional
> here, so probably fine.

I don't think it's a restriction. I'd call it a hint that allows
userspace to allocate more buffers than strictly required and have
deeper queues. If there is a hardware restriction, it should be
imposed in vb2 .queue_setup() callback by altering the number of
buffers requested from REQBUFS.

>
> > +
> > +    a. Required fields:
> > +
> > +       i. id = ``V4L2_CID_MIN_BUFFERS_FOR_OUTPUT``
> > +
> > +    b. Return values: per spec.
> > +
> > +    c. Return fields:
> > +
> > +       i. value: required number of OUTPUT buffers for the currently set
> > +          format;
> > +
> > +4.  Allocate source (bitstream) buffers via :c:func:`VIDIOC_REQBUFS` on OUTPUT
> > +    queue.
> > +
> > +    a. Required fields:
> > +
> > +       i.   count = n, where n > 0.
> > +
> > +       ii.  type = OUTPUT
> > +
> > +       iii. memory = as per spec
> > +
> > +    b. Return values: Per spec.
> > +
> > +    c. Return fields:
> > +
> > +       i. count: adjusted to allocated number of buffers
> > +
> > +    d. The driver must adjust count to minimum of required number of
> > +       source buffers for given format and count passed. The client
> > +       must check this value after the ioctl returns to get the
> > +       number of buffers allocated.
> > +
> > +    .. note::
> > +
> > +       Passing count = 1 is useful for letting the driver choose
> > +       the minimum according to the selected format/hardware
> > +       requirements.
>
> This raises a question: should V4L2_CID_MIN_BUFFERS_FOR_OUTPUT really
> be the minimum, or min+1, since REQBUFS is likely to allocate min+1 to
> be efficient? Allocating just the minimum means that the decoder will
> always be idle while userspace is handling an output buffer.

It technically still allows the decoding to continue, so I'd leave this as is.

>
> > +
> > +    .. note::
> > +
> > +       To allocate more than minimum number of buffers (for pipeline
> > +       depth), use G_CTRL(``V4L2_CID_MIN_BUFFERS_FOR_OUTPUT)`` to
> > +       get minimum number of buffers required by the driver/format,
> > +       and pass the obtained value plus the number of additional
> > +       buffers needed in count to :c:func:`VIDIOC_REQBUFS`.
> > +
> > +5.  Begin parsing the stream for stream metadata via :c:func:`VIDIOC_STREAMON` on
> > +    OUTPUT queue. This step allows the driver to parse/decode
> > +    initial stream metadata until enough information to allocate
> > +    CAPTURE buffers is found. This is indicated by the driver by
> > +    sending a ``V4L2_EVENT_SOURCE_CHANGE`` event, which the client
> > +    must handle.
>
> GStreamer still uses the legacy path, expecting G_FMT to block if there are
> headers in the queue. Do we want to document this legacy method or
> not?

Would it break really badly if G_FMT stopped blocking? IMHO the less
legacy behavior, the better, but I'm afraid we need to maintain
compatibility here...

>
> > +
> > +    a. Required fields: as per spec.
> > +
> > +    b. Return values: as per spec.
> > +
> > +    .. note::
> > +
> > +       Calling :c:func:`VIDIOC_REQBUFS`, :c:func:`VIDIOC_STREAMON`
> > +       or :c:func:`VIDIOC_G_FMT` on the CAPTURE queue at this time is not
> > +       allowed and must return EINVAL.
> > +
> > +6.  This step only applies for coded formats that contain resolution
> > +    information in the stream.
> > +    Continue queuing/dequeuing bitstream buffers to/from the
> > +    OUTPUT queue via :c:func:`VIDIOC_QBUF` and :c:func:`VIDIOC_DQBUF`. The driver
> > +    must keep processing and returning each buffer to the client
> > +    until required metadata to send a ``V4L2_EVENT_SOURCE_CHANGE``
> > +    for source change type ``V4L2_EVENT_SRC_CH_RESOLUTION`` is
> > +    found. There is no requirement to pass enough data for this to
> > +    occur in the first buffer and the driver must be able to
> > +    process any number
> > +
> > +    a. Required fields: as per spec.
> > +
> > +    b. Return values: as per spec.
> > +
> > +    c. If data in a buffer that triggers the event is required to decode
> > +       the first frame, the driver must not return it to the client,
> > +       but must retain it for further decoding.
> > +
> > +    d. Until the resolution source event is sent to the client, calling
> > +       :c:func:`VIDIOC_G_FMT` on the CAPTURE queue must return -EINVAL.
> > +
> > +    .. note::
> > +
> > +       No decoded frames are produced during this phase.
> > +
> > +7.  This step only applies for coded formats that contain resolution
> > +    information in the stream.
> > +    Receive and handle ``V4L2_EVENT_SOURCE_CHANGE`` from the driver
> > +    via :c:func:`VIDIOC_DQEVENT`. The driver must send this event once
> > +    enough data is obtained from the stream to allocate CAPTURE
> > +    buffers and to begin producing decoded frames.
> > +
> > +    a. Required fields:
> > +
> > +       i. type = ``V4L2_EVENT_SOURCE_CHANGE``
> > +
> > +    b. Return values: as per spec.
> > +
> > +    c. The driver must return u.src_change.changes =
> > +       ``V4L2_EVENT_SRC_CH_RESOLUTION``.
> > +
> > +8.  This step only applies for coded formats that contain resolution
> > +    information in the stream.
> > +    Call :c:func:`VIDIOC_G_FMT` for CAPTURE queue to get format for the
> > +    destination buffers parsed/decoded from the bitstream.
> > +
> > +    a. Required fields:
> > +
> > +       i. type = CAPTURE
> > +
> > +    b. Return values: as per spec.
> > +
> > +    c. Return fields:
> > +
> > +       i.   fmt.pix_mp.width, fmt.pix_mp.height: coded resolution
> > +            for the decoded frames
> > +
> > +       ii.  fmt.pix_mp.pixelformat: default/required/preferred by
> > +            driver pixelformat for decoded frames.
> > +
> > +       iii. num_planes: set to number of planes for pixelformat.
> > +
> > +       iv.  For each plane p = [0, num_planes-1]:
> > +            plane_fmt[p].sizeimage, plane_fmt[p].bytesperline as
> > +            per spec for coded resolution.
> > +
> > +    .. note::
> > +
> > +       The value of pixelformat may be any pixel format supported,
> > +       and must
> > +       be supported for current stream, based on the information
> > +       parsed from the stream and hardware capabilities. It is
> > +       suggested that driver chooses the preferred/optimal format
> > +       for given configuration. For example, a YUV format may be
> > +       preferred over an RGB format, if additional conversion step
> > +       would be required.
> > +
> > +9.  (optional) Enumerate CAPTURE formats via :c:func:`VIDIOC_ENUM_FMT` on
> > +    CAPTURE queue.
> > +    Once the stream information is parsed and known, the client
> > +    may use this ioctl to discover which raw formats are supported
> > +    for given stream and select one of them via :c:func:`VIDIOC_S_FMT`.
> > +
> > +    a. Fields/return values as per spec.
> > +
> > +    .. note::
> > +
> > +       The driver must return only formats supported for the
> > +       current stream parsed in this initialization sequence, even
> > +       if more formats may be supported by the driver in general.
> > +       For example, a driver/hardware may support YUV and RGB
> > +       formats for resolutions 1920x1088 and lower, but only YUV for
> > +       higher resolutions (e.g. due to memory bandwidth
> > +       limitations). After parsing a resolution of 1920x1088 or
> > +       lower, :c:func:`VIDIOC_ENUM_FMT` may return a set of YUV and RGB
> > +       pixelformats, but after parsing resolution higher than
> > +       1920x1088, the driver must not return (unsupported for this
> > +       resolution) RGB.
> > +
> > +       However, a subsequent resolution change event, triggered after
> > +       discovering a resolution change within the same stream, may
> > +       switch the stream to a lower resolution; in that case,
> > +       :c:func:`VIDIOC_ENUM_FMT` must return RGB formats again.
> > +
> > +10.  (optional) Choose a different CAPTURE format than suggested via
> > +     :c:func:`VIDIOC_S_FMT` on CAPTURE queue. It is possible for the client
> > +     to choose a different format than selected/suggested by the
> > +     driver in :c:func:`VIDIOC_G_FMT`.
> > +
> > +     a. Required fields:
> > +
> > +        i.  type = CAPTURE
> > +
> > +        ii. fmt.pix_mp.pixelformat set to a raw format
> > +
> > +     b. Return values:
> > +
> > +        i. EINVAL: unsupported format.
> > +
> > +     c. Calling :c:func:`VIDIOC_ENUM_FMT` to discover currently available formats
> > +        after receiving ``V4L2_EVENT_SOURCE_CHANGE`` is useful to find
> > +        out a set of allowed pixelformats for given configuration,
> > +        but not required.
> > +
> > +11.  (optional) Acquire visible resolution via :c:func:`VIDIOC_G_SELECTION`.
> > +
> > +    a. Required fields:
> > +
> > +       i.  type = CAPTURE
> > +
> > +       ii. target = ``V4L2_SEL_TGT_CROP``
> > +
> > +    b. Return values: per spec.
> > +
> > +    c. Return fields
> > +
> > +       i. r.left, r.top, r.width, r.height: visible rectangle; this must
> > +          fit within coded resolution returned from :c:func:`VIDIOC_G_FMT`.
> > +
> > +12. (optional) Get minimum number of buffers required for CAPTURE queue
> > +    via :c:func:`VIDIOC_G_CTRL`. This is useful if client intends to use
> > +    more buffers than minimum required by hardware/format (see
> > +    allocation).
>
> Should not be optional if the driver has this restriction.

As before, this is not about hardware restrictions (which would be
handled at REQBUFS/queue_setup level), but an optimization hint for
userspace.

>
> > +
> > +    a. Required fields:
> > +
> > +       i. id = ``V4L2_CID_MIN_BUFFERS_FOR_CAPTURE``
> > +
> > +    b. Return values: per spec.
> > +
> > +    c. Return fields:
> > +
> > +       i. value: minimum number of buffers required to decode the stream
> > +          parsed in this initialization sequence.
> > +
> > +    .. note::
> > +
> > +       The minimum number of buffers must be at least the number
> > +       required to successfully decode the current stream. For
> > +       example, this may be the required DPB size for an H.264
> > +       stream, given the parsed stream configuration (resolution,
> > +       level).
> > +
> > +13. Allocate destination (raw format) buffers via :c:func:`VIDIOC_REQBUFS` on the
> > +    CAPTURE queue.
> > +
> > +    a. Required fields:
> > +
> > +       i.   count = n, where n > 0.
> > +
> > +       ii.  type = CAPTURE
> > +
> > +       iii. memory = as per spec
> > +
> > +    b. Return values: Per spec.
> > +
> > +    c. Return fields:
> > +
> > +       i. count: adjusted to allocated number of buffers.
> > +
> > +    d. The driver must adjust count to at least the minimum number of
> > +       destination buffers required for the given format and stream
> > +       configuration. The client must check this value after the
> > +       ioctl returns to get the number of buffers allocated.
> > +
> > +    .. note::
> > +
> > +       Passing count = 1 is useful for letting the driver choose
> > +       the minimum.
> > +
> > +    .. note::
> > +
> > +       To allocate more than minimum number of buffers (for pipeline
> > +       depth), use G_CTRL(``V4L2_CID_MIN_BUFFERS_FOR_CAPTURE``) to
> > +       get minimum number of buffers required, and pass the obtained
> > +       value plus the number of additional buffers needed in count
> > +       to :c:func:`VIDIOC_REQBUFS`.
> > +
> > +14. Call :c:func:`VIDIOC_STREAMON` to initiate decoding frames.
> > +
> > +    a. Required fields: as per spec.
> > +
> > +    b. Return values: as per spec.
> > +
> > +Decoding
> > +--------
> > +
> > +This state is reached after a successful initialization sequence. In
> > +this state, client queues and dequeues buffers to both queues via
> > +:c:func:`VIDIOC_QBUF` and :c:func:`VIDIOC_DQBUF`, as per spec.
> > +
> > +Both queues operate independently. The client may queue and dequeue
> > +buffers in any order and at any rate, including a different rate for
> > +each queue. The client may queue buffers within the same queue in
> > +any order (V4L2 index-wise). It is recommended for the client to operate
> > +the queues independently for best performance.
> > +
> > +Source OUTPUT buffers must contain:
> > +
> > +-  H.264/AVC: one or more complete NALUs of an Annex B elementary
> > +   stream; one buffer does not have to contain enough data to decode
> > +   a frame;
> > +
> > +-  VP8/VP9: one or more complete frames.
> > +
> > +No direct relationship between source and destination buffers and the
> > +timing of buffers becoming available to dequeue should be assumed in the
> > +Stream API. Specifically:
> > +
> > +-  a buffer queued to OUTPUT queue may result in no buffers being
> > +   produced on the CAPTURE queue (e.g. if it does not contain
> > +   encoded data, or if only metadata syntax structures are present
> > +   in it), or one or more buffers produced on the CAPTURE queue (if
> > +   the encoded data contained more than one frame, or if returning a
> > +   decoded frame allowed the driver to return a frame that preceded
> > +   it in decode, but succeeded it in display order)
> > +
> > +-  a buffer queued to OUTPUT may result in a buffer being produced on
> > +   the CAPTURE queue later into decode process, and/or after
> > +   processing further OUTPUT buffers, or be returned out of order,
> > +   e.g. if display reordering is used
> > +
> > +-  buffers may become available on the CAPTURE queue without additional
> > +   buffers queued to OUTPUT (e.g. during flush or EOS)
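Because no direct OUTPUT-to-CAPTURE relationship can be assumed, one common client approach is to give every OUTPUT buffer a unique timestamp and rely on the driver copying it to the CAPTURE buffer decoded from it (drivers advertising `V4L2_BUF_FLAG_TIMESTAMP_COPY` do this). The sketch below is a hypothetical client-side metadata table keyed by that timestamp; the struct, slot count and function names are all illustrative:

```c
#include <stddef.h>
#include <stdint.h>
#include <sys/time.h>

#define META_SLOTS 32 /* hypothetical maximum frames in flight */

struct frame_meta {
    uint64_t ts_us; /* timestamp put on the OUTPUT buffer, microseconds */
    int64_t pts;    /* client-side presentation time */
    int in_use;
};

/* v4l2_buffer.timestamp is a struct timeval. */
uint64_t tv_to_us(const struct timeval *tv)
{
    return (uint64_t)tv->tv_sec * 1000000u + (uint64_t)tv->tv_usec;
}

/* Remember metadata before queuing a bitstream buffer to OUTPUT. */
int meta_store(struct frame_meta *tbl, uint64_t ts_us, int64_t pts)
{
    for (size_t i = 0; i < META_SLOTS; i++) {
        if (!tbl[i].in_use) {
            tbl[i] = (struct frame_meta){ ts_us, pts, 1 };
            return 0;
        }
    }
    return -1; /* table full */
}

/* Look up and release metadata for a dequeued CAPTURE buffer. */
int meta_take(struct frame_meta *tbl, uint64_t ts_us, int64_t *pts)
{
    for (size_t i = 0; i < META_SLOTS; i++) {
        if (tbl[i].in_use && tbl[i].ts_us == ts_us) {
            *pts = tbl[i].pts;
            tbl[i].in_use = 0;
            return 0;
        }
    }
    return -1; /* no matching OUTPUT buffer */
}
```

Entries that are never matched (for example frames the decoder dropped) would have to be discarded by the client, e.g. on seek.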
>
> There is no mention of timestamp passing and
> V4L2_BUF_FLAG_TIMESTAMP_COPY. These though are rather important
> respectively to match decoded frames with appropriate metadata and to
> discard stored metadata from the userspace queue.
>
> Unlike the suggestion here, most decoders are frame-based; it would be
> nice to check if this is an actual firmware limitation in certain
> cases.

I think those buffers becoming "available on the CAPTURE queue without
additional buffers queued to OUTPUT" would actually result from some
buffers queued to OUTPUT in the past, but being delayed perhaps due to
some nuances of given codec.

Pawel, any further insights?

>
> > +
> > +Seek
> > +----
> > +
> > +Seek is controlled by the OUTPUT queue, as it is the source of bitstream
> > +data. CAPTURE queue remains unchanged/unaffected.
> > +
> > +1. Stop the OUTPUT queue to begin the seek sequence via
> > +   :c:func:`VIDIOC_STREAMOFF`.
> > +
> > +   a. Required fields:
> > +
> > +      i. type = OUTPUT
> > +
> > +   b. The driver must drop all the pending OUTPUT buffers and they are
> > +      treated as returned to the client (as per spec).
> > +
> > +2. Restart the OUTPUT queue via :c:func:`VIDIOC_STREAMON`
> > +
> > +   a. Required fields:
> > +
> > +      i. type = OUTPUT
> > +
> > +   b. The driver must enter the post-seek state and be ready to
> > +      accept new source bitstream buffers.
> > +
> > +3. Start queuing buffers to OUTPUT queue containing stream data after
> > +   the seek until a suitable resume point is found.
> > +
> > +   .. note::
> > +
> > +      There is no requirement to begin queuing stream
> > +      starting exactly from a resume point (e.g. SPS or a keyframe).
> > +      The driver must handle any data queued and must keep processing
> > +      the queued buffers until it finds a suitable resume point.
> > +      While looking for a resume point, the driver processes OUTPUT
> > +      buffers and returns them to the client without producing any
> > +      decoded frames.
>
> I have some doubts that this actually works. What you describe here is
> a flush/reset sequence. The drivers I have worked with totally forget
> about their state after STREAMOFF on any queue. The initialization
> process needs to happen again. Though, adding this support with just
> resetting the OUTPUT queue should be backward compatible. GStreamer
> always calls STREAMOFF on both sides.

I believe this should work with s5p-mfc and mtk-vcodec, +/- the
general bugginess of s5p-mfc stop_streaming handling.

>
> > +
> > +4. After a resume point is found, the driver will start returning
> > +   CAPTURE buffers with decoded frames.
> > +
> > +   .. note::
> > +
> > +      There is no precise specification for CAPTURE queue of when it
> > +      will start producing buffers containing decoded data from
> > +      buffers queued after the seek, as it operates independently
> > +      from OUTPUT queue.
>
> Also, in practice it is totally un-reliable to start from random point.
> Some decoders will produce corrupted frames, some will wait; you never
> know. Seek code in ffmpeg, gstreamer, vlc, etc. always picks a good
> sync point. Then marks the extra as "decode only", hence the need for
> matching input/output for metadata, and drops the extra.

Are you sure that drivers of codecs which produce corrupted frames
couldn't be fixed to catch such (by some error signaled from hw) and
not return to userspace?

> > +
> > +      -  The driver may return a number of remaining CAPTURE buffers
> > +         containing frames decoded from data queued before the seek,
> > +         even after the seek sequence (STREAMOFF-STREAMON) is performed.
>
> This is not a proper seek. That's probably why we also streamoff the
> capture queue to get rid of these ancient buffers. This seems only
> useful if you are trying to do seamless seeking (aka non flushing
> seek), which is a very niche use case.

Good point. When user moves the seek bar in the player, we don't care
about the already queued buffers anymore, we just want to start
decoding from the new point as fast as possible.

Pawel, any thoughts?

>
> > +
> > +      -  The driver may also omit frames for data that was queued, but
> > +         not yet decoded, before the seek sequence was initiated.
> > +         E.g. for an OUTPUT queue sequence: QBUF(A), QBUF(B),
> > +         STREAMOFF(OUT), STREAMON(OUT), QBUF(G), QBUF(H), any of the
> > +         following results on the CAPTURE queue is allowed: {A’, B’, G’,
> > +         H’}, {A’, G’, H’}, {G’, H’}.
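The three allowed outcomes listed above share one shape: some prefix of the pre-seek frames (in queue order), followed by all post-seek frames. The checker below encodes that rule for illustration only. The rule is our generalization of the example, not stated by the text, and it rejects outcomes such as {B’, G’, H’} that the text neither lists nor explicitly forbids. Frames are single characters ('A' stands for A’):

```c
#include <stddef.h>
#include <string.h>

/* 1 if got == (prefix of pre) followed by all of post, else 0. */
int seek_result_allowed(const char *pre, const char *post, const char *got)
{
    size_t pre_len = strlen(pre);
    size_t post_len = strlen(post);
    size_t got_len = strlen(got);

    if (got_len < post_len || got_len - post_len > pre_len)
        return 0;
    size_t kept = got_len - post_len; /* pre-seek frames returned */
    return strncmp(got, pre, kept) == 0 && strcmp(got + kept, post) == 0;
}
```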
> > +
> > +Pause
> > +-----
> > +
> > +In order to pause, the client should just cease queuing buffers onto the
> > +OUTPUT queue. This is different from the general V4L2 API definition of
> > +pause, which involves calling :c:func:`VIDIOC_STREAMOFF` on the queue. Without
> > +source bitstream data, there is no data to process and the hardware
> > +remains idle. Conversely, using :c:func:`VIDIOC_STREAMOFF` on OUTPUT queue
> > +indicates a seek, which 1) drops all buffers in flight and 2) after a
> > +subsequent :c:func:`VIDIOC_STREAMON` will look for and only continue from a
> > +resume point. This is usually undesirable for pause. The
> > +STREAMOFF-STREAMON sequence is intended for seeking.
> > +
> > +Similarly, the CAPTURE queue should remain streaming as well, as the
> > +STREAMOFF-STREAMON sequence on it is intended solely for changing buffer
> > +sets.
> > +
> > +Dynamic resolution change
> > +-------------------------
> > +
> > +When driver encounters a resolution change in the stream, the dynamic
> > +resolution change sequence is started.
> > +
> > +1.  On encountering a resolution change in the stream, the driver must
> > +    first process and decode all remaining buffers from before the
> > +    resolution change point.
> > +
> > +2.  After all buffers containing decoded frames from before the
> > +    resolution change point are ready to be dequeued on the
> > +    CAPTURE queue, the driver sends a ``V4L2_EVENT_SOURCE_CHANGE``
> > +    event for source change type ``V4L2_EVENT_SRC_CH_RESOLUTION``.
> > +    The last buffer from before the change must be marked with
> > +    :c:type:`v4l2_buffer` ``flags`` flag ``V4L2_BUF_FLAG_LAST`` as in the flush
> > +    sequence.
> > +
> > +    .. note::
> > +
> > +       Any attempts to dequeue more buffers beyond the buffer marked
> > +       with ``V4L2_BUF_FLAG_LAST`` will result in a -EPIPE error from
> > +       :c:func:`VIDIOC_DQBUF`.
> > +
> > +3.  After dequeuing all remaining buffers from the CAPTURE queue, the
> > +    client must call :c:func:`VIDIOC_STREAMOFF` on the CAPTURE queue. The
> > +    OUTPUT queue remains streaming (calling STREAMOFF on it would
> > +    trigger a seek).
> > +    Until STREAMOFF is called on the CAPTURE queue (acknowledging
> > +    the event), the driver operates as if the resolution hasn’t
> > +    changed yet, i.e. :c:func:`VIDIOC_G_FMT`, etc. return previous
> > +    resolution.
>
> It's a bit more complicated: if the resolution increases, the encoded
> buffer size needed to fit a full frame may be bigger. Reallocation of
> the output queue may be needed. On some memory-constrained devices, we'll
> also want to reallocate if it's going smaller. FFMPEG implements
> something clever for selecting the size; the CODA driver does the same,
> but in the driver. It's a bit of a mess.
>
> In the long term, for gapless change (specially with CMA) we might want
> to support using larger buffers in the CAPTURE queue to avoid
> reallocation. OMX supports this.

Agreed. Although I wonder if all hardware can equally support it
(s5p-mfc does, though).

>
> > +
> > +4.  The client frees the buffers on the CAPTURE queue using
> > +    :c:func:`VIDIOC_REQBUFS`.
> > +
> > +    a. Required fields:
> > +
> > +       i.   count = 0
> > +
> > +       ii.  type = CAPTURE
> > +
> > +       iii. memory = as per spec
> > +
> > +5.  The client calls :c:func:`VIDIOC_G_FMT` for CAPTURE to get the new format
> > +    information.
> > +    This is identical to calling :c:func:`VIDIOC_G_FMT` after
> > +    ``V4L2_EVENT_SRC_CH_RESOLUTION`` in the initialization
> > +    sequence and should be handled similarly.
> > +
> > +    .. note::
> > +
> > +       It is allowed for the driver not to support the same
> > +       pixelformat as previously used (before the resolution change)
> > +       for the new resolution. The driver must select a default
> > +       supported pixelformat and return it from :c:func:`VIDIOC_G_FMT`, and
> > +       client must take note of it.
> > +
> > +6.  (optional) The client is allowed to enumerate available formats and
> > +    select a different one than currently chosen (returned via
> > +    :c:func:`VIDIOC_G_FMT`). This is identical to a corresponding step in
> > +    the initialization sequence.
> > +
> > +7.  (optional) The client acquires visible resolution as in
> > +    initialization sequence.
> > +
> > +8.  (optional) The client acquires minimum number of buffers as in
> > +    initialization sequence.
> > +
> > +9.  The client allocates a new set of buffers for the CAPTURE queue via
> > +    :c:func:`VIDIOC_REQBUFS`. This is identical to a corresponding step in
> > +    the initialization sequence.
> > +
> > +10. The client resumes decoding by issuing :c:func:`VIDIOC_STREAMON` on the
> > +    CAPTURE queue.
> > +
> > +During the resolution change sequence, the OUTPUT queue must remain
> > +streaming. Calling :c:func:`VIDIOC_STREAMOFF` on OUTPUT queue will initiate seek.
> > +
> > +The OUTPUT queue operates separately from the CAPTURE queue for the
> > +duration of the entire resolution change sequence. It is allowed (and
> > +recommended for best performance and simplicity) for the client to keep
> > +queuing/dequeuing buffers from/to OUTPUT queue even while processing
> > +this sequence.
> > +
> > +.. note::
> > +
> > +   It is also possible for this sequence to be triggered without
> > +   change in resolution if a different number of CAPTURE buffers is
> > +   required in order to continue decoding the stream.
>
> Perhaps the driver should be queried for the new display resolution
> through G_SELECTION ?
>

That's true. Actually, I guess that there might be a case when coded
resolution doesn't change but the visible one does.

> > +
> > +Flush
> > +-----
> > +
> > +Flush is the process of draining the CAPTURE queue of any remaining
>
> Ok, call this Drain if it's the process of draining, it's really
> confusing as GStreamer makes a distinction between flush (getting rid
> of, like a reset) and draining (which involved displaying the
> remaining, but stop producing new data).

Drain sounds good to me.

>
> > +buffers. After the flush sequence is complete, the client has received
> > +all decoded frames for all OUTPUT buffers queued before the sequence was
> > +started.
> > +
> > +1. Begin flush by issuing :c:func:`VIDIOC_DECODER_CMD`.
> > +
> > +   a. Required fields:
> > +
> > +      i. cmd = ``V4L2_DEC_CMD_STOP``
> > +
> > +2. The driver must process and decode as normal all OUTPUT buffers
> > +   queued by the client before the :c:func:`VIDIOC_DECODER_CMD` was
> > +   issued.
> > +   Any operations triggered as a result of processing these
> > +   buffers (including the initialization and resolution change
> > +   sequences) must be processed as normal by both the driver and
> > +   the client before proceeding with the flush sequence.
> > +
> > +3. Once all OUTPUT buffers queued before ``V4L2_DEC_CMD_STOP`` are
> > +   processed:
> > +
> > +   a. If the CAPTURE queue is streaming, once all decoded frames (if
> > +      any) are ready to be dequeued on the CAPTURE queue, the
> > +      driver must send a ``V4L2_EVENT_EOS``. The driver must also
>
> I have never used the EOS event, and I bet many drivers don't implement
> it, why is that a must ?

Coda, venus, s5p-mfc do. The event is defined for this purpose, so it
should be consistently supported.

>
> > +      set ``V4L2_BUF_FLAG_LAST`` in :c:type:`v4l2_buffer` ``flags`` field on the
>
> In MFC we don't know, so we use the other method, EPIPE, there will be
> no FLAG_LAST on MFC, it's just not possible with that firmware. So
> FLAG_LAST is preferred, EPIPE is the fallback.

That's not true. s5p-mfc marks last buffer with V4L2_BUF_FLAG_LAST:
https://elixir.bootlin.com/linux/latest/source/drivers/media/platform/s5p-mfc/s5p_mfc.c#L223

We use MFC in production on Exynos 5250 and 5420 devices and it seems
to work fine.

>
> > +      buffer on the CAPTURE queue containing the last frame (if
> > +      any) produced as a result of processing the OUTPUT buffers
> > +      queued before ``V4L2_DEC_CMD_STOP``. If no more frames are
> > +      left to be returned at the point of handling
> > +      ``V4L2_DEC_CMD_STOP``, the driver must return an empty buffer
> > +      (with :c:type:`v4l2_buffer` ``bytesused`` = 0) as the last buffer with
> > +      ``V4L2_BUF_FLAG_LAST`` set instead.
> > +      Any attempts to dequeue more buffers beyond the buffer
> > +      marked with ``V4L2_BUF_FLAG_LAST`` will result in a -EPIPE
> > +      error from :c:func:`VIDIOC_DQBUF`.
> > +
> > +   b. If the CAPTURE queue is NOT streaming, no action is necessary for
> > +      CAPTURE queue and the driver must send a ``V4L2_EVENT_EOS``
> > +      immediately after all OUTPUT buffers in question have been
> > +      processed.
> > +
> > +4. To resume, the client may issue ``V4L2_DEC_CMD_START``.
> > +
> > +End of stream
> > +-------------
> > +
> > +When an explicit end of stream is encountered by the driver in the
> > +stream, it must send a ``V4L2_EVENT_EOS`` to the client after all frames
> > +are decoded and ready to be dequeued on the CAPTURE queue, with the
> > +:c:type:`v4l2_buffer` ``flags`` set to ``V4L2_BUF_FLAG_LAST``. This behavior is
> > +identical to the flush sequence as if triggered by the client via
> > +``V4L2_DEC_CMD_STOP``.
>
> I never heard of such a thing as an implicit EOS, can you elaborate ?

I think this is about EOS coming from the stream versus EOS coming
from userspace (triggered by CMD_STOP).

>
> > +
> > +Commit points
> > +-------------
> > +
> > +Setting formats and allocating buffers triggers changes in the behavior
> > +of the driver.
> > +
> > +1. Setting the format on the OUTPUT queue may change the set of formats
> > +   supported/advertised on the CAPTURE queue. If the format currently
> > +   selected on the CAPTURE queue is not supported for the newly selected
> > +   OUTPUT format, the driver must also change it to a supported one.
> > +
> > +2. Enumerating formats on CAPTURE queue must only return CAPTURE formats
> > +   supported for the OUTPUT format currently set.
> > +
> > +3. Setting/changing format on CAPTURE queue does not change formats
> > +   available on OUTPUT queue. An attempt to set CAPTURE format that
> > +   is not supported for the currently selected OUTPUT format must
> > +   result in an error (-EINVAL) from :c:func:`VIDIOC_S_FMT`.
>
> That's great clarification !

As per other replies, we actually should implement the standard
semantics of TRY_/S_FMT and have the driver adjust to a supported
setting.

Best regards,
Tomasz
Stanimir Varbanov June 14, 2018, 12:34 p.m. UTC | #20
Hi Tomasz,


On 06/05/2018 01:33 PM, Tomasz Figa wrote:
> diff --git a/Documentation/media/uapi/v4l/dev-codec.rst b/Documentation/media/uapi/v4l/dev-codec.rst
> index c61e938bd8dc..0483b10c205e 100644
> --- a/Documentation/media/uapi/v4l/dev-codec.rst
> +++ b/Documentation/media/uapi/v4l/dev-codec.rst

<snip>

> +Initialization sequence
> +-----------------------
> +
> +1. (optional) Enumerate supported OUTPUT formats and resolutions. See
> +   capability enumeration.
> +
> +2. Set a coded format on the source queue via :c:func:`VIDIOC_S_FMT`
> +
> +   a. Required fields:
> +
> +      i.   type = OUTPUT
> +
> +      ii.  fmt.pix_mp.pixelformat set to a coded format
> +
> +      iii. fmt.pix_mp.width, fmt.pix_mp.height only if they cannot be
> +           parsed from the stream for the given coded format;
> +           ignored otherwise;

Can we say that if width != 0 and height != 0 then the user knows the
real coded resolution? And vice versa, if width/height are both zero, the
driver should parse the stream metadata?

Also what about fmt.pix_mp.plane_fmt.sizeimage, as per spec (S_FMT) this
field should be filled with correct image size? If the coded
width/height is zero sizeimage will be unknown. I think we have two
options: either the user fills sizeimage with a big enough size, or the
driver has to have some default size.

> +
> +   b. Return values:
> +
> +      i.  EINVAL: unsupported format.
> +
> +      ii. Others: per spec
> +
> +   .. note::
> +
> +      The driver must not adjust pixelformat, so if
> +      ``V4L2_PIX_FMT_H264`` is passed but only
> +      ``V4L2_PIX_FMT_H264_SLICE`` is supported, S_FMT will return
> +      -EINVAL. If both are acceptable by client, calling S_FMT for
> +      the other after one gets rejected may be required (or use
> +      :c:func:`VIDIOC_ENUM_FMT` to discover beforehand, see Capability
> +      enumeration).
> +
> +3.  (optional) Get minimum number of buffers required for OUTPUT queue
> +    via :c:func:`VIDIOC_G_CTRL`. This is useful if client intends to use
> +    more buffers than minimum required by hardware/format (see
> +    allocation).
> +
> +    a. Required fields:
> +
> +       i. id = ``V4L2_CID_MIN_BUFFERS_FOR_OUTPUT``
> +
> +    b. Return values: per spec.
> +
> +    c. Return fields:
> +
> +       i. value: required number of OUTPUT buffers for the currently set
> +          format;
> +
> +4.  Allocate source (bitstream) buffers via :c:func:`VIDIOC_REQBUFS` on OUTPUT
> +    queue.
> +
> +    a. Required fields:
> +
> +       i.   count = n, where n > 0.
> +
> +       ii.  type = OUTPUT
> +
> +       iii. memory = as per spec
> +
> +    b. Return values: Per spec.
> +
> +    c. Return fields:
> +
> +       i. count: adjusted to allocated number of buffers
> +
> +    d. The driver must adjust count to at least the minimum number of
> +       source buffers required for the given format. The client
> +       must check this value after the ioctl returns to get the
> +       number of buffers allocated.
> +
> +    .. note::
> +
> +       Passing count = 1 is useful for letting the driver choose
> +       the minimum according to the selected format/hardware
> +       requirements.
> +
> +    .. note::
> +
> +       To allocate more than minimum number of buffers (for pipeline
> +       depth), use G_CTRL(``V4L2_CID_MIN_BUFFERS_FOR_OUTPUT``) to
> +       get minimum number of buffers required by the driver/format,
> +       and pass the obtained value plus the number of additional
> +       buffers needed in count to :c:func:`VIDIOC_REQBUFS`.
> +
> +5.  Begin parsing the stream for stream metadata via :c:func:`VIDIOC_STREAMON` on
> +    OUTPUT queue. This step allows the driver to parse/decode
> +    initial stream metadata until enough information to allocate
> +    CAPTURE buffers is found. This is indicated by the driver by
> +    sending a ``V4L2_EVENT_SOURCE_CHANGE`` event, which the client
> +    must handle.
> +
> +    a. Required fields: as per spec.
> +
> +    b. Return values: as per spec.
> +
> +    .. note::
> +
> +       Calling :c:func:`VIDIOC_REQBUFS`, :c:func:`VIDIOC_STREAMON`
> +       or :c:func:`VIDIOC_G_FMT` on the CAPTURE queue at this time is not
> +       allowed and must return EINVAL.
> +
> +6.  This step only applies for coded formats that contain resolution
> +    information in the stream.

Maybe an example of such coded formats would be good to have.

> +    Continue queuing/dequeuing bitstream buffers to/from the
> +    OUTPUT queue via :c:func:`VIDIOC_QBUF` and :c:func:`VIDIOC_DQBUF`. The driver
> +    must keep processing and returning each buffer to the client
> +    until required metadata to send a ``V4L2_EVENT_SOURCE_CHANGE``
> +    for source change type ``V4L2_EVENT_SRC_CH_RESOLUTION`` is
> +    found. There is no requirement to pass enough data for this to
> +    occur in the first buffer, and the driver must be able to
> +    process any number of buffers.
> +
> +    a. Required fields: as per spec.
> +
> +    b. Return values: as per spec.
> +
> +    c. If data in a buffer that triggers the event is required to decode
> +       the first frame, the driver must not return it to the client,
> +       but must retain it for further decoding.
> +
> +    d. Until the resolution source event is sent to the client, calling
> +       :c:func:`VIDIOC_G_FMT` on the CAPTURE queue must return -EINVAL.
> +
> +    .. note::
> +
> +       No decoded frames are produced during this phase.
> +

<snip>
Nicolas Dufresne June 14, 2018, 1:56 p.m. UTC | #21
Le jeudi 14 juin 2018 à 15:34 +0300, Stanimir Varbanov a écrit :
> Hi Tomasz,
> 
> 
> On 06/05/2018 01:33 PM, Tomasz Figa wrote:
> > diff --git a/Documentation/media/uapi/v4l/dev-codec.rst b/Documentation/media/uapi/v4l/dev-codec.rst
> > index c61e938bd8dc..0483b10c205e 100644
> > --- a/Documentation/media/uapi/v4l/dev-codec.rst
> > +++ b/Documentation/media/uapi/v4l/dev-codec.rst
> 
> <snip>
> 
> > +Initialization sequence
> > +-----------------------
> > +
> > +1. (optional) Enumerate supported OUTPUT formats and resolutions. See
> > +   capability enumeration.
> > +
> > +2. Set a coded format on the source queue via :c:func:`VIDIOC_S_FMT`
> > +
> > +   a. Required fields:
> > +
> > +      i.   type = OUTPUT
> > +
> > +      ii.  fmt.pix_mp.pixelformat set to a coded format
> > +
> > +      iii. fmt.pix_mp.width, fmt.pix_mp.height only if they cannot be
> > +           parsed from the stream for the given coded format;
> > +           ignored otherwise;
> 
> Can we say that if width != 0 and height != 0 then the user knows the
> real coded resolution? And vice versa, if width/height are both zero, the
> driver should parse the stream metadata?

The driver always needs to parse the stream metadata, since there could
be an x,y offset too, it's not just right/bottom cropping. And then
G_SELECTION is required.

> 
> Also what about fmt.pix_mp.plane_fmt.sizeimage, as per spec (S_FMT) this
> field should be filled with correct image size? If the coded
> width/height is zero sizeimage will be unknown. I think we have two
> options: either the user fills sizeimage with a big enough size, or the
> driver has to have some default size.
> 
> > +
> > +   b. Return values:
> > +
> > +      i.  EINVAL: unsupported format.
> > +
> > +      ii. Others: per spec
> > +
> > +   .. note::
> > +
> > +      The driver must not adjust pixelformat, so if
> > +      ``V4L2_PIX_FMT_H264`` is passed but only
> > +      ``V4L2_PIX_FMT_H264_SLICE`` is supported, S_FMT will return
> > +      -EINVAL. If both are acceptable by client, calling S_FMT for
> > +      the other after one gets rejected may be required (or use
> > +      :c:func:`VIDIOC_ENUM_FMT` to discover beforehand, see Capability
> > +      enumeration).
> > +
> > +3.  (optional) Get minimum number of buffers required for OUTPUT queue
> > +    via :c:func:`VIDIOC_G_CTRL`. This is useful if client intends to use
> > +    more buffers than minimum required by hardware/format (see
> > +    allocation).
> > +
> > +    a. Required fields:
> > +
> > +       i. id = ``V4L2_CID_MIN_BUFFERS_FOR_OUTPUT``
> > +
> > +    b. Return values: per spec.
> > +
> > +    c. Return fields:
> > +
> > +       i. value: required number of OUTPUT buffers for the currently set
> > +          format;
> > +
> > +4.  Allocate source (bitstream) buffers via :c:func:`VIDIOC_REQBUFS` on OUTPUT
> > +    queue.
> > +
> > +    a. Required fields:
> > +
> > +       i.   count = n, where n > 0.
> > +
> > +       ii.  type = OUTPUT
> > +
> > +       iii. memory = as per spec
> > +
> > +    b. Return values: Per spec.
> > +
> > +    c. Return fields:
> > +
> > +       i. count: adjusted to allocated number of buffers
> > +
> > +    d. The driver must adjust count to the larger of the number of
> > +       source buffers required for the given format and the count
> > +       passed. The client must check this value after the ioctl
> > +       returns to get the number of buffers allocated.
> > +
> > +    .. note::
> > +
> > +       Passing count = 1 is useful for letting the driver choose
> > +       the minimum according to the selected format/hardware
> > +       requirements.
> > +
> > +    .. note::
> > +
> > +       To allocate more than the minimum number of buffers (for pipeline
> > +       depth), use G_CTRL(``V4L2_CID_MIN_BUFFERS_FOR_OUTPUT``) to
> > +       get minimum number of buffers required by the driver/format,
> > +       and pass the obtained value plus the number of additional
> > +       buffers needed in count to :c:func:`VIDIOC_REQBUFS`.
> > +
> > +5.  Begin parsing the stream for stream metadata via :c:func:`VIDIOC_STREAMON` on
> > +    OUTPUT queue. This step allows the driver to parse/decode
> > +    initial stream metadata until enough information to allocate
> > +    CAPTURE buffers is found. This is indicated by the driver by
> > +    sending a ``V4L2_EVENT_SOURCE_CHANGE`` event, which the client
> > +    must handle.
> > +
> > +    a. Required fields: as per spec.
> > +
> > +    b. Return values: as per spec.
> > +
> > +    .. note::
> > +
> > +       Calling :c:func:`VIDIOC_REQBUFS`, :c:func:`VIDIOC_STREAMON`
> > +       or :c:func:`VIDIOC_G_FMT` on the CAPTURE queue at this time is not
> > +       allowed and must return EINVAL.
> > +
> > +6.  This step only applies for coded formats that contain resolution
> > +    information in the stream.
> 
> Maybe an example of such coded formats would be good to have.
> 
> > +    Continue queuing/dequeuing bitstream buffers to/from the
> > +    OUTPUT queue via :c:func:`VIDIOC_QBUF` and :c:func:`VIDIOC_DQBUF`. The driver
> > +    must keep processing and returning each buffer to the client
> > +    until the metadata required to send a ``V4L2_EVENT_SOURCE_CHANGE``
> > +    for source change type ``V4L2_EVENT_SRC_CH_RESOLUTION`` is
> > +    found. There is no requirement to pass enough data for this to
> > +    occur in the first buffer, and the driver must be able to
> > +    process any number of buffers.
> > +
> > +    a. Required fields: as per spec.
> > +
> > +    b. Return values: as per spec.
> > +
> > +    c. If data in a buffer that triggers the event is required to decode
> > +       the first frame, the driver must not return it to the client,
> > +       but must retain it for further decoding.
> > +
> > +    d. Until the resolution source change event is sent to the client, calling
> > +       :c:func:`VIDIOC_G_FMT` on the CAPTURE queue must return -EINVAL.
> > +
> > +    .. note::
> > +
> > +       No decoded frames are produced during this phase.
> > +
> 
> <snip>
>
Tomasz Figa June 15, 2018, 8:02 a.m. UTC | #22
Hi Stanimir,

On Thu, Jun 14, 2018 at 9:34 PM Stanimir Varbanov
<stanimir.varbanov@linaro.org> wrote:
>
> Hi Tomasz,
>
>
> On 06/05/2018 01:33 PM, Tomasz Figa wrote:
> <snip>
>
> > +Initialization sequence
> > +-----------------------
> > +
> > +1. (optional) Enumerate supported OUTPUT formats and resolutions. See
> > +   capability enumeration.
> > +
> > +2. Set a coded format on the source queue via :c:func:`VIDIOC_S_FMT`
> > +
> > +   a. Required fields:
> > +
> > +      i.   type = OUTPUT
> > +
> > +      ii.  fmt.pix_mp.pixelformat set to a coded format
> > +
> > +      iii. fmt.pix_mp.width, fmt.pix_mp.height only if they cannot be
> > +           parsed from the stream for the given coded format;
> > +           ignored otherwise;
>
> Can we say that if width != 0 and height != 0 then the user knows the
> real coded resolution? And vice versa, if width/height are both zero, the
> driver should parse the stream metadata?
>
> Also what about fmt.pix_mp.plane_fmt.sizeimage, as per spec (S_FMT) this
> field should be filled with correct image size? If the coded
> width/height is zero sizeimage will be unknown. I think we have two
> options: the user fills sizeimage with a big enough size, or the driver
> has to have some default size.

First of all, thanks for the review!

It's a bit more tricky, because not all hardware may permit setting the
resolution of CAPTURE buffers based on what userspace set on the OUTPUT
queue.

I'd say that the hardware should always parse these data from the
stream, if it has such an ability. If it does, it should update the
OUTPUT format and, if the CAPTURE format as set by userspace is not
compatible with HW requirements, adjust the CAPTURE format
appropriately. It would then send a source change event, requiring
userspace to read the new format.

That would still be compatible with old userspace (GStreamer), since
on hardware where it used to work, the resulting CAPTURE format would be
compatible with the hardware.

As for sizeimage on OUTPUT, it doesn't really make much sense, because
the OUTPUT queue is fed with a compressed bitstream. Existing drivers accept
the value coming from userspace. If there is a specific HW requirement
(e.g. a constant buffer size or at least N bytes), the driver should
adjust it appropriately on S_FMT.

>
> > +
> > +   b. Return values:
> > +
> > +      i.  EINVAL: unsupported format.
> > +
> > +      ii. Others: per spec
> > +
> > +   .. note::
> > +
> > +      The driver must not adjust pixelformat, so if
> > +      ``V4L2_PIX_FMT_H264`` is passed but only
> > +      ``V4L2_PIX_FMT_H264_SLICE`` is supported, S_FMT will return
> > +      -EINVAL. If both are acceptable by client, calling S_FMT for
> > +      the other after one gets rejected may be required (or use
> > +      :c:func:`VIDIOC_ENUM_FMT` to discover beforehand, see Capability
> > +      enumeration).
> > +
> > +3.  (optional) Get minimum number of buffers required for OUTPUT queue
> > +    via :c:func:`VIDIOC_G_CTRL`. This is useful if client intends to use
> > +    more buffers than minimum required by hardware/format (see
> > +    allocation).
> > +
> > +    a. Required fields:
> > +
> > +       i. id = ``V4L2_CID_MIN_BUFFERS_FOR_OUTPUT``
> > +
> > +    b. Return values: per spec.
> > +
> > +    c. Return fields:
> > +
> > +       i. value: required number of OUTPUT buffers for the currently set
> > +          format;
> > +
> > +4.  Allocate source (bitstream) buffers via :c:func:`VIDIOC_REQBUFS` on OUTPUT
> > +    queue.
> > +
> > +    a. Required fields:
> > +
> > +       i.   count = n, where n > 0.
> > +
> > +       ii.  type = OUTPUT
> > +
> > +       iii. memory = as per spec
> > +
> > +    b. Return values: Per spec.
> > +
> > +    c. Return fields:
> > +
> > +       i. count: adjusted to allocated number of buffers
> > +
> > +    d. The driver must adjust count to the larger of the number of
> > +       source buffers required for the given format and the count
> > +       passed. The client must check this value after the ioctl
> > +       returns to get the number of buffers allocated.
> > +
> > +    .. note::
> > +
> > +       Passing count = 1 is useful for letting the driver choose
> > +       the minimum according to the selected format/hardware
> > +       requirements.
> > +
> > +    .. note::
> > +
> > +       To allocate more than the minimum number of buffers (for pipeline
> > +       depth), use G_CTRL(``V4L2_CID_MIN_BUFFERS_FOR_OUTPUT``) to
> > +       get minimum number of buffers required by the driver/format,
> > +       and pass the obtained value plus the number of additional
> > +       buffers needed in count to :c:func:`VIDIOC_REQBUFS`.
> > +
> > +5.  Begin parsing the stream for stream metadata via :c:func:`VIDIOC_STREAMON` on
> > +    OUTPUT queue. This step allows the driver to parse/decode
> > +    initial stream metadata until enough information to allocate
> > +    CAPTURE buffers is found. This is indicated by the driver by
> > +    sending a ``V4L2_EVENT_SOURCE_CHANGE`` event, which the client
> > +    must handle.
> > +
> > +    a. Required fields: as per spec.
> > +
> > +    b. Return values: as per spec.
> > +
> > +    .. note::
> > +
> > +       Calling :c:func:`VIDIOC_REQBUFS`, :c:func:`VIDIOC_STREAMON`
> > +       or :c:func:`VIDIOC_G_FMT` on the CAPTURE queue at this time is not
> > +       allowed and must return EINVAL.
> > +
> > +6.  This step only applies for coded formats that contain resolution
> > +    information in the stream.
>
> Maybe an example of such coded formats would be good to have.

I think we should make it more like "for coded formats for which the
hardware is able to parse resolution information from the stream".
Obviously we can still list formats that include such information,
but we should add a note saying that such capability is
hardware-specific.

Best regards,
Tomasz
Nicolas Dufresne Sept. 11, 2018, 2:26 a.m. UTC | #23
On Tuesday, June 5, 2018 at 19:33 +0900, Tomasz Figa wrote:
> 
> diff --git a/Documentation/media/uapi/v4l/dev-codec.rst b/Documentation/media/uapi/v4l/dev-codec.rst
> index c61e938bd8dc..0483b10c205e 100644
> --- a/Documentation/media/uapi/v4l/dev-codec.rst
> +++ b/Documentation/media/uapi/v4l/dev-codec.rst
> @@ -34,3 +34,774 @@ the codec and reprogram it whenever another file handler gets access.
>  This is different from the usual video node behavior where the video
>  properties are global to the device (i.e. changing something through one
>  file handle is visible through another file handle).
> +
> +This interface is generally appropriate for hardware that does not
> +require additional software involvement to parse/partially decode/manage
> +the stream before/after processing in hardware.
> +
> +Input data to the Stream API consists only of buffers containing an
> +unprocessed video stream (Annex B H.264/H.265 stream, raw VP8/VP9
> +stream). The driver is expected not to require any additional
> +information from the client to process these buffers, and to return
> +decoded frames on the CAPTURE queue in display order.
> +
> +Performing software parsing, processing, etc. of the stream in the driver
> +in order to support the Stream API is strongly discouraged. In such cases,
> +use of the Stateless Codec Interface (in development) is preferred.
> +
> +Conventions and notation used in this document
> +==============================================
> +
> +1. The general V4L2 API rules apply unless specified otherwise in this
> +   document.
> +
> +2. The meaning of words “must”, “may”, “should”, etc. is as per RFC
> +   2119.
> +
> +3. All steps not marked “optional” are required.
> +
> +4. :c:func:`VIDIOC_G_EXT_CTRLS`, :c:func:`VIDIOC_S_EXT_CTRLS` may be used interchangeably with
> +   :c:func:`VIDIOC_G_CTRL`, :c:func:`VIDIOC_S_CTRL`, unless specified otherwise.
> +
> +5. Single-plane API (see spec) and applicable structures may be used
> +   interchangeably with Multi-plane API, unless specified otherwise.
> +
> +6. i = [a..b]: sequence of integers from a to b, inclusive, i.e. i =
> +   [0..2]: i = 0, 1, 2.
> +
> +7. For OUTPUT buffer A, A’ represents a buffer on the CAPTURE queue
> +   containing data (decoded or encoded frame/stream) that resulted
> +   from processing buffer A.
> +
> +Glossary
> +========
> +
> +CAPTURE
> +   the destination buffer queue, decoded frames for
> +   decoders, encoded bitstream for encoders;
> +   ``V4L2_BUF_TYPE_VIDEO_CAPTURE`` or
> +   ``V4L2_BUF_TYPE_VIDEO_CAPTURE_MPLANE``
> +
> +client
> +   application client communicating with the driver
> +   implementing this API
> +
> +coded format
> +   encoded/compressed video bitstream format (e.g.
> +   H.264, VP8, etc.); see raw format; this is not equivalent to fourcc
> +   (V4L2 pixelformat), as each coded format may be supported by multiple
> +   fourccs (e.g. ``V4L2_PIX_FMT_H264``, ``V4L2_PIX_FMT_H264_SLICE``, etc.)
> +
> +coded height
> +   height for given coded resolution
> +
> +coded resolution
> +   stream resolution in pixels aligned to codec
> +   format and hardware requirements; see also visible resolution
> +
> +coded width
> +   width for given coded resolution
> +
> +decode order
> +   the order in which frames are decoded; may differ
> +   from display (output) order if frame reordering (B frames) is active in
> +   the stream; OUTPUT buffers must be queued in decode order; for frame
> +   API, CAPTURE buffers must be returned by the driver in decode order;
> +
> +display order
> +   the order in which frames must be displayed
> +   (outputted); for stream API, CAPTURE buffers must be returned by the
> +   driver in display order;
> +
> +EOS
> +   end of stream
> +
> +input height
> +   height in pixels for given input resolution
> +
> +input resolution
> +   resolution in pixels of source frames being input
> +   to the encoder and subject to further cropping to the bounds of visible
> +   resolution
> +
> +input width
> +   width in pixels for given input resolution
> +
> +OUTPUT
> +   the source buffer queue, encoded bitstream for
> +   decoders, raw frames for encoders; ``V4L2_BUF_TYPE_VIDEO_OUTPUT`` or
> +   ``V4L2_BUF_TYPE_VIDEO_OUTPUT_MPLANE``
> +
> +raw format
> +   uncompressed format containing raw pixel data (e.g.
> +   YUV, RGB formats)
> +
> +resume point
> +   a point in the bitstream from which decoding may
> +   start/continue, without any previous state/data present, e.g.: a
> +   keyframe (VPX) or SPS/PPS/IDR sequence (H.264); a resume point is
> +   required to start decode of a new stream, or to resume decoding after a
> +   seek;
> +
> +source buffer
> +   buffers allocated for source queue
> +
> +source queue
> +   queue containing buffers used for source data, i.e. OUTPUT
> +
> +visible height
> +   height for given visible resolution
> +
> +visible resolution
> +   stream resolution of the visible picture, in
> +   pixels, to be used for display purposes; must be smaller than or
> +   equal to the coded resolution;
> +
> +visible width
> +   width for given visible resolution
> +
> +Decoder
> +=======
> +
> +Querying capabilities
> +---------------------
> +
> +1. To enumerate the set of coded formats supported by the driver, the
> +   client uses :c:func:`VIDIOC_ENUM_FMT` for OUTPUT. The driver must always
> +   return the full set of supported formats, irrespective of the
> +   format set on the CAPTURE queue.
> +
> +2. To enumerate the set of supported raw formats, the client uses
> +   :c:func:`VIDIOC_ENUM_FMT` for CAPTURE. The driver must return only the
> +   formats supported for the format currently set on the OUTPUT
> +   queue.
> +   In order to enumerate raw formats supported by a given coded
> +   format, the client must first set that coded format on the
> +   OUTPUT queue and then enumerate the CAPTURE queue.
> +
> +3. The client may use :c:func:`VIDIOC_ENUM_FRAMESIZES` to detect supported
> +   resolutions for a given format, passing its fourcc in
> +   :c:type:`v4l2_frmsizeenum` ``pixel_format``.
> +
> +   a. Values returned from :c:func:`VIDIOC_ENUM_FRAMESIZES` for coded formats
> +      must be maximums for given coded format for all supported raw
> +      formats.
> +
> +   b. Values returned from :c:func:`VIDIOC_ENUM_FRAMESIZES` for raw formats must
> +      be maximums for given raw format for all supported coded
> +      formats.
> +
> +   c. The client should derive the supported resolution for a
> +      combination of coded+raw format by calculating the
> +      intersection of resolutions returned from calls to
> +      :c:func:`VIDIOC_ENUM_FRAMESIZES` for the given coded and raw formats.
> +
> +4. Supported profiles and levels for given format, if applicable, may be
> +   queried using their respective controls via :c:func:`VIDIOC_QUERYCTRL`.
> +
> +5. The client may use :c:func:`VIDIOC_ENUM_FRAMEINTERVALS` to enumerate maximum
> +   supported framerates by the driver/hardware for a given
> +   format+resolution combination.
> +
> +Initialization sequence
> +-----------------------
> +
> +1. (optional) Enumerate supported OUTPUT formats and resolutions. See
> +   capability enumeration.
> +
> +2. Set a coded format on the source queue via :c:func:`VIDIOC_S_FMT`
> +
> +   a. Required fields:
> +
> +      i.   type = OUTPUT
> +
> +      ii.  fmt.pix_mp.pixelformat set to a coded format
> +
> +      iii. fmt.pix_mp.width, fmt.pix_mp.height only if they cannot be
> +           parsed from the stream for the given coded format;
> +           ignored otherwise;
> +
> +   b. Return values:
> +
> +      i.  EINVAL: unsupported format.
> +
> +      ii. Others: per spec
> +
> +   .. note::
> +
> +      The driver must not adjust pixelformat, so if
> +      ``V4L2_PIX_FMT_H264`` is passed but only
> +      ``V4L2_PIX_FMT_H264_SLICE`` is supported, S_FMT will return
> +      -EINVAL. If both are acceptable by client, calling S_FMT for
> +      the other after one gets rejected may be required (or use
> +      :c:func:`VIDIOC_ENUM_FMT` to discover beforehand, see Capability
> +      enumeration).
> +
> +3.  (optional) Get minimum number of buffers required for OUTPUT queue
> +    via :c:func:`VIDIOC_G_CTRL`. This is useful if client intends to use
> +    more buffers than minimum required by hardware/format (see
> +    allocation).
> +
> +    a. Required fields:
> +
> +       i. id = ``V4L2_CID_MIN_BUFFERS_FOR_OUTPUT``
> +
> +    b. Return values: per spec.
> +
> +    c. Return fields:
> +
> +       i. value: required number of OUTPUT buffers for the currently set
> +          format;
> +
> +4.  Allocate source (bitstream) buffers via :c:func:`VIDIOC_REQBUFS` on OUTPUT
> +    queue.
> +
> +    a. Required fields:
> +
> +       i.   count = n, where n > 0.
> +
> +       ii.  type = OUTPUT
> +
> +       iii. memory = as per spec
> +
> +    b. Return values: Per spec.
> +
> +    c. Return fields:
> +
> +       i. count: adjusted to allocated number of buffers
> +
> +    d. The driver must adjust count to the larger of the number of
> +       source buffers required for the given format and the count
> +       passed. The client must check this value after the ioctl
> +       returns to get the number of buffers allocated.
> +
> +    .. note::
> +
> +       Passing count = 1 is useful for letting the driver choose
> +       the minimum according to the selected format/hardware
> +       requirements.
> +
> +    .. note::
> +
> +       To allocate more than the minimum number of buffers (for pipeline
> +       depth), use G_CTRL(``V4L2_CID_MIN_BUFFERS_FOR_OUTPUT``) to
> +       get minimum number of buffers required by the driver/format,
> +       and pass the obtained value plus the number of additional
> +       buffers needed in count to :c:func:`VIDIOC_REQBUFS`.
> +
> +5.  Begin parsing the stream for stream metadata via :c:func:`VIDIOC_STREAMON` on
> +    OUTPUT queue. This step allows the driver to parse/decode
> +    initial stream metadata until enough information to allocate
> +    CAPTURE buffers is found. This is indicated by the driver by
> +    sending a ``V4L2_EVENT_SOURCE_CHANGE`` event, which the client
> +    must handle.
> +
> +    a. Required fields: as per spec.
> +
> +    b. Return values: as per spec.
> +
> +    .. note::
> +
> +       Calling :c:func:`VIDIOC_REQBUFS`, :c:func:`VIDIOC_STREAMON`
> +       or :c:func:`VIDIOC_G_FMT` on the CAPTURE queue at this time is not
> +       allowed and must return EINVAL.
> +
> +6.  This step only applies for coded formats that contain resolution
> +    information in the stream.
> +    Continue queuing/dequeuing bitstream buffers to/from the
> +    OUTPUT queue via :c:func:`VIDIOC_QBUF` and :c:func:`VIDIOC_DQBUF`. The driver
> +    must keep processing and returning each buffer to the client
> +    until the metadata required to send a ``V4L2_EVENT_SOURCE_CHANGE``
> +    for source change type ``V4L2_EVENT_SRC_CH_RESOLUTION`` is
> +    found. There is no requirement to pass enough data for this to
> +    occur in the first buffer, and the driver must be able to
> +    process any number of buffers.
> +
> +    a. Required fields: as per spec.
> +
> +    b. Return values: as per spec.
> +
> +    c. If data in a buffer that triggers the event is required to decode
> +       the first frame, the driver must not return it to the client,
> +       but must retain it for further decoding.
> +
> +    d. Until the resolution source change event is sent to the client, calling
> +       :c:func:`VIDIOC_G_FMT` on the CAPTURE queue must return -EINVAL.
> +
> +    .. note::
> +
> +       No decoded frames are produced during this phase.
> +
> +7.  This step only applies for coded formats that contain resolution
> +    information in the stream.
> +    Receive and handle ``V4L2_EVENT_SOURCE_CHANGE`` from the driver
> +    via :c:func:`VIDIOC_DQEVENT`. The driver must send this event once
> +    enough data is obtained from the stream to allocate CAPTURE
> +    buffers and to begin producing decoded frames.
> +
> +    a. Required fields:
> +
> +       i. type = ``V4L2_EVENT_SOURCE_CHANGE``
> +
> +    b. Return values: as per spec.
> +
> +    c. The driver must return u.src_change.changes =
> +       ``V4L2_EVENT_SRC_CH_RESOLUTION``.
> +
> +8.  This step only applies for coded formats that contain resolution
> +    information in the stream.
> +    Call :c:func:`VIDIOC_G_FMT` for CAPTURE queue to get format for the
> +    destination buffers parsed/decoded from the bitstream.
> +
> +    a. Required fields:
> +
> +       i. type = CAPTURE
> +
> +    b. Return values: as per spec.
> +
> +    c. Return fields:
> +
> +       i.   fmt.pix_mp.width, fmt.pix_mp.height: coded resolution
> +            for the decoded frames
> +
> +       ii.  fmt.pix_mp.pixelformat: default/required/preferred by
> +            driver pixelformat for decoded frames.
> +
> +       iii. num_planes: set to number of planes for pixelformat.
> +
> +       iv.  For each plane p = [0..num_planes-1]:
> +            plane_fmt[p].sizeimage, plane_fmt[p].bytesperline as
> +            per spec for coded resolution.
> +
> +    .. note::
> +
> +       The value of pixelformat may be any supported pixel format,
> +       but it must be supported for the current stream, based on
> +       the information parsed from the stream and hardware
> +       capabilities. It is suggested that the driver choose the
> +       preferred/optimal format for the given configuration. For
> +       example, a YUV format may be preferred over an RGB format if
> +       an additional conversion step would be required for RGB.
> +
> +9.  (optional) Enumerate CAPTURE formats via :c:func:`VIDIOC_ENUM_FMT` on
> +    CAPTURE queue.
> +    Once the stream information is parsed and known, the client
> +    may use this ioctl to discover which raw formats are supported
> +    for the given stream and select one of them via :c:func:`VIDIOC_S_FMT`.
> +
> +    a. Fields/return values as per spec.
> +
> +    .. note::
> +
> +       The driver must return only formats supported for the
> +       current stream parsed in this initialization sequence, even
> +       if more formats may be supported by the driver in general.
> +       For example, a driver/hardware may support YUV and RGB
> +       formats for resolutions 1920x1088 and lower, but only YUV for
> +       higher resolutions (e.g. due to memory bandwidth
> +       limitations). After parsing a resolution of 1920x1088 or
> +       lower, :c:func:`VIDIOC_ENUM_FMT` may return a set of YUV and RGB
> +       pixelformats, but after parsing resolution higher than
> +       1920x1088, the driver must not return (unsupported for this
> +       resolution) RGB.
> +
> +       However, a subsequent resolution change event
> +       triggered after discovering a resolution change within the
> +       same stream may switch the stream into a lower resolution;
> +       :c:func:`VIDIOC_ENUM_FMT` must return RGB formats again in that case.
> +
> +10.  (optional) Choose a different CAPTURE format than suggested via
> +     :c:func:`VIDIOC_S_FMT` on CAPTURE queue. It is possible for the client
> +     to choose a different format than selected/suggested by the
> +     driver in :c:func:`VIDIOC_G_FMT`.
> +
> +     a. Required fields:
> +
> +        i.  type = CAPTURE
> +
> +        ii. fmt.pix_mp.pixelformat set to a raw format
> +
> +     b. Return values:
> +
> +        i. EINVAL: unsupported format.
> +
> +     c. Calling :c:func:`VIDIOC_ENUM_FMT` to discover currently available formats
> +        after receiving ``V4L2_EVENT_SOURCE_CHANGE`` is useful to find
> +        out a set of allowed pixelformats for given configuration,
> +        but not required.
> +
> +11. (optional) Acquire visible resolution via :c:func:`VIDIOC_G_SELECTION`.
> +
> +    a. Required fields:
> +
> +       i.  type = CAPTURE
> +
> +       ii. target = ``V4L2_SEL_TGT_CROP``
> +
> +    b. Return values: per spec.
> +
> +    c. Return fields:
> +
> +       i. r.left, r.top, r.width, r.height: visible rectangle; this must
> +          fit within coded resolution returned from :c:func:`VIDIOC_G_FMT`.
> +
> +12. (optional) Get minimum number of buffers required for CAPTURE queue
> +    via :c:func:`VIDIOC_G_CTRL`. This is useful if client intends to use
> +    more buffers than minimum required by hardware/format (see
> +    allocation).
> +
> +    a. Required fields:
> +
> +       i. id = ``V4L2_CID_MIN_BUFFERS_FOR_CAPTURE``
> +
> +    b. Return values: per spec.
> +
> +    c. Return fields:
> +
> +       i. value: minimum number of buffers required to decode the stream
> +          parsed in this initialization sequence.
> +
> +    .. note::
> +
> +       The minimum number of buffers must be at least the
> +       number required to successfully decode the current stream.
> +       This may for example be the required DPB size for an H.264
> +       stream given the parsed stream configuration (resolution,
> +       level).
> +
> +13. Allocate destination (raw format) buffers via :c:func:`VIDIOC_REQBUFS` on the
> +    CAPTURE queue.
> +
> +    a. Required fields:
> +
> +       i.   count = n, where n > 0.
> +
> +       ii.  type = CAPTURE
> +
> +       iii. memory = as per spec
> +
> +    b. Return values: Per spec.
> +
> +    c. Return fields:
> +
> +       i. count: adjusted to allocated number of buffers.
> +
> +    d. The driver must adjust count to the larger of the number of
> +       destination buffers required for the given format and stream
> +       configuration and the count passed. The client must check this
> +       value after the ioctl returns to get the number of buffers
> +       allocated.
> +
> +    .. note::
> +
> +       Passing count = 1 is useful for letting the driver choose
> +       the minimum.
> +
> +    .. note::
> +
> +       To allocate more than the minimum number of buffers (for pipeline
> +       depth), use G_CTRL(``V4L2_CID_MIN_BUFFERS_FOR_CAPTURE``) to
> +       get minimum number of buffers required, and pass the obtained
> +       value plus the number of additional buffers needed in count
> +       to :c:func:`VIDIOC_REQBUFS`.
> +
> +14. Call :c:func:`VIDIOC_STREAMON` to initiate decoding frames.
> +
> +    a. Required fields: as per spec.
> +
> +    b. Return values: as per spec.
> +
> +Decoding
> +--------
> +
> +This state is reached after a successful initialization sequence. In
> +this state, the client queues and dequeues buffers to both queues via
> +:c:func:`VIDIOC_QBUF` and :c:func:`VIDIOC_DQBUF`, as per spec.
> +
> +Both queues operate independently. The client may queue and dequeue
> +buffers to queues in any order and at any rate, including at a different
> +rate for each queue. The client may queue buffers within the same queue in
> +any order (V4L2 index-wise). It is recommended for the client to operate
> +the queues independently for best performance.
> +
> +Source OUTPUT buffers must contain:
> +
> +-  H.264/AVC: one or more complete NALUs of an Annex B elementary
> +   stream; one buffer does not have to contain enough data to decode
> +   a frame;
> +
> +-  VP8/VP9: one or more complete frames.
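Because H.264 OUTPUT buffers must hold complete NALUs, a client reading an Annex B byte stream in arbitrary chunks has to cut at start-code boundaries. A minimal sketch, assuming a plain byte scan for the 3-byte `00 00 01` prefix (emulation prevention guarantees this sequence does not occur inside a NALU payload); the function names are hypothetical. For a 4-byte start code, the extra leading zero stays at the end of the preceding part, which decoders treat as a trailing zero byte:

```c
#include <stddef.h>
#include <stdint.h>

/* Offset of the next 00 00 01 prefix at or after pos, or len if none. */
size_t find_start_code(const uint8_t *buf, size_t len, size_t pos)
{
	for (; pos + 3 <= len; pos++) {
		if (buf[pos] == 0 && buf[pos + 1] == 0 && buf[pos + 2] == 1)
			return pos;
	}
	return len;
}

/* Number of start codes (NALUs) in the buffer. */
size_t count_nalus(const uint8_t *buf, size_t len)
{
	size_t count = 0;
	size_t pos = find_start_code(buf, len, 0);

	while (pos < len) {
		count++;
		pos = find_start_code(buf, len, pos + 3);
	}
	return count;
}

/*
 * Cut point such that [0, cut) holds only complete NALUs: the offset of
 * the last start code. Returns 0 if fewer than two start codes are found,
 * since the last NALU may still be incomplete.
 */
size_t complete_nalus_end(const uint8_t *buf, size_t len)
{
	size_t first = find_start_code(buf, len, 0);
	size_t last = first;
	size_t pos = first;

	while (pos < len) {
		last = pos;
		pos = find_start_code(buf, len, pos + 3);
	}
	if (first >= len || last == first)
		return 0;
	return last;
}
```

The bytes from the cut point onward can be carried over and prepended to the next chunk; at end of stream, the remainder is itself a complete NALU.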
> +
> +No direct relationship between source and destination buffers and the
> +timing of buffers becoming available to dequeue should be assumed in the
> +Stream API. Specifically:
> +
> +-  a buffer queued to OUTPUT queue may result in no buffers being
> +   produced on the CAPTURE queue (e.g. if it does not contain
> +   encoded data, or if only metadata syntax structures are present
> +   in it), or one or more buffers produced on the CAPTURE queue (if
> +   the encoded data contained more than one frame, or if returning a
> +   decoded frame allowed the driver to return a frame that preceded
> +   it in decode, but succeeded it in display order)
> +
> +-  a buffer queued to OUTPUT may result in a buffer being produced on
> +   the CAPTURE queue later in the decode process, and/or after
> +   processing further OUTPUT buffers, or be returned out of order,
> +   e.g. if display reordering is used
> +
> +-  buffers may become available on the CAPTURE queue without additional
> +   buffers queued to OUTPUT (e.g. during flush or EOS)
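Since neither queue's timing can be predicted from the other, clients typically service both from one poll() loop. A minimal sketch, assuming the usual V4L2 mem2mem poll semantics: POLLOUT when an OUTPUT buffer can be dequeued, POLLIN when a decoded CAPTURE buffer is ready, and POLLPRI for pending events. The `decode_action` flags and `decode_actions()` are hypothetical names for illustration:

```c
#include <poll.h>

/* Hypothetical action flags for one iteration of a decode loop. */
enum decode_action {
	ACT_NONE      = 0,
	ACT_DQBUF_OUT = 1 << 0, /* a bitstream buffer can be dequeued/refilled */
	ACT_DQBUF_CAP = 1 << 1, /* a decoded frame can be dequeued */
	ACT_DQEVENT   = 1 << 2, /* an event (e.g. source change) is pending */
	ACT_ERROR     = 1 << 3,
};

/* Map poll() revents on a decoder fd to the work the client can do. */
unsigned int decode_actions(short revents)
{
	unsigned int act = ACT_NONE;

	if (revents & (POLLERR | POLLNVAL))
		act |= ACT_ERROR;
	if (revents & POLLOUT)
		act |= ACT_DQBUF_OUT;
	if (revents & POLLIN)
		act |= ACT_DQBUF_CAP;
	if (revents & POLLPRI)
		act |= ACT_DQEVENT;
	return act;
}
```

The caller would poll with `events = POLLIN | POLLOUT | POLLPRI` and handle each returned action independently, without assuming that a dequeued OUTPUT buffer corresponds to any particular CAPTURE buffer.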
> +
> +Seek
> +----
> +
> +Seek is controlled by the OUTPUT queue, as it is the source of bitstream
> +data. CAPTURE queue remains unchanged/unaffected.
> +
> +1. Stop the OUTPUT queue to begin the seek sequence via
> +   :c:func:`VIDIOC_STREAMOFF`.
> +
> +   a. Required fields:
> +
> +      i. type = OUTPUT
> +
> +   b. The driver must drop all the pending OUTPUT buffers and they are
> +      treated as returned to the client (as per spec).
> +
> +2. Restart the OUTPUT queue via :c:func:`VIDIOC_STREAMON`
> +
> +   a. Required fields:
> +
> +      i. type = OUTPUT
> +
> +   b. The driver must enter the post-seek state and be ready to
> +      accept new source bitstream buffers.
> +
> +3. Start queuing buffers containing stream data from the new position
> +   to the OUTPUT queue; the driver processes them until a suitable
> +   resume point is found.
> +
> +   .. note::
> +
> +      There is no requirement to begin queuing the stream exactly
> +      at a resume point (e.g. SPS or a keyframe).
> +      The driver must handle any data queued and must keep processing
> +      the queued buffers until it finds a suitable resume point.
> +      While looking for a resume point, the driver processes OUTPUT
> +      buffers and returns them to the client without producing any
> +      decoded frames.
> +
> +4. After a resume point is found, the driver will start returning
> +   CAPTURE buffers with decoded frames.
> +
> +   .. note::
> +
> +      There is no precise specification of when the CAPTURE queue
> +      will start producing buffers containing decoded data from
> +      buffers queued after the seek, as it operates independently
> +      from the OUTPUT queue.
> +
> +      -  The driver may return a number of remaining CAPTURE buffers
> +         containing decoded frames from before the seek after the
> +         seek sequence (STREAMOFF-STREAMON) is performed.
> +
> +      -  The driver may also skip returning decoded frames for buffers
> +         queued but not yet decoded before the seek sequence was initiated.
> +         E.g. for an OUTPUT queue sequence: QBUF(A), QBUF(B),
> +         STREAMOFF(OUT), STREAMON(OUT), QBUF(G), QBUF(H), any of the
> +         following results on the CAPTURE queue is allowed: {A’, B’, G’,
> +         H’}, {A’, G’, H’}, {G’, H’}.
> +
> +Pause
> +-----
> +
> +In order to pause, the client should just cease queuing buffers onto the
> +OUTPUT queue. This is different from the general V4L2 API definition of
> +pause, which involves calling :c:func:`VIDIOC_STREAMOFF` on the queue. Without
> +source bitstream data, there is no data to process and the hardware
> +remains idle. Conversely, using :c:func:`VIDIOC_STREAMOFF` on OUTPUT queue
> +indicates a seek, which 1) drops all buffers in flight and 2) after a
> +subsequent :c:func:`VIDIOC_STREAMON` will look for and only continue from a
> +resume point. This is usually undesirable for pause. The
> +STREAMOFF-STREAMON sequence is intended for seeking.
> +
> +Similarly, the CAPTURE queue should remain streaming, as the
> +STREAMOFF-STREAMON sequence on it is intended solely for changing buffer
> +sets.
> +
> +Dynamic resolution change
> +-------------------------
> +
> +When the driver encounters a resolution change in the stream, the dynamic
> +resolution change sequence is started.
> +
> +1.  On encountering a resolution change in the stream, the driver must
> +    first process and decode all remaining buffers from before the
> +    resolution change point.
> +
> +2.  After all buffers containing decoded frames from before the
> +    resolution change point are ready to be dequeued on the
> +    CAPTURE queue, the driver sends a ``V4L2_EVENT_SOURCE_CHANGE``
> +    event for source change type ``V4L2_EVENT_SRC_CH_RESOLUTION``.
> +    The last buffer from before the change must have
> +    ``V4L2_BUF_FLAG_LAST`` set in :c:type:`v4l2_buffer` ``flags``, as in
> +    the flush sequence.
> +
> +    .. note::
> +
> +       Any attempts to dequeue more buffers beyond the buffer marked
> +       with ``V4L2_BUF_FLAG_LAST`` will result in a -EPIPE error from
> +       :c:func:`VIDIOC_DQBUF`.
> +
> +3.  After dequeuing all remaining buffers from the CAPTURE queue, the
> +    client must call :c:func:`VIDIOC_STREAMOFF` on the CAPTURE queue. The
> +    OUTPUT queue remains streaming (calling STREAMOFF on it would
> +    trigger a seek).
> +    Until STREAMOFF is called on the CAPTURE queue (acknowledging
> +    the event), the driver operates as if the resolution hasn’t
> +    changed yet, i.e. :c:func:`VIDIOC_G_FMT`, etc. return the previous
> +    resolution.
> +
> +4.  The client frees the buffers on the CAPTURE queue using
> +    :c:func:`VIDIOC_REQBUFS`.
> +
> +    a. Required fields:
> +
> +       i.   count = 0
> +
> +       ii.  type = CAPTURE
> +
> +       iii. memory = as per spec
> +
> +5.  The client calls :c:func:`VIDIOC_G_FMT` for CAPTURE to get the new format
> +    information.
> +    This is identical to calling :c:func:`VIDIOC_G_FMT` after
> +    ``V4L2_EVENT_SRC_CH_RESOLUTION`` in the initialization
> +    sequence and should be handled similarly.
> +
> +    .. note::
> +
> +       It is allowed for the driver not to support the same
> +       pixelformat as previously used (before the resolution change)
> +       for the new resolution. The driver must select a default
> +       supported pixelformat and return it from :c:func:`VIDIOC_G_FMT`, and
> +       the client must take note of it.
> +
> +6.  (optional) The client is allowed to enumerate available formats and
> +    select a different one than currently chosen (returned via
> +    :c:func:`VIDIOC_G_FMT`). This is identical to a corresponding step in
> +    the initialization sequence.
> +
> +7.  (optional) The client acquires the visible resolution as in the
> +    initialization sequence.
> +
> +8.  (optional) The client acquires the minimum number of buffers as in
> +    the initialization sequence.
> +
> +9.  The client allocates a new set of buffers for the CAPTURE queue via
> +    :c:func:`VIDIOC_REQBUFS`. This is identical to a corresponding step in
> +    the initialization sequence.
> +
> +10. The client resumes decoding by issuing :c:func:`VIDIOC_STREAMON` on the
> +    CAPTURE queue.
> +
> +During the resolution change sequence, the OUTPUT queue must remain
> +streaming. Calling :c:func:`VIDIOC_STREAMOFF` on the OUTPUT queue would initiate a seek.
> +
> +The OUTPUT queue operates separately from the CAPTURE queue for the
> +duration of the entire resolution change sequence. It is allowed (and
> +recommended for best performance and simplicity) for the client to keep
> +queuing/dequeuing buffers to/from the OUTPUT queue even while processing
> +this sequence.
> +
> +.. note::
> +
> +   It is also possible for this sequence to be triggered without
> +   a change in resolution if a different number of CAPTURE buffers is
> +   required in order to continue decoding the stream.
> +
> +Flush
> +-----
> +
> +Flush is the process of draining the CAPTURE queue of any remaining
> +buffers. After the flush sequence is complete, the client has received
> +all decoded frames for all OUTPUT buffers queued before the sequence was
> +started.
> +
> +1. Begin flush by issuing :c:func:`VIDIOC_DECODER_CMD`.
> +
> +   a. Required fields:
> +
> +      i. cmd = ``V4L2_DEC_CMD_STOP``
> +
> +2. The driver must process and decode as normal all OUTPUT buffers
> +   queued by the client before the :c:func:`VIDIOC_DECODER_CMD` was
> +   issued.
> +   Any operations triggered as a result of processing these
> +   buffers (including the initialization and resolution change
> +   sequences) must be processed as normal by both the driver and
> +   the client before proceeding with the flush sequence.
> +
> +3. Once all OUTPUT buffers queued before ``V4L2_DEC_CMD_STOP`` are
> +   processed:
> +
> +   a. If the CAPTURE queue is streaming, once all decoded frames (if
> +      any) are ready to be dequeued on the CAPTURE queue, the
> +      driver must send a ``V4L2_EVENT_EOS``. The driver must also
> +      set ``V4L2_BUF_FLAG_LAST`` in :c:type:`v4l2_buffer` ``flags`` field on the
> +      buffer on the CAPTURE queue containing the last frame (if
> +      any) produced as a result of processing the OUTPUT buffers
> +      queued before ``V4L2_DEC_CMD_STOP``. If no more frames are
> +      left to be returned at the point of handling
> +      ``V4L2_DEC_CMD_STOP``, the driver must return an empty buffer

Sorry for coming in late, I didn't notice this detail before. Why do we need
this empty-buffer special case here? Why can't we unblock the queue
with -EPIPE, which is already a supported special case? This could
even be handled by the m2m framework.

> +      (with :c:type:`v4l2_buffer` ``bytesused`` = 0) as the last buffer with
> +      ``V4L2_BUF_FLAG_LAST`` set instead.
> +      Any attempts to dequeue more buffers beyond the buffer
> +      marked with ``V4L2_BUF_FLAG_LAST`` will result in a -EPIPE
> +      error from :c:func:`VIDIOC_DQBUF`.
> +
> +   b. If the CAPTURE queue is NOT streaming, no action is necessary for
> +      the CAPTURE queue, and the driver must send a ``V4L2_EVENT_EOS``
> +      immediately after all OUTPUT buffers in question have been
> +      processed.
> +
> +4. To resume, the client may issue ``V4L2_DEC_CMD_START``.
> +
> +End of stream
> +-------------
> +
> +When the driver encounters an explicit end of stream in the bitstream,
> +it must send a ``V4L2_EVENT_EOS`` to the client after all frames are
> +decoded and ready to be dequeued on the CAPTURE queue, and must set
> +``V4L2_BUF_FLAG_LAST`` in :c:type:`v4l2_buffer` ``flags`` on the last
> +CAPTURE buffer. This behavior is identical to the flush sequence, as if
> +triggered by the client via ``V4L2_DEC_CMD_STOP``.
> +
> +Commit points
> +-------------
> +
> +Setting formats and allocating buffers triggers changes in the behavior
> +of the driver.
> +
> +1. Setting the format on the OUTPUT queue may change the set of formats
> +   supported/advertised on the CAPTURE queue. If the format currently
> +   selected on the CAPTURE queue is not supported by the newly selected
> +   OUTPUT format, the driver must change it to a supported one.
> +
> +2. Enumerating formats on CAPTURE queue must only return CAPTURE formats
> +   supported for the OUTPUT format currently set.
> +
> +3. Setting/changing format on CAPTURE queue does not change formats
> +   available on OUTPUT queue. An attempt to set CAPTURE format that
> +   is not supported for the currently selected OUTPUT format must
> +   result in an error (-EINVAL) from :c:func:`VIDIOC_S_FMT`.
> +
> +4. Enumerating formats on OUTPUT queue always returns a full set of
> +   supported formats, irrespective of the current format selected on
> +   CAPTURE queue.
> +
> +5. After allocating buffers on the OUTPUT queue, it is not possible to
> +   change format on it.
> +
> +To summarize, setting formats and allocation must always start with the
> +OUTPUT queue and the OUTPUT queue is the master that governs the set of
> +supported formats for the CAPTURE queue.
> diff --git a/Documentation/media/uapi/v4l/v4l2.rst b/Documentation/media/uapi/v4l/v4l2.rst
> index b89e5621ae69..563d5b861d1c 100644
> --- a/Documentation/media/uapi/v4l/v4l2.rst
> +++ b/Documentation/media/uapi/v4l/v4l2.rst
> @@ -53,6 +53,10 @@ Authors, in alphabetical order:
>  
>    - Original author of the V4L2 API and documentation.
>  
> +- Figa, Tomasz <tfiga@chromium.org>
> +
> +  - Documented parts of the V4L2 (stateful) Codec Interface. Migrated from Google Docs to kernel documentation.
> +
>  - H Schimek, Michael <mschimek@gmx.at>
>  
>    - Original author of the V4L2 API and documentation.
> @@ -65,6 +69,10 @@ Authors, in alphabetical order:
>  
>    - Designed and documented the multi-planar API.
>  
> +- Osciak, Pawel <posciak@chromium.org>
> +
> +  - Documented the V4L2 (stateful) Codec Interface.
> +
>  - Palosaari, Antti <crope@iki.fi>
>  
>    - SDR API.
> @@ -85,7 +93,7 @@ Authors, in alphabetical order:
>  
>    - Designed and documented the VIDIOC_LOG_STATUS ioctl, the extended control ioctls, major parts of the sliced VBI API, the MPEG encoder and decoder APIs and the DV Timings API.
>  
> -**Copyright** |copy| 1999-2016: Bill Dirks, Michael H. Schimek, Hans Verkuil, Martin Rubli, Andy Walls, Muralidharan Karicheri, Mauro Carvalho Chehab, Pawel Osciak, Sakari Ailus & Antti Palosaari.
> +**Copyright** |copy| 1999-2018: Bill Dirks, Michael H. Schimek, Hans Verkuil, Martin Rubli, Andy Walls, Muralidharan Karicheri, Mauro Carvalho Chehab, Pawel Osciak, Sakari Ailus, Antti Palosaari & Tomasz Figa.
>  
>  Except when explicitly stated as GPL, programming examples within this
>  part can be used and distributed without restrictions.
> @@ -94,6 +102,10 @@ part can be used and distributed without restrictions.
>  Revision History
>  ****************
>  
> +:revision: TBD / TBD (*tf*)
> +
> +Add specification of V4L2 Codec Interface UAPI.
> +
>  :revision: 4.10 / 2016-07-15 (*rr*)
>  
>  Introduce HSV formats.
Tomasz Figa Sept. 11, 2018, 3:10 a.m. UTC | #24
On Tue, Sep 11, 2018 at 11:27 AM Nicolas Dufresne <nicolas@ndufresne.ca> wrote:
>
> On Tuesday, 5 June 2018 at 19:33 +0900, Tomasz Figa wrote:
> > Due to the complexity of the video decoding process, the V4L2 drivers of
> > stateful decoder hardware require specific sequences of V4L2 API calls
> > to be followed. These include capability enumeration, initialization,
> > decoding, seek, pause, dynamic resolution change, flush and end of
> > stream.
> >
> > Specifics of the above have been discussed during Media Workshops at
> > LinuxCon Europe 2012 in Barcelona and then later Embedded Linux
> > Conference Europe 2014 in Düsseldorf. The de facto Codec API that
> > originated at those events was later implemented by the drivers we already
> > have merged in mainline, such as s5p-mfc or mtk-vcodec.
> >
> > The only thing missing was the real specification included as a part of
> > Linux Media documentation. Fix it now and document the decoder part of
> > the Codec API.
> >
> > Signed-off-by: Tomasz Figa <tfiga@chromium.org>
> > ---
> >  Documentation/media/uapi/v4l/dev-codec.rst | 771 +++++++++++++++++++++
> >  Documentation/media/uapi/v4l/v4l2.rst      |  14 +-
> >  2 files changed, 784 insertions(+), 1 deletion(-)
> >
> > diff --git a/Documentation/media/uapi/v4l/dev-codec.rst b/Documentation/media/uapi/v4l/dev-codec.rst
> > index c61e938bd8dc..0483b10c205e 100644
> > --- a/Documentation/media/uapi/v4l/dev-codec.rst
> > +++ b/Documentation/media/uapi/v4l/dev-codec.rst
> > @@ -34,3 +34,774 @@ the codec and reprogram it whenever another file handler gets access.
> >  This is different from the usual video node behavior where the video
> >  properties are global to the device (i.e. changing something through one
> >  file handle is visible through another file handle).
> > +
> > +This interface is generally appropriate for hardware that does not
> > +require additional software involvement to parse/partially decode/manage
> > +the stream before/after processing in hardware.
> > +
> > +Input data to the Stream API are buffers containing unprocessed video
> > +stream (Annex-B H264/H265 stream, raw VP8/9 stream) only. The driver is
> > +expected not to require any additional information from the client to
> > +process these buffers, and to return decoded frames on the CAPTURE queue
> > +in display order.
> > +
> > +Performing software parsing, processing etc. of the stream in the driver
> > +in order to support the Stream API is strongly discouraged. In such cases,
> > +use of the Stateless Codec Interface (in development) is preferred.
> > +
> > +Conventions and notation used in this document
> > +==============================================
> > +
> > +1. The general V4L2 API rules apply if not specified in this document
> > +   otherwise.
> > +
> > +2. The meaning of words “must”, “may”, “should”, etc. is as per RFC
> > +   2119.
> > +
> > +3. All steps not marked “optional” are required.
> > +
> > +4. :c:func:`VIDIOC_G_EXT_CTRLS`, :c:func:`VIDIOC_S_EXT_CTRLS` may be used interchangeably with
> > +   :c:func:`VIDIOC_G_CTRL`, :c:func:`VIDIOC_S_CTRL`, unless specified otherwise.
> > +
> > +5. Single-plane API (see spec) and applicable structures may be used
> > +   interchangeably with Multi-plane API, unless specified otherwise.
> > +
> > +6. i = [a..b]: sequence of integers from a to b, inclusive, i.e. i =
> > +   [0..2]: i = 0, 1, 2.
> > +
> > +7. For OUTPUT buffer A, A’ represents a buffer on the CAPTURE queue
> > +   containing data (decoded or encoded frame/stream) that resulted
> > +   from processing buffer A.
> > +
> > +Glossary
> > +========
> > +
> > +CAPTURE
> > +   the destination buffer queue, decoded frames for
> > +   decoders, encoded bitstream for encoders;
> > +   ``V4L2_BUF_TYPE_VIDEO_CAPTURE`` or
> > +   ``V4L2_BUF_TYPE_VIDEO_CAPTURE_MPLANE``
> > +
> > +client
> > +   application client communicating with the driver
> > +   implementing this API
> > +
> > +coded format
> > +   encoded/compressed video bitstream format (e.g.
> > +   H.264, VP8, etc.); see raw format; this is not equivalent to fourcc
> > +   (V4L2 pixelformat), as each coded format may be supported by multiple
> > +   fourccs (e.g. ``V4L2_PIX_FMT_H264``, ``V4L2_PIX_FMT_H264_SLICE``, etc.)
> > +
> > +coded height
> > +   height for given coded resolution
> > +
> > +coded resolution
> > +   stream resolution in pixels aligned to codec
> > +   format and hardware requirements; see also visible resolution
> > +
> > +coded width
> > +   width for given coded resolution
> > +
> > +decode order
> > +   the order in which frames are decoded; may differ
> > +   from display (output) order if frame reordering (B frames) is active in
> > +   the stream; OUTPUT buffers must be queued in decode order; for frame
> > +   API, CAPTURE buffers must be returned by the driver in decode order;
> > +
> > +display order
> > +   the order in which frames must be displayed
> > +   (outputted); for stream API, CAPTURE buffers must be returned by the
> > +   driver in display order;
> > +
> > +EOS
> > +   end of stream
> > +
> > +input height
> > +   height in pixels for given input resolution
> > +
> > +input resolution
> > +   resolution in pixels of source frames being input
> > +   to the encoder and subject to further cropping to the bounds of visible
> > +   resolution
> > +
> > +input width
> > +   width in pixels for given input resolution
> > +
> > +OUTPUT
> > +   the source buffer queue, encoded bitstream for
> > +   decoders, raw frames for encoders; ``V4L2_BUF_TYPE_VIDEO_OUTPUT`` or
> > +   ``V4L2_BUF_TYPE_VIDEO_OUTPUT_MPLANE``
> > +
> > +raw format
> > +   uncompressed format containing raw pixel data (e.g.
> > +   YUV, RGB formats)
> > +
> > +resume point
> > +   a point in the bitstream from which decoding may
> > +   start/continue, without any previous state/data present, e.g.: a
> > +   keyframe (VPX) or SPS/PPS/IDR sequence (H.264); a resume point is
> > +   required to start decode of a new stream, or to resume decoding after a
> > +   seek;
> > +
> > +source buffer
> > +   buffers allocated for source queue
> > +
> > +source queue
> > +   queue containing buffers used for source data, i.e. the OUTPUT queue
> > +
> > +visible height
> > +   height for given visible resolution
> > +
> > +visible resolution
> > +   stream resolution of the visible picture, in
> > +   pixels, to be used for display purposes; must be smaller than or equal
> > +   to the coded resolution;
> > +
> > +visible width
> > +   width for given visible resolution
> > +
> > +Decoder
> > +=======
> > +
> > +Querying capabilities
> > +---------------------
> > +
> > +1. To enumerate the set of coded formats supported by the driver, the
> > +   client uses :c:func:`VIDIOC_ENUM_FMT` for OUTPUT. The driver must always
> > +   return the full set of supported formats, irrespective of the
> > +   format set on the CAPTURE queue.
> > +
> > +2. To enumerate the set of supported raw formats, the client uses
> > +   :c:func:`VIDIOC_ENUM_FMT` for CAPTURE. The driver must return only the
> > +   formats supported for the format currently set on the OUTPUT
> > +   queue.
> > +   In order to enumerate raw formats supported by a given coded
> > +   format, the client must first set that coded format on the
> > +   OUTPUT queue and then enumerate the CAPTURE queue.
> > +
> > +3. The client may use :c:func:`VIDIOC_ENUM_FRAMESIZES` to detect supported
> > +   resolutions for a given format, passing its fourcc in
> > +   :c:type:`v4l2_frmsizeenum` ``pixel_format``.
> > +
> > +   a. Values returned from :c:func:`VIDIOC_ENUM_FRAMESIZES` for coded formats
> > +      must be maximums for given coded format for all supported raw
> > +      formats.
> > +
> > +   b. Values returned from :c:func:`VIDIOC_ENUM_FRAMESIZES` for raw formats must
> > +      be maximums for given raw format for all supported coded
> > +      formats.
> > +
> > +   c. The client should derive the supported resolution for a
> > +      combination of coded+raw format by calculating the
> > +      intersection of resolutions returned from calls to
> > +      :c:func:`VIDIOC_ENUM_FRAMESIZES` for the given coded and raw formats.
> > +
> > +4. Supported profiles and levels for given format, if applicable, may be
> > +   queried using their respective controls via :c:func:`VIDIOC_QUERYCTRL`.
> > +
> > +5. The client may use :c:func:`VIDIOC_ENUM_FRAMEINTERVALS` to enumerate maximum
> > +   supported framerates by the driver/hardware for a given
> > +   format+resolution combination.
> > +
> > +Initialization sequence
> > +-----------------------
> > +
> > +1. (optional) Enumerate supported OUTPUT formats and resolutions. See
> > +   capability enumeration.
> > +
> > +2. Set a coded format on the source queue via :c:func:`VIDIOC_S_FMT`
> > +
> > +   a. Required fields:
> > +
> > +      i.   type = OUTPUT
> > +
> > +      ii.  fmt.pix_mp.pixelformat set to a coded format
> > +
> > +      iii. fmt.pix_mp.width, fmt.pix_mp.height only if they cannot be
> > +           parsed from the stream for the given coded format;
> > +           ignored otherwise;
> > +
> > +   b. Return values:
> > +
> > +      i.  EINVAL: unsupported format.
> > +
> > +      ii. Others: per spec
> > +
> > +   .. note::
> > +
> > +      The driver must not adjust pixelformat, so if
> > +      ``V4L2_PIX_FMT_H264`` is passed but only
> > +      ``V4L2_PIX_FMT_H264_SLICE`` is supported, S_FMT will return
> > +      -EINVAL. If both are acceptable to the client, calling S_FMT for
> > +      the other after one gets rejected may be required (or use
> > +      :c:func:`VIDIOC_ENUM_FMT` to discover beforehand, see Capability
> > +      enumeration).
> > +
> > +3.  (optional) Get minimum number of buffers required for OUTPUT queue
> > +    via :c:func:`VIDIOC_G_CTRL`. This is useful if the client intends to use
> > +    more buffers than minimum required by hardware/format (see
> > +    allocation).
> > +
> > +    a. Required fields:
> > +
> > +       i. id = ``V4L2_CID_MIN_BUFFERS_FOR_OUTPUT``
> > +
> > +    b. Return values: per spec.
> > +
> > +    c. Return fields:
> > +
> > +       i. value: required number of OUTPUT buffers for the currently set
> > +          format;
> > +
> > +4.  Allocate source (bitstream) buffers via :c:func:`VIDIOC_REQBUFS` on OUTPUT
> > +    queue.
> > +
> > +    a. Required fields:
> > +
> > +       i.   count = n, where n > 0.
> > +
> > +       ii.  type = OUTPUT
> > +
> > +       iii. memory = as per spec
> > +
> > +    b. Return values: Per spec.
> > +
> > +    c. Return fields:
> > +
> > +       i. count: adjusted to allocated number of buffers
> > +
> > +    d. The driver must adjust count to at least the minimum number of
> > +       source buffers required for the given format, if the count
> > +       passed is lower. The client must check this value after the
> > +       ioctl returns to get the number of buffers allocated.
> > +
> > +    .. note::
> > +
> > +       Passing count = 1 is useful for letting the driver choose
> > +       the minimum according to the selected format/hardware
> > +       requirements.
> > +
> > +    .. note::
> > +
> > +       To allocate more than minimum number of buffers (for pipeline
> > +       depth), use G_CTRL(``V4L2_CID_MIN_BUFFERS_FOR_OUTPUT``) to
> > +       get minimum number of buffers required by the driver/format,
> > +       and pass the obtained value plus the number of additional
> > +       buffers needed in count to :c:func:`VIDIOC_REQBUFS`.
> > +
> > +5.  Begin parsing the stream for stream metadata via :c:func:`VIDIOC_STREAMON` on
> > +    OUTPUT queue. This step allows the driver to parse/decode
> > +    initial stream metadata until enough information to allocate
> > +    CAPTURE buffers is found. This is indicated by the driver by
> > +    sending a ``V4L2_EVENT_SOURCE_CHANGE`` event, which the client
> > +    must handle.
> > +
> > +    a. Required fields: as per spec.
> > +
> > +    b. Return values: as per spec.
> > +
> > +    .. note::
> > +
> > +       Calling :c:func:`VIDIOC_REQBUFS`, :c:func:`VIDIOC_STREAMON`
> > +       or :c:func:`VIDIOC_G_FMT` on the CAPTURE queue at this time is not
> > +       allowed and must return EINVAL.
> > +
> > +6.  This step only applies for coded formats that contain resolution
> > +    information in the stream.
> > +    Continue queuing/dequeuing bitstream buffers to/from the
> > +    OUTPUT queue via :c:func:`VIDIOC_QBUF` and :c:func:`VIDIOC_DQBUF`. The driver
> > +    must keep processing and returning each buffer to the client
> > +    until required metadata to send a ``V4L2_EVENT_SOURCE_CHANGE``
> > +    for source change type ``V4L2_EVENT_SRC_CH_RESOLUTION`` is
> > +    found. There is no requirement to pass enough data for this to
> > +    occur in the first buffer and the driver must be able to
> > +    process any number of buffers.
> > +
> > +    a. Required fields: as per spec.
> > +
> > +    b. Return values: as per spec.
> > +
> > +    c. If data in a buffer that triggers the event is required to decode
> > +       the first frame, the driver must not return it to the client,
> > +       but must retain it for further decoding.
> > +
> > +    d. Until the resolution source event is sent to the client, calling
> > +       :c:func:`VIDIOC_G_FMT` on the CAPTURE queue must return -EINVAL.
> > +
> > +    .. note::
> > +
> > +       No decoded frames are produced during this phase.
> > +
> > +7.  This step only applies for coded formats that contain resolution
> > +    information in the stream.
> > +    Receive and handle ``V4L2_EVENT_SOURCE_CHANGE`` from the driver
> > +    via :c:func:`VIDIOC_DQEVENT`. The driver must send this event once
> > +    enough data is obtained from the stream to allocate CAPTURE
> > +    buffers and to begin producing decoded frames.
> > +
> > +    a. Required fields:
> > +
> > +       i. type = ``V4L2_EVENT_SOURCE_CHANGE``
> > +
> > +    b. Return values: as per spec.
> > +
> > +    c. The driver must return u.src_change.changes =
> > +       ``V4L2_EVENT_SRC_CH_RESOLUTION``.
> > +
> > +8.  This step only applies for coded formats that contain resolution
> > +    information in the stream.
> > +    Call :c:func:`VIDIOC_G_FMT` for CAPTURE queue to get format for the
> > +    destination buffers parsed/decoded from the bitstream.
> > +
> > +    a. Required fields:
> > +
> > +       i. type = CAPTURE
> > +
> > +    b. Return values: as per spec.
> > +
> > +    c. Return fields:
> > +
> > +       i.   fmt.pix_mp.width, fmt.pix_mp.height: coded resolution
> > +            for the decoded frames
> > +
> > +       ii.  fmt.pix_mp.pixelformat: default/required/preferred by
> > +            driver pixelformat for decoded frames.
> > +
> > +       iii. num_planes: set to number of planes for pixelformat.
> > +
> > +       iv.  For each plane p = [0, num_planes-1]:
> > +            plane_fmt[p].sizeimage, plane_fmt[p].bytesperline as
> > +            per spec for coded resolution.
> > +
> > +    .. note::
> > +
> > +       The value of pixelformat may be any supported pixel format,
> > +       but it must be supported for the current stream, based on the
> > +       information parsed from the stream and hardware capabilities.
> > +       It is suggested that the driver choose the preferred/optimal
> > +       format for the given configuration. For example, a YUV format
> > +       may be preferred over an RGB format if an additional conversion
> > +       step would be required for RGB.
> > +
> > +9.  (optional) Enumerate CAPTURE formats via :c:func:`VIDIOC_ENUM_FMT` on
> > +    CAPTURE queue.
> > +    Once the stream information is parsed and known, the client
> > +    may use this ioctl to discover which raw formats are supported
> > +    for the given stream and select one of them via :c:func:`VIDIOC_S_FMT`.
> > +
> > +    a. Fields/return values as per spec.
> > +
> > +    .. note::
> > +
> > +       The driver must return only formats supported for the
> > +       current stream parsed in this initialization sequence, even
> > +       if more formats may be supported by the driver in general.
> > +       For example, a driver/hardware may support YUV and RGB
> > +       formats for resolutions 1920x1088 and lower, but only YUV for
> > +       higher resolutions (e.g. due to memory bandwidth
> > +       limitations). After parsing a resolution of 1920x1088 or
> > +       lower, :c:func:`VIDIOC_ENUM_FMT` may return a set of YUV and RGB
> > +       pixelformats, but after parsing resolution higher than
> > +       1920x1088, the driver must not return (unsupported for this
> > +       resolution) RGB.
> > +
> > +       However, a subsequent resolution change event
> > +       triggered after discovering a resolution change within the
> > +       same stream may switch the stream into a lower resolution;
> > +       :c:func:`VIDIOC_ENUM_FMT` must return RGB formats again in that case.
> > +
> > +10.  (optional) Choose a different CAPTURE format than suggested via
> > +     :c:func:`VIDIOC_S_FMT` on CAPTURE queue. It is possible for the client
> > +     to choose a different format than selected/suggested by the
> > +     driver in :c:func:`VIDIOC_G_FMT`.
> > +
> > +     a. Required fields:
> > +
> > +        i.  type = CAPTURE
> > +
> > +        ii. fmt.pix_mp.pixelformat set to a raw format
> > +
> > +     b. Return values:
> > +
> > +        i. EINVAL: unsupported format.
> > +
> > +     c. Calling :c:func:`VIDIOC_ENUM_FMT` to discover currently available formats
> > +        after receiving ``V4L2_EVENT_SOURCE_CHANGE`` is useful to find
> > +        out a set of allowed pixelformats for given configuration,
> > +        but not required.
> > +
> > +11. (optional) Acquire the visible resolution via :c:func:`VIDIOC_G_SELECTION`.
> > +
> > +    a. Required fields:
> > +
> > +       i.  type = CAPTURE
> > +
> > +       ii. target = ``V4L2_SEL_TGT_CROP``
> > +
> > +    b. Return values: per spec.
> > +
> > +    c. Return fields:
> > +
> > +       i. r.left, r.top, r.width, r.height: visible rectangle; this must
> > +          fit within coded resolution returned from :c:func:`VIDIOC_G_FMT`.
> > +
> > +12. (optional) Get minimum number of buffers required for CAPTURE queue
> > +    via :c:func:`VIDIOC_G_CTRL`. This is useful if the client intends to use
> > +    more buffers than minimum required by hardware/format (see
> > +    allocation).
> > +
> > +    a. Required fields:
> > +
> > +       i. id = ``V4L2_CID_MIN_BUFFERS_FOR_CAPTURE``
> > +
> > +    b. Return values: per spec.
> > +
> > +    c. Return fields:
> > +
> > +       i. value: minimum number of buffers required to decode the stream
> > +          parsed in this initialization sequence.
> > +
> > +    .. note::
> > +
> > +       The minimum number of buffers must be at least the
> > +       number required to successfully decode the current stream.
> > +       This may for example be the required DPB size for an H.264
> > +       stream given the parsed stream configuration (resolution,
> > +       level).
> > +
> > +13. Allocate destination (raw format) buffers via :c:func:`VIDIOC_REQBUFS` on the
> > +    CAPTURE queue.
> > +
> > +    a. Required fields:
> > +
> > +       i.   count = n, where n > 0.
> > +
> > +       ii.  type = CAPTURE
> > +
> > +       iii. memory = as per spec
> > +
> > +    b. Return values: Per spec.
> > +
> > +    c. Return fields:
> > +
> > +       i. count: adjusted to allocated number of buffers.
> > +
> > +    d. The driver must adjust count to be at least the minimum number
> > +       of destination buffers required for the given format and stream
> > +       configuration. The client must check this value after the ioctl
> > +       returns to get the number of buffers allocated.
> > +
> > +    .. note::
> > +
> > +       Passing count = 1 is useful for letting the driver choose
> > +       the minimum.
> > +
> > +    .. note::
> > +
> > +       To allocate more than the minimum number of buffers (for pipeline
> > +       depth), use G_CTRL(``V4L2_CID_MIN_BUFFERS_FOR_CAPTURE``) to
> > +       get minimum number of buffers required, and pass the obtained
> > +       value plus the number of additional buffers needed in count
> > +       to :c:func:`VIDIOC_REQBUFS`.
> > +
> > +14. Call :c:func:`VIDIOC_STREAMON` on the CAPTURE queue to begin decoding frames.
> > +
> > +    a. Required fields: as per spec.
> > +
> > +    b. Return values: as per spec.
> > +
> > +Decoding
> > +--------
> > +
> > +This state is reached after a successful initialization sequence. In
> > +this state, the client queues and dequeues buffers to and from both
> > +queues via :c:func:`VIDIOC_QBUF` and :c:func:`VIDIOC_DQBUF`, as per spec.
> > +
> > +Both queues operate independently. The client may queue and dequeue
> > +buffers to queues in any order and at any rate, including at a different
> > +rate for each queue. The client may queue buffers within the same queue
> > +in any order (V4L2 index-wise). It is recommended for the client to
> > +operate the queues independently for best performance.
> > +
> > +Source OUTPUT buffers must contain:
> > +
> > +-  H.264/AVC: one or more complete NALUs of an Annex B elementary
> > +   stream; one buffer does not have to contain enough data to decode
> > +   a frame;
> > +
> > +-  VP8/VP9: one or more complete frames.
> > +
> > +No direct relationship between source and destination buffers and the
> > +timing of buffers becoming available to dequeue should be assumed in the
> > +Stream API. Specifically:
> > +
> > +-  a buffer queued to OUTPUT queue may result in no buffers being
> > +   produced on the CAPTURE queue (e.g. if it does not contain
> > +   encoded data, or if only metadata syntax structures are present
> > +   in it), or one or more buffers produced on the CAPTURE queue (if
> > +   the encoded data contained more than one frame, or if returning a
> > +   decoded frame allowed the driver to return a frame that preceded
> > +   it in decode, but succeeded it in display order)
> > +
> > +-  a buffer queued to OUTPUT may result in a buffer being produced on
> > +   the CAPTURE queue later in the decode process, and/or after
> > +   processing further OUTPUT buffers, or be returned out of order,
> > +   e.g. if display reordering is used
> > +
> > +-  buffers may become available on the CAPTURE queue without additional
> > +   buffers queued to OUTPUT (e.g. during flush or EOS)
> > +
> > +Seek
> > +----
> > +
> > +Seek is controlled by the OUTPUT queue, as it is the source of bitstream
> > +data. The CAPTURE queue remains unaffected.
> > +
> > +1. Stop the OUTPUT queue to begin the seek sequence via
> > +   :c:func:`VIDIOC_STREAMOFF`.
> > +
> > +   a. Required fields:
> > +
> > +      i. type = OUTPUT
> > +
> > +   b. The driver must drop all the pending OUTPUT buffers and they are
> > +      treated as returned to the client (as per spec).
> > +
> > +2. Restart the OUTPUT queue via :c:func:`VIDIOC_STREAMON`
> > +
> > +   a. Required fields:
> > +
> > +      i. type = OUTPUT
> > +
> > +   b. The driver must be in the post-seek state and be ready to
> > +      accept new source bitstream buffers.
> > +
> > +3. Start queuing buffers to OUTPUT queue containing stream data after
> > +   the seek until a suitable resume point is found.
> > +
> > +   .. note::
> > +
> > +      There is no requirement to begin queuing stream
> > +      starting exactly from a resume point (e.g. SPS or a keyframe).
> > +      The driver must handle any data queued and must keep processing
> > +      the queued buffers until it finds a suitable resume point.
> > +      While looking for a resume point, the driver processes OUTPUT
> > +      buffers and returns them to the client without producing any
> > +      decoded frames.
> > +
> > +4. After a resume point is found, the driver will start returning
> > +   CAPTURE buffers with decoded frames.
> > +
> > +   .. note::
> > +
> > +      There is no precise specification of when the CAPTURE queue
> > +      will start producing buffers containing decoded data from
> > +      buffers queued after the seek, as it operates independently
> > +      from the OUTPUT queue.
> > +
> > +      -  The driver may return a number of remaining CAPTURE buffers
> > +         containing decoded frames from before the seek after the
> > +         seek sequence (STREAMOFF-STREAMON) is performed.
> > +
> > +      -  The driver may also drop frames for buffers queued but not
> > +         yet decoded before the seek sequence was initiated.
> > +         E.g. for an OUTPUT queue sequence: QBUF(A), QBUF(B),
> > +         STREAMOFF(OUT), STREAMON(OUT), QBUF(G), QBUF(H), any of the
> > +         following results on the CAPTURE queue is allowed: {A’, B’, G’,
> > +         H’}, {A’, G’, H’}, {G’, H’}.
> > +
> > +Pause
> > +-----
> > +
> > +In order to pause, the client should just cease queuing buffers onto the
> > +OUTPUT queue. This is different from the general V4L2 API definition of
> > +pause, which involves calling :c:func:`VIDIOC_STREAMOFF` on the queue. Without
> > +source bitstream data, there is no data to process and the hardware
> > +remains idle. Conversely, using :c:func:`VIDIOC_STREAMOFF` on OUTPUT queue
> > +indicates a seek, which 1) drops all buffers in flight and 2) after a
> > +subsequent :c:func:`VIDIOC_STREAMON` will look for and only continue from a
> > +resume point. This is usually undesirable for pause. The
> > +STREAMOFF-STREAMON sequence is intended for seeking.
> > +
> > +Similarly, the CAPTURE queue should remain streaming as well, as the
> > +STREAMOFF-STREAMON sequence on it is intended solely for changing buffer
> > +sets.
> > +
> > +Dynamic resolution change
> > +-------------------------
> > +
> > +When the driver encounters a resolution change in the stream, the dynamic
> > +resolution change sequence is started.
> > +
> > +1.  On encountering a resolution change in the stream, the driver must
> > +    first process and decode all remaining buffers from before the
> > +    resolution change point.
> > +
> > +2.  After all buffers containing decoded frames from before the
> > +    resolution change point are ready to be dequeued on the
> > +    CAPTURE queue, the driver sends a ``V4L2_EVENT_SOURCE_CHANGE``
> > +    event for source change type ``V4L2_EVENT_SRC_CH_RESOLUTION``.
> > +    The last buffer from before the change must be marked with
> > +    :c:type:`v4l2_buffer` ``flags`` flag ``V4L2_BUF_FLAG_LAST`` as in the flush
> > +    sequence.
> > +
> > +    .. note::
> > +
> > +       Any attempts to dequeue more buffers beyond the buffer marked
> > +       with ``V4L2_BUF_FLAG_LAST`` will result in a -EPIPE error from
> > +       :c:func:`VIDIOC_DQBUF`.
> > +
> > +3.  After dequeuing all remaining buffers from the CAPTURE queue, the
> > +    client must call :c:func:`VIDIOC_STREAMOFF` on the CAPTURE queue. The
> > +    OUTPUT queue remains streaming (calling STREAMOFF on it would
> > +    trigger a seek).
> > +    Until STREAMOFF is called on the CAPTURE queue (acknowledging
> > +    the event), the driver operates as if the resolution hasn’t
> > +    changed yet, i.e. :c:func:`VIDIOC_G_FMT`, etc. return the previous
> > +    resolution.
> > +
> > +4.  The client frees the buffers on the CAPTURE queue using
> > +    :c:func:`VIDIOC_REQBUFS`.
> > +
> > +    a. Required fields:
> > +
> > +       i.   count = 0
> > +
> > +       ii.  type = CAPTURE
> > +
> > +       iii. memory = as per spec
> > +
> > +5.  The client calls :c:func:`VIDIOC_G_FMT` for CAPTURE to get the new format
> > +    information.
> > +    This is identical to calling :c:func:`VIDIOC_G_FMT` after
> > +    ``V4L2_EVENT_SRC_CH_RESOLUTION`` in the initialization
> > +    sequence and should be handled similarly.
> > +
> > +    .. note::
> > +
> > +       It is allowed for the driver not to support the same
> > +       pixelformat as previously used (before the resolution change)
> > +       for the new resolution. The driver must select a default
> > +       supported pixelformat and return it from :c:func:`VIDIOC_G_FMT`, and
> > +       the client must take note of it.
> > +
> > +6.  (optional) The client is allowed to enumerate available formats and
> > +    select a different one than currently chosen (returned via
> > +    :c:func:`VIDIOC_G_FMT`). This is identical to a corresponding step in
> > +    the initialization sequence.
> > +
> > +7.  (optional) The client acquires the visible resolution as in the
> > +    initialization sequence.
> > +
> > +8.  (optional) The client acquires the minimum number of buffers as in the
> > +    initialization sequence.
> > +
> > +9.  The client allocates a new set of buffers for the CAPTURE queue via
> > +    :c:func:`VIDIOC_REQBUFS`. This is identical to a corresponding step in
> > +    the initialization sequence.
> > +
> > +10. The client resumes decoding by issuing :c:func:`VIDIOC_STREAMON` on the
> > +    CAPTURE queue.
> > +
> > +During the resolution change sequence, the OUTPUT queue must remain
> > +streaming. Calling :c:func:`VIDIOC_STREAMOFF` on the OUTPUT queue will initiate a seek.
> > +
> > +The OUTPUT queue operates separately from the CAPTURE queue for the
> > +duration of the entire resolution change sequence. It is allowed (and
> > +recommended for best performance and simplicity) for the client to keep
> > +queuing/dequeuing buffers from/to OUTPUT queue even while processing
> > +this sequence.
> > +
> > +.. note::
> > +
> > +   It is also possible for this sequence to be triggered without
> > +   change in resolution if a different number of CAPTURE buffers is
> > +   required in order to continue decoding the stream.
> > +
> > +Flush
> > +-----
> > +
> > +Flush is the process of draining the CAPTURE queue of any remaining
> > +buffers. After the flush sequence is complete, the client has received
> > +all decoded frames for all OUTPUT buffers queued before the sequence was
> > +started.
> > +
> > +1. Begin flush by issuing :c:func:`VIDIOC_DECODER_CMD`.
> > +
> > +   a. Required fields:
> > +
> > +      i. cmd = ``V4L2_DEC_CMD_STOP``
> > +
> > +2. The driver must process and decode as normal all OUTPUT buffers
> > +   queued by the client before the :c:func:`VIDIOC_DECODER_CMD` was
> > +   issued.
> > +   Any operations triggered as a result of processing these
> > +   buffers (including the initialization and resolution change
> > +   sequences) must be processed as normal by both the driver and
> > +   the client before proceeding with the flush sequence.
> > +
> > +3. Once all OUTPUT buffers queued before ``V4L2_DEC_CMD_STOP`` are
> > +   processed:
> > +
> > +   a. If the CAPTURE queue is streaming, once all decoded frames (if
> > +      any) are ready to be dequeued on the CAPTURE queue, the
> > +      driver must send a ``V4L2_EVENT_EOS``. The driver must also
> > +      set ``V4L2_BUF_FLAG_LAST`` in :c:type:`v4l2_buffer` ``flags`` field on the
> > +      buffer on the CAPTURE queue containing the last frame (if
> > +      any) produced as a result of processing the OUTPUT buffers
> > +      queued before ``V4L2_DEC_CMD_STOP``. If no more frames are
> > +      left to be returned at the point of handling
> > +      ``V4L2_DEC_CMD_STOP``, the driver must return an empty buffer
>
> Sorry to come late, I didn't notice this detail before. Why do we need
> this empty buffer special case here ? Why can't we unblock the queue
> with -EPIPE, which is already a supported special case ? This could
> even be handled by the m2m framework.

I feel like that would be _at_least_ inconsistent, because sometimes
DQBUF would return a buffer with V4L2_BUF_FLAG_LAST and sometimes it
would fail with -EPIPE. If we want to change this to -EPIPE, then it
would probably make much more sense to just have the userspace always
detect the last buffer by -EPIPE, without caring about
V4L2_BUF_FLAG_LAST.

Still, this empty buffer IMHO simplifies both userspace and driver
implementation, since the former can just dequeue buffers until
V4L2_BUF_FLAG_LAST is found, while the latter doesn't need to do
tricky synchronization dances to mark the last in-flight buffer,
possibly already being processed by hardware, as last, since it can
just return next one empty. I wouldn't call it a special case, since
it actually unifies the handling.

Note that -EPIPE is already handled by vb2 and it's triggered by
dequeuing a buffer with V4L2_BUF_FLAG_LAST. If the driver doesn't have
any further data to return to userspace and it couldn't return an
empty buffer, the -EPIPE mechanism would never trigger.

Best regards,
Tomasz

Patch

diff --git a/Documentation/media/uapi/v4l/dev-codec.rst b/Documentation/media/uapi/v4l/dev-codec.rst
index c61e938bd8dc..0483b10c205e 100644
--- a/Documentation/media/uapi/v4l/dev-codec.rst
+++ b/Documentation/media/uapi/v4l/dev-codec.rst
@@ -34,3 +34,774 @@  the codec and reprogram it whenever another file handler gets access.
 This is different from the usual video node behavior where the video
 properties are global to the device (i.e. changing something through one
 file handle is visible through another file handle).
+
+This interface is generally appropriate for hardware that does not
+require additional software involvement to parse/partially decode/manage
+the stream before/after processing in hardware.
+
+Input data to the Stream API are buffers containing unprocessed video
+stream (Annex-B H264/H265 stream, raw VP8/9 stream) only. The driver is
+expected not to require any additional information from the client to
+process these buffers, and to return decoded frames on the CAPTURE queue
+in display order.
+
+Performing software parsing, processing etc. of the stream in the driver
+in order to support the Stream API is strongly discouraged. In such a case,
+use of the Stateless Codec Interface (in development) is preferred.
+
+Conventions and notation used in this document
+==============================================
+
+1. The general V4L2 API rules apply unless specified otherwise in this
+   document.
+
+2. The meaning of words “must”, “may”, “should”, etc. is as per RFC
+   2119.
+
+3. All steps not marked “optional” are required.
+
+4. :c:func:`VIDIOC_G_EXT_CTRLS`, :c:func:`VIDIOC_S_EXT_CTRLS` may be used interchangeably with
+   :c:func:`VIDIOC_G_CTRL`, :c:func:`VIDIOC_S_CTRL`, unless specified otherwise.
+
+5. Single-plane API (see spec) and applicable structures may be used
+   interchangeably with Multi-plane API, unless specified otherwise.
+
+6. i = [a..b]: sequence of integers from a to b, inclusive, e.g. i =
+   [0..2]: i = 0, 1, 2.
+
+7. For OUTPUT buffer A, A’ represents a buffer on the CAPTURE queue
+   containing data (decoded or encoded frame/stream) that resulted
+   from processing buffer A.
+
+Glossary
+========
+
+CAPTURE
+   the destination buffer queue, decoded frames for
+   decoders, encoded bitstream for encoders;
+   ``V4L2_BUF_TYPE_VIDEO_CAPTURE`` or
+   ``V4L2_BUF_TYPE_VIDEO_CAPTURE_MPLANE``
+
+client
+   application client communicating with the driver
+   implementing this API
+
+coded format
+   encoded/compressed video bitstream format (e.g.
+   H.264, VP8, etc.); see raw format; this is not equivalent to fourcc
+   (V4L2 pixelformat), as each coded format may be supported by multiple
+   fourccs (e.g. ``V4L2_PIX_FMT_H264``, ``V4L2_PIX_FMT_H264_SLICE``, etc.)
+
+coded height
+   height for given coded resolution
+
+coded resolution
+   stream resolution in pixels aligned to codec
+   format and hardware requirements; see also visible resolution
+
+coded width
+   width for given coded resolution
+
+decode order
+   the order in which frames are decoded; may differ
+   from display (output) order if frame reordering (B frames) is active in
+   the stream; OUTPUT buffers must be queued in decode order; for frame
+   API, CAPTURE buffers must be returned by the driver in decode order;
+
+display order
+   the order in which frames must be displayed
+   (outputted); for stream API, CAPTURE buffers must be returned by the
+   driver in display order;
+
+EOS
+   end of stream
+
+input height
+   height in pixels for given input resolution
+
+input resolution
+   resolution in pixels of source frames being input
+   to the encoder and subject to further cropping to the bounds of visible
+   resolution
+
+input width
+   width in pixels for given input resolution
+
+OUTPUT
+   the source buffer queue, encoded bitstream for
+   decoders, raw frames for encoders; ``V4L2_BUF_TYPE_VIDEO_OUTPUT`` or
+   ``V4L2_BUF_TYPE_VIDEO_OUTPUT_MPLANE``
+
+raw format
+   uncompressed format containing raw pixel data (e.g.
+   YUV, RGB formats)
+
+resume point
+   a point in the bitstream from which decoding may
+   start/continue, without any previous state/data present, e.g.: a
+   keyframe (VPX) or SPS/PPS/IDR sequence (H.264); a resume point is
+   required to start decode of a new stream, or to resume decoding after a
+   seek;
+
+source buffer
+   buffers allocated for source queue
+
+source queue
+   queue containing buffers used for source data, i.e. the OUTPUT queue
+
+visible height
+   height for given visible resolution
+
+visible resolution
+   stream resolution of the visible picture, in
+   pixels, to be used for display purposes; must be smaller than or equal to
+   the coded resolution;
+
+visible width
+   width for given visible resolution
+
+Decoder
+=======
+
+Querying capabilities
+---------------------
+
+1. To enumerate the set of coded formats supported by the driver, the
+   client uses :c:func:`VIDIOC_ENUM_FMT` for OUTPUT. The driver must always
+   return the full set of supported formats, irrespective of the
+   format set on the CAPTURE queue.
+
+2. To enumerate the set of supported raw formats, the client uses
+   :c:func:`VIDIOC_ENUM_FMT` for CAPTURE. The driver must return only the
+   formats supported for the format currently set on the OUTPUT
+   queue.
+   In order to enumerate raw formats supported by a given coded
+   format, the client must first set that coded format on the
+   OUTPUT queue and then enumerate the CAPTURE queue.
+
+3. The client may use :c:func:`VIDIOC_ENUM_FRAMESIZES` to detect supported
+   resolutions for a given format, passing its fourcc in
+   :c:type:`v4l2_frmsizeenum` ``pixel_format``.
+
+   a. Values returned from :c:func:`VIDIOC_ENUM_FRAMESIZES` for coded formats
+      must be maximums for given coded format for all supported raw
+      formats.
+
+   b. Values returned from :c:func:`VIDIOC_ENUM_FRAMESIZES` for raw formats must
+      be maximums for given raw format for all supported coded
+      formats.
+
+   c. The client should derive the supported resolution for a
+      combination of coded+raw format by calculating the
+      intersection of resolutions returned from calls to
+      :c:func:`VIDIOC_ENUM_FRAMESIZES` for the given coded and raw formats.
+
+4. Supported profiles and levels for given format, if applicable, may be
+   queried using their respective controls via :c:func:`VIDIOC_QUERYCTRL`.
+
+5. The client may use :c:func:`VIDIOC_ENUM_FRAMEINTERVALS` to enumerate maximum
+   supported framerates by the driver/hardware for a given
+   format+resolution combination.
+
+Initialization sequence
+-----------------------
+
+1. (optional) Enumerate supported OUTPUT formats and resolutions. See
+   capability enumeration.
+
+2. Set a coded format on the source queue via :c:func:`VIDIOC_S_FMT`
+
+   a. Required fields:
+
+      i.   type = OUTPUT
+
+      ii.  fmt.pix_mp.pixelformat set to a coded format
+
+      iii. fmt.pix_mp.width, fmt.pix_mp.height only if they cannot be
+           parsed from the stream for the given coded format;
+           ignored otherwise;
+
+   b. Return values:
+
+      i.  EINVAL: unsupported format.
+
+      ii. Others: per spec
+
+   .. note::
+
+      The driver must not adjust pixelformat, so if
+      ``V4L2_PIX_FMT_H264`` is passed but only
+      ``V4L2_PIX_FMT_H264_SLICE`` is supported, S_FMT will return
+      -EINVAL. If both are acceptable to the client, calling S_FMT for
+      the other after one gets rejected may be required (or use
+      :c:func:`VIDIOC_ENUM_FMT` to discover beforehand, see Capability
+      enumeration).
+
+3.  (optional) Get the minimum number of buffers required for the OUTPUT
+    queue via :c:func:`VIDIOC_G_CTRL`. This is useful if the client intends
+    to use more buffers than the minimum required by the hardware/format
+    (see allocation).
+
+    a. Required fields:
+
+       i. id = ``V4L2_CID_MIN_BUFFERS_FOR_OUTPUT``
+
+    b. Return values: per spec.
+
+    c. Return fields:
+
+       i. value: required number of OUTPUT buffers for the currently set
+          format;
+
+4.  Allocate source (bitstream) buffers via :c:func:`VIDIOC_REQBUFS` on OUTPUT
+    queue.
+
+    a. Required fields:
+
+       i.   count = n, where n > 0.
+
+       ii.  type = OUTPUT
+
+       iii. memory = as per spec
+
+    b. Return values: Per spec.
+
+    c. Return fields:
+
+       i. count: adjusted to allocated number of buffers
+
+    d. The driver must adjust count to be at least the minimum number
+       of source buffers required for the given format. The client
+       must check this value after the ioctl returns to get the
+       number of buffers allocated.
+
+    .. note::
+
+       Passing count = 1 is useful for letting the driver choose
+       the minimum according to the selected format/hardware
+       requirements.
+
+    .. note::
+
+       To allocate more than the minimum number of buffers (for pipeline
+       depth), use G_CTRL(``V4L2_CID_MIN_BUFFERS_FOR_OUTPUT``) to
+       get minimum number of buffers required by the driver/format,
+       and pass the obtained value plus the number of additional
+       buffers needed in count to :c:func:`VIDIOC_REQBUFS`.
+
+5.  Begin parsing the stream for stream metadata via :c:func:`VIDIOC_STREAMON` on
+    OUTPUT queue. This step allows the driver to parse/decode
+    initial stream metadata until enough information to allocate
+    CAPTURE buffers is found. This is indicated by the driver by
+    sending a ``V4L2_EVENT_SOURCE_CHANGE`` event, which the client
+    must handle.
+
+    a. Required fields: as per spec.
+
+    b. Return values: as per spec.
+
+    .. note::
+
+       Calling :c:func:`VIDIOC_REQBUFS`, :c:func:`VIDIOC_STREAMON`
+       or :c:func:`VIDIOC_G_FMT` on the CAPTURE queue at this time is not
+       allowed and must return EINVAL.
+
+6.  This step only applies for coded formats that contain resolution
+    information in the stream.
+    Continue queuing/dequeuing bitstream buffers to/from the
+    OUTPUT queue via :c:func:`VIDIOC_QBUF` and :c:func:`VIDIOC_DQBUF`. The driver
+    must keep processing and returning each buffer to the client
+    until required metadata to send a ``V4L2_EVENT_SOURCE_CHANGE``
+    for source change type ``V4L2_EVENT_SRC_CH_RESOLUTION`` is
+    found. There is no requirement to pass enough data for this to
+    occur in the first buffer and the driver must be able to
+    process any number of buffers before that happens.
+
+    a. Required fields: as per spec.
+
+    b. Return values: as per spec.
+
+    c. If data in a buffer that triggers the event is required to decode
+       the first frame, the driver must not return it to the client,
+       but must retain it for further decoding.
+
+    d. Until the resolution source change event is sent to the client, calling
+       :c:func:`VIDIOC_G_FMT` on the CAPTURE queue must return -EINVAL.
+
+    .. note::
+
+       No decoded frames are produced during this phase.
+
+7.  This step only applies for coded formats that contain resolution
+    information in the stream.
+    Receive and handle ``V4L2_EVENT_SOURCE_CHANGE`` from the driver
+    via :c:func:`VIDIOC_DQEVENT`. The driver must send this event once
+    enough data is obtained from the stream to allocate CAPTURE
+    buffers and to begin producing decoded frames.
+
+    a. Required fields:
+
+       i. type = ``V4L2_EVENT_SOURCE_CHANGE``
+
+    b. Return values: as per spec.
+
+    c. The driver must return u.src_change.changes =
+       ``V4L2_EVENT_SRC_CH_RESOLUTION``.
+
+8.  This step only applies for coded formats that contain resolution
+    information in the stream.
+    Call :c:func:`VIDIOC_G_FMT` for CAPTURE queue to get format for the
+    destination buffers parsed/decoded from the bitstream.
+
+    a. Required fields:
+
+       i. type = CAPTURE
+
+    b. Return values: as per spec.
+
+    c. Return fields:
+
+       i.   fmt.pix_mp.width, fmt.pix_mp.height: coded resolution
+            for the decoded frames
+
+       ii.  fmt.pix_mp.pixelformat: pixelformat for decoded frames that
+            is default/required/preferred by the driver.
+
+       iii. num_planes: set to number of planes for pixelformat.
+
+       iv.  For each plane p = [0..num_planes-1]:
+            plane_fmt[p].sizeimage, plane_fmt[p].bytesperline as
+            per spec for coded resolution.
+
+    .. note::
+
+       The value of pixelformat may be any supported pixel format,
+       but it must be supported for the current stream, based on
+       the information parsed from the stream and the hardware
+       capabilities. It is suggested that the driver choose the
+       preferred/optimal format for the given configuration. For
+       example, a YUV format may be preferred over an RGB format,
+       if an additional conversion step would be required for
+       RGB.
+
+9.  (optional) Enumerate CAPTURE formats via :c:func:`VIDIOC_ENUM_FMT` on
+    CAPTURE queue.
+    Once the stream information is parsed and known, the client
+    may use this ioctl to discover which raw formats are supported
+    for the given stream and select one of them via :c:func:`VIDIOC_S_FMT`.
+
+    a. Fields/return values as per spec.
+
+    .. note::
+
+       The driver must return only formats supported for the
+       current stream parsed in this initialization sequence, even
+       if more formats may be supported by the driver in general.
+       For example, a driver/hardware may support YUV and RGB
+       formats for resolutions 1920x1088 and lower, but only YUV for
+       higher resolutions (e.g. due to memory bandwidth
+       limitations). After parsing a resolution of 1920x1088 or
+       lower, :c:func:`VIDIOC_ENUM_FMT` may return a set of YUV and RGB
+       pixelformats, but after parsing resolution higher than
+       1920x1088, the driver must not return (unsupported for this
+       resolution) RGB.
+
+       However, a subsequent resolution change event
+       triggered after discovering a resolution change within the
+       same stream may switch the stream into a lower resolution;
+       :c:func:`VIDIOC_ENUM_FMT` must return RGB formats again in that case.
+
+10.  (optional) Choose a different CAPTURE format than suggested via
+     :c:func:`VIDIOC_S_FMT` on CAPTURE queue. It is possible for the client
+     to choose a different format than selected/suggested by the
+     driver in :c:func:`VIDIOC_G_FMT`.
+
+     a. Required fields:
+
+        i.  type = CAPTURE
+
+        ii. fmt.pix_mp.pixelformat set to a raw format
+
+     b. Return values:
+
+        i. EINVAL: unsupported format.
+
+     c. Calling :c:func:`VIDIOC_ENUM_FMT` to discover currently available formats
+        after receiving ``V4L2_EVENT_SOURCE_CHANGE`` is useful to find
+        out a set of allowed pixelformats for given configuration,
+        but not required.
+
+11. (optional) Acquire visible resolution via :c:func:`VIDIOC_G_SELECTION`.
+
+    a. Required fields:
+
+       i.  type = CAPTURE
+
+       ii. target = ``V4L2_SEL_TGT_CROP``
+
+    b. Return values: per spec.
+
+    c. Return fields:
+
+       i. r.left, r.top, r.width, r.height: visible rectangle; this must
+          fit within coded resolution returned from :c:func:`VIDIOC_G_FMT`.
+
+12. (optional) Get the minimum number of buffers required for the CAPTURE
+    queue via :c:func:`VIDIOC_G_CTRL`. This is useful if the client intends
+    to use more buffers than the minimum required by the hardware/format
+    (see allocation).
+
+    a. Required fields:
+
+       i. id = ``V4L2_CID_MIN_BUFFERS_FOR_CAPTURE``
+
+    b. Return values: per spec.
+
+    c. Return fields:
+
+       i. value: minimum number of buffers required to decode the stream
+          parsed in this initialization sequence.
+
+    .. note::
+
+       The minimum number of buffers must be at least the
+       number required to successfully decode the current stream.
+       This may, for example, be the required DPB size for an H.264
+       stream given the parsed stream configuration (resolution,
+       level).
+
+13. Allocate destination (raw format) buffers via :c:func:`VIDIOC_REQBUFS` on the
+    CAPTURE queue.
+
+    a. Required fields:
+
+       i.   count = n, where n > 0.
+
+       ii.  type = CAPTURE
+
+       iii. memory = as per spec
+
+    b. Return values: Per spec.
+
+    c. Return fields:
+
+       i. count: adjusted to allocated number of buffers.
+
+    d. The driver must adjust count to be at least the minimum number
+       of destination buffers required for the given format and stream
+       configuration. The client must check this value after the ioctl
+       returns to get the number of buffers allocated.
+
+    .. note::
+
+       Passing count = 1 is useful for letting the driver choose
+       the minimum.
+
+    .. note::
+
+       To allocate more than the minimum number of buffers (for pipeline
+       depth), use G_CTRL(``V4L2_CID_MIN_BUFFERS_FOR_CAPTURE``) to
+       get minimum number of buffers required, and pass the obtained
+       value plus the number of additional buffers needed in count
+       to :c:func:`VIDIOC_REQBUFS`.
+
+14. Call :c:func:`VIDIOC_STREAMON` on the CAPTURE queue to begin decoding frames.
+
+    a. Required fields: as per spec.
+
+    b. Return values: as per spec.
+
+Decoding
+--------
+
+This state is reached after a successful initialization sequence. In
+this state, the client queues and dequeues buffers to and from both
+queues via :c:func:`VIDIOC_QBUF` and :c:func:`VIDIOC_DQBUF`, as per spec.
+
+Both queues operate independently. The client may queue and dequeue
+buffers to queues in any order and at any rate, including at a different
+rate for each queue. The client may queue buffers within the same queue
+in any order (V4L2 index-wise). It is recommended for the client to
+operate the queues independently for best performance.
+
+Source OUTPUT buffers must contain:
+
+-  H.264/AVC: one or more complete NALUs of an Annex B elementary
+   stream; one buffer does not have to contain enough data to decode
+   a frame;
+
+-  VP8/VP9: one or more complete frames.
+
+No direct relationship between source and destination buffers and the
+timing of buffers becoming available to dequeue should be assumed in the
+Stream API. Specifically:
+
+-  a buffer queued to OUTPUT queue may result in no buffers being
+   produced on the CAPTURE queue (e.g. if it does not contain
+   encoded data, or if only metadata syntax structures are present
+   in it), or one or more buffers produced on the CAPTURE queue (if
+   the encoded data contained more than one frame, or if returning a
+   decoded frame allowed the driver to return a frame that preceded
+   it in decode, but succeeded it in display order)
+
+-  a buffer queued to OUTPUT may result in a buffer being produced on
+   the CAPTURE queue later in the decode process, and/or after
+   processing further OUTPUT buffers, or be returned out of order,
+   e.g. if display reordering is used
+
+-  buffers may become available on the CAPTURE queue without additional
+   buffers queued to OUTPUT (e.g. during flush or EOS)
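As an illustration of operating both queues independently, here is a minimal sketch (not part of the specification) of a readiness check built on poll(2). It assumes the convention used by the in-kernel mem2mem helpers: ``POLLOUT`` means an OUTPUT buffer can be dequeued, ``POLLIN`` means a CAPTURE buffer can be dequeued, and ``POLLPRI`` means an event is pending. The names ``decoder_ready`` and ``wait_for_decoder`` are hypothetical.

```c
#include <poll.h>

/* Hypothetical summary of what poll(2) reported for a decoder fd. */
struct decoder_ready {
	int output_done;  /* an OUTPUT (bitstream) buffer can be dequeued */
	int capture_done; /* a CAPTURE (decoded frame) buffer can be dequeued */
	int event;        /* an event (e.g. source change, EOS) is pending */
	int error;        /* POLLERR reported; conditions are driver-defined */
};

/*
 * Wait up to timeout_ms for the decoder. Returns poll()'s result:
 * >0 if something is ready, 0 on timeout, -1 on error. The flags are
 * independent: readiness of one queue says nothing about the other.
 */
static int wait_for_decoder(int fd, int timeout_ms, struct decoder_ready *r)
{
	struct pollfd pfd = {
		.fd = fd,
		.events = POLLIN | POLLOUT | POLLPRI,
	};
	int ret = poll(&pfd, 1, timeout_ms);
	int ok = ret > 0;

	r->output_done = ok && (pfd.revents & POLLOUT);
	r->capture_done = ok && (pfd.revents & POLLIN);
	r->event = ok && (pfd.revents & POLLPRI);
	r->error = ok && (pfd.revents & POLLERR);
	return ret;
}
```

A client loop would call this repeatedly and dequeue from whichever queue is flagged. It refills returned OUTPUT buffers with new bitstream data and requeues CAPTURE buffers once their frames are consumed. At no point does it wait on CAPTURE for the result of one specific OUTPUT buffer.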
+
+Seek
+----
+
+Seek is controlled by the OUTPUT queue, as it is the source of bitstream
+data. The CAPTURE queue remains unchanged and unaffected.
+
+1. Stop the OUTPUT queue to begin the seek sequence via
+   :c:func:`VIDIOC_STREAMOFF`.
+
+   a. Required fields:
+
+      i. type = OUTPUT
+
+   b. The driver must drop all pending OUTPUT buffers, which are then
+      treated as returned to the client (as per spec).
+
+2. Restart the OUTPUT queue via :c:func:`VIDIOC_STREAMON`.
+
+   a. Required fields:
+
+      i. type = OUTPUT
+
+   b. The driver must enter the post-seek state and be ready to
+      accept new source bitstream buffers.
+
+3. Start queuing buffers containing stream data from after the seek
+   point to the OUTPUT queue until a suitable resume point is found.
+
+   .. note::
+
+      There is no requirement to begin queuing the stream
+      exactly at a resume point (e.g. SPS or a keyframe).
+      The driver must handle any data queued and must keep processing
+      the queued buffers until it finds a suitable resume point.
+      While looking for a resume point, the driver processes OUTPUT
+      buffers and returns them to the client without producing any
+      decoded frames.
+
+4. After a resume point is found, the driver will start returning
+   CAPTURE buffers with decoded frames.
+
+   .. note::
+
+      There is no precise specification of when the CAPTURE queue
+      will start producing buffers containing decoded data from
+      buffers queued after the seek, as it operates independently
+      of the OUTPUT queue.
+
+      -  After the seek sequence (STREAMOFF-STREAMON) is performed, the
+         driver may still return a number of remaining CAPTURE buffers
+         containing decoded frames from before the seek.
+
+      -  The driver is also not required to return all frames that were
+         queued but not yet decoded before the seek sequence was initiated.
+         E.g. for an OUTPUT queue sequence: QBUF(A), QBUF(B),
+         STREAMOFF(OUT), STREAMON(OUT), QBUF(G), QBUF(H), any of the
+         following results on the CAPTURE queue is allowed: {A’, B’, G’,
+         H’}, {A’, G’, H’}, {G’, H’}.
+
+Pause
+-----
+
+In order to pause, the client should just cease queuing buffers onto the
+OUTPUT queue. This is different from the general V4L2 API definition of
+pause, which involves calling :c:func:`VIDIOC_STREAMOFF` on the queue. Without
+source bitstream data, there is no data to process and the hardware
+remains idle. Conversely, using :c:func:`VIDIOC_STREAMOFF` on OUTPUT queue
+indicates a seek, which 1) drops all buffers in flight and 2) after a
+subsequent :c:func:`VIDIOC_STREAMON` will look for and only continue from a
+resume point. This is usually undesirable for pause. The
+STREAMOFF-STREAMON sequence is intended for seeking.
+
+Similarly, the CAPTURE queue should remain streaming, as the
+STREAMOFF-STREAMON sequence on it is intended solely for changing buffer
+sets.
+
+Dynamic resolution change
+-------------------------
+
+When the driver encounters a resolution change in the stream, it starts
+the dynamic resolution change sequence.
+
+1.  On encountering a resolution change in the stream, the driver must
+    first process and decode all remaining buffers from before the
+    resolution change point.
+
+2.  After all buffers containing decoded frames from before the
+    resolution change point are ready to be dequeued on the
+    CAPTURE queue, the driver sends a ``V4L2_EVENT_SOURCE_CHANGE``
+    event for source change type ``V4L2_EVENT_SRC_CH_RESOLUTION``.
+    The last buffer from before the change must be marked with the
+    ``V4L2_BUF_FLAG_LAST`` flag in the :c:type:`v4l2_buffer` ``flags``
+    field, as in the flush sequence.
+
+    .. note::
+
+       Any attempts to dequeue more buffers beyond the buffer marked
+       with ``V4L2_BUF_FLAG_LAST`` will result in a -EPIPE error from
+       :c:func:`VIDIOC_DQBUF`.
+
+3.  After dequeuing all remaining buffers from the CAPTURE queue, the
+    client must call :c:func:`VIDIOC_STREAMOFF` on the CAPTURE queue. The
+    OUTPUT queue remains streaming (calling STREAMOFF on it would
+    trigger a seek).
+    Until STREAMOFF is called on the CAPTURE queue (acknowledging
+    the event), the driver operates as if the resolution hasn’t
+    changed yet, i.e. :c:func:`VIDIOC_G_FMT`, etc. return the previous
+    resolution.
+
+4.  The client frees the buffers on the CAPTURE queue using
+    :c:func:`VIDIOC_REQBUFS`.
+
+    a. Required fields:
+
+       i.   count = 0
+
+       ii.  type = CAPTURE
+
+       iii. memory = as per spec
+
+5.  The client calls :c:func:`VIDIOC_G_FMT` for CAPTURE to get the new format
+    information.
+    This is identical to calling :c:func:`VIDIOC_G_FMT` after
+    ``V4L2_EVENT_SRC_CH_RESOLUTION`` in the initialization
+    sequence and should be handled similarly.
+
+    .. note::
+
+       It is allowed for the driver not to support the same
+       pixelformat as previously used (before the resolution change)
+       for the new resolution. The driver must select a default
+       supported pixelformat and return it from :c:func:`VIDIOC_G_FMT`, and
+       the client must take note of it.
+
+6.  (optional) The client is allowed to enumerate available formats and
+    select a different one than the one currently chosen (returned via
+    :c:func:`VIDIOC_G_FMT`). This is identical to a corresponding step in
+    the initialization sequence.
+
+7.  (optional) The client acquires the visible resolution as in the
+    initialization sequence.
+
+8.  (optional) The client acquires the minimum number of buffers as in
+    the initialization sequence.
+
+9.  The client allocates a new set of buffers for the CAPTURE queue via
+    :c:func:`VIDIOC_REQBUFS`. This is identical to a corresponding step in
+    the initialization sequence.
+
+10. The client resumes decoding by issuing :c:func:`VIDIOC_STREAMON` on the
+    CAPTURE queue.
+
+During the resolution change sequence, the OUTPUT queue must remain
+streaming. Calling :c:func:`VIDIOC_STREAMOFF` on the OUTPUT queue would initiate a seek.
+
+The OUTPUT queue operates separately from the CAPTURE queue for the
+duration of the entire resolution change sequence. It is allowed (and
+recommended for best performance and simplicity) for the client to keep
+queuing/dequeuing buffers to/from the OUTPUT queue even while processing
+this sequence.
+
+.. note::
+
+   It is also possible for this sequence to be triggered without a
+   change in resolution if a different number of CAPTURE buffers is
+   required in order to continue decoding the stream.
+
+Flush
+-----
+
+Flush is the process of draining the CAPTURE queue of any remaining
+buffers. After the flush sequence is complete, the client has received
+all decoded frames for all OUTPUT buffers queued before the sequence was
+started.
+
+1. Begin flush by issuing :c:func:`VIDIOC_DECODER_CMD`.
+
+   a. Required fields:
+
+      i. cmd = ``V4L2_DEC_CMD_STOP``
+
+2. The driver must process and decode as normal all OUTPUT buffers
+   queued by the client before the :c:func:`VIDIOC_DECODER_CMD` was
+   issued.
+   Any operations triggered as a result of processing these
+   buffers (including the initialization and resolution change
+   sequences) must be processed as normal by both the driver and
+   the client before proceeding with the flush sequence.
+
+3. Once all OUTPUT buffers queued before ``V4L2_DEC_CMD_STOP`` are
+   processed:
+
+   a. If the CAPTURE queue is streaming, once all decoded frames (if
+      any) are ready to be dequeued on the CAPTURE queue, the
+      driver must send a ``V4L2_EVENT_EOS``. The driver must also
+      set ``V4L2_BUF_FLAG_LAST`` in :c:type:`v4l2_buffer` ``flags`` field on the
+      buffer on the CAPTURE queue containing the last frame (if
+      any) produced as a result of processing the OUTPUT buffers
+      queued before ``V4L2_DEC_CMD_STOP``. If no more frames are
+      left to be returned at the point of handling
+      ``V4L2_DEC_CMD_STOP``, the driver must return an empty buffer
+      (with :c:type:`v4l2_buffer` ``bytesused`` = 0) as the last buffer with
+      ``V4L2_BUF_FLAG_LAST`` set instead.
+      Any attempts to dequeue more buffers beyond the buffer
+      marked with ``V4L2_BUF_FLAG_LAST`` will result in a -EPIPE
+      error from :c:func:`VIDIOC_DQBUF`.
+
+   b. If the CAPTURE queue is NOT streaming, no action is necessary for
+      the CAPTURE queue and the driver must send a ``V4L2_EVENT_EOS``
+      immediately after all OUTPUT buffers in question have been
+      processed.
+
+4. To resume, the client may issue :c:func:`VIDIOC_DECODER_CMD` with
+   ``cmd`` = ``V4L2_DEC_CMD_START``.
+
+End of stream
+-------------
+
+When the driver encounters an explicit end of stream in the bitstream,
+it must send a ``V4L2_EVENT_EOS`` to the client after all frames are
+decoded and ready to be dequeued on the CAPTURE queue, and it must set
+``V4L2_BUF_FLAG_LAST`` in the :c:type:`v4l2_buffer` ``flags`` field of
+the last CAPTURE buffer. This behavior is identical to the flush
+sequence as if triggered by the client via ``V4L2_DEC_CMD_STOP``.
+
+Commit points
+-------------
+
+Setting formats and allocating buffers triggers changes in the behavior
+of the driver.
+
+1. Setting the format on the OUTPUT queue may change the set of formats
+   supported/advertised on the CAPTURE queue. If the format currently
+   selected on the CAPTURE queue is not supported by the newly selected
+   OUTPUT format, the driver must also change it to a supported one.
+
+2. Enumerating formats on the CAPTURE queue must only return CAPTURE
+   formats supported for the OUTPUT format currently set.
+
+3. Setting/changing the format on the CAPTURE queue does not change the
+   formats available on the OUTPUT queue. An attempt to set a CAPTURE
+   format that is not supported for the currently selected OUTPUT format
+   must result in an error (-EINVAL) from :c:func:`VIDIOC_S_FMT`.
+
+4. Enumerating formats on the OUTPUT queue always returns the full set of
+   supported formats, irrespective of the current format selected on the
+   CAPTURE queue.
+
+5. After buffers have been allocated on the OUTPUT queue, its format
+   can no longer be changed.
+
+To summarize, setting formats and allocating buffers must always start
+with the OUTPUT queue, which is the master that governs the set of
+supported formats for the CAPTURE queue.
diff --git a/Documentation/media/uapi/v4l/v4l2.rst b/Documentation/media/uapi/v4l/v4l2.rst
index b89e5621ae69..563d5b861d1c 100644
--- a/Documentation/media/uapi/v4l/v4l2.rst
+++ b/Documentation/media/uapi/v4l/v4l2.rst
@@ -53,6 +53,10 @@  Authors, in alphabetical order:
 
   - Original author of the V4L2 API and documentation.
 
+- Figa, Tomasz <tfiga@chromium.org>
+
+  - Documented parts of the V4L2 (stateful) Codec Interface. Migrated from Google Docs to kernel documentation.
+
 - H Schimek, Michael <mschimek@gmx.at>
 
   - Original author of the V4L2 API and documentation.
@@ -65,6 +69,10 @@  Authors, in alphabetical order:
 
   - Designed and documented the multi-planar API.
 
+- Osciak, Pawel <posciak@chromium.org>
+
+  - Documented the V4L2 (stateful) Codec Interface.
+
 - Palosaari, Antti <crope@iki.fi>
 
   - SDR API.
@@ -85,7 +93,7 @@  Authors, in alphabetical order:
 
   - Designed and documented the VIDIOC_LOG_STATUS ioctl, the extended control ioctls, major parts of the sliced VBI API, the MPEG encoder and decoder APIs and the DV Timings API.
 
-**Copyright** |copy| 1999-2016: Bill Dirks, Michael H. Schimek, Hans Verkuil, Martin Rubli, Andy Walls, Muralidharan Karicheri, Mauro Carvalho Chehab, Pawel Osciak, Sakari Ailus & Antti Palosaari.
+**Copyright** |copy| 1999-2018: Bill Dirks, Michael H. Schimek, Hans Verkuil, Martin Rubli, Andy Walls, Muralidharan Karicheri, Mauro Carvalho Chehab, Pawel Osciak, Sakari Ailus, Antti Palosaari & Tomasz Figa.
 
 Except when explicitly stated as GPL, programming examples within this
 part can be used and distributed without restrictions.
@@ -94,6 +102,10 @@  part can be used and distributed without restrictions.
 Revision History
 ****************
 
+:revision: TBD / TBD (*tf*)
+
+Add specification of V4L2 Codec Interface UAPI.
+
 :revision: 4.10 / 2016-07-15 (*rr*)
 
 Introduce HSV formats.