Message ID | 20190703122849.6316-2-boris.brezillon@collabora.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | media: uapi: h264: First batch of adjusments | expand |
Hi Boris, Paul, On Wed, 2019-07-03 at 14:28 +0200, Boris Brezillon wrote: > Looks like some stateless decoders expect slices to be prefixed with > ANNEX B start codes (they most likely do some kind of bitstream parsing > and/or need that to delimit slices when doing per frame decoding). > Since skipping those start codes for dummy stateless decoders (those > expecting all params to be passed through controls) should be pretty > easy, let's mandate that all slices be prepended with ANNEX B start > codes. > > If we ever need to support AVC headers, we can add a new menu control > to select the type of NAL header to use. > > Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com> > Reviewed-by: Paul Kocialkowski <paul.kocialkowski@bootlin.com> > --- > Changes in v3: > * Add Paul's R-b > > Changes in v2: > * None > --- > Documentation/media/uapi/v4l/ext-ctrls-codec.rst | 1 + > 1 file changed, 1 insertion(+) > > diff --git a/Documentation/media/uapi/v4l/ext-ctrls-codec.rst b/Documentation/media/uapi/v4l/ext-ctrls-codec.rst > index 7a1947f5be96..3ae1367806cf 100644 > --- a/Documentation/media/uapi/v4l/ext-ctrls-codec.rst > +++ b/Documentation/media/uapi/v4l/ext-ctrls-codec.rst > @@ -1726,6 +1726,7 @@ enum v4l2_mpeg_video_h264_hierarchical_coding_type - > :ref:`h264`, section 7.4.3 "Slice Header Semantics". For further > documentation, refer to the above specification, unless there is > an explicit comment stating otherwise. > + All slices should be prepended with an ANNEX B start code. > Currently, the H264 slice V4L2_PIX_FMT_H264_SLICE_RAW, is specified to _not_ contain the ANNEX B start code. As you know, this is used in the cedrus driver, which is not expecting the start code. What's the plan regarding that? Thanks, Ezequiel
On Fri, 05 Jul 2019 13:40:03 -0300 Ezequiel Garcia <ezequiel@collabora.com> wrote: > Hi Boris, Paul, > > On Wed, 2019-07-03 at 14:28 +0200, Boris Brezillon wrote: > > Looks like some stateless decoders expect slices to be prefixed with > > ANNEX B start codes (they most likely do some kind of bitstream parsing > > and/or need that to delimit slices when doing per frame decoding). > > Since skipping those start codes for dummy stateless decoders (those > > expecting all params to be passed through controls) should be pretty > > easy, let's mandate that all slices be prepended with ANNEX B start > > codes. > > > > If we ever need to support AVC headers, we can add a new menu control > > to select the type of NAL header to use. > > > > Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com> > > Reviewed-by: Paul Kocialkowski <paul.kocialkowski@bootlin.com> > > --- > > Changes in v3: > > * Add Paul's R-b > > > > Changes in v2: > > * None > > --- > > Documentation/media/uapi/v4l/ext-ctrls-codec.rst | 1 + > > 1 file changed, 1 insertion(+) > > > > diff --git a/Documentation/media/uapi/v4l/ext-ctrls-codec.rst b/Documentation/media/uapi/v4l/ext-ctrls-codec.rst > > index 7a1947f5be96..3ae1367806cf 100644 > > --- a/Documentation/media/uapi/v4l/ext-ctrls-codec.rst > > +++ b/Documentation/media/uapi/v4l/ext-ctrls-codec.rst > > @@ -1726,6 +1726,7 @@ enum v4l2_mpeg_video_h264_hierarchical_coding_type - > > :ref:`h264`, section 7.4.3 "Slice Header Semantics". For further > > documentation, refer to the above specification, unless there is > > an explicit comment stating otherwise. > > + All slices should be prepended with an ANNEX B start code. > > > > Currently, the H264 slice V4L2_PIX_FMT_H264_SLICE_RAW, > is specified to _not_ contain the ANNEX B start code. Yep, we should provably rename the format. > > As you know, this is used in the cedrus driver, which is not > expecting the start code. > > What's the plan regarding that? I had a few patches modifying the cedrus driver accordingly, but I dropped them in v2. I guess we can fix the driver once we've settled on the uAPI changes.
On Fri, 5 Jul 2019 19:16:18 +0200 Boris Brezillon <boris.brezillon@collabora.com> wrote: > On Fri, 05 Jul 2019 13:40:03 -0300 > Ezequiel Garcia <ezequiel@collabora.com> wrote: > > > Hi Boris, Paul, > > > > On Wed, 2019-07-03 at 14:28 +0200, Boris Brezillon wrote: > > > Looks like some stateless decoders expect slices to be prefixed with > > > ANNEX B start codes (they most likely do some kind of bitstream parsing > > > and/or need that to delimit slices when doing per frame decoding). > > > Since skipping those start codes for dummy stateless decoders (those > > > expecting all params to be passed through controls) should be pretty > > > easy, let's mandate that all slices be prepended with ANNEX B start > > > codes. > > > > > > If we ever need to support AVC headers, we can add a new menu control > > > to select the type of NAL header to use. > > > > > > Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com> > > > Reviewed-by: Paul Kocialkowski <paul.kocialkowski@bootlin.com> > > > --- > > > Changes in v3: > > > * Add Paul's R-b > > > > > > Changes in v2: > > > * None > > > --- > > > Documentation/media/uapi/v4l/ext-ctrls-codec.rst | 1 + > > > 1 file changed, 1 insertion(+) > > > > > > diff --git a/Documentation/media/uapi/v4l/ext-ctrls-codec.rst b/Documentation/media/uapi/v4l/ext-ctrls-codec.rst > > > index 7a1947f5be96..3ae1367806cf 100644 > > > --- a/Documentation/media/uapi/v4l/ext-ctrls-codec.rst > > > +++ b/Documentation/media/uapi/v4l/ext-ctrls-codec.rst > > > @@ -1726,6 +1726,7 @@ enum v4l2_mpeg_video_h264_hierarchical_coding_type - > > > :ref:`h264`, section 7.4.3 "Slice Header Semantics". For further > > > documentation, refer to the above specification, unless there is > > > an explicit comment stating otherwise. > > > + All slices should be prepended with an ANNEX B start code. > > > > > > > Currently, the H264 slice V4L2_PIX_FMT_H264_SLICE_RAW, > > is specified to _not_ contain the ANNEX B start code. > > Yep, we should provably rename the format. Paul, are you okay with this rename? s/V4L2_PIX_FMT_H264_SLICE_RAW/V4L2_PIX_FMT_H264_SLICE/ or s/V4L2_PIX_FMT_H264_SLICE_RAW/V4L2_PIX_FMT_H264_SLICE_ANNEXB/ I'd also to discuss some concerns Ezequiel and I have regarding this change. Some (most?) codec have alignment constraints on the buffer they pass to the HW. For HW that support Annex B parsing, that's no problem because the start of the buffer will be aligned on a page (I'm assuming page alignment should cover 99% of the alignment constraints). But HW that need to skip the start code will have to pass a non-aligned buffer (annex B start code is 3 byte long). Paul looked at the Cedrus driver and thinks it can be handled correctly thanks to the VE_H264_VLD_OFFSET field (which encodes an offset in bit), but I fear this might be a problem on other HW. So, I'm asking again, are we sure we want to handle the raw (no start code) and annex-b cases using the same pixel format? If we do, what's the plan to address those potential alignment constraints? Should we provide a way for userspace to define where the start-code ends so it can align things properly (annex B can be extended with extra 00 bytes at the beginning)? If we do that, that means userspace has to know about those alignment constraints, or take something big enough. Another option would be to use a bounce buffer when things are not aligned properly. I'd really like to get feedback on those points before sending a v4.
Hi, On Wed 03 Jul 19, 14:28, Boris Brezillon wrote: > Looks like some stateless decoders expect slices to be prefixed with > ANNEX B start codes (they most likely do some kind of bitstream parsing > and/or need that to delimit slices when doing per frame decoding). > Since skipping those start codes for dummy stateless decoders (those > expecting all params to be passed through controls) should be pretty > easy, let's mandate that all slices be prepended with ANNEX B start > codes. > > If we ever need to support AVC headers, we can add a new menu control > to select the type of NAL header to use. After thinking a bit about this, I'd rather be in favor of having offsets in the control structures rather than forcing the start code type or having a dedicated control for that, as I've mentionned on the other patch. What do you think? Cheers, Paul > Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com> > Reviewed-by: Paul Kocialkowski <paul.kocialkowski@bootlin.com> > --- > Changes in v3: > * Add Paul's R-b > > Changes in v2: > * None > --- > Documentation/media/uapi/v4l/ext-ctrls-codec.rst | 1 + > 1 file changed, 1 insertion(+) > > diff --git a/Documentation/media/uapi/v4l/ext-ctrls-codec.rst b/Documentation/media/uapi/v4l/ext-ctrls-codec.rst > index 7a1947f5be96..3ae1367806cf 100644 > --- a/Documentation/media/uapi/v4l/ext-ctrls-codec.rst > +++ b/Documentation/media/uapi/v4l/ext-ctrls-codec.rst > @@ -1726,6 +1726,7 @@ enum v4l2_mpeg_video_h264_hierarchical_coding_type - > :ref:`h264`, section 7.4.3 "Slice Header Semantics". For further > documentation, refer to the above specification, unless there is > an explicit comment stating otherwise. > + All slices should be prepended with an ANNEX B start code. > > .. note:: > > -- > 2.21.0 >
Hi, On Thu 25 Jul 19, 08:42, Boris Brezillon wrote: > On Fri, 5 Jul 2019 19:16:18 +0200 > Boris Brezillon <boris.brezillon@collabora.com> wrote: > > > On Fri, 05 Jul 2019 13:40:03 -0300 > > Ezequiel Garcia <ezequiel@collabora.com> wrote: > > > > > Hi Boris, Paul, > > > > > > On Wed, 2019-07-03 at 14:28 +0200, Boris Brezillon wrote: > > > > Looks like some stateless decoders expect slices to be prefixed with > > > > ANNEX B start codes (they most likely do some kind of bitstream parsing > > > > and/or need that to delimit slices when doing per frame decoding). > > > > Since skipping those start codes for dummy stateless decoders (those > > > > expecting all params to be passed through controls) should be pretty > > > > easy, let's mandate that all slices be prepended with ANNEX B start > > > > codes. > > > > > > > > If we ever need to support AVC headers, we can add a new menu control > > > > to select the type of NAL header to use. > > > > > > > > Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com> > > > > Reviewed-by: Paul Kocialkowski <paul.kocialkowski@bootlin.com> > > > > --- > > > > Changes in v3: > > > > * Add Paul's R-b > > > > > > > > Changes in v2: > > > > * None > > > > --- > > > > Documentation/media/uapi/v4l/ext-ctrls-codec.rst | 1 + > > > > 1 file changed, 1 insertion(+) > > > > > > > > diff --git a/Documentation/media/uapi/v4l/ext-ctrls-codec.rst b/Documentation/media/uapi/v4l/ext-ctrls-codec.rst > > > > index 7a1947f5be96..3ae1367806cf 100644 > > > > --- a/Documentation/media/uapi/v4l/ext-ctrls-codec.rst > > > > +++ b/Documentation/media/uapi/v4l/ext-ctrls-codec.rst > > > > @@ -1726,6 +1726,7 @@ enum v4l2_mpeg_video_h264_hierarchical_coding_type - > > > > :ref:`h264`, section 7.4.3 "Slice Header Semantics". For further > > > > documentation, refer to the above specification, unless there is > > > > an explicit comment stating otherwise. > > > > + All slices should be prepended with an ANNEX B start code. > > > > > > > > > > Currently, the H264 slice V4L2_PIX_FMT_H264_SLICE_RAW, > > > is specified to _not_ contain the ANNEX B start code. > > > > Yep, we should provably rename the format. > > Paul, are you okay with this rename? Sorry for the very long response time here, I've had a hard time getting back into the context of all this. > s/V4L2_PIX_FMT_H264_SLICE_RAW/V4L2_PIX_FMT_H264_SLICE/ > > or > > s/V4L2_PIX_FMT_H264_SLICE_RAW/V4L2_PIX_FMT_H264_SLICE_ANNEXB/ I'd be in favor of the former (V4L2_PIX_FMT_H264_SLICE) and passing offsets to the beinning and after the start code. That would be more flexible, but one downside could be decoders that some decoders only take a specific start code. On the other hand I don't think that having one pixel format for each type of start code would be very reasonable, so I'd rather see an offset for now and perhaps a menu control later if needed to specify which types of start codes are supported. > I'd also to discuss some concerns Ezequiel and I have regarding this > change. Some (most?) codec have alignment constraints on the buffer > they pass to the HW. For HW that support Annex B parsing, that's no > problem because the start of the buffer will be aligned on a page (I'm > assuming page alignment should cover 99% of the alignment constraints). > But HW that need to skip the start code will have to pass a non-aligned > buffer (annex B start code is 3 byte long). > Paul looked at the Cedrus driver and thinks it can be handled correctly > thanks to the VE_H264_VLD_OFFSET field (which encodes an offset in bit), > but I fear this might be a problem on other HW. > > So, I'm asking again, are we sure we want to handle the raw (no start > code) and annex-b cases using the same pixel format? If we do, what's > the plan to address those potential alignment constraints? Should > we provide a way for userspace to define where the start-code ends so it > can align things properly (annex B can be extended with extra 00 > bytes at the beginning)? If we do that, that means userspace has to > know about those alignment constraints, or take something big enough. > Another option would be to use a bounce buffer when things are not > aligned properly. > > I'd really like to get feedback on those points before sending a v4. Mhh I don't really know what would be best for handling that. Either way, I don't see how more pixel formats would really help solve the issue, so I'm still in favor of one. Having a control that specifies an alignment constraint for the slice beginning could work (as long as we make it optional, although userspace should be required to abide by it when it is present). I guess it's not such a high price to pay for a unified codec interface :) Cheers, Paul
On Thu, 2019-07-25 at 21:36 +0200, Paul Kocialkowski wrote: > Hi, > > On Thu 25 Jul 19, 08:42, Boris Brezillon wrote: > > On Fri, 5 Jul 2019 19:16:18 +0200 > > Boris Brezillon <boris.brezillon@collabora.com> wrote: > > > > > On Fri, 05 Jul 2019 13:40:03 -0300 > > > Ezequiel Garcia <ezequiel@collabora.com> wrote: > > > > > > > Hi Boris, Paul, > > > > > > > > On Wed, 2019-07-03 at 14:28 +0200, Boris Brezillon wrote: > > > > > Looks like some stateless decoders expect slices to be prefixed with > > > > > ANNEX B start codes (they most likely do some kind of bitstream parsing > > > > > and/or need that to delimit slices when doing per frame decoding). > > > > > Since skipping those start codes for dummy stateless decoders (those > > > > > expecting all params to be passed through controls) should be pretty > > > > > easy, let's mandate that all slices be prepended with ANNEX B start > > > > > codes. > > > > > > > > > > If we ever need to support AVC headers, we can add a new menu control > > > > > to select the type of NAL header to use. > > > > > > > > > > Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com> > > > > > Reviewed-by: Paul Kocialkowski <paul.kocialkowski@bootlin.com> > > > > > --- > > > > > Changes in v3: > > > > > * Add Paul's R-b > > > > > > > > > > Changes in v2: > > > > > * None > > > > > --- > > > > > Documentation/media/uapi/v4l/ext-ctrls-codec.rst | 1 + > > > > > 1 file changed, 1 insertion(+) > > > > > > > > > > diff --git a/Documentation/media/uapi/v4l/ext-ctrls-codec.rst b/Documentation/media/uapi/v4l/ext-ctrls-codec.rst > > > > > index 7a1947f5be96..3ae1367806cf 100644 > > > > > --- a/Documentation/media/uapi/v4l/ext-ctrls-codec.rst > > > > > +++ b/Documentation/media/uapi/v4l/ext-ctrls-codec.rst > > > > > @@ -1726,6 +1726,7 @@ enum v4l2_mpeg_video_h264_hierarchical_coding_type - > > > > > :ref:`h264`, section 7.4.3 "Slice Header Semantics". For further > > > > > documentation, refer to the above specification, unless there is > > > > > an explicit comment stating otherwise. > > > > > + All slices should be prepended with an ANNEX B start code. > > > > > > > > > > > > > Currently, the H264 slice V4L2_PIX_FMT_H264_SLICE_RAW, > > > > is specified to _not_ contain the ANNEX B start code. > > > > > > Yep, we should provably rename the format. > > > > Paul, are you okay with this rename? > > Sorry for the very long response time here, I've had a hard time getting back > into the context of all this. > > > s/V4L2_PIX_FMT_H264_SLICE_RAW/V4L2_PIX_FMT_H264_SLICE/ > > > > or > > > > s/V4L2_PIX_FMT_H264_SLICE_RAW/V4L2_PIX_FMT_H264_SLICE_ANNEXB/ > > I'd be in favor of the former (V4L2_PIX_FMT_H264_SLICE) and passing offsets > to the beinning and after the start code. That would be more flexible, but one > downside could be decoders that some decoders only take a specific start code. > > On the other hand I don't think that having one pixel format for each type of > start code would be very reasonable, so I'd rather see an offset for now and > perhaps a menu control later if needed to specify which types of start codes are > supported. > If I am reading the spec correctly, Annex B start code is specified to always be the 3-byte start code: 0x000001. The first NAL of a frame may have an additional 0x00, which effectively means the start code of the first NAL of a frame _can_ be 4-byte 0x00000001, in addition to the 3-byte 0x000001. In other words, there aren't multiple Annex B type of start codes, and only two options for the format of the slice: NAL units with or without a start code. Therefore, I can't see any point in having this offset. > > I'd also to discuss some concerns Ezequiel and I have regarding this > > change. Some (most?) codec have alignment constraints on the buffer > > they pass to the HW. For HW that support Annex B parsing, that's no > > problem because the start of the buffer will be aligned on a page (I'm > > assuming page alignment should cover 99% of the alignment constraints). > > But HW that need to skip the start code will have to pass a non-aligned > > buffer (annex B start code is 3 byte long). > > Paul looked at the Cedrus driver and thinks it can be handled correctly > > thanks to the VE_H264_VLD_OFFSET field (which encodes an offset in bit), > > but I fear this might be a problem on other HW. > > > > So, I'm asking again, are we sure we want to handle the raw (no start > > code) and annex-b cases using the same pixel format? If we do, what's > > the plan to address those potential alignment constraints? Should > > we provide a way for userspace to define where the start-code ends so it > > can align things properly (annex B can be extended with extra 00 > > bytes at the beginning)? If we do that, that means userspace has to > > know about those alignment constraints, or take something big enough. > > Another option would be to use a bounce buffer when things are not > > aligned properly. > > > > I'd really like to get feedback on those points before sending a v4. > > Mhh I don't really know what would be best for handling that. Either way, I > don't see how more pixel formats would really help solve the issue, so I'm still > in favor of one. > > Having a control that specifies an alignment constraint for the slice beginning > could work (as long as we make it optional, although userspace should be > required to abide by it when it is present). > > I guess it's not such a high price to pay for a unified codec interface :) > I don't think the two pixfmts are such a big deal, but at the same time, it would be much simpler for applications to forget about this entirely. Note that the goal of providing an unified interface is making applications simpler, not the other way around. If we kill the no-start-code pixfmt, but we expose the hw alignment requirement, doesn't sound like an improvement :-) Maybe we can see how it works with having just V4L2_PIX_FMT_H264_SLICE_ANNEXB, specified to have a 3-byte start code, as specified by the annex B. Maybe I can put some code together and test it on the Allwinner board I have here (thanks Maxime for that one!). Regards, Eze
On Thu, 25 Jul 2019 23:39:11 -0300 Ezequiel Garcia <ezequiel@collabora.com> wrote: > On Thu, 2019-07-25 at 21:36 +0200, Paul Kocialkowski wrote: > > Hi, > > > > On Thu 25 Jul 19, 08:42, Boris Brezillon wrote: > > > On Fri, 5 Jul 2019 19:16:18 +0200 > > > Boris Brezillon <boris.brezillon@collabora.com> wrote: > > > > > > > On Fri, 05 Jul 2019 13:40:03 -0300 > > > > Ezequiel Garcia <ezequiel@collabora.com> wrote: > > > > > > > > > Hi Boris, Paul, > > > > > > > > > > On Wed, 2019-07-03 at 14:28 +0200, Boris Brezillon wrote: > > > > > > Looks like some stateless decoders expect slices to be prefixed with > > > > > > ANNEX B start codes (they most likely do some kind of bitstream parsing > > > > > > and/or need that to delimit slices when doing per frame decoding). > > > > > > Since skipping those start codes for dummy stateless decoders (those > > > > > > expecting all params to be passed through controls) should be pretty > > > > > > easy, let's mandate that all slices be prepended with ANNEX B start > > > > > > codes. > > > > > > > > > > > > If we ever need to support AVC headers, we can add a new menu control > > > > > > to select the type of NAL header to use. > > > > > > > > > > > > Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com> > > > > > > Reviewed-by: Paul Kocialkowski <paul.kocialkowski@bootlin.com> > > > > > > --- > > > > > > Changes in v3: > > > > > > * Add Paul's R-b > > > > > > > > > > > > Changes in v2: > > > > > > * None > > > > > > --- > > > > > > Documentation/media/uapi/v4l/ext-ctrls-codec.rst | 1 + > > > > > > 1 file changed, 1 insertion(+) > > > > > > > > > > > > diff --git a/Documentation/media/uapi/v4l/ext-ctrls-codec.rst b/Documentation/media/uapi/v4l/ext-ctrls-codec.rst > > > > > > index 7a1947f5be96..3ae1367806cf 100644 > > > > > > --- a/Documentation/media/uapi/v4l/ext-ctrls-codec.rst > > > > > > +++ b/Documentation/media/uapi/v4l/ext-ctrls-codec.rst > > > > > > @@ -1726,6 +1726,7 @@ enum v4l2_mpeg_video_h264_hierarchical_coding_type - > > > > > > :ref:`h264`, section 7.4.3 "Slice Header Semantics". For further > > > > > > documentation, refer to the above specification, unless there is > > > > > > an explicit comment stating otherwise. > > > > > > + All slices should be prepended with an ANNEX B start code. > > > > > > > > > > > > > > > > Currently, the H264 slice V4L2_PIX_FMT_H264_SLICE_RAW, > > > > > is specified to _not_ contain the ANNEX B start code. > > > > > > > > Yep, we should provably rename the format. > > > > > > Paul, are you okay with this rename? > > > > Sorry for the very long response time here, I've had a hard time getting back > > into the context of all this. > > > > > s/V4L2_PIX_FMT_H264_SLICE_RAW/V4L2_PIX_FMT_H264_SLICE/ > > > > > > or > > > > > > s/V4L2_PIX_FMT_H264_SLICE_RAW/V4L2_PIX_FMT_H264_SLICE_ANNEXB/ > > > > I'd be in favor of the former (V4L2_PIX_FMT_H264_SLICE) and passing offsets > > to the beinning and after the start code. That would be more flexible, but one > > downside could be decoders that some decoders only take a specific start code. > > > > On the other hand I don't think that having one pixel format for each type of > > start code would be very reasonable, so I'd rather see an offset for now and > > perhaps a menu control later if needed to specify which types of start codes are > > supported. > > > > If I am reading the spec correctly, Annex B start code is specified to always > be the 3-byte start code: 0x000001. > > The first NAL of a frame may have an additional 0x00, which effectively means > the start code of the first NAL of a frame _can_ be 4-byte 0x00000001, > in addition to the 3-byte 0x000001. That's not my understanding of the Annex B section (quoting the spec for reference): " The byte stream format consists of a sequence of byte stream NAL unit syntax structures. Each byte stream NAL unit syntax structure contains one start code prefix followed by one nal_unit( NumBytesInNALunit ) syntax structure. It may (and under some circumstances, it shall) also contain an additional zero_byte syntax element. It may also contain one or more additional trailing_zero_8bits syntax elements. When it is the first byte stream NAL unit in the bitstream, it may also contain one or more additional leading_zero_8bits syntax elements. " To me, it looks like the start code can always be 0x000001 or 0x00000001. The first NAL can have extra leading 0x00 bytes, not the following ones, *but*, all NALs can have an arbitrary number of trailing 0x00 bytes. I guess stateless decoders unconditionally apply the "skip_leading_0_bytes logic" described in chapter B.1.1 of the spec to deal with these potential trailing 0x00 bytes. Stateful decoders (those parsing Annex B NAL headers) might skip this "skip leading 0x00 bytes" step for NAL > 0, but I suspect they just always skip leading 0x00 bytes because - it's simple enough - they anyway need the logic for the first NAL - that would require extra information about the NAL number which stateless decoder usually don't have > > In other words, there aren't multiple Annex B type of start codes, and only > two options for the format of the slice: NAL units with or without a start code. There's also the AVC NAL header, and I'm pretty sure you can't use the "add leading 0x00 bytes" trick to align things as you wish with that one. > > Therefore, I can't see any point in having this offset. Assuming the Annex B start codes can contain an arbitrary number of leading 0x00 bytes, we can align things according to the codec expectations. But, as I said below, that implies exposing those alignment constraints. > > > > I'd also to discuss some concerns Ezequiel and I have regarding this > > > change. Some (most?) codec have alignment constraints on the buffer > > > they pass to the HW. For HW that support Annex B parsing, that's no > > > problem because the start of the buffer will be aligned on a page (I'm > > > assuming page alignment should cover 99% of the alignment constraints). > > > But HW that need to skip the start code will have to pass a non-aligned > > > buffer (annex B start code is 3 byte long). > > > Paul looked at the Cedrus driver and thinks it can be handled correctly > > > thanks to the VE_H264_VLD_OFFSET field (which encodes an offset in bit), > > > but I fear this might be a problem on other HW. > > > > > > So, I'm asking again, are we sure we want to handle the raw (no start > > > code) and annex-b cases using the same pixel format? If we do, what's > > > the plan to address those potential alignment constraints? Should > > > we provide a way for userspace to define where the start-code ends so it > > > can align things properly (annex B can be extended with extra 00 > > > bytes at the beginning)? If we do that, that means userspace has to > > > know about those alignment constraints, or take something big enough. > > > Another option would be to use a bounce buffer when things are not > > > aligned properly. > > > > > > I'd really like to get feedback on those points before sending a v4. > > > > Mhh I don't really know what would be best for handling that. Either way, I > > don't see how more pixel formats would really help solve the issue, so I'm still > > in favor of one. > > > > Having a control that specifies an alignment constraint for the slice beginning > > could work (as long as we make it optional, although userspace should be > > required to abide by it when it is present). By making that, you put the burden on both sides of the stack: - the kernel side will have to deal with the unaligned cases (using a bounce buffer) - userspace apps/libs that want to avoid an extra copy will have to check this constraint and align things properly anyway Plus, the alignment thing won't work for AVC headers, so I think we should actually have a control to select the NAL header type rather than expose some alignment constraints (or have one pix fmt per NAL header type, but you don't seem to like the idea, so I'm trying to find something else :-)). And if we go for this option (control to select the NAL header type), I'm wondering why we're not making that NAL-header type selection mandatory from the start. We don't have to support all NAL headers at first (can be Annex B only), but, by making this control selection non-optional, we'll at least give a decent feedback to userspace (setting NAL header control fails because the selected NAL header type is not supported by the HW) instead of returning an error on the decoding operation (which, depending on how verbose the driver is, can be quite hard to figure out). > > > > I guess it's not such a high price to pay for a unified codec interface :) If by unified you mean exposing only one pixel format, then yes, it's unified. Doesn't make it easier to deal with from the userspace perspective IMHO. To sum-up, I'm fine keeping one pixel format, but I'm no longer sure not exposing the NAL header type is a good option. We've seen that providing alignment guarantees for HW expecting raw bitstream (without the start code) might become challenging at some point. So I'd opt for making this selection explicit. After all, it's just an extra control to set from userspace, and 2 extra switch-case: one to select the most appropriate NAL header type, and another one to fill the buffer with the appropriate header (if there's one).
On Fri, 26 Jul 2019 08:28:28 +0200 Boris Brezillon <boris.brezillon@collabora.com> wrote: > On Thu, 25 Jul 2019 23:39:11 -0300 > Ezequiel Garcia <ezequiel@collabora.com> wrote: > > > On Thu, 2019-07-25 at 21:36 +0200, Paul Kocialkowski wrote: > > > Hi, > > > > > > On Thu 25 Jul 19, 08:42, Boris Brezillon wrote: > > > > On Fri, 5 Jul 2019 19:16:18 +0200 > > > > Boris Brezillon <boris.brezillon@collabora.com> wrote: > > > > > > > > > On Fri, 05 Jul 2019 13:40:03 -0300 > > > > > Ezequiel Garcia <ezequiel@collabora.com> wrote: > > > > > > > > > > > Hi Boris, Paul, > > > > > > > > > > > > On Wed, 2019-07-03 at 14:28 +0200, Boris Brezillon wrote: > > > > > > > Looks like some stateless decoders expect slices to be prefixed with > > > > > > > ANNEX B start codes (they most likely do some kind of bitstream parsing > > > > > > > and/or need that to delimit slices when doing per frame decoding). > > > > > > > Since skipping those start codes for dummy stateless decoders (those > > > > > > > expecting all params to be passed through controls) should be pretty > > > > > > > easy, let's mandate that all slices be prepended with ANNEX B start > > > > > > > codes. > > > > > > > > > > > > > > If we ever need to support AVC headers, we can add a new menu control > > > > > > > to select the type of NAL header to use. > > > > > > > > > > > > > > Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com> > > > > > > > Reviewed-by: Paul Kocialkowski <paul.kocialkowski@bootlin.com> > > > > > > > --- > > > > > > > Changes in v3: > > > > > > > * Add Paul's R-b > > > > > > > > > > > > > > Changes in v2: > > > > > > > * None > > > > > > > --- > > > > > > > Documentation/media/uapi/v4l/ext-ctrls-codec.rst | 1 + > > > > > > > 1 file changed, 1 insertion(+) > > > > > > > > > > > > > > diff --git a/Documentation/media/uapi/v4l/ext-ctrls-codec.rst b/Documentation/media/uapi/v4l/ext-ctrls-codec.rst > > > > > > > index 7a1947f5be96..3ae1367806cf 100644 > > > > > > > --- a/Documentation/media/uapi/v4l/ext-ctrls-codec.rst > > > > > > > +++ b/Documentation/media/uapi/v4l/ext-ctrls-codec.rst > > > > > > > @@ -1726,6 +1726,7 @@ enum v4l2_mpeg_video_h264_hierarchical_coding_type - > > > > > > > :ref:`h264`, section 7.4.3 "Slice Header Semantics". For further > > > > > > > documentation, refer to the above specification, unless there is > > > > > > > an explicit comment stating otherwise. > > > > > > > + All slices should be prepended with an ANNEX B start code. > > > > > > > > > > > > > > > > > > > Currently, the H264 slice V4L2_PIX_FMT_H264_SLICE_RAW, > > > > > > is specified to _not_ contain the ANNEX B start code. > > > > > > > > > > Yep, we should provably rename the format. > > > > > > > > Paul, are you okay with this rename? > > > > > > Sorry for the very long response time here, I've had a hard time getting back > > > into the context of all this. > > > > > > > s/V4L2_PIX_FMT_H264_SLICE_RAW/V4L2_PIX_FMT_H264_SLICE/ > > > > > > > > or > > > > > > > > s/V4L2_PIX_FMT_H264_SLICE_RAW/V4L2_PIX_FMT_H264_SLICE_ANNEXB/ > > > > > > I'd be in favor of the former (V4L2_PIX_FMT_H264_SLICE) and passing offsets > > > to the beinning and after the start code. That would be more flexible, but one > > > downside could be decoders that some decoders only take a specific start code. > > > > > > On the other hand I don't think that having one pixel format for each type of > > > start code would be very reasonable, so I'd rather see an offset for now and > > > perhaps a menu control later if needed to specify which types of start codes are > > > supported. > > > > > > > If I am reading the spec correctly, Annex B start code is specified to always > > be the 3-byte start code: 0x000001. > > > > The first NAL of a frame may have an additional 0x00, which effectively means > > the start code of the first NAL of a frame _can_ be 4-byte 0x00000001, > > in addition to the 3-byte 0x000001. > > That's not my understanding of the Annex B section (quoting the spec > for reference): > > " > The byte stream format consists of a sequence of byte stream NAL unit > syntax structures. Each byte stream NAL unit syntax structure contains > one start code prefix followed by one nal_unit( NumBytesInNALunit ) > syntax structure. It may (and under some circumstances, it shall) also > contain an additional zero_byte syntax element. It may also contain one > or more additional trailing_zero_8bits syntax elements. When it is the > first byte stream NAL unit in the bitstream, it may also contain one or > more additional leading_zero_8bits syntax elements. > " > > To me, it looks like the start code can always be 0x000001 or > 0x00000001. The first NAL can have extra leading 0x00 bytes, not the > following ones, *but*, all NALs can have an arbitrary number of > trailing 0x00 bytes. I guess stateless decoders unconditionally apply > the "skip_leading_0_bytes logic" described in chapter B.1.1 of the spec > to deal with these potential trailing 0x00 bytes. > Stateful decoders (those parsing Annex B NAL headers) might skip this > "skip leading 0x00 bytes" step for NAL > 0, but I suspect they just > always skip leading 0x00 bytes because > - it's simple enough > - they anyway need the logic for the first NAL > - that would require extra information about the NAL number which > stateless decoder usually don't have > > > > > In other words, there aren't multiple Annex B type of start codes, and only > > two options for the format of the slice: NAL units with or without a start code. > > There's also the AVC NAL header, and I'm pretty sure you can't use the > "add leading 0x00 bytes" trick to align things as you wish with that > one. > > > > > Therefore, I can't see any point in having this offset. > > Assuming the Annex B start codes can contain an arbitrary number of > leading 0x00 bytes, we can align things according to the codec > expectations. But, as I said below, that implies exposing those > alignment constraints. > > > > > > > I'd also to discuss some concerns Ezequiel and I have regarding this > > > > change. Some (most?) codec have alignment constraints on the buffer > > > > they pass to the HW. For HW that support Annex B parsing, that's no > > > > problem because the start of the buffer will be aligned on a page (I'm > > > > assuming page alignment should cover 99% of the alignment constraints). > > > > But HW that need to skip the start code will have to pass a non-aligned > > > > buffer (annex B start code is 3 byte long). > > > > Paul looked at the Cedrus driver and thinks it can be handled correctly > > > > thanks to the VE_H264_VLD_OFFSET field (which encodes an offset in bit), > > > > but I fear this might be a problem on other HW. > > > > > > > > So, I'm asking again, are we sure we want to handle the raw (no start > > > > code) and annex-b cases using the same pixel format? If we do, what's > > > > the plan to address those potential alignment constraints? Should > > > > we provide a way for userspace to define where the start-code ends so it > > > > can align things properly (annex B can be extended with extra 00 > > > > bytes at the beginning)? If we do that, that means userspace has to > > > > know about those alignment constraints, or take something big enough. > > > > Another option would be to use a bounce buffer when things are not > > > > aligned properly. > > > > > > > > I'd really like to get feedback on those points before sending a v4. > > > > > > Mhh I don't really know what would be best for handling that. Either way, I > > > don't see how more pixel formats would really help solve the issue, so I'm still > > > in favor of one. > > > > > > Having a control that specifies an alignment constraint for the slice beginning > > > could work (as long as we make it optional, although userspace should be > > > required to abide by it when it is present). > > By making that, you put the burden on both sides of the stack: > > - the kernel side will have to deal with the unaligned cases (using a > bounce buffer) > - userspace apps/libs that want to avoid an extra copy will have to > check this constraint and align things properly anyway I'd like to revise my statement. Ideally, the drivers should take care of such mis-alignments or unsupported NAL header types by copying/re-formatting the OUTPUT buffer so that existing apps work out of the box when the driver is added, which means we'll have to take care of that kernel-side anyway. Handling selection of the best encoding-mode/NAL-header-type in userspace is useful if one wants to improve perfs. > > Plus, the alignment thing won't work for AVC headers, so I think we > should actually have a control to select the NAL header type rather > than expose some alignment constraints (or have one pix fmt per NAL > header type, but you don't seem to like the idea, so I'm trying > to find something else :-)). > > And if we go for this option (control to select the NAL header type), > I'm wondering why we're not making that NAL-header type selection > mandatory from the start. We don't have to support all NAL headers at > first (can be Annex B only), but, by making this control selection > non-optional, we'll at least give a decent feedback to userspace > (setting NAL header control fails because the selected NAL header type > is not supported by the HW) instead of returning an error on the > decoding operation (which, depending on how verbose the driver is, can > be quite hard to figure out). > > > > > > > I guess it's not such a high price to pay for a unified codec interface :) > > If by unified you mean exposing only one pixel format, then yes, it's > unified. Doesn't make it easier to deal with from the userspace > perspective IMHO. > > To sum-up, I'm fine keeping one pixel format, but I'm no longer sure > not exposing the NAL header type is a good option. We've seen that > providing alignment guarantees for HW expecting raw bitstream (without > the start code) might become challenging at some point. So I'd opt for > making this selection explicit. After all, it's just an extra control > to set from userspace, and 2 extra switch-case: one to select the most > appropriate NAL header type, and another one to fill the buffer with > the appropriate header (if there's one).
On 7/26/19 9:30 AM, Boris Brezillon wrote: > On Fri, 26 Jul 2019 08:28:28 +0200 > Boris Brezillon <boris.brezillon@collabora.com> wrote: > >> On Thu, 25 Jul 2019 23:39:11 -0300 >> Ezequiel Garcia <ezequiel@collabora.com> wrote: >> >>> On Thu, 2019-07-25 at 21:36 +0200, Paul Kocialkowski wrote: >>>> Hi, >>>> >>>> On Thu 25 Jul 19, 08:42, Boris Brezillon wrote: >>>>> On Fri, 5 Jul 2019 19:16:18 +0200 >>>>> Boris Brezillon <boris.brezillon@collabora.com> wrote: >>>>> >>>>>> On Fri, 05 Jul 2019 13:40:03 -0300 >>>>>> Ezequiel Garcia <ezequiel@collabora.com> wrote: >>>>>> >>>>>>> Hi Boris, Paul, >>>>>>> >>>>>>> On Wed, 2019-07-03 at 14:28 +0200, Boris Brezillon wrote: >>>>>>>> Looks like some stateless decoders expect slices to be prefixed with >>>>>>>> ANNEX B start codes (they most likely do some kind of bitstream parsing >>>>>>>> and/or need that to delimit slices when doing per frame decoding). >>>>>>>> Since skipping those start codes for dummy stateless decoders (those >>>>>>>> expecting all params to be passed through controls) should be pretty >>>>>>>> easy, let's mandate that all slices be prepended with ANNEX B start >>>>>>>> codes. >>>>>>>> >>>>>>>> If we ever need to support AVC headers, we can add a new menu control >>>>>>>> to select the type of NAL header to use. >>>>>>>> >>>>>>>> Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com> >>>>>>>> Reviewed-by: Paul Kocialkowski <paul.kocialkowski@bootlin.com> >>>>>>>> --- >>>>>>>> Changes in v3: >>>>>>>> * Add Paul's R-b >>>>>>>> >>>>>>>> Changes in v2: >>>>>>>> * None >>>>>>>> --- >>>>>>>> Documentation/media/uapi/v4l/ext-ctrls-codec.rst | 1 + >>>>>>>> 1 file changed, 1 insertion(+) >>>>>>>> >>>>>>>> diff --git a/Documentation/media/uapi/v4l/ext-ctrls-codec.rst b/Documentation/media/uapi/v4l/ext-ctrls-codec.rst >>>>>>>> index 7a1947f5be96..3ae1367806cf 100644 >>>>>>>> --- a/Documentation/media/uapi/v4l/ext-ctrls-codec.rst >>>>>>>> +++ b/Documentation/media/uapi/v4l/ext-ctrls-codec.rst >>>>>>>> @@ -1726,6 +1726,7 @@ enum v4l2_mpeg_video_h264_hierarchical_coding_type - >>>>>>>> :ref:`h264`, section 7.4.3 "Slice Header Semantics". For further >>>>>>>> documentation, refer to the above specification, unless there is >>>>>>>> an explicit comment stating otherwise. >>>>>>>> + All slices should be prepended with an ANNEX B start code. >>>>>>>> >>>>>>> >>>>>>> Currently, the H264 slice V4L2_PIX_FMT_H264_SLICE_RAW, >>>>>>> is specified to _not_ contain the ANNEX B start code. >>>>>> >>>>>> Yep, we should provably rename the format. >>>>> >>>>> Paul, are you okay with this rename? >>>> >>>> Sorry for the very long response time here, I've had a hard time getting back >>>> into the context of all this. >>>> >>>>> s/V4L2_PIX_FMT_H264_SLICE_RAW/V4L2_PIX_FMT_H264_SLICE/ >>>>> >>>>> or >>>>> >>>>> s/V4L2_PIX_FMT_H264_SLICE_RAW/V4L2_PIX_FMT_H264_SLICE_ANNEXB/ >>>> >>>> I'd be in favor of the former (V4L2_PIX_FMT_H264_SLICE) and passing offsets >>>> to the beinning and after the start code. That would be more flexible, but one >>>> downside could be decoders that some decoders only take a specific start code. >>>> >>>> On the other hand I don't think that having one pixel format for each type of >>>> start code would be very reasonable, so I'd rather see an offset for now and >>>> perhaps a menu control later if needed to specify which types of start codes are >>>> supported. >>>> >>> >>> If I am reading the spec correctly, Annex B start code is specified to always >>> be the 3-byte start code: 0x000001. >>> >>> The first NAL of a frame may have an additional 0x00, which effectively means >>> the start code of the first NAL of a frame _can_ be 4-byte 0x00000001, >>> in addition to the 3-byte 0x000001. >> >> That's not my understanding of the Annex B section (quoting the spec >> for reference): >> >> " >> The byte stream format consists of a sequence of byte stream NAL unit >> syntax structures. Each byte stream NAL unit syntax structure contains >> one start code prefix followed by one nal_unit( NumBytesInNALunit ) >> syntax structure. It may (and under some circumstances, it shall) also >> contain an additional zero_byte syntax element. It may also contain one >> or more additional trailing_zero_8bits syntax elements. When it is the >> first byte stream NAL unit in the bitstream, it may also contain one or >> more additional leading_zero_8bits syntax elements. >> " >> >> To me, it looks like the start code can always be 0x000001 or >> 0x00000001. The first NAL can have extra leading 0x00 bytes, not the >> following ones, *but*, all NALs can have an arbitrary number of >> trailing 0x00 bytes. I guess stateless decoders unconditionally apply >> the "skip_leading_0_bytes logic" described in chapter B.1.1 of the spec >> to deal with these potential trailing 0x00 bytes. >> Stateful decoders (those parsing Annex B NAL headers) might skip this >> "skip leading 0x00 bytes" step for NAL > 0, but I suspect they just >> always skip leading 0x00 bytes because >> - it's simple enough >> - they anyway need the logic for the first NAL >> - that would require extra information about the NAL number which >> stateless decoder usually don't have >> >>> >>> In other words, there aren't multiple Annex B type of start codes, and only >>> two options for the format of the slice: NAL units with or without a start code. >> >> There's also the AVC NAL header, and I'm pretty sure you can't use the >> "add leading 0x00 bytes" trick to align things as you wish with that >> one. >> >>> >>> Therefore, I can't see any point in having this offset. >> >> Assuming the Annex B start codes can contain an arbitrary number of >> leading 0x00 bytes, we can align things according to the codec >> expectations. But, as I said below, that implies exposing those >> alignment constraints. >> >>> >>>>> I'd also to discuss some concerns Ezequiel and I have regarding this >>>>> change. Some (most?) codec have alignment constraints on the buffer >>>>> they pass to the HW. For HW that support Annex B parsing, that's no >>>>> problem because the start of the buffer will be aligned on a page (I'm >>>>> assuming page alignment should cover 99% of the alignment constraints). >>>>> But HW that need to skip the start code will have to pass a non-aligned >>>>> buffer (annex B start code is 3 byte long). >>>>> Paul looked at the Cedrus driver and thinks it can be handled correctly >>>>> thanks to the VE_H264_VLD_OFFSET field (which encodes an offset in bit), >>>>> but I fear this might be a problem on other HW. >>>>> >>>>> So, I'm asking again, are we sure we want to handle the raw (no start >>>>> code) and annex-b cases using the same pixel format? If we do, what's >>>>> the plan to address those potential alignment constraints? Should >>>>> we provide a way for userspace to define where the start-code ends so it >>>>> can align things properly (annex B can be extended with extra 00 >>>>> bytes at the beginning)? If we do that, that means userspace has to >>>>> know about those alignment constraints, or take something big enough. >>>>> Another option would be to use a bounce buffer when things are not >>>>> aligned properly. >>>>> >>>>> I'd really like to get feedback on those points before sending a v4. >>>> >>>> Mhh I don't really know what would be best for handling that. Either way, I >>>> don't see how more pixel formats would really help solve the issue, so I'm still >>>> in favor of one. >>>> >>>> Having a control that specifies an alignment constraint for the slice beginning >>>> could work (as long as we make it optional, although userspace should be >>>> required to abide by it when it is present). >> >> By making that, you put the burden on both sides of the stack: >> >> - the kernel side will have to deal with the unaligned cases (using a >> bounce buffer) >> - userspace apps/libs that want to avoid an extra copy will have to >> check this constraint and align things properly anyway > > I'd like to revise my statement. Ideally, the drivers should take care > of such mis-alignments or unsupported NAL header types by > copying/re-formatting the OUTPUT buffer so that existing apps work > out of the box when the driver is added, which means we'll have to take > care of that kernel-side anyway. Handling selection of the best > encoding-mode/NAL-header-type in userspace is useful if one wants to > improve perfs. Just my 5 cents: You very much want to avoid the situation where drivers have to copy or reformat the OUTPUT buffer. That's asking for problems, not to mention that it is no longer zero-copy. > >> >> Plus, the alignment thing won't work for AVC headers, so I think we >> should actually have a control to select the NAL header type rather >> than expose some alignment constraints (or have one pix fmt per NAL >> header type, but you don't seem to like the idea, so I'm trying >> to find something else :-)). >> >> And if we go for this option (control to select the NAL header type), >> I'm wondering why we're not making that NAL-header type selection >> mandatory from the start. We don't have to support all NAL headers at >> first (can be Annex B only), but, by making this control selection >> non-optional, we'll at least give a decent feedback to userspace >> (setting NAL header control fails because the selected NAL header type >> is not supported by the HW) instead of returning an error on the >> decoding operation (which, depending on how verbose the driver is, can >> be quite hard to figure out). This sounds reasonable. This control should be mandatory, and it should be referred to from the H264/5 pixelformat definitions (see also https://patchwork.linuxtv.org/patch/57709/). Regards, Hans >> >>>> >>>> I guess it's not such a high price to pay for a unified codec interface :) >> >> If by unified you mean exposing only one pixel format, then yes, it's >> unified. Doesn't make it easier to deal with from the userspace >> perspective IMHO. >> >> To sum-up, I'm fine keeping one pixel format, but I'm no longer sure >> not exposing the NAL header type is a good option. We've seen that >> providing alignment guarantees for HW expecting raw bitstream (without >> the start code) might become challenging at some point. So I'd opt for >> making this selection explicit. After all, it's just an extra control >> to set from userspace, and 2 extra switch-case: one to select the most >> appropriate NAL header type, and another one to fill the buffer with >> the appropriate header (if there's one). >
Hi, On Fri 26 Jul 19, 10:53, Hans Verkuil wrote: > On 7/26/19 9:30 AM, Boris Brezillon wrote: > > On Fri, 26 Jul 2019 08:28:28 +0200 > > Boris Brezillon <boris.brezillon@collabora.com> wrote: > > > >> On Thu, 25 Jul 2019 23:39:11 -0300 > >> Ezequiel Garcia <ezequiel@collabora.com> wrote: > >> > >>> On Thu, 2019-07-25 at 21:36 +0200, Paul Kocialkowski wrote: > >>>> Having a control that specifies an alignment constraint for the slice beginning > >>>> could work (as long as we make it optional, although userspace should be > >>>> required to abide by it when it is present). > >> > >> By making that, you put the burden on both sides of the stack: > >> > >> - the kernel side will have to deal with the unaligned cases (using a > >> bounce buffer) > >> - userspace apps/libs that want to avoid an extra copy will have to > >> check this constraint and align things properly anyway > > > > I'd like to revise my statement. Ideally, the drivers should take care > > of such mis-alignments or unsupported NAL header types by > > copying/re-formatting the OUTPUT buffer so that existing apps work > > out of the box when the driver is added, which means we'll have to take > > care of that kernel-side anyway. Handling selection of the best > > encoding-mode/NAL-header-type in userspace is useful if one wants to > > improve perfs. > > Just my 5 cents: > > You very much want to avoid the situation where drivers have to copy or > reformat the OUTPUT buffer. That's asking for problems, not to mention > that it is no longer zero-copy. I definitely agree on that, since such constraints are likely to exist, we are certainly better off exposing them to userspace. I understand that it does add some complexity and asks for userspace code to be more complex, but let's be realistic: this is a complex topic with lots of hardware-specific details getting in the way. I don't think we can act as if things were simpler. My feeling is that we should keep trying to find "as elegant as possible" ways to expose constraints instead of putting strict and easy definitions for userspace that end up making drivers perform sub-optimally. Since the initial cedrus proposal, we have covered more ground to allow the API to fit the rockchip case, without conflicting with cedrus. We're now facing new constraints and issue and I really think we should keep trying to integrate them in the unified API. > >> Plus, the alignment thing won't work for AVC headers, so I think we > >> should actually have a control to select the NAL header type rather > >> than expose some alignment constraints (or have one pix fmt per NAL > >> header type, but you don't seem to like the idea, so I'm trying > >> to find something else :-)). > >> > >> And if we go for this option (control to select the NAL header type), > >> I'm wondering why we're not making that NAL-header type selection > >> mandatory from the start. We don't have to support all NAL headers at > >> first (can be Annex B only), but, by making this control selection > >> non-optional, we'll at least give a decent feedback to userspace > >> (setting NAL header control fails because the selected NAL header type > >> is not supported by the HW) instead of returning an error on the > >> decoding operation (which, depending on how verbose the driver is, can > >> be quite hard to figure out). > > This sounds reasonable. > > This control should be mandatory, and it should be referred to from > the H264/5 pixelformat definitions (see also https://patchwork.linuxtv.org/patch/57709/). I am growing confused about one thing: are we talking about selecting the type of *start code* (which can have a variable number of heading and trailing zeros depending on the situation) or about the *NAL header type*, which follows the start code? I like the idea of drivers providing what types of start codes they can support, but I don't really see how it helps regarding the alignment constraints and how it relates to the zero-padding. Cheers, Paul > Regards, > > Hans > > >> > >>>> > >>>> I guess it's not such a high price to pay for a unified codec interface :) > >> > >> If by unified you mean exposing only one pixel format, then yes, it's > >> unified. Doesn't make it easier to deal with from the userspace > >> perspective IMHO. > >> > >> To sum-up, I'm fine keeping one pixel format, but I'm no longer sure > >> not exposing the NAL header type is a good option. We've seen that > >> providing alignment guarantees for HW expecting raw bitstream (without > >> the start code) might become challenging at some point. So I'd opt for > >> making this selection explicit. After all, it's just an extra control > >> to set from userspace, and 2 extra switch-case: one to select the most > >> appropriate NAL header type, and another one to fill the buffer with > >> the appropriate header (if there's one). > > >
On Sat, 27 Jul 2019 11:27:43 +0200 Paul Kocialkowski <paul.kocialkowski@bootlin.com> wrote: > Hi, > > On Fri 26 Jul 19, 10:53, Hans Verkuil wrote: > > On 7/26/19 9:30 AM, Boris Brezillon wrote: > > > On Fri, 26 Jul 2019 08:28:28 +0200 > > > Boris Brezillon <boris.brezillon@collabora.com> wrote: > > > > > >> On Thu, 25 Jul 2019 23:39:11 -0300 > > >> Ezequiel Garcia <ezequiel@collabora.com> wrote: > > >> > > >>> On Thu, 2019-07-25 at 21:36 +0200, Paul Kocialkowski wrote: > > >>>> Having a control that specifies an alignment constraint for the slice beginning > > >>>> could work (as long as we make it optional, although userspace should be > > >>>> required to abide by it when it is present). > > >> > > >> By making that, you put the burden on both sides of the stack: > > >> > > >> - the kernel side will have to deal with the unaligned cases (using a > > >> bounce buffer) > > >> - userspace apps/libs that want to avoid an extra copy will have to > > >> check this constraint and align things properly anyway > > > > > > I'd like to revise my statement. Ideally, the drivers should take care > > > of such mis-alignments or unsupported NAL header types by > > > copying/re-formatting the OUTPUT buffer so that existing apps work > > > out of the box when the driver is added, which means we'll have to take > > > care of that kernel-side anyway. Handling selection of the best > > > encoding-mode/NAL-header-type in userspace is useful if one wants to > > > improve perfs. > > > > Just my 5 cents: > > > > You very much want to avoid the situation where drivers have to copy or > > reformat the OUTPUT buffer. That's asking for problems, not to mention > > that it is no longer zero-copy. > > I definitely agree on that, since such constraints are likely to exist, we are > certainly better off exposing them to userspace. > > I understand that it does add some complexity and asks for userspace code to be > more complex, but let's be realistic: this is a complex topic with lots of > hardware-specific details getting in the way. I don't think we can act as if > things were simpler. > > My feeling is that we should keep trying to find "as elegant as possible" ways > to expose constraints instead of putting strict and easy definitions for > userspace that end up making drivers perform sub-optimally. > > Since the initial cedrus proposal, we have covered more ground to allow the > API to fit the rockchip case, without conflicting with cedrus. We're now facing > new constraints and issue and I really think we should keep trying to integrate > them in the unified API. > > > >> Plus, the alignment thing won't work for AVC headers, so I think we > > >> should actually have a control to select the NAL header type rather > > >> than expose some alignment constraints (or have one pix fmt per NAL > > >> header type, but you don't seem to like the idea, so I'm trying > > >> to find something else :-)). > > >> > > >> And if we go for this option (control to select the NAL header type), > > >> I'm wondering why we're not making that NAL-header type selection > > >> mandatory from the start. We don't have to support all NAL headers at > > >> first (can be Annex B only), but, by making this control selection > > >> non-optional, we'll at least give a decent feedback to userspace > > >> (setting NAL header control fails because the selected NAL header type > > >> is not supported by the HW) instead of returning an error on the > > >> decoding operation (which, depending on how verbose the driver is, can > > >> be quite hard to figure out). > > > > This sounds reasonable. > > > > This control should be mandatory, and it should be referred to from > > the H264/5 pixelformat definitions (see also https://patchwork.linuxtv.org/patch/57709/). > > I am growing confused about one thing: are we talking about selecting > the type of *start code* (which can have a variable number of heading and > trailing zeros depending on the situation) or about the *NAL header type*, which > follows the start code? We're talking about start codes, but Nicolas called them nal_header in this email [1], so I thought it was the appropriate naming. > > I like the idea of drivers providing what types of start codes they can support, > but I don't really see how it helps regarding the alignment constraints and how > it relates to the zero-padding. It does help with alignment constraints because buffers allocated by the driver are usually matching the HW alignment constraints and by passing the type of NAL header (or start code if you prefer) we now guarantee that the raw bitstream (when in NO_NAL_HEADER is selected) is placed at the beginning of the buffer. Doesn't solve the case of imported buffers, but that problem is orthogonal I think (it's a problem we already have right now, and would indeed require some way to expose HW alignment constraints). Not sure what the zero padding issue is. If you know the type of start code, you don't have add extra 0 at the beginning to meet the alignment constraints. If you're talking about padding bytes added at the end of the bitstream (there's such a constraint on the rkvdec block), I think that's something driver specific and should be handled by the driver. [1]https://www.mail-archive.com/linux-media@vger.kernel.org/msg146836.html
On Fri, 2019-07-26 at 09:30 +0200, Boris Brezillon wrote: > On Fri, 26 Jul 2019 08:28:28 +0200 > Boris Brezillon <boris.brezillon@collabora.com> wrote: > > > On Thu, 25 Jul 2019 23:39:11 -0300 > > Ezequiel Garcia <ezequiel@collabora.com> wrote: > > > > > On Thu, 2019-07-25 at 21:36 +0200, Paul Kocialkowski wrote: > > > > Hi, > > > > > > > > On Thu 25 Jul 19, 08:42, Boris Brezillon wrote: > > > > > On Fri, 5 Jul 2019 19:16:18 +0200 > > > > > Boris Brezillon <boris.brezillon@collabora.com> wrote: > > > > > > > > > > > On Fri, 05 Jul 2019 13:40:03 -0300 > > > > > > Ezequiel Garcia <ezequiel@collabora.com> wrote: > > > > > > > > > > > > > Hi Boris, Paul, > > > > > > > > > > > > > > On Wed, 2019-07-03 at 14:28 +0200, Boris Brezillon wrote: > > > > > > > > Looks like some stateless decoders expect slices to be prefixed with > > > > > > > > ANNEX B start codes (they most likely do some kind of bitstream parsing > > > > > > > > and/or need that to delimit slices when doing per frame decoding). > > > > > > > > Since skipping those start codes for dummy stateless decoders (those > > > > > > > > expecting all params to be passed through controls) should be pretty > > > > > > > > easy, let's mandate that all slices be prepended with ANNEX B start > > > > > > > > codes. > > > > > > > > > > > > > > > > If we ever need to support AVC headers, we can add a new menu control > > > > > > > > to select the type of NAL header to use. > > > > > > > > > > > > > > > > Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com> > > > > > > > > Reviewed-by: Paul Kocialkowski <paul.kocialkowski@bootlin.com> > > > > > > > > --- > > > > > > > > Changes in v3: > > > > > > > > * Add Paul's R-b > > > > > > > > > > > > > > > > Changes in v2: > > > > > > > > * None > > > > > > > > --- > > > > > > > > Documentation/media/uapi/v4l/ext-ctrls-codec.rst | 1 + > > > > > > > > 1 file changed, 1 insertion(+) > > > > > > > > > > > > > > > > diff --git a/Documentation/media/uapi/v4l/ext-ctrls-codec.rst b/Documentation/media/uapi/v4l/ext-ctrls-codec.rst > > > > > > > > index 7a1947f5be96..3ae1367806cf 100644 > > > > > > > > --- a/Documentation/media/uapi/v4l/ext-ctrls-codec.rst > > > > > > > > +++ b/Documentation/media/uapi/v4l/ext-ctrls-codec.rst > > > > > > > > @@ -1726,6 +1726,7 @@ enum v4l2_mpeg_video_h264_hierarchical_coding_type - > > > > > > > > :ref:`h264`, section 7.4.3 "Slice Header Semantics". For further > > > > > > > > documentation, refer to the above specification, unless there is > > > > > > > > an explicit comment stating otherwise. > > > > > > > > + All slices should be prepended with an ANNEX B start code. > > > > > > > > > > > > > > > > > > > > > > Currently, the H264 slice V4L2_PIX_FMT_H264_SLICE_RAW, > > > > > > > is specified to _not_ contain the ANNEX B start code. > > > > > > > > > > > > Yep, we should provably rename the format. > > > > > > > > > > Paul, are you okay with this rename? > > > > > > > > Sorry for the very long response time here, I've had a hard time getting back > > > > into the context of all this. > > > > > > > > > s/V4L2_PIX_FMT_H264_SLICE_RAW/V4L2_PIX_FMT_H264_SLICE/ > > > > > > > > > > or > > > > > > > > > > s/V4L2_PIX_FMT_H264_SLICE_RAW/V4L2_PIX_FMT_H264_SLICE_ANNEXB/ > > > > > > > > I'd be in favor of the former (V4L2_PIX_FMT_H264_SLICE) and passing offsets > > > > to the beinning and after the start code. That would be more flexible, but one > > > > downside could be decoders that some decoders only take a specific start code. > > > > > > > > On the other hand I don't think that having one pixel format for each type of > > > > start code would be very reasonable, so I'd rather see an offset for now and > > > > perhaps a menu control later if needed to specify which types of start codes are > > > > supported. > > > > > > > > > > If I am reading the spec correctly, Annex B start code is specified to always > > > be the 3-byte start code: 0x000001. > > > > > > The first NAL of a frame may have an additional 0x00, which effectively means > > > the start code of the first NAL of a frame _can_ be 4-byte 0x00000001, > > > in addition to the 3-byte 0x000001. > > > > That's not my understanding of the Annex B section (quoting the spec > > for reference): > > > > " > > The byte stream format consists of a sequence of byte stream NAL unit > > syntax structures. Each byte stream NAL unit syntax structure contains > > one start code prefix followed by one nal_unit( NumBytesInNALunit ) > > syntax structure. It may (and under some circumstances, it shall) also > > contain an additional zero_byte syntax element. It may also contain one > > or more additional trailing_zero_8bits syntax elements. When it is the > > first byte stream NAL unit in the bitstream, it may also contain one or > > more additional leading_zero_8bits syntax elements. > > " > > Right. I wonder what the "may or shall" part is really specifying. However, note that the table B.1.1 and its comments B.1.2 is might be interpreted differently. To me, there's a difference between the different syntax elements (zero-bytes elements vs. the start code prefix element). This is what it says about the zero_byte syntax element: """ zero_byte is a single byte equal to 0x00. When any of the following conditions are fulfilled, the zero_byte syntax element shall be present. – the nal_unit_type within the nal_unit( ) is equal to 7 (sequence parameter set) or 8 (picture parameter set) – the byte stream NAL unit syntax structure contains the first NAL unit of an access unit in decoding order, as specified by subclause 7.4.1.2.3. """ We are not dealing with SPS or PPS here, but we are discussing multislice content, so IIUC this syntax element would be part of our bitstream pixfmt. And this is what it says about the start code prefix: """ start_code_prefix_one_3bytes is a fixed-value sequence of 3 bytes equal to 0x000001. This syntax element is called a start code prefix. """ These elements are used in such a way that it might seem you have two start codes options 3-byte or 4-byte, though. > > To me, it looks like the start code can always be 0x000001 or > > 0x00000001. The first NAL can have extra leading 0x00 bytes, not the > > following ones, *but*, all NALs can have an arbitrary number of > > trailing 0x00 bytes. I guess stateless decoders unconditionally apply > > the "skip_leading_0_bytes logic" described in chapter B.1.1 of the spec > > to deal with these potential trailing 0x00 bytes. > > Stateful decoders (those parsing Annex B NAL headers) might skip this > > "skip leading 0x00 bytes" step for NAL > 0, but I suspect they just > > always skip leading 0x00 bytes because > > - it's simple enough > > - they anyway need the logic for the first NAL > > - that would require extra information about the NAL number which > > stateless decoder usually don't have > > > > > In other words, there aren't multiple Annex B type of start codes, and only > > > two options for the format of the slice: NAL units with or without a start code. > > > > There's also the AVC NAL header, and I'm pretty sure you can't use the > > "add leading 0x00 bytes" trick to align things as you wish with that > > one. > > > > > Therefore, I can't see any point in having this offset. > > > > Assuming the Annex B start codes can contain an arbitrary number of > > leading 0x00 bytes, we can align things according to the codec > > expectations. But, as I said below, that implies exposing those > > alignment constraints. > > > > > > > > > > I'd also to discuss some concerns Ezequiel and I have regarding this > > > > > change. Some (most?) codec have alignment constraints on the buffer > > > > > they pass to the HW. For HW that support Annex B parsing, that's no > > > > > problem because the start of the buffer will be aligned on a page (I'm > > > > > assuming page alignment should cover 99% of the alignment constraints). > > > > > But HW that need to skip the start code will have to pass a non-aligned > > > > > buffer (annex B start code is 3 byte long). > > > > > Paul looked at the Cedrus driver and thinks it can be handled correctly > > > > > thanks to the VE_H264_VLD_OFFSET field (which encodes an offset in bit), > > > > > but I fear this might be a problem on other HW. > > > > > > > > > > So, I'm asking again, are we sure we want to handle the raw (no start > > > > > code) and annex-b cases using the same pixel format? If we do, what's > > > > > the plan to address those potential alignment constraints? Should > > > > > we provide a way for userspace to define where the start-code ends so it > > > > > can align things properly (annex B can be extended with extra 00 > > > > > bytes at the beginning)? If we do that, that means userspace has to > > > > > know about those alignment constraints, or take something big enough. > > > > > Another option would be to use a bounce buffer when things are not > > > > > aligned properly. > > > > > > > > > > I'd really like to get feedback on those points before sending a v4. > > > > > > > > Mhh I don't really know what would be best for handling that. Either way, I > > > > don't see how more pixel formats would really help solve the issue, so I'm still > > > > in favor of one. > > > > > > > > Having a control that specifies an alignment constraint for the slice beginning > > > > could work (as long as we make it optional, although userspace should be > > > > required to abide by it when it is present). > > > > By making that, you put the burden on both sides of the stack: > > > > - the kernel side will have to deal with the unaligned cases (using a > > bounce buffer) > > - userspace apps/libs that want to avoid an extra copy will have to > > check this constraint and align things properly anyway > > I'd like to revise my statement. Ideally, the drivers should take care > of such mis-alignments or unsupported NAL header types by > copying/re-formatting the OUTPUT buffer so that existing apps work > out of the box when the driver is added, which means we'll have to take > care of that kernel-side anyway. Handling selection of the best > encoding-mode/NAL-header-type in userspace is useful if one wants to > improve perfs. > > > Plus, the alignment thing won't work for AVC headers, so I think we > > should actually have a control to select the NAL header type rather > > than expose some alignment constraints (or have one pix fmt per NAL > > header type, but you don't seem to like the idea, so I'm trying > > to find something else :-)). > > > > And if we go for this option (control to select the NAL header type), > > I'm wondering why we're not making that NAL-header type selection > > mandatory from the start. We don't have to support all NAL headers at > > first (can be Annex B only), but, by making this control selection > > non-optional, we'll at least give a decent feedback to userspace > > (setting NAL header control fails because the selected NAL header type > > is not supported by the HW) instead of returning an error on the > > decoding operation (which, depending on how verbose the driver is, can > > be quite hard to figure out). > > > > > > I guess it's not such a high price to pay for a unified codec interface :) > > > > If by unified you mean exposing only one pixel format, then yes, it's > > unified. Doesn't make it easier to deal with from the userspace > > perspective IMHO. > > > > To sum-up, I'm fine keeping one pixel format, but I'm no longer sure > > not exposing the NAL header type is a good option. We've seen that > > providing alignment guarantees for HW expecting raw bitstream (without > > the start code) might become challenging at some point. So I'd opt for > > making this selection explicit. After all, it's just an extra control > > to set from userspace, and 2 extra switch-case: one to select the most > > appropriate NAL header type, and another one to fill the buffer with > > the appropriate header (if there's one). I must admit I'm confused by what you mean about NAL header type, I thought we weren't trying to support AVC, and only the Annex B bitstream format. That said, I don't see the interface getting any simpler with a unified pixfmt, given the menu control to expose frame or slice decoding. Proper applications need to support both modes anyway, where in the latter it'll have to parse the bitstream to extract the slices. What's so bad about just supporting an extra pixel format, where the slices are stripped of its start codes and zero-byte elements? And how come this is any more complex than exposing alignment constraints, so that applications can artificially add leading_zero_8bits or trailing_zero_8bits elements to comply with a driver dma alignment. To be honest, the more I think about it, the more this option sounds just horrible :-) To me, it's far simpler to just expose what the devices support. If a driver will expect to parse the bitstream, and accepts multi-slice content, we expose that as a bitstream pixfmt. And if another driver expects no-start-code slices, then we have another pixfmt. Regards, Eze
On Sat, 27 Jul 2019 09:52:24 -0300 Ezequiel Garcia <ezequiel@collabora.com> wrote: > > > > > > That's not my understanding of the Annex B section (quoting the spec > > > for reference): > > > > > > " > > > The byte stream format consists of a sequence of byte stream NAL unit > > > syntax structures. Each byte stream NAL unit syntax structure contains > > > one start code prefix followed by one nal_unit( NumBytesInNALunit ) > > > syntax structure. It may (and under some circumstances, it shall) also > > > contain an additional zero_byte syntax element. It may also contain one > > > or more additional trailing_zero_8bits syntax elements. When it is the > > > first byte stream NAL unit in the bitstream, it may also contain one or > > > more additional leading_zero_8bits syntax elements. > > > " > > > > > Right. I wonder what the "may or shall" part is really specifying. > > However, note that the table B.1.1 and its comments B.1.2 is might > be interpreted differently. To me, there's a difference between the different > syntax elements (zero-bytes elements vs. the start code prefix element). > > This is what it says about the zero_byte syntax element: > > """ > zero_byte is a single byte equal to 0x00. > When any of the following conditions are fulfilled, the zero_byte syntax element shall be present. > – the nal_unit_type within the nal_unit( ) is equal to 7 (sequence parameter set) or 8 (picture parameter set) > – the byte stream NAL unit syntax structure contains the first NAL unit of an access unit in decoding order, as > specified by subclause 7.4.1.2.3. > """ > > We are not dealing with SPS or PPS here, but we are discussing multislice content, > so IIUC this syntax element would be part of our bitstream pixfmt. > > And this is what it says about the start code prefix: > > """ > start_code_prefix_one_3bytes is a fixed-value sequence of 3 bytes equal to 0x000001. This syntax element is called a > start code prefix. > """ > > These elements are used in such a way that it might seem > you have two start codes options 3-byte or 4-byte, though. This is correct, but I was actually referring to: " It may also contain one or more additional trailing_zero_8bits syntax elements. When it is the first byte stream NAL unit in the bitstream, it may also contain one or more additional leading_zero_8bits syntax elements. " which would allow userspace to put additional zeros at the beginning in order to fulfill the HW alignment constraints. I'm not saying this is a good solution, just saying it can be done. > > > > > I guess it's not such a high price to pay for a unified codec interface :) > > > > > > If by unified you mean exposing only one pixel format, then yes, it's > > > unified. Doesn't make it easier to deal with from the userspace > > > perspective IMHO. > > > > > > To sum-up, I'm fine keeping one pixel format, but I'm no longer sure > > > not exposing the NAL header type is a good option. We've seen that > > > providing alignment guarantees for HW expecting raw bitstream (without > > > the start code) might become challenging at some point. So I'd opt for > > > making this selection explicit. After all, it's just an extra control > > > to set from userspace, and 2 extra switch-case: one to select the most > > > appropriate NAL header type, and another one to fill the buffer with > > > the appropriate header (if there's one). > > I must admit I'm confused by what you mean about NAL header type, I thought > we weren't trying to support AVC, and only the Annex B bitstream format. I'm not trying to support AVC headers, but designing something that allows us to extend the interface and support that case (if we ever need to) is a good thing IMO. > > That said, I don't see the interface getting any simpler with a unified > pixfmt, given the menu control to expose frame or slice decoding. I agree. I think it's pretty much the same complexity anyway ('additional ctrl to set the start-code/header/preamble type' vs 'additional pixfmt'), so it's mostly a matter of taste. > > Proper applications need to support both modes anyway, where in the latter > it'll have to parse the bitstream to extract the slices. Hm, the current uAPI forces us to pass slice offsets, which means we have to parse the bitstream anyway. I think we should keep it like that because I don't think we can assume the HW is smart enough to detect slice boundaries. > What's so bad > about just supporting an extra pixel format, where the slices are stripped > of its start codes and zero-byte elements? I'm not opposed to that solution, but Paul is, so I'm just trying to find something that makes everyone happy, hence the "NAL header type" (or "start code type"/"preamble type" if you prefer, though it's not really a start code for AVC) proposal. > > And how come this is any more complex than exposing alignment constraints, > so that applications can artificially add leading_zero_8bits or trailing_zero_8bits > elements to comply with a driver dma alignment. To be honest, the more I think > about it, the more this option sounds just horrible :-) Also think this option is more complicated and less future proof (AFAICT, AVC headers/start-codes can't be extended like Annex B ones). > > To me, it's far simpler to just expose what the devices support. If a driver > will expect to parse the bitstream, and accepts multi-slice content, > we expose that as a bitstream pixfmt. Those 2 problems are orthogonal. You could have HW dealing with multi-slice content while still requiring things to be passed without Annex B start codes. The H264 pixfmt is really just about NAL headers: No NAL headers vs Annex B headers vs AVC headers. > And if another driver expects > no-start-code slices, then we have another pixfmt. Yep, but I don't want to argue endlessly on this, and I'd be fine with the "NAL header/preamble/start-code/whatever type ctrl" too.
Hi, On Sat 27 Jul 19, 11:46, Boris Brezillon wrote: > On Sat, 27 Jul 2019 11:27:43 +0200 > Paul Kocialkowski <paul.kocialkowski@bootlin.com> wrote: > > > Hi, > > > > On Fri 26 Jul 19, 10:53, Hans Verkuil wrote: > > > On 7/26/19 9:30 AM, Boris Brezillon wrote: > > > > On Fri, 26 Jul 2019 08:28:28 +0200 > > > > Boris Brezillon <boris.brezillon@collabora.com> wrote: > > > > > > > >> On Thu, 25 Jul 2019 23:39:11 -0300 > > > >> Ezequiel Garcia <ezequiel@collabora.com> wrote: > > > >> > > > >>> On Thu, 2019-07-25 at 21:36 +0200, Paul Kocialkowski wrote: > > > >>>> Having a control that specifies an alignment constraint for the slice beginning > > > >>>> could work (as long as we make it optional, although userspace should be > > > >>>> required to abide by it when it is present). > > > >> > > > >> By making that, you put the burden on both sides of the stack: > > > >> > > > >> - the kernel side will have to deal with the unaligned cases (using a > > > >> bounce buffer) > > > >> - userspace apps/libs that want to avoid an extra copy will have to > > > >> check this constraint and align things properly anyway > > > > > > > > I'd like to revise my statement. Ideally, the drivers should take care > > > > of such mis-alignments or unsupported NAL header types by > > > > copying/re-formatting the OUTPUT buffer so that existing apps work > > > > out of the box when the driver is added, which means we'll have to take > > > > care of that kernel-side anyway. Handling selection of the best > > > > encoding-mode/NAL-header-type in userspace is useful if one wants to > > > > improve perfs. > > > > > > Just my 5 cents: > > > > > > You very much want to avoid the situation where drivers have to copy or > > > reformat the OUTPUT buffer. That's asking for problems, not to mention > > > that it is no longer zero-copy. > > > > I definitely agree on that, since such constraints are likely to exist, we are > > certainly better off exposing them to userspace. > > > > I understand that it does add some complexity and asks for userspace code to be > > more complex, but let's be realistic: this is a complex topic with lots of > > hardware-specific details getting in the way. I don't think we can act as if > > things were simpler. > > > > My feeling is that we should keep trying to find "as elegant as possible" ways > > to expose constraints instead of putting strict and easy definitions for > > userspace that end up making drivers perform sub-optimally. > > > > Since the initial cedrus proposal, we have covered more ground to allow the > > API to fit the rockchip case, without conflicting with cedrus. We're now facing > > new constraints and issue and I really think we should keep trying to integrate > > them in the unified API. > > > > > >> Plus, the alignment thing won't work for AVC headers, so I think we > > > >> should actually have a control to select the NAL header type rather > > > >> than expose some alignment constraints (or have one pix fmt per NAL > > > >> header type, but you don't seem to like the idea, so I'm trying > > > >> to find something else :-)). > > > >> > > > >> And if we go for this option (control to select the NAL header type), > > > >> I'm wondering why we're not making that NAL-header type selection > > > >> mandatory from the start. We don't have to support all NAL headers at > > > >> first (can be Annex B only), but, by making this control selection > > > >> non-optional, we'll at least give a decent feedback to userspace > > > >> (setting NAL header control fails because the selected NAL header type > > > >> is not supported by the HW) instead of returning an error on the > > > >> decoding operation (which, depending on how verbose the driver is, can > > > >> be quite hard to figure out). > > > > > > This sounds reasonable. > > > > > > This control should be mandatory, and it should be referred to from > > > the H264/5 pixelformat definitions (see also https://patchwork.linuxtv.org/patch/57709/). > > > > I am growing confused about one thing: are we talking about selecting > > the type of *start code* (which can have a variable number of heading and > > trailing zeros depending on the situation) or about the *NAL header type*, which > > follows the start code? > > We're talking about start codes, but Nicolas called them nal_header in > this email [1], so I thought it was the appropriate naming. Okay, the representation I had in mind was: [zeros][start code][nal unit header][zeros][nal unit data] but maybe I'm mixing things up on my side. > > I like the idea of drivers providing what types of start codes they can support, > > but I don't really see how it helps regarding the alignment constraints and how > > it relates to the zero-padding. > > It does help with alignment constraints because buffers allocated by > the driver are usually matching the HW alignment constraints and by > passing the type of NAL header (or start code if you prefer) we now > guarantee that the raw bitstream (when in NO_NAL_HEADER is selected) is > placed at the beginning of the buffer. I thought the issue at hand was that we needed the nal unit header to start at an aligned address while still needing a start code of 3 bytes. I feel like I'm missing something in my understanding of the issue here. > Doesn't solve the case of > imported buffers, but that problem is orthogonal I think (it's a > problem we already have right now, and would indeed require some way > to expose HW alignment constraints). If we expose the constraints explicitly, then we can honestly say that it's up to user-space to abide by them so there should be no particular difference between imported and allocated buffers. Userspace just has to know what it's doing. And drivers chould refuse imports that don't follow the reported constraints (or are outside the pool dedicated to the VPU). > Not sure what the zero padding issue is. If you know the type of start > code, you don't have add extra 0 at the beginning to meet the alignment > constraints. If you're talking about padding bytes added at the end of > the bitstream (there's such a constraint on the rkvdec block), I think > that's something driver specific and should be handled by the driver. My point would rather be that it is (as far as I understood) valid regarding the H.264 spec to have extra zeros added by userspace (whatever the reason), so even if the type of start code is reported, it doesn't imply that we know the length of the heading zeros and start code, so the issue remains. Cheers, Paul > [1]https://www.mail-archive.com/linux-media@vger.kernel.org/msg146836.html
Hi, On Sat 27 Jul 19, 15:49, Boris Brezillon wrote: > On Sat, 27 Jul 2019 09:52:24 -0300 > Ezequiel Garcia <ezequiel@collabora.com> wrote: > > > > > > > > > > That's not my understanding of the Annex B section (quoting the spec > > > > for reference): > > > > > > > > " > > > > The byte stream format consists of a sequence of byte stream NAL unit > > > > syntax structures. Each byte stream NAL unit syntax structure contains > > > > one start code prefix followed by one nal_unit( NumBytesInNALunit ) > > > > syntax structure. It may (and under some circumstances, it shall) also > > > > contain an additional zero_byte syntax element. It may also contain one > > > > or more additional trailing_zero_8bits syntax elements. When it is the > > > > first byte stream NAL unit in the bitstream, it may also contain one or > > > > more additional leading_zero_8bits syntax elements. > > > > " > > > > > > > > Right. I wonder what the "may or shall" part is really specifying. > > > > However, note that the table B.1.1 and its comments B.1.2 is might > > be interpreted differently. To me, there's a difference between the different > > syntax elements (zero-bytes elements vs. the start code prefix element). > > > > This is what it says about the zero_byte syntax element: > > > > """ > > zero_byte is a single byte equal to 0x00. > > When any of the following conditions are fulfilled, the zero_byte syntax element shall be present. > > – the nal_unit_type within the nal_unit( ) is equal to 7 (sequence parameter set) or 8 (picture parameter set) > > – the byte stream NAL unit syntax structure contains the first NAL unit of an access unit in decoding order, as > > specified by subclause 7.4.1.2.3. > > """ > > > > We are not dealing with SPS or PPS here, but we are discussing multislice content, > > so IIUC this syntax element would be part of our bitstream pixfmt. > > > > And this is what it says about the start code prefix: > > > > """ > > start_code_prefix_one_3bytes is a fixed-value sequence of 3 bytes equal to 0x000001. This syntax element is called a > > start code prefix. > > """ > > > > These elements are used in such a way that it might seem > > you have two start codes options 3-byte or 4-byte, though. > > This is correct, but I was actually referring to: > > " > It may also contain one or more additional trailing_zero_8bits syntax > elements. When it is the first byte stream NAL unit in the > bitstream, it may also contain one or more additional > leading_zero_8bits syntax elements. > " > > which would allow userspace to put additional zeros at the beginning > in order to fulfill the HW alignment constraints. I'm not saying this is > a good solution, just saying it can be done. > > > > > > > > I guess it's not such a high price to pay for a unified codec interface :) > > > > > > > > If by unified you mean exposing only one pixel format, then yes, it's > > > > unified. Doesn't make it easier to deal with from the userspace > > > > perspective IMHO. > > > > > > > > To sum-up, I'm fine keeping one pixel format, but I'm no longer sure > > > > not exposing the NAL header type is a good option. We've seen that > > > > providing alignment guarantees for HW expecting raw bitstream (without > > > > the start code) might become challenging at some point. So I'd opt for > > > > making this selection explicit. After all, it's just an extra control > > > > to set from userspace, and 2 extra switch-case: one to select the most > > > > appropriate NAL header type, and another one to fill the buffer with > > > > the appropriate header (if there's one). > > > > I must admit I'm confused by what you mean about NAL header type, I thought > > we weren't trying to support AVC, and only the Annex B bitstream format. > > I'm not trying to support AVC headers, but designing something that > allows us to extend the interface and support that case (if we ever need > to) is a good thing IMO. > > > > > That said, I don't see the interface getting any simpler with a unified > > pixfmt, given the menu control to expose frame or slice decoding. > > I agree. I think it's pretty much the same complexity anyway > ('additional ctrl to set the start-code/header/preamble type' vs > 'additional pixfmt'), so it's mostly a matter of taste. The reason why I am strongly in favor of a single format is that the combinatory possibilities of all the little things that can be different would eventually lead us to having one pixel format per hardware implementation and I believe it makes no sense. I really don't find it to be an elegant or scalable way to expose the differences between decoder implementations. So the approach I currently like best is to have as much information as possible passed to the driver, both as offsets to each relevant part of the data in codec controls and through specific controls (or maybe pixfmt flags could be relevant in some cases) that reflect hardware-specific constraints. Apparently this is also the approach on stateful codecs (e.g. the patch exposing whether boundaries can be detected by the hardware or not, without adding a new pixfmt for each case). > > > > Proper applications need to support both modes anyway, where in the latter > > it'll have to parse the bitstream to extract the slices. > > Hm, the current uAPI forces us to pass slice offsets, which means we > have to parse the bitstream anyway. I think we should keep it like that > because I don't think we can assume the HW is smart enough to detect > slice boundaries. Agreed. Cheers, Paul > > What's so bad > > about just supporting an extra pixel format, where the slices are stripped > > of its start codes and zero-byte elements? > > I'm not opposed to that solution, but Paul is, so I'm just trying to > find something that makes everyone happy, hence the "NAL header > type" (or "start code type"/"preamble type" if you prefer, though > it's not really a start code for AVC) proposal. > > > > > And how come this is any more complex than exposing alignment constraints, > > so that applications can artificially add leading_zero_8bits or trailing_zero_8bits > > elements to comply with a driver dma alignment. To be honest, the more I think > > about it, the more this option sounds just horrible :-) > > Also think this option is more complicated and less future proof > (AFAICT, AVC headers/start-codes can't be extended like Annex B ones). > > > > > To me, it's far simpler to just expose what the devices support. If a driver > > will expect to parse the bitstream, and accepts multi-slice content, > > we expose that as a bitstream pixfmt. > > Those 2 problems are orthogonal. You could have HW dealing with > multi-slice content while still requiring things to be passed without > Annex B start codes. The H264 pixfmt is really just about NAL headers: > No NAL headers vs Annex B headers vs AVC headers. > > > And if another driver expects > > no-start-code slices, then we have another pixfmt. > > Yep, but I don't want to argue endlessly on this, and I'd be fine with > the "NAL header/preamble/start-code/whatever type ctrl" too.
On Mon, 29 Jul 2019 15:25:21 +0200 Paul Kocialkowski <paul.kocialkowski@bootlin.com> wrote: > Hi, > > On Sat 27 Jul 19, 11:46, Boris Brezillon wrote: > > On Sat, 27 Jul 2019 11:27:43 +0200 > > Paul Kocialkowski <paul.kocialkowski@bootlin.com> wrote: > > > > > Hi, > > > > > > On Fri 26 Jul 19, 10:53, Hans Verkuil wrote: > > > > On 7/26/19 9:30 AM, Boris Brezillon wrote: > > > > > On Fri, 26 Jul 2019 08:28:28 +0200 > > > > > Boris Brezillon <boris.brezillon@collabora.com> wrote: > > > > > > > > > >> On Thu, 25 Jul 2019 23:39:11 -0300 > > > > >> Ezequiel Garcia <ezequiel@collabora.com> wrote: > > > > >> > > > > >>> On Thu, 2019-07-25 at 21:36 +0200, Paul Kocialkowski wrote: > > > > >>>> Having a control that specifies an alignment constraint for the slice beginning > > > > >>>> could work (as long as we make it optional, although userspace should be > > > > >>>> required to abide by it when it is present). > > > > >> > > > > >> By making that, you put the burden on both sides of the stack: > > > > >> > > > > >> - the kernel side will have to deal with the unaligned cases (using a > > > > >> bounce buffer) > > > > >> - userspace apps/libs that want to avoid an extra copy will have to > > > > >> check this constraint and align things properly anyway > > > > > > > > > > I'd like to revise my statement. Ideally, the drivers should take care > > > > > of such mis-alignments or unsupported NAL header types by > > > > > copying/re-formatting the OUTPUT buffer so that existing apps work > > > > > out of the box when the driver is added, which means we'll have to take > > > > > care of that kernel-side anyway. Handling selection of the best > > > > > encoding-mode/NAL-header-type in userspace is useful if one wants to > > > > > improve perfs. > > > > > > > > Just my 5 cents: > > > > > > > > You very much want to avoid the situation where drivers have to copy or > > > > reformat the OUTPUT buffer. That's asking for problems, not to mention > > > > that it is no longer zero-copy. > > > > > > I definitely agree on that, since such constraints are likely to exist, we are > > > certainly better off exposing them to userspace. > > > > > > I understand that it does add some complexity and asks for userspace code to be > > > more complex, but let's be realistic: this is a complex topic with lots of > > > hardware-specific details getting in the way. I don't think we can act as if > > > things were simpler. > > > > > > My feeling is that we should keep trying to find "as elegant as possible" ways > > > to expose constraints instead of putting strict and easy definitions for > > > userspace that end up making drivers perform sub-optimally. > > > > > > Since the initial cedrus proposal, we have covered more ground to allow the > > > API to fit the rockchip case, without conflicting with cedrus. We're now facing > > > new constraints and issue and I really think we should keep trying to integrate > > > them in the unified API. > > > > > > > >> Plus, the alignment thing won't work for AVC headers, so I think we > > > > >> should actually have a control to select the NAL header type rather > > > > >> than expose some alignment constraints (or have one pix fmt per NAL > > > > >> header type, but you don't seem to like the idea, so I'm trying > > > > >> to find something else :-)). > > > > >> > > > > >> And if we go for this option (control to select the NAL header type), > > > > >> I'm wondering why we're not making that NAL-header type selection > > > > >> mandatory from the start. We don't have to support all NAL headers at > > > > >> first (can be Annex B only), but, by making this control selection > > > > >> non-optional, we'll at least give a decent feedback to userspace > > > > >> (setting NAL header control fails because the selected NAL header type > > > > >> is not supported by the HW) instead of returning an error on the > > > > >> decoding operation (which, depending on how verbose the driver is, can > > > > >> be quite hard to figure out). > > > > > > > > This sounds reasonable. > > > > > > > > This control should be mandatory, and it should be referred to from > > > > the H264/5 pixelformat definitions (see also https://patchwork.linuxtv.org/patch/57709/). > > > > > > I am growing confused about one thing: are we talking about selecting > > > the type of *start code* (which can have a variable number of heading and > > > trailing zeros depending on the situation) or about the *NAL header type*, which > > > follows the start code? > > > > We're talking about start codes, but Nicolas called them nal_header in > > this email [1], so I thought it was the appropriate naming. > > Okay, the representation I had in mind was: > [zeros][start code][nal unit header][zeros][nal unit data] > > but maybe I'm mixing things up on my side. > > > > I like the idea of drivers providing what types of start codes they can support, > > > but I don't really see how it helps regarding the alignment constraints and how > > > it relates to the zero-padding. > > > > It does help with alignment constraints because buffers allocated by > > the driver are usually matching the HW alignment constraints and by > > passing the type of NAL header (or start code if you prefer) we now > > guarantee that the raw bitstream (when in NO_NAL_HEADER is selected) is > > placed at the beginning of the buffer. > > I thought the issue at hand was that we needed the nal unit header to start at > an aligned address That's exactly the problem I'm trying to solve, and having a solution that allows us to pass data starting at [nal unit header] when the HW is not able to parse Annex B start codes solves that. > while still needing a start code of 3 bytes. Why do we need a start code if the HW doesn't care about it? > I feel like I'm > missing something in my understanding of the issue here. > > > Doesn't solve the case of > > imported buffers, but that problem is orthogonal I think (it's a > > problem we already have right now, and would indeed require some way > > to expose HW alignment constraints). > > If we expose the constraints explicitly, then we can honestly say that it's up > to user-space to abide by them so there should be no particular difference > between imported and allocated buffers. Userspace just has to know what it's > doing. And drivers chould refuse imports that don't follow the reported > constraints (or are outside the pool dedicated to the VPU). Again, this problem is orthogonal to the problem I'm trying to solve. I'm not saying it shouldn't be addressed at some point, but that's a completely different topic. > > > Not sure what the zero padding issue is. If you know the type of start > > code, you don't have add extra 0 at the beginning to meet the alignment > > constraints. If you're talking about padding bytes added at the end of > > the bitstream (there's such a constraint on the rkvdec block), I think > > that's something driver specific and should be handled by the driver. > > My point would rather be that it is (as far as I understood) valid regarding the > H.264 spec to have extra zeros added by userspace (whatever the reason), AFAICT, it's only valid when the bitstream is wrapped with an Annex B format. Not sure other wrappers are skipping leading/trailing 0-bytes. > so even > if the type of start code is reported, it doesn't imply that we know the length > of the heading zeros and start code, so the issue remains. Hm, in the NO_START_CODE case we expect userspace to remove those leading/trailing 0-bytes as well as the 000001 start code pattern. That means we no longer have to specify an offset, and the payload is guaranteed to be aligned as the HW expects.
diff --git a/Documentation/media/uapi/v4l/ext-ctrls-codec.rst b/Documentation/media/uapi/v4l/ext-ctrls-codec.rst index 7a1947f5be96..3ae1367806cf 100644 --- a/Documentation/media/uapi/v4l/ext-ctrls-codec.rst +++ b/Documentation/media/uapi/v4l/ext-ctrls-codec.rst @@ -1726,6 +1726,7 @@ enum v4l2_mpeg_video_h264_hierarchical_coding_type - :ref:`h264`, section 7.4.3 "Slice Header Semantics". For further documentation, refer to the above specification, unless there is an explicit comment stating otherwise. + All slices should be prepended with an ANNEX B start code. .. note::