Message ID | 20201007103544.22807-1-ezequiel@collabora.com |
---|---|
Series | CODA timeout fix & assorted changes |
Hi Ezequiel,

On Wed, Oct 7, 2020 at 8:01 AM Ezequiel Garcia <ezequiel@collabora.com> wrote:
>
> Hi all,
>
> The main motivation for this series is to address a PIC_RUN
> timeout, which we managed to link with a hardware bitstream
> buffer underrun condition.
>
> Upon further investigation we discovered that the underrun
> was produced by a subtle issue in the way buffer_meta's were
> being tracked.
>
> The issue is fixed by patch "5/6 coda: coda_buffer_meta housekeeping fix".
>
> Patches 1 to 4 are mostly cleanups and minor cosmetic changes.

Shouldn't the fix be the first patch in the series so that it can be
backported to stable?

Thanks
Hi Fabio,

On Wed, 2020-10-07 at 08:13 -0300, Fabio Estevam wrote:
> Hi Ezequiel,
>
> On Wed, Oct 7, 2020 at 8:01 AM Ezequiel Garcia <ezequiel@collabora.com> wrote:
> > Hi all,
> >
> > The main motivation for this series is to address a PIC_RUN
> > timeout, which we managed to link with a hardware bitstream
> > buffer underrun condition.
> >
> > Upon further investigation we discovered that the underrun
> > was produced by a subtle issue in the way buffer_meta's were
> > being tracked.
> >
> > The issue is fixed by patch "5/6 coda: coda_buffer_meta housekeeping fix".
> >
> > Patches 1 to 4 are mostly cleanups and minor cosmetic changes.
>
> Shouldn't the fix be the first patch in the series so that it can be
> backported to stable?
>

While that is typically the case, it won't change much.

You'll find that the fix in patch 5 can be applied cleanly
on top of v5.4 and v5.8 :-)

Now that you mention this, I believe that the patch
should carry:

Cc: stable@vger.kernel.org # v5.4

Thanks,
Ezequiel
On Wed, Oct 7, 2020 at 8:34 AM Ezequiel Garcia <ezequiel@collabora.com> wrote:
> While that is typically the case, it won't change much.
>
> You'll find that the fix in patch 5 can be applied cleanly
> on top of v5.4 and v5.8 :-)
>
> Now that you mention this, I believe that the patch
> should carry:
>
> Cc: stable@vger.kernel.org # v5.4

And a Fixes tag too :-)
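Together, the two trailers discussed above would sit at the end of the
patch description roughly like this (the commit reference is a
placeholder; the thread does not name the offending commit):

    Fixes: <12-character sha> ("subject of the offending commit")
    Cc: stable@vger.kernel.org # v5.4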
Hi Ezequiel!

I applied your series on top of bbf5c979011a099af5dc76498918ed7df445635b (5.9 tag).
Additionally, I applied the attached patch to print the active metas during coda_fill_bitstream():
0001-DEBUG-coda-print-list-of-current-active-metas.patch

For verification, I used the attached single-color video:
MPEG4, 512x512, ~135 kb/s, 20 FPS, GoP = 5; see below for reproduction steps.
This might be a very "hard" and "theoretical" sample.
However, with this video, I still get a timeout (see error log below).

When applying my "simplistic approach" (second patch) on top of that, the video works.
I saw padding to 512 bytes in the H.264 implementation, so I assumed it's the same here.
Since I don't know how to pad the MPEG4 stream, I decided to keep adding frames
until the "requirement" (I have no documentation for that) is reached.
Therefore, this patch ensures that there are at least two buffers queued
which reach the 512-byte threshold (for simplicity 3, since "beyond the current one").

Note that the second patch is also just a proof of concept; there might be more efficient solutions.

If your reproduction attempts lead to different results, please write me an email.

Regards & many thanks!
Benjamin


*Video creation:*
1.) Create single-color video:
gst-launch-1.0 videotestsrc num-buffers=100 pattern=blue ! \
video/x-raw,format=I420,width=512,height=512 ! v4l2mpeg4enc output-io-mode=4 ! \
avimux ! filesink location=temp.avi

2.) Re-encode with a GoP (IIRC this option is not provided by v4l2mpeg4enc,
even though enabled in the driver; I usually do this on my test PC, not the SuT):
ffmpeg -i temp.avi -vf scale=512:512 -vtag "xvid" -g 5 mpeg4.avi

*Error log:*
[ 1067.150368] coda 2040000.vpu: 0: start streaming vid-cap
[ 1067.164871] coda 2040000.vpu: 0: not ready: need 2 buffers available (queue:1 + bitstream:0)
[ 1067.165490] coda 2040000.vpu: 0: job ready
[ 1067.166682] coda 2040000.vpu: 0: active metas:
[ 1067.166692] coda 2040000.vpu: 0: - payload: 60
[ 1067.166697] coda 2040000.vpu: 0: - payload: 2913
[ 1067.166702] coda 2040000.vpu: 0: - payload: 155
[ 1067.166706] coda 2040000.vpu: 0: - payload: 155
[ 1067.166711] coda 2040000.vpu: 0: - payload: 155
[ 1067.166715] coda 2040000.vpu: 0: want to queue: payload: 155
[ 1067.166973] coda 2040000.vpu: 0: active metas:
[ 1067.166982] coda 2040000.vpu: 0: - payload: 60
[ 1067.166987] coda 2040000.vpu: 0: - payload: 2913
[ 1067.166992] coda 2040000.vpu: 0: - payload: 155
[ 1067.166996] coda 2040000.vpu: 0: - payload: 155
[ 1067.167000] coda 2040000.vpu: 0: - payload: 155
[ 1067.167005] coda 2040000.vpu: 0: want to queue: payload: 155
[ 1068.168449] coda 2040000.vpu: CODA PIC_RUN timeout
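The debug patch referenced above (0001-DEBUG-coda-print-list-of-current-active-metas.patch)
is an attachment and is not part of the archived thread. A minimal sketch of what such a
dump could look like inside drivers/media/platform/coda/coda-bit.c, assuming the
buffer_meta_list/buffer_meta_lock fields of struct coda_ctx and the coda_dbg() macro as
found in coda.h around v5.9 (the helper name itself is made up, and the field names should
be double-checked against your tree):

    /* Dump the payload of every bitstream meta currently queued. */
    static void coda_dump_active_metas(struct coda_ctx *ctx)
    {
            struct coda_buffer_meta *meta;
            unsigned long flags;

            spin_lock_irqsave(&ctx->buffer_meta_lock, flags);
            coda_dbg(1, ctx, "active metas:\n");
            list_for_each_entry(meta, &ctx->buffer_meta_list, list)
                    coda_dbg(1, ctx, " - payload: %u\n",
                             meta->end - meta->start);
            spin_unlock_irqrestore(&ctx->buffer_meta_lock, flags);
    }

Called from coda_fill_bitstream() before each queue attempt, this would produce output of
the same shape as the "active metas:" lines in the error log above.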
Hi Benjamin,

On Mon, 2020-10-12 at 14:47 +0000, Benjamin Bara - SKIDATA wrote:
> Hi Ezequiel!
>
> I applied your series on top of bbf5c979011a099af5dc76498918ed7df445635b (5.9 tag).
> Additionally, I applied the attached patch to print the active metas during coda_fill_bitstream():
> 0001-DEBUG-coda-print-list-of-current-active-metas.patch
>
> For verification, I used the attached single-color video:
> MPEG4, 512x512, ~135 kb/s, 20 FPS, GoP = 5; see below for reproduction steps.
> This might be a very "hard" and "theoretical" sample.
> However, with this video, I still get a timeout (see error log below).
>
> When applying my "simplistic approach" (second patch) on top of that, the video works.
> I saw padding to 512 bytes in the H.264 implementation, so I assumed it's the same here.
> Since I don't know how to pad the MPEG4 stream, I decided to keep adding frames
> until the "requirement" (I have no documentation for that) is reached.
> Therefore, this patch ensures that there are at least two buffers queued
> which reach the 512-byte threshold (for simplicity 3, since "beyond the current one").
>
> Note that the second patch is also just a proof of concept; there might be more efficient solutions.
>
> If your reproduction attempts lead to different results, please write me an email.
>
> Regards & many thanks!
> Benjamin
>
>
> *Video creation:*
> 1.) Create single-color video:
> gst-launch-1.0 videotestsrc num-buffers=100 pattern=blue ! \
> video/x-raw,format=I420,width=512,height=512 ! v4l2mpeg4enc output-io-mode=4 ! \
> avimux ! filesink location=temp.avi
>
> 2.) Re-encode with a GoP (IIRC this option is not provided by v4l2mpeg4enc,
> even though enabled in the driver; I usually do this on my test PC, not the SuT):
> ffmpeg -i temp.avi -vf scale=512:512 -vtag "xvid" -g 5 mpeg4.avi
>
> *Error log:*
> [ 1067.150368] coda 2040000.vpu: 0: start streaming vid-cap
> [ 1067.164871] coda 2040000.vpu: 0: not ready: need 2 buffers available (queue:1 + bitstream:0)
> [ 1067.165490] coda 2040000.vpu: 0: job ready
> [ 1067.166682] coda 2040000.vpu: 0: active metas:
> [ 1067.166692] coda 2040000.vpu: 0: - payload: 60
> [ 1067.166697] coda 2040000.vpu: 0: - payload: 2913
> [ 1067.166702] coda 2040000.vpu: 0: - payload: 155
> [ 1067.166706] coda 2040000.vpu: 0: - payload: 155
> [ 1067.166711] coda 2040000.vpu: 0: - payload: 155
> [ 1067.166715] coda 2040000.vpu: 0: want to queue: payload: 155
> [ 1067.166973] coda 2040000.vpu: 0: active metas:
> [ 1067.166982] coda 2040000.vpu: 0: - payload: 60
> [ 1067.166987] coda 2040000.vpu: 0: - payload: 2913
> [ 1067.166992] coda 2040000.vpu: 0: - payload: 155
> [ 1067.166996] coda 2040000.vpu: 0: - payload: 155
> [ 1067.167000] coda 2040000.vpu: 0: - payload: 155
> [ 1067.167005] coda 2040000.vpu: 0: want to queue: payload: 155
> [ 1068.168449] coda 2040000.vpu: CODA PIC_RUN timeout
>

Many thanks for your report. Indeed you managed to create a video
that is quite problematic for CODA.

Adding some debugging code to dump the metas and then inspecting the
bitstream payload before/after a sync, we can see that the hardware
buffer has 690 bytes. This seems a bit confusing, since the driver
considers it enough.

From the log below, isn't this supposed to be enough metas?
[ 274.087643] coda 2040000.vpu: CODA PIC_RUN timeout
[ 274.092655] coda 2040000.vpu: meta 2: 432 - 804 (00 00 01 b0 01)
[ 274.099133] coda 2040000.vpu: meta 3: 804 - 1176 (00 00 01 b0 01)
[ 274.105551] coda 2040000.vpu: meta 4: 1176 - 1548 (00 00 01 b0 01)
[ 274.112200] coda 2040000.vpu: meta 5: 1548 - 1920 (00 00 01 b0 01)
[ 274.118840] coda 2040000.vpu: meta 6: 1920 - 2292 (00 00 01 b0 01)
[ 274.125347] coda 2040000.vpu: bitstream payload: 690 (before sync)
[ 274.130627] coda 2040000.vpu: bitstream payload: 690 (after sync)

Philipp, what do you think?

I must admit I can't fully wrap my head around your prefetch fix,
and how the new condition would fix this issue. Could you explain
it in more detail?

Why wouldn't the prefetcher count chunks smaller than 512?
Do you have some documentation for that?

Thanks,
Ezequiel
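The condition being debated here, paraphrased from the descriptions in this thread (the
real code lives around the comment referenced as [1] below; this sketch is illustrative,
not a verbatim copy, the helper name is invented, and the ctx field names should be
checked against coda.h): once more metas than internal frames are queued, refilling stops
as soon as 512 bytes (two 256-byte prefetch periods) are available past the end of the
first queued buffer.

    static bool coda_enough_data_past_first_meta(struct coda_ctx *ctx)
    {
            struct coda_buffer_meta *first;

            /* Only consider stopping once a full reorder window is queued. */
            if (ctx->num_metas <= ctx->num_internal_frames)
                    return false;

            first = list_first_entry(&ctx->buffer_meta_list,
                                     struct coda_buffer_meta, list);

            /*
             * At least two 256-byte prefetch periods (512 bytes) must be
             * available beyond the end of the first queued buffer.
             */
            return (int)(ctx->bitstream_fifo.kfifo.in -
                         ALIGN(first->end, 256)) > 512;
    }

With the log above, the data past the first meta comes in ~372-byte chunks, which is the
case the thread is trying to pin down.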
> -----Original Message-----
> From: Ezequiel Garcia <ezequiel@collabora.com>
> Sent: Wednesday, 4 November 2020 18:01

Hi again!

> Many thanks for your report. Indeed you managed to create a video
> that is quite problematic for CODA.

Thanks for looking at it!

> I must admit I can't fully wrap my head around your prefetch fix,
> and how the new condition would fix this issue. Could you explain
> it in more detail?

I don't have access to the CODA960 documentation, so all my findings
are based on tests or code documentation.

My starting point was the inline documentation [1]:
"If we managed to fill in at least a full reorder window of buffers
and the bitstream prefetcher has at least 2 256 bytes periods beyond
the first buffer to fetch (...)".

The current code checks if there are 512 bytes after the current meta.
When I comment out the following break, my test videos work.
So for me, starvation is caused by this statement, because it holds back
follow-up metas for performance reasons (the decoder might also start
hiccuping with too much data enqueued, but I'm not sure).

I did some tests, and basically I found out that the sum doesn't matter,
so something like
- Meta 2: 1024
- Meta 3: 100
- Meta 4: 100
starves, even though it passes the current check.

Next, I thought about "2x 256" actually not being the same as "1x 512".
With 2x 256, I actually got a couple more test videos running,
something like:
- Meta 2: 650
- Meta 3: 50
- Meta 4: 650

Crafting such a video is quite easy by varying the group of pictures
and the resolution.

However, then I tried some smaller videos (with meta 2 & 4: 256 < meta < 512)
which led to starvation again.
I haven't done additional tests; instead I shared my findings and asked for
further documentation hints about the actual starvation criteria.

So my solution is just a proof of concept, found by testing, and is not backed
by documentation - therefore I am also not sure if it is sufficient or has
any side effects.

Best regards
Benjamin

[1] https://elixir.bootlin.com/linux/v5.10-rc2/source/drivers/media/platform/coda/coda-bit.c#L352
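As a rough back-of-the-envelope check of why such videos are easy to craft (approximate
numbers derived from the sample described earlier in the thread): at ~135 kbit/s and
20 fps the stream averages about 135000 / 8 / 20 ~= 845 bytes per frame. With a GoP of 5,
almost the entire budget goes into the I-frame, which matches the error log above: one
~2913-byte I-frame followed by ~155-byte P-frames, i.e. 2913 + 4 * 155 ~= 3533 bytes per
GoP, or roughly 113 kbit/s of elementary stream before container overhead. The practical
point is that every P-frame falls well below the 256/512-byte prefetch granularity under
discussion, so lowering the bitrate, shrinking the resolution, or shortening the GoP all
push more chunks under that threshold.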
Hi Benjamin,

On Wed, 2020-11-04 at 17:49 +0000, Benjamin Bara - SKIDATA wrote:
> > -----Original Message-----
> > From: Ezequiel Garcia <ezequiel@collabora.com>
> > Sent: Wednesday, 4 November 2020 18:01
>
> Hi again!
>
> > Many thanks for your report. Indeed you managed to create a video
> > that is quite problematic for CODA.
>
> Thanks for looking at it!
>
> > I must admit I can't fully wrap my head around your prefetch fix,
> > and how the new condition would fix this issue. Could you explain
> > it in more detail?
>
> I don't have access to the CODA960 documentation, so all my findings
> are based on tests or code documentation.
>
> My starting point was the inline documentation [1]:
> "If we managed to fill in at least a full reorder window of buffers
> and the bitstream prefetcher has at least 2 256 bytes periods beyond
> the first buffer to fetch (...)".
>
> The current code checks if there are 512 bytes after the current meta.
> When I comment out the following break, my test videos work.
> So for me, starvation is caused by this statement, because it holds back
> follow-up metas for performance reasons (the decoder might also start
> hiccuping with too much data enqueued, but I'm not sure).
>
> I did some tests, and basically I found out that the sum doesn't matter,
> so something like
> - Meta 2: 1024
> - Meta 3: 100
> - Meta 4: 100
> starves, even though it passes the current check.
>
> Next, I thought about "2x 256" actually not being the same as "1x 512".
> With 2x 256, I actually got a couple more test videos running,
> something like:
> - Meta 2: 650
> - Meta 3: 50
> - Meta 4: 650
>
> Crafting such a video is quite easy by varying the group of pictures
> and the resolution.
>
> However, then I tried some smaller videos (with meta 2 & 4: 256 < meta < 512)
> which led to starvation again.
> I haven't done additional tests; instead I shared my findings and asked for
> further documentation hints about the actual starvation criteria.
>
> So my solution is just a proof of concept, found by testing, and is not backed
> by documentation - therefore I am also not sure if it is sufficient or has
> any side effects.
>

Please submit your patch on top of the latest media master branch and
let's start discussing it.

BTW, Nicolas and I have started using https://github.com/fluendo/fluster/
for conformance testing.

Perhaps you can run the H264 test suite, with & without your patch
(assuming it affects H264 in any way), and include that information
in the cover letter.

Thanks!
Ezequiel