Message ID | 1369139597-24446-1-git-send-email-andriy.shevchenko@linux.intel.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
On Tue, May 21, 2013 at 6:11 PM, Will Deacon <will.deacon@arm.com> wrote: > Hi Andy, > > On Tue, May 21, 2013 at 01:33:17PM +0100, Andy Shevchenko wrote: >> When thread is going to be stopped we have to unconditionally terminate all >> ongoing transfers. Otherwise it would be possible that callback function will >> be called on the next interrupt and will try to access to already freed >> structures. >> >> The patch introduces specific error message for this, though it doesn't >> increase the counter of the failed tests. >> >> Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> >> Reported-by: Will Deacon <will.deacon@arm.com> > > Thanks for persevering with this! Although this patch definitely fixes the > panic I was seeing, I now observe buffer verification failures in subsequent > test runs after an aborted run: I think the description to the commit adfa543e "dmatest: don't use set_freezable_with_signal()" may shed light on this. The background (if I got it correctly) is in race with done flag. So, we got a callback call from the DMA engine, but we don't know which transfer triggers it. I might be wrong. This is just an assumption. Have you ever see such behaviour on pre v3.10-rc1 kernels? (I mean with old dmatest module) -- With Best Regards, Andy Shevchenko
On Tue, May 21, 2013 at 06:24:15PM +0100, Andy Shevchenko wrote: > On Tue, May 21, 2013 at 6:11 PM, Will Deacon <will.deacon@arm.com> wrote: > > Hi Andy, > > > > On Tue, May 21, 2013 at 01:33:17PM +0100, Andy Shevchenko wrote: > >> When thread is going to be stopped we have to unconditionally terminate all > >> ongoing transfers. Otherwise it would be possible that callback function will > >> be called on the next interrupt and will try to access to already freed > >> structures. > >> > >> The patch introduces specific error message for this, though it doesn't > >> increase the counter of the failed tests. > >> > >> Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> > >> Reported-by: Will Deacon <will.deacon@arm.com> > > > > Thanks for persevering with this! Although this patch definitely fixes the > > panic I was seeing, I now observe buffer verification failures in subsequent > > test runs after an aborted run: > > I think the description to the commit adfa543e "dmatest: don't use > set_freezable_with_signal()" may shed light on this. > > The background (if I got it correctly) is in race with done flag. So, > we got a callback call from the DMA engine, but we don't know which > transfer triggers it. > I might be wrong. This is just an assumption. I've not managed to work out exactly what's going on, but it's certainly something like that. In fact, I just managed to trigger a case where all but one of the transfers is aborted, whilst the remaining one fails. Looking at the code, I can't see how that situation comes about, since the threads are protected with the info mutex and kthread_stop is synchronous. > Have you ever see such behaviour on pre v3.10-rc1 kernels? (I mean > with old dmatest module) No, dmatest from 3.9 is completely reliable in my experience. Will
On Wed, May 22, 2013 at 3:41 PM, Will Deacon <will.deacon@arm.com> wrote: > On Tue, May 21, 2013 at 06:24:15PM +0100, Andy Shevchenko wrote: >> On Tue, May 21, 2013 at 6:11 PM, Will Deacon <will.deacon@arm.com> wrote: >> > I now observe buffer verification failures in subsequent >> > test runs after an aborted run: >> >> I think the description to the commit adfa543e "dmatest: don't use >> set_freezable_with_signal()" may shed light on this. >> >> The background (if I got it correctly) is in race with done flag. So, >> we got a callback call from the DMA engine, but we don't know which >> transfer triggers it. >> I might be wrong. This is just an assumption. > > I've not managed to work out exactly what's going on, but it's certainly > something like that. In fact, I just managed to trigger a case where all but > one of the transfers is aborted, whilst the remaining one fails. Looking at > the code, I can't see how that situation comes about, since the threads are > protected with the info mutex and kthread_stop is synchronous. >> Have you ever see such behaviour on pre v3.10-rc1 kernels? (I mean >> with old dmatest module) > > No, dmatest from 3.9 is completely reliable in my experience. Yeah, I supposed that was a rhetorical question. So, have I understood correctly that if you revert the 77101ce5 ("cancel thread ...") everything is working fine / as before? -- With Best Regards, Andy Shevchenko
On Tue, May 21, 2013 at 08:24:15PM +0300, Andy Shevchenko wrote: > On Tue, May 21, 2013 at 6:11 PM, Will Deacon <will.deacon@arm.com> wrote: > > Hi Andy, > > > > On Tue, May 21, 2013 at 01:33:17PM +0100, Andy Shevchenko wrote: > >> When thread is going to be stopped we have to unconditionally terminate all > >> ongoing transfers. Otherwise it would be possible that callback function will > >> be called on the next interrupt and will try to access to already freed > >> structures. > >> > >> The patch introduces specific error message for this, though it doesn't > >> increase the counter of the failed tests. > >> > >> Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> > >> Reported-by: Will Deacon <will.deacon@arm.com> > > > > Thanks for persevering with this! Although this patch definitely fixes the > > panic I was seeing, I now observe buffer verification failures in subsequent > > test runs after an aborted run: > > I think the description to the commit adfa543e "dmatest: don't use > set_freezable_with_signal()" may shed light on this. > > The background (if I got it correctly) is in race with done flag. So, > we got a callback call from the DMA engine, but we don't know which > transfer triggers it. > I might be wrong. This is just an assumption. > > Have you ever see such behaviour on pre v3.10-rc1 kernels? (I mean > with old dmatest module) Terminate shouldnt cause the issue with buffer verfication, can you try this on dw_dmac, do you see similar failures on verfication? -- ~Vinod
On Thu, May 23, 2013 at 01:51:29PM +0300, Andy Shevchenko wrote: > On Thu, May 23, 2013 at 1:09 PM, Vinod Koul <vinod.koul@intel.com> wrote: > > On Tue, May 21, 2013 at 08:24:15PM +0300, Andy Shevchenko wrote: > >> On Tue, May 21, 2013 at 6:11 PM, Will Deacon <will.deacon@arm.com> wrote: > >> > On Tue, May 21, 2013 at 01:33:17PM +0100, Andy Shevchenko wrote: > >> >> When thread is going to be stopped we have to unconditionally terminate all > >> >> ongoing transfers. Otherwise it would be possible that callback function will > >> >> be called on the next interrupt and will try to access to already freed > >> >> structures. > >> >> > >> >> The patch introduces specific error message for this, though it doesn't > >> >> increase the counter of the failed tests. > >> >> > >> > Thanks for persevering with this! Although this patch definitely fixes the > >> > panic I was seeing, I now observe buffer verification failures in subsequent > >> > test runs after an aborted run: > >> > >> I think the description to the commit adfa543e "dmatest: don't use > >> set_freezable_with_signal()" may shed light on this. > >> > >> The background (if I got it correctly) is in race with done flag. So, > >> we got a callback call from the DMA engine, but we don't know which > >> transfer triggers it. > >> I might be wrong. This is just an assumption. > >> > >> Have you ever see such behaviour on pre v3.10-rc1 kernels? (I mean > >> with old dmatest module) > > Terminate shouldnt cause the issue with buffer verfication, can you try this on > > dw_dmac, do you see similar failures on verfication? > > I saw the similar errors on dw_dmac on Intel Medfield device. Ah, so may not be a driver issue. > Anyway, I checked another approach with Will. > For now I will send a quick fix that prevents tester to abort an > ongoing transfer. okay so taht should prevent regression, let me check on my setup if I find anything -- ~Vinod > In future we could implement a robust logic when transfers can be > interrupted at any time. > > -- > With Best Regards, > Andy Shevchenko
On Thu, May 23, 2013 at 1:09 PM, Vinod Koul <vinod.koul@intel.com> wrote: > On Tue, May 21, 2013 at 08:24:15PM +0300, Andy Shevchenko wrote: >> On Tue, May 21, 2013 at 6:11 PM, Will Deacon <will.deacon@arm.com> wrote: >> > On Tue, May 21, 2013 at 01:33:17PM +0100, Andy Shevchenko wrote: >> >> When thread is going to be stopped we have to unconditionally terminate all >> >> ongoing transfers. Otherwise it would be possible that callback function will >> >> be called on the next interrupt and will try to access to already freed >> >> structures. >> >> >> >> The patch introduces specific error message for this, though it doesn't >> >> increase the counter of the failed tests. >> >> >> > Thanks for persevering with this! Although this patch definitely fixes the >> > panic I was seeing, I now observe buffer verification failures in subsequent >> > test runs after an aborted run: >> >> I think the description to the commit adfa543e "dmatest: don't use >> set_freezable_with_signal()" may shed light on this. >> >> The background (if I got it correctly) is in race with done flag. So, >> we got a callback call from the DMA engine, but we don't know which >> transfer triggers it. >> I might be wrong. This is just an assumption. >> >> Have you ever see such behaviour on pre v3.10-rc1 kernels? (I mean >> with old dmatest module) > Terminate shouldnt cause the issue with buffer verfication, can you try this on > dw_dmac, do you see similar failures on verfication? I saw the similar errors on dw_dmac on Intel Medfield device. Anyway, I checked another approach with Will. For now I will send a quick fix that prevents tester to abort an ongoing transfer. In future we could implement a robust logic when transfers can be interrupted at any time.
diff --git a/drivers/dma/dmatest.c b/drivers/dma/dmatest.c index d8ce4ec..f61bd55 100644 --- a/drivers/dma/dmatest.c +++ b/drivers/dma/dmatest.c @@ -92,6 +92,7 @@ enum dmatest_error_type { DMATEST_ET_MAP_DST, DMATEST_ET_PREP, DMATEST_ET_SUBMIT, + DMATEST_ET_ABORT, DMATEST_ET_TIMEOUT, DMATEST_ET_DMA_ERROR, DMATEST_ET_DMA_IN_PROGRESS, @@ -366,6 +367,7 @@ static char *thread_result_get(const char *name, [DMATEST_ET_MAP_DST] = "dst mapping error", [DMATEST_ET_PREP] = "prep error", [DMATEST_ET_SUBMIT] = "submit error", + [DMATEST_ET_ABORT] = "transfer aborted", [DMATEST_ET_TIMEOUT] = "test timed out", [DMATEST_ET_DMA_ERROR] = "got completion callback (DMA_ERROR)", @@ -720,6 +722,15 @@ static int dmatest_func(void *data) done.done || kthread_should_stop(), msecs_to_jiffies(params->timeout)); + /* terminate all transfers on specified channel if needed */ + if (kthread_should_stop()) { + dmaengine_terminate_all(chan); + thread_result_add(info, result, DMATEST_ET_ABORT, + total_tests, src_off, dst_off, + len, 0); + break; + } + status = dma_async_is_tx_complete(chan, cookie, NULL, NULL); if (!done.done) {
When thread is going to be stopped we have to unconditionally terminate all ongoing transfers. Otherwise it would be possible that callback function will be called on the next interrupt and will try to access to already freed structures. The patch introduces specific error message for this, though it doesn't increase the counter of the failed tests. Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Reported-by: Will Deacon <will.deacon@arm.com> --- drivers/dma/dmatest.c | 11 +++++++++++ 1 file changed, 11 insertions(+)