Message ID | 20180323125707.25223-1-peter.ujfalusi@ti.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
On 23/03/18 14:57, Peter Ujfalusi wrote: > From: Tomi Valkeinen <tomi.valkeinen@ti.com> > > A DMM timeout "timed out waiting for done" has been observed on DRA7 > devices. The timeout happens rarely, and only when the system is under > heavy load. > > Debugging showed that the timeout can be made to happen much more > frequently by optimizing the DMM driver, so that there's almost no code > between writing the last DMM descriptors to RAM, and writing to DMM > register which starts the DMM transaction. > > The current theory is that a wmb() does not properly ensure that the > data written to RAM is observable by all the components in the system. > > This DMM timeout has caused interesting (and rare) bugs as the error > handling was not functioning properly (the error handling has been fixed > in previous commits): > > * If a DMM timeout happened when a GEM buffer was being pinned for > display on the screen, a timeout error would be shown, but the driver > would continue programming DSS HW with broken buffer, leading to > SYNCLOST floods and possible crashes. > > * If a DMM timeout happened when other user (say, video decoder) was > pinning a GEM buffer, a timeout would be shown but if the user > handled the error properly, no other issues followed. > > * If a DMM timeout happened when a GEM buffer was being released, the > driver does not even notice the error, leading to crashes or hang > later. > > This patch adds wmb() and readl() calls after the last bit is written to > RAM, which should ensure that the execution proceeds only after the data > is actually in RAM, and thus observable by DMM. > > The read-back should not be needed. Further study is required to understand > if DMM is somehow special case and read-back is ok, or if DRA7's memory > barriers do not work correctly. > > Signed-off-by: Tomi Valkeinen <tomi.valkeinen@ti.com> > Signed-off-by: Peter Ujfalusi <peter.ujfalusi@ti.com> > --- > drivers/gpu/drm/omapdrm/omap_dmm_tiler.c | 11 +++++++++++ > 1 file changed, 11 insertions(+) > > diff --git a/drivers/gpu/drm/omapdrm/omap_dmm_tiler.c b/drivers/gpu/drm/omapdrm/omap_dmm_tiler.c > index c40f90d2db82..27c67bc36203 100644 > --- a/drivers/gpu/drm/omapdrm/omap_dmm_tiler.c > +++ b/drivers/gpu/drm/omapdrm/omap_dmm_tiler.c > @@ -410,6 +410,17 @@ static int dmm_txn_commit(struct dmm_txn *txn, bool wait) > } > > txn->last_pat->next_pa = 0; > + /* ensure that the written descriptors are visible to DMM */ > + wmb(); > + > + /* > + * NOTE: the wmb() above should be enough, but there seems to be a bug > + * in OMAP's memory barrier implementation, which in some rare cases may > + * cause the writes not to be observable after wmb(). > + */ > + > + /* read back to ensure the data is in RAM */ > + readl(&txn->last_pat->next_pa); > > /* write to PAT_DESCR to clear out any pending transaction */ > dmm_write(dmm, 0x0, reg[PAT_DESCR][engine->id]); > Thanks, I have picked this up. Tomi
diff --git a/drivers/gpu/drm/omapdrm/omap_dmm_tiler.c b/drivers/gpu/drm/omapdrm/omap_dmm_tiler.c index c40f90d2db82..27c67bc36203 100644 --- a/drivers/gpu/drm/omapdrm/omap_dmm_tiler.c +++ b/drivers/gpu/drm/omapdrm/omap_dmm_tiler.c @@ -410,6 +410,17 @@ static int dmm_txn_commit(struct dmm_txn *txn, bool wait) } txn->last_pat->next_pa = 0; + /* ensure that the written descriptors are visible to DMM */ + wmb(); + + /* + * NOTE: the wmb() above should be enough, but there seems to be a bug + * in OMAP's memory barrier implementation, which in some rare cases may + * cause the writes not to be observable after wmb(). + */ + + /* read back to ensure the data is in RAM */ + readl(&txn->last_pat->next_pa); /* write to PAT_DESCR to clear out any pending transaction */ dmm_write(dmm, 0x0, reg[PAT_DESCR][engine->id]);