Message ID | 20190315183949.GA4578@vrkonda-desk.ra.intel.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | drm/i915/display: Increase timeout for DP Aux channel ctl signal | expand |
On Fri, Mar 15, 2019 at 11:39:54AM -0700, Vanshidhar Konda wrote: > Extend the timeout for the hardware to signal SEND_BUSY on the DP > Aux Channel Controller register. > > This is needed to address FDO #109982 > https://bugzilla.freedesktop.org/show_bug.cgi?id=109982 instead of mentioning like this, please use: Bugzilla: https://bugzilla.freedesktop.org/show_bug.cgi?id=109982 Also instead of the "needed to address" it would be better to add some reasoning explaining that "empirically we got some bugs workarounded by increasing the timeout from 10ms to 15ms although spec was only requiring 4ms" or something like that... > > Cc: Ville Syrjälä <ville.syrjala@linux.intel.com> > Cc: Imre Deak <imre.deak@intel.com> > Signed-off-by: Vanshidhar Konda <vanshidhar.r.konda@intel.com> > --- > drivers/gpu/drm/i915/intel_dp.c | 4 +++- > 1 file changed, 3 insertions(+), 1 deletion(-) > > diff --git a/drivers/gpu/drm/i915/intel_dp.c b/drivers/gpu/drm/i915/intel_dp.c > index 47857f96c3b1..fd6de33c5664 100644 > --- a/drivers/gpu/drm/i915/intel_dp.c > +++ b/drivers/gpu/drm/i915/intel_dp.c > @@ -1053,6 +1053,8 @@ intel_dp_check_edp(struct intel_dp *intel_dp) > } > } > > +#define DP_AUX_CH_CTL_SIGNAL_TIMEOUT_MS 15 This define is spurious since it's in use in a single place. Also, giving the timeout a name, like this, makes it appear it came from the spec. Well, if it came from Spec it should be defined in the proper .h files. Since I don't believe this came from spec I believe we can just remove it and go for the timeout directly on the function below. > + > static u32 > intel_dp_aux_wait_done(struct intel_dp *intel_dp) > { > @@ -1063,7 +1065,7 @@ intel_dp_aux_wait_done(struct intel_dp *intel_dp) > > #define C (((status = I915_READ_NOTRACE(ch_ctl)) & DP_AUX_CH_CTL_SEND_BUSY) == 0) > done = wait_event_timeout(dev_priv->gmbus_wait_queue, C, > - msecs_to_jiffies_timeout(10)); > + msecs_to_jiffies_timeout(DP_AUX_CH_CTL_SIGNAL_TIMEOUT_MS)); Is this just a guess that you are trying to check? I'm asking because I didn't see any indication that the increase really fixed the issue. So, if you are trying to just validate your approach maybe the try-bot could be used? Thanks, Rodrigo. > > /* just trace the final value */ > trace_i915_reg_rw(false, ch_ctl, status, sizeof(status), true); > -- > 2.20.1 > > _______________________________________________ > Intel-gfx mailing list > Intel-gfx@lists.freedesktop.org > https://lists.freedesktop.org/mailman/listinfo/intel-gfx
On Fri, Mar 15, 2019 at 12:38:41PM -0700, Rodrigo Vivi wrote: >On Fri, Mar 15, 2019 at 11:39:54AM -0700, Vanshidhar Konda wrote: >> Extend the timeout for the hardware to signal SEND_BUSY on the DP >> Aux Channel Controller register. >> >> This is needed to address FDO #109982 >> https://bugzilla.freedesktop.org/show_bug.cgi?id=109982 > >instead of mentioning like this, please use: >Bugzilla: https://bugzilla.freedesktop.org/show_bug.cgi?id=109982 > >Also instead of the "needed to address" it would be better >to add some reasoning explaining that >"empirically we got some bugs workarounded by increasing the timeout > from 10ms to 15ms although spec was only requiring 4ms" >or something like that... > Thanks! I'll keep these in mind for next patches. >> >> Cc: Ville Syrjälä <ville.syrjala@linux.intel.com> >> Cc: Imre Deak <imre.deak@intel.com> >> Signed-off-by: Vanshidhar Konda <vanshidhar.r.konda@intel.com> >> --- >> drivers/gpu/drm/i915/intel_dp.c | 4 +++- >> 1 file changed, 3 insertions(+), 1 deletion(-) >> >> diff --git a/drivers/gpu/drm/i915/intel_dp.c b/drivers/gpu/drm/i915/intel_dp.c >> index 47857f96c3b1..fd6de33c5664 100644 >> --- a/drivers/gpu/drm/i915/intel_dp.c >> +++ b/drivers/gpu/drm/i915/intel_dp.c >> @@ -1053,6 +1053,8 @@ intel_dp_check_edp(struct intel_dp *intel_dp) >> } >> } >> >> +#define DP_AUX_CH_CTL_SIGNAL_TIMEOUT_MS 15 > >This define is spurious since it's in use in a single place. > >Also, giving the timeout a name, like this, makes it appear it came from >the spec. Well, if it came from Spec it should be defined in the proper .h >files. > >Since I don't believe this came from spec I believe we can just remove it >and go for the timeout directly on the function below. > I'll make this change in the next patch. >> + >> static u32 >> intel_dp_aux_wait_done(struct intel_dp *intel_dp) >> { >> @@ -1063,7 +1065,7 @@ intel_dp_aux_wait_done(struct intel_dp *intel_dp) >> >> #define C (((status = I915_READ_NOTRACE(ch_ctl)) & DP_AUX_CH_CTL_SEND_BUSY) == 0) >> done = wait_event_timeout(dev_priv->gmbus_wait_queue, C, >> - msecs_to_jiffies_timeout(10)); >> + msecs_to_jiffies_timeout(DP_AUX_CH_CTL_SIGNAL_TIMEOUT_MS)); > >Is this just a guess that you are trying to check? >I'm asking because I didn't see any indication that the increase >really fixed the issue. > >So, if you are trying to just validate your approach maybe the try-bot >could be used? > I also would have preferred to use the trybot. The error in Bugzilla 109982 is only reproduced on only shard-iclb2 machine once in approximately 2 days. So the trybot is not of much help for this. Is there another way of testing experimental patches for bugs that are not reproduced locally or on other machines? Or, if the issue is only happening on one machine, should it be lowered in priority/whitelisted on the specific CI machine? >Thanks, >Rodrigo. > >> >> /* just trace the final value */ >> trace_i915_reg_rw(false, ch_ctl, status, sizeof(status), true); >> -- >> 2.20.1 >> >> _______________________________________________ >> Intel-gfx mailing list >> Intel-gfx@lists.freedesktop.org >> https://lists.freedesktop.org/mailman/listinfo/intel-gfx
On Fri, Mar 15, 2019 at 01:27:22PM -0700, Vanshidhar Konda wrote: > On Fri, Mar 15, 2019 at 12:38:41PM -0700, Rodrigo Vivi wrote: > > On Fri, Mar 15, 2019 at 11:39:54AM -0700, Vanshidhar Konda wrote: > > > Extend the timeout for the hardware to signal SEND_BUSY on the DP > > > Aux Channel Controller register. > > > > > > This is needed to address FDO #109982 > > > https://bugzilla.freedesktop.org/show_bug.cgi?id=109982 > > > > instead of mentioning like this, please use: > > Bugzilla: https://bugzilla.freedesktop.org/show_bug.cgi?id=109982 > > > > Also instead of the "needed to address" it would be better > > to add some reasoning explaining that > > "empirically we got some bugs workarounded by increasing the timeout > > from 10ms to 15ms although spec was only requiring 4ms" > > or something like that... > > > Thanks! I'll keep these in mind for next patches. > > > > > > Cc: Ville Syrjälä <ville.syrjala@linux.intel.com> > > > Cc: Imre Deak <imre.deak@intel.com> > > > Signed-off-by: Vanshidhar Konda <vanshidhar.r.konda@intel.com> > > > --- > > > drivers/gpu/drm/i915/intel_dp.c | 4 +++- > > > 1 file changed, 3 insertions(+), 1 deletion(-) > > > > > > diff --git a/drivers/gpu/drm/i915/intel_dp.c b/drivers/gpu/drm/i915/intel_dp.c > > > index 47857f96c3b1..fd6de33c5664 100644 > > > --- a/drivers/gpu/drm/i915/intel_dp.c > > > +++ b/drivers/gpu/drm/i915/intel_dp.c > > > @@ -1053,6 +1053,8 @@ intel_dp_check_edp(struct intel_dp *intel_dp) > > > } > > > } > > > > > > +#define DP_AUX_CH_CTL_SIGNAL_TIMEOUT_MS 15 > > > > This define is spurious since it's in use in a single place. > > > > Also, giving the timeout a name, like this, makes it appear it came from > > the spec. Well, if it came from Spec it should be defined in the proper .h > > files. > > > > Since I don't believe this came from spec I believe we can just remove it > > and go for the timeout directly on the function below. > > > I'll make this change in the next patch. > > > + > > > static u32 > > > intel_dp_aux_wait_done(struct intel_dp *intel_dp) > > > { > > > @@ -1063,7 +1065,7 @@ intel_dp_aux_wait_done(struct intel_dp *intel_dp) > > > > > > #define C (((status = I915_READ_NOTRACE(ch_ctl)) & DP_AUX_CH_CTL_SEND_BUSY) == 0) > > > done = wait_event_timeout(dev_priv->gmbus_wait_queue, C, > > > - msecs_to_jiffies_timeout(10)); > > > + msecs_to_jiffies_timeout(DP_AUX_CH_CTL_SIGNAL_TIMEOUT_MS)); > > > > Is this just a guess that you are trying to check? > > I'm asking because I didn't see any indication that the increase > > really fixed the issue. > > > > So, if you are trying to just validate your approach maybe the try-bot > > could be used? > > > I also would have preferred to use the trybot. The error in Bugzilla 109982 is only reproduced on only shard-iclb2 machine once > in approximately 2 days. So the trybot is not of much help for this. > > Is there another way of testing experimental patches for bugs that are > not reproduced locally or on other machines? Or, if the issue is only > happening on one machine, should it be lowered in priority/whitelisted > on the specific CI machine? For now I believe the best alternative is to --subject-prefix="[CI]" But also you need to be sure that the fail rate don't fool us... I believe there will be better alternatives for using CI for cases like this soon, so +Martin > > > Thanks, > > Rodrigo. > > > > > > > > /* just trace the final value */ > > > trace_i915_reg_rw(false, ch_ctl, status, sizeof(status), true); > > > -- > > > 2.20.1 > > > > > > _______________________________________________ > > > Intel-gfx mailing list > > > Intel-gfx@lists.freedesktop.org > > > https://lists.freedesktop.org/mailman/listinfo/intel-gfx
On Fri, Mar 15, 2019 at 02:31:40PM -0700, Rodrigo Vivi wrote: > On Fri, Mar 15, 2019 at 01:27:22PM -0700, Vanshidhar Konda wrote: > > On Fri, Mar 15, 2019 at 12:38:41PM -0700, Rodrigo Vivi wrote: > > > On Fri, Mar 15, 2019 at 11:39:54AM -0700, Vanshidhar Konda wrote: > > > > Extend the timeout for the hardware to signal SEND_BUSY on the DP > > > > Aux Channel Controller register. > > > > > > > > This is needed to address FDO #109982 > > > > https://bugzilla.freedesktop.org/show_bug.cgi?id=109982 > > > > > > instead of mentioning like this, please use: > > > Bugzilla: https://bugzilla.freedesktop.org/show_bug.cgi?id=109982 > > > > > > Also instead of the "needed to address" it would be better > > > to add some reasoning explaining that > > > "empirically we got some bugs workarounded by increasing the timeout > > > from 10ms to 15ms although spec was only requiring 4ms" > > > or something like that... > > > > > Thanks! I'll keep these in mind for next patches. > > > > > > > > Cc: Ville Syrjälä <ville.syrjala@linux.intel.com> > > > > Cc: Imre Deak <imre.deak@intel.com> > > > > Signed-off-by: Vanshidhar Konda <vanshidhar.r.konda@intel.com> > > > > --- > > > > drivers/gpu/drm/i915/intel_dp.c | 4 +++- > > > > 1 file changed, 3 insertions(+), 1 deletion(-) > > > > > > > > diff --git a/drivers/gpu/drm/i915/intel_dp.c b/drivers/gpu/drm/i915/intel_dp.c > > > > index 47857f96c3b1..fd6de33c5664 100644 > > > > --- a/drivers/gpu/drm/i915/intel_dp.c > > > > +++ b/drivers/gpu/drm/i915/intel_dp.c > > > > @@ -1053,6 +1053,8 @@ intel_dp_check_edp(struct intel_dp *intel_dp) > > > > } > > > > } > > > > > > > > +#define DP_AUX_CH_CTL_SIGNAL_TIMEOUT_MS 15 > > > > > > This define is spurious since it's in use in a single place. > > > > > > Also, giving the timeout a name, like this, makes it appear it came from > > > the spec. Well, if it came from Spec it should be defined in the proper .h > > > files. > > > > > > Since I don't believe this came from spec I believe we can just remove it > > > and go for the timeout directly on the function below. > > > > > I'll make this change in the next patch. > > > > + > > > > static u32 > > > > intel_dp_aux_wait_done(struct intel_dp *intel_dp) > > > > { > > > > @@ -1063,7 +1065,7 @@ intel_dp_aux_wait_done(struct intel_dp *intel_dp) > > > > > > > > #define C (((status = I915_READ_NOTRACE(ch_ctl)) & DP_AUX_CH_CTL_SEND_BUSY) == 0) > > > > done = wait_event_timeout(dev_priv->gmbus_wait_queue, C, > > > > - msecs_to_jiffies_timeout(10)); > > > > + msecs_to_jiffies_timeout(DP_AUX_CH_CTL_SIGNAL_TIMEOUT_MS)); > > > > > > Is this just a guess that you are trying to check? > > > I'm asking because I didn't see any indication that the increase > > > really fixed the issue. > > > > > > So, if you are trying to just validate your approach maybe the try-bot > > > could be used? > > > > > I also would have preferred to use the trybot. The error in Bugzilla 109982 is only reproduced on only shard-iclb2 machine once > > in approximately 2 days. So the trybot is not of much help for this. > > > > Is there another way of testing experimental patches for bugs that are > > not reproduced locally or on other machines? Or, if the issue is only > > happening on one machine, should it be lowered in priority/whitelisted > > on the specific CI machine? > > For now I believe the best alternative is to --subject-prefix="[CI]" > > But also you need to be sure that the fail rate don't fool us... hmm... this testcase seems to have a passrate of 77% of the runs. So probably using the CI infrastructure here isn't the best alternative to validate this approach in general. > > I believe there will be better alternatives for using CI for cases like > this soon, so +Martin > > > > > > Thanks, > > > Rodrigo. > > > > > > > > > > > /* just trace the final value */ > > > > trace_i915_reg_rw(false, ch_ctl, status, sizeof(status), true); > > > > -- > > > > 2.20.1 > > > > > > > > _______________________________________________ > > > > Intel-gfx mailing list > > > > Intel-gfx@lists.freedesktop.org > > > > https://lists.freedesktop.org/mailman/listinfo/intel-gfx > _______________________________________________ > Intel-gfx mailing list > Intel-gfx@lists.freedesktop.org > https://lists.freedesktop.org/mailman/listinfo/intel-gfx
On Fri, Mar 15, 2019 at 02:39:25PM -0700, Rodrigo Vivi wrote: > On Fri, Mar 15, 2019 at 02:31:40PM -0700, Rodrigo Vivi wrote: > > On Fri, Mar 15, 2019 at 01:27:22PM -0700, Vanshidhar Konda wrote: > > > On Fri, Mar 15, 2019 at 12:38:41PM -0700, Rodrigo Vivi wrote: > > > > On Fri, Mar 15, 2019 at 11:39:54AM -0700, Vanshidhar Konda wrote: > > > > > Extend the timeout for the hardware to signal SEND_BUSY on the DP > > > > > Aux Channel Controller register. > > > > > > > > > > This is needed to address FDO #109982 > > > > > https://bugzilla.freedesktop.org/show_bug.cgi?id=109982 > > > > > > > > instead of mentioning like this, please use: > > > > Bugzilla: https://bugzilla.freedesktop.org/show_bug.cgi?id=109982 > > > > > > > > Also instead of the "needed to address" it would be better > > > > to add some reasoning explaining that > > > > "empirically we got some bugs workarounded by increasing the timeout > > > > from 10ms to 15ms although spec was only requiring 4ms" > > > > or something like that... > > > > > > > Thanks! I'll keep these in mind for next patches. > > > > > > > > > > Cc: Ville Syrjälä <ville.syrjala@linux.intel.com> > > > > > Cc: Imre Deak <imre.deak@intel.com> > > > > > Signed-off-by: Vanshidhar Konda <vanshidhar.r.konda@intel.com> > > > > > --- > > > > > drivers/gpu/drm/i915/intel_dp.c | 4 +++- > > > > > 1 file changed, 3 insertions(+), 1 deletion(-) > > > > > > > > > > diff --git a/drivers/gpu/drm/i915/intel_dp.c b/drivers/gpu/drm/i915/intel_dp.c > > > > > index 47857f96c3b1..fd6de33c5664 100644 > > > > > --- a/drivers/gpu/drm/i915/intel_dp.c > > > > > +++ b/drivers/gpu/drm/i915/intel_dp.c > > > > > @@ -1053,6 +1053,8 @@ intel_dp_check_edp(struct intel_dp *intel_dp) > > > > > } > > > > > } > > > > > > > > > > +#define DP_AUX_CH_CTL_SIGNAL_TIMEOUT_MS 15 > > > > > > > > This define is spurious since it's in use in a single place. > > > > > > > > Also, giving the timeout a name, like this, makes it appear it came from > > > > the spec. Well, if it came from Spec it should be defined in the proper .h > > > > files. > > > > > > > > Since I don't believe this came from spec I believe we can just remove it > > > > and go for the timeout directly on the function below. > > > > > > > I'll make this change in the next patch. > > > > > + > > > > > static u32 > > > > > intel_dp_aux_wait_done(struct intel_dp *intel_dp) > > > > > { > > > > > @@ -1063,7 +1065,7 @@ intel_dp_aux_wait_done(struct intel_dp *intel_dp) > > > > > > > > > > #define C (((status = I915_READ_NOTRACE(ch_ctl)) & DP_AUX_CH_CTL_SEND_BUSY) == 0) > > > > > done = wait_event_timeout(dev_priv->gmbus_wait_queue, C, > > > > > - msecs_to_jiffies_timeout(10)); > > > > > + msecs_to_jiffies_timeout(DP_AUX_CH_CTL_SIGNAL_TIMEOUT_MS)); > > > > > > > > Is this just a guess that you are trying to check? > > > > I'm asking because I didn't see any indication that the increase > > > > really fixed the issue. > > > > > > > > So, if you are trying to just validate your approach maybe the try-bot > > > > could be used? > > > > > > > I also would have preferred to use the trybot. The error in Bugzilla 109982 is only reproduced on only shard-iclb2 machine once > > > in approximately 2 days. So the trybot is not of much help for this. > > > > > > Is there another way of testing experimental patches for bugs that are > > > not reproduced locally or on other machines? Or, if the issue is only > > > happening on one machine, should it be lowered in priority/whitelisted > > > on the specific CI machine? > > > > For now I believe the best alternative is to --subject-prefix="[CI]" > > > > But also you need to be sure that the fail rate don't fool us... > > hmm... this testcase seems to have a passrate of 77% of the runs. > So probably using the CI infrastructure here isn't the best alternative > to validate this approach in general. ops... I forgot to paste the link from these data: https://intel-gfx-ci.01.org/tree/drm-tip/shard-iclb.html (30 runs out of 45 passing) > > > > > I believe there will be better alternatives for using CI for cases like > > this soon, so +Martin > > > > > > > > > Thanks, > > > > Rodrigo. > > > > > > > > > > > > > > /* just trace the final value */ > > > > > trace_i915_reg_rw(false, ch_ctl, status, sizeof(status), true); > > > > > -- > > > > > 2.20.1 > > > > > > > > > > _______________________________________________ > > > > > Intel-gfx mailing list > > > > > Intel-gfx@lists.freedesktop.org > > > > > https://lists.freedesktop.org/mailman/listinfo/intel-gfx > > _______________________________________________ > > Intel-gfx mailing list > > Intel-gfx@lists.freedesktop.org > > https://lists.freedesktop.org/mailman/listinfo/intel-gfx > _______________________________________________ > Intel-gfx mailing list > Intel-gfx@lists.freedesktop.org > https://lists.freedesktop.org/mailman/listinfo/intel-gfx
On Fri, Mar 15, 2019 at 02:43:46PM -0700, Rodrigo Vivi wrote: >On Fri, Mar 15, 2019 at 02:39:25PM -0700, Rodrigo Vivi wrote: >> On Fri, Mar 15, 2019 at 02:31:40PM -0700, Rodrigo Vivi wrote: >> > On Fri, Mar 15, 2019 at 01:27:22PM -0700, Vanshidhar Konda wrote: >> > > On Fri, Mar 15, 2019 at 12:38:41PM -0700, Rodrigo Vivi wrote: >> > > > On Fri, Mar 15, 2019 at 11:39:54AM -0700, Vanshidhar Konda wrote: >> > > > > Extend the timeout for the hardware to signal SEND_BUSY on the DP >> > > > > Aux Channel Controller register. >> > > > > >> > > > > This is needed to address FDO #109982 >> > > > > https://bugzilla.freedesktop.org/show_bug.cgi?id=109982 >> > > > >> > > > instead of mentioning like this, please use: >> > > > Bugzilla: https://bugzilla.freedesktop.org/show_bug.cgi?id=109982 >> > > > >> > > > Also instead of the "needed to address" it would be better >> > > > to add some reasoning explaining that >> > > > "empirically we got some bugs workarounded by increasing the timeout >> > > > from 10ms to 15ms although spec was only requiring 4ms" >> > > > or something like that... >> > > > >> > > Thanks! I'll keep these in mind for next patches. >> > > > > >> > > > > Cc: Ville Syrjälä <ville.syrjala@linux.intel.com> >> > > > > Cc: Imre Deak <imre.deak@intel.com> >> > > > > Signed-off-by: Vanshidhar Konda <vanshidhar.r.konda@intel.com> >> > > > > --- >> > > > > drivers/gpu/drm/i915/intel_dp.c | 4 +++- >> > > > > 1 file changed, 3 insertions(+), 1 deletion(-) >> > > > > >> > > > > diff --git a/drivers/gpu/drm/i915/intel_dp.c b/drivers/gpu/drm/i915/intel_dp.c >> > > > > index 47857f96c3b1..fd6de33c5664 100644 >> > > > > --- a/drivers/gpu/drm/i915/intel_dp.c >> > > > > +++ b/drivers/gpu/drm/i915/intel_dp.c >> > > > > @@ -1053,6 +1053,8 @@ intel_dp_check_edp(struct intel_dp *intel_dp) >> > > > > } >> > > > > } >> > > > > >> > > > > +#define DP_AUX_CH_CTL_SIGNAL_TIMEOUT_MS 15 >> > > > >> > > > This define is spurious since it's in use in a single place. >> > > > >> > > > Also, giving the timeout a name, like this, makes it appear it came from >> > > > the spec. Well, if it came from Spec it should be defined in the proper .h >> > > > files. >> > > > >> > > > Since I don't believe this came from spec I believe we can just remove it >> > > > and go for the timeout directly on the function below. >> > > > >> > > I'll make this change in the next patch. >> > > > > + >> > > > > static u32 >> > > > > intel_dp_aux_wait_done(struct intel_dp *intel_dp) >> > > > > { >> > > > > @@ -1063,7 +1065,7 @@ intel_dp_aux_wait_done(struct intel_dp *intel_dp) >> > > > > >> > > > > #define C (((status = I915_READ_NOTRACE(ch_ctl)) & DP_AUX_CH_CTL_SEND_BUSY) == 0) >> > > > > done = wait_event_timeout(dev_priv->gmbus_wait_queue, C, >> > > > > - msecs_to_jiffies_timeout(10)); >> > > > > + msecs_to_jiffies_timeout(DP_AUX_CH_CTL_SIGNAL_TIMEOUT_MS)); >> > > > >> > > > Is this just a guess that you are trying to check? >> > > > I'm asking because I didn't see any indication that the increase >> > > > really fixed the issue. >> > > > >> > > > So, if you are trying to just validate your approach maybe the try-bot >> > > > could be used? >> > > > >> > > I also would have preferred to use the trybot. The error in Bugzilla 109982 is only reproduced on only shard-iclb2 machine once >> > > in approximately 2 days. So the trybot is not of much help for this. >> > > >> > > Is there another way of testing experimental patches for bugs that are >> > > not reproduced locally or on other machines? Or, if the issue is only >> > > happening on one machine, should it be lowered in priority/whitelisted >> > > on the specific CI machine? >> > >> > For now I believe the best alternative is to --subject-prefix="[CI]" >> > >> > But also you need to be sure that the fail rate don't fool us... >> >> hmm... this testcase seems to have a passrate of 77% of the runs. >> So probably using the CI infrastructure here isn't the best alternative >> to validate this approach in general. > >ops... I forgot to paste the link from these data: >https://intel-gfx-ci.01.org/tree/drm-tip/shard-iclb.html >(30 runs out of 45 passing) > Sorry, I don't think I understood what you are saying here. The CI/IGT test that is reported in the bug itself doesn't have an issue. It pases successfully. The issue reported by Martin is that there is an error printed out to dmesg. <3> [1128.098368] [drm:intel_dp_aux_xfer [i915]] *ERROR* dp aux hw did not signal timeout! I noticed that there is a similar failure on one other KBL system (fi-kbl-7500u) where this issue occurs regularly. But on that platform even after 5 try programming the DP Aux Channel Controller fails. So, this might be a one off hardware platform issue? >> >> > >> > I believe there will be better alternatives for using CI for cases like >> > this soon, so +Martin >> > >> > > >> > > > Thanks, >> > > > Rodrigo. >> > > > >> > > > > >> > > > > /* just trace the final value */ >> > > > > trace_i915_reg_rw(false, ch_ctl, status, sizeof(status), true); >> > > > > -- >> > > > > 2.20.1 >> > > > > >> > > > > _______________________________________________ >> > > > > Intel-gfx mailing list >> > > > > Intel-gfx@lists.freedesktop.org >> > > > > https://lists.freedesktop.org/mailman/listinfo/intel-gfx >> > _______________________________________________ >> > Intel-gfx mailing list >> > Intel-gfx@lists.freedesktop.org >> > https://lists.freedesktop.org/mailman/listinfo/intel-gfx >> _______________________________________________ >> Intel-gfx mailing list >> Intel-gfx@lists.freedesktop.org >> https://lists.freedesktop.org/mailman/listinfo/intel-gfx
On Fri, Mar 15, 2019 at 04:05:12PM -0700, Vanshidhar Konda wrote: > On Fri, Mar 15, 2019 at 02:43:46PM -0700, Rodrigo Vivi wrote: > > On Fri, Mar 15, 2019 at 02:39:25PM -0700, Rodrigo Vivi wrote: > > > On Fri, Mar 15, 2019 at 02:31:40PM -0700, Rodrigo Vivi wrote: > > > > On Fri, Mar 15, 2019 at 01:27:22PM -0700, Vanshidhar Konda wrote: > > > > > On Fri, Mar 15, 2019 at 12:38:41PM -0700, Rodrigo Vivi wrote: > > > > > > On Fri, Mar 15, 2019 at 11:39:54AM -0700, Vanshidhar Konda wrote: > > > > > > > Extend the timeout for the hardware to signal SEND_BUSY on the DP > > > > > > > Aux Channel Controller register. > > > > > > > > > > > > > > This is needed to address FDO #109982 > > > > > > > https://bugzilla.freedesktop.org/show_bug.cgi?id=109982 > > > > > > > > > > > > instead of mentioning like this, please use: > > > > > > Bugzilla: https://bugzilla.freedesktop.org/show_bug.cgi?id=109982 > > > > > > > > > > > > Also instead of the "needed to address" it would be better > > > > > > to add some reasoning explaining that > > > > > > "empirically we got some bugs workarounded by increasing the timeout > > > > > > from 10ms to 15ms although spec was only requiring 4ms" > > > > > > or something like that... > > > > > > > > > > > Thanks! I'll keep these in mind for next patches. > > > > > > > > > > > > > > Cc: Ville Syrjälä <ville.syrjala@linux.intel.com> > > > > > > > Cc: Imre Deak <imre.deak@intel.com> > > > > > > > Signed-off-by: Vanshidhar Konda <vanshidhar.r.konda@intel.com> > > > > > > > --- > > > > > > > drivers/gpu/drm/i915/intel_dp.c | 4 +++- > > > > > > > 1 file changed, 3 insertions(+), 1 deletion(-) > > > > > > > > > > > > > > diff --git a/drivers/gpu/drm/i915/intel_dp.c b/drivers/gpu/drm/i915/intel_dp.c > > > > > > > index 47857f96c3b1..fd6de33c5664 100644 > > > > > > > --- a/drivers/gpu/drm/i915/intel_dp.c > > > > > > > +++ b/drivers/gpu/drm/i915/intel_dp.c > > > > > > > @@ -1053,6 +1053,8 @@ intel_dp_check_edp(struct intel_dp *intel_dp) > > > > > > > } > > > > > > > } > > > > > > > > > > > > > > +#define DP_AUX_CH_CTL_SIGNAL_TIMEOUT_MS 15 > > > > > > > > > > > > This define is spurious since it's in use in a single place. > > > > > > > > > > > > Also, giving the timeout a name, like this, makes it appear it came from > > > > > > the spec. Well, if it came from Spec it should be defined in the proper .h > > > > > > files. > > > > > > > > > > > > Since I don't believe this came from spec I believe we can just remove it > > > > > > and go for the timeout directly on the function below. > > > > > > > > > > > I'll make this change in the next patch. > > > > > > > + > > > > > > > static u32 > > > > > > > intel_dp_aux_wait_done(struct intel_dp *intel_dp) > > > > > > > { > > > > > > > @@ -1063,7 +1065,7 @@ intel_dp_aux_wait_done(struct intel_dp *intel_dp) > > > > > > > > > > > > > > #define C (((status = I915_READ_NOTRACE(ch_ctl)) & DP_AUX_CH_CTL_SEND_BUSY) == 0) > > > > > > > done = wait_event_timeout(dev_priv->gmbus_wait_queue, C, > > > > > > > - msecs_to_jiffies_timeout(10)); > > > > > > > + msecs_to_jiffies_timeout(DP_AUX_CH_CTL_SIGNAL_TIMEOUT_MS)); > > > > > > > > > > > > Is this just a guess that you are trying to check? > > > > > > I'm asking because I didn't see any indication that the increase > > > > > > really fixed the issue. > > > > > > > > > > > > So, if you are trying to just validate your approach maybe the try-bot > > > > > > could be used? > > > > > > > > > > > I also would have preferred to use the trybot. The error in Bugzilla 109982 is only reproduced on only shard-iclb2 machine once > > > > > in approximately 2 days. So the trybot is not of much help for this. > > > > > > > > > > Is there another way of testing experimental patches for bugs that are > > > > > not reproduced locally or on other machines? Or, if the issue is only > > > > > happening on one machine, should it be lowered in priority/whitelisted > > > > > on the specific CI machine? > > > > > > > > For now I believe the best alternative is to --subject-prefix="[CI]" > > > > > > > > But also you need to be sure that the fail rate don't fool us... > > > > > > hmm... this testcase seems to have a passrate of 77% of the runs. > > > So probably using the CI infrastructure here isn't the best alternative > > > to validate this approach in general. > > > > ops... I forgot to paste the link from these data: > > https://intel-gfx-ci.01.org/tree/drm-tip/shard-iclb.html > > (30 runs out of 45 passing) > > > Sorry, I don't think I understood what you are saying here. The CI/IGT > test that is reported in the bug itself doesn't have an issue. It pases > successfully. > > The issue reported by Martin is that there is an error printed out to dmesg. > <3> [1128.098368] [drm:intel_dp_aux_xfer [i915]] *ERROR* dp aux hw did > not signal timeout! 30 runs out of 45 of this test on this ICL comes with full clean result. This mean this message of *ERROR* didn't occur. Martin can correct me if my interpretation is wrong here. But with this "high" pass rate you won't be able to prove that your timeout increase is the real fix of the problem. Unless you are able to reproduce it locally and run many times enough to validate this is reliably and constantly passing. > > I noticed that there is a similar failure on one other KBL system > (fi-kbl-7500u) where this issue occurs regularly. But on that platform > even after 5 try programming the DP Aux Channel Controller fails. > > So, this might be a one off hardware platform issue? > > > > > > > > > > > > I believe there will be better alternatives for using CI for cases like > > > > this soon, so +Martin > > > > > > > > > > > > > > > Thanks, > > > > > > Rodrigo. > > > > > > > > > > > > > > > > > > > > /* just trace the final value */ > > > > > > > trace_i915_reg_rw(false, ch_ctl, status, sizeof(status), true); > > > > > > > -- > > > > > > > 2.20.1 > > > > > > > > > > > > > > _______________________________________________ > > > > > > > Intel-gfx mailing list > > > > > > > Intel-gfx@lists.freedesktop.org > > > > > > > https://lists.freedesktop.org/mailman/listinfo/intel-gfx > > > > _______________________________________________ > > > > Intel-gfx mailing list > > > > Intel-gfx@lists.freedesktop.org > > > > https://lists.freedesktop.org/mailman/listinfo/intel-gfx > > > _______________________________________________ > > > Intel-gfx mailing list > > > Intel-gfx@lists.freedesktop.org > > > https://lists.freedesktop.org/mailman/listinfo/intel-gfx
diff --git a/drivers/gpu/drm/i915/intel_dp.c b/drivers/gpu/drm/i915/intel_dp.c index 47857f96c3b1..fd6de33c5664 100644 --- a/drivers/gpu/drm/i915/intel_dp.c +++ b/drivers/gpu/drm/i915/intel_dp.c @@ -1053,6 +1053,8 @@ intel_dp_check_edp(struct intel_dp *intel_dp) } } +#define DP_AUX_CH_CTL_SIGNAL_TIMEOUT_MS 15 + static u32 intel_dp_aux_wait_done(struct intel_dp *intel_dp) { @@ -1063,7 +1065,7 @@ intel_dp_aux_wait_done(struct intel_dp *intel_dp) #define C (((status = I915_READ_NOTRACE(ch_ctl)) & DP_AUX_CH_CTL_SEND_BUSY) == 0) done = wait_event_timeout(dev_priv->gmbus_wait_queue, C, - msecs_to_jiffies_timeout(10)); + msecs_to_jiffies_timeout(DP_AUX_CH_CTL_SIGNAL_TIMEOUT_MS)); /* just trace the final value */ trace_i915_reg_rw(false, ch_ctl, status, sizeof(status), true);
Extend the timeout for the hardware to signal SEND_BUSY on the DP Aux Channel Controller register. This is needed to address FDO #109982 https://bugzilla.freedesktop.org/show_bug.cgi?id=109982 Cc: Ville Syrjälä <ville.syrjala@linux.intel.com> Cc: Imre Deak <imre.deak@intel.com> Signed-off-by: Vanshidhar Konda <vanshidhar.r.konda@intel.com> --- drivers/gpu/drm/i915/intel_dp.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-)