Message ID | 1460049678-21918-5-git-send-email-david.s.gordon@intel.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
On 04/07/2016 10:21 AM, Dave Gordon wrote: > During a hibernate/resume cycle, the whole system is reset, including > the GuC and the doorbell hardware. Then the system is booted up, drivers > are loaded, etc -- the GuC firmware may be loaded and set running at this > point. But then, the booted kernel is replaced by the hibernated image, > and this resumed kernel will also try to reload the GuC firmware (which > will fail). To recover, we reset the GuC and try again (which should > work). But this GuC reset doesn't also reset the doorbell hardware, so > it can be left in a state inconsistent with that assumed by the driver > and the GuC. > > It would be better if the GuC reset also cleared all doorbell state, > but that's not how the hardware currently works; also, the driver cannot > directly reprogram the doorbell hardware (only the GuC can do that). > > So this patch cycles through all doorbells, assigning and releasing each > in turn, so that all the doorbell hardware is left in a consistent state, > no matter how it was programmed by the previously-running kernel and/or > GuC firmware. > > This patch can be removed if/when the GuC firmware is updated so that it > (re)initialises the doorbell hardware after every firmware (re)load. > > Signed-off-by: Dave Gordon <david.s.gordon@intel.com> > --- > drivers/gpu/drm/i915/i915_guc_submission.c | 46 +++++++++++++++++++++++++++++- > 1 file changed, 45 insertions(+), 1 deletion(-) > > diff --git a/drivers/gpu/drm/i915/i915_guc_submission.c b/drivers/gpu/drm/i915/i915_guc_submission.c > index 2fc69f1..f466eab 100644 > --- a/drivers/gpu/drm/i915/i915_guc_submission.c > +++ b/drivers/gpu/drm/i915/i915_guc_submission.c > @@ -707,6 +707,50 @@ static void guc_client_free(struct drm_device *dev, > kfree(client); > } > > +/* > + * Borrow the first client to set up & tear down every doorbell > + * in turn, to ensure that all doorbell h/w is (re)initialised. > + */ > +static void guc_init_doorbell_hw(struct intel_guc *guc) > +{ > + struct drm_i915_private *dev_priv = guc_to_i915(guc); > + struct i915_guc_client *client = guc->execbuf_client; > + struct guc_doorbell_info *doorbell; > + uint16_t db_id, i; > + void *base; > + int ret; > + > + base = kmap_atomic(i915_gem_object_get_page(client->client_obj, 0)); > + doorbell = base + client->doorbell_offset; > + db_id = client->doorbell_id; > + > + for (i = 0; i < GUC_MAX_DOORBELLS; ++i) { > + i915_reg_t drbreg = GEN8_DRBREGL(i); > + u32 value = I915_READ(drbreg); > + > + ret = guc_update_doorbell_id(client, doorbell, i); > + > + if (((value & GUC_DOORBELL_ENABLED) && (i != db_id)) || ret) > + DRM_DEBUG_DRIVER("Doorbell reg 0x%x was 0x%x, ret %d\n", > + drbreg.reg, value, ret); > + } > + > + /* Restore to original value */ > + guc_update_doorbell_id(client, doorbell, db_id); > + > + for (i = 0; i < GUC_MAX_DOORBELLS; ++i) { > + i915_reg_t drbreg = GEN8_DRBREGL(i); > + u32 value = I915_READ(drbreg); > + > + if ((value & GUC_DOORBELL_ENABLED) && (i != db_id)) > + DRM_DEBUG_DRIVER("Doorbell reg 0x%x finally 0x%x\n", > + drbreg.reg, value); > + > + } > + The for loop above is not needed. It can be merged into previous loop by print out new drbreg value (read it again after update_doorbell_id). At this point, only need to check if db_id is correctly enabled or not by print out I915_READ(GEN8_DRBREGL(db_id)). Alex > + kunmap_atomic(base); > +} > + > /** > * guc_client_alloc() - Allocate an i915_guc_client > * @dev: drm device > @@ -971,8 +1015,8 @@ int i915_guc_submission_enable(struct drm_device *dev) > } > > guc->execbuf_client = client; > - > host2guc_sample_forcewake(guc, client); > + guc_init_doorbell_hw(guc); > > return 0; > }
On 13/04/16 18:50, Yu Dai wrote: > > > On 04/07/2016 10:21 AM, Dave Gordon wrote: >> During a hibernate/resume cycle, the whole system is reset, including >> the GuC and the doorbell hardware. Then the system is booted up, drivers >> are loaded, etc -- the GuC firmware may be loaded and set running at this >> point. But then, the booted kernel is replaced by the hibernated image, >> and this resumed kernel will also try to reload the GuC firmware (which >> will fail). To recover, we reset the GuC and try again (which should >> work). But this GuC reset doesn't also reset the doorbell hardware, so >> it can be left in a state inconsistent with that assumed by the driver >> and the GuC. >> >> It would be better if the GuC reset also cleared all doorbell state, >> but that's not how the hardware currently works; also, the driver cannot >> directly reprogram the doorbell hardware (only the GuC can do that). >> >> So this patch cycles through all doorbells, assigning and releasing each >> in turn, so that all the doorbell hardware is left in a consistent state, >> no matter how it was programmed by the previously-running kernel and/or >> GuC firmware. >> >> This patch can be removed if/when the GuC firmware is updated so that it >> (re)initialises the doorbell hardware after every firmware (re)load. >> >> Signed-off-by: Dave Gordon <david.s.gordon@intel.com> >> --- >> drivers/gpu/drm/i915/i915_guc_submission.c | 46 >> +++++++++++++++++++++++++++++- >> 1 file changed, 45 insertions(+), 1 deletion(-) >> >> diff --git a/drivers/gpu/drm/i915/i915_guc_submission.c >> b/drivers/gpu/drm/i915/i915_guc_submission.c >> index 2fc69f1..f466eab 100644 >> --- a/drivers/gpu/drm/i915/i915_guc_submission.c >> +++ b/drivers/gpu/drm/i915/i915_guc_submission.c >> @@ -707,6 +707,50 @@ static void guc_client_free(struct drm_device *dev, >> kfree(client); >> } >> +/* >> + * Borrow the first client to set up & tear down every doorbell >> + * in turn, to ensure that all doorbell h/w is (re)initialised. >> + */ >> +static void guc_init_doorbell_hw(struct intel_guc *guc) >> +{ >> + struct drm_i915_private *dev_priv = guc_to_i915(guc); >> + struct i915_guc_client *client = guc->execbuf_client; >> + struct guc_doorbell_info *doorbell; >> + uint16_t db_id, i; >> + void *base; >> + int ret; >> + >> + base = kmap_atomic(i915_gem_object_get_page(client->client_obj, 0)); >> + doorbell = base + client->doorbell_offset; >> + db_id = client->doorbell_id; >> + >> + for (i = 0; i < GUC_MAX_DOORBELLS; ++i) { >> + i915_reg_t drbreg = GEN8_DRBREGL(i); >> + u32 value = I915_READ(drbreg); >> + >> + ret = guc_update_doorbell_id(client, doorbell, i); >> + >> + if (((value & GUC_DOORBELL_ENABLED) && (i != db_id)) || ret) >> + DRM_DEBUG_DRIVER("Doorbell reg 0x%x was 0x%x, ret %d\n", >> + drbreg.reg, value, ret); >> + } >> + >> + /* Restore to original value */ >> + guc_update_doorbell_id(client, doorbell, db_id); >> + >> + for (i = 0; i < GUC_MAX_DOORBELLS; ++i) { >> + i915_reg_t drbreg = GEN8_DRBREGL(i); >> + u32 value = I915_READ(drbreg); >> + >> + if ((value & GUC_DOORBELL_ENABLED) && (i != db_id)) >> + DRM_DEBUG_DRIVER("Doorbell reg 0x%x finally 0x%x\n", >> + drbreg.reg, value); >> + >> + } >> + > > The for loop above is not needed. It can be merged into previous loop by > print out new drbreg value (read it again after update_doorbell_id). > > At this point, only need to check if db_id is correctly enabled or not > by print out I915_READ(GEN8_DRBREGL(db_id)). > > Alex No, the idea is not to check that the GuC call has *enabled* each selected doorbell, but to check that after the end of the first loop, and the subsequent restore, all *other* doorbells have been *disabled*. We're only *selecting* each doorbell so that we can then *deselect* it as a side effect of selecting the next one! Hence separate loop required ... .Dave. >> + kunmap_atomic(base); >> +} >> + >> /** >> * guc_client_alloc() - Allocate an i915_guc_client >> * @dev: drm device >> @@ -971,8 +1015,8 @@ int i915_guc_submission_enable(struct drm_device >> *dev) >> } >> guc->execbuf_client = client; >> - >> host2guc_sample_forcewake(guc, client); >> + guc_init_doorbell_hw(guc); >> return 0; >> } >
On 04/13/2016 12:46 PM, Dave Gordon wrote: > On 13/04/16 18:50, Yu Dai wrote: > > > > > > On 04/07/2016 10:21 AM, Dave Gordon wrote: > >> During a hibernate/resume cycle, the whole system is reset, including > >> the GuC and the doorbell hardware. Then the system is booted up, drivers > >> are loaded, etc -- the GuC firmware may be loaded and set running at this > >> point. But then, the booted kernel is replaced by the hibernated image, > >> and this resumed kernel will also try to reload the GuC firmware (which > >> will fail). To recover, we reset the GuC and try again (which should > >> work). But this GuC reset doesn't also reset the doorbell hardware, so > >> it can be left in a state inconsistent with that assumed by the driver > >> and the GuC. > >> > >> It would be better if the GuC reset also cleared all doorbell state, > >> but that's not how the hardware currently works; also, the driver cannot > >> directly reprogram the doorbell hardware (only the GuC can do that). > >> > >> So this patch cycles through all doorbells, assigning and releasing each > >> in turn, so that all the doorbell hardware is left in a consistent state, > >> no matter how it was programmed by the previously-running kernel and/or > >> GuC firmware. > >> > >> This patch can be removed if/when the GuC firmware is updated so that it > >> (re)initialises the doorbell hardware after every firmware (re)load. > >> > >> Signed-off-by: Dave Gordon <david.s.gordon@intel.com> > >> --- > >> drivers/gpu/drm/i915/i915_guc_submission.c | 46 > >> +++++++++++++++++++++++++++++- > >> 1 file changed, 45 insertions(+), 1 deletion(-) > >> > >> diff --git a/drivers/gpu/drm/i915/i915_guc_submission.c > >> b/drivers/gpu/drm/i915/i915_guc_submission.c > >> index 2fc69f1..f466eab 100644 > >> --- a/drivers/gpu/drm/i915/i915_guc_submission.c > >> +++ b/drivers/gpu/drm/i915/i915_guc_submission.c > >> @@ -707,6 +707,50 @@ static void guc_client_free(struct drm_device *dev, > >> kfree(client); > >> } > >> +/* > >> + * Borrow the first client to set up & tear down every doorbell > >> + * in turn, to ensure that all doorbell h/w is (re)initialised. > >> + */ > >> +static void guc_init_doorbell_hw(struct intel_guc *guc) > >> +{ > >> + struct drm_i915_private *dev_priv = guc_to_i915(guc); > >> + struct i915_guc_client *client = guc->execbuf_client; > >> + struct guc_doorbell_info *doorbell; > >> + uint16_t db_id, i; > >> + void *base; > >> + int ret; > >> + > >> + base = kmap_atomic(i915_gem_object_get_page(client->client_obj, 0)); > >> + doorbell = base + client->doorbell_offset; > >> + db_id = client->doorbell_id; > >> + > >> + for (i = 0; i < GUC_MAX_DOORBELLS; ++i) { > >> + i915_reg_t drbreg = GEN8_DRBREGL(i); > >> + u32 value = I915_READ(drbreg); > >> + > >> + ret = guc_update_doorbell_id(client, doorbell, i); > >> + > >> + if (((value & GUC_DOORBELL_ENABLED) && (i != db_id)) || ret) > >> + DRM_DEBUG_DRIVER("Doorbell reg 0x%x was 0x%x, ret %d\n", > >> + drbreg.reg, value, ret); > >> + } > >> + > >> + /* Restore to original value */ > >> + guc_update_doorbell_id(client, doorbell, db_id); > >> + > >> + for (i = 0; i < GUC_MAX_DOORBELLS; ++i) { > >> + i915_reg_t drbreg = GEN8_DRBREGL(i); > >> + u32 value = I915_READ(drbreg); > >> + > >> + if ((value & GUC_DOORBELL_ENABLED) && (i != db_id)) > >> + DRM_DEBUG_DRIVER("Doorbell reg 0x%x finally 0x%x\n", > >> + drbreg.reg, value); > >> + > >> + } > >> + > > > > The for loop above is not needed. It can be merged into previous loop by > > print out new drbreg value (read it again after update_doorbell_id). > > > > At this point, only need to check if db_id is correctly enabled or not > > by print out I915_READ(GEN8_DRBREGL(db_id)). > > > > Alex > > No, the idea is not to check that the GuC call has *enabled* each > selected doorbell, but to check that after the end of the first loop, > and the subsequent restore, all *other* doorbells have been *disabled*. > We're only *selecting* each doorbell so that we can then *deselect* it > as a side effect of selecting the next one! > > Hence separate loop required ... > > .Dave. This still can be done by backup of previous client->doorbell_id. If it is not same as the desired db_id, then make sure it is *disabled* after the update. The real problem here, at least not for now, is that it assumes there is only one guc_client. In future, if there is user created guc_client, the code doesn't restore doorbell for it. Alex > >> + kunmap_atomic(base); > >> +} > >> + > >> /** > >> * guc_client_alloc() - Allocate an i915_guc_client > >> * @dev: drm device > >> @@ -971,8 +1015,8 @@ int i915_guc_submission_enable(struct drm_device > >> *dev) > >> } > >> guc->execbuf_client = client; > >> - > >> host2guc_sample_forcewake(guc, client); > >> + guc_init_doorbell_hw(guc); > >> return 0; > >> } > > >
On 13/04/16 21:13, Yu Dai wrote: > On 04/13/2016 12:46 PM, Dave Gordon wrote: >> On 13/04/16 18:50, Yu Dai wrote: >> > >> > On 04/07/2016 10:21 AM, Dave Gordon wrote: >> >> During a hibernate/resume cycle, the whole system is reset, including >> >> the GuC and the doorbell hardware. Then the system is booted up, >> drivers >> >> are loaded, etc -- the GuC firmware may be loaded and set running >> at this >> >> point. But then, the booted kernel is replaced by the hibernated >> image, >> >> and this resumed kernel will also try to reload the GuC firmware >> (which >> >> will fail). To recover, we reset the GuC and try again (which should >> >> work). But this GuC reset doesn't also reset the doorbell hardware, so >> >> it can be left in a state inconsistent with that assumed by the driver >> >> and the GuC. >> >> >> >> It would be better if the GuC reset also cleared all doorbell state, >> >> but that's not how the hardware currently works; also, the driver >> cannot >> >> directly reprogram the doorbell hardware (only the GuC can do that). >> >> >> >> So this patch cycles through all doorbells, assigning and releasing >> each >> >> in turn, so that all the doorbell hardware is left in a consistent >> state, >> >> no matter how it was programmed by the previously-running kernel >> and/or >> >> GuC firmware. >> >> >> >> This patch can be removed if/when the GuC firmware is updated so >> that it >> >> (re)initialises the doorbell hardware after every firmware (re)load. >> >> >> >> Signed-off-by: Dave Gordon <david.s.gordon@intel.com> >> >> --- >> >> drivers/gpu/drm/i915/i915_guc_submission.c | 46 >> >> +++++++++++++++++++++++++++++- >> >> 1 file changed, 45 insertions(+), 1 deletion(-) >> >> >> >> diff --git a/drivers/gpu/drm/i915/i915_guc_submission.c >> >> b/drivers/gpu/drm/i915/i915_guc_submission.c >> >> index 2fc69f1..f466eab 100644 >> >> --- a/drivers/gpu/drm/i915/i915_guc_submission.c >> >> +++ b/drivers/gpu/drm/i915/i915_guc_submission.c >> >> @@ -707,6 +707,50 @@ static void guc_client_free(struct drm_device >> *dev, >> >> kfree(client); >> >> } >> >> +/* >> >> + * Borrow the first client to set up & tear down every doorbell >> >> + * in turn, to ensure that all doorbell h/w is (re)initialised. >> >> + */ >> >> +static void guc_init_doorbell_hw(struct intel_guc *guc) >> >> +{ >> >> + struct drm_i915_private *dev_priv = guc_to_i915(guc); >> >> + struct i915_guc_client *client = guc->execbuf_client; >> >> + struct guc_doorbell_info *doorbell; >> >> + uint16_t db_id, i; >> >> + void *base; >> >> + int ret; >> >> + >> >> + base = >> kmap_atomic(i915_gem_object_get_page(client->client_obj, 0)); >> >> + doorbell = base + client->doorbell_offset; >> >> + db_id = client->doorbell_id; >> >> + >> >> + for (i = 0; i < GUC_MAX_DOORBELLS; ++i) { >> >> + i915_reg_t drbreg = GEN8_DRBREGL(i); >> >> + u32 value = I915_READ(drbreg); >> >> + >> >> + ret = guc_update_doorbell_id(client, doorbell, i); >> >> + >> >> + if (((value & GUC_DOORBELL_ENABLED) && (i != db_id)) || ret) >> >> + DRM_DEBUG_DRIVER("Doorbell reg 0x%x was 0x%x, ret %d\n", >> >> + drbreg.reg, value, ret); >> >> + } >> >> + >> >> + /* Restore to original value */ >> >> + guc_update_doorbell_id(client, doorbell, db_id); >> >> + >> >> + for (i = 0; i < GUC_MAX_DOORBELLS; ++i) { >> >> + i915_reg_t drbreg = GEN8_DRBREGL(i); >> >> + u32 value = I915_READ(drbreg); >> >> + >> >> + if ((value & GUC_DOORBELL_ENABLED) && (i != db_id)) >> >> + DRM_DEBUG_DRIVER("Doorbell reg 0x%x finally 0x%x\n", >> >> + drbreg.reg, value); >> >> + >> >> + } >> >> + >> > >> > The for loop above is not needed. It can be merged into previous >> loop by >> > print out new drbreg value (read it again after update_doorbell_id). >> > >> > At this point, only need to check if db_id is correctly enabled or not >> > by print out I915_READ(GEN8_DRBREGL(db_id)). >> > >> > Alex >> >> No, the idea is not to check that the GuC call has *enabled* each >> selected doorbell, but to check that after the end of the first loop, >> and the subsequent restore, all *other* doorbells have been *disabled*. >> We're only *selecting* each doorbell so that we can then *deselect* it >> as a side effect of selecting the next one! >> >> Hence separate loop required ... >> >> .Dave. > > This still can be done by backup of previous client->doorbell_id. If it > is not same as the desired db_id, then make sure it is *disabled* after > the update. No, I also want to check that there's no crosstalk, so that after having swept the entire space, there's exactly ONE doorbell enabled, and it's the one we most recently associated with the client. > The real problem here, at least not for now, is that it assumes there is > only one guc_client. In future, if there is user created guc_client, the > code doesn't restore doorbell for it. > > Alex By definition there is only one (newly-created) client at the point where this is called. I'd have done this first, but we need a client for talking to the GuC! Any additional clients (e.g. for preemption) will be created afterwards. This entire function wouldn't be necessary if the GuC would guarantee to clear all doorbells at startup, or if the hardware provide a RESET signal for the doorbell block :( BTW the persistent-kmap patch should be merged today, so I'll have to update the above to get rid of the kmap_atomic() above. .Dave. >> >> + kunmap_atomic(base); >> >> +} >> >> + >> >> /** >> >> * guc_client_alloc() - Allocate an i915_guc_client >> >> * @dev: drm device >> >> @@ -971,8 +1015,8 @@ int i915_guc_submission_enable(struct drm_device >> >> *dev) >> >> } >> >> guc->execbuf_client = client; >> >> - >> >> host2guc_sample_forcewake(guc, client); >> >> + guc_init_doorbell_hw(guc); >> >> return 0; >> >> }
diff --git a/drivers/gpu/drm/i915/i915_guc_submission.c b/drivers/gpu/drm/i915/i915_guc_submission.c index 2fc69f1..f466eab 100644 --- a/drivers/gpu/drm/i915/i915_guc_submission.c +++ b/drivers/gpu/drm/i915/i915_guc_submission.c @@ -707,6 +707,50 @@ static void guc_client_free(struct drm_device *dev, kfree(client); } +/* + * Borrow the first client to set up & tear down every doorbell + * in turn, to ensure that all doorbell h/w is (re)initialised. + */ +static void guc_init_doorbell_hw(struct intel_guc *guc) +{ + struct drm_i915_private *dev_priv = guc_to_i915(guc); + struct i915_guc_client *client = guc->execbuf_client; + struct guc_doorbell_info *doorbell; + uint16_t db_id, i; + void *base; + int ret; + + base = kmap_atomic(i915_gem_object_get_page(client->client_obj, 0)); + doorbell = base + client->doorbell_offset; + db_id = client->doorbell_id; + + for (i = 0; i < GUC_MAX_DOORBELLS; ++i) { + i915_reg_t drbreg = GEN8_DRBREGL(i); + u32 value = I915_READ(drbreg); + + ret = guc_update_doorbell_id(client, doorbell, i); + + if (((value & GUC_DOORBELL_ENABLED) && (i != db_id)) || ret) + DRM_DEBUG_DRIVER("Doorbell reg 0x%x was 0x%x, ret %d\n", + drbreg.reg, value, ret); + } + + /* Restore to original value */ + guc_update_doorbell_id(client, doorbell, db_id); + + for (i = 0; i < GUC_MAX_DOORBELLS; ++i) { + i915_reg_t drbreg = GEN8_DRBREGL(i); + u32 value = I915_READ(drbreg); + + if ((value & GUC_DOORBELL_ENABLED) && (i != db_id)) + DRM_DEBUG_DRIVER("Doorbell reg 0x%x finally 0x%x\n", + drbreg.reg, value); + + } + + kunmap_atomic(base); +} + /** * guc_client_alloc() - Allocate an i915_guc_client * @dev: drm device @@ -971,8 +1015,8 @@ int i915_guc_submission_enable(struct drm_device *dev) } guc->execbuf_client = client; - host2guc_sample_forcewake(guc, client); + guc_init_doorbell_hw(guc); return 0; }
During a hibernate/resume cycle, the whole system is reset, including the GuC and the doorbell hardware. Then the system is booted up, drivers are loaded, etc -- the GuC firmware may be loaded and set running at this point. But then, the booted kernel is replaced by the hibernated image, and this resumed kernel will also try to reload the GuC firmware (which will fail). To recover, we reset the GuC and try again (which should work). But this GuC reset doesn't also reset the doorbell hardware, so it can be left in a state inconsistent with that assumed by the driver and the GuC. It would be better if the GuC reset also cleared all doorbell state, but that's not how the hardware currently works; also, the driver cannot directly reprogram the doorbell hardware (only the GuC can do that). So this patch cycles through all doorbells, assigning and releasing each in turn, so that all the doorbell hardware is left in a consistent state, no matter how it was programmed by the previously-running kernel and/or GuC firmware. This patch can be removed if/when the GuC firmware is updated so that it (re)initialises the doorbell hardware after every firmware (re)load. Signed-off-by: Dave Gordon <david.s.gordon@intel.com> --- drivers/gpu/drm/i915/i915_guc_submission.c | 46 +++++++++++++++++++++++++++++- 1 file changed, 45 insertions(+), 1 deletion(-)