Message ID | 20181002215430.15049-1-daniele.ceraolospurio@intel.com (mailing list archive) |
---|---|
State | New, archived |
Series | [v2,1/3] drm/i915/guc: init GuC descriptors after GuC load |
On 02/10/18 15:39, Patchwork wrote:
> == Series Details ==
>
> Series: series starting with [v2,1/3] drm/i915/guc: init GuC descriptors after GuC load
> URL : https://patchwork.freedesktop.org/series/50464/
> State : failure
>
> == Summary ==
>
> = CI Bug Log - changes from CI_DRM_4915 -> Patchwork_10331 =
>
> == Summary - FAILURE ==
>
> Serious unknown changes coming with Patchwork_10331 absolutely need to be
> verified manually.
>
> If you think the reported changes have nothing to do with the changes
> introduced in Patchwork_10331, please notify your bug team to allow them
> to document this new failure mode, which will reduce false positives in CI.
>
> External URL: https://patchwork.freedesktop.org/api/1.0/series/50464/revisions/1/mbox/
>
> == Possible new issues ==
>
> Here are the unknown changes that may have been introduced in Patchwork_10331:
>
> === IGT changes ===
>
> ==== Possible regressions ====
>
> igt@drv_selftest@live_gem:
>   fi-whl-u: PASS -> INCOMPLETE
>   fi-skl-6600u: PASS -> INCOMPLETE
>   fi-kbl-7560u: PASS -> INCOMPLETE
>   fi-cfl-s3: PASS -> INCOMPLETE
>   fi-skl-iommu: PASS -> INCOMPLETE
>   fi-skl-6700k2: PASS -> INCOMPLETE
>   fi-skl-6700hq: PASS -> INCOMPLETE
>   fi-cfl-8109u: PASS -> INCOMPLETE
>   fi-kbl-7500u: PASS -> INCOMPLETE
>   fi-cfl-8700k: PASS -> INCOMPLETE
>   fi-skl-6770hq: PASS -> INCOMPLETE
>   fi-kbl-7567u: PASS -> INCOMPLETE
>   fi-kbl-x1275: PASS -> INCOMPLETE
>   fi-kbl-8809g: PASS -> INCOMPLETE
>   fi-kbl-r: PASS -> INCOMPLETE
>

Those failures are there even without my patches (see
https://patchwork.freedesktop.org/series/40112/). Is there an existing
bugzilla? In the meantime, I'll have a look to see if I can find what's
causing this.

Daniele

>
> ==== Warnings ====
>
> igt@drv_selftest@live_guc:
>   fi-glk-j4005: SKIP -> PASS
>
>
> == Known issues ==
>
> Here are the changes found in Patchwork_10331 that come from known issues:
>
> === IGT changes ===
>
> ==== Issues hit ====
>
> igt@drv_module_reload@basic-reload:
>   fi-glk-j4005: PASS -> DMESG-WARN (fdo#106248, fdo#106725)
>
> igt@drv_selftest@live_gem:
>   fi-bxt-dsi: PASS -> INCOMPLETE (fdo#103927)
>   fi-skl-gvtdvm: PASS -> INCOMPLETE (fdo#105600)
>   fi-bxt-j4205: PASS -> INCOMPLETE (fdo#103927)
>
> igt@kms_flip@basic-flip-vs-wf_vblank:
>   fi-glk-j4005: PASS -> FAIL (fdo#103928)
>
>
> ==== Possible fixes ====
>
> igt@drv_selftest@live_coherency:
>   fi-gdg-551: DMESG-FAIL (fdo#107164) -> PASS
>
> igt@drv_selftest@live_execlists:
>   fi-glk-j4005: INCOMPLETE (k.org#198133, fdo#103359) -> PASS
>
> igt@kms_pipe_crc_basic@read-crc-pipe-a-frame-sequence:
>   fi-byt-clapper: FAIL (fdo#107362, fdo#103191) -> PASS
>
> igt@kms_pipe_crc_basic@suspend-read-crc-pipe-b:
>   fi-skl-guc: FAIL (fdo#103191) -> PASS
>
>
> fdo#103191 https://bugs.freedesktop.org/show_bug.cgi?id=103191
> fdo#103359 https://bugs.freedesktop.org/show_bug.cgi?id=103359
> fdo#103927 https://bugs.freedesktop.org/show_bug.cgi?id=103927
> fdo#103928 https://bugs.freedesktop.org/show_bug.cgi?id=103928
> fdo#105600 https://bugs.freedesktop.org/show_bug.cgi?id=105600
> fdo#106248 https://bugs.freedesktop.org/show_bug.cgi?id=106248
> fdo#106725 https://bugs.freedesktop.org/show_bug.cgi?id=106725
> fdo#107164 https://bugs.freedesktop.org/show_bug.cgi?id=107164
> fdo#107362 https://bugs.freedesktop.org/show_bug.cgi?id=107362
> k.org#198133 https://bugzilla.kernel.org/show_bug.cgi?id=198133
>
>
> == Participating hosts (46 -> 42) ==
>
> Missing (4): fi-bsw-cyan fi-byt-squawks fi-icl-u2 fi-skl-6260u
>
>
> == Build changes ==
>
> * Linux: CI_DRM_4915 -> Patchwork_10331
>
> CI_DRM_4915: 26e7a7d954a9c28b97af8ca7813f430fd9117232 @ git://anongit.freedesktop.org/gfx-ci/linux
> IGT_4660: d0975646c50568e66e65b44b81d28232d059b94e @ git://anongit.freedesktop.org/xorg/app/intel-gpu-tools
> Patchwork_10331: 5007c8c7c5a8b91022f2bf31c3ae02d941d40e04 @ git://anongit.freedesktop.org/gfx-ci/linux
>
>
> == Linux commits ==
>
> 5007c8c7c5a8 HAX enable GuC for CI
> 82f033d0174e drm/i915/guc: Don't clear the cookie on doorbell destroy
> 9830698b9d34 drm/i915/guc: init GuC descriptors after GuC load
>
> == Logs ==
>
> For more details see: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_10331/issues.html
>
On Tue, 02 Oct 2018 23:54:28 +0200, Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> wrote:

> GuC stores some data in there, which might be stale after a reset.
> We already reset the WQ head and tail, but more things are being moved
> to the descriptor with the interface updates. Instead of trying to track
> them one by one, always memset and init the descriptors from scratch
> after GuC is loaded.
> The code is also reorganized so that the above operations and the
> doorbell creation are grouped as "client enabling"
>
> v2: add proc_desc_fini for symmetry (Daniele), remove unneeded var init,
> add guc_is_alive() (Michal)
>
> Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
> Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
> ---

Reviewed-by: Michal Wajdeczko <michal.wajdeczko@intel.com>

Michal
Quoting Daniele Ceraolo Spurio (2018-10-03 01:12:57)
>
>
> On 02/10/18 15:39, Patchwork wrote:
> > == Series Details ==
> >
> > Series: series starting with [v2,1/3] drm/i915/guc: init GuC descriptors after GuC load
> > URL : https://patchwork.freedesktop.org/series/50464/
> > State : failure
> >
> > == Summary ==
> >
> > = CI Bug Log - changes from CI_DRM_4915 -> Patchwork_10331 =
> >
> > == Summary - FAILURE ==
> >
> > Serious unknown changes coming with Patchwork_10331 absolutely need to be
> > verified manually.
> >
> > If you think the reported changes have nothing to do with the changes
> > introduced in Patchwork_10331, please notify your bug team to allow them
> > to document this new failure mode, which will reduce false positives in CI.
> >
> > External URL: https://patchwork.freedesktop.org/api/1.0/series/50464/revisions/1/mbox/
> >
> > == Possible new issues ==
> >
> > Here are the unknown changes that may have been introduced in Patchwork_10331:
> >
> > === IGT changes ===
> >
> > ==== Possible regressions ====
> >
> > igt@drv_selftest@live_gem:
> >   fi-whl-u: PASS -> INCOMPLETE
> >   fi-skl-6600u: PASS -> INCOMPLETE
> >   fi-kbl-7560u: PASS -> INCOMPLETE
> >   fi-cfl-s3: PASS -> INCOMPLETE
> >   fi-skl-iommu: PASS -> INCOMPLETE
> >   fi-skl-6700k2: PASS -> INCOMPLETE
> >   fi-skl-6700hq: PASS -> INCOMPLETE
> >   fi-cfl-8109u: PASS -> INCOMPLETE
> >   fi-kbl-7500u: PASS -> INCOMPLETE
> >   fi-cfl-8700k: PASS -> INCOMPLETE
> >   fi-skl-6770hq: PASS -> INCOMPLETE
> >   fi-kbl-7567u: PASS -> INCOMPLETE
> >   fi-kbl-x1275: PASS -> INCOMPLETE
> >   fi-kbl-8809g: PASS -> INCOMPLETE
> >   fi-kbl-r: PASS -> INCOMPLETE
> >
>
> Those failures are there even without my patches (see
> https://patchwork.freedesktop.org/series/40112/). Is there an existing
> bugzilla? In the meantime, I'll have a look to see if I can find what's
> causing this.

inject_preempt_context() fails when talking to the guc, catastrophe
ensues. As shown above it's quite reliable after a fake suspend/resume,
but it also happens during normal preemption (the preemption smoketest
was added to exercise this issue).
-Chris
On 03/10/18 08:24, Chris Wilson wrote:
> Quoting Daniele Ceraolo Spurio (2018-10-03 01:12:57)
>>
>>
>> On 02/10/18 15:39, Patchwork wrote:
>>> == Series Details ==
>>>
>>> Series: series starting with [v2,1/3] drm/i915/guc: init GuC descriptors after GuC load
>>> URL : https://patchwork.freedesktop.org/series/50464/
>>> State : failure
>>>
>>> == Summary ==
>>>
>>> = CI Bug Log - changes from CI_DRM_4915 -> Patchwork_10331 =
>>>
>>> == Summary - FAILURE ==
>>>
>>> Serious unknown changes coming with Patchwork_10331 absolutely need to be
>>> verified manually.
>>>
>>> If you think the reported changes have nothing to do with the changes
>>> introduced in Patchwork_10331, please notify your bug team to allow them
>>> to document this new failure mode, which will reduce false positives in CI.
>>>
>>> External URL: https://patchwork.freedesktop.org/api/1.0/series/50464/revisions/1/mbox/
>>>
>>> == Possible new issues ==
>>>
>>> Here are the unknown changes that may have been introduced in Patchwork_10331:
>>>
>>> === IGT changes ===
>>>
>>> ==== Possible regressions ====
>>>
>>> igt@drv_selftest@live_gem:
>>>   fi-whl-u: PASS -> INCOMPLETE
>>>   fi-skl-6600u: PASS -> INCOMPLETE
>>>   fi-kbl-7560u: PASS -> INCOMPLETE
>>>   fi-cfl-s3: PASS -> INCOMPLETE
>>>   fi-skl-iommu: PASS -> INCOMPLETE
>>>   fi-skl-6700k2: PASS -> INCOMPLETE
>>>   fi-skl-6700hq: PASS -> INCOMPLETE
>>>   fi-cfl-8109u: PASS -> INCOMPLETE
>>>   fi-kbl-7500u: PASS -> INCOMPLETE
>>>   fi-cfl-8700k: PASS -> INCOMPLETE
>>>   fi-skl-6770hq: PASS -> INCOMPLETE
>>>   fi-kbl-7567u: PASS -> INCOMPLETE
>>>   fi-kbl-x1275: PASS -> INCOMPLETE
>>>   fi-kbl-8809g: PASS -> INCOMPLETE
>>>   fi-kbl-r: PASS -> INCOMPLETE
>>>
>>
>> Those failures are there even without my patches (see
>> https://patchwork.freedesktop.org/series/40112/). Is there an existing
>> bugzilla? In the meantime, I'll have a look to see if I can find what's
>> causing this.
>
> inject_preempt_context() fails when talking to the guc, catastrophe
> ensues. As shown above it's quite reliable after a fake suspend/resume,
> but it also happens during normal preemption (the preemption smoketest
> was added to exercise this issue).
> -Chris
>

Do you consider this a blocker to getting the patches merged?

BTW, on my SKL even with the preemption smoketest I didn't see any issue
on the tree I based the patches on (from Monday) and I only see issues
after:

b16c765122f987056e1dc9ef6c214571bb5bd694 is the first bad commit
commit b16c765122f987056e1dc9ef6c214571bb5bd694
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date: Mon Oct 1 15:47:53 2018 +0100

    drm/i915: Priority boost for new clients

However, I don't get any error logs out (the machine just dies), so I'm
not sure if it is the same issue or not. With that patch and the 2 following
related ones reverted I've been running the live selftests in a loop
without issues. Is this the bug you mentioned or are those possibly 2
different issues?

Thanks,
Daniele
Quoting Daniele Ceraolo Spurio (2018-10-03 23:45:02)
>
>
> On 03/10/18 08:24, Chris Wilson wrote:
> > Quoting Daniele Ceraolo Spurio (2018-10-03 01:12:57)
> >>
> >>
> >> On 02/10/18 15:39, Patchwork wrote:
> >>> == Series Details ==
> >>>
> >>> Series: series starting with [v2,1/3] drm/i915/guc: init GuC descriptors after GuC load
> >>> URL : https://patchwork.freedesktop.org/series/50464/
> >>> State : failure
> >>>
> >>> == Summary ==
> >>>
> >>> = CI Bug Log - changes from CI_DRM_4915 -> Patchwork_10331 =
> >>>
> >>> == Summary - FAILURE ==
> >>>
> >>> Serious unknown changes coming with Patchwork_10331 absolutely need to be
> >>> verified manually.
> >>>
> >>> If you think the reported changes have nothing to do with the changes
> >>> introduced in Patchwork_10331, please notify your bug team to allow them
> >>> to document this new failure mode, which will reduce false positives in CI.
> >>>
> >>> External URL: https://patchwork.freedesktop.org/api/1.0/series/50464/revisions/1/mbox/
> >>>
> >>> == Possible new issues ==
> >>>
> >>> Here are the unknown changes that may have been introduced in Patchwork_10331:
> >>>
> >>> === IGT changes ===
> >>>
> >>> ==== Possible regressions ====
> >>>
> >>> igt@drv_selftest@live_gem:
> >>>   fi-whl-u: PASS -> INCOMPLETE
> >>>   fi-skl-6600u: PASS -> INCOMPLETE
> >>>   fi-kbl-7560u: PASS -> INCOMPLETE
> >>>   fi-cfl-s3: PASS -> INCOMPLETE
> >>>   fi-skl-iommu: PASS -> INCOMPLETE
> >>>   fi-skl-6700k2: PASS -> INCOMPLETE
> >>>   fi-skl-6700hq: PASS -> INCOMPLETE
> >>>   fi-cfl-8109u: PASS -> INCOMPLETE
> >>>   fi-kbl-7500u: PASS -> INCOMPLETE
> >>>   fi-cfl-8700k: PASS -> INCOMPLETE
> >>>   fi-skl-6770hq: PASS -> INCOMPLETE
> >>>   fi-kbl-7567u: PASS -> INCOMPLETE
> >>>   fi-kbl-x1275: PASS -> INCOMPLETE
> >>>   fi-kbl-8809g: PASS -> INCOMPLETE
> >>>   fi-kbl-r: PASS -> INCOMPLETE
> >>>
> >>
> >> Those failures are there even without my patches (see
> >> https://patchwork.freedesktop.org/series/40112/). Is there an existing
> >> bugzilla? In the meantime, I'll have a look to see if I can find what's
> >> causing this.
> >
> > inject_preempt_context() fails when talking to the guc, catastrophe
> > ensues. As shown above it's quite reliable after a fake suspend/resume,
> > but it also happens during normal preemption (the preemption smoketest
> > was added to exercise this issue).
> > -Chris
> >
>
> Do you consider this a blocker to getting the patches merged?
>
> BTW, on my SKL even with the preemption smoketest I didn't see any issue
> on the tree I based the patches on (from Monday) and I only see issues
> after:
>
> b16c765122f987056e1dc9ef6c214571bb5bd694 is the first bad commit
> commit b16c765122f987056e1dc9ef6c214571bb5bd694
> Author: Chris Wilson <chris@chris-wilson.co.uk>
> Date: Mon Oct 1 15:47:53 2018 +0100
>
>     drm/i915: Priority boost for new clients
>
> However I don't get any error logs out (the machine just dies) so not
> sure if it is the same issue or not. with that patch and the 2 following
> related ones reverted I've been running the live selftests in a loop
> without issues. Is this the bug you mentioned or are those possibly 2
> different issues?

Possibly different, but unlikely since it's still the preemption that is
at the root cause. smoketest fails for me on bxt/kbl, either starting at
a timeout waiting for the preemption report, or the failure to send the
guc command. The difference with live_gem + preemption I think is all in
the timing, in that it tries to do a very early preemption, shortly
after the fw is loaded.

If you are confident that these are the patches you want, done.
-Chris
On 03/10/18 23:29, Chris Wilson wrote:
> Quoting Daniele Ceraolo Spurio (2018-10-03 23:45:02)
>>
>>
>> On 03/10/18 08:24, Chris Wilson wrote:
>>> Quoting Daniele Ceraolo Spurio (2018-10-03 01:12:57)
>>>>
>>>>
>>>> On 02/10/18 15:39, Patchwork wrote:
>>>>> == Series Details ==
>>>>>
>>>>> Series: series starting with [v2,1/3] drm/i915/guc: init GuC descriptors after GuC load
>>>>> URL : https://patchwork.freedesktop.org/series/50464/
>>>>> State : failure
>>>>>
>>>>> == Summary ==
>>>>>
>>>>> = CI Bug Log - changes from CI_DRM_4915 -> Patchwork_10331 =
>>>>>
>>>>> == Summary - FAILURE ==
>>>>>
>>>>> Serious unknown changes coming with Patchwork_10331 absolutely need to be
>>>>> verified manually.
>>>>>
>>>>> If you think the reported changes have nothing to do with the changes
>>>>> introduced in Patchwork_10331, please notify your bug team to allow them
>>>>> to document this new failure mode, which will reduce false positives in CI.
>>>>>
>>>>> External URL: https://patchwork.freedesktop.org/api/1.0/series/50464/revisions/1/mbox/
>>>>>
>>>>> == Possible new issues ==
>>>>>
>>>>> Here are the unknown changes that may have been introduced in Patchwork_10331:
>>>>>
>>>>> === IGT changes ===
>>>>>
>>>>> ==== Possible regressions ====
>>>>>
>>>>> igt@drv_selftest@live_gem:
>>>>>   fi-whl-u: PASS -> INCOMPLETE
>>>>>   fi-skl-6600u: PASS -> INCOMPLETE
>>>>>   fi-kbl-7560u: PASS -> INCOMPLETE
>>>>>   fi-cfl-s3: PASS -> INCOMPLETE
>>>>>   fi-skl-iommu: PASS -> INCOMPLETE
>>>>>   fi-skl-6700k2: PASS -> INCOMPLETE
>>>>>   fi-skl-6700hq: PASS -> INCOMPLETE
>>>>>   fi-cfl-8109u: PASS -> INCOMPLETE
>>>>>   fi-kbl-7500u: PASS -> INCOMPLETE
>>>>>   fi-cfl-8700k: PASS -> INCOMPLETE
>>>>>   fi-skl-6770hq: PASS -> INCOMPLETE
>>>>>   fi-kbl-7567u: PASS -> INCOMPLETE
>>>>>   fi-kbl-x1275: PASS -> INCOMPLETE
>>>>>   fi-kbl-8809g: PASS -> INCOMPLETE
>>>>>   fi-kbl-r: PASS -> INCOMPLETE
>>>>>
>>>>
>>>> Those failures are there even without my patches (see
>>>> https://patchwork.freedesktop.org/series/40112/). Is there an existing
>>>> bugzilla? In the meantime, I'll have a look to see if I can find what's
>>>> causing this.
>>>
>>> inject_preempt_context() fails when talking to the guc, catastrophe
>>> ensues. As shown above it's quite reliable after a fake suspend/resume,
>>> but it also happens during normal preemption (the preemption smoketest
>>> was added to exercise this issue).
>>> -Chris
>>>
>>
>> Do you consider this a blocker to getting the patches merged?
>>
>> BTW, on my SKL even with the preemption smoketest I didn't see any issue
>> on the tree I based the patches on (from Monday) and I only see issues
>> after:
>>
>> b16c765122f987056e1dc9ef6c214571bb5bd694 is the first bad commit
>> commit b16c765122f987056e1dc9ef6c214571bb5bd694
>> Author: Chris Wilson <chris@chris-wilson.co.uk>
>> Date: Mon Oct 1 15:47:53 2018 +0100
>>
>>     drm/i915: Priority boost for new clients
>>
>> However I don't get any error logs out (the machine just dies) so not
>> sure if it is the same issue or not. with that patch and the 2 following
>> related ones reverted I've been running the live selftests in a loop
>> without issues. Is this the bug you mentioned or are those possibly 2
>> different issues?
>
> Possibly different, but unlikely since it's still the preemption that is
> at the root cause. smoketest fails for me on bxt/kbl, either starting at
> a timeout waiting for the preemption report, or the failure to send the
> guc command. The difference with live_gem + preemption I think is all in
> the timing, in that it tries to do a very early preemption, shortly
> after the fw is loaded.
>
> If you are confident that these are the patches you want, done.
> -Chris
>

Thanks!

I'll try to get my hands on another platform to see if I can pull out
the guc logs in this scenario to see what the GuC perspective is. The
upcoming FW also has changes in this area that should help.

Daniele
diff --git a/drivers/gpu/drm/i915/intel_guc.h b/drivers/gpu/drm/i915/intel_guc.h
index ad42faf48c46..0f1c4f9ebfd8 100644
--- a/drivers/gpu/drm/i915/intel_guc.h
+++ b/drivers/gpu/drm/i915/intel_guc.h
@@ -95,6 +95,11 @@ struct intel_guc {
 	void (*notify)(struct intel_guc *guc);
 };
 
+static inline bool intel_guc_is_alive(struct intel_guc *guc)
+{
+	return intel_uc_fw_is_loaded(&guc->fw);
+}
+
 static
 inline int intel_guc_send(struct intel_guc *guc, const u32 *action, u32 len)
 {
diff --git a/drivers/gpu/drm/i915/intel_guc_submission.c b/drivers/gpu/drm/i915/intel_guc_submission.c
index ac862b42f6a1..aa4d6bbdd1e9 100644
--- a/drivers/gpu/drm/i915/intel_guc_submission.c
+++ b/drivers/gpu/drm/i915/intel_guc_submission.c
@@ -282,8 +282,7 @@ __get_process_desc(struct intel_guc_client *client)
 /*
  * Initialise the process descriptor shared with the GuC firmware.
  */
-static void guc_proc_desc_init(struct intel_guc *guc,
-			       struct intel_guc_client *client)
+static void guc_proc_desc_init(struct intel_guc_client *client)
 {
 	struct guc_process_desc *desc;
 
@@ -304,6 +303,14 @@ static void guc_proc_desc_init(struct intel_guc *guc,
 	desc->priority = client->priority;
 }
 
+static void guc_proc_desc_fini(struct intel_guc_client *client)
+{
+	struct guc_process_desc *desc;
+
+	desc = __get_process_desc(client);
+	memset(desc, 0, sizeof(*desc));
+}
+
 static int guc_stage_desc_pool_create(struct intel_guc *guc)
 {
 	struct i915_vma *vma;
@@ -341,9 +348,9 @@ static void guc_stage_desc_pool_destroy(struct intel_guc *guc)
  * data structures relating to this client (doorbell, process descriptor,
  * write queue, etc).
  */
-static void guc_stage_desc_init(struct intel_guc *guc,
-				struct intel_guc_client *client)
+static void guc_stage_desc_init(struct intel_guc_client *client)
 {
+	struct intel_guc *guc = client->guc;
 	struct drm_i915_private *dev_priv = guc_to_i915(guc);
 	struct intel_engine_cs *engine;
 	struct i915_gem_context *ctx = client->owner;
@@ -424,8 +431,7 @@ static void guc_stage_desc_init(struct intel_guc *guc,
 	desc->desc_private = ptr_to_u64(client);
 }
 
-static void guc_stage_desc_fini(struct intel_guc *guc,
-				struct intel_guc_client *client)
+static void guc_stage_desc_fini(struct intel_guc_client *client)
 {
 	struct guc_stage_desc *desc;
 
@@ -486,14 +492,6 @@ static void guc_wq_item_append(struct intel_guc_client *client,
 	WRITE_ONCE(desc->tail, (wq_off + wqi_size) & (GUC_WQ_SIZE - 1));
 }
 
-static void guc_reset_wq(struct intel_guc_client *client)
-{
-	struct guc_process_desc *desc = __get_process_desc(client);
-
-	desc->head = 0;
-	desc->tail = 0;
-}
-
 static void guc_ring_doorbell(struct intel_guc_client *client)
 {
 	struct guc_doorbell_info *db;
@@ -898,45 +896,6 @@ static bool guc_verify_doorbells(struct intel_guc *guc)
 	return true;
 }
 
-static int guc_clients_doorbell_init(struct intel_guc *guc)
-{
-	int ret;
-
-	ret = create_doorbell(guc->execbuf_client);
-	if (ret)
-		return ret;
-
-	if (guc->preempt_client) {
-		ret = create_doorbell(guc->preempt_client);
-		if (ret) {
-			destroy_doorbell(guc->execbuf_client);
-			return ret;
-		}
-	}
-
-	return 0;
-}
-
-static void guc_clients_doorbell_fini(struct intel_guc *guc)
-{
-	/*
-	 * By the time we're here, GuC has already been reset.
-	 * Instead of trying (in vain) to communicate with it, let's just
-	 * cleanup the doorbell HW and our internal state.
-	 */
-	if (guc->preempt_client) {
-		__destroy_doorbell(guc->preempt_client);
-		__update_doorbell_desc(guc->preempt_client,
-				       GUC_DOORBELL_INVALID);
-	}
-
-	if (guc->execbuf_client) {
-		__destroy_doorbell(guc->execbuf_client);
-		__update_doorbell_desc(guc->execbuf_client,
-				       GUC_DOORBELL_INVALID);
-	}
-}
-
 /**
  * guc_client_alloc() - Allocate an intel_guc_client
  * @dev_priv:	driver private data structure
@@ -1009,9 +968,6 @@ guc_client_alloc(struct drm_i915_private *dev_priv,
 	else
 		client->proc_desc_offset = (GUC_DB_SIZE / 2);
 
-	guc_proc_desc_init(guc, client);
-	guc_stage_desc_init(guc, client);
-
 	ret = reserve_doorbell(client);
 	if (ret)
 		goto err_vaddr;
@@ -1037,7 +993,6 @@ guc_client_alloc(struct drm_i915_private *dev_priv,
 static void guc_client_free(struct intel_guc_client *client)
 {
 	unreserve_doorbell(client);
-	guc_stage_desc_fini(client->guc, client);
 	i915_vma_unpin_and_release(&client->vma, I915_VMA_RELEASE_MAP);
 	ida_simple_remove(&client->guc->stage_ids, client->stage_id);
 	kfree(client);
@@ -1104,6 +1059,69 @@ static void guc_clients_destroy(struct intel_guc *guc)
 		guc_client_free(client);
 }
 
+static int __guc_client_enable(struct intel_guc_client *client)
+{
+	int ret;
+
+	guc_proc_desc_init(client);
+	guc_stage_desc_init(client);
+
+	ret = create_doorbell(client);
+	if (ret)
+		goto fail;
+
+	return 0;
+
+fail:
+	guc_stage_desc_fini(client);
+	guc_proc_desc_fini(client);
+	return ret;
+}
+
+static void __guc_client_disable(struct intel_guc_client *client)
+{
+	/*
+	 * By the time we're here, GuC may have already been reset. if that is
+	 * the case, instead of trying (in vain) to communicate with it, let's
+	 * just cleanup the doorbell HW and our internal state.
+	 */
+	if (intel_guc_is_alive(client->guc))
+		destroy_doorbell(client);
+	else
+		__destroy_doorbell(client);
+
+	guc_stage_desc_fini(client);
+	guc_proc_desc_fini(client);
+}
+
+static int guc_clients_enable(struct intel_guc *guc)
+{
+	int ret;
+
+	ret = __guc_client_enable(guc->execbuf_client);
+	if (ret)
+		return ret;
+
+	if (guc->preempt_client) {
+		ret = __guc_client_enable(guc->preempt_client);
+		if (ret) {
+			__guc_client_disable(guc->execbuf_client);
+			return ret;
+		}
+	}
+
+	return 0;
+}
+
+static void guc_clients_disable(struct intel_guc *guc)
+{
+	if (guc->preempt_client)
+		__guc_client_disable(guc->preempt_client);
+
+	if (guc->execbuf_client)
+		__guc_client_disable(guc->execbuf_client);
+}
+
 /*
  * Set up the memory resources to be shared with the GuC (via the GGTT)
  * at firmware loading time.
@@ -1287,15 +1305,11 @@ int intel_guc_submission_enable(struct intel_guc *guc)
 
 	GEM_BUG_ON(!guc->execbuf_client);
 
-	guc_reset_wq(guc->execbuf_client);
-	if (guc->preempt_client)
-		guc_reset_wq(guc->preempt_client);
-
 	err = intel_guc_sample_forcewake(guc);
 	if (err)
 		return err;
 
-	err = guc_clients_doorbell_init(guc);
+	err = guc_clients_enable(guc);
 	if (err)
 		return err;
 
@@ -1317,7 +1331,7 @@ void intel_guc_submission_disable(struct intel_guc *guc)
 	GEM_BUG_ON(dev_priv->gt.awake); /* GT should be parked first */
 
 	guc_interrupts_release(dev_priv);
-	guc_clients_doorbell_fini(guc);
+	guc_clients_disable(guc);
 }
 
 #if IS_ENABLED(CONFIG_DRM_I915_SELFTEST)
diff --git a/drivers/gpu/drm/i915/intel_uc_fw.h b/drivers/gpu/drm/i915/intel_uc_fw.h
index 87910aa83267..0e3bd580e267 100644
--- a/drivers/gpu/drm/i915/intel_uc_fw.h
+++ b/drivers/gpu/drm/i915/intel_uc_fw.h
@@ -115,9 +115,14 @@ static inline bool intel_uc_fw_is_selected(struct intel_uc_fw *uc_fw)
 	return uc_fw->path != NULL;
 }
 
+static inline bool intel_uc_fw_is_loaded(struct intel_uc_fw *uc_fw)
+{
+	return uc_fw->load_status == INTEL_UC_FIRMWARE_SUCCESS;
+}
+
 static inline void intel_uc_fw_sanitize(struct intel_uc_fw *uc_fw)
 {
-	if (uc_fw->load_status == INTEL_UC_FIRMWARE_SUCCESS)
+	if (intel_uc_fw_is_loaded(uc_fw))
 		uc_fw->load_status = INTEL_UC_FIRMWARE_PENDING;
 }
 
diff --git a/drivers/gpu/drm/i915/selftests/intel_guc.c b/drivers/gpu/drm/i915/selftests/intel_guc.c
index 0c0ab82b6228..bf27162fb327 100644
--- a/drivers/gpu/drm/i915/selftests/intel_guc.c
+++ b/drivers/gpu/drm/i915/selftests/intel_guc.c
@@ -159,6 +159,7 @@ static int igt_guc_clients(void *args)
 	 * Get rid of clients created during driver load because the test will
 	 * recreate them.
 	 */
+	guc_clients_disable(guc);
 	guc_clients_destroy(guc);
 	if (guc->execbuf_client || guc->preempt_client) {
 		pr_err("guc_clients_destroy lied!\n");
@@ -197,8 +198,8 @@ static int igt_guc_clients(void *args)
 		goto out;
 	}
 
-	/* Now create the doorbells */
-	guc_clients_doorbell_init(guc);
+	/* Now enable the clients */
+	guc_clients_enable(guc);
 
 	/* each client should now have received a doorbell */
 	if (!client_doorbell_in_sync(guc->execbuf_client) ||
@@ -212,7 +213,7 @@ static int igt_guc_clients(void *args)
 	 * Basic test - an attempt to reallocate a valid doorbell to the
 	 * client it is currently assigned should not cause a failure.
 	 */
-	err = guc_clients_doorbell_init(guc);
+	err = create_doorbell(guc->execbuf_client);
 	if (err)
 		goto out;
 
@@ -263,12 +264,10 @@ static int igt_guc_clients(void *args)
 	 * Leave clean state for other test, plus the driver always destroy the
 	 * clients during unload.
 	 */
-	destroy_doorbell(guc->execbuf_client);
-	if (guc->preempt_client)
-		destroy_doorbell(guc->preempt_client);
+	guc_clients_disable(guc);
 	guc_clients_destroy(guc);
 	guc_clients_create(guc);
-	guc_clients_doorbell_init(guc);
+	guc_clients_enable(guc);
 unlock:
 	intel_runtime_pm_put(dev_priv);
 	mutex_unlock(&dev_priv->drm.struct_mutex);
@@ -352,7 +351,7 @@ static int igt_guc_doorbells(void *arg)
 
 		db_id = clients[i]->doorbell_id;
 
-		err = create_doorbell(clients[i]);
+		err = __guc_client_enable(clients[i]);
 		if (err) {
 			pr_err("[%d] Failed to create a doorbell\n", i);
 			goto out;
@@ -378,7 +377,7 @@ static int igt_guc_doorbells(void *arg)
 out:
 	for (i = 0; i < ATTEMPTS; i++)
 		if (!IS_ERR_OR_NULL(clients[i])) {
-			destroy_doorbell(clients[i]);
+			__guc_client_disable(clients[i]);
 			guc_client_free(clients[i]);
 		}
 unlock:
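The load-status helper added to intel_uc_fw.h can be tried outside the kernel with a small user-space model. The sketch below is only an illustration: `fw_model`, `fw_is_loaded()` and `fw_sanitize()` are hypothetical stand-ins for the driver's types and helpers, and only the two load states that appear in the diff are modelled.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the driver's load-status values; only the
 * two states visible in the diff are modelled here. */
enum fw_load_status {
	FW_PENDING,
	FW_SUCCESS,
};

/* Hypothetical, reduced version of the firmware bookkeeping struct. */
struct fw_model {
	enum fw_load_status load_status;
};

/* Same test as the new helper: loaded means the last load succeeded. */
static bool fw_is_loaded(const struct fw_model *fw)
{
	return fw->load_status == FW_SUCCESS;
}

/* Same shape as the sanitize helper: a loaded firmware is demoted to
 * pending, any other state is left untouched. */
static void fw_sanitize(struct fw_model *fw)
{
	assert(fw);
	if (fw_is_loaded(fw))
		fw->load_status = FW_PENDING;
}
```

In this model, a caller that checks `fw_is_loaded()` after `fw_sanitize()` sees "not loaded", which is the kind of condition `__guc_client_disable()` tests through `intel_guc_is_alive()` before deciding whether to talk to the firmware.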
GuC stores some data in the descriptors, which might be stale after a reset.
We already reset the WQ head and tail, but more things are being moved
to the descriptor with the interface updates. Instead of trying to track
them one by one, always memset and init the descriptors from scratch
after GuC is loaded.
The code is also reorganized so that the above operations and the
doorbell creation are grouped as "client enabling".

v2: add proc_desc_fini for symmetry (Daniele), remove unneeded var init,
    add guc_is_alive() (Michal)

Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
---
 drivers/gpu/drm/i915/intel_guc.h            |   5 +
 drivers/gpu/drm/i915/intel_guc_submission.c | 140 +++++++++++---------
 drivers/gpu/drm/i915/intel_uc_fw.h          |   7 +-
 drivers/gpu/drm/i915/selftests/intel_guc.c  |  17 ++-
 4 files changed, 96 insertions(+), 73 deletions(-)
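The "client enabling" idea from the commit message can be sketched as a user-space model: wipe and rebuild the descriptor on every enable, then create the doorbell, and unwind if that fails. Everything below is a simplified assumption made for illustration. `desc_model`, `client_model` and the helper names are hypothetical, the real descriptors have more fields, and the firmware interaction is reduced to a counter.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical, simplified descriptor shared with the firmware. */
struct desc_model {
	unsigned int head;
	unsigned int tail;
	unsigned int priority;
	unsigned int stale_cookie;	/* any field nobody remembered to reset */
};

/* Hypothetical client; the last three fields exist only for the model. */
struct client_model {
	struct desc_model desc;
	unsigned int priority;
	bool doorbell_active;
	bool fail_doorbell;	/* knob: force doorbell creation to fail */
	bool fw_alive;		/* stand-in for the "firmware is loaded" check */
	int fw_requests;	/* counts messages sent to the firmware */
};

/* Rebuild the descriptor from scratch instead of resetting single fields. */
static void desc_init(struct client_model *c)
{
	memset(&c->desc, 0, sizeof(c->desc));
	c->desc.priority = c->priority;
}

static void desc_fini(struct client_model *c)
{
	memset(&c->desc, 0, sizeof(c->desc));
}

static int doorbell_create(struct client_model *c)
{
	if (c->fail_doorbell)
		return -1;
	c->doorbell_active = true;
	return 0;
}

/* Enable = init descriptor + create doorbell, undone as a unit on error. */
static int client_enable(struct client_model *c)
{
	int ret;

	assert(c);
	desc_init(c);

	ret = doorbell_create(c);
	if (ret) {
		desc_fini(c);
		return ret;
	}
	return 0;
}

/* Disable only talks to the firmware when it is still alive. */
static void client_disable(struct client_model *c)
{
	assert(c);
	if (c->fw_alive)
		c->fw_requests++;
	c->doorbell_active = false;
	desc_fini(c);
}
```

Because `client_enable()` always starts from a zeroed descriptor, a field such as `stale_cookie` cannot survive a reset even if no code resets it explicitly, which is the property the patch is after.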