[0/4] Don't fail on HuC early init errors


Message ID

20190804195052.31140-1-michal.wajdeczko@intel.com (mailing list archive)

Headers

From: Michal Wajdeczko <michal.wajdeczko@intel.com>
To: intel-gfx@lists.freedesktop.org
Date: Sun,  4 Aug 2019 19:50:48 +0000
Message-Id: <20190804195052.31140-1-michal.wajdeczko@intel.com>
MIME-Version: 1.0
Subject: [Intel-gfx] [PATCH 0/4] Don't fail on HuC early init errors
Precedence: list
Content-Type: text/plain; charset="utf-8"
Content-Transfer-Encoding: base64
Errors-To: intel-gfx-bounces@lists.freedesktop.org
Sender: "Intel-gfx" <intel-gfx-bounces@lists.freedesktop.org>

Series

Don't fail on HuC early init errors

Message

Michal Wajdeczko Aug. 4, 2019, 7:50 p.m. UTC

Next step towards ignoring all HuC-related errors.

Michal Wajdeczko (4):
  drm/i915/guc: Prefer intel_guc_is_submission_supported
  drm/i915/huc: Prefer intel_huc_is_supported
  drm/i915/uc: Remove redundant GuC support checks
  drm/i915/uc: Don't fail on HuC early init errors

 drivers/gpu/drm/i915/gt/uc/intel_guc.c |  8 ++++----
 drivers/gpu/drm/i915/gt/uc/intel_huc.c |  9 +++++++--
 drivers/gpu/drm/i915/gt/uc/intel_uc.c  | 19 +++++--------------
 3 files changed, 16 insertions(+), 20 deletions(-)

Comments

Michal Wajdeczko Aug. 4, 2019, 8:27 p.m. UTC | #1

On Sun, 04 Aug 2019 22:18:51 +0200, Patchwork <patchwork@emeril.freedesktop.org> wrote:

> == Series Details ==
>
> Series: Don't fail on HuC early init errors
> URL   : https://patchwork.freedesktop.org/series/64668/
> State : failure
>
> == Summary ==
>
> CI Bug Log - changes from CI_DRM_6624 -> Patchwork_13866
> ====================================================
>
> Summary
> -------
>
>   **FAILURE**
>
>   Serious unknown changes coming with Patchwork_13866 absolutely need to be
>   verified manually.
>   If you think the reported changes have nothing to do with the changes
>   introduced in Patchwork_13866, please notify your bug team to allow them
>   to document this new failure mode, which will reduce false positives in CI.
>
>   External URL: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_13866/
>
> Possible new issues
> -------------------
>
>   Here are the unknown changes that may have been introduced in Patchwork_13866:
>
> ### IGT changes ###
>
> #### Possible regressions ####
>
>   * igt@runner@aborted:
>     - fi-cml-u2:          NOTRUN -> [FAIL][1]
>    [1]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_13866/fi-cml-u2/igt@runner@aborted.html
>

hmm, looks unrelated (this CML is not using guc/huc at all)

<7>[    9.344398] [drm:intel_uc_init_early [i915]] enable_guc=0 (guc:no submission:no huc:no)
...
<4>[  486.270801] ------------[ cut here ]------------
<4>[  486.270843] list_add corruption. prev->next should be next (ffff8883c0fe17d8), but was ffff8883cb29c5a8. (prev=ffff8883cb29c5a8).
...
<4>[  486.270889] Call Trace:
<4>[  486.270945]  __i915_request_commit+0x35c/0x6a0 [i915]
<4>[  486.270991]  ? __i915_request_create+0x22c/0x4d0 [i915]
<4>[  486.271030]  __engine_park+0x64/0x200 [i915]
<4>[  486.271067]  __intel_wakeref_put_last+0x14/0x60 [i915]
<4>[  486.271104]  __igt_reset_engine+0x2be/0x490 [i915]
<4>[  486.271111]  ? __trace_bprintk+0x57/0x80
<4>[  486.271160]  __i915_subtests+0xb8/0x210 [i915]
<4>[  486.271205]  ? __i915_live_teardown+0x70/0x70 [i915]
<4>[  486.271248]  ? __intel_gt_live_setup+0x10/0x10 [i915]
<4>[  486.271287]  intel_hangcheck_live_selftests+0xa5/0x100 [i915]
<4>[  486.271332]  __run_selftests+0x112/0x170 [i915]
<4>[  486.271376]  i915_live_selftests+0x2c/0x60 [i915]
<4>[  486.271410]  i915_pci_probe+0x93/0x1b0 [i915]
<4>[  486.271414]  ? _raw_spin_unlock_irqrestore+0x39/0x60
<4>[  486.271419]  pci_device_probe+0x9e/0x120
<4>[  486.271423]  really_probe+0xea/0x3d0
<4>[  486.271426]  driver_probe_device+0x10b/0x120
<4>[  486.271429]  device_driver_attach+0x4a/0x50
<4>[  486.271431]  __driver_attach+0x97/0x130
<4>[  486.271434]  ? device_driver_attach+0x50/0x50
<4>[  486.271436]  bus_for_each_dev+0x74/0xc0
<4>[  486.271440]  bus_add_driver+0x13f/0x210
<4>[  486.271442]  ? 0xffffffffa0822000
<4>[  486.271444]  driver_register+0x56/0xe0
<4>[  486.271446]  ? 0xffffffffa0822000
<4>[  486.271449]  do_one_initcall+0x58/0x300
<4>[  486.271452]  ? do_init_module+0x1d/0x1f6
<4>[  486.271455]  ? rcu_read_lock_sched_held+0x6f/0x80
<4>[  486.271458]  ? kmem_cache_alloc_trace+0x2d1/0x300
<4>[  486.271462]  do_init_module+0x56/0x1f6
<4>[  486.271465]  load_module+0x25bd/0x2a40
<4>[  486.271477]  ? __se_sys_finit_module+0xd3/0xf0
<4>[  486.271479]  __se_sys_finit_module+0xd3/0xf0
<4>[  486.271487]  do_syscall_64+0x55/0x1c0

Chris Wilson Aug. 5, 2019, 6:15 p.m. UTC | #2

Quoting Michal Wajdeczko (2019-08-04 21:27:38)
> On Sun, 04 Aug 2019 22:18:51 +0200, Patchwork <patchwork@emeril.freedesktop.org> wrote:
> >   * igt@runner@aborted:
> >     - fi-cml-u2:          NOTRUN -> [FAIL][1]
> >    [1]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_13866/fi-cml-u2/igt@runner@aborted.html
> >
> 
> hmm, looks unrelated (this CML is not using guc/huc at all)
> 
> <7>[    9.344398] [drm:intel_uc_init_early [i915]] enable_guc=0 (guc:no submission:no huc:no)
> ...
> <4>[  486.270801] ------------[ cut here ]------------
> <4>[  486.270843] list_add corruption. prev->next should be next (ffff8883c0fe17d8), but was ffff8883cb29c5a8. (prev=ffff8883cb29c5a8).

That really needs the timeline patches, if only for the lockdep markup.
But I think it's
  drm/i915: Protect request retirement with timeline->mutex
judging from the probable race.
-Chris
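
For readers unfamiliar with the "list_add corruption" splat quoted above, here is a minimal userspace sketch of the kind of sanity check that emits it. It is only an illustration, not the kernel's code: the names list_add_valid, list_add_between and demo_stale_insert are made up for this example, and the way the list gets damaged here (one pointer overwritten without synchronisation) is an assumption chosen to match the shape of the message, where prev->next points back at prev itself. The thread does not establish how the real list reached that state.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Circular doubly linked list node, same layout idea as the kernel's. */
struct list_head {
	struct list_head *next, *prev;
};

void init_list_head(struct list_head *h)
{
	h->next = h;
	h->prev = h;
}

/* Hypothetical helper: verify that prev and next are still neighbours. */
bool list_add_valid(const struct list_head *new,
		    const struct list_head *prev,
		    const struct list_head *next)
{
	if (next->prev != prev) {
		fprintf(stderr, "list_add corruption. next->prev should be prev (%p), but was %p. (next=%p).\n",
			(const void *)prev, (const void *)next->prev, (const void *)next);
		return false;
	}
	if (prev->next != next) {
		fprintf(stderr, "list_add corruption. prev->next should be next (%p), but was %p. (prev=%p).\n",
			(const void *)next, (const void *)prev->next, (const void *)prev);
		return false;
	}
	if (new == prev || new == next) {
		fprintf(stderr, "list_add double add: new=%p\n", (const void *)new);
		return false;
	}
	return true;
}

/* Hypothetical helper: insert new between prev and next if the check passes. */
bool list_add_between(struct list_head *new,
		      struct list_head *prev,
		      struct list_head *next)
{
	if (!list_add_valid(new, prev, next))
		return false;
	next->prev = new;
	new->next = next;
	new->prev = prev;
	prev->next = new;
	return true;
}

/*
 * Build head <-> a <-> b, sample (prev, next) = (a, b), then simulate an
 * unsynchronised writer that leaves a.next pointing at a itself.
 * Returns the result of the subsequent insert attempt.
 */
bool demo_stale_insert(void)
{
	struct list_head head, a, b, n;
	bool ok;

	init_list_head(&head);
	ok = list_add_between(&a, &head, head.next);
	assert(ok);
	ok = list_add_between(&b, &a, a.next);
	assert(ok);

	struct list_head *prev = &a, *next = &b;

	a.next = &a;	/* assumed damage: prev->next now equals prev */

	return list_add_between(&n, prev, next);
}
```

Calling demo_stale_insert() makes the second check fire with a message of the same form as the one in the trace (expected next, actual value equal to prev), and the insert is refused. Serialising every list modification under one lock, as the patch named above aims to do for request retirement, prevents the sampled neighbour pair from going stale.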