Message ID | 20210815201559.1150-1-matthew.brost@intel.com (mailing list archive) |
---|---|
Series | Clean up GuC CI failures, simplify locking, and kernel DOC |
On Sun, Aug 15, 2021 at 09:15:31PM +0000, Patchwork wrote:
> Patch Details
>
> Series: Clean up GuC CI failures, simplify locking, and kernel DOC
> URL: https://patchwork.freedesktop.org/series/93704/
> State: failure
> Details: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_20826/index.html
>
> CI Bug Log - changes from CI_DRM_10484 -> Patchwork_20826
>
> Summary
>
> FAILURE
>
> Serious unknown changes coming with Patchwork_20826 absolutely need to be
> verified manually.
>
> If you think the reported changes have nothing to do with the changes
> introduced in Patchwork_20826, please notify your bug team to allow them
> to document this new failure mode, which will reduce false positives in CI.
>
> External URL: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_20826/index.html
>
> Possible new issues
>
> Here are the unknown changes that may have been introduced in Patchwork_20826:
>
> IGT changes
>
> Possible regressions
>
>   • igt@i915_selftest@live@requests:
>
>       □ fi-cfl-guc: PASS -> DMESG-FAIL
>

The new selftest, __cancel_reset, is exposing a bug (or at minimum different behavior) in the execlists implementation for canceling requests. Will fix in the next rev.

Matt

>       □ fi-kbl-soraka: PASS -> DMESG-FAIL
>       □ fi-bxt-dsi: PASS -> DMESG-FAIL
>       □ fi-tgl-1115g4: PASS -> DMESG-FAIL
>       □ fi-cml-u2: PASS -> DMESG-FAIL
>       □ fi-kbl-8809g: PASS -> DMESG-FAIL
>       □ fi-cfl-8700k: PASS -> DMESG-FAIL
>       □ fi-cfl-8109u: PASS -> DMESG-FAIL
>       □ fi-icl-u2: PASS -> DMESG-FAIL
>       □ fi-kbl-7500u: PASS -> DMESG-FAIL
>       □ fi-bsw-nick: PASS -> DMESG-FAIL
>       □ fi-icl-y: PASS -> DMESG-FAIL
>       □ fi-kbl-guc: PASS -> DMESG-FAIL
>       □ fi-kbl-7567u: PASS -> DMESG-FAIL
>       □ fi-skl-guc: PASS -> DMESG-FAIL
>       □ fi-bdw-5557u: PASS -> DMESG-FAIL
>       □ fi-glk-dsi: PASS -> DMESG-FAIL
>       □ fi-bsw-kefka: PASS -> DMESG-FAIL
>       □ fi-skl-6700k2: PASS -> DMESG-FAIL
>
> Warnings
>
>   • igt@i915_selftest@live@workarounds:
>       □ fi-rkl-guc: DMESG-FAIL (i915#3928) -> INCOMPLETE
>
> Suppressed
>
> The following results come from untrusted machines, tests, or statuses.
> They do not affect the overall result.
>
>   • igt@i915_selftest@live@requests:
>
>       □ {fi-tgl-dsi}: PASS -> DMESG-FAIL
>       □ {fi-jsl-1}: PASS -> DMESG-FAIL
>       □ {fi-ehl-2}: PASS -> DMESG-FAIL
>
> New tests
>
> New tests have been introduced between CI_DRM_10484 and Patchwork_20826:
>
> New IGT tests (1)
>
>   • igt@i915_selftest@live@guc:
>       □ Statuses : 30 pass(s)
>       □ Exec time: [0.42, 5.06] s
>
> Known issues
>
> Here are the changes found in Patchwork_20826 that come from known issues:
>
> IGT changes
>
> Issues hit
>
>   • igt@i915_module_load@reload:
>
>       □ fi-kbl-soraka: PASS -> DMESG-WARN (i915#1982)
>
>   • igt@i915_selftest@live@execlists:
>
>       □ fi-icl-y: PASS -> DMESG-FAIL (i915#1993)
>
> Possible fixes
>
>   • igt@i915_module_load@reload:
>       □ {fi-tgl-dsi}: DMESG-WARN (i915#1982 / k.org#205379) -> PASS
>
> {name}: This element is suppressed. This means it is ignored when computing
> the status of the difference (SUCCESS, WARNING, or FAILURE).
>
> Participating hosts (37 -> 34)
>
> Missing (3): fi-bdw-samus fi-bsw-cyan bat-jsl-1
>
> Build changes
>
>   • Linux: CI_DRM_10484 -> Patchwork_20826
>
> CI-20190529: 20190529
> CI_DRM_10484: 7de02d5cb1f35bd3f068237444063844dea47ddc @ git://anongit.freedesktop.org/gfx-ci/linux
> IGT_6175: c91f99c74b966f635d7e2eb898bf0f78383d281b @ https://gitlab.freedesktop.org/drm/igt-gpu-tools.git
> Patchwork_20826: f7ff315bfe3a76713c1f0a16cd92b0908d28e4c6 @ git://anongit.freedesktop.org/gfx-ci/linux
>
> == Linux commits ==
>
> f7ff315bfe3a drm/i915/guc: Add GuC kernel doc
> af14e3698d19 drm/i915/guc: Move GuC priority fields in context under guc_active
> eb8a352e7c1f drm/i915/guc: Drop pin count check trick between sched_disable and re-pin
> 7057a0daff8c drm/i915/guc: Proper xarray usage for contexts_lookup
> d97ab34c8bac drm/i915/guc: Rework and simplify locking
> 4c980575f7af drm/i915/guc: Move guc_blocked fence to struct guc_state
> 1c2b4c0ac62a drm/i915/guc: Release submit fence from an IRQ
> 14dc302536e1 drm/i915/guc: Flush G2H work queue during reset
> c0ad63d810e6 drm/i915: Allocate error capture in atomic context
> 6c1c488a3654 drm/i915/guc: Reset LRC descriptor if register returns -ENODEV
> 025e88fa74d3 drm/i915/guc: Don't touch guc_state.sched_state without a lock
> b929abcf3b59 drm/i915/guc: Take context ref when cancelling request
> b5e8c08dff35 drm/i915/selftests: Add initial GuC selftest for scrubbing lost G2H
> cddf94c9bda0 drm/i915/selftests: Fix memory corruption in live_lrc_isolation
> c18da32e671c drm/i915/guc: Don't enable scheduling on a banned context, guc_id invalid, not registered
> 0c0928ba1ba8 drm/i915/selftests: Add a cancel request selftest that triggers a reset
> 8ec967ce47da drm/i915/guc: Workaround reset G2H is received after schedule done G2H
> 2739b8f8966f drm/i915/guc: Don't drop ce->guc_active.lock when unwinding context
> 1d414637d838 drm/i915/guc: Unwind context requests in reverse order
> 7fe738f56abc drm/i915/guc: outstanding G2H accounting
> c9f3859e5dce drm/i915/guc: Fix blocked context accounting
Daniel Vetter pointed out that locking in the GuC submission code was overly complicated; let's clean this up a bit before introducing more features in the GuC submission backend.

Also fix some CI failures, port fixes from our internal tree, and add a few more selftests for coverage.

Lastly, add some kernel DOC explaining how the GuC submission backend works.

Signed-off-by: Matthew Brost <matthew.brost@intel.com>

Matthew Brost (21):
  drm/i915/guc: Fix blocked context accounting
  drm/i915/guc: outstanding G2H accounting
  drm/i915/guc: Unwind context requests in reverse order
  drm/i915/guc: Don't drop ce->guc_active.lock when unwinding context
  drm/i915/guc: Workaround reset G2H is received after schedule done G2H
  drm/i915/selftests: Add a cancel request selftest that triggers a reset
  drm/i915/guc: Don't enable scheduling on a banned context, guc_id invalid, not registered
  drm/i915/selftests: Fix memory corruption in live_lrc_isolation
  drm/i915/selftests: Add initial GuC selftest for scrubbing lost G2H
  drm/i915/guc: Take context ref when cancelling request
  drm/i915/guc: Don't touch guc_state.sched_state without a lock
  drm/i915/guc: Reset LRC descriptor if register returns -ENODEV
  drm/i915: Allocate error capture in atomic context
  drm/i915/guc: Flush G2H work queue during reset
  drm/i915/guc: Release submit fence from an IRQ
  drm/i915/guc: Move guc_blocked fence to struct guc_state
  drm/i915/guc: Rework and simplify locking
  drm/i915/guc: Proper xarray usage for contexts_lookup
  drm/i915/guc: Drop pin count check trick between sched_disable and re-pin
  drm/i915/guc: Move GuC priority fields in context under guc_active
  drm/i915/guc: Add GuC kernel doc

 drivers/gpu/drm/i915/gt/intel_context.c       |   5 +-
 drivers/gpu/drm/i915/gt/intel_context_types.h |  68 +-
 drivers/gpu/drm/i915/gt/selftest_lrc.c        |  29 +-
 drivers/gpu/drm/i915/gt/uc/intel_guc.h        |  19 +-
 .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 688 +++++++++++-------
 drivers/gpu/drm/i915/gt/uc/selftest_guc.c     | 126 ++++
 drivers/gpu/drm/i915/i915_gpu_error.c         |  37 +-
 drivers/gpu/drm/i915/i915_request.h           |  23 +-
 drivers/gpu/drm/i915/i915_trace.h             |   8 +-
 .../drm/i915/selftests/i915_live_selftests.h  |   1 +
 drivers/gpu/drm/i915/selftests/i915_request.c |  94 +++
 .../i915/selftests/intel_scheduler_helpers.c  |  12 +
 .../i915/selftests/intel_scheduler_helpers.h  |   2 +
 13 files changed, 805 insertions(+), 307 deletions(-)
 create mode 100644 drivers/gpu/drm/i915/gt/uc/selftest_guc.c