Message ID | 20210826032327.18078-1-matthew.brost@intel.com (mailing list archive) |
---|---|
Series | Clean up GuC CI failures, simplify locking, and kernel DOC |
On Thu, Aug 26, 2021 at 04:17:07PM +0000, Patchwork wrote:
> Patch Details
>
> Series: Clean up GuC CI failures, simplify locking, and kernel DOC (rev6)
> URL: https://patchwork.freedesktop.org/series/93704/
> State: failure
> Details: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_20904/index.html
>
> CI Bug Log - changes from CI_DRM_10525 -> Patchwork_20904
>
> Summary
>
> FAILURE
>
> Serious unknown changes coming with Patchwork_20904 absolutely need to be
> verified manually.
>
> If you think the reported changes have nothing to do with the changes
> introduced in Patchwork_20904, please notify your bug team to allow them
> to document this new failure mode, which will reduce false positives in CI.
>
> External URL: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_20904/index.html
>
> Possible new issues
>
> Here are the unknown changes that may have been introduced in Patchwork_20904:
>
> IGT changes
>
> Possible regressions
>
>   • igt@i915_selftest@live@hangcheck:
>       □ fi-rkl-guc: PASS -> INCOMPLETE

I've seen this locally before and after this series. I wouldn't hold off the
merge of this series because of this, as I don't believe it is a regression,
just an existing instability in the stack. I haven't been able to root cause
this yet, but my initial analysis points to the GuC losing a submission after
the GuC has reset a context. Will dig into this and hopefully get a fix after
I'm back from vacation on 9/7.

Matt

>
> New tests
>
> New tests have been introduced between CI_DRM_10525 and Patchwork_20904:
>
> New IGT tests (1)
>
>   • igt@i915_selftest@live@guc:
>       □ Statuses : 30 pass(s)
>       □ Exec time: [0.41, 5.26] s
>
> Known issues
>
> Here are the changes found in Patchwork_20904 that come from known issues:
>
> IGT changes
>
> Issues hit
>
>   • igt@amdgpu/amd_cs_nop@sync-compute0:
>
>       □ fi-kbl-soraka: NOTRUN -> SKIP (fdo#109271) +5 similar issues
>   • igt@runner@aborted:
>
>       □ fi-rkl-guc: NOTRUN -> FAIL (i915#3928)
>
> {name}: This element is suppressed. This means it is ignored when computing
> the status of the difference (SUCCESS, WARNING, or FAILURE).
>
> Participating hosts (40 -> 33)
>
> Missing (7): fi-ilk-m540 bat-adls-5 fi-hsw-4200u fi-tgl-1115g4 fi-bsw-cyan
> fi-bdw-samus bat-jsl-1
>
> Build changes
>
>   • Linux: CI_DRM_10525 -> Patchwork_20904
>
> CI-20190529: 20190529
> CI_DRM_10525: 059309d37ac2de5d93cf6d71fd7fe33c9c2c66ea @ git://anongit.freedesktop.org/gfx-ci/linux
> IGT_6186: 250081b306c6fa8f95405fab6a7604f1968dd4ec @ https://gitlab.freedesktop.org/drm/igt-gpu-tools.git
> Patchwork_20904: 0c1d27ac9fce7e231e7dddebcf56905e05302cae @ git://anongit.freedesktop.org/gfx-ci/linux
>
> == Linux commits ==
>
> 0c1d27ac9fce drm/i915/guc: Drop static inline functions intel_guc_submission.c
> 50ada01b3d95 drm/i915/guc: Add GuC kernel doc
> 883eccfa8221 drm/i915/guc: Drop guc_active move everything into guc_state
> fa075902c938 drm/i915/guc: Move fields protected by guc->contexts_lock into sub
> structure
> a1c73c8c481a drm/i915/guc: Move GuC priority fields in context under guc_active
> f16c0554ae08 drm/i915/guc: Drop pin count check trick between sched_disable and
> re-pin
> 42ac1b77a019 drm/i915/guc: Proper xarray usage for contexts_lookup
> 9b9222998c83 drm/i915/guc: Rework and simplify locking
> 244934484f63 drm/i915/guc: Move guc_blocked fence to struct guc_state
> ba695a58136a drm/i915/guc: Release submit fence from an irq_work
> 3bd5803d5e25 drm/i915/guc: Flush G2H work queue during reset
> b87ba9121748 drm/i915: Allocate error capture in nowait context
> adb35ad83c76 drm/i915/guc: Reset LRC descriptor if register returns -ENODEV
> 97e616063006 drm/i915/guc: Don't touch guc_state.sched_state without a lock
> 1ff99308ef88 drm/i915/guc: Take context ref when cancelling request
> ff84f14ddceb drm/i915/selftests: Add initial GuC selftest for scrubbing lost
> G2H
> abd6a8884cf4 drm/i915/guc: Copy whole golden context, set engine state size of
> subset
> a19ba1f51009 drm/i915/guc: Don't enable scheduling on a banned context, guc_id
> invalid, not registered
> f29b2b338002 drm/i915/guc: Kick tasklet after queuing a request
> f577a4fdeeab drm/i915/selftests: Add a cancel request selftest that triggers a
> reset
> da3d87dfe8c5 Revert "drm/i915/gt: Propagate change in error status to children
> on unhold"
> 25273a034c8d drm/i915/guc: Workaround reset G2H is received after schedule done
> G2H
> c00d543957c2 drm/i915/guc: Process all G2H message at once in work queue
> 5b7ff1fa9e43 drm/i915/guc: Don't drop ce->guc_active.lock when unwinding
> context
> 54cd904fa232 drm/i915/guc: Unwind context requests in reverse order
> 593f21493fda drm/i915/guc: Fix outstanding G2H accounting
> 6b511953d015 drm/i915/guc: Fix blocked context accounting
>
> SECURITY NOTE: file ~/.netrc must not be accessible by others
On Thu, Aug 26, 2021 at 10:34:36AM +0000, Patchwork wrote:
> Patch Details
>
> Series: Clean up GuC CI failures, simplify locking, and kernel DOC (rev5)
> URL: https://patchwork.freedesktop.org/series/93704/
> State: failure
> Details: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_20896/index.html
>
> CI Bug Log - changes from CI_DRM_10522_full -> Patchwork_20896_full
>
> Summary
>
> FAILURE
>
> Serious unknown changes coming with Patchwork_20896_full absolutely need to be
> verified manually.
>
> If you think the reported changes have nothing to do with the changes
> introduced in Patchwork_20896_full, please notify your bug team to allow them
> to document this new failure mode, which will reduce false positives in CI.
>
> Possible new issues
>
> Here are the unknown changes that may have been introduced in
> Patchwork_20896_full:
>
> IGT changes
>
> Possible regressions
>
>   • igt@gem_exec_schedule@reorder-wide@vcs0:
>       □ shard-skl: PASS -> FAIL
>

Not really sure what this one is about, but I don't see how it could be related
to this series, as almost all the changes in this series are in the GuC backend
while this test is running on a much older platform.

Matt > New tests > > New tests have been introduced between CI_DRM_10522_full and > Patchwork_20896_full: > > New IGT tests (1) > > • igt@i915_selftest@live@guc: > □ Statuses : 8 pass(s) > □ Exec time: [0.47, 4.95] s > > Known issues > > Here are the changes found in Patchwork_20896_full that come from known issues: > > IGT changes > > Issues hit > > • igt@gem_ctx_persistence@legacy-engines-mixed-process: > > □ shard-snb: NOTRUN -> SKIP (fdo#109271 / i915#1099) +1 similar issue > • igt@gem_ctx_sseu@mmap-args: > > □ shard-tglb: NOTRUN -> SKIP ([i915#280]) > • igt@gem_eio@in-flight-10ms: > > □ shard-skl: PASS -> TIMEOUT ([i915#3063]) +1 similar issue > • igt@gem_exec_fair@basic-deadline: > > □ shard-kbl: PASS -> FAIL ([i915#2846]) > • igt@gem_exec_fair@basic-none-solo@rcs0: > > □ shard-tglb: NOTRUN -> FAIL ([i915#2842]) > • igt@gem_exec_fair@basic-pace@vcs1: > > □ shard-iclb: NOTRUN -> FAIL ([i915#2842]) > • igt@gem_exec_fair@basic-pace@vecs0: > > □ shard-kbl: PASS -> FAIL ([i915#2842]) +1 similar issue > > □ shard-tglb: PASS -> FAIL ([i915#2842]) > > • igt@gem_exec_fair@basic-throttle@rcs0: > > □ shard-glk: PASS -> FAIL ([i915#2842]) +1 similar issue > > □ shard-iclb: PASS -> FAIL ([i915#2849]) > > • igt@gem_exec_params@secure-non-master: > > □ shard-tglb: NOTRUN -> SKIP (fdo#112283) > • igt@gem_pread@exhaustion: > > □ shard-snb: NOTRUN -> WARN ([i915#2658]) > • igt@gem_render_copy@yf-tiled-to-vebox-linear: > > □ shard-iclb: NOTRUN -> SKIP ([i915#768]) > • igt@gem_userptr_blits@readonly-pwrite-unsync: > > □ shard-tglb: NOTRUN -> SKIP ([i915#3297]) > > □ shard-iclb: NOTRUN -> SKIP ([i915#3297]) > > • igt@gen3_render_tiledy_blits: > > □ shard-tglb: NOTRUN -> SKIP (fdo#109289) > • igt@i915_pm_dc@dc6-psr: > > □ shard-iclb: PASS -> FAIL ([i915#454]) > • igt@i915_pm_rpm@modeset-non-lpsp-stress-no-wait: > > □ shard-tglb: NOTRUN -> SKIP (fdo#111644 / i915#1397 / i915#2411) > • igt@kms_big_fb@linear-16bpp-rotate-90: > > □ shard-apl: NOTRUN -> SKIP (fdo#109271) +177 similar 
issues > • igt@kms_big_fb@linear-32bpp-rotate-0: > > □ shard-glk: PASS -> DMESG-WARN (i915#118 / [i915#95]) +2 similar issues > • igt@kms_big_fb@x-tiled-max-hw-stride-32bpp-rotate-0-hflip: > > □ shard-skl: NOTRUN -> SKIP (fdo#109271 / [i915#3777]) > • igt@kms_big_fb@y-tiled-max-hw-stride-32bpp-rotate-0-hflip: > > □ shard-apl: NOTRUN -> SKIP (fdo#109271 / [i915#3777]) > • igt@kms_big_fb@yf-tiled-max-hw-stride-64bpp-rotate-0-hflip: > > □ shard-tglb: NOTRUN -> SKIP (fdo#111615) > • igt@kms_ccs@pipe-a-bad-pixel-format-y_tiled_gen12_rc_ccs_cc: > > □ shard-skl: NOTRUN -> SKIP (fdo#109271 / [i915#3886]) +4 similar issues > • igt@kms_ccs@pipe-a-ccs-on-another-bo-y_tiled_gen12_mc_ccs: > > □ shard-apl: NOTRUN -> SKIP (fdo#109271 / [i915#3886]) +6 similar issues > • igt@kms_ccs@pipe-b-missing-ccs-buffer-y_tiled_gen12_rc_ccs_cc: > > □ shard-iclb: NOTRUN -> SKIP (fdo#109278 / [i915#3886]) > • igt@kms_ccs@pipe-c-random-ccs-data-y_tiled_gen12_mc_ccs: > > □ shard-tglb: NOTRUN -> SKIP ([i915#3689] / [i915#3886]) > • igt@kms_chamelium@dp-audio: > > □ shard-tglb: NOTRUN -> SKIP (fdo#109284 / fdo#111827) +1 similar issue > • igt@kms_chamelium@dp-crc-single: > > □ shard-snb: NOTRUN -> SKIP (fdo#109271 / fdo#111827) +8 similar issues > • igt@kms_chamelium@hdmi-hpd-fast: > > □ shard-iclb: NOTRUN -> SKIP (fdo#109284 / fdo#111827) +1 similar issue > • igt@kms_chamelium@vga-hpd-enable-disable-mode: > > □ shard-skl: NOTRUN -> SKIP (fdo#109271 / fdo#111827) > • igt@kms_color@pipe-d-ctm-green-to-red: > > □ shard-iclb: NOTRUN -> SKIP (fdo#109278 / i915#1149) > • igt@kms_color_chamelium@pipe-a-ctm-limited-range: > > □ shard-apl: NOTRUN -> SKIP (fdo#109271 / fdo#111827) +16 similar issues > • igt@kms_content_protection@atomic-dpms: > > □ shard-tglb: NOTRUN -> SKIP (fdo#111828) +1 similar issue > > □ shard-iclb: NOTRUN -> SKIP (fdo#109300 / fdo#111066) > > • igt@kms_content_protection@legacy: > > □ shard-apl: NOTRUN -> TIMEOUT (i915#1319) > • igt@kms_cursor_crc@pipe-a-cursor-32x32-sliding: > > □ 
shard-tglb: NOTRUN -> SKIP ([i915#3319]) > • igt@kms_cursor_crc@pipe-b-cursor-512x512-offscreen: > > □ shard-skl: NOTRUN -> SKIP (fdo#109271) +35 similar issues > • igt@kms_cursor_crc@pipe-c-cursor-32x10-onscreen: > > □ shard-tglb: NOTRUN -> SKIP ([i915#3359]) > • igt@kms_cursor_crc@pipe-c-cursor-512x512-random: > > □ shard-tglb: NOTRUN -> SKIP (fdo#109279 / [i915#3359]) > • igt@kms_flip@2x-single-buffer-flip-vs-dpms-off-vs-modeset-interruptible: > > □ shard-iclb: NOTRUN -> SKIP (fdo#109274) > • igt@kms_flip@flip-vs-expired-vblank-interruptible@b-edp1: > > □ shard-skl: PASS -> FAIL ([i915#79]) > • igt@kms_flip@flip-vs-suspend-interruptible@a-dp1: > > □ shard-apl: PASS -> DMESG-WARN (i915#180) +1 similar issue > • igt@kms_flip@plain-flip-fb-recreate-interruptible@c-edp1: > > □ shard-skl: PASS -> FAIL (i915#2122) > • igt@kms_frontbuffer_tracking@fbc-suspend: > > □ shard-apl: NOTRUN -> DMESG-WARN (i915#180) > • igt@kms_frontbuffer_tracking@fbcpsr-2p-primscrn-pri-indfb-draw-mmap-gtt: > > □ shard-iclb: NOTRUN -> SKIP (fdo#109280) +1 similar issue > > □ shard-tglb: NOTRUN -> SKIP (fdo#111825) +2 similar issues > > • igt@kms_hdr@bpc-switch-suspend: > > □ shard-skl: PASS -> FAIL (i915#1188) > • igt@kms_plane_alpha_blend@pipe-a-alpha-7efc: > > □ shard-apl: NOTRUN -> FAIL (fdo#108145 / [i915#265]) +2 similar issues > • igt@kms_plane_alpha_blend@pipe-c-coverage-7efc: > > □ shard-skl: PASS -> FAIL (fdo#108145 / [i915#265]) +1 similar issue > • igt@kms_plane_cursor@pipe-c-viewport-size-128: > > □ shard-snb: NOTRUN -> SKIP (fdo#109271) +224 similar issues > • igt@kms_plane_cursor@pipe-d-viewport-size-64: > > □ shard-iclb: NOTRUN -> SKIP (fdo#109278) +3 similar issues > • igt@kms_plane_lowres@pipe-d-tiling-none: > > □ shard-tglb: NOTRUN -> SKIP ([i915#3536]) > • igt@kms_prime@basic-crc@first-to-second: > > □ shard-tglb: NOTRUN -> SKIP (i915#1836) > • igt@kms_psr2_sf@overlay-primary-update-sf-dmg-area-1: > > □ shard-tglb: NOTRUN -> SKIP ([i915#2920]) > > □ shard-skl: NOTRUN -> 
SKIP (fdo#109271 / [i915#658]) > > □ shard-iclb: NOTRUN -> SKIP ([i915#658]) > > • igt@kms_psr2_sf@primary-plane-update-sf-dmg-area-5: > > □ shard-apl: NOTRUN -> SKIP (fdo#109271 / [i915#658]) +3 similar issues > • igt@kms_psr@psr2_suspend: > > □ shard-iclb: PASS -> SKIP (fdo#109441) > • igt@kms_vblank@pipe-d-wait-idle: > > □ shard-apl: NOTRUN -> SKIP (fdo#109271 / [i915#533]) > • igt@kms_writeback@writeback-check-output: > > □ shard-iclb: NOTRUN -> SKIP ([i915#2437]) > > □ shard-skl: NOTRUN -> SKIP (fdo#109271 / [i915#2437]) > > □ shard-tglb: NOTRUN -> SKIP ([i915#2437]) > > • igt@kms_writeback@writeback-invalid-parameters: > > □ shard-apl: NOTRUN -> SKIP (fdo#109271 / [i915#2437]) > • igt@nouveau_crc@pipe-d-ctx-flip-detection: > > □ shard-tglb: NOTRUN -> SKIP ([i915#2530]) > • igt@perf@polling-parameterized: > > □ shard-skl: PASS -> FAIL (i915#1542) > • igt@prime_nv_api@i915_nv_reimport_twice_check_flink_name: > > □ shard-tglb: NOTRUN -> SKIP (fdo#109291) > • igt@sysfs_clients@fair-1: > > □ shard-skl: NOTRUN -> SKIP (fdo#109271 / [i915#2994]) > • igt@sysfs_clients@fair-7: > > □ shard-apl: NOTRUN -> SKIP (fdo#109271 / [i915#2994]) +1 similar issue > • igt@sysfs_clients@sema-10: > > □ shard-tglb: NOTRUN -> SKIP ([i915#2994]) +1 similar issue > > Possible fixes > > • igt@gem_eio@in-flight-10ms: > > □ {shard-rkl}: TIMEOUT ([i915#3063]) -> PASS > • igt@gem_eio@unwedge-stress: > > □ shard-tglb: TIMEOUT (i915#2369 / [i915#3063] / [i915#3648]) -> PASS > > □ {shard-rkl}: FAIL ([i915#3115]) -> PASS > > • igt@gem_exec_fair@basic-flow@rcs0: > > □ shard-kbl: SKIP (fdo#109271) -> PASS > • igt@gem_exec_fair@basic-none-share@rcs0: > > □ shard-apl: SKIP (fdo#109271) -> PASS > • igt@gem_exec_fair@basic-none-solo@rcs0: > > □ shard-kbl: FAIL ([i915#2842]) -> PASS +1 similar issue > • igt@gem_exec_fair@basic-none@rcs0: > > □ shard-glk: FAIL ([i915#2842]) -> PASS > • igt@gem_exec_fair@basic-throttle@rcs0: > > □ {shard-rkl}: FAIL ([i915#2842]) -> PASS > > □ shard-tglb: FAIL 
([i915#2842]) -> PASS > > • igt@gem_mmap_gtt@cpuset-big-copy-xy: > > □ {shard-rkl}: FAIL ([i915#307]) -> PASS > • igt@kms_cursor_crc@pipe-c-cursor-suspend: > > □ shard-skl: INCOMPLETE ([i915#300]) -> PASS > • igt@kms_cursor_legacy@flip-vs-cursor-atomic-transitions-varying-size: > > □ shard-skl: FAIL (i915#2346 / [i915#533]) -> PASS > > □ shard-glk: FAIL (i915#2346 / [i915#533]) -> PASS > > • igt@kms_plane_alpha_blend@pipe-c-constant-alpha-min: > > □ shard-skl: FAIL (fdo#108145 / [i915#265]) -> PASS > • igt@perf@blocking: > > □ shard-skl: FAIL (i915#1542) -> PASS > • igt@perf@polling-parameterized: > > □ {shard-rkl}: FAIL (i915#1542) -> PASS > > Warnings > > • igt@gem_exec_fair@basic-pace-solo@rcs0: > > □ shard-iclb: FAIL ([i915#2851]) -> FAIL ([i915#2842]) > • igt@gem_exec_fair@basic-pace@vcs0: > > □ shard-kbl: FAIL ([i915#2842]) -> SKIP (fdo#109271) > • igt@kms_psr2_sf@plane-move-sf-dmg-area-0: > > □ shard-iclb: SKIP ([i915#2920]) -> SKIP ([i915#658]) > • igt@runner@aborted: > > □ shard-apl: (FAIL, FAIL) ([i915#3002] / [i915#3363]) -> (FAIL, FAIL, > FAIL) (fdo#109271 / i915#180 / i915#1814 / [i915#3363]) > > {name}: This element is suppressed. This means it is ignored when computing > the status of the difference (SUCCESS, WARNING, or FAILURE). > > [i915#2 > > SECURITY NOTE: file ~/.netrc must not be accessible by others
Daniel Vetter pointed out that locking in the GuC submission code was
overly complicated, so let's clean this up a bit before introducing more
features in the GuC submission backend.

Also fix some CI failures, port fixes from our internal tree, and add a
few more selftests for coverage.

Lastly, add some kernel DOC explaining how the GuC submission backend
works.

v2: Fix logic error in 'Workaround reset G2H is received after schedule
done G2H', don't propagate errors to dependent fences in execlists
submission, resolve checkpatch issues, resend to correct lists
v3: Fix issue kicking tasklet, drop guc_active, fix ref counting in
xarray, add guc_id sub structure, drop inline functions, and various
other cleanup suggested by Daniel
v4: Address Daniele's feedback, rebase to tip, resend for CI

Signed-off-by: Matthew Brost <matthew.brost@intel.com>

Matthew Brost (27):
  drm/i915/guc: Fix blocked context accounting
  drm/i915/guc: Fix outstanding G2H accounting
  drm/i915/guc: Unwind context requests in reverse order
  drm/i915/guc: Don't drop ce->guc_active.lock when unwinding context
  drm/i915/guc: Process all G2H message at once in work queue
  drm/i915/guc: Workaround reset G2H is received after schedule done G2H
  Revert "drm/i915/gt: Propagate change in error status to children on
    unhold"
  drm/i915/selftests: Add a cancel request selftest that triggers a
    reset
  drm/i915/guc: Kick tasklet after queuing a request
  drm/i915/guc: Don't enable scheduling on a banned context, guc_id
    invalid, not registered
  drm/i915/guc: Copy whole golden context, set engine state size of
    subset
  drm/i915/selftests: Add initial GuC selftest for scrubbing lost G2H
  drm/i915/guc: Take context ref when cancelling request
  drm/i915/guc: Don't touch guc_state.sched_state without a lock
  drm/i915/guc: Reset LRC descriptor if register returns -ENODEV
  drm/i915: Allocate error capture in nowait context
  drm/i915/guc: Flush G2H work queue during reset
  drm/i915/guc: Release submit fence from an irq_work
  drm/i915/guc: Move guc_blocked fence to struct guc_state
  drm/i915/guc: Rework and simplify locking
  drm/i915/guc: Proper xarray usage for contexts_lookup
  drm/i915/guc: Drop pin count check trick between sched_disable and
    re-pin
  drm/i915/guc: Move GuC priority fields in context under guc_active
  drm/i915/guc: Move fields protected by guc->contexts_lock into sub
    structure
  drm/i915/guc: Drop guc_active move everything into guc_state
  drm/i915/guc: Add GuC kernel doc
  drm/i915/guc: Drop static inline functions intel_guc_submission.c

 drivers/gpu/drm/i915/gt/intel_context.c       |  19 +-
 drivers/gpu/drm/i915/gt/intel_context_types.h |  81 +-
 .../drm/i915/gt/intel_execlists_submission.c  |   4 -
 drivers/gpu/drm/i915/gt/selftest_hangcheck.c  |   6 +-
 drivers/gpu/drm/i915/gt/uc/intel_guc.h        |  19 +-
 drivers/gpu/drm/i915/gt/uc/intel_guc_ads.c    |  28 +-
 drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c     |   6 +-
 .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 996 +++++++++++------
 drivers/gpu/drm/i915/gt/uc/selftest_guc.c     | 127 +++
 drivers/gpu/drm/i915/i915_gpu_error.c         |  39 +-
 drivers/gpu/drm/i915/i915_request.h           |  23 +-
 drivers/gpu/drm/i915/i915_trace.h             |  12 +-
 .../drm/i915/selftests/i915_live_selftests.h  |   1 +
 drivers/gpu/drm/i915/selftests/i915_request.c | 100 ++
 .../i915/selftests/intel_scheduler_helpers.c  |  12 +
 .../i915/selftests/intel_scheduler_helpers.h  |   2 +
 16 files changed, 983 insertions(+), 492 deletions(-)
 create mode 100644 drivers/gpu/drm/i915/gt/uc/selftest_guc.c