Message ID | 1454095171-22475-7-git-send-email-david.s.gordon@intel.com (mailing list archive)
---|---
State | New, archived |
On Fri, Jan 29, 2016 at 07:19:31PM +0000, Dave Gordon wrote:
> From: Nick Hoath <nicholas.hoath@intel.com>
>
> Swap the order of context & engine cleanup, so that contexts are cleaned
> up first, and *then* engines. This is a more sensible order anyway, but
> in particular has become necessary since the 'intel_ring_initialized()
> must be simple and inline' patch, which now uses ring->dev as an
> 'initialised' flag, so it can now be NULL after engine teardown. This in
> turn can cause a problem in the context code, which (used to) check the
> ring->dev->struct_mutex -- causing a fault if ring->dev was NULL.
>
> Also rename the cleanup function to reflect what it actually does
> (cleanup engines, not a ringbuffer), and fix an annoying whitespace
> issue.
>
> v2: Also make the fix in i915_load_modeset_init, not just in
>     i915_driver_unload (Chris Wilson)
> v3: Had extra stuff in it.
> v4: Reverted extra stuff (so we're back to v2).
>     Rebased and updated commentary above (Dave Gordon).
>
> Signed-off-by: Nick Hoath <nicholas.hoath@intel.com>
> Signed-off-by: David Gordon <david.s.gordon@intel.com>
> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>

Have to drop that, with the recent context changes.

You have to move the gpu-reset now for execlists.

Basically pull it out into:

static void i915_unload_gem(struct drm_device *dev)
{
	/*
	 * Neither the BIOS, ourselves or any other kernel
	 * expects the system to be in execlists mode on startup,
	 * so we need to reset the GPU back to legacy mode. And the only
	 * known way to disable logical contexts is through a GPU reset.
	 *
	 * So in order to leave the system in a known default configuration,
	 * always reset the GPU upon unload. This also cleans up the GEM
	 * state tracking, flushing off the requests and leaving the system
	 * idle.
	 *
	 * Note that it is of the utmost importance that the GPU is idle and
	 * all stray writes are flushed *before* we dismantle the backing
	 * storage for the pinned objects.
	 */
	i915_reset(dev);

	mutex_lock(&dev->struct_mutex);
	i915_gem_context_fini(dev);
	i915_gem_cleanup_ringbuffer(dev);
	mutex_unlock(&dev->struct_mutex);
}

and then kill the intel_gpu_reset along both the cleanup paths.
-Chris
On Sat, Jan 30, 2016 at 11:17:07AM +0000, Chris Wilson wrote:
> On Fri, Jan 29, 2016 at 07:19:31PM +0000, Dave Gordon wrote:
> > From: Nick Hoath <nicholas.hoath@intel.com>
> >
> > Swap the order of context & engine cleanup, so that contexts are cleaned
> > up first, and *then* engines.
[snip]
> Have to drop that, with the recent context changes.
>
> You have to move the gpu-reset now for execlists.
>
> Basically pull it out into:
>
> static void i915_unload_gem(struct drm_device *dev)
[snip]
> and then kill the intel_gpu_reset along both the cleanup paths.

It appears this patch was applied without dropping my r-b for the issue
I pointed out above.
-Chris
Chris Wilson <chris@chris-wilson.co.uk> writes:

> On Sat, Jan 30, 2016 at 11:17:07AM +0000, Chris Wilson wrote:
>> On Fri, Jan 29, 2016 at 07:19:31PM +0000, Dave Gordon wrote:
>> > From: Nick Hoath <nicholas.hoath@intel.com>
>> >
>> > Swap the order of context & engine cleanup, so that contexts are cleaned
>> > up first, and *then* engines.
[snip]
>
> It appears this patch was applied without dropping my r-b for the issue
> I pointed out above.

Now causes a splat in intel_logical_ring_cleanup when unloading module.

Best to revert and rework on top of Dave's cleanup set?
-Mika

[   58.170369] BUG: unable to handle kernel NULL pointer dereference at (null)
[   58.170374] IP: [<ffffffffa00e04d3>] intel_logical_ring_cleanup+0x83/0x100 [i915]
[   58.170389] PGD 0
[   58.170391] Oops: 0000 [#1] PREEMPT SMP
[   58.170394] Modules linked in: x86_pkg_temp_thermal intel_powerclamp coretemp crct10dif_pclmul crc32_pclmul ghash_clmulni_intel i915(-) mei_me mei i2c_hid e1000e ptp pps_core
[   58.170404] CPU: 0 PID: 5766 Comm: rmmod Not tainted 4.5.0-rc3+ #43
[   58.170407] Hardware name: System manufacturer System Product Name/Z170M-PLUS, BIOS 0505 11/16/2015
[   58.170410] task: ffff880036aab380 ti: ffff8802374c0000 task.ti: ffff8802374c0000
[   58.170412] RIP: 0010:[<ffffffffa00e04d3>]  [<ffffffffa00e04d3>] intel_logical_ring_cleanup+0x83/0x100 [i915]
[   58.170424] RSP: 0018:ffff8802374c3d30  EFLAGS: 00010282
[   58.170425] RAX: 0000000000000000 RBX: ffff880237203788 RCX: 0000000000000001
[   58.170428] RDX: 0000000087654321 RSI: 000000000000000d RDI: ffff8802372037c0
[   58.170430] RBP: ffff8802374c3d40 R08: 0000000000000000 R09: 0000000ad856c000
[   58.170432] R10: 0000000000000000 R11: 0000000000000001 R12: ffff880237200000
[   58.170434] R13: ffff880237209638 R14: ffff880237323000 R15: 0000558a770bd090
[   58.170438] FS:  00007f8b439ff700(0000) GS:ffff880239800000(0000) knlGS:0000000000000000
[   58.170442] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   58.170444] CR2: 0000000000000000 CR3: 000000023532d000 CR4: 00000000003406f0
[   58.170447] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[   58.170450] DR3: 0000000000000000 DR6: 00000000ffff4ff0 DR7: 0000000000000400
[   58.170453] Stack:
[   58.170455]  ffff880237203788 ffff880237200000 ffff8802374c3d70 ffffffffa00d0ed4
[   58.170460]  ffff880237200000 ffff880237323000 ffff880237323060 ffffffffa0194360
[   58.170465]  ffff8802374c3d98 ffffffffa0154520 ffff880237323000 ffff880237323000
[   58.170469] Call Trace:
[   58.170479]  [<ffffffffa00d0ed4>] i915_gem_cleanup_engines+0x34/0x60 [i915]
[   58.170493]  [<ffffffffa0154520>] i915_driver_unload+0x140/0x220 [i915]
[   58.170497]  [<ffffffff8154a4f4>] drm_dev_unregister+0x24/0xa0
[   58.170501]  [<ffffffff8154aace>] drm_put_dev+0x1e/0x60
[   58.170506]  [<ffffffffa00912a0>] i915_pci_remove+0x10/0x20 [i915]
[   58.170510]  [<ffffffff814766e4>] pci_device_remove+0x34/0xb0
[   58.170514]  [<ffffffff8156e7d5>] __device_release_driver+0x95/0x140
[   58.170518]  [<ffffffff8156e97c>] driver_detach+0xbc/0xc0
[   58.170521]  [<ffffffff8156d883>] bus_remove_driver+0x53/0xd0
[   58.170525]  [<ffffffff8156f3a7>] driver_unregister+0x27/0x50
[   58.170528]  [<ffffffff81475725>] pci_unregister_driver+0x25/0x70
[   58.170531]  [<ffffffff8154c274>] drm_pci_exit+0x74/0x90
[   58.170543]  [<ffffffffa0154cb0>] i915_exit+0x20/0x1aa [i915]
[   58.170548]  [<ffffffff8111846f>] SyS_delete_module+0x18f/0x1f0

> -Chris
>
> --
> Chris Wilson, Intel Open Source Technology Centre
On 11/02/16 15:02, Mika Kuoppala wrote:
> Chris Wilson <chris@chris-wilson.co.uk> writes:
>> On Sat, Jan 30, 2016 at 11:17:07AM +0000, Chris Wilson wrote:
>>> On Fri, Jan 29, 2016 at 07:19:31PM +0000, Dave Gordon wrote:
>>>> From: Nick Hoath <nicholas.hoath@intel.com>
>>>>
>>>> Swap the order of context & engine cleanup, so that contexts are cleaned
>>>> up first, and *then* engines.
[snip]
>> It appears this patch was applied without dropping my r-b for the issue
>> I pointed out above.
>
> Now causes a splat in intel_logical_ring_cleanup when unloading module.
>
> Best to revert and rework on top of Dave's cleanup set?
> -Mika

This whole patchset was already superseded by a newer version before
Daniel merged it. The newer version doesn't have Chris's R-B on this
(context/engine) patch, only on two others that are still as they were
when he reviewed them.

Please go and look instead at [v5, 11 patches] posted 2016-02-05.

.Dave.
On 11/02/16 13:36, Chris Wilson wrote:
> On Sat, Jan 30, 2016 at 11:17:07AM +0000, Chris Wilson wrote:
>> On Fri, Jan 29, 2016 at 07:19:31PM +0000, Dave Gordon wrote:
>>> From: Nick Hoath <nicholas.hoath@intel.com>
>>>
>>> Swap the order of context & engine cleanup, so that contexts are cleaned
>>> up first, and *then* engines. This is a more sensible order anyway, but
>>> in particular has become necessary since the 'intel_ring_initialized()
>>> must be simple and inline' patch, which now uses ring->dev as an
>>> 'initialised' flag, so it can now be NULL after engine teardown. This in
>>> turn can cause a problem in the context code, which (used to) check the
>>> ring->dev->struct_mutex -- causing a fault if ring->dev was NULL.
[snip]
> It appears this patch was applied without dropping my r-b for the issue
> I pointed out above.
> -Chris

Not only that, but it was plucked from the end of the set of 6 patches
without the earlier ones in the sequence, despite the cover letter [0/6]
saying this:

    Patches reordered again; the bug fixes are now in patches 3 and 5.
    The former could stand alone; the latter depends on patch 4 and is a
    prerequisite for Nick's patch [6/6] to function.

So there's no chance that this one alone could be expected to work :(

.Dave.
diff --git a/drivers/gpu/drm/i915/i915_dma.c b/drivers/gpu/drm/i915/i915_dma.c
index db9b0c6..3eb6844 100644
--- a/drivers/gpu/drm/i915/i915_dma.c
+++ b/drivers/gpu/drm/i915/i915_dma.c
@@ -444,8 +444,8 @@ static int i915_load_modeset_init(struct drm_device *dev)
 cleanup_gem:
 	mutex_lock(&dev->struct_mutex);
-	i915_gem_cleanup_ringbuffer(dev);
 	i915_gem_context_fini(dev);
+	i915_gem_cleanup_engines(dev);
 	mutex_unlock(&dev->struct_mutex);
 cleanup_irq:
 	intel_guc_ucode_fini(dev);
@@ -1220,8 +1220,8 @@ int i915_driver_unload(struct drm_device *dev)
 	intel_guc_ucode_fini(dev);
 
 	mutex_lock(&dev->struct_mutex);
-	i915_gem_cleanup_ringbuffer(dev);
 	i915_gem_context_fini(dev);
+	i915_gem_cleanup_engines(dev);
 	mutex_unlock(&dev->struct_mutex);
 
 	intel_fbc_cleanup_cfb(dev_priv);
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 905e90f..8baaca7 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -3020,7 +3020,7 @@ int i915_gem_init_rings(struct drm_device *dev);
 int __must_check i915_gem_init_hw(struct drm_device *dev);
 int i915_gem_l3_remap(struct drm_i915_gem_request *req, int slice);
 void i915_gem_init_swizzling(struct drm_device *dev);
-void i915_gem_cleanup_ringbuffer(struct drm_device *dev);
+void i915_gem_cleanup_engines(struct drm_device *dev);
 int __must_check i915_gpu_idle(struct drm_device *dev);
 int __must_check i915_gem_suspend(struct drm_device *dev);
 void __i915_add_request(struct drm_i915_gem_request *req,
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 5a4d468..4ee6df9 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -4912,7 +4912,7 @@ int i915_gem_init_rings(struct drm_device *dev)
 		req = i915_gem_request_alloc(ring, NULL);
 		if (IS_ERR(req)) {
 			ret = PTR_ERR(req);
-			i915_gem_cleanup_ringbuffer(dev);
+			i915_gem_cleanup_engines(dev);
 			goto out;
 		}
 
@@ -4925,7 +4925,7 @@ int i915_gem_init_rings(struct drm_device *dev)
 		if (ret && ret != -EIO) {
 			DRM_ERROR("PPGTT enable ring #%d failed %d\n", i, ret);
 			i915_gem_request_cancel(req);
-			i915_gem_cleanup_ringbuffer(dev);
+			i915_gem_cleanup_engines(dev);
 			goto out;
 		}
 
@@ -4933,7 +4933,7 @@ int i915_gem_init_rings(struct drm_device *dev)
 		if (ret && ret != -EIO) {
 			DRM_ERROR("Context enable ring #%d failed %d\n", i, ret);
 			i915_gem_request_cancel(req);
-			i915_gem_cleanup_ringbuffer(dev);
+			i915_gem_cleanup_engines(dev);
 			goto out;
 		}
 
@@ -5011,7 +5011,7 @@ int i915_gem_init(struct drm_device *dev)
 }
 
 void
-i915_gem_cleanup_ringbuffer(struct drm_device *dev)
+i915_gem_cleanup_engines(struct drm_device *dev)
 {
 	struct drm_i915_private *dev_priv = dev->dev_private;
 	struct intel_engine_cs *ring;
@@ -5020,13 +5020,14 @@ int i915_gem_init(struct drm_device *dev)
 	for_each_ring(ring, dev_priv, i)
 		dev_priv->gt.cleanup_ring(ring);
 
-	if (i915.enable_execlists)
-		/*
-		 * Neither the BIOS, ourselves or any other kernel
-		 * expects the system to be in execlists mode on startup,
-		 * so we need to reset the GPU back to legacy mode.
-		 */
-		intel_gpu_reset(dev);
+	if (i915.enable_execlists) {
+		/*
+		 * Neither the BIOS, ourselves or any other kernel
+		 * expects the system to be in execlists mode on startup,
+		 * so we need to reset the GPU back to legacy mode.
+		 */
+		intel_gpu_reset(dev);
+	}
 }
 
 static void