diff mbox

drm/i915: Hold a RPM reference during i915_driver_unload

Message ID 1451386533-1811-1-git-send-email-gabriel.feceoru@intel.com (mailing list archive)
State New, archived
Headers show

Commit Message

Feceoru, Gabriel Dec. 29, 2015, 10:55 a.m. UTC
This fixes an issue added with: "1f814da drm/i915: add support for checking
if we hold an RPM reference", noticed while running drv_module_reload_basic.

WARNING: CPU: 1 PID: 2032 at drivers/gpu/drm/i915/intel_drv.h:1446 gen6_read32+0x1ca/0x1e0 [i915]()
[  138.682686] RPM wakelock ref not held during HW access
[  138.682687] Modules linked in:
[  138.682688]  i915(-) drm_kms_helper drm snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_codec_generic snd_hda_codec snd_hwdep x86_pkg_temp_thermal snd_hda_core i2c_algo_bit syscopyarea sysfillrect sysimgblt fb_sys_fops xhci_pci ehci_pci r8169 xhci_hcd mii ehci_hcd video [last unloaded: snd_hda_intel]
[  138.682699] CPU: 1 PID: 2032 Comm: rmmod Tainted: G        W       4.4.0-rc4+ #44
[  138.682701] Hardware name: Dell Inc. Inspiron 3847/088DT1       , BIOS A06 01/15/2015
[  138.682702]  ffffffffc03b6358 ffff880210d8ba58 ffffffff813e0c0f ffff880210d8baa0
[  138.682703]  ffff880210d8ba90 ffffffff8105f6a2 ffff8800daa40000 0000000000064400
[  138.682705]  0000000000000004 ffff880210d8bb9c ffff8800daa40000 ffff880210d8baf0
[  138.682706] Call Trace:
[  138.682710]  [<ffffffff813e0c0f>] dump_stack+0x44/0x55
[  138.682713]  [<ffffffff8105f6a2>] warn_slowpath_common+0x82/0xc0
[  138.682715]  [<ffffffff8105f72c>] warn_slowpath_fmt+0x4c/0x50
[  138.682725]  [<ffffffffc031aefc>] ? i915_gem_object_unpin_from_display_plane+0x1c/0x50 [i915]
[  138.682734]  [<ffffffffc0333b9a>] gen6_read32+0x1ca/0x1e0 [i915]
[  138.682737]  [<ffffffff8172c562>] ? mutex_lock+0x12/0x30
[  138.682747]  [<ffffffffc03715ca>] intel_ddi_get_hw_state+0x7a/0x180 [i915]
[  138.682758]  [<ffffffffc0355c88>] intel_connector_get_hw_state+0x28/0x30 [i915]
[  138.682767]  [<ffffffffc03543fc>] intel_atomic_commit+0xa9c/0x17e0 [i915]
[  138.682779]  [<ffffffffc00a7e8e>] ? drm_atomic_check_only+0x18e/0x590 [drm]
[  138.682786]  [<ffffffffc00a78cc>] ? drm_atomic_add_affected_connectors+0x8c/0xf0 [drm]
[  138.682792]  [<ffffffffc00a82c7>] drm_atomic_commit+0x37/0x60 [drm]
[  138.682797]  [<ffffffffc0163356>] drm_atomic_helper_set_config+0x76/0xb0 [drm_kms_helper]
[  138.682804]  [<ffffffffc00a696a>] ? drm_modeset_lock_all_ctx+0x9a/0xb0 [drm]
[  138.682809]  [<ffffffffc00979c2>] drm_mode_set_config_internal+0x62/0x100 [drm]
[  138.682814]  [<ffffffffc0097b48>] drm_framebuffer_remove+0xe8/0x120 [drm]
[  138.682826]  [<ffffffffc036bb4d>] intel_fbdev_fini+0x6d/0x90 [i915]
[  138.682838]  [<ffffffffc0396b9a>] i915_driver_unload+0x1a/0x290 [i915]
[  138.682844]  [<ffffffffc0090ff9>] drm_dev_unregister+0x29/0xb0 [drm]
[  138.682848]  [<ffffffffc0091673>] drm_put_dev+0x23/0x60 [drm]
[  138.682854]  [<ffffffffc02dc315>] i915_pci_remove+0x15/0x20 [i915]
[  138.682856]  [<ffffffff8141f409>] pci_device_remove+0x39/0xc0
[  138.682859]  [<ffffffff814e3d61>] __device_release_driver+0xa1/0x150
[  138.682860]  [<ffffffff814e4833>] driver_detach+0xa3/0xb0
[  138.682862]  [<ffffffff814e3825>] bus_remove_driver+0x55/0xd0
[  138.682864]  [<ffffffff814e4e2c>] driver_unregister+0x2c/0x50
[  138.682866]  [<ffffffff8141db31>] pci_unregister_driver+0x21/0x90
[  138.682871]  [<ffffffffc0092ec4>] drm_pci_exit+0x94/0xb0 [drm]
[  138.682883]  [<ffffffffc0397404>] i915_exit+0x20/0xc1c [i915]

Reported-by: Marius Vlad <marius.c.vlad@intel.com>
Signed-off-by: Gabriel Feceoru <gabriel.feceoru@intel.com>
---
 drivers/gpu/drm/i915/i915_dma.c | 6 ++++++
 1 file changed, 6 insertions(+)

Comments

Joonas Lahtinen Dec. 30, 2015, 1:03 p.m. UTC | #1
Hi,

On ti, 2015-12-29 at 12:55 +0200, Gabriel Feceoru wrote:
> This fixes an issue added with: "1f814da drm/i915: add support for
> checking
> if we hold an RPM reference", noticed while running
> drv_module_reload_basic.
> 
> WARNING: CPU: 1 PID: 2032 at drivers/gpu/drm/i915/intel_drv.h:1446
> gen6_read32+0x1ca/0x1e0 [i915]()
> [  138.682686] RPM wakelock ref not held during HW access
> [  138.682687] Modules linked in:
> [  138.682688]  i915(-) drm_kms_helper drm snd_hda_codec_hdmi
> snd_hda_codec_realtek snd_hda_codec_generic snd_hda_codec snd_hwdep
> x86_pkg_temp_thermal snd_hda_core i2c_algo_bit syscopyarea
> sysfillrect sysimgblt fb_sys_fops xhci_pci ehci_pci r8169 xhci_hcd
> mii ehci_hcd video [last unloaded: snd_hda_intel]
> [  138.682699] CPU: 1 PID: 2032 Comm: rmmod Tainted: G        W      
>  4.4.0-rc4+ #44
> [  138.682701] Hardware name: Dell Inc. Inspiron 3847/088DT1       ,
> BIOS A06 01/15/2015
> [  138.682702]  ffffffffc03b6358 ffff880210d8ba58 ffffffff813e0c0f
> ffff880210d8baa0
> [  138.682703]  ffff880210d8ba90 ffffffff8105f6a2 ffff8800daa40000
> 0000000000064400
> [  138.682705]  0000000000000004 ffff880210d8bb9c ffff8800daa40000
> ffff880210d8baf0
> [  138.682706] Call Trace:
> [  138.682710]  [<ffffffff813e0c0f>] dump_stack+0x44/0x55
> [  138.682713]  [<ffffffff8105f6a2>] warn_slowpath_common+0x82/0xc0
> [  138.682715]  [<ffffffff8105f72c>] warn_slowpath_fmt+0x4c/0x50
> [  138.682725]  [<ffffffffc031aefc>] ?
> i915_gem_object_unpin_from_display_plane+0x1c/0x50 [i915]
> [  138.682734]  [<ffffffffc0333b9a>] gen6_read32+0x1ca/0x1e0 [i915]
> [  138.682737]  [<ffffffff8172c562>] ? mutex_lock+0x12/0x30
> [  138.682747]  [<ffffffffc03715ca>]
> intel_ddi_get_hw_state+0x7a/0x180 [i915]
> [  138.682758]  [<ffffffffc0355c88>]
> intel_connector_get_hw_state+0x28/0x30 [i915]
> [  138.682767]  [<ffffffffc03543fc>] intel_atomic_commit+0xa9c/0x17e0
> [i915]
> [  138.682779]  [<ffffffffc00a7e8e>] ?
> drm_atomic_check_only+0x18e/0x590 [drm]
> [  138.682786]  [<ffffffffc00a78cc>] ?
> drm_atomic_add_affected_connectors+0x8c/0xf0 [drm]
> [  138.682792]  [<ffffffffc00a82c7>] drm_atomic_commit+0x37/0x60
> [drm]
> [  138.682797]  [<ffffffffc0163356>]
> drm_atomic_helper_set_config+0x76/0xb0 [drm_kms_helper]
> [  138.682804]  [<ffffffffc00a696a>] ?
> drm_modeset_lock_all_ctx+0x9a/0xb0 [drm]
> [  138.682809]  [<ffffffffc00979c2>]
> drm_mode_set_config_internal+0x62/0x100 [drm]
> [  138.682814]  [<ffffffffc0097b48>]
> drm_framebuffer_remove+0xe8/0x120 [drm]
> [  138.682826]  [<ffffffffc036bb4d>] intel_fbdev_fini+0x6d/0x90
> [i915]
> [  138.682838]  [<ffffffffc0396b9a>] i915_driver_unload+0x1a/0x290
> [i915]
> [  138.682844]  [<ffffffffc0090ff9>] drm_dev_unregister+0x29/0xb0
> [drm]
> [  138.682848]  [<ffffffffc0091673>] drm_put_dev+0x23/0x60 [drm]
> [  138.682854]  [<ffffffffc02dc315>] i915_pci_remove+0x15/0x20 [i915]
> [  138.682856]  [<ffffffff8141f409>] pci_device_remove+0x39/0xc0
> [  138.682859]  [<ffffffff814e3d61>]
> __device_release_driver+0xa1/0x150
> [  138.682860]  [<ffffffff814e4833>] driver_detach+0xa3/0xb0
> [  138.682862]  [<ffffffff814e3825>] bus_remove_driver+0x55/0xd0
> [  138.682864]  [<ffffffff814e4e2c>] driver_unregister+0x2c/0x50
> [  138.682866]  [<ffffffff8141db31>] pci_unregister_driver+0x21/0x90
> [  138.682871]  [<ffffffffc0092ec4>] drm_pci_exit+0x94/0xb0 [drm]
> [  138.682883]  [<ffffffffc0397404>] i915_exit+0x20/0xc1c [i915]
> 
> Reported-by: Marius Vlad <marius.c.vlad@intel.com>
> Signed-off-by: Gabriel Feceoru <gabriel.feceoru@intel.com>
> ---
>  drivers/gpu/drm/i915/i915_dma.c | 6 ++++++
>  1 file changed, 6 insertions(+)
> 
> diff --git a/drivers/gpu/drm/i915/i915_dma.c
> b/drivers/gpu/drm/i915/i915_dma.c
> index 988a380..08ad01f0 100644
> --- a/drivers/gpu/drm/i915/i915_dma.c
> +++ b/drivers/gpu/drm/i915/i915_dma.c
> @@ -1136,6 +1136,8 @@ int i915_driver_unload(struct drm_device *dev)
>  	struct drm_i915_private *dev_priv = dev->dev_private;
>  	int ret;
>  
> +	intel_runtime_pm_get(dev_priv);
> +
>  	intel_fbdev_fini(dev);
>  
>  	i915_audio_component_cleanup(dev_priv);
> @@ -1143,6 +1145,7 @@ int i915_driver_unload(struct drm_device *dev)
>  	ret = i915_gem_suspend(dev);
>  	if (ret) {
>  		DRM_ERROR("failed to idle hardware: %d\n", ret);
> +		intel_runtime_pm_put(dev_priv);

This should be made into goto construct.

>  		return ret;
>  	}
>  
> @@ -1221,6 +1224,9 @@ int i915_driver_unload(struct drm_device *dev)
>  	kmem_cache_destroy(dev_priv->vmas);
>  	kmem_cache_destroy(dev_priv->objects);
>  	pci_dev_put(dev_priv->bridge_dev);
> +
> +	intel_runtime_pm_put(dev_priv);
> +

Not sure if we should/can keep the runtime reference until this point.
At worst this could lead into the runtime_pm_put function poking at the
hardware registers after the pci_dev has been released.

Also if we change the hangcheck task to execute depending on the
runtime_pm count, this will surely cause trouble. Added Imre as CC to
comment on this.

>  	kfree(dev_priv);
> 
> 	return 0;

Insert goto label around here and make it "return ret;".

Regards, Joonas

>
Feceoru, Gabriel Dec. 31, 2015, 10:44 a.m. UTC | #2
On 30.12.2015 15:03, Joonas Lahtinen wrote:
> Hi,
>
> On ti, 2015-12-29 at 12:55 +0200, Gabriel Feceoru wrote:
>> This fixes an issue added with: "1f814da drm/i915: add support for
>> checking
>> if we hold an RPM reference", noticed while running
>> drv_module_reload_basic.
>>
>> WARNING: CPU: 1 PID: 2032 at drivers/gpu/drm/i915/intel_drv.h:1446
>> gen6_read32+0x1ca/0x1e0 [i915]()
>> [  138.682686] RPM wakelock ref not held during HW access
>> [  138.682687] Modules linked in:
>> [  138.682688]  i915(-) drm_kms_helper drm snd_hda_codec_hdmi
>> snd_hda_codec_realtek snd_hda_codec_generic snd_hda_codec snd_hwdep
>> x86_pkg_temp_thermal snd_hda_core i2c_algo_bit syscopyarea
>> sysfillrect sysimgblt fb_sys_fops xhci_pci ehci_pci r8169 xhci_hcd
>> mii ehci_hcd video [last unloaded: snd_hda_intel]
>> [  138.682699] CPU: 1 PID: 2032 Comm: rmmod Tainted: G        W
>>   4.4.0-rc4+ #44
>> [  138.682701] Hardware name: Dell Inc. Inspiron 3847/088DT1       ,
>> BIOS A06 01/15/2015
>> [  138.682702]  ffffffffc03b6358 ffff880210d8ba58 ffffffff813e0c0f
>> ffff880210d8baa0
>> [  138.682703]  ffff880210d8ba90 ffffffff8105f6a2 ffff8800daa40000
>> 0000000000064400
>> [  138.682705]  0000000000000004 ffff880210d8bb9c ffff8800daa40000
>> ffff880210d8baf0
>> [  138.682706] Call Trace:
>> [  138.682710]  [<ffffffff813e0c0f>] dump_stack+0x44/0x55
>> [  138.682713]  [<ffffffff8105f6a2>] warn_slowpath_common+0x82/0xc0
>> [  138.682715]  [<ffffffff8105f72c>] warn_slowpath_fmt+0x4c/0x50
>> [  138.682725]  [<ffffffffc031aefc>] ?
>> i915_gem_object_unpin_from_display_plane+0x1c/0x50 [i915]
>> [  138.682734]  [<ffffffffc0333b9a>] gen6_read32+0x1ca/0x1e0 [i915]
>> [  138.682737]  [<ffffffff8172c562>] ? mutex_lock+0x12/0x30
>> [  138.682747]  [<ffffffffc03715ca>]
>> intel_ddi_get_hw_state+0x7a/0x180 [i915]
>> [  138.682758]  [<ffffffffc0355c88>]
>> intel_connector_get_hw_state+0x28/0x30 [i915]
>> [  138.682767]  [<ffffffffc03543fc>] intel_atomic_commit+0xa9c/0x17e0
>> [i915]
>> [  138.682779]  [<ffffffffc00a7e8e>] ?
>> drm_atomic_check_only+0x18e/0x590 [drm]
>> [  138.682786]  [<ffffffffc00a78cc>] ?
>> drm_atomic_add_affected_connectors+0x8c/0xf0 [drm]
>> [  138.682792]  [<ffffffffc00a82c7>] drm_atomic_commit+0x37/0x60
>> [drm]
>> [  138.682797]  [<ffffffffc0163356>]
>> drm_atomic_helper_set_config+0x76/0xb0 [drm_kms_helper]
>> [  138.682804]  [<ffffffffc00a696a>] ?
>> drm_modeset_lock_all_ctx+0x9a/0xb0 [drm]
>> [  138.682809]  [<ffffffffc00979c2>]
>> drm_mode_set_config_internal+0x62/0x100 [drm]
>> [  138.682814]  [<ffffffffc0097b48>]
>> drm_framebuffer_remove+0xe8/0x120 [drm]
>> [  138.682826]  [<ffffffffc036bb4d>] intel_fbdev_fini+0x6d/0x90
>> [i915]
>> [  138.682838]  [<ffffffffc0396b9a>] i915_driver_unload+0x1a/0x290
>> [i915]
>> [  138.682844]  [<ffffffffc0090ff9>] drm_dev_unregister+0x29/0xb0
>> [drm]
>> [  138.682848]  [<ffffffffc0091673>] drm_put_dev+0x23/0x60 [drm]
>> [  138.682854]  [<ffffffffc02dc315>] i915_pci_remove+0x15/0x20 [i915]
>> [  138.682856]  [<ffffffff8141f409>] pci_device_remove+0x39/0xc0
>> [  138.682859]  [<ffffffff814e3d61>]
>> __device_release_driver+0xa1/0x150
>> [  138.682860]  [<ffffffff814e4833>] driver_detach+0xa3/0xb0
>> [  138.682862]  [<ffffffff814e3825>] bus_remove_driver+0x55/0xd0
>> [  138.682864]  [<ffffffff814e4e2c>] driver_unregister+0x2c/0x50
>> [  138.682866]  [<ffffffff8141db31>] pci_unregister_driver+0x21/0x90
>> [  138.682871]  [<ffffffffc0092ec4>] drm_pci_exit+0x94/0xb0 [drm]
>> [  138.682883]  [<ffffffffc0397404>] i915_exit+0x20/0xc1c [i915]
>>
>> Reported-by: Marius Vlad <marius.c.vlad@intel.com>
>> Signed-off-by: Gabriel Feceoru <gabriel.feceoru@intel.com>
>> ---
>>   drivers/gpu/drm/i915/i915_dma.c | 6 ++++++
>>   1 file changed, 6 insertions(+)
>>
>> diff --git a/drivers/gpu/drm/i915/i915_dma.c
>> b/drivers/gpu/drm/i915/i915_dma.c
>> index 988a380..08ad01f0 100644
>> --- a/drivers/gpu/drm/i915/i915_dma.c
>> +++ b/drivers/gpu/drm/i915/i915_dma.c
>> @@ -1136,6 +1136,8 @@ int i915_driver_unload(struct drm_device *dev)
>>   	struct drm_i915_private *dev_priv = dev->dev_private;
>>   	int ret;
>>
>> +	intel_runtime_pm_get(dev_priv);
>> +
>>   	intel_fbdev_fini(dev);
>>
>>   	i915_audio_component_cleanup(dev_priv);
>> @@ -1143,6 +1145,7 @@ int i915_driver_unload(struct drm_device *dev)
>>   	ret = i915_gem_suspend(dev);
>>   	if (ret) {
>>   		DRM_ERROR("failed to idle hardware: %d\n", ret);
>> +		intel_runtime_pm_put(dev_priv);
>
> This should be made into goto construct.
I cannot have one instance only of intel_runtime_put() at the end of the 
function, since one exit branch uses kfree(dev_priv) and the other 
doesn't. Similar to i915_driver_load()
If this comment refers just to the 'return ret' - this is a legacy issue 
not related to this fix. Could be seen in many places (i915_driver_load 
again), just check i915_driver_open() where the last 3 lines of code 
could be replaced with 'return ret'
>
>>   		return ret;
>>   	}
>>
>> @@ -1221,6 +1224,9 @@ int i915_driver_unload(struct drm_device *dev)
>>   	kmem_cache_destroy(dev_priv->vmas);
>>   	kmem_cache_destroy(dev_priv->objects);
>>   	pci_dev_put(dev_priv->bridge_dev);
>> +
>> +	intel_runtime_pm_put(dev_priv);
>> +
>
> Not sure if we should/can keep the runtime reference until this point.
> At worst this could lead into the runtime_pm_put function poking at the
> hardware registers after the pci_dev has been released.
>
> Also if we change the hangcheck task to execute depending on the
> runtime_pm count, this will surely cause trouble. Added Imre as CC to
> comment on this.
I'm not sure either, but again, the same is done in i915_driver_load().
Waiting for Imre's feedback on this.
>
>>   	kfree(dev_priv);
>>
>> 	return 0;
>
> Insert goto label around here and make it "return ret;".
>
> Regards, Joonas
>
>>
Joonas Lahtinen Dec. 31, 2015, 11:27 a.m. UTC | #3
On to, 2015-12-31 at 12:44 +0200, Gabriel Feceoru wrote:
> 
> On 30.12.2015 15:03, Joonas Lahtinen wrote:
> > Hi

<SNIP>

> > > +++ b/drivers/gpu/drm/i915/i915_dma.c
> > > @@ -1136,6 +1136,8 @@ int i915_driver_unload(struct drm_device
> > > *dev)
> > >   	struct drm_i915_private *dev_priv = dev->dev_private;
> > >   	int ret;
> > > 
> > > +	intel_runtime_pm_get(dev_priv);
> > > +
> > >   	intel_fbdev_fini(dev);
> > > 
> > >   	i915_audio_component_cleanup(dev_priv);
> > > @@ -1143,6 +1145,7 @@ int i915_driver_unload(struct drm_device
> > > *dev)
> > >   	ret = i915_gem_suspend(dev);
> > >   	if (ret) {
> > >   		DRM_ERROR("failed to idle hardware: %d\n",
> > > ret);
> > > +		intel_runtime_pm_put(dev_priv);
> > 
> > This should be made into goto construct.
> I cannot have one instance only of intel_runtime_put() at the end of
> the 
> function, since one exit branch uses kfree(dev_priv) and the other 
> doesn't. Similar to i915_driver_load()
> If this comment refers just to the 'return ret' - this is a legacy
> issue 
> not related to this fix. Could be seen in many places
> (i915_driver_load 
> again), just check i915_driver_open() where the last 3 lines of code 
> could be replaced with 'return ret'

In that case a second path is made for erroring out of the function
(see i915_driver_load):
	...
	return 0;
label1:
	...
labelN:
	intel_runtime_pm_put(dev_priv)
	return ret;

> > 
> > >   		return ret;
> > >   	}
> > > 
> > > @@ -1221,6 +1224,9 @@ int i915_driver_unload(struct drm_device
> > > *dev)
> > >   	kmem_cache_destroy(dev_priv->vmas);
> > >   	kmem_cache_destroy(dev_priv->objects);
> > >   	pci_dev_put(dev_priv->bridge_dev);
> > > +
> > > +	intel_runtime_pm_put(dev_priv);
> > > +
> > 
> > Not sure if we should/can keep the runtime reference until this
> > point.
> > At worst this could lead into the runtime_pm_put function poking at
> > the
> > hardware registers after the pci_dev has been released.
> > 
> > Also if we change the hangcheck task to execute depending on the
> > runtime_pm count, this will surely cause trouble. Added Imre as CC
> > to
> > comment on this.
> I'm not sure either, but again, the same is done in
> i915_driver_load().

Except that i915_driver_load is the function during which autosuspend
is enabled by calling intel_runtime_pm_enable so it's quite a different
animal. But I think that function has it incorrectly too, the return
path for no errors, _put is correct, but the return path for errors,
the _put should really be _put_noidle because the whole RPM thing was
torn down already see below for explanation.

The load/unload sequences are really not in sync (compare the _unload
function to the _load function teardown path for example), and in
addition to that it is not exactly clear (without going through all the
functions one by one) which functions might end up altering the
hardware state and do require the wakeref and which are tearing down
pure software constructs, so I might be off here, but this is what it
looks like to me.

I feel uneasy for the fact we're calling runtime_pm_put after releasing
the PCI device which basically should be the last put and which should
end up suspending the device when there is no device being held
anymore.

Also, in i915_driver_unload function the PM system teardown has already
been performed by intel_power_domains_fini way earlier in the function,
after which I think we can call at most runtime_pm_put_noidle because
essentially the device has been shut down already. 

I do not think those error paths have been stressed much.

Regards, Joonas

> Waiting for Imre's feedback on this.
> > 
> > >   	kfree(dev_priv);
> > > 
> > > 	return 0;
> > 
> > Insert goto label around here and make it "return ret;".
> > 
> > Regards, Joonas
> > 
> > >
Imre Deak Jan. 14, 2016, 5:30 p.m. UTC | #4
On ke, 2015-12-30 at 15:03 +0200, Joonas Lahtinen wrote:
> Hi,
> 
> On ti, 2015-12-29 at 12:55 +0200, Gabriel Feceoru wrote:
> > This fixes an issue added with: "1f814da drm/i915: add support for
> > checking
> > if we hold an RPM reference", noticed while running
> > drv_module_reload_basic.
> > 
> > WARNING: CPU: 1 PID: 2032 at drivers/gpu/drm/i915/intel_drv.h:1446
> > gen6_read32+0x1ca/0x1e0 [i915]()
> > [  138.682686] RPM wakelock ref not held during HW access
> > [  138.682687] Modules linked in:
> > [  138.682688]  i915(-) drm_kms_helper drm snd_hda_codec_hdmi
> > snd_hda_codec_realtek snd_hda_codec_generic snd_hda_codec snd_hwdep
> > x86_pkg_temp_thermal snd_hda_core i2c_algo_bit syscopyarea
> > sysfillrect sysimgblt fb_sys_fops xhci_pci ehci_pci r8169 xhci_hcd
> > mii ehci_hcd video [last unloaded: snd_hda_intel]
> > [  138.682699] CPU: 1 PID: 2032 Comm: rmmod Tainted: G        W      
> >  4.4.0-rc4+ #44
> > [  138.682701] Hardware name: Dell Inc. Inspiron 3847/088DT1       ,
> > BIOS A06 01/15/2015
> > [  138.682702]  ffffffffc03b6358 ffff880210d8ba58 ffffffff813e0c0f
> > ffff880210d8baa0
> > [  138.682703]  ffff880210d8ba90 ffffffff8105f6a2 ffff8800daa40000
> > 0000000000064400
> > [  138.682705]  0000000000000004 ffff880210d8bb9c ffff8800daa40000
> > ffff880210d8baf0
> > [  138.682706] Call Trace:
> > [  138.682710]  [] dump_stack+0x44/0x55
> > [  138.682713]  [] warn_slowpath_common+0x82/0xc0
> > [  138.682715]  [] warn_slowpath_fmt+0x4c/0x50
> > [  138.682725]  [] ?
> > i915_gem_object_unpin_from_display_plane+0x1c/0x50 [i915]
> > [  138.682734]  [] gen6_read32+0x1ca/0x1e0 [i915]
> > [  138.682737]  [] ? mutex_lock+0x12/0x30
> > [  138.682747]  []
> > intel_ddi_get_hw_state+0x7a/0x180 [i915]
> > [  138.682758]  []
> > intel_connector_get_hw_state+0x28/0x30 [i915]
> > [  138.682767]  [] intel_atomic_commit+0xa9c/0x17e0
> > [i915]
> > [  138.682779]  [] ?
> > drm_atomic_check_only+0x18e/0x590 [drm]
> > [  138.682786]  [] ?
> > drm_atomic_add_affected_connectors+0x8c/0xf0 [drm]
> > [  138.682792]  [] drm_atomic_commit+0x37/0x60
> > [drm]
> > [  138.682797]  []
> > drm_atomic_helper_set_config+0x76/0xb0 [drm_kms_helper]
> > [  138.682804]  [] ?
> > drm_modeset_lock_all_ctx+0x9a/0xb0 [drm]
> > [  138.682809]  []
> > drm_mode_set_config_internal+0x62/0x100 [drm]
> > [  138.682814]  []
> > drm_framebuffer_remove+0xe8/0x120 [drm]
> > [  138.682826]  [] intel_fbdev_fini+0x6d/0x90
> > [i915]
> > [  138.682838]  [] i915_driver_unload+0x1a/0x290
> > [i915]
> > [  138.682844]  [] drm_dev_unregister+0x29/0xb0
> > [drm]
> > [  138.682848]  [] drm_put_dev+0x23/0x60 [drm]
> > [  138.682854]  [] i915_pci_remove+0x15/0x20 [i915]
> > [  138.682856]  [] pci_device_remove+0x39/0xc0
> > [  138.682859]  []
> > __device_release_driver+0xa1/0x150
> > [  138.682860]  [] driver_detach+0xa3/0xb0
> > [  138.682862]  [] bus_remove_driver+0x55/0xd0
> > [  138.682864]  [] driver_unregister+0x2c/0x50
> > [  138.682866]  [] pci_unregister_driver+0x21/0x90
> > [  138.682871]  [] drm_pci_exit+0x94/0xb0 [drm]
> > [  138.682883]  [] i915_exit+0x20/0xc1c [i915]
> > 
> > Reported-by: Marius Vlad <marius.c.vlad@intel.com>
> > Signed-off-by: Gabriel Feceoru <gabriel.feceoru@intel.com>
> > ---
> >  drivers/gpu/drm/i915/i915_dma.c | 6 ++++++
> >  1 file changed, 6 insertions(+)
> > 
> > diff --git a/drivers/gpu/drm/i915/i915_dma.c
> > b/drivers/gpu/drm/i915/i915_dma.c
> > index 988a380..08ad01f0 100644
> > --- a/drivers/gpu/drm/i915/i915_dma.c
> > +++ b/drivers/gpu/drm/i915/i915_dma.c
> > @@ -1136,6 +1136,8 @@ int i915_driver_unload(struct drm_device
> > *dev)
> >  	struct drm_i915_private *dev_priv = dev->dev_private;
> >  	int ret;
> >  
> > +	intel_runtime_pm_get(dev_priv);
> > +
> >  	intel_fbdev_fini(dev);
> >  
> >  	i915_audio_component_cleanup(dev_priv);
> > @@ -1143,6 +1145,7 @@ int i915_driver_unload(struct drm_device
> > *dev)
> >  	ret = i915_gem_suspend(dev);
> >  	if (ret) {
> >  		DRM_ERROR("failed to idle hardware: %d\n", ret);
> > +		intel_runtime_pm_put(dev_priv);
> 
> This should be made into goto construct.
> 
> >  		return ret;
> >  	}
> >  
> > @@ -1221,6 +1224,9 @@ int i915_driver_unload(struct drm_device
> > *dev)
> >  	kmem_cache_destroy(dev_priv->vmas);
> >  	kmem_cache_destroy(dev_priv->objects);
> >  	pci_dev_put(dev_priv->bridge_dev);
> > +
> > +	intel_runtime_pm_put(dev_priv);
> > +
> 
> Not sure if we should/can keep the runtime reference until this
> point.
> At worst this could lead into the runtime_pm_put function poking at
> the
> hardware registers after the pci_dev has been released.

Well if you meant the above bridge_dev PCI device it's not relevant as
far as power management goes, we only care about the main PCI device.
That one in turn is guaranteed to exist while we are in the load/unload
functions.

But since this looks like fixing the problem where the HW readout code
in the modeset state checker doesn't hold an RPM reference I would
prefer fixing it there, see the discussion with Chris about it. Also
note that intel_power_domains_fini() will already ensure that an RPM
reference is held for the rest of the function, if we really needed we
could bring that one earlier, but I think the state checker fix would
be enough.

> 
> Also if we change the hangcheck task to execute depending on the
> runtime_pm count, this will surely cause trouble. Added Imre as CC to
> comment on this.
> 
> >  	kfree(dev_priv);
> > 
> > 	return 0;
> 
> Insert goto label around here and make it "return ret;".
> 
> Regards, Joonas
> 
> >
diff mbox

Patch

diff --git a/drivers/gpu/drm/i915/i915_dma.c b/drivers/gpu/drm/i915/i915_dma.c
index 988a380..08ad01f0 100644
--- a/drivers/gpu/drm/i915/i915_dma.c
+++ b/drivers/gpu/drm/i915/i915_dma.c
@@ -1136,6 +1136,8 @@  int i915_driver_unload(struct drm_device *dev)
 	struct drm_i915_private *dev_priv = dev->dev_private;
 	int ret;
 
+	intel_runtime_pm_get(dev_priv);
+
 	intel_fbdev_fini(dev);
 
 	i915_audio_component_cleanup(dev_priv);
@@ -1143,6 +1145,7 @@  int i915_driver_unload(struct drm_device *dev)
 	ret = i915_gem_suspend(dev);
 	if (ret) {
 		DRM_ERROR("failed to idle hardware: %d\n", ret);
+		intel_runtime_pm_put(dev_priv);
 		return ret;
 	}
 
@@ -1221,6 +1224,9 @@  int i915_driver_unload(struct drm_device *dev)
 	kmem_cache_destroy(dev_priv->vmas);
 	kmem_cache_destroy(dev_priv->objects);
 	pci_dev_put(dev_priv->bridge_dev);
+
+	intel_runtime_pm_put(dev_priv);
+
 	kfree(dev_priv);
 
 	return 0;