diff mbox

drm/i915: Don't leak command parser tables on suspend/resume

Message ID 1411399521-2752-1-git-send-email-bradley.d.volkin@intel.com (mailing list archive)
State New, archived
Headers show

Commit Message

bradley.d.volkin@intel.com Sept. 22, 2014, 3:25 p.m. UTC
From: Brad Volkin <bradley.d.volkin@intel.com>

Ring init and cleanup are not balanced because we re-init the rings on
resume without having cleaned them up on suspend. This leads to the
driver leaking the parser's hash tables with a kmemleak signature such
as this:

unreferenced object 0xffff880405960980 (size 32):
  comm "systemd-udevd", pid 516, jiffies 4294896961 (age 10202.044s)
  hex dump (first 32 bytes):
    d0 85 46 c0 ff ff ff ff 00 00 00 00 00 00 00 00  ..F.............
    98 60 28 04 04 88 ff ff 00 00 00 00 00 00 00 00  .`(.............
  backtrace:
    [<ffffffff81816f9e>] kmemleak_alloc+0x4e/0xb0
    [<ffffffff811fa678>] kmem_cache_alloc_trace+0x168/0x2f0
    [<ffffffffc03e20a5>] i915_cmd_parser_init_ring+0x2a5/0x3e0 [i915]
    [<ffffffffc04088a2>] intel_init_ring_buffer+0x202/0x470 [i915]
    [<ffffffffc040c998>] intel_init_vebox_ring_buffer+0x1e8/0x2b0 [i915]
    [<ffffffffc03eff59>] i915_gem_init_hw+0x2f9/0x3a0 [i915]
    [<ffffffffc03f0057>] i915_gem_init+0x57/0x1d0 [i915]
    [<ffffffffc045e26a>] i915_driver_load+0xc0a/0x10e0 [i915]
    [<ffffffffc02e0d5d>] drm_dev_register+0xad/0x100 [drm]
    [<ffffffffc02e3b9f>] drm_get_pci_dev+0x8f/0x200 [drm]
    [<ffffffffc03c934b>] i915_pci_probe+0x3b/0x60 [i915]
    [<ffffffff81436725>] local_pci_probe+0x45/0xa0
    [<ffffffff81437a69>] pci_device_probe+0xd9/0x130
    [<ffffffff81524f4d>] driver_probe_device+0x12d/0x3e0
    [<ffffffff815252d3>] __driver_attach+0x93/0xa0
    [<ffffffff81522e1b>] bus_for_each_dev+0x6b/0xb0

This patch extends the current convention of checking whether a
resource is already allocated before allocating it during ring init.
Longer term it might make sense to only init the rings once.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=83794
Signed-off-by: Brad Volkin <bradley.d.volkin@intel.com>
---

According to the report in bugzilla, this happens in linux-next-20140919 as
well. I'm not sure what the path is for getting the fix there in addition to
nightly.

 drivers/gpu/drm/i915/i915_cmd_parser.c | 12 +++++++-----
 1 file changed, 7 insertions(+), 5 deletions(-)

Comments

Daniel Vetter Sept. 23, 2014, 8:34 a.m. UTC | #1
On Mon, Sep 22, 2014 at 08:25:21AM -0700, bradley.d.volkin@intel.com wrote:
> From: Brad Volkin <bradley.d.volkin@intel.com>
> 
> Ring init and cleanup are not balanced because we re-init the rings on
> resume without having cleaned them up on suspend. This leads to the
> driver leaking the parser's hash tables with a kmemleak signature such
> as this:
> 
> unreferenced object 0xffff880405960980 (size 32):
>   comm "systemd-udevd", pid 516, jiffies 4294896961 (age 10202.044s)
>   hex dump (first 32 bytes):
>     d0 85 46 c0 ff ff ff ff 00 00 00 00 00 00 00 00  ..F.............
>     98 60 28 04 04 88 ff ff 00 00 00 00 00 00 00 00  .`(.............
>   backtrace:
>     [<ffffffff81816f9e>] kmemleak_alloc+0x4e/0xb0
>     [<ffffffff811fa678>] kmem_cache_alloc_trace+0x168/0x2f0
>     [<ffffffffc03e20a5>] i915_cmd_parser_init_ring+0x2a5/0x3e0 [i915]
>     [<ffffffffc04088a2>] intel_init_ring_buffer+0x202/0x470 [i915]
>     [<ffffffffc040c998>] intel_init_vebox_ring_buffer+0x1e8/0x2b0 [i915]
>     [<ffffffffc03eff59>] i915_gem_init_hw+0x2f9/0x3a0 [i915]
>     [<ffffffffc03f0057>] i915_gem_init+0x57/0x1d0 [i915]
>     [<ffffffffc045e26a>] i915_driver_load+0xc0a/0x10e0 [i915]
>     [<ffffffffc02e0d5d>] drm_dev_register+0xad/0x100 [drm]
>     [<ffffffffc02e3b9f>] drm_get_pci_dev+0x8f/0x200 [drm]
>     [<ffffffffc03c934b>] i915_pci_probe+0x3b/0x60 [i915]
>     [<ffffffff81436725>] local_pci_probe+0x45/0xa0
>     [<ffffffff81437a69>] pci_device_probe+0xd9/0x130
>     [<ffffffff81524f4d>] driver_probe_device+0x12d/0x3e0
>     [<ffffffff815252d3>] __driver_attach+0x93/0xa0
>     [<ffffffff81522e1b>] bus_for_each_dev+0x6b/0xb0
> 
> This patch extends the current convention of checking whether a
> resource is already allocated before allocating it during ring init.
> Longer term it might make sense to only init the rings once.
> 
> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=83794
> Signed-off-by: Brad Volkin <bradley.d.volkin@intel.com>

We should finally get around to split the ring init code properly into
software init in _init functions (done once) and _init_hw functions for
rpm/resume ... Anyway, this looks sane.

Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: stable@vger.kernel.org

> ---
> 
> According to the report in bugzilla, this happens in linux-next-20140919 as
> well. I'm not sure what the path is for getting the fix there in addition to
> nightly.
> 
>  drivers/gpu/drm/i915/i915_cmd_parser.c | 12 +++++++-----
>  1 file changed, 7 insertions(+), 5 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/i915_cmd_parser.c b/drivers/gpu/drm/i915/i915_cmd_parser.c
> index 4c35e2a..86b3ae0 100644
> --- a/drivers/gpu/drm/i915/i915_cmd_parser.c
> +++ b/drivers/gpu/drm/i915/i915_cmd_parser.c
> @@ -709,11 +709,13 @@ int i915_cmd_parser_init_ring(struct intel_engine_cs *ring)
>  	BUG_ON(!validate_cmds_sorted(ring, cmd_tables, cmd_table_count));
>  	BUG_ON(!validate_regs_sorted(ring));
>  
> -	ret = init_hash_table(ring, cmd_tables, cmd_table_count);
> -	if (ret) {
> -		DRM_ERROR("CMD: cmd_parser_init failed!\n");
> -		fini_hash_table(ring);
> -		return ret;
> +	if (hash_empty(ring->cmd_hash)) {
> +		ret = init_hash_table(ring, cmd_tables, cmd_table_count);
> +		if (ret) {
> +			DRM_ERROR("CMD: cmd_parser_init failed!\n");
> +			fini_hash_table(ring);
> +			return ret;
> +		}
>  	}
>  
>  	ring->needs_cmd_parser = true;
> -- 
> 1.8.3.2
> 
> _______________________________________________
> Intel-gfx mailing list
> Intel-gfx@lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/intel-gfx
Chris Wilson Sept. 23, 2014, 9:14 a.m. UTC | #2
On Tue, Sep 23, 2014 at 10:34:46AM +0200, Daniel Vetter wrote:
> On Mon, Sep 22, 2014 at 08:25:21AM -0700, bradley.d.volkin@intel.com wrote:
> > From: Brad Volkin <bradley.d.volkin@intel.com>
> > 
> > Ring init and cleanup are not balanced because we re-init the rings on
> > resume without having cleaned them up on suspend. This leads to the
> > driver leaking the parser's hash tables with a kmemleak signature such
> > as this:
> > 
> > unreferenced object 0xffff880405960980 (size 32):
> >   comm "systemd-udevd", pid 516, jiffies 4294896961 (age 10202.044s)
> >   hex dump (first 32 bytes):
> >     d0 85 46 c0 ff ff ff ff 00 00 00 00 00 00 00 00  ..F.............
> >     98 60 28 04 04 88 ff ff 00 00 00 00 00 00 00 00  .`(.............
> >   backtrace:
> >     [<ffffffff81816f9e>] kmemleak_alloc+0x4e/0xb0
> >     [<ffffffff811fa678>] kmem_cache_alloc_trace+0x168/0x2f0
> >     [<ffffffffc03e20a5>] i915_cmd_parser_init_ring+0x2a5/0x3e0 [i915]
> >     [<ffffffffc04088a2>] intel_init_ring_buffer+0x202/0x470 [i915]
> >     [<ffffffffc040c998>] intel_init_vebox_ring_buffer+0x1e8/0x2b0 [i915]
> >     [<ffffffffc03eff59>] i915_gem_init_hw+0x2f9/0x3a0 [i915]
> >     [<ffffffffc03f0057>] i915_gem_init+0x57/0x1d0 [i915]
> >     [<ffffffffc045e26a>] i915_driver_load+0xc0a/0x10e0 [i915]
> >     [<ffffffffc02e0d5d>] drm_dev_register+0xad/0x100 [drm]
> >     [<ffffffffc02e3b9f>] drm_get_pci_dev+0x8f/0x200 [drm]
> >     [<ffffffffc03c934b>] i915_pci_probe+0x3b/0x60 [i915]
> >     [<ffffffff81436725>] local_pci_probe+0x45/0xa0
> >     [<ffffffff81437a69>] pci_device_probe+0xd9/0x130
> >     [<ffffffff81524f4d>] driver_probe_device+0x12d/0x3e0
> >     [<ffffffff815252d3>] __driver_attach+0x93/0xa0
> >     [<ffffffff81522e1b>] bus_for_each_dev+0x6b/0xb0
> > 
> > This patch extends the current convention of checking whether a
> > resource is already allocated before allocating it during ring init.
> > Longer term it might make sense to only init the rings once.
> > 
> > Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=83794
> > Signed-off-by: Brad Volkin <bradley.d.volkin@intel.com>
> 
> We should finally get around to split the ring init code properly into
> software init in _init functions (done once) and _init_hw functions for
> rpm/resume ... Anyway, this looks sane.

Maybe someone already sent patches to do so.
-Chris
Daniel Vetter Sept. 23, 2014, 11:26 a.m. UTC | #3
On Tue, Sep 23, 2014 at 10:14:33AM +0100, Chris Wilson wrote:
> On Tue, Sep 23, 2014 at 10:34:46AM +0200, Daniel Vetter wrote:
> > On Mon, Sep 22, 2014 at 08:25:21AM -0700, bradley.d.volkin@intel.com wrote:
> > > From: Brad Volkin <bradley.d.volkin@intel.com>
> > > 
> > > Ring init and cleanup are not balanced because we re-init the rings on
> > > resume without having cleaned them up on suspend. This leads to the
> > > driver leaking the parser's hash tables with a kmemleak signature such
> > > as this:
> > > 
> > > unreferenced object 0xffff880405960980 (size 32):
> > >   comm "systemd-udevd", pid 516, jiffies 4294896961 (age 10202.044s)
> > >   hex dump (first 32 bytes):
> > >     d0 85 46 c0 ff ff ff ff 00 00 00 00 00 00 00 00  ..F.............
> > >     98 60 28 04 04 88 ff ff 00 00 00 00 00 00 00 00  .`(.............
> > >   backtrace:
> > >     [<ffffffff81816f9e>] kmemleak_alloc+0x4e/0xb0
> > >     [<ffffffff811fa678>] kmem_cache_alloc_trace+0x168/0x2f0
> > >     [<ffffffffc03e20a5>] i915_cmd_parser_init_ring+0x2a5/0x3e0 [i915]
> > >     [<ffffffffc04088a2>] intel_init_ring_buffer+0x202/0x470 [i915]
> > >     [<ffffffffc040c998>] intel_init_vebox_ring_buffer+0x1e8/0x2b0 [i915]
> > >     [<ffffffffc03eff59>] i915_gem_init_hw+0x2f9/0x3a0 [i915]
> > >     [<ffffffffc03f0057>] i915_gem_init+0x57/0x1d0 [i915]
> > >     [<ffffffffc045e26a>] i915_driver_load+0xc0a/0x10e0 [i915]
> > >     [<ffffffffc02e0d5d>] drm_dev_register+0xad/0x100 [drm]
> > >     [<ffffffffc02e3b9f>] drm_get_pci_dev+0x8f/0x200 [drm]
> > >     [<ffffffffc03c934b>] i915_pci_probe+0x3b/0x60 [i915]
> > >     [<ffffffff81436725>] local_pci_probe+0x45/0xa0
> > >     [<ffffffff81437a69>] pci_device_probe+0xd9/0x130
> > >     [<ffffffff81524f4d>] driver_probe_device+0x12d/0x3e0
> > >     [<ffffffff815252d3>] __driver_attach+0x93/0xa0
> > >     [<ffffffff81522e1b>] bus_for_each_dev+0x6b/0xb0
> > > 
> > > This patch extends the current convention of checking whether a
> > > resource is already allocated before allocating it during ring init.
> > > Longer term it might make sense to only init the rings once.
> > > 
> > > Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=83794
> > > Signed-off-by: Brad Volkin <bradley.d.volkin@intel.com>
> > 
> > We should finally get around to split the ring init code properly into
> > software init in _init functions (done once) and _init_hw functions for
> > rpm/resume ... Anyway, this looks sane.
> 
> Maybe someone already sent patches to do so.

Hm, I've thought I've merge the ones that avoid the re-init, but those
didn't go the full path to split the init code. Which ones am I still
missing?
-Daniel
Jani Nikula Sept. 23, 2014, 12:57 p.m. UTC | #4
On Tue, 23 Sep 2014, Daniel Vetter <daniel@ffwll.ch> wrote:
> On Mon, Sep 22, 2014 at 08:25:21AM -0700, bradley.d.volkin@intel.com wrote:
>> From: Brad Volkin <bradley.d.volkin@intel.com>
>> 
>> Ring init and cleanup are not balanced because we re-init the rings on
>> resume without having cleaned them up on suspend. This leads to the
>> driver leaking the parser's hash tables with a kmemleak signature such
>> as this:
>> 
>> unreferenced object 0xffff880405960980 (size 32):
>>   comm "systemd-udevd", pid 516, jiffies 4294896961 (age 10202.044s)
>>   hex dump (first 32 bytes):
>>     d0 85 46 c0 ff ff ff ff 00 00 00 00 00 00 00 00  ..F.............
>>     98 60 28 04 04 88 ff ff 00 00 00 00 00 00 00 00  .`(.............
>>   backtrace:
>>     [<ffffffff81816f9e>] kmemleak_alloc+0x4e/0xb0
>>     [<ffffffff811fa678>] kmem_cache_alloc_trace+0x168/0x2f0
>>     [<ffffffffc03e20a5>] i915_cmd_parser_init_ring+0x2a5/0x3e0 [i915]
>>     [<ffffffffc04088a2>] intel_init_ring_buffer+0x202/0x470 [i915]
>>     [<ffffffffc040c998>] intel_init_vebox_ring_buffer+0x1e8/0x2b0 [i915]
>>     [<ffffffffc03eff59>] i915_gem_init_hw+0x2f9/0x3a0 [i915]
>>     [<ffffffffc03f0057>] i915_gem_init+0x57/0x1d0 [i915]
>>     [<ffffffffc045e26a>] i915_driver_load+0xc0a/0x10e0 [i915]
>>     [<ffffffffc02e0d5d>] drm_dev_register+0xad/0x100 [drm]
>>     [<ffffffffc02e3b9f>] drm_get_pci_dev+0x8f/0x200 [drm]
>>     [<ffffffffc03c934b>] i915_pci_probe+0x3b/0x60 [i915]
>>     [<ffffffff81436725>] local_pci_probe+0x45/0xa0
>>     [<ffffffff81437a69>] pci_device_probe+0xd9/0x130
>>     [<ffffffff81524f4d>] driver_probe_device+0x12d/0x3e0
>>     [<ffffffff815252d3>] __driver_attach+0x93/0xa0
>>     [<ffffffff81522e1b>] bus_for_each_dev+0x6b/0xb0
>> 
>> This patch extends the current convention of checking whether a
>> resource is already allocated before allocating it during ring init.
>> Longer term it might make sense to only init the rings once.
>> 
>> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=83794
>> Signed-off-by: Brad Volkin <bradley.d.volkin@intel.com>
>
> We should finally get around to split the ring init code properly into
> software init in _init functions (done once) and _init_hw functions for
> rpm/resume ... Anyway, this looks sane.
>
> Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
> Cc: stable@vger.kernel.org

Pushed to drm-intel-fixes, thanks for the patch and review.

BR,
Jani.


>
>> ---
>> 
>> According to the report in bugzilla, this happens in linux-next-20140919 as
>> well. I'm not sure what the path is for getting the fix there in addition to
>> nightly.
>> 
>>  drivers/gpu/drm/i915/i915_cmd_parser.c | 12 +++++++-----
>>  1 file changed, 7 insertions(+), 5 deletions(-)
>> 
>> diff --git a/drivers/gpu/drm/i915/i915_cmd_parser.c b/drivers/gpu/drm/i915/i915_cmd_parser.c
>> index 4c35e2a..86b3ae0 100644
>> --- a/drivers/gpu/drm/i915/i915_cmd_parser.c
>> +++ b/drivers/gpu/drm/i915/i915_cmd_parser.c
>> @@ -709,11 +709,13 @@ int i915_cmd_parser_init_ring(struct intel_engine_cs *ring)
>>  	BUG_ON(!validate_cmds_sorted(ring, cmd_tables, cmd_table_count));
>>  	BUG_ON(!validate_regs_sorted(ring));
>>  
>> -	ret = init_hash_table(ring, cmd_tables, cmd_table_count);
>> -	if (ret) {
>> -		DRM_ERROR("CMD: cmd_parser_init failed!\n");
>> -		fini_hash_table(ring);
>> -		return ret;
>> +	if (hash_empty(ring->cmd_hash)) {
>> +		ret = init_hash_table(ring, cmd_tables, cmd_table_count);
>> +		if (ret) {
>> +			DRM_ERROR("CMD: cmd_parser_init failed!\n");
>> +			fini_hash_table(ring);
>> +			return ret;
>> +		}
>>  	}
>>  
>>  	ring->needs_cmd_parser = true;
>> -- 
>> 1.8.3.2
>> 
>> _______________________________________________
>> Intel-gfx mailing list
>> Intel-gfx@lists.freedesktop.org
>> http://lists.freedesktop.org/mailman/listinfo/intel-gfx
>
> -- 
> Daniel Vetter
> Software Engineer, Intel Corporation
> +41 (0) 79 365 57 48 - http://blog.ffwll.ch
diff mbox

Patch

diff --git a/drivers/gpu/drm/i915/i915_cmd_parser.c b/drivers/gpu/drm/i915/i915_cmd_parser.c
index 4c35e2a..86b3ae0 100644
--- a/drivers/gpu/drm/i915/i915_cmd_parser.c
+++ b/drivers/gpu/drm/i915/i915_cmd_parser.c
@@ -709,11 +709,13 @@  int i915_cmd_parser_init_ring(struct intel_engine_cs *ring)
 	BUG_ON(!validate_cmds_sorted(ring, cmd_tables, cmd_table_count));
 	BUG_ON(!validate_regs_sorted(ring));
 
-	ret = init_hash_table(ring, cmd_tables, cmd_table_count);
-	if (ret) {
-		DRM_ERROR("CMD: cmd_parser_init failed!\n");
-		fini_hash_table(ring);
-		return ret;
+	if (hash_empty(ring->cmd_hash)) {
+		ret = init_hash_table(ring, cmd_tables, cmd_table_count);
+		if (ret) {
+			DRM_ERROR("CMD: cmd_parser_init failed!\n");
+			fini_hash_table(ring);
+			return ret;
+		}
 	}
 
 	ring->needs_cmd_parser = true;