diff mbox series

drm/ttm: Fix an invalid freeing on already freed page in error path

Message ID 20240221073324.3303-1-thomas.hellstrom@linux.intel.com (mailing list archive)
State New, archived
Headers show
Series drm/ttm: Fix an invalid freeing on already freed page in error path | expand

Commit Message

Thomas Hellstrom Feb. 21, 2024, 7:33 a.m. UTC
If caching mode change fails due to, for example, OOM we
free the allocated pages in a two-step process. First the pages
for which the caching change has already succeeded. Secondly
the pages for which a caching change did not succeed.

However the second step was incorrectly freeing the pages already
freed in the first step.

Fix.

Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Fixes: 379989e7cbdc ("drm/ttm/pool: Fix ttm_pool_alloc error path")
Cc: Christian König <christian.koenig@amd.com>
Cc: Dave Airlie <airlied@redhat.com>
Cc: Christian Koenig <christian.koenig@amd.com>
Cc: Huang Rui <ray.huang@amd.com>
Cc: dri-devel@lists.freedesktop.org
Cc: <stable@vger.kernel.org> # v6.4+
---
 drivers/gpu/drm/ttm/ttm_pool.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Comments

Matthew Auld Feb. 21, 2024, 9:21 a.m. UTC | #1
On 21/02/2024 07:33, Thomas Hellström wrote:
> If caching mode change fails due to, for example, OOM we
> free the allocated pages in a two-step process. First the pages
> for which the caching change has already succeeded. Secondly
> the pages for which a caching change did not succeed.
> 
> However the second step was incorrectly freeing the pages already
> freed in the first step.
> 
> Fix.
> 
> Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
> Fixes: 379989e7cbdc ("drm/ttm/pool: Fix ttm_pool_alloc error path")
> Cc: Christian König <christian.koenig@amd.com>
> Cc: Dave Airlie <airlied@redhat.com>
> Cc: Christian Koenig <christian.koenig@amd.com>
> Cc: Huang Rui <ray.huang@amd.com>
> Cc: dri-devel@lists.freedesktop.org
> Cc: <stable@vger.kernel.org> # v6.4+
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Christian König Feb. 21, 2024, 10:26 a.m. UTC | #2
Am 21.02.24 um 08:33 schrieb Thomas Hellström:
> If caching mode change fails due to, for example, OOM we
> free the allocated pages in a two-step process. First the pages
> for which the caching change has already succeeded. Secondly
> the pages for which a caching change did not succeed.
>
> However the second step was incorrectly freeing the pages already
> freed in the first step.
>
> Fix.
>
> Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
> Fixes: 379989e7cbdc ("drm/ttm/pool: Fix ttm_pool_alloc error path")
> Cc: Christian König <christian.koenig@amd.com>
> Cc: Dave Airlie <airlied@redhat.com>
> Cc: Christian Koenig <christian.koenig@amd.com>
> Cc: Huang Rui <ray.huang@amd.com>
> Cc: dri-devel@lists.freedesktop.org
> Cc: <stable@vger.kernel.org> # v6.4+

You don't know how much time I've spend staring at this line to find the 
bug in it and haven't seen it. Got bug reports about that for month as well.

Reviewed-by: Christian König <christian.koenig@amd.com>

> ---
>   drivers/gpu/drm/ttm/ttm_pool.c | 2 +-
>   1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/ttm/ttm_pool.c b/drivers/gpu/drm/ttm/ttm_pool.c
> index b62f420a9f96..112438d965ff 100644
> --- a/drivers/gpu/drm/ttm/ttm_pool.c
> +++ b/drivers/gpu/drm/ttm/ttm_pool.c
> @@ -387,7 +387,7 @@ static void ttm_pool_free_range(struct ttm_pool *pool, struct ttm_tt *tt,
>   				enum ttm_caching caching,
>   				pgoff_t start_page, pgoff_t end_page)
>   {
> -	struct page **pages = tt->pages;
> +	struct page **pages = &tt->pages[start_page];
>   	unsigned int order;
>   	pgoff_t i, nr;
>
Thomas Hellstrom Feb. 22, 2024, 7:34 a.m. UTC | #3
On Wed, 2024-02-21 at 11:26 +0100, Christian König wrote:
> Am 21.02.24 um 08:33 schrieb Thomas Hellström:
> > If caching mode change fails due to, for example, OOM we
> > free the allocated pages in a two-step process. First the pages
> > for which the caching change has already succeeded. Secondly
> > the pages for which a caching change did not succeed.
> > 
> > However the second step was incorrectly freeing the pages already
> > freed in the first step.
> > 
> > Fix.
> > 
> > Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
> > Fixes: 379989e7cbdc ("drm/ttm/pool: Fix ttm_pool_alloc error path")
> > Cc: Christian König <christian.koenig@amd.com>
> > Cc: Dave Airlie <airlied@redhat.com>
> > Cc: Christian Koenig <christian.koenig@amd.com>
> > Cc: Huang Rui <ray.huang@amd.com>
> > Cc: dri-devel@lists.freedesktop.org
> > Cc: <stable@vger.kernel.org> # v6.4+
> 
> You don't know how much time I've spend staring at this line to find
> the 
> bug in it and haven't seen it. Got bug reports about that for month
> as well.


Yeah, sorry about that. We should probably have Kunit tests exercising
OOM in the pool code involving WC pages.

I'll push this to drm-misc-next.

/Thomas

> 
> Reviewed-by: Christian König <christian.koenig@amd.com>
> 
> > ---
> >   drivers/gpu/drm/ttm/ttm_pool.c | 2 +-
> >   1 file changed, 1 insertion(+), 1 deletion(-)
> > 
> > diff --git a/drivers/gpu/drm/ttm/ttm_pool.c
> > b/drivers/gpu/drm/ttm/ttm_pool.c
> > index b62f420a9f96..112438d965ff 100644
> > --- a/drivers/gpu/drm/ttm/ttm_pool.c
> > +++ b/drivers/gpu/drm/ttm/ttm_pool.c
> > @@ -387,7 +387,7 @@ static void ttm_pool_free_range(struct ttm_pool
> > *pool, struct ttm_tt *tt,
> >   				enum ttm_caching caching,
> >   				pgoff_t start_page, pgoff_t
> > end_page)
> >   {
> > -	struct page **pages = tt->pages;
> > +	struct page **pages = &tt->pages[start_page];
> >   	unsigned int order;
> >   	pgoff_t i, nr;
> >   
>
Thomas Hellstrom Feb. 22, 2024, 8:33 a.m. UTC | #4
On Thu, 2024-02-22 at 08:34 +0100, Thomas Hellström wrote:
> On Wed, 2024-02-21 at 11:26 +0100, Christian König wrote:
> > Am 21.02.24 um 08:33 schrieb Thomas Hellström:
> > > If caching mode change fails due to, for example, OOM we
> > > free the allocated pages in a two-step process. First the pages
> > > for which the caching change has already succeeded. Secondly
> > > the pages for which a caching change did not succeed.
> > > 
> > > However the second step was incorrectly freeing the pages already
> > > freed in the first step.
> > > 
> > > Fix.
> > > 
> > > Signed-off-by: Thomas Hellström
> > > <thomas.hellstrom@linux.intel.com>
> > > Fixes: 379989e7cbdc ("drm/ttm/pool: Fix ttm_pool_alloc error
> > > path")
> > > Cc: Christian König <christian.koenig@amd.com>
> > > Cc: Dave Airlie <airlied@redhat.com>
> > > Cc: Christian Koenig <christian.koenig@amd.com>
> > > Cc: Huang Rui <ray.huang@amd.com>
> > > Cc: dri-devel@lists.freedesktop.org
> > > Cc: <stable@vger.kernel.org> # v6.4+
> > 
> > You don't know how much time I've spend staring at this line to
> > find
> > the 
> > bug in it and haven't seen it. Got bug reports about that for month
> > as well.
> 
> 
> Yeah, sorry about that. We should probably have Kunit tests
> exercising
> OOM in the pool code involving WC pages.
> 
> I'll push this to drm-misc-next.

drm-misc-fixes..
/Thomas


> 
> /Thomas
> 
> > 
> > Reviewed-by: Christian König <christian.koenig@amd.com>
> > 
> > > ---
> > >   drivers/gpu/drm/ttm/ttm_pool.c | 2 +-
> > >   1 file changed, 1 insertion(+), 1 deletion(-)
> > > 
> > > diff --git a/drivers/gpu/drm/ttm/ttm_pool.c
> > > b/drivers/gpu/drm/ttm/ttm_pool.c
> > > index b62f420a9f96..112438d965ff 100644
> > > --- a/drivers/gpu/drm/ttm/ttm_pool.c
> > > +++ b/drivers/gpu/drm/ttm/ttm_pool.c
> > > @@ -387,7 +387,7 @@ static void ttm_pool_free_range(struct
> > > ttm_pool
> > > *pool, struct ttm_tt *tt,
> > >   				enum ttm_caching caching,
> > >   				pgoff_t start_page, pgoff_t
> > > end_page)
> > >   {
> > > -	struct page **pages = tt->pages;
> > > +	struct page **pages = &tt->pages[start_page];
> > >   	unsigned int order;
> > >   	pgoff_t i, nr;
> > >   
> > 
>
Christian König Feb. 22, 2024, 9:47 a.m. UTC | #5
Am 22.02.24 um 08:34 schrieb Thomas Hellström:
> On Wed, 2024-02-21 at 11:26 +0100, Christian König wrote:
>> Am 21.02.24 um 08:33 schrieb Thomas Hellström:
>>> If caching mode change fails due to, for example, OOM we
>>> free the allocated pages in a two-step process. First the pages
>>> for which the caching change has already succeeded. Secondly
>>> the pages for which a caching change did not succeed.
>>>
>>> However the second step was incorrectly freeing the pages already
>>> freed in the first step.
>>>
>>> Fix.
>>>
>>> Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
>>> Fixes: 379989e7cbdc ("drm/ttm/pool: Fix ttm_pool_alloc error path")
>>> Cc: Christian König <christian.koenig@amd.com>
>>> Cc: Dave Airlie <airlied@redhat.com>
>>> Cc: Christian Koenig <christian.koenig@amd.com>
>>> Cc: Huang Rui <ray.huang@amd.com>
>>> Cc: dri-devel@lists.freedesktop.org
>>> Cc: <stable@vger.kernel.org> # v6.4+
>> You don't know how much time I've spend staring at this line to find
>> the
>> bug in it and haven't seen it. Got bug reports about that for month
>> as well.
>
> Yeah, sorry about that. We should probably have Kunit tests exercising
> OOM in the pool code involving WC pages.
>
> I'll push this to drm-misc-next.

drm-misc-fixes please! That needs to be backported ASAP.

Need to dig up the bug report for this again.

Thanks,
Christian.

>
> /Thomas
>
>> Reviewed-by: Christian König <christian.koenig@amd.com>
>>
>>> ---
>>>    drivers/gpu/drm/ttm/ttm_pool.c | 2 +-
>>>    1 file changed, 1 insertion(+), 1 deletion(-)
>>>
>>> diff --git a/drivers/gpu/drm/ttm/ttm_pool.c
>>> b/drivers/gpu/drm/ttm/ttm_pool.c
>>> index b62f420a9f96..112438d965ff 100644
>>> --- a/drivers/gpu/drm/ttm/ttm_pool.c
>>> +++ b/drivers/gpu/drm/ttm/ttm_pool.c
>>> @@ -387,7 +387,7 @@ static void ttm_pool_free_range(struct ttm_pool
>>> *pool, struct ttm_tt *tt,
>>>    				enum ttm_caching caching,
>>>    				pgoff_t start_page, pgoff_t
>>> end_page)
>>>    {
>>> -	struct page **pages = tt->pages;
>>> +	struct page **pages = &tt->pages[start_page];
>>>    	unsigned int order;
>>>    	pgoff_t i, nr;
>>>
diff mbox series

Patch

diff --git a/drivers/gpu/drm/ttm/ttm_pool.c b/drivers/gpu/drm/ttm/ttm_pool.c
index b62f420a9f96..112438d965ff 100644
--- a/drivers/gpu/drm/ttm/ttm_pool.c
+++ b/drivers/gpu/drm/ttm/ttm_pool.c
@@ -387,7 +387,7 @@  static void ttm_pool_free_range(struct ttm_pool *pool, struct ttm_tt *tt,
 				enum ttm_caching caching,
 				pgoff_t start_page, pgoff_t end_page)
 {
-	struct page **pages = tt->pages;
+	struct page **pages = &tt->pages[start_page];
 	unsigned int order;
 	pgoff_t i, nr;