diff mbox series

[5.12,regression] ttm->pages NULL dereference with radeon driver

Message ID s5ho8dmh98p.wl-tiwai@suse.de (mailing list archive)
State New, archived
Headers show
Series [5.12,regression] ttm->pages NULL dereference with radeon driver | expand

Commit Message

Takashi Iwai May 7, 2021, 3:08 p.m. UTC
Hi,

we've received a regression report showing NULL dereference Oops with
radeon driver on 5.12 kernel:
  https://bugzilla.opensuse.org/show_bug.cgi?id=1185516

It turned out that the recent TTM cleanup / refactoring via commit
0575ff3d33cd ("drm/radeon: stop using pages with
drm_prime_sg_to_page_addr_arrays v2") is the culprit.  On 5.12 kernel,
ttm->pages is no longer allocated / set up, while the radeon driver
still has a few places assuming the valid ttm->pages, and for the
reporter (running the modesettig driver), radeon_gart_bind() hits the
problem.

A hackish patch below was confirmed to work, at least, but obviously
we need a proper fix.

Could you take a look at it?


thanks,

Takashi

Comments

Christian König May 7, 2021, 3:11 p.m. UTC | #1
Hi Takashi,

Am 07.05.21 um 17:08 schrieb Takashi Iwai:
> Hi,
>
> we've received a regression report showing NULL dereference Oops with
> radeon driver on 5.12 kernel:
>    https://bugzilla.opensuse.org/show_bug.cgi?id=1185516
>
> It turned out that the recent TTM cleanup / refactoring via commit
> 0575ff3d33cd ("drm/radeon: stop using pages with
> drm_prime_sg_to_page_addr_arrays v2") is the culprit.  On 5.12 kernel,
> ttm->pages is no longer allocated / set up, while the radeon driver
> still has a few places assuming the valid ttm->pages, and for the
> reporter (running the modesettig driver), radeon_gart_bind() hits the
> problem.
>
> A hackish patch below was confirmed to work, at least, but obviously
> we need a proper fix.
>
> Could you take a look at it?

If that's all then that looks trivial to me.

Going to provide a patch on Monday.

Thanks for the notice,
Christian.

>
>
> thanks,
>
> Takashi
>
> --- a/drivers/gpu/drm/radeon/radeon_gart.c
> +++ b/drivers/gpu/drm/radeon/radeon_gart.c
> @@ -253,7 +253,7 @@ void radeon_gart_unbind(struct radeon_de
>   	t = offset / RADEON_GPU_PAGE_SIZE;
>   	p = t / (PAGE_SIZE / RADEON_GPU_PAGE_SIZE);
>   	for (i = 0; i < pages; i++, p++) {
> -		if (rdev->gart.pages[p]) {
> +		if (1 /*rdev->gart.pages[p]*/) {
>   			rdev->gart.pages[p] = NULL;
>   			for (j = 0; j < (PAGE_SIZE / RADEON_GPU_PAGE_SIZE); j++, t++) {
>   				rdev->gart.pages_entry[t] = rdev->dummy_page.entry;
> @@ -301,7 +301,7 @@ int radeon_gart_bind(struct radeon_devic
>   	p = t / (PAGE_SIZE / RADEON_GPU_PAGE_SIZE);
>   
>   	for (i = 0; i < pages; i++, p++) {
> -		rdev->gart.pages[p] = pagelist[i];
> +		/* rdev->gart.pages[p] = pagelist[i]; */
>   		page_base = dma_addr[i];
>   		for (j = 0; j < (PAGE_SIZE / RADEON_GPU_PAGE_SIZE); j++, t++) {
>   			page_entry = radeon_gart_get_page_entry(page_base, flags);
> --- a/drivers/gpu/drm/radeon/radeon_ttm.c
> +++ b/drivers/gpu/drm/radeon/radeon_ttm.c
> @@ -360,6 +360,8 @@ static int radeon_ttm_tt_pin_userptr(str
>   
>   	if (current->mm != gtt->usermm)
>   		return -EPERM;
> +	if (!ttm->pages)
> +		return -EPERM;
>   
>   	if (gtt->userflags & RADEON_GEM_USERPTR_ANONONLY) {
>   		/* check that we only pin down anonymous memory
> _______________________________________________
> amd-gfx mailing list
> amd-gfx@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/amd-gfx
Christian König May 12, 2021, 10:20 a.m. UTC | #2
Hi guys,

adding a few people who ran into the problem as well and opened a kernel 
bug.

Am 07.05.21 um 17:11 schrieb Christian König:
> Hi Takashi,
>
> Am 07.05.21 um 17:08 schrieb Takashi Iwai:
>> Hi,
>>
>> we've received a regression report showing NULL dereference Oops with
>> radeon driver on 5.12 kernel:
>> https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fbugzilla.opensuse.org%2Fshow_bug.cgi%3Fid%3D1185516&amp;data=04%7C01%7Cchristian.koenig%40amd.com%7C64447e9e97604aaf117008d9116a742a%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637559971181187178%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&amp;sdata=1LJMLxuuMjkfgnIt%2B5Z5n19ri%2BMTLMQvEEB%2F%2Fd6SVkg%3D&amp;reserved=0
>>
>> It turned out that the recent TTM cleanup / refactoring via commit
>> 0575ff3d33cd ("drm/radeon: stop using pages with
>> drm_prime_sg_to_page_addr_arrays v2") is the culprit.  On 5.12 kernel,
>> ttm->pages is no longer allocated / set up, while the radeon driver
>> still has a few places assuming the valid ttm->pages, and for the
>> reporter (running the modesettig driver), radeon_gart_bind() hits the
>> problem.
>>
>> A hackish patch below was confirmed to work, at least, but obviously
>> we need a proper fix.
>>
>> Could you take a look at it?
>
> If that's all then that looks trivial to me.
>
> Going to provide a patch on Monday.

Sorry, was a busy start into the week. I've just send a patch which 
should address the issue to you and the mailing list.

Please test since I can't reproduce the problem locally.

Thanks,
Christian.

>
> Thanks for the notice,
> Christian.
>
>>
>>
>> thanks,
>>
>> Takashi
>>
>> --- a/drivers/gpu/drm/radeon/radeon_gart.c
>> +++ b/drivers/gpu/drm/radeon/radeon_gart.c
>> @@ -253,7 +253,7 @@ void radeon_gart_unbind(struct radeon_de
>>       t = offset / RADEON_GPU_PAGE_SIZE;
>>       p = t / (PAGE_SIZE / RADEON_GPU_PAGE_SIZE);
>>       for (i = 0; i < pages; i++, p++) {
>> -        if (rdev->gart.pages[p]) {
>> +        if (1 /*rdev->gart.pages[p]*/) {
>>               rdev->gart.pages[p] = NULL;
>>               for (j = 0; j < (PAGE_SIZE / RADEON_GPU_PAGE_SIZE); 
>> j++, t++) {
>>                   rdev->gart.pages_entry[t] = rdev->dummy_page.entry;
>> @@ -301,7 +301,7 @@ int radeon_gart_bind(struct radeon_devic
>>       p = t / (PAGE_SIZE / RADEON_GPU_PAGE_SIZE);
>>         for (i = 0; i < pages; i++, p++) {
>> -        rdev->gart.pages[p] = pagelist[i];
>> +        /* rdev->gart.pages[p] = pagelist[i]; */
>>           page_base = dma_addr[i];
>>           for (j = 0; j < (PAGE_SIZE / RADEON_GPU_PAGE_SIZE); j++, 
>> t++) {
>>               page_entry = radeon_gart_get_page_entry(page_base, flags);
>> --- a/drivers/gpu/drm/radeon/radeon_ttm.c
>> +++ b/drivers/gpu/drm/radeon/radeon_ttm.c
>> @@ -360,6 +360,8 @@ static int radeon_ttm_tt_pin_userptr(str
>>         if (current->mm != gtt->usermm)
>>           return -EPERM;
>> +    if (!ttm->pages)
>> +        return -EPERM;
>>         if (gtt->userflags & RADEON_GEM_USERPTR_ANONONLY) {
>>           /* check that we only pin down anonymous memory
>> _______________________________________________
>> amd-gfx mailing list
>> amd-gfx@lists.freedesktop.org
>> https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Flists.freedesktop.org%2Fmailman%2Flistinfo%2Famd-gfx&amp;data=04%7C01%7Cchristian.koenig%40amd.com%7C64447e9e97604aaf117008d9116a742a%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637559971181187178%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&amp;sdata=RdzGp1FLxBKE%2ByPclrUBfQomRU0pQT%2F78Ewcj%2FBZ73g%3D&amp;reserved=0 
>>
>
Dennis Foster May 12, 2021, 12:21 p.m. UTC | #3
Hi Christian,

I can confirm that the patch provided fixes the issue in kernel version 5.12.2
Thank you.


On Wed, May 12, 2021 at 6:21 AM Christian König
<christian.koenig@amd.com> wrote:
>
> Hi guys,
>
> adding a few people who ran into the problem as well and opened a kernel
> bug.
>
> Am 07.05.21 um 17:11 schrieb Christian König:
> > Hi Takashi,
> >
> > Am 07.05.21 um 17:08 schrieb Takashi Iwai:
> >> Hi,
> >>
> >> we've received a regression report showing NULL dereference Oops with
> >> radeon driver on 5.12 kernel:
> >> https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fbugzilla.opensuse.org%2Fshow_bug.cgi%3Fid%3D1185516&amp;data=04%7C01%7Cchristian.koenig%40amd.com%7C64447e9e97604aaf117008d9116a742a%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637559971181187178%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&amp;sdata=1LJMLxuuMjkfgnIt%2B5Z5n19ri%2BMTLMQvEEB%2F%2Fd6SVkg%3D&amp;reserved=0
> >>
> >> It turned out that the recent TTM cleanup / refactoring via commit
> >> 0575ff3d33cd ("drm/radeon: stop using pages with
> >> drm_prime_sg_to_page_addr_arrays v2") is the culprit.  On 5.12 kernel,
> >> ttm->pages is no longer allocated / set up, while the radeon driver
> >> still has a few places assuming the valid ttm->pages, and for the
> >> reporter (running the modesettig driver), radeon_gart_bind() hits the
> >> problem.
> >>
> >> A hackish patch below was confirmed to work, at least, but obviously
> >> we need a proper fix.
> >>
> >> Could you take a look at it?
> >
> > If that's all then that looks trivial to me.
> >
> > Going to provide a patch on Monday.
>
> Sorry, was a busy start into the week. I've just send a patch which
> should address the issue to you and the mailing list.
>
> Please test since I can't reproduce the problem locally.
>
> Thanks,
> Christian.
>
> >
> > Thanks for the notice,
> > Christian.
> >
> >>
> >>
> >> thanks,
> >>
> >> Takashi
> >>
> >> --- a/drivers/gpu/drm/radeon/radeon_gart.c
> >> +++ b/drivers/gpu/drm/radeon/radeon_gart.c
> >> @@ -253,7 +253,7 @@ void radeon_gart_unbind(struct radeon_de
> >>       t = offset / RADEON_GPU_PAGE_SIZE;
> >>       p = t / (PAGE_SIZE / RADEON_GPU_PAGE_SIZE);
> >>       for (i = 0; i < pages; i++, p++) {
> >> -        if (rdev->gart.pages[p]) {
> >> +        if (1 /*rdev->gart.pages[p]*/) {
> >>               rdev->gart.pages[p] = NULL;
> >>               for (j = 0; j < (PAGE_SIZE / RADEON_GPU_PAGE_SIZE);
> >> j++, t++) {
> >>                   rdev->gart.pages_entry[t] = rdev->dummy_page.entry;
> >> @@ -301,7 +301,7 @@ int radeon_gart_bind(struct radeon_devic
> >>       p = t / (PAGE_SIZE / RADEON_GPU_PAGE_SIZE);
> >>         for (i = 0; i < pages; i++, p++) {
> >> -        rdev->gart.pages[p] = pagelist[i];
> >> +        /* rdev->gart.pages[p] = pagelist[i]; */
> >>           page_base = dma_addr[i];
> >>           for (j = 0; j < (PAGE_SIZE / RADEON_GPU_PAGE_SIZE); j++,
> >> t++) {
> >>               page_entry = radeon_gart_get_page_entry(page_base, flags);
> >> --- a/drivers/gpu/drm/radeon/radeon_ttm.c
> >> +++ b/drivers/gpu/drm/radeon/radeon_ttm.c
> >> @@ -360,6 +360,8 @@ static int radeon_ttm_tt_pin_userptr(str
> >>         if (current->mm != gtt->usermm)
> >>           return -EPERM;
> >> +    if (!ttm->pages)
> >> +        return -EPERM;
> >>         if (gtt->userflags & RADEON_GEM_USERPTR_ANONONLY) {
> >>           /* check that we only pin down anonymous memory
> >> _______________________________________________
> >> amd-gfx mailing list
> >> amd-gfx@lists.freedesktop.org
> >> https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Flists.freedesktop.org%2Fmailman%2Flistinfo%2Famd-gfx&amp;data=04%7C01%7Cchristian.koenig%40amd.com%7C64447e9e97604aaf117008d9116a742a%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637559971181187178%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&amp;sdata=RdzGp1FLxBKE%2ByPclrUBfQomRU0pQT%2F78Ewcj%2FBZ73g%3D&amp;reserved=0
> >>
> >
diff mbox series

Patch

--- a/drivers/gpu/drm/radeon/radeon_gart.c
+++ b/drivers/gpu/drm/radeon/radeon_gart.c
@@ -253,7 +253,7 @@  void radeon_gart_unbind(struct radeon_de
 	t = offset / RADEON_GPU_PAGE_SIZE;
 	p = t / (PAGE_SIZE / RADEON_GPU_PAGE_SIZE);
 	for (i = 0; i < pages; i++, p++) {
-		if (rdev->gart.pages[p]) {
+		if (1 /*rdev->gart.pages[p]*/) {
 			rdev->gart.pages[p] = NULL;
 			for (j = 0; j < (PAGE_SIZE / RADEON_GPU_PAGE_SIZE); j++, t++) {
 				rdev->gart.pages_entry[t] = rdev->dummy_page.entry;
@@ -301,7 +301,7 @@  int radeon_gart_bind(struct radeon_devic
 	p = t / (PAGE_SIZE / RADEON_GPU_PAGE_SIZE);
 
 	for (i = 0; i < pages; i++, p++) {
-		rdev->gart.pages[p] = pagelist[i];
+		/* rdev->gart.pages[p] = pagelist[i]; */
 		page_base = dma_addr[i];
 		for (j = 0; j < (PAGE_SIZE / RADEON_GPU_PAGE_SIZE); j++, t++) {
 			page_entry = radeon_gart_get_page_entry(page_base, flags);
--- a/drivers/gpu/drm/radeon/radeon_ttm.c
+++ b/drivers/gpu/drm/radeon/radeon_ttm.c
@@ -360,6 +360,8 @@  static int radeon_ttm_tt_pin_userptr(str
 
 	if (current->mm != gtt->usermm)
 		return -EPERM;
+	if (!ttm->pages)
+		return -EPERM;
 
 	if (gtt->userflags & RADEON_GEM_USERPTR_ANONONLY) {
 		/* check that we only pin down anonymous memory