Message ID | s5ho8dmh98p.wl-tiwai@suse.de (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | [5.12,regression] ttm->pages NULL dereference with radeon driver | expand |
Hi Takashi, Am 07.05.21 um 17:08 schrieb Takashi Iwai: > Hi, > > we've received a regression report showing NULL dereference Oops with > radeon driver on 5.12 kernel: > https://bugzilla.opensuse.org/show_bug.cgi?id=1185516 > > It turned out that the recent TTM cleanup / refactoring via commit > 0575ff3d33cd ("drm/radeon: stop using pages with > drm_prime_sg_to_page_addr_arrays v2") is the culprit. On 5.12 kernel, > ttm->pages is no longer allocated / set up, while the radeon driver > still has a few places assuming the valid ttm->pages, and for the > reporter (running the modesettig driver), radeon_gart_bind() hits the > problem. > > A hackish patch below was confirmed to work, at least, but obviously > we need a proper fix. > > Could you take a look at it? If that's all then that looks trivial to me. Going to provide a patch on Monday. Thanks for the notice, Christian. > > > thanks, > > Takashi > > --- a/drivers/gpu/drm/radeon/radeon_gart.c > +++ b/drivers/gpu/drm/radeon/radeon_gart.c > @@ -253,7 +253,7 @@ void radeon_gart_unbind(struct radeon_de > t = offset / RADEON_GPU_PAGE_SIZE; > p = t / (PAGE_SIZE / RADEON_GPU_PAGE_SIZE); > for (i = 0; i < pages; i++, p++) { > - if (rdev->gart.pages[p]) { > + if (1 /*rdev->gart.pages[p]*/) { > rdev->gart.pages[p] = NULL; > for (j = 0; j < (PAGE_SIZE / RADEON_GPU_PAGE_SIZE); j++, t++) { > rdev->gart.pages_entry[t] = rdev->dummy_page.entry; > @@ -301,7 +301,7 @@ int radeon_gart_bind(struct radeon_devic > p = t / (PAGE_SIZE / RADEON_GPU_PAGE_SIZE); > > for (i = 0; i < pages; i++, p++) { > - rdev->gart.pages[p] = pagelist[i]; > + /* rdev->gart.pages[p] = pagelist[i]; */ > page_base = dma_addr[i]; > for (j = 0; j < (PAGE_SIZE / RADEON_GPU_PAGE_SIZE); j++, t++) { > page_entry = radeon_gart_get_page_entry(page_base, flags); > --- a/drivers/gpu/drm/radeon/radeon_ttm.c > +++ b/drivers/gpu/drm/radeon/radeon_ttm.c > @@ -360,6 +360,8 @@ static int radeon_ttm_tt_pin_userptr(str > > if (current->mm != gtt->usermm) > return -EPERM; > + if (!ttm->pages) > + return -EPERM; > > if (gtt->userflags & RADEON_GEM_USERPTR_ANONONLY) { > /* check that we only pin down anonymous memory > _______________________________________________ > amd-gfx mailing list > amd-gfx@lists.freedesktop.org > https://lists.freedesktop.org/mailman/listinfo/amd-gfx
Hi guys, adding a few people who ran into the problem as well and opened a kernel bug. Am 07.05.21 um 17:11 schrieb Christian König: > Hi Takashi, > > Am 07.05.21 um 17:08 schrieb Takashi Iwai: >> Hi, >> >> we've received a regression report showing NULL dereference Oops with >> radeon driver on 5.12 kernel: >> https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fbugzilla.opensuse.org%2Fshow_bug.cgi%3Fid%3D1185516&data=04%7C01%7Cchristian.koenig%40amd.com%7C64447e9e97604aaf117008d9116a742a%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637559971181187178%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&sdata=1LJMLxuuMjkfgnIt%2B5Z5n19ri%2BMTLMQvEEB%2F%2Fd6SVkg%3D&reserved=0 >> >> It turned out that the recent TTM cleanup / refactoring via commit >> 0575ff3d33cd ("drm/radeon: stop using pages with >> drm_prime_sg_to_page_addr_arrays v2") is the culprit. On 5.12 kernel, >> ttm->pages is no longer allocated / set up, while the radeon driver >> still has a few places assuming the valid ttm->pages, and for the >> reporter (running the modesettig driver), radeon_gart_bind() hits the >> problem. >> >> A hackish patch below was confirmed to work, at least, but obviously >> we need a proper fix. >> >> Could you take a look at it? > > If that's all then that looks trivial to me. > > Going to provide a patch on Monday. Sorry, was a busy start into the week. I've just send a patch which should address the issue to you and the mailing list. Please test since I can't reproduce the problem locally. Thanks, Christian. > > Thanks for the notice, > Christian. > >> >> >> thanks, >> >> Takashi >> >> --- a/drivers/gpu/drm/radeon/radeon_gart.c >> +++ b/drivers/gpu/drm/radeon/radeon_gart.c >> @@ -253,7 +253,7 @@ void radeon_gart_unbind(struct radeon_de >> t = offset / RADEON_GPU_PAGE_SIZE; >> p = t / (PAGE_SIZE / RADEON_GPU_PAGE_SIZE); >> for (i = 0; i < pages; i++, p++) { >> - if (rdev->gart.pages[p]) { >> + if (1 /*rdev->gart.pages[p]*/) { >> rdev->gart.pages[p] = NULL; >> for (j = 0; j < (PAGE_SIZE / RADEON_GPU_PAGE_SIZE); >> j++, t++) { >> rdev->gart.pages_entry[t] = rdev->dummy_page.entry; >> @@ -301,7 +301,7 @@ int radeon_gart_bind(struct radeon_devic >> p = t / (PAGE_SIZE / RADEON_GPU_PAGE_SIZE); >> for (i = 0; i < pages; i++, p++) { >> - rdev->gart.pages[p] = pagelist[i]; >> + /* rdev->gart.pages[p] = pagelist[i]; */ >> page_base = dma_addr[i]; >> for (j = 0; j < (PAGE_SIZE / RADEON_GPU_PAGE_SIZE); j++, >> t++) { >> page_entry = radeon_gart_get_page_entry(page_base, flags); >> --- a/drivers/gpu/drm/radeon/radeon_ttm.c >> +++ b/drivers/gpu/drm/radeon/radeon_ttm.c >> @@ -360,6 +360,8 @@ static int radeon_ttm_tt_pin_userptr(str >> if (current->mm != gtt->usermm) >> return -EPERM; >> + if (!ttm->pages) >> + return -EPERM; >> if (gtt->userflags & RADEON_GEM_USERPTR_ANONONLY) { >> /* check that we only pin down anonymous memory >> _______________________________________________ >> amd-gfx mailing list >> amd-gfx@lists.freedesktop.org >> https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Flists.freedesktop.org%2Fmailman%2Flistinfo%2Famd-gfx&data=04%7C01%7Cchristian.koenig%40amd.com%7C64447e9e97604aaf117008d9116a742a%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637559971181187178%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&sdata=RdzGp1FLxBKE%2ByPclrUBfQomRU0pQT%2F78Ewcj%2FBZ73g%3D&reserved=0 >> >
Hi Christian, I can confirm that the patch provided fixes the issue in kernel version 5.12.2 Thank you. On Wed, May 12, 2021 at 6:21 AM Christian König <christian.koenig@amd.com> wrote: > > Hi guys, > > adding a few people who ran into the problem as well and opened a kernel > bug. > > Am 07.05.21 um 17:11 schrieb Christian König: > > Hi Takashi, > > > > Am 07.05.21 um 17:08 schrieb Takashi Iwai: > >> Hi, > >> > >> we've received a regression report showing NULL dereference Oops with > >> radeon driver on 5.12 kernel: > >> https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fbugzilla.opensuse.org%2Fshow_bug.cgi%3Fid%3D1185516&data=04%7C01%7Cchristian.koenig%40amd.com%7C64447e9e97604aaf117008d9116a742a%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637559971181187178%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&sdata=1LJMLxuuMjkfgnIt%2B5Z5n19ri%2BMTLMQvEEB%2F%2Fd6SVkg%3D&reserved=0 > >> > >> It turned out that the recent TTM cleanup / refactoring via commit > >> 0575ff3d33cd ("drm/radeon: stop using pages with > >> drm_prime_sg_to_page_addr_arrays v2") is the culprit. On 5.12 kernel, > >> ttm->pages is no longer allocated / set up, while the radeon driver > >> still has a few places assuming the valid ttm->pages, and for the > >> reporter (running the modesettig driver), radeon_gart_bind() hits the > >> problem. > >> > >> A hackish patch below was confirmed to work, at least, but obviously > >> we need a proper fix. > >> > >> Could you take a look at it? > > > > If that's all then that looks trivial to me. > > > > Going to provide a patch on Monday. > > Sorry, was a busy start into the week. I've just send a patch which > should address the issue to you and the mailing list. > > Please test since I can't reproduce the problem locally. > > Thanks, > Christian. > > > > > Thanks for the notice, > > Christian. > > > >> > >> > >> thanks, > >> > >> Takashi > >> > >> --- a/drivers/gpu/drm/radeon/radeon_gart.c > >> +++ b/drivers/gpu/drm/radeon/radeon_gart.c > >> @@ -253,7 +253,7 @@ void radeon_gart_unbind(struct radeon_de > >> t = offset / RADEON_GPU_PAGE_SIZE; > >> p = t / (PAGE_SIZE / RADEON_GPU_PAGE_SIZE); > >> for (i = 0; i < pages; i++, p++) { > >> - if (rdev->gart.pages[p]) { > >> + if (1 /*rdev->gart.pages[p]*/) { > >> rdev->gart.pages[p] = NULL; > >> for (j = 0; j < (PAGE_SIZE / RADEON_GPU_PAGE_SIZE); > >> j++, t++) { > >> rdev->gart.pages_entry[t] = rdev->dummy_page.entry; > >> @@ -301,7 +301,7 @@ int radeon_gart_bind(struct radeon_devic > >> p = t / (PAGE_SIZE / RADEON_GPU_PAGE_SIZE); > >> for (i = 0; i < pages; i++, p++) { > >> - rdev->gart.pages[p] = pagelist[i]; > >> + /* rdev->gart.pages[p] = pagelist[i]; */ > >> page_base = dma_addr[i]; > >> for (j = 0; j < (PAGE_SIZE / RADEON_GPU_PAGE_SIZE); j++, > >> t++) { > >> page_entry = radeon_gart_get_page_entry(page_base, flags); > >> --- a/drivers/gpu/drm/radeon/radeon_ttm.c > >> +++ b/drivers/gpu/drm/radeon/radeon_ttm.c > >> @@ -360,6 +360,8 @@ static int radeon_ttm_tt_pin_userptr(str > >> if (current->mm != gtt->usermm) > >> return -EPERM; > >> + if (!ttm->pages) > >> + return -EPERM; > >> if (gtt->userflags & RADEON_GEM_USERPTR_ANONONLY) { > >> /* check that we only pin down anonymous memory > >> _______________________________________________ > >> amd-gfx mailing list > >> amd-gfx@lists.freedesktop.org > >> https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Flists.freedesktop.org%2Fmailman%2Flistinfo%2Famd-gfx&data=04%7C01%7Cchristian.koenig%40amd.com%7C64447e9e97604aaf117008d9116a742a%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637559971181187178%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&sdata=RdzGp1FLxBKE%2ByPclrUBfQomRU0pQT%2F78Ewcj%2FBZ73g%3D&reserved=0 > >> > >
--- a/drivers/gpu/drm/radeon/radeon_gart.c +++ b/drivers/gpu/drm/radeon/radeon_gart.c @@ -253,7 +253,7 @@ void radeon_gart_unbind(struct radeon_de t = offset / RADEON_GPU_PAGE_SIZE; p = t / (PAGE_SIZE / RADEON_GPU_PAGE_SIZE); for (i = 0; i < pages; i++, p++) { - if (rdev->gart.pages[p]) { + if (1 /*rdev->gart.pages[p]*/) { rdev->gart.pages[p] = NULL; for (j = 0; j < (PAGE_SIZE / RADEON_GPU_PAGE_SIZE); j++, t++) { rdev->gart.pages_entry[t] = rdev->dummy_page.entry; @@ -301,7 +301,7 @@ int radeon_gart_bind(struct radeon_devic p = t / (PAGE_SIZE / RADEON_GPU_PAGE_SIZE); for (i = 0; i < pages; i++, p++) { - rdev->gart.pages[p] = pagelist[i]; + /* rdev->gart.pages[p] = pagelist[i]; */ page_base = dma_addr[i]; for (j = 0; j < (PAGE_SIZE / RADEON_GPU_PAGE_SIZE); j++, t++) { page_entry = radeon_gart_get_page_entry(page_base, flags); --- a/drivers/gpu/drm/radeon/radeon_ttm.c +++ b/drivers/gpu/drm/radeon/radeon_ttm.c @@ -360,6 +360,8 @@ static int radeon_ttm_tt_pin_userptr(str if (current->mm != gtt->usermm) return -EPERM; + if (!ttm->pages) + return -EPERM; if (gtt->userflags & RADEON_GEM_USERPTR_ANONONLY) { /* check that we only pin down anonymous memory