Message ID | 20240523173031.4212-1-W_Armin@gmx.de (mailing list archive) |
---|---|
State | New, archived |
Series | Revert "drm/amdgpu: init iommu after amdkfd device init" |
On 23.05.24 at 19:30, Armin Wolf wrote:
> This reverts commit 56b522f4668167096a50c39446d6263c96219f5f.
>
> A user reported that this commit breaks the integrated gpu of his
> notebook, causing a black screen. He was able to bisect the problematic
> commit and verified that by reverting it the notebook works again.
> He also confirmed that kernel 6.8.1 also works on his device, so the
> upstream commit itself seems to be ok.
>
> An amdgpu developer (Alex Deucher) confirmed that this patch should
> have never been ported to 5.15 in the first place, so revert this
> commit from the 5.15 stable series.

Hi,

what is the status of this?

Armin Wolf

>
> Reported-by: Barry Kauler <bkauler@gmail.com>
> Signed-off-by: Armin Wolf <W_Armin@gmx.de>
> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 8 ++++----
>  1 file changed, 4 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> index 222a1d9ecf16..5f6c32ec674d 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> @@ -2487,6 +2487,10 @@ static int amdgpu_device_ip_init(struct amdgpu_device *adev)
>  	if (r)
>  		goto init_failed;
>
> +	r = amdgpu_amdkfd_resume_iommu(adev);
> +	if (r)
> +		goto init_failed;
> +
>  	r = amdgpu_device_ip_hw_init_phase1(adev);
>  	if (r)
>  		goto init_failed;
> @@ -2525,10 +2529,6 @@ static int amdgpu_device_ip_init(struct amdgpu_device *adev)
>  	if (!adev->gmc.xgmi.pending_reset)
>  		amdgpu_amdkfd_device_init(adev);
>
> -	r = amdgpu_amdkfd_resume_iommu(adev);
> -	if (r)
> -		goto init_failed;
> -
>  	amdgpu_fru_get_product_info(adev);
>
>  init_failed:
> --
> 2.39.2
On 2024-06-03 18:19, Armin Wolf wrote:
> On 23.05.24 at 19:30, Armin Wolf wrote:
>
>> This reverts commit 56b522f4668167096a50c39446d6263c96219f5f.
>>
>> A user reported that this commit breaks the integrated gpu of his
>> notebook, causing a black screen. He was able to bisect the problematic
>> commit and verified that by reverting it the notebook works again.
>> He also confirmed that kernel 6.8.1 also works on his device, so the
>> upstream commit itself seems to be ok.
>>
>> An amdgpu developer (Alex Deucher) confirmed that this patch should
>> have never been ported to 5.15 in the first place, so revert this
>> commit from the 5.15 stable series.
>
> Hi,
>
> what is the status of this?

Which branch is this for? This patch won't apply to anything after Linux 6.5. Support for IOMMUv2 was removed from amdgpu in Linux 6.6 by:

commit c99a2e7ae291e5b19b60443eb6397320ef9e8571
Author: Alex Deucher <alexander.deucher@amd.com>
Date:   Fri Jul 28 12:20:12 2023 -0400

    drm/amdkfd: drop IOMMUv2 support

    Now that we use the dGPU path for all APUs, drop the
    IOMMUv2 support.

    v2: drop the now unused queue manager functions for gfx7/8 APUs

    Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
    Acked-by: Christian König <christian.koenig@amd.com>
    Tested-by: Mike Lothian <mike@fireburn.co.uk>
    Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

Regards,
  Felix

>
> Armin Wolf
>
>>
>> Reported-by: Barry Kauler <bkauler@gmail.com>
>> Signed-off-by: Armin Wolf <W_Armin@gmx.de>
>> ---
>>  drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 8 ++++----
>>  1 file changed, 4 insertions(+), 4 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>> index 222a1d9ecf16..5f6c32ec674d 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>> @@ -2487,6 +2487,10 @@ static int amdgpu_device_ip_init(struct
>> amdgpu_device *adev)
>>  	if (r)
>>  		goto init_failed;
>>
>> +	r = amdgpu_amdkfd_resume_iommu(adev);
>> +	if (r)
>> +		goto init_failed;
>> +
>>  	r = amdgpu_device_ip_hw_init_phase1(adev);
>>  	if (r)
>>  		goto init_failed;
>> @@ -2525,10 +2529,6 @@ static int amdgpu_device_ip_init(struct
>> amdgpu_device *adev)
>>  	if (!adev->gmc.xgmi.pending_reset)
>>  		amdgpu_amdkfd_device_init(adev);
>>
>> -	r = amdgpu_amdkfd_resume_iommu(adev);
>> -	if (r)
>> -		goto init_failed;
>> -
>>  	amdgpu_fru_get_product_info(adev);
>>
>>  init_failed:
>> --
>> 2.39.2
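The version boundary described in the reply above can be written down as a small check. This is only a rough sketch: `DEMO_KERNEL_VERSION` is a local stand-in written for this example (not the kernel's own header), and `demo_tree_has_iommuv2` is a hypothetical helper. It encodes nothing beyond the statement that IOMMUv2 support was removed in Linux 6.6; it does not decide whether the revert is wanted on a given branch, which per this thread is 5.15 stable only.

```c
#include <assert.h>
#include <stdbool.h>

/* Local stand-in for a kernel-style version encoding; defined here so the
 * sketch builds as an ordinary userspace program. */
#define DEMO_KERNEL_VERSION(a, b, c) (((a) << 16) + ((b) << 8) + (c))

/*
 * Hypothetical helper: returns true when a tree at major.minor is older
 * than 6.6, i.e. predates the removal of IOMMUv2 support mentioned in
 * the message above.
 */
bool demo_tree_has_iommuv2(int major, int minor)
{
	/* Keep the inputs inside the range the encoding can represent. */
	assert(major >= 0 && minor >= 0 && minor < 256);

	return DEMO_KERNEL_VERSION(major, minor, 0) <
	       DEMO_KERNEL_VERSION(6, 6, 0);
}
```

Under these assumptions, `demo_tree_has_iommuv2(5, 15)` and `demo_tree_has_iommuv2(6, 5)` are true, while `demo_tree_has_iommuv2(6, 6)` and `demo_tree_has_iommuv2(6, 8)` are false.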
[AMD Official Use Only - AMD Internal Distribution Only]

> -----Original Message-----
> From: Kuehling, Felix <Felix.Kuehling@amd.com>
> Sent: Tuesday, June 4, 2024 2:25 PM
> To: Armin Wolf <W_Armin@gmx.de>; Deucher, Alexander
> <Alexander.Deucher@amd.com>; Koenig, Christian
> <Christian.Koenig@amd.com>; Pan, Xinhui <Xinhui.Pan@amd.com>;
> gregkh@linuxfoundation.org; sashal@kernel.org
> Cc: stable@vger.kernel.org; bkauler@gmail.com; Zhang, Yifan
> <Yifan1.Zhang@amd.com>; Liang, Prike <Prike.Liang@amd.com>; dri-
> devel@lists.freedesktop.org; amd-gfx@lists.freedesktop.org
> Subject: Re: [PATCH] Revert "drm/amdgpu: init iommu after amdkfd device
> init"
>
>
> On 2024-06-03 18:19, Armin Wolf wrote:
> > On 23.05.24 at 19:30, Armin Wolf wrote:
> >
> >> This reverts commit 56b522f4668167096a50c39446d6263c96219f5f.
> >>
> >> A user reported that this commit breaks the integrated gpu of his
> >> notebook, causing a black screen. He was able to bisect the
> >> problematic commit and verified that by reverting it the notebook works
> again.
> >> He also confirmed that kernel 6.8.1 also works on his device, so the
> >> upstream commit itself seems to be ok.
> >>
> >> An amdgpu developer (Alex Deucher) confirmed that this patch should
> >> have never been ported to 5.15 in the first place, so revert this
> >> commit from the 5.15 stable series.
> >
> > Hi,
> >
> > what is the status of this?
>
> Which branch is this for? This patch won't apply to anything after Linux 6.5.

It's applicable to 5.15 stable only. The original patch caused a regression on 5.15 so probably should not have been applied there.

Alex

> Support for IOMMUv2 was removed from amdgpu in Linux 6.6 by:
>
> commit c99a2e7ae291e5b19b60443eb6397320ef9e8571
> Author: Alex Deucher <alexander.deucher@amd.com>
> Date:   Fri Jul 28 12:20:12 2023 -0400
>
>     drm/amdkfd: drop IOMMUv2 support
>
>     Now that we use the dGPU path for all APUs, drop the
>     IOMMUv2 support.
>
>     v2: drop the now unused queue manager functions for gfx7/8 APUs
>
>     Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
>     Acked-by: Christian König <christian.koenig@amd.com>
>     Tested-by: Mike Lothian <mike@fireburn.co.uk>
>     Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
>
> Regards,
>   Felix
>
>
> >
> > Armin Wolf
> >
> >>
> >> Reported-by: Barry Kauler <bkauler@gmail.com>
> >> Signed-off-by: Armin Wolf <W_Armin@gmx.de>
> >> ---
> >>  drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 8 ++++----
> >>  1 file changed, 4 insertions(+), 4 deletions(-)
> >>
> >> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> >> b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> >> index 222a1d9ecf16..5f6c32ec674d 100644
> >> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> >> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> >> @@ -2487,6 +2487,10 @@ static int amdgpu_device_ip_init(struct
> >> amdgpu_device *adev)
> >>  	if (r)
> >>  		goto init_failed;
> >>
> >> +	r = amdgpu_amdkfd_resume_iommu(adev);
> >> +	if (r)
> >> +		goto init_failed;
> >> +
> >>  	r = amdgpu_device_ip_hw_init_phase1(adev);
> >>  	if (r)
> >>  		goto init_failed;
> >> @@ -2525,10 +2529,6 @@ static int amdgpu_device_ip_init(struct
> >> amdgpu_device *adev)
> >>  	if (!adev->gmc.xgmi.pending_reset)
> >>  		amdgpu_amdkfd_device_init(adev);
> >>
> >> -	r = amdgpu_amdkfd_resume_iommu(adev);
> >> -	if (r)
> >> -		goto init_failed;
> >> -
> >>  	amdgpu_fru_get_product_info(adev);
> >>
> >>  init_failed:
> >> --
> >> 2.39.2
On 04.06.24 at 20:28, Deucher, Alexander wrote:
> [AMD Official Use Only - AMD Internal Distribution Only]
>
>> -----Original Message-----
>> From: Kuehling, Felix <Felix.Kuehling@amd.com>
>> Sent: Tuesday, June 4, 2024 2:25 PM
>> To: Armin Wolf <W_Armin@gmx.de>; Deucher, Alexander
>> <Alexander.Deucher@amd.com>; Koenig, Christian
>> <Christian.Koenig@amd.com>; Pan, Xinhui <Xinhui.Pan@amd.com>;
>> gregkh@linuxfoundation.org; sashal@kernel.org
>> Cc: stable@vger.kernel.org; bkauler@gmail.com; Zhang, Yifan
>> <Yifan1.Zhang@amd.com>; Liang, Prike <Prike.Liang@amd.com>; dri-
>> devel@lists.freedesktop.org; amd-gfx@lists.freedesktop.org
>> Subject: Re: [PATCH] Revert "drm/amdgpu: init iommu after amdkfd device
>> init"
>>
>>
>> On 2024-06-03 18:19, Armin Wolf wrote:
>>> On 23.05.24 at 19:30, Armin Wolf wrote:
>>>
>>>> This reverts commit 56b522f4668167096a50c39446d6263c96219f5f.
>>>>
>>>> A user reported that this commit breaks the integrated gpu of his
>>>> notebook, causing a black screen. He was able to bisect the
>>>> problematic commit and verified that by reverting it the notebook works
>> again.
>>>> He also confirmed that kernel 6.8.1 also works on his device, so the
>>>> upstream commit itself seems to be ok.
>>>>
>>>> An amdgpu developer (Alex Deucher) confirmed that this patch should
>>>> have never been ported to 5.15 in the first place, so revert this
>>>> commit from the 5.15 stable series.
>>> Hi,
>>>
>>> what is the status of this?
>> Which branch is this for? This patch won't apply to anything after Linux 6.5.
> It's applicable to 5.15 stable only. The original patch caused a regression on 5.15 so probably should not have been applied there.
>
> Alex

Correct, and I would be very grateful if this regression could be resolved in the near future.

The user already wrote a blog post about the whole issue, see here: https://bkhome.org/news/202405/kernel-amd-gpu-disaster-fixed.html

Thanks,
Armin Wolf

>> Support for IOMMUv2 was removed from amdgpu in Linux 6.6 by:
>>
>> commit c99a2e7ae291e5b19b60443eb6397320ef9e8571
>> Author: Alex Deucher <alexander.deucher@amd.com>
>> Date:   Fri Jul 28 12:20:12 2023 -0400
>>
>>     drm/amdkfd: drop IOMMUv2 support
>>
>>     Now that we use the dGPU path for all APUs, drop the
>>     IOMMUv2 support.
>>
>>     v2: drop the now unused queue manager functions for gfx7/8 APUs
>>
>>     Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
>>     Acked-by: Christian König <christian.koenig@amd.com>
>>     Tested-by: Mike Lothian <mike@fireburn.co.uk>
>>     Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
>>
>> Regards,
>>   Felix
>>
>>
>>> Armin Wolf
>>>
>>>> Reported-by: Barry Kauler <bkauler@gmail.com>
>>>> Signed-off-by: Armin Wolf <W_Armin@gmx.de>
>>>> ---
>>>>  drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 8 ++++----
>>>>  1 file changed, 4 insertions(+), 4 deletions(-)
>>>>
>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>>>> index 222a1d9ecf16..5f6c32ec674d 100644
>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>>>> @@ -2487,6 +2487,10 @@ static int amdgpu_device_ip_init(struct
>>>> amdgpu_device *adev)
>>>>  	if (r)
>>>>  		goto init_failed;
>>>>
>>>> +	r = amdgpu_amdkfd_resume_iommu(adev);
>>>> +	if (r)
>>>> +		goto init_failed;
>>>> +
>>>>  	r = amdgpu_device_ip_hw_init_phase1(adev);
>>>>  	if (r)
>>>>  		goto init_failed;
>>>> @@ -2525,10 +2529,6 @@ static int amdgpu_device_ip_init(struct
>>>> amdgpu_device *adev)
>>>>  	if (!adev->gmc.xgmi.pending_reset)
>>>>  		amdgpu_amdkfd_device_init(adev);
>>>>
>>>> -	r = amdgpu_amdkfd_resume_iommu(adev);
>>>> -	if (r)
>>>> -		goto init_failed;
>>>> -
>>>>  	amdgpu_fru_get_product_info(adev);
>>>>
>>>>  init_failed:
>>>> --
>>>> 2.39.2
Hi Greg KH, Sasha,

Please pick up this patch for 5.15 stable tree. I have built a test kernel and
can confirm that it fixes affected users.

Downstream bug:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2068738

Thanks,
Matthew
On Wed, Jun 12, 2024 at 12:10:37PM +1200, Matthew Ruffell wrote:
> Hi Greg KH, Sasha,
>
> Please pick up this patch for 5.15 stable tree. I have built a test kernel and
> can confirm that it fixes affected users.
>
> Downstream bug:
> https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2068738

Sorry for the delay, now picked up.

greg k-h
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index 222a1d9ecf16..5f6c32ec674d 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -2487,6 +2487,10 @@ static int amdgpu_device_ip_init(struct amdgpu_device *adev)
 	if (r)
 		goto init_failed;
 
+	r = amdgpu_amdkfd_resume_iommu(adev);
+	if (r)
+		goto init_failed;
+
 	r = amdgpu_device_ip_hw_init_phase1(adev);
 	if (r)
 		goto init_failed;
@@ -2525,10 +2529,6 @@ static int amdgpu_device_ip_init(struct amdgpu_device *adev)
 	if (!adev->gmc.xgmi.pending_reset)
 		amdgpu_amdkfd_device_init(adev);
 
-	r = amdgpu_amdkfd_resume_iommu(adev);
-	if (r)
-		goto init_failed;
-
 	amdgpu_fru_get_product_info(adev);
 
 init_failed:
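To make the effect of the two hunks easier to see, here is a minimal userspace sketch of the call order the revert restores: the IOMMU resume runs before hardware init phase 1 instead of after the KFD device init. Everything prefixed `demo_` is hypothetical and written for this illustration; the stages only record their names in a trace string, the steps between phase 1 and the KFD init are elided, and none of this is the driver's actual code.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical trace buffer recording which stages ran, in order. */
static char demo_trace[64];

/* Hypothetical stage stub: appends its name to the trace and returns ret. */
static int demo_stage(const char *name, int ret)
{
	strncat(demo_trace, name, sizeof(demo_trace) - strlen(demo_trace) - 1);
	return ret;
}

/*
 * Mirrors the ordering after the revert. iommu_ret and phase1_ret let a
 * caller inject a failure into the corresponding stage.
 */
int demo_ip_init(int iommu_ret, int phase1_ret)
{
	int r;

	demo_trace[0] = '\0';

	/* Restored position: IOMMU resume comes first. */
	r = demo_stage("resume_iommu;", iommu_ret);
	if (r)
		goto init_failed;

	r = demo_stage("hw_init_phase1;", phase1_ret);
	if (r)
		goto init_failed;

	/* Intermediate init steps elided; KFD init no longer precedes the IOMMU resume. */
	demo_stage("kfd_device_init;", 0);

init_failed:
	return r;
}

/* Returns the trace left behind by the most recent demo_ip_init() call. */
const char *demo_last_trace(void)
{
	return demo_trace;
}

/* Self-check: with no injected failures, all three stages run in order. */
void demo_check_order(void)
{
	assert(demo_ip_init(0, 0) == 0);
	assert(strcmp(demo_last_trace(),
		      "resume_iommu;hw_init_phase1;kfd_device_init;") == 0);
}
```

Calling `demo_ip_init(-1, 0)` in this sketch returns -1 and leaves only `resume_iommu;` in the trace, which is the same early-exit shape as the `goto init_failed` pattern in the hunk above.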
This reverts commit 56b522f4668167096a50c39446d6263c96219f5f.

A user reported that this commit breaks the integrated gpu of his
notebook, causing a black screen. He was able to bisect the problematic
commit and verified that by reverting it the notebook works again.
He also confirmed that kernel 6.8.1 also works on his device, so the
upstream commit itself seems to be ok.

An amdgpu developer (Alex Deucher) confirmed that this patch should
have never been ported to 5.15 in the first place, so revert this
commit from the 5.15 stable series.

Reported-by: Barry Kauler <bkauler@gmail.com>
Signed-off-by: Armin Wolf <W_Armin@gmx.de>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

--
2.39.2