Message ID | 20230105170138.717-1-mario.limonciello@amd.com (mailing list archive) |
---|---|
Headers | show |
Series | Recover from failure to probe GPU | expand |
On Thu, Jan 5, 2023 at 12:02 PM Mario Limonciello <mario.limonciello@amd.com> wrote: > > One of the first thing that KMS drivers do during initialization is > destroy the system firmware framebuffer by means of > `drm_aperture_remove_conflicting_pci_framebuffers` > > This means that if for any reason the GPU failed to probe the user > will be stuck with at best a screen frozen at the last thing that > was shown before the KMS driver continued it's probe. > > The problem is most pronounced when new GPU support is introduced > because users will need to have a recent linux-firmware snapshot > on their system when they boot a kernel with matching support. > > However the problem is further exaggerated in the case of amdgpu because > it has migrated to "IP discovery" where amdgpu will attempt to load > on "ALL" AMD GPUs even if the driver is missing support for IP blocks > contained in that GPU. > > IP discovery requires some probing and isn't run until after the > framebuffer has been destroyed. > > This means a situation can occur where a user purchases a new GPU not > yet supported by a distribution and when booting the installer it will > "freeze" even if the distribution doesn't have the matching kernel support > for those IP blocks. > > The perfect example of this is Ubuntu 22.10 and the new dGPUs just > launched by AMD. The installation media ships with kernel 5.19 (which > has IP discovery) but the amdgpu support for those IP blocks landed in > kernel 6.0. The matching linux-firmware was released after 22.10's launch. > The screen will freeze without nomodeset. Even if a user manages to install > and then upgrades to kernel 6.0 after install they'll still have the > problem of missing firmware, and the same experience. > > This is quite jarring for users, particularly if they don't know > that they have to use "nomodeset" to install. > > To help the situation make changes to GPU discovery: > 1) Delay releasing the firmware framebuffer until after early_init > completed. This will help the situation of an older kernel that doesn't > yet support the IP blocks probing a new GPU. IP discovery will have failed. > 2) Request loading all PSP, VCN, SDMA, SMU, DMCUB, MES and GC microcode > into memory during early_init. This will help the situation of new enough > kernel for the IP discovery phase to otherwise pass but missing microcode > from linux-firmware.git. Series is: Reviewed-by: Alex Deucher <alexander.deucher@amd.com> > > v6->v7: > * Pick up tags > * Fix PSP TAv1 handling to match previous behavior (securedisplay_context > only is set on PSPv10 and PSPv12/Renoir) > v5->v6: > * Fix arguments for amdgpu_ucode_release to allow clearing pointer > * Fix whitespace mistake in VCN > * Pick up tags > v4->v5: > * Rename amdgpu_ucode_load to amdgpu_ucode_request > * Add and utilize amdgpu_ucode_release throughout existing patches > * Update all amdgpu code to stop using request_firmware and > release_firmware for microcode > * Drop export of amdgpu_ucode_validate outside of amdgpu_ucode.c > * Pick up relevant tags for some patches > v3->v4: > * Rework to delay framebuffer release until early_init is done > * Make IP load microcode during early init phase > * Add SMU and DMCUB checks for early_init loading > * Add some new helper code for wrapping request_firmware calls (needed for > early_init to return something besides -ENOENT) > v2->v3: > * Pick up tags for patches 1-10 > * Rework patch 11 to not validate during discovery > * Fix bugs with GFX9 due to gfx.num_gfx_rings not being set during > discovery > * Fix naming scheme for SDMA on dGPUs > v1->v2: > * Take the suggestion from v1 thread to delay the framebuffer release > until ip discovery is done. This patch is CC to stable to that older > stable kernels with IP discovery won't try to probe unknown IP. > * Drop changes to drm aperature. > * Fetch SDMA, VCN, MES, GC and PSP microcode during IP discovery. > > Mario Limonciello (27): > drm/amd: Delay removal of the firmware framebuffer > drm/amd: Add a legacy mapping to "amdgpu_ucode_ip_version_decode" > drm/amd: Convert SMUv11 microcode to use > `amdgpu_ucode_ip_version_decode` > drm/amd: Convert SMUv13 microcode to use > `amdgpu_ucode_ip_version_decode` > drm/amd: Add a new helper for loading/validating microcode > drm/amd: Use `amdgpu_ucode_request` helper for SDMA > drm/amd: Convert SDMA to use `amdgpu_ucode_ip_version_decode` > drm/amd: Make SDMA firmware load failures less noisy. > drm/amd: Use `amdgpu_ucode_*` helpers for VCN > drm/amd: Load VCN microcode during early_init > drm/amd: Load MES microcode during early_init > drm/amd: Use `amdgpu_ucode_*` helpers for MES > drm/amd: Remove superfluous assignment for `adev->mes.adev` > drm/amd: Use `amdgpu_ucode_*` helpers for GFX9 > drm/amd: Load GFX9 microcode during early_init > drm/amd: Use `amdgpu_ucode_*` helpers for GFX10 > drm/amd: Load GFX10 microcode during early_init > drm/amd: Use `amdgpu_ucode_*` helpers for GFX11 > drm/amd: Load GFX11 microcode during early_init > drm/amd: Parse both v1 and v2 TA microcode headers using same function > drm/amd: Avoid BUG() for case of SRIOV missing IP version > drm/amd: Load PSP microcode during early_init > drm/amd: Use `amdgpu_ucode_*` helpers for PSP > drm/amd/display: Load DMUB microcode during early_init > drm/amd: Use `amdgpu_ucode_release` helper for DMUB > drm/amd: Use `amdgpu_ucode_*` helpers for SMU > drm/amd: Load SMU microcode during early_init > drm/amd: Optimize SRIOV switch/case for PSP microcode load > drm/amd: Use `amdgpu_ucode_*` helpers for GFX6 > drm/amd: Use `amdgpu_ucode_*` helpers for GFX7 > drm/amd: Use `amdgpu_ucode_*` helpers for GFX8 > drm/amd: Use `amdgpu_ucode_*` helpers for GMC6 > drm/amd: Use `amdgpu_ucode_*` helpers for GMC7 > drm/amd: Use `amdgpu_ucode_*` helpers for GMC8 > drm/amd: Use `amdgpu_ucode_*` helpers for SDMA2.4 > drm/amd: Use `amdgpu_ucode_*` helpers for SDMA3.0 > drm/amd: Use `amdgpu_ucode_*` helpers for SDMA on CIK > drm/amd: Use `amdgpu_ucode_*` helpers for UVD > drm/amd: Use `amdgpu_ucode_*` helpers for VCE > drm/amd: Use `amdgpu_ucode_*` helpers for CGS > drm/amd: Use `amdgpu_ucode_*` helpers for GPU info bin > drm/amd: Use `amdgpu_ucode_*` helpers for DMCU > drm/amd: Use `amdgpu_ucode_release` helper for powerplay > drm/amd: Use `amdgpu_ucode_release` helper for si > drm/amd: make amdgpu_ucode_validate static > > drivers/gpu/drm/amd/amdgpu/amdgpu_cgs.c | 11 +- > drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 22 +- > drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c | 6 - > drivers/gpu/drm/amd/amdgpu/amdgpu_mes.c | 59 ++++ > drivers/gpu/drm/amd/amdgpu/amdgpu_mes.h | 1 + > drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c | 299 +++++++++--------- > drivers/gpu/drm/amd/amdgpu/amdgpu_sdma.c | 25 +- > drivers/gpu/drm/amd/amdgpu/amdgpu_sdma.h | 4 +- > drivers/gpu/drm/amd/amdgpu/amdgpu_ucode.c | 259 ++++++++++++++- > drivers/gpu/drm/amd/amdgpu/amdgpu_ucode.h | 4 +- > drivers/gpu/drm/amd/amdgpu/amdgpu_uvd.c | 14 +- > drivers/gpu/drm/amd/amdgpu/amdgpu_vce.c | 14 +- > drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c | 65 +--- > drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.h | 1 + > drivers/gpu/drm/amd/amdgpu/cik_sdma.c | 16 +- > drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c | 155 +++------ > drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c | 124 +++----- > drivers/gpu/drm/amd/amdgpu/gfx_v6_0.c | 30 +- > drivers/gpu/drm/amd/amdgpu/gfx_v7_0.c | 68 +--- > drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c | 94 ++---- > drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c | 140 ++------ > drivers/gpu/drm/amd/amdgpu/gmc_v6_0.c | 14 +- > drivers/gpu/drm/amd/amdgpu/gmc_v7_0.c | 13 +- > drivers/gpu/drm/amd/amdgpu/gmc_v8_0.c | 13 +- > drivers/gpu/drm/amd/amdgpu/imu_v11_0.c | 7 +- > drivers/gpu/drm/amd/amdgpu/mes_v10_1.c | 108 ++----- > drivers/gpu/drm/amd/amdgpu/mes_v11_0.c | 99 ++---- > drivers/gpu/drm/amd/amdgpu/psp_v10_0.c | 80 +---- > drivers/gpu/drm/amd/amdgpu/psp_v11_0.c | 131 +------- > drivers/gpu/drm/amd/amdgpu/psp_v12_0.c | 79 +---- > drivers/gpu/drm/amd/amdgpu/psp_v13_0.c | 27 +- > drivers/gpu/drm/amd/amdgpu/psp_v13_0_4.c | 14 +- > drivers/gpu/drm/amd/amdgpu/psp_v3_1.c | 16 +- > drivers/gpu/drm/amd/amdgpu/sdma_v2_4.c | 18 +- > drivers/gpu/drm/amd/amdgpu/sdma_v3_0.c | 18 +- > drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c | 47 +-- > drivers/gpu/drm/amd/amdgpu/sdma_v5_0.c | 30 +- > drivers/gpu/drm/amd/amdgpu/sdma_v5_2.c | 55 +--- > drivers/gpu/drm/amd/amdgpu/sdma_v6_0.c | 25 +- > drivers/gpu/drm/amd/amdgpu/vcn_v1_0.c | 5 +- > drivers/gpu/drm/amd/amdgpu/vcn_v2_0.c | 5 +- > drivers/gpu/drm/amd/amdgpu/vcn_v2_5.c | 5 +- > drivers/gpu/drm/amd/amdgpu/vcn_v3_0.c | 5 +- > drivers/gpu/drm/amd/amdgpu/vcn_v4_0.c | 5 +- > .../gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c | 110 ++++--- > drivers/gpu/drm/amd/pm/legacy-dpm/si_dpm.c | 11 +- > .../gpu/drm/amd/pm/powerplay/amd_powerplay.c | 3 +- > drivers/gpu/drm/amd/pm/swsmu/amdgpu_smu.c | 12 +- > .../gpu/drm/amd/pm/swsmu/smu11/smu_v11_0.c | 51 +-- > .../gpu/drm/amd/pm/swsmu/smu13/smu_v13_0.c | 28 +- > 50 files changed, 900 insertions(+), 1545 deletions(-) > > -- > 2.34.1 >