diff mbox series

[v4] PCI/VGA: Make the vga_is_firmware_default() less arch-dependent

Message ID 20230815220527.1316537-1-suijingfeng@loongson.cn (mailing list archive)
State Superseded
Delegated to: Bjorn Helgaas
Headers show
Series [v4] PCI/VGA: Make the vga_is_firmware_default() less arch-dependent | expand

Commit Message

Sui Jingfeng Aug. 15, 2023, 10:05 p.m. UTC
Currently, the vga_is_firmware_default() function only works on x86 and
ia64, it is a no-op on ARM, ARM64, PPC, RISC-V, etc. This patch completes
the implementation for the rest of the architectures. The added code tries
to identify the PCI(e) VGA device that owns the firmware framebuffer
before PCI resource reallocation happens.

This patch makes the vga_is_firmware_default() function works on whatever
arch that has UEFI GOP support. But we make it available only on platforms
where PCI resource relocation happens. if the provided method proves to be
effective and reliable, it can be expanded to other arch easily.

v2:
	* Fix test robot warnnings and fix typos

v3:
	* Fix linkage problems if the global screen_info is not exported

v4:
	* Handle linkage problems by hiding behind of CONFIG_SYSFB,
	* Drop side-effects and simplify.

Since only one GPU could owns the firmware fb in normal case, things
are almost done once we determine the boot VGA selected by firmware.
By using the DECLARE_PCI_FIXUP_CLASS_HEADER(), we also ensure that we
could identify the primary display (if there do have one) before PCI
resource reallocation happen.

The method provided in this patch will be effective as long as the target
platform has a way set up the kernel's screen_info. For the machines with
UEFI firmware, the screen_info is typically set up by the UEFI stub with
respect to the UEFI GOP protocol. Since the UEFI stub executes in the
context of the decompressor, and it cannot access the kernel's screen_info
directly. Hence, the efi stub copies the screen_info provided by firmware
to kernel's counterpart. Therefore, we handle linkage problems by using
the CONFIG_EFI guard. Since when CONFIG_EFI is set, the kernel's
screen_info is guaranteed to be static linkable for arm, arm64, riscv.
V4 of this patch handle linkage problems by hiding behind of CONFIG_SYSFB,
since sysfb reference the global screen_info as well.

This patch is tested on:

1) LS3A5000+LS7A1000 platform with three video cards

$ lspci | grep VGA

 00:06.1 VGA compatible controller: Loongson Technology LLC DC (Display Controller) (rev 01)
 03:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Caicos XT [Radeon HD 7470/8470 / R5 235/310 OEM]
 07:00.0 VGA compatible controller: S3 Graphics Ltd. Device 9070 (rev 01)
 08:00.0 VGA compatible controller: S3 Graphics Ltd. Device 9070 (rev 01)

Before apply this patch:

[    0.361748] pci 0000:00:06.1: vgaarb: setting as boot VGA device
[    0.361751] pci 0000:00:06.1: vgaarb: bridge control possible
[    0.361753] pci 0000:00:06.1: vgaarb: VGA device added: decodes=io+mem,owns=io+mem,locks=none
[    0.361763] pci 0000:03:00.0: vgaarb: bridge control possible
[    0.361765] pci 0000:03:00.0: vgaarb: VGA device added: decodes=io+mem,owns=none,locks=none
[    0.361771] pci 0000:07:00.0: vgaarb: bridge control possible
[    0.361773] pci 0000:07:00.0: vgaarb: VGA device added: decodes=io+mem,owns=none,locks=none
[    0.361777] pci 0000:08:00.0: vgaarb: bridge control possible
[    0.361779] pci 0000:08:00.0: vgaarb: VGA device added: decodes=io+mem,owns=none,locks=none
[    0.361781] vgaarb: loaded
[    0.367838] pci 0000:00:06.1: Overriding boot device as 1002:6778
[    0.367841] pci 0000:00:06.1: Overriding boot device as 5333:9070
[    0.367843] pci 0000:00:06.1: Overriding boot device as 5333:9070

After apply this patch:

[    0.357780] pci 0000:03:00.0: vgaarb: BAR 0 contains firmware FB
[    0.361726] pci 0000:00:06.1: vgaarb: setting as boot VGA device
[    0.361729] pci 0000:00:06.1: vgaarb: bridge control possible
[    0.361731] pci 0000:00:06.1: vgaarb: VGA device added: decodes=io+mem,owns=io+mem,locks=none
[    0.361741] pci 0000:03:00.0: vgaarb: setting as boot VGA device (overriding previous)
[    0.361743] pci 0000:03:00.0: vgaarb: bridge control possible
[    0.361745] pci 0000:03:00.0: vgaarb: VGA device added: decodes=io+mem,owns=none,locks=none
[    0.361751] pci 0000:07:00.0: vgaarb: bridge control possible
[    0.361753] pci 0000:07:00.0: vgaarb: VGA device added: decodes=io+mem,owns=none,locks=none
[    0.361757] pci 0000:08:00.0: vgaarb: bridge control possible
[    0.361759] pci 0000:08:00.0: vgaarb: VGA device added: decodes=io+mem,owns=none,locks=none
[    0.361761] vgaarb: loaded
[   14.730255] radeon 0000:03:00.0: vgaarb: changed VGA decodes: olddecodes=io+mem,decodes=none:owns=none

Please note that before apply this patch, vgaarb can not select the right
boot vga due to weird logic introduced with the commit 57fc7323a8e7c ("LoongArch: Add PCI controller support")

Please also note that VARM Bar do moves on LoongArch:

$ dmesg | grep 0000:03:00.0

[    0.357688] pci 0000:03:00.0: [1002:6778] type 00 class 0x030000
[    0.357713] pci 0000:03:00.0: reg 0x10: [mem 0xe0050000000-0xe005fffffff 64bit pref]
[    0.357730] pci 0000:03:00.0: reg 0x18: [mem 0xe0065300000-0xe006531ffff 64bit]
[    0.357741] pci 0000:03:00.0: reg 0x20: [io  0x20000-0x200ff]
[    0.357762] pci 0000:03:00.0: reg 0x30: [mem 0xfffe0000-0xffffffff pref]
[    0.357780] pci 0000:03:00.0: vgaarb: BAR0 contains firmware FB

[    0.359809] pci 0000:03:00.0: BAR 0: assigned [mem 0xe0030000000-0xe003fffffff 64bit pref]
[    0.359821] pci 0000:03:00.0: BAR 2: assigned [mem 0xe0065200000-0xe006521ffff 64bit]
[    0.359832] pci 0000:03:00.0: BAR 6: assigned [mem 0xe0065220000-0xe006523ffff pref]
[    0.359846] pci 0000:03:00.0: BAR 4: assigned [io  0x5000-0x50ff]
[    0.361741] pci 0000:03:00.0: vgaarb: setting as boot VGA device (overriding previous)
[    0.361743] pci 0000:03:00.0: vgaarb: bridge control possible
[    0.361745] pci 0000:03:00.0: vgaarb: VGA device added: decodes=io+mem,owns=none,locks=none

2) LS3A5000+LS7A2000 platform with four video cards

$ lspci | grep VGA

00:06.1 VGA compatible controller: Loongson Technology LLC Device 7a36 (rev 02)
01:00.0 VGA compatible controller: Silicon Motion, Inc. Device 0768 (rev 03)
04:00.0 VGA compatible controller: Device 0709:0001 (rev 01)
05:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Lexa PRO [Radeon 540/540X/550/550X / RX 540X/550/550X] (rev c7)

Before apply this patch:

[    0.359345] pci 0000:00:06.1: vgaarb: setting as boot VGA device
[    0.359347] pci 0000:00:06.1: vgaarb: bridge control possible
[    0.359349] pci 0000:00:06.1: vgaarb: VGA device added: decodes=io+mem,owns=io+mem,locks=none
[    0.359358] pci 0000:01:00.0: vgaarb: bridge control possible
[    0.359360] pci 0000:01:00.0: vgaarb: VGA device added: decodes=io+mem,owns=none,locks=none
[    0.359364] pci 0000:04:00.0: vgaarb: bridge control possible
[    0.359366] pci 0000:04:00.0: vgaarb: VGA device added: decodes=io+mem,owns=none,locks=none
[    0.359369] pci 0000:05:00.0: vgaarb: bridge control possible
[    0.359370] pci 0000:05:00.0: vgaarb: VGA device added: decodes=io+mem,owns=none,locks=none
[    0.359373] vgaarb: loaded
[    0.365852] pci 0000:00:06.1: Overriding boot device as 126F:768
[    0.365855] pci 0000:00:06.1: Overriding boot device as 709:1
[    0.365857] pci 0000:00:06.1: Overriding boot device as 1002:699F
[   14.011380] amdgpu 0000:05:00.0: vgaarb: changed VGA decodes: olddecodes=io+mem,decodes=none:owns=none

After apply this patch:

[    0.357142] pci 0000:05:00.0: vgaarb: BAR 0 contains firmware FB
[    0.359291] pci 0000:00:06.1: vgaarb: setting as boot VGA device
[    0.359294] pci 0000:00:06.1: vgaarb: bridge control possible
[    0.359296] pci 0000:00:06.1: vgaarb: VGA device added: decodes=io+mem,owns=io+mem,locks=none
[    0.359305] pci 0000:01:00.0: vgaarb: bridge control possible
[    0.359307] pci 0000:01:00.0: vgaarb: VGA device added: decodes=io+mem,owns=none,locks=none
[    0.359311] pci 0000:04:00.0: vgaarb: bridge control possible
[    0.359312] pci 0000:04:00.0: vgaarb: VGA device added: decodes=io+mem,owns=none,locks=none
[    0.359316] pci 0000:05:00.0: vgaarb: setting as boot VGA device (overriding previous)
[    0.359318] pci 0000:05:00.0: vgaarb: bridge control possible
[    0.359319] pci 0000:05:00.0: vgaarb: VGA device added: decodes=io+mem,owns=none,locks=none
[    0.359321] vgaarb: loaded
[   14.399907] amdgpu 0000:05:00.0: vgaarb: changed VGA decodes: olddecodes=io+mem,decodes=none:owns=none

Again, the VRAM Bar do moves:

$ dmesg | grep 0000:05:00.0

[    0.357076] pci 0000:05:00.0: [1002:699f] type 00 class 0x030000
[    0.357093] pci 0000:05:00.0: reg 0x10: [mem 0xe0030000000-0xe003fffffff 64bit pref]
[    0.357104] pci 0000:05:00.0: reg 0x18: [mem 0xe0040000000-0xe00401fffff 64bit pref]
[    0.357112] pci 0000:05:00.0: reg 0x20: [io  0x40000-0x400ff]
[    0.357120] pci 0000:05:00.0: reg 0x24: [mem 0xe0074300000-0xe007433ffff]
[    0.357128] pci 0000:05:00.0: reg 0x30: [mem 0xfffe0000-0xffffffff pref]
[    0.357142] pci 0000:05:00.0: vgaarb: BAR0 contains firmware FB
[    0.357720] pci 0000:05:00.0: BAR 0: assigned [mem 0xe0060000000-0xe006fffffff 64bit pref]
[    0.357728] pci 0000:05:00.0: BAR 2: assigned [mem 0xe0058000000-0xe00581fffff 64bit pref]
[    0.357736] pci 0000:05:00.0: BAR 5: assigned [mem 0xe0072b00000-0xe0072b3ffff]
[    0.357740] pci 0000:05:00.0: BAR 6: assigned [mem 0xe0072b40000-0xe0072b5ffff pref]
[    0.357750] pci 0000:05:00.0: BAR 4: assigned [io  0x5000-0x50ff]
[    0.359316] pci 0000:05:00.0: vgaarb: setting as boot VGA device (overriding previous)
[    0.359318] pci 0000:05:00.0: vgaarb: bridge control possible
[    0.359319] pci 0000:05:00.0: vgaarb: VGA device added: decodes=io+mem,owns=none,locks=none

3) Mips (LS3A4000/LS7A1000 + SM750 + Radeon)

Loongson Mips have no UEFI GOP support, we test this patch by modifying the
screen_info manually.

$ lspci | grep VGA

 04:00.0 VGA compatible controller: Silicon Motion, Inc. SM750 (rev a1)
 05:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Oland [Radeon HD 8570 / R7 240/340 OEM] (rev 87)

$ dmesg | grep 04:00.0

 pci 0000:04:00.0: [126f:0750] type 00 class 0x030000
 pci 0000:04:00.0: reg 0x10: [mem 0x58000000-0x5bffffff pref]
 pci 0000:04:00.0: reg 0x14: [mem 0x5f000000-0x5f1fffff]
 pci 0000:04:00.0: reg 0x30: [mem 0xffff0000-0xffffffff pref]
 pci 0000:04:00.0: BAR 0: assigned [mem 0x50000000-0x53ffffff pref]
 pci 0000:04:00.0: BAR 1: assigned [mem 0x54000000-0x541fffff]
 pci 0000:04:00.0: BAR 6: assigned [mem 0x54200000-0x5420ffff pref]

$ dmesg | grep 05:00.0

 pci 0000:05:00.0: [1002:6611] type 00 class 0x030000
 pci 0000:05:00.0: reg 0x10: [mem 0x40000000-0x4fffffff 64bit pref]
 pci 0000:05:00.0: reg 0x18: [mem 0x5f300000-0x5f33ffff 64bit]
 pci 0000:05:00.0: reg 0x20: [io  0x40000-0x400ff]
 pci 0000:05:00.0: reg 0x30: [mem 0xfffe0000-0xffffffff pref]
 pci 0000:05:00.0: BAR 0: assigned [mem 0x40000000-0x4fffffff 64bit pref]
 pci 0000:05:00.0: BAR 2: assigned [mem 0x5b300000-0x5b33ffff 64bit]
 pci 0000:05:00.0: BAR 6: assigned [mem 0x5b340000-0x5b35ffff pref]
 pci 0000:05:00.0: BAR 4: assigned [io  0x21000-0x210ff]

Specify the firmware fb at BAR 0 of SM750 by hardcode:

screen_info = (struct screen_info) {
    .orig_video_isVGA = VIDEO_TYPE_EFI,
    .lfb_base = 0x58000000,
    .lfb_size = 1280 * 800 * 4 * 2,
};

$ dmesg | grep vgaarb

 vgaarb: loaded
 pci 0000:04:00.0: vgaarb: BAR 0 contains firmware FB
 pci 0000:04:00.0: vgaarb: setting as boot VGA device
 pci 0000:04:00.0: vgaarb: bridge control possible
 pci 0000:04:00.0: vgaarb: VGA device added: decodes=io+mem,owns=none,locks=none
 pci 0000:05:00.0: vgaarb: bridge control possible
 pci 0000:05:00.0: vgaarb: VGA device added: decodes=io+mem,owns=io+mem,locks=none

Specify the firmware fb at BAR 0 of ATI GPU by hardcode:

screen_info = (struct screen_info) {
    .orig_video_isVGA = VIDEO_TYPE_EFI,
    .lfb_base = 0x40000000,
    .lfb_size = 1280 * 800 * 4 * 2,
};

$ dmesg | grep vgaarb

 vgaarb: loaded
 pci 0000:04:00.0: vgaarb: setting as boot VGA device
 pci 0000:04:00.0: vgaarb: bridge control possible
 pci 0000:04:00.0: vgaarb: VGA device added: decodes=io+mem,owns=none,locks=none
 pci 0000:05:00.0: vgaarb: BAR 0 contains firmware FB
 pci 0000:05:00.0: vgaarb: setting as boot VGA device (overriding previous)
 pci 0000:05:00.0: vgaarb: bridge control possible
 pci 0000:05:00.0: vgaarb: VGA device added: decodes=io+mem,owns=io+mem,locks=none
 radeon 0000:05:00.0: vgaarb: changed VGA decodes: olddecodes=io+mem,decodes=none:owns=io+mem

Signed-off-by: Sui Jingfeng <suijingfeng@loongson.cn>
---
 drivers/pci/vgaarb.c | 76 +++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 75 insertions(+), 1 deletion(-)

Comments

Bjorn Helgaas Aug. 17, 2023, 10:08 p.m. UTC | #1
On Wed, Aug 16, 2023 at 06:05:27AM +0800, Sui Jingfeng wrote:
> Currently, the vga_is_firmware_default() function only works on x86 and
> ia64, it is a no-op on ARM, ARM64, PPC, RISC-V, etc. This patch completes
> the implementation for the rest of the architectures. The added code tries
> to identify the PCI(e) VGA device that owns the firmware framebuffer
> before PCI resource reallocation happens.

As far as I can tell, this is basically identical to the existing
vga_is_firmware_default(), except that this patch funs that code as a
header fixup, so it happens before any PCI BAR reallocations happen.

That sounds like a good idea, because this is all based on the
framebuffer in screen_info, and screen_info was initialized before PCI
enumeration, and it certainly doesn't account for any BAR changes done
by the PCI core.

So why would we keep vga_is_firmware_default() at all?  If the header
fixup has already identified the firmware framebuffer, it seems
pointless to look again later.

> This patch makes the vga_is_firmware_default() function works on whatever
> arch that has UEFI GOP support. But we make it available only on platforms
> where PCI resource relocation happens. if the provided method proves to be
> effective and reliable, it can be expanded to other arch easily.
>
> v2:
> 	* Fix test robot warnnings and fix typos
> 
> v3:
> 	* Fix linkage problems if the global screen_info is not exported
> 
> v4:
> 	* Handle linkage problems by hiding behind of CONFIG_SYSFB,
> 	* Drop side-effects and simplify.

The v2, v3, v4 changelog is nice, but we don't need it in the commit
log itself, where it will become part of the git history.  It should
go in a cover letter or after the "---" marker:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/Documentation/process/submitting-patches.rst?id=v6.0#n678

> Since only one GPU could owns the firmware fb in normal case, things
> are almost done once we determine the boot VGA selected by firmware.
> By using the DECLARE_PCI_FIXUP_CLASS_HEADER(), we also ensure that we
> could identify the primary display (if there do have one) before PCI
> resource reallocation happen.
> 
> The method provided in this patch will be effective as long as the target
> platform has a way set up the kernel's screen_info. For the machines with
> UEFI firmware, the screen_info is typically set up by the UEFI stub with
> respect to the UEFI GOP protocol. Since the UEFI stub executes in the
> context of the decompressor, and it cannot access the kernel's screen_info
> directly. Hence, the efi stub copies the screen_info provided by firmware
> to kernel's counterpart. Therefore, we handle linkage problems by using
> the CONFIG_EFI guard.  Since when CONFIG_EFI is set, the kernel's
> screen_info is guaranteed to be static linkable for arm, arm64, riscv.
> V4 of this patch handle linkage problems by hiding behind of CONFIG_SYSFB,
> since sysfb reference the global screen_info as well.
>
> This patch is tested on:
> 
> 1) LS3A5000+LS7A1000 platform with three video cards
> 
> $ lspci | grep VGA
> 
>  00:06.1 VGA compatible controller: Loongson Technology LLC DC (Display Controller) (rev 01)
>  03:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Caicos XT [Radeon HD 7470/8470 / R5 235/310 OEM]
>  07:00.0 VGA compatible controller: S3 Graphics Ltd. Device 9070 (rev 01)
>  08:00.0 VGA compatible controller: S3 Graphics Ltd. Device 9070 (rev 01)
> 
> Before apply this patch:
> 
> [    0.361748] pci 0000:00:06.1: vgaarb: setting as boot VGA device
> [    0.361751] pci 0000:00:06.1: vgaarb: bridge control possible
> [    0.361753] pci 0000:00:06.1: vgaarb: VGA device added: decodes=io+mem,owns=io+mem,locks=none
> [    0.361763] pci 0000:03:00.0: vgaarb: bridge control possible
> [    0.361765] pci 0000:03:00.0: vgaarb: VGA device added: decodes=io+mem,owns=none,locks=none
> [    0.361771] pci 0000:07:00.0: vgaarb: bridge control possible
> [    0.361773] pci 0000:07:00.0: vgaarb: VGA device added: decodes=io+mem,owns=none,locks=none
> [    0.361777] pci 0000:08:00.0: vgaarb: bridge control possible
> [    0.361779] pci 0000:08:00.0: vgaarb: VGA device added: decodes=io+mem,owns=none,locks=none
> [    0.361781] vgaarb: loaded
> [    0.367838] pci 0000:00:06.1: Overriding boot device as 1002:6778
> [    0.367841] pci 0000:00:06.1: Overriding boot device as 5333:9070
> [    0.367843] pci 0000:00:06.1: Overriding boot device as 5333:9070
> 
> After apply this patch:
> 
> [    0.357780] pci 0000:03:00.0: vgaarb: BAR 0 contains firmware FB
> [    0.361726] pci 0000:00:06.1: vgaarb: setting as boot VGA device
> [    0.361729] pci 0000:00:06.1: vgaarb: bridge control possible
> [    0.361731] pci 0000:00:06.1: vgaarb: VGA device added: decodes=io+mem,owns=io+mem,locks=none
> [    0.361741] pci 0000:03:00.0: vgaarb: setting as boot VGA device (overriding previous)
> [    0.361743] pci 0000:03:00.0: vgaarb: bridge control possible
> [    0.361745] pci 0000:03:00.0: vgaarb: VGA device added: decodes=io+mem,owns=none,locks=none
> [    0.361751] pci 0000:07:00.0: vgaarb: bridge control possible
> [    0.361753] pci 0000:07:00.0: vgaarb: VGA device added: decodes=io+mem,owns=none,locks=none
> [    0.361757] pci 0000:08:00.0: vgaarb: bridge control possible
> [    0.361759] pci 0000:08:00.0: vgaarb: VGA device added: decodes=io+mem,owns=none,locks=none
> [    0.361761] vgaarb: loaded
> [   14.730255] radeon 0000:03:00.0: vgaarb: changed VGA decodes: olddecodes=io+mem,decodes=none:owns=none
> 
> Please note that before apply this patch, vgaarb can not select the
> right boot vga due to weird logic introduced with the commit
> 57fc7323a8e7c ("LoongArch: Add PCI controller support")

If we need this reference to 57fc7323a8e7c, we need more specifics
about what the "weird logic" is.  pci_fixup_vgadev() is the only
obvious VGA connection, so I suppose it's related to that.

> Please also note that VARM Bar do moves on LoongArch:
> 
> $ dmesg | grep 0000:03:00.0
> 
> [    0.357688] pci 0000:03:00.0: [1002:6778] type 00 class 0x030000
> [    0.357713] pci 0000:03:00.0: reg 0x10: [mem 0xe0050000000-0xe005fffffff 64bit pref]
> [    0.357730] pci 0000:03:00.0: reg 0x18: [mem 0xe0065300000-0xe006531ffff 64bit]
> [    0.357741] pci 0000:03:00.0: reg 0x20: [io  0x20000-0x200ff]
> [    0.357762] pci 0000:03:00.0: reg 0x30: [mem 0xfffe0000-0xffffffff pref]
> [    0.357780] pci 0000:03:00.0: vgaarb: BAR0 contains firmware FB
> 
> [    0.359809] pci 0000:03:00.0: BAR 0: assigned [mem 0xe0030000000-0xe003fffffff 64bit pref]
> [    0.359821] pci 0000:03:00.0: BAR 2: assigned [mem 0xe0065200000-0xe006521ffff 64bit]
> [    0.359832] pci 0000:03:00.0: BAR 6: assigned [mem 0xe0065220000-0xe006523ffff pref]
> [    0.359846] pci 0000:03:00.0: BAR 4: assigned [io  0x5000-0x50ff]
> [    0.361741] pci 0000:03:00.0: vgaarb: setting as boot VGA device (overriding previous)
> [    0.361743] pci 0000:03:00.0: vgaarb: bridge control possible
> [    0.361745] pci 0000:03:00.0: vgaarb: VGA device added: decodes=io+mem,owns=none,locks=none

I guess the point here is that:

  - 03:00.0 BAR 0 is [mem 0xe0050000000-0xe005fffffff]

  - screen_info says the framebuffer is
    [mem 0xe0050000000-0xe005fffffff] (or part of it)

  - Therefore, we want 03:00.0 to be the default VGA

  - PCI core reassigns 03:00.0 BAR 0 to
    [mem 0xe0030000000-0xe003fffffff]

  - PCI core assigns a 00:06.1 BAR to contain
    [mem 0xe0050000000-0xe005fffffff]

  - vga_is_firmware_default() incorrectly decides 00:06.1 should be
    the default VGA because it has a BAR that contains the screen_info
    address range

Is that right?  If so, log messages like this would show the problem:

  pci 0000:00:06.0: reg 0x10: [mem ...]
  pci 0000:03:00.0: reg 0x10: [mem 0xe0050000000-0xe005fffffff]
  pci 0000:03:00.0: BAR 0: assigned [mem 0xe0030000000-0xe003fffffff]
  pci 0000:00:06.0: BAR X: assigned [mem 0xe0050000000-0xe005fffffff]
  pci 0000:00:06.1: vgaarb: setting as boot VGA device

and something like this would show the fix:

  pci 0000:00:06.0: reg 0x10: [mem ...]
  pci 0000:03:00.0: reg 0x10: [mem 0xe0050000000-0xe005fffffff]
  pci 0000:03:00.0: vgaarb: BAR 0 [mem 0xe0050000000-0xe005fffffff] contains firmware framebuffer [mem 0xe0050000000-0xe005fffffff]
  pci 0000:03:00.0: vgaarb: setting as boot VGA device
  pci 0000:03:00.0: BAR 0: assigned [mem 0xe0030000000-0xe003fffffff]
  pci 0000:00:06.0: BAR X: assigned [mem 0xe0050000000-0xe005fffffff]

I don't care about all the "bridge control possible" and "VGA device
added: decodes=..." messages.  Those aren't relevant to this problem.

I don't think the lspci output is relevant either.  That would be
relevant to a patch removing the Loongson pci_fixup_vgadev(), but as
far as I can see, only the screen_info framebuffer address, the BAR
contents, and the PCI core BAR reassignment are relevant to *this*
patch.

> 2) LS3A5000+LS7A2000 platform with four video cards
> 
> $ lspci | grep VGA
> 
> 00:06.1 VGA compatible controller: Loongson Technology LLC Device 7a36 (rev 02)
> 01:00.0 VGA compatible controller: Silicon Motion, Inc. Device 0768 (rev 03)
> 04:00.0 VGA compatible controller: Device 0709:0001 (rev 01)
> 05:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Lexa PRO [Radeon 540/540X/550/550X / RX 540X/550/550X] (rev c7)
> 
> Before apply this patch:
> 
> [    0.359345] pci 0000:00:06.1: vgaarb: setting as boot VGA device
> [    0.359347] pci 0000:00:06.1: vgaarb: bridge control possible
> [    0.359349] pci 0000:00:06.1: vgaarb: VGA device added: decodes=io+mem,owns=io+mem,locks=none
> [    0.359358] pci 0000:01:00.0: vgaarb: bridge control possible
> [    0.359360] pci 0000:01:00.0: vgaarb: VGA device added: decodes=io+mem,owns=none,locks=none
> [    0.359364] pci 0000:04:00.0: vgaarb: bridge control possible
> [    0.359366] pci 0000:04:00.0: vgaarb: VGA device added: decodes=io+mem,owns=none,locks=none
> [    0.359369] pci 0000:05:00.0: vgaarb: bridge control possible
> [    0.359370] pci 0000:05:00.0: vgaarb: VGA device added: decodes=io+mem,owns=none,locks=none
> [    0.359373] vgaarb: loaded
> [    0.365852] pci 0000:00:06.1: Overriding boot device as 126F:768
> [    0.365855] pci 0000:00:06.1: Overriding boot device as 709:1
> [    0.365857] pci 0000:00:06.1: Overriding boot device as 1002:699F
> [   14.011380] amdgpu 0000:05:00.0: vgaarb: changed VGA decodes: olddecodes=io+mem,decodes=none:owns=none
> 
> After apply this patch:
> 
> [    0.357142] pci 0000:05:00.0: vgaarb: BAR 0 contains firmware FB
> [    0.359291] pci 0000:00:06.1: vgaarb: setting as boot VGA device
> [    0.359294] pci 0000:00:06.1: vgaarb: bridge control possible
> [    0.359296] pci 0000:00:06.1: vgaarb: VGA device added: decodes=io+mem,owns=io+mem,locks=none
> [    0.359305] pci 0000:01:00.0: vgaarb: bridge control possible
> [    0.359307] pci 0000:01:00.0: vgaarb: VGA device added: decodes=io+mem,owns=none,locks=none
> [    0.359311] pci 0000:04:00.0: vgaarb: bridge control possible
> [    0.359312] pci 0000:04:00.0: vgaarb: VGA device added: decodes=io+mem,owns=none,locks=none
> [    0.359316] pci 0000:05:00.0: vgaarb: setting as boot VGA device (overriding previous)
> [    0.359318] pci 0000:05:00.0: vgaarb: bridge control possible
> [    0.359319] pci 0000:05:00.0: vgaarb: VGA device added: decodes=io+mem,owns=none,locks=none
> [    0.359321] vgaarb: loaded
> [   14.399907] amdgpu 0000:05:00.0: vgaarb: changed VGA decodes: olddecodes=io+mem,decodes=none:owns=none
> 
> Again, the VRAM Bar do moves:
> 
> $ dmesg | grep 0000:05:00.0
> 
> [    0.357076] pci 0000:05:00.0: [1002:699f] type 00 class 0x030000
> [    0.357093] pci 0000:05:00.0: reg 0x10: [mem 0xe0030000000-0xe003fffffff 64bit pref]
> [    0.357104] pci 0000:05:00.0: reg 0x18: [mem 0xe0040000000-0xe00401fffff 64bit pref]
> [    0.357112] pci 0000:05:00.0: reg 0x20: [io  0x40000-0x400ff]
> [    0.357120] pci 0000:05:00.0: reg 0x24: [mem 0xe0074300000-0xe007433ffff]
> [    0.357128] pci 0000:05:00.0: reg 0x30: [mem 0xfffe0000-0xffffffff pref]
> [    0.357142] pci 0000:05:00.0: vgaarb: BAR0 contains firmware FB
> [    0.357720] pci 0000:05:00.0: BAR 0: assigned [mem 0xe0060000000-0xe006fffffff 64bit pref]
> [    0.357728] pci 0000:05:00.0: BAR 2: assigned [mem 0xe0058000000-0xe00581fffff 64bit pref]
> [    0.357736] pci 0000:05:00.0: BAR 5: assigned [mem 0xe0072b00000-0xe0072b3ffff]
> [    0.357740] pci 0000:05:00.0: BAR 6: assigned [mem 0xe0072b40000-0xe0072b5ffff pref]
> [    0.357750] pci 0000:05:00.0: BAR 4: assigned [io  0x5000-0x50ff]
> [    0.359316] pci 0000:05:00.0: vgaarb: setting as boot VGA device (overriding previous)
> [    0.359318] pci 0000:05:00.0: vgaarb: bridge control possible
> [    0.359319] pci 0000:05:00.0: vgaarb: VGA device added: decodes=io+mem,owns=none,locks=none
> 
> 3) Mips (LS3A4000/LS7A1000 + SM750 + Radeon)
> 
> Loongson Mips have no UEFI GOP support, we test this patch by modifying the
> screen_info manually.
> 
> $ lspci | grep VGA
> 
>  04:00.0 VGA compatible controller: Silicon Motion, Inc. SM750 (rev a1)
>  05:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Oland [Radeon HD 8570 / R7 240/340 OEM] (rev 87)
> 
> $ dmesg | grep 04:00.0
> 
>  pci 0000:04:00.0: [126f:0750] type 00 class 0x030000
>  pci 0000:04:00.0: reg 0x10: [mem 0x58000000-0x5bffffff pref]
>  pci 0000:04:00.0: reg 0x14: [mem 0x5f000000-0x5f1fffff]
>  pci 0000:04:00.0: reg 0x30: [mem 0xffff0000-0xffffffff pref]
>  pci 0000:04:00.0: BAR 0: assigned [mem 0x50000000-0x53ffffff pref]
>  pci 0000:04:00.0: BAR 1: assigned [mem 0x54000000-0x541fffff]
>  pci 0000:04:00.0: BAR 6: assigned [mem 0x54200000-0x5420ffff pref]
> 
> $ dmesg | grep 05:00.0
> 
>  pci 0000:05:00.0: [1002:6611] type 00 class 0x030000
>  pci 0000:05:00.0: reg 0x10: [mem 0x40000000-0x4fffffff 64bit pref]
>  pci 0000:05:00.0: reg 0x18: [mem 0x5f300000-0x5f33ffff 64bit]
>  pci 0000:05:00.0: reg 0x20: [io  0x40000-0x400ff]
>  pci 0000:05:00.0: reg 0x30: [mem 0xfffe0000-0xffffffff pref]
>  pci 0000:05:00.0: BAR 0: assigned [mem 0x40000000-0x4fffffff 64bit pref]
>  pci 0000:05:00.0: BAR 2: assigned [mem 0x5b300000-0x5b33ffff 64bit]
>  pci 0000:05:00.0: BAR 6: assigned [mem 0x5b340000-0x5b35ffff pref]
>  pci 0000:05:00.0: BAR 4: assigned [io  0x21000-0x210ff]
> 
> Specify the firmware fb at BAR 0 of SM750 by hardcode:
> 
> screen_info = (struct screen_info) {
>     .orig_video_isVGA = VIDEO_TYPE_EFI,
>     .lfb_base = 0x58000000,
>     .lfb_size = 1280 * 800 * 4 * 2,
> };
> 
> $ dmesg | grep vgaarb
> 
>  vgaarb: loaded
>  pci 0000:04:00.0: vgaarb: BAR 0 contains firmware FB
>  pci 0000:04:00.0: vgaarb: setting as boot VGA device
>  pci 0000:04:00.0: vgaarb: bridge control possible
>  pci 0000:04:00.0: vgaarb: VGA device added: decodes=io+mem,owns=none,locks=none
>  pci 0000:05:00.0: vgaarb: bridge control possible
>  pci 0000:05:00.0: vgaarb: VGA device added: decodes=io+mem,owns=io+mem,locks=none
> 
> Specify the firmware fb at BAR 0 of ATI GPU by hardcode:
> 
> screen_info = (struct screen_info) {
>     .orig_video_isVGA = VIDEO_TYPE_EFI,
>     .lfb_base = 0x40000000,
>     .lfb_size = 1280 * 800 * 4 * 2,
> };
> 
> $ dmesg | grep vgaarb
> 
>  vgaarb: loaded
>  pci 0000:04:00.0: vgaarb: setting as boot VGA device
>  pci 0000:04:00.0: vgaarb: bridge control possible
>  pci 0000:04:00.0: vgaarb: VGA device added: decodes=io+mem,owns=none,locks=none
>  pci 0000:05:00.0: vgaarb: BAR 0 contains firmware FB
>  pci 0000:05:00.0: vgaarb: setting as boot VGA device (overriding previous)
>  pci 0000:05:00.0: vgaarb: bridge control possible
>  pci 0000:05:00.0: vgaarb: VGA device added: decodes=io+mem,owns=io+mem,locks=none
>  radeon 0000:05:00.0: vgaarb: changed VGA decodes: olddecodes=io+mem,decodes=none:owns=io+mem
> 
> Signed-off-by: Sui Jingfeng <suijingfeng@loongson.cn>
> ---
>  drivers/pci/vgaarb.c | 76 +++++++++++++++++++++++++++++++++++++++++++-
>  1 file changed, 75 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/pci/vgaarb.c b/drivers/pci/vgaarb.c
> index 5a696078b382..7b6c3a772e91 100644
> --- a/drivers/pci/vgaarb.c
> +++ b/drivers/pci/vgaarb.c
> @@ -60,7 +60,8 @@ static int vga_count, vga_decode_count;
>  static bool vga_arbiter_used;
>  static DEFINE_SPINLOCK(vga_lock);
>  static DECLARE_WAIT_QUEUE_HEAD(vga_wait_queue);
> -
> +/* The PCI(e) device who owns the firmware framebuffer */
> +static struct pci_dev *pdev_boot_vga;
>  
>  static const char *vga_iostate_to_str(unsigned int iostate)
>  {
> @@ -571,6 +572,9 @@ static bool vga_is_firmware_default(struct pci_dev *pdev)
>  
>  		return true;
>  	}
> +#else
> +	if (pdev_boot_vga && pdev_boot_vga == pdev)
> +		return true;
>  #endif
>  	return false;
>  }
> @@ -1555,3 +1559,73 @@ static int __init vga_arb_device_init(void)
>  	return rc;
>  }
>  subsys_initcall_sync(vga_arb_device_init);
> +
> +/*
> + * Get the physical address range that the firmware framebuffer occupies.
> + *
> + * The global screen_info is arch-specific, CONFIG_SYSFB is chosen as
> + * compile-time conditional to suppress linkage problems.
> + */
> +static bool vga_arb_get_firmware_fb_range(resource_size_t *start,
> +					  resource_size_t *end)
> +{
> +	resource_size_t fb_start = 0;
> +	resource_size_t fb_size = 0;
> +	resource_size_t fb_end;
> +
> +#if defined(CONFIG_X86) || defined(CONFIG_IA64) || defined(CONFIG_SYSFB)
> +	fb_start = screen_info.lfb_base;
> +	if (screen_info.capabilities & VIDEO_CAPABILITY_64BIT_BASE)
> +		fb_start |= (u64)screen_info.ext_lfb_base << 32;
> +
> +	fb_size = screen_info.lfb_size;
> +#endif
> +
> +	/* No firmware framebuffer support */
> +	if (!fb_start || !fb_size)
> +		return false;
> +
> +	fb_end = fb_start + fb_size - 1;
> +
> +	*start = fb_start;
> +	*end = fb_end;
> +
> +	return true;
> +}
> +
> +/*
> + * Identify the PCI VGA device that contains the firmware framebuffer
> + */
> +static void pci_boot_vga_finder(struct pci_dev *pdev)
> +{
> +	resource_size_t fb_start;
> +	resource_size_t fb_end;
> +	unsigned int i;
> +
> +	/* Already found the pdev which has firmware framebuffer ownership */
> +	if (pdev_boot_vga)
> +		return;
> +
> +	if (!vga_arb_get_firmware_fb_range(&fb_start, &fb_end))
> +		return;
> +
> +	for (i = 0; i < PCI_STD_NUM_BARS; i++) {
> +		struct resource *res = &pdev->resource[i];

This is essentially identical to vga_is_firmware_default() so it
should look the same, i.e., it should use pci_dev_for_each_resource().

> +		if (resource_type(res) != IORESOURCE_MEM)
> +			continue;
> +
> +		if (!res->start || !res->end)
> +			continue;
> +
> +		if (res->start <= fb_start && fb_end <= res->end) {
> +			pdev_boot_vga = pdev;
> +
> +			vgaarb_info(&pdev->dev,
> +				    "BAR %d contains firmware FB\n", i);

Print the BAR with %pR and include the framebuffer region from
screen_info in the same format.

> +			break;
> +		}
> +	}
> +}
> +DECLARE_PCI_FIXUP_CLASS_HEADER(PCI_ANY_ID, PCI_ANY_ID, PCI_CLASS_DISPLAY_VGA,
> +			       8, pci_boot_vga_finder);
> -- 
> 2.34.1
>
Sui Jingfeng Aug. 18, 2023, 1:48 a.m. UTC | #2
Hi,


On 2023/8/18 06:08, Bjorn Helgaas wrote:
>> +		if (resource_type(res) != IORESOURCE_MEM)
>> +			continue;
>> +
>> +		if (!res->start || !res->end)
>> +			continue;
>> +
>> +		if (res->start <= fb_start && fb_end <= res->end) {
>> +			pdev_boot_vga = pdev;
>> +
>> +			vgaarb_info(&pdev->dev,
>> +				    "BAR %d contains firmware FB\n", i);
> Print the BAR with %pR and include the framebuffer region from
> screen_info in the same format.
>

I do remember that you already told me to do this in V3, sorry for not 
replying to you at V3. Most of the time, what you tell me is right.But 
here, I think I need to explain. Because doing it that way will make the 
code line too long,and it will exceed 80 characters in the column if we 
print too much.
I believe that the vgaarb_info() at here is already the most compact and 
simplest form. Printing the BAR with %pR is not absolute necessary, 
because we can get the additional information by: $ lspci | grep VGA

$ dmesg | grep 05:00.0
$ dmesg | grep 0000:03:00.0
$ dmesg | grep PCI


Actually, I only add something that is absolute necessary.
Printing BAR with %pR and/or Printing the framebuffer region
is consider to only for *debugging* purpose.
Sui Jingfeng Aug. 18, 2023, 4:09 a.m. UTC | #3
Hi,


On 2023/8/18 06:08, Bjorn Helgaas wrote:
> On Wed, Aug 16, 2023 at 06:05:27AM +0800, Sui Jingfeng wrote:
>> Currently, the vga_is_firmware_default() function only works on x86 and
>> ia64, it is a no-op on ARM, ARM64, PPC, RISC-V, etc. This patch completes
>> the implementation for the rest of the architectures. The added code tries
>> to identify the PCI(e) VGA device that owns the firmware framebuffer
>> before PCI resource reallocation happens.
> As far as I can tell, this is basically identical to the existing
> vga_is_firmware_default(), except that this patch funs that code as a
> header fixup, so it happens before any PCI BAR reallocations happen.


Yes, what you said is right in overall.
But I think I should mention a few tiny points that make a difference.

1) My version is *less arch-dependent*


Again, since the global screen_info is arch-dependent.
The vga_is_firmware_default() mess up the arch-dependent part and arch-independent part.
It's a mess and it's a bit harder to make the cleanup on the top of it.

While my version is my version split the arch-dependent part and arch-independent part clearly.
Since we decide to make it less arch-dependent, we have to bear the pain.
Despite all other arches should always export the screen_info like the X86 and IA64 arch does,
or at least a arch should give a Kconfig token (for example, CONFIG_ARCH_HAS_SCREEN_INFO) to
demonstrate that an arch has the support for it.
While currently, the fact is that the dependence just populated to everywhere.
I think this is the hard part, you have to investigate how various arches defines and set up
the screen_info. And then process dependency and the linkage problem across arch properly.


2) My version focus on the address in ranges, weaken the size parameter.

Which make the code easy to read and follow the canonical convention to
express the address range. while the vga_is_firmware_default() is not.


3) A tiny change make a big difference.


The original vga_is_firmware_default() only works with the assumption
that the PCI resource reallocation won't happens. While I see no clue
that why this is true even on X86 and IA64. The original patch[1] not
mention this assumption explicitly.
  
[1] 86fd887b7fe3 ('vgaarb: Don't default exclusively to first video device with mem+io')


> That sounds like a good idea, because this is all based on the
> framebuffer in screen_info, and screen_info was initialized before PCI
> enumeration, and it certainly doesn't account for any BAR changes done
> by the PCI core.


Yes.


> So why would we keep vga_is_firmware_default() at all?  If the header
> fixup has already identified the firmware framebuffer, it seems
> pointless to look again later.
>

It need another patch to do the cleanup work, while my patch just add code to solve the real problem.
It focus on provide a solution for the architectures which have a decent way set up the screen_info.
Other things except that is secondary.
Sui Jingfeng Aug. 18, 2023, 9:31 a.m. UTC | #4
Hi,


On 2023/8/18 06:08, Bjorn Helgaas wrote:
>> This patch makes the vga_is_firmware_default() function works on whatever
>> arch that has UEFI GOP support. But we make it available only on platforms
>> where PCI resource relocation happens. if the provided method proves to be
>> effective and reliable, it can be expanded to other arch easily.
>>
>> v2:
>> 	* Fix test robot warnnings and fix typos
>>
>> v3:
>> 	* Fix linkage problems if the global screen_info is not exported
>>
>> v4:
>> 	* Handle linkage problems by hiding behind of CONFIG_SYSFB,
>> 	* Drop side-effects and simplify.
> The v2, v3, v4 changelog is nice, but we don't need it in the commit
> log itself, where it will become part of the git history.  It should
> go in a cover letter or after the "---" marker:
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/Documentation/process/submitting-patches.rst?id=v6.0#n678

Thanks for point it out, now I know. Will be fixed at the next version.
Sui Jingfeng Aug. 18, 2023, 10:20 a.m. UTC | #5
Hi,


On 2023/8/18 06:08, Bjorn Helgaas wrote:
>> Please note that before apply this patch, vgaarb can not select the
>> right boot vga due to weird logic introduced with the commit
>> 57fc7323a8e7c ("LoongArch: Add PCI controller support")
> If we need this reference to 57fc7323a8e7c, we need more specifics
> about what the "weird logic" is.  pci_fixup_vgadev() is the only
> obvious VGA connection, so I suppose it's related to that.
>
Yes, you are right.

The pci_fixup_vgadev() function will set the last VGA device enumerated as the default boot device.
By "the last" VGA device, I mean that this device has the largest PCI bus, domain, and function triple.
Thus, it is added to vgaarb in the end of all VGA device.
So that logic expresses that the last one added will be the default.
This probably is not what we want.


On the LS3A5000+LS7A1000 platform, the last VGA device is a S3 graphics 
(08:00.0). This GPU has two cores. Say the log below:


$ lspci | grep VGA

  00:06.1 VGA compatible controller: Loongson Technology LLC DC (Display Controller) (rev 01)
  03:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Caicos XT [Radeon HD 7470/8470 / R5 235/310 OEM]
  07:00.0 VGA compatible controller: S3 Graphics Ltd. Device 9070 (rev 01)
  08:00.0 VGA compatible controller: S3 Graphics Ltd. Device 9070 (rev 01)

[    0.361781] vgaarb: loaded
[    0.367838] pci 0000:00:06.1: Overriding boot device as 1002:6778
[    0.367841] pci 0000:00:06.1: Overriding boot device as 5333:9070
[    0.367843] pci 0000:00:06.1: Overriding boot device as 5333:9070

1) The "weird" logic completely overrides whatever decision VGAARB ever made.

It seems to say that the decision ever made by VGAARB is useless.
Well, I think VGAARB shouldn't endure this; VGAARB has to be small.

  

2) The results it gives are not correct either.

In the first testing example in my commit message,
it overrides the S3 graphics as the default boot VGA instead of the AMD/ATI GPU.
Actually, the firmware chooses the AMD/ATI GPU as the "frimware default".

  

3) It tries to make the decision for the end user instead of the firmware.

Therefore, that function is always wrong. Again, it's a policy, not a mechanism.


Since that already have been merge, I'm fine.
Maybe Huacai is busy, he might don't has the time to carry on a deep thinking.
But I think we should correct the mistake ever made,
let's merge this patch to make vgaarb great again ?


Well, that commit is not a dependency, I don't mind delete the referencing
to that commit. After all, I think my patch will be effective on other architectures.
Is additional testing on ARM64 and X86 is needed, if so I have to find the machine to
carry on the testing.
Sui Jingfeng Aug. 19, 2023, 2:16 a.m. UTC | #6
Hi,


On 2023/8/18 06:08, Bjorn Helgaas wrote:
>> Please note that before apply this patch, vgaarb can not select the
>> right boot vga due to weird logic introduced with the commit
>> 57fc7323a8e7c ("LoongArch: Add PCI controller support")
> If we need this reference to 57fc7323a8e7c, we need more specifics
> about what the "weird logic" is.  pci_fixup_vgadev() is the only
> obvious VGA connection, so I suppose it's related to that.
>

Sorry, re-reading your reviews,
I realized that I may miss the point,  here, the sentence is not well phased,
Correct expression should be:


Please note that before apply this patch, there don't have a reasonable, system level solution on loongarch.
VGAARB select a wrong device as primary GPU due to the pci_fixup_vgadev() function
introduced with the commit 57fc7323a8e7c ("LoongArch: Add PCI controller support").
Sui Jingfeng Aug. 19, 2023, 2:18 a.m. UTC | #7
Hi,

On 2023/8/18 06:08, Bjorn Helgaas wrote:
>> +
>> +/*
>> + * Identify the PCI VGA device that contains the firmware framebuffer
>> + */
>> +static void pci_boot_vga_finder(struct pci_dev *pdev)
>> +{
>> +	resource_size_t fb_start;
>> +	resource_size_t fb_end;
>> +	unsigned int i;
>> +
>> +	/* Already found the pdev which has firmware framebuffer ownership */
>> +	if (pdev_boot_vga)
>> +		return;
>> +
>> +	if (!vga_arb_get_firmware_fb_range(&fb_start, &fb_end))
>> +		return;
>> +
>> +	for (i = 0; i < PCI_STD_NUM_BARS; i++) {
>> +		struct resource *res = &pdev->resource[i];
> This is essentially identical to vga_is_firmware_default() so it
> should look the same, i.e., it should use pci_dev_for_each_resource().
>

OK, will be fixed at the next version. Thanks.
Sui Jingfeng Aug. 19, 2023, 2:40 a.m. UTC | #8
Hi,


On 2023/8/18 06:08, Bjorn Helgaas wrote:
> I guess the point here is that:
>
>    - 03:00.0 BAR 0 is [mem 0xe0050000000-0xe005fffffff]
>
>    - screen_info says the framebuffer is
>      [mem 0xe0050000000-0xe005fffffff] (or part of it)
>
>    - Therefore, we want 03:00.0 to be the default VGA
>
>    - PCI core reassigns 03:00.0 BAR 0 to
>      [mem 0xe0030000000-0xe003fffffff]
>
>    - PCI core assigns a 00:06.1 BAR to contain
>      [mem 0xe0050000000-0xe005fffffff]
>
>    - vga_is_firmware_default() incorrectly decides 00:06.1 should be
>      the default VGA because it has a BAR that contains the screen_info
>      address range
>
> Is that right?

Yes, The 00:06.1 is loongson integrated display controller, integrated in LS7A1000 North bridge.


On loongarch, before apply apply any patch, VGAARB always select 00:06.1 as the default boot device.
because it is enumerated first. It is always the first VGA compatible device found on our system.
Because its PCI domain, bus, function number is smallest. And it own IO and MEM, so the 00:06.1 will
always be selected as the default boot device. Even you plug a discrete GPU on the mother board.


Therefore we need help the vga_is_firmware_default() to overriding previous.
On a multiple GPU co-exist case (on loongson platform), if no "overriding previous" being printed.
then there something wrong. On normal case, we need the discrete GPU overriding the integrated one.
Because the discrete GPU is more powerful than the platform integrated.

But what we want is let the VGAARB determine the primary GPU by referencing the firmware.

If firmware put the firmware framebuffer in the VRAM of the integrated display card(00:06.1).
then the integrated display card should be the primary GPU.

If firmware put the firmware framebuffer in the VRAM of the discrete display card,
then the discrete display card should be the primary GPU.
Sui Jingfeng Aug. 21, 2023, 6:47 a.m. UTC | #9
Hi,


On 2023/8/18 18:20, suijingfeng wrote:
>
> 1) The "weird" logic completely overrides whatever decision VGAARB 
> ever made.
>
> It seems to say that the decision ever made by VGAARB is useless.
> Well, I think VGAARB shouldn't endure this; VGAARB has to be small. 


VGAARB have to be smart!

The "weird logic" here refer to the weird pci_fixup_vgadev() function.
Bjorn Helgaas Aug. 21, 2023, 5:33 p.m. UTC | #10
On Fri, Aug 18, 2023 at 09:48:46AM +0800, suijingfeng wrote:
> On 2023/8/18 06:08, Bjorn Helgaas wrote:
> > > +		if (resource_type(res) != IORESOURCE_MEM)
> > > +			continue;
> > > +
> > > +		if (!res->start || !res->end)
> > > +			continue;
> > > +
> > > +		if (res->start <= fb_start && fb_end <= res->end) {
> > > +			pdev_boot_vga = pdev;
> > > +
> > > +			vgaarb_info(&pdev->dev,
> > > +				    "BAR %d contains firmware FB\n", i);
> > Print the BAR with %pR and include the framebuffer region from
> > screen_info in the same format.
> 
> I do remember that you already told me to do this in V3, sorry for not
> replying to you at V3. Most of the time, what you tell me is right.But here,
> I think I need to explain. Because doing it that way will make the code line
> too long,and it will exceed 80 characters in the column if we print too
> much.
> I believe that the vgaarb_info() at here is already the most compact and
> simplest form. Printing the BAR with %pR is not absolute necessary, because
> we can get the additional information by: $ lspci | grep VGA
> 
> $ dmesg | grep 05:00.0
> $ dmesg | grep 0000:03:00.0
> $ dmesg | grep PCI

Fair enough.  The BAR info is already there.  But I don't think the
screen_info framebuffer data is in the dmesg log anywhere, and I think
that would be useful.

It's fine if dmesg lines or kernel printk lines exceed 80 columns.

> Actually, I only add something that is absolute necessary.
> Printing BAR with %pR and/or Printing the framebuffer region
> is consider to only for *debugging* purpose.

I think printing the framebuffer region is important because it makes
it clear *why* we're selecting the device as the default VGA device.
It's more than just debugging; it helps make the system more
transparent and more understandable.

Bjorn
Bjorn Helgaas Aug. 21, 2023, 5:38 p.m. UTC | #11
On Fri, Aug 18, 2023 at 12:09:29PM +0800, suijingfeng wrote:
> On 2023/8/18 06:08, Bjorn Helgaas wrote:
> > On Wed, Aug 16, 2023 at 06:05:27AM +0800, Sui Jingfeng wrote:
> > > Currently, the vga_is_firmware_default() function only works on x86 and
> > > ia64, it is a no-op on ARM, ARM64, PPC, RISC-V, etc. This patch completes
> > > the implementation for the rest of the architectures. The added code tries
> > > to identify the PCI(e) VGA device that owns the firmware framebuffer
> > > before PCI resource reallocation happens.
> >
> > As far as I can tell, this is basically identical to the existing
> > vga_is_firmware_default(), except that this patch funs that code as a
> > header fixup, so it happens before any PCI BAR reallocations happen.
> 
> Yes, what you said is right in overall.
> But I think I should mention a few tiny points that make a difference.
> 
> 1) My version is *less arch-dependent*

Of course.  If we make the patch simple and the commit log simple by
removing extraneous details, this will all be obvious.

> 2) My version focus on the address in ranges, weaken the size parameter.
> 
> Which make the code easy to read and follow the canonical convention to
> express the address range. while the vga_is_firmware_default() is not.

Whether it's start/size or start/end is a trivial question.  We don't
need to waste time on it now.

> 3) A tiny change make a big difference.
> 
> The original vga_is_firmware_default() only works with the assumption
> that the PCI resource reallocation won't happens. While I see no clue
> that why this is true even on X86 and IA64. The original patch[1] not
> mention this assumption explicitly.
> [1] 86fd887b7fe3 ('vgaarb: Don't default exclusively to first video device with mem+io')
> 
> > That sounds like a good idea, because this is all based on the
> > framebuffer in screen_info, and screen_info was initialized before PCI
> > enumeration, and it certainly doesn't account for any BAR changes done
> > by the PCI core.
> 
> Yes.
> 
> > So why would we keep vga_is_firmware_default() at all?  If the header
> > fixup has already identified the firmware framebuffer, it seems
> > pointless to look again later.
> 
> It need another patch to do the cleanup work, while my patch just
> add code to solve the real problem.  It focus on provide a solution
> for the architectures which have a decent way set up the
> screen_info.  Other things except that is secondary.

I don't want both mechanisms when only one of them is useful.  PCI BAR
reassignment is completely fine, and keeping the assumption in
vga_is_firmware_default() that we can compare reassigned BAR values to
the pre-reassignment screen_info range is a trap that we should
remove.

Bjorn
Sui Jingfeng Aug. 22, 2023, 1:19 a.m. UTC | #12
Hi,


On 2023/8/22 01:33, Bjorn Helgaas wrote:
> On Fri, Aug 18, 2023 at 09:48:46AM +0800, suijingfeng wrote:
>> On 2023/8/18 06:08, Bjorn Helgaas wrote:
>>>> +		if (resource_type(res) != IORESOURCE_MEM)
>>>> +			continue;
>>>> +
>>>> +		if (!res->start || !res->end)
>>>> +			continue;
>>>> +
>>>> +		if (res->start <= fb_start && fb_end <= res->end) {
>>>> +			pdev_boot_vga = pdev;
>>>> +
>>>> +			vgaarb_info(&pdev->dev,
>>>> +				    "BAR %d contains firmware FB\n", i);
>>> Print the BAR with %pR and include the framebuffer region from
>>> screen_info in the same format.
>> I do remember that you already told me to do this in V3, sorry for not
>> replying to you at V3. Most of the time, what you tell me is right.But here,
>> I think I need to explain. Because doing it that way will make the code line
>> too long,and it will exceed 80 characters in the column if we print too
>> much.
>> I believe that the vgaarb_info() at here is already the most compact and
>> simplest form. Printing the BAR with %pR is not absolute necessary, because
>> we can get the additional information by: $ lspci | grep VGA
>>
>> $ dmesg | grep 05:00.0
>> $ dmesg | grep 0000:03:00.0
>> $ dmesg | grep PCI
> Fair enough.  The BAR info is already there.  But I don't think the
> screen_info framebuffer data is in the dmesg log anywhere, and I think
> that would be useful.
>
> It's fine if dmesg lines or kernel printk lines exceed 80 columns.
>
>> Actually, I only add something that is absolute necessary.
>> Printing BAR with %pR and/or Printing the framebuffer region
>> is consider to only for *debugging* purpose.
> I think printing the framebuffer region is important because it makes
> it clear *why* we're selecting the device as the default VGA device.
> It's more than just debugging; it helps make the system more
> transparent and more understandable.

OK, I'm clear what to do next.
The printing will be added at the next version.

> Bjorn
Sui Jingfeng Aug. 22, 2023, 2:37 a.m. UTC | #13
Hi,

On 2023/8/22 01:38, Bjorn Helgaas wrote:
> On Fri, Aug 18, 2023 at 12:09:29PM +0800, suijingfeng wrote:
>> On 2023/8/18 06:08, Bjorn Helgaas wrote:
>>> On Wed, Aug 16, 2023 at 06:05:27AM +0800, Sui Jingfeng wrote:
>>>> Currently, the vga_is_firmware_default() function only works on x86 and
>>>> ia64, it is a no-op on ARM, ARM64, PPC, RISC-V, etc. This patch completes
>>>> the implementation for the rest of the architectures. The added code tries
>>>> to identify the PCI(e) VGA device that owns the firmware framebuffer
>>>> before PCI resource reallocation happens.
>>> As far as I can tell, this is basically identical to the existing
>>> vga_is_firmware_default(), except that this patch funs that code as a
>>> header fixup, so it happens before any PCI BAR reallocations happen.
>> Yes, what you said is right in overall.
>> But I think I should mention a few tiny points that make a difference.
>>
>> 1) My version is *less arch-dependent*
> Of course.  If we make the patch simple and the commit log simple by
> removing extraneous details, this will all be obvious.
>
>> 2) My version focus on the address in ranges, weaken the size parameter.
>>
>> Which make the code easy to read and follow the canonical convention to
>> express the address range. while the vga_is_firmware_default() is not.
> Whether it's start/size or start/end is a trivial question.  We don't
> need to waste time on it now.
>
>> 3) A tiny change make a big difference.
>>
>> The original vga_is_firmware_default() only works with the assumption
>> that the PCI resource reallocation won't happens. While I see no clue
>> that why this is true even on X86 and IA64. The original patch[1] not
>> mention this assumption explicitly.
>> [1] 86fd887b7fe3 ('vgaarb: Don't default exclusively to first video device with mem+io')
>>
>>> That sounds like a good idea, because this is all based on the
>>> framebuffer in screen_info, and screen_info was initialized before PCI
>>> enumeration, and it certainly doesn't account for any BAR changes done
>>> by the PCI core.
>> Yes.
>>
>>> So why would we keep vga_is_firmware_default() at all?  If the header
>>> fixup has already identified the firmware framebuffer, it seems
>>> pointless to look again later.
>> It need another patch to do the cleanup work, while my patch just
>> add code to solve the real problem.  It focus on provide a solution
>> for the architectures which have a decent way set up the
>> screen_info.  Other things except that is secondary.
> I don't want both mechanisms when only one of them is useful.  PCI BAR
> reassignment is completely fine, and keeping the assumption in
> vga_is_firmware_default() that we can compare reassigned BAR values to
> the pre-reassignment screen_info range is a trap that we should
> remove.


OK,it's clear now.  I know what to do next.
Thanks.


> Bjorn
diff mbox series

Patch

diff --git a/drivers/pci/vgaarb.c b/drivers/pci/vgaarb.c
index 5a696078b382..7b6c3a772e91 100644
--- a/drivers/pci/vgaarb.c
+++ b/drivers/pci/vgaarb.c
@@ -60,7 +60,8 @@  static int vga_count, vga_decode_count;
 static bool vga_arbiter_used;
 static DEFINE_SPINLOCK(vga_lock);
 static DECLARE_WAIT_QUEUE_HEAD(vga_wait_queue);
-
+/* The PCI(e) device who owns the firmware framebuffer */
+static struct pci_dev *pdev_boot_vga;
 
 static const char *vga_iostate_to_str(unsigned int iostate)
 {
@@ -571,6 +572,9 @@  static bool vga_is_firmware_default(struct pci_dev *pdev)
 
 		return true;
 	}
+#else
+	if (pdev_boot_vga && pdev_boot_vga == pdev)
+		return true;
 #endif
 	return false;
 }
@@ -1555,3 +1559,73 @@  static int __init vga_arb_device_init(void)
 	return rc;
 }
 subsys_initcall_sync(vga_arb_device_init);
+
+/*
+ * Get the physical address range that the firmware framebuffer occupies.
+ *
+ * The global screen_info is arch-specific, CONFIG_SYSFB is chosen as
+ * compile-time conditional to suppress linkage problems.
+ */
+static bool vga_arb_get_firmware_fb_range(resource_size_t *start,
+					  resource_size_t *end)
+{
+	resource_size_t fb_start = 0;
+	resource_size_t fb_size = 0;
+	resource_size_t fb_end;
+
+#if defined(CONFIG_X86) || defined(CONFIG_IA64) || defined(CONFIG_SYSFB)
+	fb_start = screen_info.lfb_base;
+	if (screen_info.capabilities & VIDEO_CAPABILITY_64BIT_BASE)
+		fb_start |= (u64)screen_info.ext_lfb_base << 32;
+
+	fb_size = screen_info.lfb_size;
+#endif
+
+	/* No firmware framebuffer support */
+	if (!fb_start || !fb_size)
+		return false;
+
+	fb_end = fb_start + fb_size - 1;
+
+	*start = fb_start;
+	*end = fb_end;
+
+	return true;
+}
+
+/*
+ * Identify the PCI VGA device that contains the firmware framebuffer
+ */
+static void pci_boot_vga_finder(struct pci_dev *pdev)
+{
+	resource_size_t fb_start;
+	resource_size_t fb_end;
+	unsigned int i;
+
+	/* Already found the pdev which has firmware framebuffer ownership */
+	if (pdev_boot_vga)
+		return;
+
+	if (!vga_arb_get_firmware_fb_range(&fb_start, &fb_end))
+		return;
+
+	for (i = 0; i < PCI_STD_NUM_BARS; i++) {
+		struct resource *res = &pdev->resource[i];
+
+		if (resource_type(res) != IORESOURCE_MEM)
+			continue;
+
+		if (!res->start || !res->end)
+			continue;
+
+		if (res->start <= fb_start && fb_end <= res->end) {
+			pdev_boot_vga = pdev;
+
+			vgaarb_info(&pdev->dev,
+				    "BAR %d contains firmware FB\n", i);
+			break;
+		}
+	}
+}
+DECLARE_PCI_FIXUP_CLASS_HEADER(PCI_ANY_ID, PCI_ANY_ID, PCI_CLASS_DISPLAY_VGA,
+			       8, pci_boot_vga_finder);