Message ID | 9d5cd002-eb29-bc4d-89e8-a4a594dd84ad@redhat.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
On 09/30/16 12:35, Hans de Goede wrote: > Attached are 2 patches against the xserver which should fix this, > please give them a try. Sorry about the delay. The patches don't seem to fix the issue for me. Please see the Xorg log attached. I tested the patches as follows. Given that my bisection had been done in a Fedora 24 guest, using xorg-x11-server-1.18.4-4.fc24 http://koji.fedoraproject.org/koji/buildinfo?buildID=794494 I now rebuilt the guest kernel exactly at the failing commit (a325725 "drm: Lobotomize set_busid nonsense for !pci drivers"), and first reproduced the issue with the above X server. Then, I ported your patches to "xorg-server-1.18.4" (using the upstream xserver tree), and rebuilt the Fedora package with the backport. For the backport, I had to cherry-pick the following two patches from master first: 1 ca8d88e50310 xfree86: recognize primary BUS_PCI device in xf86IsPrimaryPlatform() 2 ea91db4b8331 config: fix GPUDevice fail when AutoAddGPU off + BusID This way your patches applied cleanly. (Cherry pick #1 above is actually necessary for semantics, while cherry pick #2 is needed for a clean context only, and has no impact for this test.) That is, in total, I added the following four patches to the Fedora 24 package: 1 xfree86: recognize primary BUS_PCI device in xf86IsPrimaryPlatform() 2 config: fix GPUDevice fail when AutoAddGPU off + BusID 3 xfree86: Make adding unclaimed devices as GPU devices a separate step 4 xfree86: Try harder to find atleast 1 non GPU Screen You can find the scratch build that I used for testing here: xorg-x11-server-1.18.4-4.hans_bz1366842_2.fc24 http://koji.fedoraproject.org/koji/taskinfo?taskID=15875087 Another reason I used F24's X server as basis, rather than upstream HEAD, is that Fedora 24 is pretty young, and it's already on kernel 4.7.4, and I believe it will soon move to kernel 4.8, without (necessarily) rebasing its X server package to upstream. IOW the kernel upgrade to 4.8 will break X in Fedora 24 too, and then I expect the Fedora X maintainers would have to cherry pick those two patches as dependencies just the same. To summarize, the patches don't seem to help. I shall nonetheless thank you for spending your Friday on this! Laszlo
Hi, On 30-09-16 16:51, Laszlo Ersek wrote: > On 09/30/16 12:35, Hans de Goede wrote: > >> Attached are 2 patches against the xserver which should fix this, >> please give them a try. > > Sorry about the delay. > > The patches don't seem to fix the issue for me. Please see the Xorg log > attached. > > I tested the patches as follows. Given that my bisection had been done > in a Fedora 24 guest, using > > xorg-x11-server-1.18.4-4.fc24 > http://koji.fedoraproject.org/koji/buildinfo?buildID=794494 > > I now rebuilt the guest kernel exactly at the failing commit (a325725 > "drm: Lobotomize set_busid nonsense for !pci drivers"), and first > reproduced the issue with the above X server. > > Then, I ported your patches to "xorg-server-1.18.4" (using the upstream > xserver tree), and rebuilt the Fedora package with the backport. For the > backport, I had to cherry-pick the following two patches from master first: > > 1 ca8d88e50310 xfree86: recognize primary BUS_PCI device in > xf86IsPrimaryPlatform() > 2 ea91db4b8331 config: fix GPUDevice fail when AutoAddGPU off + BusID > > This way your patches applied cleanly. (Cherry pick #1 above is actually > necessary for semantics, while cherry pick #2 is needed for a clean > context only, and has no impact for this test.) > > That is, in total, I added the following four patches to the Fedora 24 > package: > > 1 xfree86: recognize primary BUS_PCI device in xf86IsPrimaryPlatform() > 2 config: fix GPUDevice fail when AutoAddGPU off + BusID > 3 xfree86: Make adding unclaimed devices as GPU devices a separate step > 4 xfree86: Try harder to find atleast 1 non GPU Screen > > You can find the scratch build that I used for testing here: > > xorg-x11-server-1.18.4-4.hans_bz1366842_2.fc24 > http://koji.fedoraproject.org/koji/taskinfo?taskID=15875087 > > Another reason I used F24's X server as basis, rather than upstream > HEAD, is that Fedora 24 is pretty young, and it's already on kernel > 4.7.4, and I believe it will soon move to kernel 4.8, without > (necessarily) rebasing its X server package to upstream. IOW the kernel > upgrade to 4.8 will break X in Fedora 24 too, and then I expect the > Fedora X maintainers would have to cherry pick those two patches as > dependencies just the same. > > To summarize, the patches don't seem to help. I shall nonetheless thank > you for spending your Friday on this! Hmm, do you have a xorg.conf file lying around somewhere, the message about the xserver not being able to find an entry for screen 0 does not make sense ... Anyways, Monday is another day we can look at this ... Regards, Hans
On 09/30/16 16:59, Hans de Goede wrote: > Hi, > > On 30-09-16 16:51, Laszlo Ersek wrote: >> On 09/30/16 12:35, Hans de Goede wrote: >> >>> Attached are 2 patches against the xserver which should fix this, >>> please give them a try. >> >> Sorry about the delay. >> >> The patches don't seem to fix the issue for me. Please see the Xorg log >> attached. >> >> I tested the patches as follows. Given that my bisection had been done >> in a Fedora 24 guest, using >> >> xorg-x11-server-1.18.4-4.fc24 >> http://koji.fedoraproject.org/koji/buildinfo?buildID=794494 >> >> I now rebuilt the guest kernel exactly at the failing commit (a325725 >> "drm: Lobotomize set_busid nonsense for !pci drivers"), and first >> reproduced the issue with the above X server. >> >> Then, I ported your patches to "xorg-server-1.18.4" (using the upstream >> xserver tree), and rebuilt the Fedora package with the backport. For the >> backport, I had to cherry-pick the following two patches from master >> first: >> >> 1 ca8d88e50310 xfree86: recognize primary BUS_PCI device in >> xf86IsPrimaryPlatform() >> 2 ea91db4b8331 config: fix GPUDevice fail when AutoAddGPU off + BusID >> >> This way your patches applied cleanly. (Cherry pick #1 above is actually >> necessary for semantics, while cherry pick #2 is needed for a clean >> context only, and has no impact for this test.) >> >> That is, in total, I added the following four patches to the Fedora 24 >> package: >> >> 1 xfree86: recognize primary BUS_PCI device in xf86IsPrimaryPlatform() >> 2 config: fix GPUDevice fail when AutoAddGPU off + BusID >> 3 xfree86: Make adding unclaimed devices as GPU devices a separate step >> 4 xfree86: Try harder to find atleast 1 non GPU Screen >> >> You can find the scratch build that I used for testing here: >> >> xorg-x11-server-1.18.4-4.hans_bz1366842_2.fc24 >> http://koji.fedoraproject.org/koji/taskinfo?taskID=15875087 >> >> Another reason I used F24's X server as basis, rather than upstream >> HEAD, is that Fedora 24 is pretty young, and it's already on kernel >> 4.7.4, and I believe it will soon move to kernel 4.8, without >> (necessarily) rebasing its X server package to upstream. IOW the kernel >> upgrade to 4.8 will break X in Fedora 24 too, and then I expect the >> Fedora X maintainers would have to cherry pick those two patches as >> dependencies just the same. >> >> To summarize, the patches don't seem to help. I shall nonetheless thank >> you for spending your Friday on this! > > Hmm, do you have a xorg.conf file lying around somewhere, the message > about the xserver not being able to find an entry for screen 0 does > not make sense ... Good catch, I actually had two files under "/etc/X11/xorg.conf.d/": * "00-keyboard.conf", from package "systemd-229-13.fc24.x86_64", with contents ------------ # Read and parsed by systemd-localed. It's probably wise not to edit this file # manually too freely. Section "InputClass" Identifier "system-keyboard" MatchIsKeyboard "on" Option "XkbLayout" "us" EndSection ------------ * "01-resolution.conf", which I had created, in order to set the preferred display resolution: ------------ Section "Screen" Identifier "Default Screen" Device "Default Device" Monitor "Default Monitor" EndSection Section "Device" Identifier "Default Device" Driver "modesetting" EndSection Section "Monitor" Identifier "Default Monitor" Option "PreferredMode" "640x480" # Option "PreferredMode" "1440x900" EndSection ------------ I removed these files now, and repeated the test. Again, the X server wouldn't start, but I think the log file looks a bit different now. Attached. > Anyways, Monday is another day we can look at this ... Sure; many thanks! Laszlo
Hi, On 30-09-16 17:33, Laszlo Ersek wrote: > On 09/30/16 16:59, Hans de Goede wrote: >> Hi, >> >> On 30-09-16 16:51, Laszlo Ersek wrote: >>> On 09/30/16 12:35, Hans de Goede wrote: >>> >>>> Attached are 2 patches against the xserver which should fix this, >>>> please give them a try. >>> >>> Sorry about the delay. >>> >>> The patches don't seem to fix the issue for me. Please see the Xorg log >>> attached. >>> >>> I tested the patches as follows. Given that my bisection had been done >>> in a Fedora 24 guest, using >>> >>> xorg-x11-server-1.18.4-4.fc24 >>> http://koji.fedoraproject.org/koji/buildinfo?buildID=794494 >>> >>> I now rebuilt the guest kernel exactly at the failing commit (a325725 >>> "drm: Lobotomize set_busid nonsense for !pci drivers"), and first >>> reproduced the issue with the above X server. >>> >>> Then, I ported your patches to "xorg-server-1.18.4" (using the upstream >>> xserver tree), and rebuilt the Fedora package with the backport. For the >>> backport, I had to cherry-pick the following two patches from master >>> first: >>> >>> 1 ca8d88e50310 xfree86: recognize primary BUS_PCI device in >>> xf86IsPrimaryPlatform() >>> 2 ea91db4b8331 config: fix GPUDevice fail when AutoAddGPU off + BusID >>> >>> This way your patches applied cleanly. (Cherry pick #1 above is actually >>> necessary for semantics, while cherry pick #2 is needed for a clean >>> context only, and has no impact for this test.) >>> >>> That is, in total, I added the following four patches to the Fedora 24 >>> package: >>> >>> 1 xfree86: recognize primary BUS_PCI device in xf86IsPrimaryPlatform() >>> 2 config: fix GPUDevice fail when AutoAddGPU off + BusID >>> 3 xfree86: Make adding unclaimed devices as GPU devices a separate step >>> 4 xfree86: Try harder to find atleast 1 non GPU Screen >>> >>> You can find the scratch build that I used for testing here: >>> >>> xorg-x11-server-1.18.4-4.hans_bz1366842_2.fc24 >>> http://koji.fedoraproject.org/koji/taskinfo?taskID=15875087 >>> >>> Another reason I used F24's X server as basis, rather than upstream >>> HEAD, is that Fedora 24 is pretty young, and it's already on kernel >>> 4.7.4, and I believe it will soon move to kernel 4.8, without >>> (necessarily) rebasing its X server package to upstream. IOW the kernel >>> upgrade to 4.8 will break X in Fedora 24 too, and then I expect the >>> Fedora X maintainers would have to cherry pick those two patches as >>> dependencies just the same. >>> >>> To summarize, the patches don't seem to help. I shall nonetheless thank >>> you for spending your Friday on this! >> >> Hmm, do you have a xorg.conf file lying around somewhere, the message >> about the xserver not being able to find an entry for screen 0 does >> not make sense ... > > Good catch, I actually had two files under "/etc/X11/xorg.conf.d/": > > * "00-keyboard.conf", from package "systemd-229-13.fc24.x86_64", with > contents > > ------------ > # Read and parsed by systemd-localed. It's probably wise not to edit > this file > # manually too freely. > Section "InputClass" > Identifier "system-keyboard" > MatchIsKeyboard "on" > Option "XkbLayout" "us" > EndSection > ------------ > > * "01-resolution.conf", which I had created, in order to set the > preferred display resolution: > > ------------ > Section "Screen" > Identifier "Default Screen" > Device "Default Device" > Monitor "Default Monitor" > EndSection > > Section "Device" > Identifier "Default Device" > Driver "modesetting" > EndSection > > Section "Monitor" > Identifier "Default Monitor" > Option "PreferredMode" "640x480" > # Option "PreferredMode" "1440x900" > EndSection > ------------ > > I removed these files now, and repeated the test. Again, the X server > wouldn't start, but I think the log file looks a bit different now. > Attached. Ah, ok so it seems that my initial analysis is wrong, the problem is not a re-occuring of the device getting identified as a GPU screen, libdrm sorta depends on bus-ids and the lack of one is causing the server to misbehave. I guess that even with a xorg.conf things will fail with the troublesome kernel version (might be worth trying). Emil's analysis seems to be spot on. This does not seem easily fixable in userspace / does seem like a real regression as it even breaks things when specifying the device through xorg.conf (I or so I believe) which is something which uses to work ... I made the mistake of thinking the kernel change was re-triggering the old problem Laszlo fixed, but that does not seem to be the case. Regards, Hans
On 09/30/16 18:38, Hans de Goede wrote: > Hi, > > On 30-09-16 17:33, Laszlo Ersek wrote: >> On 09/30/16 16:59, Hans de Goede wrote: >>> Hi, >>> >>> On 30-09-16 16:51, Laszlo Ersek wrote: >>>> On 09/30/16 12:35, Hans de Goede wrote: >>>> >>>>> Attached are 2 patches against the xserver which should fix this, >>>>> please give them a try. >>>> >>>> Sorry about the delay. >>>> >>>> The patches don't seem to fix the issue for me. Please see the Xorg log >>>> attached. >>>> >>>> I tested the patches as follows. Given that my bisection had been done >>>> in a Fedora 24 guest, using >>>> >>>> xorg-x11-server-1.18.4-4.fc24 >>>> http://koji.fedoraproject.org/koji/buildinfo?buildID=794494 >>>> >>>> I now rebuilt the guest kernel exactly at the failing commit (a325725 >>>> "drm: Lobotomize set_busid nonsense for !pci drivers"), and first >>>> reproduced the issue with the above X server. >>>> >>>> Then, I ported your patches to "xorg-server-1.18.4" (using the upstream >>>> xserver tree), and rebuilt the Fedora package with the backport. For >>>> the >>>> backport, I had to cherry-pick the following two patches from master >>>> first: >>>> >>>> 1 ca8d88e50310 xfree86: recognize primary BUS_PCI device in >>>> xf86IsPrimaryPlatform() >>>> 2 ea91db4b8331 config: fix GPUDevice fail when AutoAddGPU off + BusID >>>> >>>> This way your patches applied cleanly. (Cherry pick #1 above is >>>> actually >>>> necessary for semantics, while cherry pick #2 is needed for a clean >>>> context only, and has no impact for this test.) >>>> >>>> That is, in total, I added the following four patches to the Fedora 24 >>>> package: >>>> >>>> 1 xfree86: recognize primary BUS_PCI device in xf86IsPrimaryPlatform() >>>> 2 config: fix GPUDevice fail when AutoAddGPU off + BusID >>>> 3 xfree86: Make adding unclaimed devices as GPU devices a separate step >>>> 4 xfree86: Try harder to find atleast 1 non GPU Screen >>>> >>>> You can find the scratch build that I used for testing here: >>>> >>>> xorg-x11-server-1.18.4-4.hans_bz1366842_2.fc24 >>>> http://koji.fedoraproject.org/koji/taskinfo?taskID=15875087 >>>> >>>> Another reason I used F24's X server as basis, rather than upstream >>>> HEAD, is that Fedora 24 is pretty young, and it's already on kernel >>>> 4.7.4, and I believe it will soon move to kernel 4.8, without >>>> (necessarily) rebasing its X server package to upstream. IOW the kernel >>>> upgrade to 4.8 will break X in Fedora 24 too, and then I expect the >>>> Fedora X maintainers would have to cherry pick those two patches as >>>> dependencies just the same. >>>> >>>> To summarize, the patches don't seem to help. I shall nonetheless thank >>>> you for spending your Friday on this! >>> >>> Hmm, do you have a xorg.conf file lying around somewhere, the message >>> about the xserver not being able to find an entry for screen 0 does >>> not make sense ... >> >> Good catch, I actually had two files under "/etc/X11/xorg.conf.d/": >> >> * "00-keyboard.conf", from package "systemd-229-13.fc24.x86_64", with >> contents >> >> ------------ >> # Read and parsed by systemd-localed. It's probably wise not to edit >> this file >> # manually too freely. >> Section "InputClass" >> Identifier "system-keyboard" >> MatchIsKeyboard "on" >> Option "XkbLayout" "us" >> EndSection >> ------------ >> >> * "01-resolution.conf", which I had created, in order to set the >> preferred display resolution: >> >> ------------ >> Section "Screen" >> Identifier "Default Screen" >> Device "Default Device" >> Monitor "Default Monitor" >> EndSection >> >> Section "Device" >> Identifier "Default Device" >> Driver "modesetting" >> EndSection >> >> Section "Monitor" >> Identifier "Default Monitor" >> Option "PreferredMode" "640x480" >> # Option "PreferredMode" "1440x900" >> EndSection >> ------------ >> >> I removed these files now, and repeated the test. Again, the X server >> wouldn't start, but I think the log file looks a bit different now. >> Attached. > > Ah, ok so it seems that my initial analysis is wrong, the problem > is not a re-occuring of the device getting identified as a GPU screen, > libdrm sorta depends on bus-ids and the lack of one is causing the > server to misbehave. I guess that even with a xorg.conf things > will fail with the troublesome kernel version (might be worth > trying). > > Emil's analysis seems to be spot on. This does not seem easily > fixable in userspace / does seem like a real regression as it > even breaks things when specifying the device through xorg.conf > (I or so I believe) which is something which uses to work ... In order to check this hypothesis, I did the following: - I downgraded my xorg-x11-server installation to the most recent official F24 packages, that is, "1.18.4-4.fc24", - I kept the kernel that I built exactly at the regressive commit (a325725633c2) - I modified "01-resolution.conf" (see it above in the context) like this: ---- Section "Device" Identifier "Default Device" Driver "modesetting" BusID "PCI:00:02:0" <------------ new option added EndSection ---- where BusID matches the B/D/F of the virtio-vga device from "lspci". This setup (modulo the kernel of course) was known to work, but now the X server actually segfaults (apparently in the xf86PlatformDeviceCheckBusID() function). Please find the logfile attached. (NB: this is unrelated to upstream commit de9ce6757c2e -- which the pristine FC24 build lacks -- because I don't set AutoAddGPU to "off" -- it is left at its default "on" value.) Therefore, you are right. :) Thanks Laszlo > I made the mistake of thinking the kernel change was re-triggering > the old problem Laszlo fixed, but that does not seem to be the > case. > > Regards, > > Hans
On 30 September 2016 at 18:26, Laszlo Ersek <lersek@redhat.com> wrote: > On 09/30/16 18:38, Hans de Goede wrote: >> Hi, >> >> On 30-09-16 17:33, Laszlo Ersek wrote: >>> On 09/30/16 16:59, Hans de Goede wrote: >>>> Hi, >>>> >>>> On 30-09-16 16:51, Laszlo Ersek wrote: >>>>> On 09/30/16 12:35, Hans de Goede wrote: >>>>> >>>>>> Attached are 2 patches against the xserver which should fix this, >>>>>> please give them a try. >>>>> >>>>> Sorry about the delay. >>>>> >>>>> The patches don't seem to fix the issue for me. Please see the Xorg log >>>>> attached. >>>>> >>>>> I tested the patches as follows. Given that my bisection had been done >>>>> in a Fedora 24 guest, using >>>>> >>>>> xorg-x11-server-1.18.4-4.fc24 >>>>> http://koji.fedoraproject.org/koji/buildinfo?buildID=794494 >>>>> >>>>> I now rebuilt the guest kernel exactly at the failing commit (a325725 >>>>> "drm: Lobotomize set_busid nonsense for !pci drivers"), and first >>>>> reproduced the issue with the above X server. >>>>> >>>>> Then, I ported your patches to "xorg-server-1.18.4" (using the upstream >>>>> xserver tree), and rebuilt the Fedora package with the backport. For >>>>> the >>>>> backport, I had to cherry-pick the following two patches from master >>>>> first: >>>>> >>>>> 1 ca8d88e50310 xfree86: recognize primary BUS_PCI device in >>>>> xf86IsPrimaryPlatform() >>>>> 2 ea91db4b8331 config: fix GPUDevice fail when AutoAddGPU off + BusID >>>>> >>>>> This way your patches applied cleanly. (Cherry pick #1 above is >>>>> actually >>>>> necessary for semantics, while cherry pick #2 is needed for a clean >>>>> context only, and has no impact for this test.) >>>>> >>>>> That is, in total, I added the following four patches to the Fedora 24 >>>>> package: >>>>> >>>>> 1 xfree86: recognize primary BUS_PCI device in xf86IsPrimaryPlatform() >>>>> 2 config: fix GPUDevice fail when AutoAddGPU off + BusID >>>>> 3 xfree86: Make adding unclaimed devices as GPU devices a separate step >>>>> 4 xfree86: Try harder to find atleast 1 non GPU Screen >>>>> >>>>> You can find the scratch build that I used for testing here: >>>>> >>>>> xorg-x11-server-1.18.4-4.hans_bz1366842_2.fc24 >>>>> http://koji.fedoraproject.org/koji/taskinfo?taskID=15875087 >>>>> >>>>> Another reason I used F24's X server as basis, rather than upstream >>>>> HEAD, is that Fedora 24 is pretty young, and it's already on kernel >>>>> 4.7.4, and I believe it will soon move to kernel 4.8, without >>>>> (necessarily) rebasing its X server package to upstream. IOW the kernel >>>>> upgrade to 4.8 will break X in Fedora 24 too, and then I expect the >>>>> Fedora X maintainers would have to cherry pick those two patches as >>>>> dependencies just the same. >>>>> >>>>> To summarize, the patches don't seem to help. I shall nonetheless thank >>>>> you for spending your Friday on this! >>>> >>>> Hmm, do you have a xorg.conf file lying around somewhere, the message >>>> about the xserver not being able to find an entry for screen 0 does >>>> not make sense ... >>> >>> Good catch, I actually had two files under "/etc/X11/xorg.conf.d/": >>> >>> * "00-keyboard.conf", from package "systemd-229-13.fc24.x86_64", with >>> contents >>> >>> ------------ >>> # Read and parsed by systemd-localed. It's probably wise not to edit >>> this file >>> # manually too freely. >>> Section "InputClass" >>> Identifier "system-keyboard" >>> MatchIsKeyboard "on" >>> Option "XkbLayout" "us" >>> EndSection >>> ------------ >>> >>> * "01-resolution.conf", which I had created, in order to set the >>> preferred display resolution: >>> >>> ------------ >>> Section "Screen" >>> Identifier "Default Screen" >>> Device "Default Device" >>> Monitor "Default Monitor" >>> EndSection >>> >>> Section "Device" >>> Identifier "Default Device" >>> Driver "modesetting" >>> EndSection >>> >>> Section "Monitor" >>> Identifier "Default Monitor" >>> Option "PreferredMode" "640x480" >>> # Option "PreferredMode" "1440x900" >>> EndSection >>> ------------ >>> >>> I removed these files now, and repeated the test. Again, the X server >>> wouldn't start, but I think the log file looks a bit different now. >>> Attached. >> >> Ah, ok so it seems that my initial analysis is wrong, the problem >> is not a re-occuring of the device getting identified as a GPU screen, >> libdrm sorta depends on bus-ids and the lack of one is causing the >> server to misbehave. I guess that even with a xorg.conf things >> will fail with the troublesome kernel version (might be worth >> trying). >> >> Emil's analysis seems to be spot on. This does not seem easily >> fixable in userspace / does seem like a real regression as it >> even breaks things when specifying the device through xorg.conf >> (I or so I believe) which is something which uses to work ... > > In order to check this hypothesis, I did the following: > - I downgraded my xorg-x11-server installation to the most recent > official F24 packages, that is, "1.18.4-4.fc24", > - I kept the kernel that I built exactly at the regressive commit > (a325725633c2) > - I modified "01-resolution.conf" (see it above in the context) like this: > > ---- > Section "Device" > Identifier "Default Device" > Driver "modesetting" > BusID "PCI:00:02:0" <------------ new option added > EndSection > ---- > > where BusID matches the B/D/F of the virtio-vga device from "lspci". > > This setup (modulo the kernel of course) was known to work, but now the > X server actually segfaults (apparently in the > xf86PlatformDeviceCheckBusID() function). Please find the logfile attached. > > (NB: this is unrelated to upstream commit de9ce6757c2e -- which the > pristine FC24 build lacks -- because I don't set AutoAddGPU to "off" -- > it is left at its default "on" value.) > Where is this upstream commit again - it shows as unknown for the kernel, xserver and libdrm ? So my theory was a bit off - SetVersion is the one responsible to set the "BusID", as retrieved by drmGetBusID, regardless if drmOpen or open is used. Here's a bit of a brain dump from the other day: - The commit mentioned 'affects' the drmSetBusid/DRM_IOCTL_SET_UNIQUE userspace codepaths. - The latter itself is dri1/legacy (xserver hw/xfree86/dri/) which is not functional for platform devices. The latter of which seems to be the case for virt-gpu based on the kernel module. - The modesetting driver should/cannot reach the above xserver codepath That said, it seems that (at least some) userspace expects a PCI device despite the kernel module 'advertising' itself as platform one :-\ Going through the xserver layers is a bit inspiring I'm wondering if we can not get a strace before/after the xserver commit ca8d88e50310a0d440a127c22a0a383cc149f408 ? It will help us track things a lot quicker/easier. Thanks Emil
On 10/03/16 13:34, Emil Velikov wrote: > On 30 September 2016 at 18:26, Laszlo Ersek <lersek@redhat.com> wrote: >> This setup (modulo the kernel of course) was known to work, but now the >> X server actually segfaults (apparently in the >> xf86PlatformDeviceCheckBusID() function). Please find the logfile attached. >> >> (NB: this is unrelated to upstream commit de9ce6757c2e -- which the >> pristine FC24 build lacks -- because I don't set AutoAddGPU to "off" -- >> it is left at its default "on" value.) >> > Where is this upstream commit again - it shows as unknown for the > kernel, xserver and libdrm ? Apologies, that commit hash belongs to a cherry-picked patch on one of my private branches; the original is ea91db4b8331 config: fix GPUDevice fail when AutoAddGPU off + BusID > So my theory was a bit off - SetVersion is the one responsible to set > the "BusID", as retrieved by drmGetBusID, regardless if drmOpen or > open is used. > > Here's a bit of a brain dump from the other day: > > - The commit mentioned (Here I assume you mean kernel commit a325725633c2 drm: Lobotomize set_busid nonsense for !pci drivers again.) >'affects' the drmSetBusid/DRM_IOCTL_SET_UNIQUE > userspace codepaths. > - The latter itself is dri1/legacy (xserver hw/xfree86/dri/) which is > not functional for platform devices. > The latter of which seems to be the case for virt-gpu based on the > kernel module. > - The modesetting driver should/cannot reach the above xserver codepath > > That said, it seems that (at least some) userspace expects a PCI > device despite the kernel module 'advertising' itself as platform one > :-\ > > Going through the xserver layers is a bit inspiring I'm wondering if > we can not get a strace before/after the xserver commit > ca8d88e50310a0d440a127c22a0a383cc149f408 ? It will help us track > things a lot quicker/easier. This commit: ca8d88e50310 xfree86: recognize primary BUS_PCI device in xf86IsPrimaryPlatform() is not relevant when running Xorg on QEMU's "virtio-vga" device, it is only relevant when the "virtio-gpu-pci" device is used. Aarch64 guests can't have "virtio-vga", only "virgio-gpu-pci", so that commit was needed for Aarch64 guests primarily. In this case however, kernel commit a325725633c2 breaks "virtio-vga" in x86_64 guests, both with and without xserver commit ca8d88e50310. Anyway I'll look into capturing a strace and/or following Xorg with gdb. Thanks! Laszlo
On 3 October 2016 at 13:14, Laszlo Ersek <lersek@redhat.com> wrote: > On 10/03/16 13:34, Emil Velikov wrote: >> On 30 September 2016 at 18:26, Laszlo Ersek <lersek@redhat.com> wrote: > >>> This setup (modulo the kernel of course) was known to work, but now the >>> X server actually segfaults (apparently in the >>> xf86PlatformDeviceCheckBusID() function). Please find the logfile attached. >>> >>> (NB: this is unrelated to upstream commit de9ce6757c2e -- which the >>> pristine FC24 build lacks -- because I don't set AutoAddGPU to "off" -- >>> it is left at its default "on" value.) >>> >> Where is this upstream commit again - it shows as unknown for the >> kernel, xserver and libdrm ? > > Apologies, that commit hash belongs to a cherry-picked patch on one of > my private branches; the original is > > ea91db4b8331 config: fix GPUDevice fail when AutoAddGPU off + BusID > > >> So my theory was a bit off - SetVersion is the one responsible to set >> the "BusID", as retrieved by drmGetBusID, regardless if drmOpen or >> open is used. >> >> Here's a bit of a brain dump from the other day: >> >> - The commit mentioned > > (Here I assume you mean kernel commit > > a325725633c2 drm: Lobotomize set_busid nonsense for !pci drivers > > again.) > Correct. >>'affects' the drmSetBusid/DRM_IOCTL_SET_UNIQUE >> userspace codepaths. >> - The latter itself is dri1/legacy (xserver hw/xfree86/dri/) which is >> not functional for platform devices. >> The latter of which seems to be the case for virt-gpu based on the >> kernel module. >> - The modesetting driver should/cannot reach the above xserver codepath >> >> That said, it seems that (at least some) userspace expects a PCI >> device despite the kernel module 'advertising' itself as platform one >> :-\ >> >> Going through the xserver layers is a bit inspiring I'm wondering if >> we can not get a strace before/after the xserver commit >> ca8d88e50310a0d440a127c22a0a383cc149f408 ? It will help us track >> things a lot quicker/easier. > > This commit: > > ca8d88e50310 xfree86: recognize primary BUS_PCI device in > xf86IsPrimaryPlatform() > > is not relevant when running Xorg on QEMU's "virtio-vga" device, it is > only relevant when the "virtio-gpu-pci" device is used. Aarch64 guests > can't have "virtio-vga", only "virgio-gpu-pci", so that commit was > needed for Aarch64 guests primarily. > > In this case however, kernel commit a325725633c2 breaks "virtio-vga" in > x86_64 guests, both with and without xserver commit ca8d88e50310. > > Anyway I'll look into capturing a strace and/or following Xorg with gdb. > If the overall case/issue is irrelevant of the xserver commit we can ignore if for the time being. In that case can you please get a xserver strace before/after the offending kernel commit. Thank you. Emil
On 10/03/16 14:46, Emil Velikov wrote: > On 3 October 2016 at 13:14, Laszlo Ersek <lersek@redhat.com> wrote: >> On 10/03/16 13:34, Emil Velikov wrote: >>> On 30 September 2016 at 18:26, Laszlo Ersek <lersek@redhat.com> wrote: >> >>>> This setup (modulo the kernel of course) was known to work, but now the >>>> X server actually segfaults (apparently in the >>>> xf86PlatformDeviceCheckBusID() function). Please find the logfile attached. >>>> >>>> (NB: this is unrelated to upstream commit de9ce6757c2e -- which the >>>> pristine FC24 build lacks -- because I don't set AutoAddGPU to "off" -- >>>> it is left at its default "on" value.) >>>> >>> Where is this upstream commit again - it shows as unknown for the >>> kernel, xserver and libdrm ? >> >> Apologies, that commit hash belongs to a cherry-picked patch on one of >> my private branches; the original is >> >> ea91db4b8331 config: fix GPUDevice fail when AutoAddGPU off + BusID >> >> >>> So my theory was a bit off - SetVersion is the one responsible to set >>> the "BusID", as retrieved by drmGetBusID, regardless if drmOpen or >>> open is used. >>> >>> Here's a bit of a brain dump from the other day: >>> >>> - The commit mentioned >> >> (Here I assume you mean kernel commit >> >> a325725633c2 drm: Lobotomize set_busid nonsense for !pci drivers >> >> again.) >> > Correct. > >>> 'affects' the drmSetBusid/DRM_IOCTL_SET_UNIQUE >>> userspace codepaths. >>> - The latter itself is dri1/legacy (xserver hw/xfree86/dri/) which is >>> not functional for platform devices. >>> The latter of which seems to be the case for virt-gpu based on the >>> kernel module. >>> - The modesetting driver should/cannot reach the above xserver codepath >>> >>> That said, it seems that (at least some) userspace expects a PCI >>> device despite the kernel module 'advertising' itself as platform one >>> :-\ >>> >>> Going through the xserver layers is a bit inspiring I'm wondering if >>> we can not get a strace before/after the xserver commit >>> ca8d88e50310a0d440a127c22a0a383cc149f408 ? It will help us track >>> things a lot quicker/easier. >> >> This commit: >> >> ca8d88e50310 xfree86: recognize primary BUS_PCI device in >> xf86IsPrimaryPlatform() >> >> is not relevant when running Xorg on QEMU's "virtio-vga" device, it is >> only relevant when the "virtio-gpu-pci" device is used. Aarch64 guests >> can't have "virtio-vga", only "virgio-gpu-pci", so that commit was >> needed for Aarch64 guests primarily. >> >> In this case however, kernel commit a325725633c2 breaks "virtio-vga" in >> x86_64 guests, both with and without xserver commit ca8d88e50310. >> >> Anyway I'll look into capturing a strace and/or following Xorg with gdb. >> > If the overall case/issue is irrelevant of the xserver commit we can > ignore if for the time being. > In that case can you please get a xserver strace before/after the > offending kernel commit. Please find them at <http://people.redhat.com/~lersek/set_busid/>. Thanks! Laszlo
On 10/03/16 13:34, Emil Velikov wrote: > On 30 September 2016 at 18:26, Laszlo Ersek <lersek@redhat.com> wrote: >> On 09/30/16 18:38, Hans de Goede wrote: >>> Hi, >>> >>> On 30-09-16 17:33, Laszlo Ersek wrote: >>>> On 09/30/16 16:59, Hans de Goede wrote: >>>>> Hi, >>>>> >>>>> On 30-09-16 16:51, Laszlo Ersek wrote: >>>>>> On 09/30/16 12:35, Hans de Goede wrote: >>>>>> >>>>>>> Attached are 2 patches against the xserver which should fix this, >>>>>>> please give them a try. >>>>>> >>>>>> Sorry about the delay. >>>>>> >>>>>> The patches don't seem to fix the issue for me. Please see the Xorg log >>>>>> attached. >>>>>> >>>>>> I tested the patches as follows. Given that my bisection had been done >>>>>> in a Fedora 24 guest, using >>>>>> >>>>>> xorg-x11-server-1.18.4-4.fc24 >>>>>> http://koji.fedoraproject.org/koji/buildinfo?buildID=794494 >>>>>> >>>>>> I now rebuilt the guest kernel exactly at the failing commit (a325725 >>>>>> "drm: Lobotomize set_busid nonsense for !pci drivers"), and first >>>>>> reproduced the issue with the above X server. >>>>>> >>>>>> Then, I ported your patches to "xorg-server-1.18.4" (using the upstream >>>>>> xserver tree), and rebuilt the Fedora package with the backport. For >>>>>> the >>>>>> backport, I had to cherry-pick the following two patches from master >>>>>> first: >>>>>> >>>>>> 1 ca8d88e50310 xfree86: recognize primary BUS_PCI device in >>>>>> xf86IsPrimaryPlatform() >>>>>> 2 ea91db4b8331 config: fix GPUDevice fail when AutoAddGPU off + BusID >>>>>> >>>>>> This way your patches applied cleanly. (Cherry pick #1 above is >>>>>> actually >>>>>> necessary for semantics, while cherry pick #2 is needed for a clean >>>>>> context only, and has no impact for this test.) >>>>>> >>>>>> That is, in total, I added the following four patches to the Fedora 24 >>>>>> package: >>>>>> >>>>>> 1 xfree86: recognize primary BUS_PCI device in xf86IsPrimaryPlatform() >>>>>> 2 config: fix GPUDevice fail when AutoAddGPU off + BusID >>>>>> 3 xfree86: Make adding unclaimed devices as GPU devices a separate step >>>>>> 4 xfree86: Try harder to find atleast 1 non GPU Screen >>>>>> >>>>>> You can find the scratch build that I used for testing here: >>>>>> >>>>>> xorg-x11-server-1.18.4-4.hans_bz1366842_2.fc24 >>>>>> http://koji.fedoraproject.org/koji/taskinfo?taskID=15875087 >>>>>> >>>>>> Another reason I used F24's X server as basis, rather than upstream >>>>>> HEAD, is that Fedora 24 is pretty young, and it's already on kernel >>>>>> 4.7.4, and I believe it will soon move to kernel 4.8, without >>>>>> (necessarily) rebasing its X server package to upstream. IOW the kernel >>>>>> upgrade to 4.8 will break X in Fedora 24 too, and then I expect the >>>>>> Fedora X maintainers would have to cherry pick those two patches as >>>>>> dependencies just the same. >>>>>> >>>>>> To summarize, the patches don't seem to help. I shall nonetheless thank >>>>>> you for spending your Friday on this! >>>>> >>>>> Hmm, do you have a xorg.conf file lying around somewhere, the message >>>>> about the xserver not being able to find an entry for screen 0 does >>>>> not make sense ... >>>> >>>> Good catch, I actually had two files under "/etc/X11/xorg.conf.d/": >>>> >>>> * "00-keyboard.conf", from package "systemd-229-13.fc24.x86_64", with >>>> contents >>>> >>>> ------------ >>>> # Read and parsed by systemd-localed. It's probably wise not to edit >>>> this file >>>> # manually too freely. >>>> Section "InputClass" >>>> Identifier "system-keyboard" >>>> MatchIsKeyboard "on" >>>> Option "XkbLayout" "us" >>>> EndSection >>>> ------------ >>>> >>>> * "01-resolution.conf", which I had created, in order to set the >>>> preferred display resolution: >>>> >>>> ------------ >>>> Section "Screen" >>>> Identifier "Default Screen" >>>> Device "Default Device" >>>> Monitor "Default Monitor" >>>> EndSection >>>> >>>> Section "Device" >>>> Identifier "Default Device" >>>> Driver "modesetting" >>>> EndSection >>>> >>>> Section "Monitor" >>>> Identifier "Default Monitor" >>>> Option "PreferredMode" "640x480" >>>> # Option "PreferredMode" "1440x900" >>>> EndSection >>>> ------------ >>>> >>>> I removed these files now, and repeated the test. Again, the X server >>>> wouldn't start, but I think the log file looks a bit different now. >>>> Attached. >>> >>> Ah, ok so it seems that my initial analysis is wrong, the problem >>> is not a re-occuring of the device getting identified as a GPU screen, >>> libdrm sorta depends on bus-ids and the lack of one is causing the >>> server to misbehave. I guess that even with a xorg.conf things >>> will fail with the troublesome kernel version (might be worth >>> trying). >>> >>> Emil's analysis seems to be spot on. This does not seem easily >>> fixable in userspace / does seem like a real regression as it >>> even breaks things when specifying the device through xorg.conf >>> (I or so I believe) which is something which uses to work ... >> >> In order to check this hypothesis, I did the following: >> - I downgraded my xorg-x11-server installation to the most recent >> official F24 packages, that is, "1.18.4-4.fc24", >> - I kept the kernel that I built exactly at the regressive commit >> (a325725633c2) >> - I modified "01-resolution.conf" (see it above in the context) like this: >> >> ---- >> Section "Device" >> Identifier "Default Device" >> Driver "modesetting" >> BusID "PCI:00:02:0" <------------ new option added >> EndSection >> ---- >> >> where BusID matches the B/D/F of the virtio-vga device from "lspci". >> >> This setup (modulo the kernel of course) was known to work, but now the >> X server actually segfaults (apparently in the >> xf86PlatformDeviceCheckBusID() function). Please find the logfile attached. >> >> (NB: this is unrelated to upstream commit de9ce6757c2e -- which the >> pristine FC24 build lacks -- because I don't set AutoAddGPU to "off" -- >> it is left at its default "on" value.) >> > Where is this upstream commit again - it shows as unknown for the > kernel, xserver and libdrm ? > > So my theory was a bit off - SetVersion is the one responsible to set > the "BusID", as retrieved by drmGetBusID, regardless if drmOpen or > open is used. > > Here's a bit of a brain dump from the other day: > > - The commit mentioned 'affects' the drmSetBusid/DRM_IOCTL_SET_UNIQUE > userspace codepaths. > - The latter itself is dri1/legacy (xserver hw/xfree86/dri/) which is > not functional for platform devices. > The latter of which seems to be the case for virt-gpu based on the > kernel module. > - The modesetting driver should/cannot reach the above xserver codepath > > That said, it seems that (at least some) userspace expects a PCI > device despite the kernel module 'advertising' itself as platform one > :-\ With kernel commit a325725633c2 applied, the drmGetBusid() call in get_drm_info() [hw/xfree86/os-support/linux/lnx_platform.c] returns "virtio0". Without kernel commit a325725633c2 in place, the same function call produces "pci:0000:00:02.0". I think that the idea of kernel commit a325725633c2: We already have a fallback in place to fill out the unique from dev->unique, which is set to something reasonable in drm_dev_alloc. is inappropriate for virtio devices. While it is true that "virtioN" is unique across the guest system, those identifiers are not stable. # ls -1d /sys/devices/pci*/*/virtio* /sys/devices/pci0000:00/0000:00:02.0/virtio0 /sys/devices/pci0000:00/0000:00:03.0/virtio1 /sys/devices/pci0000:00/0000:00:05.0/virtio2 /sys/devices/pci0000:00/0000:00:06.0/virtio3 /sys/devices/pci0000:00/0000:00:09.0/virtio4 If you tweak the PCI addresses of other virtio devices on the QEMU command line, while keeping the PCI address of the virtio-vga device intact, the "virtioN" identifier of the virtio-vga device may change in an unspecified / unexpected manner. From the above listing, it seems like higher PCI Device numbers happen to end up with higher "virtioN" identifiers. While this is unspecified / non-guaranteed behavior, in practice it means the following: Moving the "virtio1" device from PCI address 00:03.0 to 00:01.0, while preserving the PCI address of virtio-vga as 00:02.0, will bump virtio-vga to "virtio1" from "virtio0" (and sink that other device from "virtio1" to "virtio0"). I think that unstable identifiers don't qualify for BusID use, regardless of the actual format of the IDs in question. Searching the patch quickly, drm_virtio_set_busid() is the only implementation of the "drm_driver.set_busid" callback that delegates the task to drm_pci_set_busid(). All other implementations of this callback were provided by drm_platform_set_busid(). drm_platform_set_busid() would ultimately format "dev->platformdev->name" and "dev->platformdev->id" into the bus identifier. Replacing that common logic with the drm_dev_alloc() fallback that is already in place is probably justified: judging from "virtio0", the fallback likely captures the exact same information, just with a different format. However, drm_virtio_set_busid() used to implement a different logic (encoding different information). It intentionally used stable PCI addresses rather than (platformdev->name, platformdev->id). So, I think that removing drm_virtio_set_busid() was an actual regression. I propose that the related hunks of a325725633c2 be reverted. For virtio-vga and virtio-gpu-pci, "virtioN" is an unstable and inappropraite BusID pattern. Thanks Laszlo
On 10/03/16 16:15, Laszlo Ersek wrote: > On 10/03/16 13:34, Emil Velikov wrote: >> On 30 September 2016 at 18:26, Laszlo Ersek <lersek@redhat.com> wrote: >>> On 09/30/16 18:38, Hans de Goede wrote: >>>> Hi, >>>> >>>> On 30-09-16 17:33, Laszlo Ersek wrote: >>>>> On 09/30/16 16:59, Hans de Goede wrote: >>>>>> Hi, >>>>>> >>>>>> On 30-09-16 16:51, Laszlo Ersek wrote: >>>>>>> On 09/30/16 12:35, Hans de Goede wrote: >>>>>>> >>>>>>>> Attached are 2 patches against the xserver which should fix this, >>>>>>>> please give them a try. >>>>>>> >>>>>>> Sorry about the delay. >>>>>>> >>>>>>> The patches don't seem to fix the issue for me. Please see the Xorg log >>>>>>> attached. >>>>>>> >>>>>>> I tested the patches as follows. Given that my bisection had been done >>>>>>> in a Fedora 24 guest, using >>>>>>> >>>>>>> xorg-x11-server-1.18.4-4.fc24 >>>>>>> http://koji.fedoraproject.org/koji/buildinfo?buildID=794494 >>>>>>> >>>>>>> I now rebuilt the guest kernel exactly at the failing commit (a325725 >>>>>>> "drm: Lobotomize set_busid nonsense for !pci drivers"), and first >>>>>>> reproduced the issue with the above X server. >>>>>>> >>>>>>> Then, I ported your patches to "xorg-server-1.18.4" (using the upstream >>>>>>> xserver tree), and rebuilt the Fedora package with the backport. For >>>>>>> the >>>>>>> backport, I had to cherry-pick the following two patches from master >>>>>>> first: >>>>>>> >>>>>>> 1 ca8d88e50310 xfree86: recognize primary BUS_PCI device in >>>>>>> xf86IsPrimaryPlatform() >>>>>>> 2 ea91db4b8331 config: fix GPUDevice fail when AutoAddGPU off + BusID >>>>>>> >>>>>>> This way your patches applied cleanly. (Cherry pick #1 above is >>>>>>> actually >>>>>>> necessary for semantics, while cherry pick #2 is needed for a clean >>>>>>> context only, and has no impact for this test.) >>>>>>> >>>>>>> That is, in total, I added the following four patches to the Fedora 24 >>>>>>> package: >>>>>>> >>>>>>> 1 xfree86: recognize primary BUS_PCI device in xf86IsPrimaryPlatform() >>>>>>> 2 config: fix GPUDevice fail when AutoAddGPU off + BusID >>>>>>> 3 xfree86: Make adding unclaimed devices as GPU devices a separate step >>>>>>> 4 xfree86: Try harder to find atleast 1 non GPU Screen >>>>>>> >>>>>>> You can find the scratch build that I used for testing here: >>>>>>> >>>>>>> xorg-x11-server-1.18.4-4.hans_bz1366842_2.fc24 >>>>>>> http://koji.fedoraproject.org/koji/taskinfo?taskID=15875087 >>>>>>> >>>>>>> Another reason I used F24's X server as basis, rather than upstream >>>>>>> HEAD, is that Fedora 24 is pretty young, and it's already on kernel >>>>>>> 4.7.4, and I believe it will soon move to kernel 4.8, without >>>>>>> (necessarily) rebasing its X server package to upstream. IOW the kernel >>>>>>> upgrade to 4.8 will break X in Fedora 24 too, and then I expect the >>>>>>> Fedora X maintainers would have to cherry pick those two patches as >>>>>>> dependencies just the same. >>>>>>> >>>>>>> To summarize, the patches don't seem to help. I shall nonetheless thank >>>>>>> you for spending your Friday on this! >>>>>> >>>>>> Hmm, do you have a xorg.conf file lying around somewhere, the message >>>>>> about the xserver not being able to find an entry for screen 0 does >>>>>> not make sense ... >>>>> >>>>> Good catch, I actually had two files under "/etc/X11/xorg.conf.d/": >>>>> >>>>> * "00-keyboard.conf", from package "systemd-229-13.fc24.x86_64", with >>>>> contents >>>>> >>>>> ------------ >>>>> # Read and parsed by systemd-localed. It's probably wise not to edit >>>>> this file >>>>> # manually too freely. >>>>> Section "InputClass" >>>>> Identifier "system-keyboard" >>>>> MatchIsKeyboard "on" >>>>> Option "XkbLayout" "us" >>>>> EndSection >>>>> ------------ >>>>> >>>>> * "01-resolution.conf", which I had created, in order to set the >>>>> preferred display resolution: >>>>> >>>>> ------------ >>>>> Section "Screen" >>>>> Identifier "Default Screen" >>>>> Device "Default Device" >>>>> Monitor "Default Monitor" >>>>> EndSection >>>>> >>>>> Section "Device" >>>>> Identifier "Default Device" >>>>> Driver "modesetting" >>>>> EndSection >>>>> >>>>> Section "Monitor" >>>>> Identifier "Default Monitor" >>>>> Option "PreferredMode" "640x480" >>>>> # Option "PreferredMode" "1440x900" >>>>> EndSection >>>>> ------------ >>>>> >>>>> I removed these files now, and repeated the test. Again, the X server >>>>> wouldn't start, but I think the log file looks a bit different now. >>>>> Attached. >>>> >>>> Ah, ok so it seems that my initial analysis is wrong, the problem >>>> is not a re-occuring of the device getting identified as a GPU screen, >>>> libdrm sorta depends on bus-ids and the lack of one is causing the >>>> server to misbehave. I guess that even with a xorg.conf things >>>> will fail with the troublesome kernel version (might be worth >>>> trying). >>>> >>>> Emil's analysis seems to be spot on. This does not seem easily >>>> fixable in userspace / does seem like a real regression as it >>>> even breaks things when specifying the device through xorg.conf >>>> (I or so I believe) which is something which uses to work ... >>> >>> In order to check this hypothesis, I did the following: >>> - I downgraded my xorg-x11-server installation to the most recent >>> official F24 packages, that is, "1.18.4-4.fc24", >>> - I kept the kernel that I built exactly at the regressive commit >>> (a325725633c2) >>> - I modified "01-resolution.conf" (see it above in the context) like this: >>> >>> ---- >>> Section "Device" >>> Identifier "Default Device" >>> Driver "modesetting" >>> BusID "PCI:00:02:0" <------------ new option added >>> EndSection >>> ---- >>> >>> where BusID matches the B/D/F of the virtio-vga device from "lspci". >>> >>> This setup (modulo the kernel of course) was known to work, but now the >>> X server actually segfaults (apparently in the >>> xf86PlatformDeviceCheckBusID() function). Please find the logfile attached. >>> >>> (NB: this is unrelated to upstream commit de9ce6757c2e -- which the >>> pristine FC24 build lacks -- because I don't set AutoAddGPU to "off" -- >>> it is left at its default "on" value.) >>> >> Where is this upstream commit again - it shows as unknown for the >> kernel, xserver and libdrm ? >> >> So my theory was a bit off - SetVersion is the one responsible to set >> the "BusID", as retrieved by drmGetBusID, regardless if drmOpen or >> open is used. >> >> Here's a bit of a brain dump from the other day: >> >> - The commit mentioned 'affects' the drmSetBusid/DRM_IOCTL_SET_UNIQUE >> userspace codepaths. >> - The latter itself is dri1/legacy (xserver hw/xfree86/dri/) which is >> not functional for platform devices. >> The latter of which seems to be the case for virt-gpu based on the >> kernel module. >> - The modesetting driver should/cannot reach the above xserver codepath >> >> That said, it seems that (at least some) userspace expects a PCI >> device despite the kernel module 'advertising' itself as platform one >> :-\ > > With kernel commit a325725633c2 applied, the drmGetBusid() call in > get_drm_info() [hw/xfree86/os-support/linux/lnx_platform.c] returns > "virtio0". > > Without kernel commit a325725633c2 in place, the same function call > produces "pci:0000:00:02.0". > > I think that the idea of kernel commit a325725633c2: > > We already have a fallback in place to fill out the unique from > dev->unique, which is set to something reasonable in drm_dev_alloc. > > is inappropriate for virtio devices. > > While it is true that "virtioN" is unique across the guest system, those > identifiers are not stable. > > # ls -1d /sys/devices/pci*/*/virtio* > /sys/devices/pci0000:00/0000:00:02.0/virtio0 > /sys/devices/pci0000:00/0000:00:03.0/virtio1 > /sys/devices/pci0000:00/0000:00:05.0/virtio2 > /sys/devices/pci0000:00/0000:00:06.0/virtio3 > /sys/devices/pci0000:00/0000:00:09.0/virtio4 > > If you tweak the PCI addresses of other virtio devices on the QEMU > command line, while keeping the PCI address of the virtio-vga device > intact, the "virtioN" identifier of the virtio-vga device may change in > an unspecified / unexpected manner. > > From the above listing, it seems like higher PCI Device numbers happen > to end up with higher "virtioN" identifiers. While this is unspecified / > non-guaranteed behavior, in practice it means the following: > > Moving the "virtio1" device from PCI address 00:03.0 to 00:01.0, while > preserving the PCI address of virtio-vga as 00:02.0, will bump > virtio-vga to "virtio1" from "virtio0" (and sink that other device from > "virtio1" to "virtio0"). > > I think that unstable identifiers don't qualify for BusID use, > regardless of the actual format of the IDs in question. > > Searching the patch quickly, drm_virtio_set_busid() is the only > implementation of the "drm_driver.set_busid" callback that delegates the > task to drm_pci_set_busid(). All other implementations of this callback > were provided by drm_platform_set_busid(). > > drm_platform_set_busid() would ultimately format > "dev->platformdev->name" and "dev->platformdev->id" into the bus > identifier. Replacing that common logic with the drm_dev_alloc() > fallback that is already in place is probably justified: judging from > "virtio0", the fallback likely captures the exact same information, just > with a different format. > > However, drm_virtio_set_busid() used to implement a different logic > (encoding different information). It intentionally used stable PCI > addresses rather than (platformdev->name, platformdev->id). > > So, I think that removing drm_virtio_set_busid() was an actual > regression. I propose that the related hunks of a325725633c2 be > reverted. For virtio-vga and virtio-gpu-pci, "virtioN" is an unstable > and inappropraite BusID pattern. In support of my argument, please see the following remark in commit message: v4: Drop accidental amdgpu hunk (Emil). Now, if you look up the v3 posting for that amdgpu hunk: https://lists.freedesktop.org/archives/dri-devel/2016-June/111200.html https://lists.freedesktop.org/archives/dri-devel/2016-June/111486.html the v3 patch accidentally removed the similarly customized set_busid callback for amdgpu -- drm_pci_set_busid(). Emil caught that error in review, hence the v4 patch wouldn't contain the same error. My argument is exactly the same -- just as the amdgpu hunk shouldn't be in the patch, the drm_virtio_set_busid() hunk shouldn't be there either. Both amdgpu and virtio-gpu implement a set_busid callback -- by delegating to drm_pci_set_busid() -- that *cannot* be replaced with the fallback in drm_dev_alloc(). Please revert the drm_virtio_set_busid() part of kernel commit a325725633c2; at this point I'm 95% sure it was an oversight that didn't get caught in review. The rest of kernel commit a325725633c2 looks sane to me; while it changes the formatting of the busid for the affected platform devices, relying on the fallback in those cases actually preserves the same information content. (This is not true for amdgpu and virtio-gpu.) Thanks Laszlo
On 3 October 2016 at 15:27, Laszlo Ersek <lersek@redhat.com> wrote: [snip] > the v3 patch accidentally removed the similarly customized set_busid > callback for amdgpu -- drm_pci_set_busid(). Emil caught that error in > review, hence the v4 patch wouldn't contain the same error. > You're spot on - virtio-gpu doesn't use drm_*platform*_set_busid, despite that I've misread it as such for over a dozen times. So yes, that hunk should not have been removed. Lacking a git checkout on this system, so if anyone beats me to it and sends a patch that'll be great. Emil
On 10/03/16 17:00, Emil Velikov wrote: > On 3 October 2016 at 15:27, Laszlo Ersek <lersek@redhat.com> wrote: > [snip] >> the v3 patch accidentally removed the similarly customized set_busid >> callback for amdgpu -- drm_pci_set_busid(). Emil caught that error in >> review, hence the v4 patch wouldn't contain the same error. >> > You're spot on - virtio-gpu doesn't use drm_*platform*_set_busid, > despite that I've misread it as such for over a dozen times. So yes, > that hunk should not have been removed. > > Lacking a git checkout on this system, so if anyone beats me to it and > sends a patch that'll be great. I'll send a patch ASAP. Thank you Laszlo
From 3241fcf1636e1b1cb8e0fdd698a2d2e00d1bbcc0 Mon Sep 17 00:00:00 2001 From: Hans de Goede <hdegoede@redhat.com> Date: Fri, 30 Sep 2016 12:29:09 +0200 Subject: [PATCH 2/2] xfree86: Try harder to find atleast 1 non GPU Screen If we did not find any non GPU Screens, try again ignoring the notion of any video devices being the primary device. This fixes Xorg exiting with a "no screens found" error when using virtio-vga in a virtual-machine and when using a device driven by simpledrm. This is a somewhat ugly solution, but it is the best I can come up with without major surgery to the bus and probe code. Signed-off-by: Hans de Goede <hdegoede@redhat.com> --- hw/xfree86/common/xf86.h | 1 + hw/xfree86/common/xf86Bus.c | 26 +++++++++++++++++++++++--- hw/xfree86/common/xf86Globals.c | 1 + hw/xfree86/common/xf86pciBus.c | 4 ++++ hw/xfree86/common/xf86platformBus.c | 4 ++++ 5 files changed, 33 insertions(+), 3 deletions(-) diff --git a/hw/xfree86/common/xf86.h b/hw/xfree86/common/xf86.h index e54c811..f724688 100644 --- a/hw/xfree86/common/xf86.h +++ b/hw/xfree86/common/xf86.h @@ -55,6 +55,7 @@ extern _X_EXPORT int xf86DoConfigure; extern _X_EXPORT int xf86DoShowOptions; extern _X_EXPORT Bool xf86DoConfigurePass1; +extern _X_EXPORT Bool xf86ProbeIgnorePrimary; extern _X_EXPORT Bool xorgHWAccess; extern _X_EXPORT DevPrivateKeyRec xf86ScreenKeyRec; diff --git a/hw/xfree86/common/xf86Bus.c b/hw/xfree86/common/xf86Bus.c index a3a9898..9836803 100644 --- a/hw/xfree86/common/xf86Bus.c +++ b/hw/xfree86/common/xf86Bus.c @@ -117,14 +117,34 @@ xf86BusConfig(void) int i, j; /* - * Now call each of the Probe functions. Each successful probe will - * result in an extra entry added to the xf86Screens[] list for each - * instance of the hardware found. + * 3 step probe to (hopefully) ensure that we always find at least 1 + * (non GPU) screen: + * + * 1. Call each drivers probe function normally, + * Each successful probe will result in an extra entry added to the + * xf86Screens[] list for each instance of the hardware found. */ for (i = 0; i < xf86NumDrivers; i++) { xf86CallDriverProbe(xf86DriverList[i], FALSE); } + /* + * 2. If no Screens were found, call each drivers probe function with + * ignorePrimary = TRUE, to ensure that we do actually get a + * Screen if there is atleast one supported video card. + */ + if (xf86NumScreens == 0) { + xf86ProbeIgnorePrimary = TRUE; + for (i = 0; i < xf86NumDrivers && xf86NumScreens == 0; i++) { + xf86CallDriverProbe(xf86DriverList[i], FALSE); + } + xf86ProbeIgnorePrimary = FALSE; + } + + /* + * 3. Call xf86platformAddGPUDevices() to add any additional video cards as + * GPUScreens (GPUScreens are only supported by platformBus drivers). + */ for (i = 0; i < xf86NumDrivers; i++) { xf86platformAddGPUDevices(xf86DriverList[i]); } diff --git a/hw/xfree86/common/xf86Globals.c b/hw/xfree86/common/xf86Globals.c index 072c3fc..0d1e31b 100644 --- a/hw/xfree86/common/xf86Globals.c +++ b/hw/xfree86/common/xf86Globals.c @@ -153,6 +153,7 @@ XF86ConfigPtr xf86configptr = NULL; Bool xf86Resetting = FALSE; Bool xf86Initialising = FALSE; Bool xf86DoConfigure = FALSE; +Bool xf86ProbeIgnorePrimary = FALSE; Bool xf86DoShowOptions = FALSE; DriverPtr *xf86DriverList = NULL; int xf86NumDrivers = 0; diff --git a/hw/xfree86/common/xf86pciBus.c b/hw/xfree86/common/xf86pciBus.c index 8158c2b..9adfee5 100644 --- a/hw/xfree86/common/xf86pciBus.c +++ b/hw/xfree86/common/xf86pciBus.c @@ -352,6 +352,10 @@ xf86ComparePciBusString(const char *busID, int bus, int device, int func) Bool xf86IsPrimaryPci(struct pci_device *pPci) { + /* Add max. 1 screen for the IgnorePrimary fallback path */ + if (xf86ProbeIgnorePrimary && xf86NumScreens == 0) + return TRUE; + if (primaryBus.type == BUS_PCI) return pPci == primaryBus.id.pci; #ifdef XSERVER_PLATFORM_BUS diff --git a/hw/xfree86/common/xf86platformBus.c b/hw/xfree86/common/xf86platformBus.c index ec2a1dd..fde6796 100644 --- a/hw/xfree86/common/xf86platformBus.c +++ b/hw/xfree86/common/xf86platformBus.c @@ -115,6 +115,10 @@ xf86_find_platform_device_by_devnum(int major, int minor) static Bool xf86IsPrimaryPlatform(struct xf86_platform_device *plat) { + /* Add max. 1 screen for the IgnorePrimary fallback path */ + if (xf86ProbeIgnorePrimary && xf86NumScreens == 0) + return TRUE; + if (primaryBus.type == BUS_PLATFORM) return plat == primaryBus.id.plat; #ifdef XSERVER_LIBPCIACCESS -- 2.9.3