Message ID | 20201218174011.340514-1-s.miroshnichenko@yadro.com (mailing list archive)
---|---
Series | PCI: Allow BAR movement during boot and hotplug
On Fri, Dec 18, 2020 at 08:39:45PM +0300, Sergei Miroshnichenko wrote: > Currently PCI hotplug works on top of resources which are usually reserved: > by BIOS, bootloader, firmware, or by the kernel (pci=hpmemsize=XM). These > resources are gaps in the address space where BARs of new devices may fit, > and extra bus number per port, so bridges can be hot-added. This series aim > the BARs problem: it shows the kernel how to redistribute them on the run, > so the hotplug becomes predictable and cross-platform. A follow-up patchset > will propose a solution for bus numbers. And another one -- for the powerpc > arch-specific problems. Hi Sergei, It looks like there's a lot of good work here, but I don't really know how to move forward with it. I can certainly see scenarios where this functionality will be useful, but the series currently doesn't mention bug reports that it fixes. I suspect there *are* some related bug reports, e.g., for Thunderbolt hotplug. We should dig them up, include pointers to them, and get the reporters to test the series and provide feedback. We had some review traffic on earlier versions, but as far as I can see, nobody has stepped up with any actual signs of approval, e.g., Tested-by, Reviewed-by, or Acked-by tags. That's a problem because this touches a lot of sensitive code and I don't want to be stuck fixing issues all by myself. When issues crop up, I look to the author and reviewers to help out. Along that line, I'm a little concerned about how we will be able to reproduce and debug problems. Issues will likely depend on the topology, the resources of the specific devices involved, and a specific sequence of hotplug operations. Those may be hard to reproduce even for the reporter. Maybe this is nothing more than a grep pattern to extract relevant events from dmesg. Ideal, but maybe impractical, would be a way to reproduce things in a VM using qemu and a script to simulate hotplugs. > To arrange a space for BARs of new hot-added devices, the kernel can pause > the drivers of working PCI devices and reshuffle the assigned BARs. When a > driver is un-paused by the kernel, it should ioremap() the new addresses of > its BARs. > > Drivers indicate their support of the feature by implementing the new hooks > .bar_fixed(), .rescan_prepare(), .rescan_done() in the struct pci_driver. > If a driver doesn't yet support the feature, BARs of its devices are seen > as immovable and handled in the same way as resources with the PCI_FIXED > flag: they are guaranteed to remain untouched. > > This approached was discussed during LPC 2020 [1]. > > The feature is usable not only for hotplug, but also for booting with a > complete set of BARs assigned, and for Resizable BARs. I'm interested in more details about both of these. What does "a complete set of BARs assigned" mean? On x86, the BIOS usually assigns all the BARs ahead of time, but I know that's not the case for all arches. For Resizable BARs, is the point that this allows more flexibility in resizing BARs because we can now move things out of the way? > Tested on a number of x86_64 machines without any special kernel command > line arguments (some of them -- with older v8 revision of this patchset): > - PC: i7-5930K + ASUS X99-A; > - PC: i5-8500 + ASUS Z370-F; > - PC: AMD Ryzen 7 3700X + ASRock X570 + AMD 5700 XT (Resizable BARs); > - Supermicro Super Server/H11SSL-i: AMD EPYC 7251; > - HP ProLiant DL380 G5: Xeon X5460; > - Dell Inspiron N5010: i5 M 480; > - Dell Precision M6600: i7-2920XM. 
Does this testing show no change in behavior and no regressions, or does it show that this series fixes cases that previously did not work? If the latter, some bugzilla reports with before/after dmesg logs would give some insight into how this works and also some good justification. > Also tested on a Power8 (Vesnin) and Power9 (Nicole) ppc64le machines, but > with extra patchset, which is to be sent upstream a bit later. > > The set contains: > 01-02: Independent bugfixes, both are not related directly to the movable > BARs feature, but without them the rest of this series will not > work as expected; > > 03-14: Implement the essentials of the feature; > > 15-21: Performance improvements for movable BARs and pciehp; > > 22: Enable the feature by default; > > 23-24: Add movable BARs support to nvme and portdrv; > > 25-26: Debugging helpers. > > This patchset is a part of our work on adding support for hot-adding > chains of chassis full of other bridges, NVME drives, SAS HBAs, GPUs, etc. > without special requirements such as Hot-Plug Controller, reservation of > bus numbers or memory regions by firmware, etc. This is a little bit of a chicken and egg situation. I suspect many people would like this functionality, but currently we either avoid it because it's "known not to work" or we hack around it with the firmware reservations you mention. Bjorn
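For readers unfamiliar with the interface quoted in the cover letter above, here is a minimal sketch of what a driver opting in to movable BARs might look like. The hook names come from the cover letter; their exact signatures, the `foo` driver itself, and the use of BAR 0 are assumptions for illustration, and this would only build against a kernel with the series applied, not mainline:

```c
/*
 * Hypothetical driver fragment; only builds on a kernel with this series
 * applied.  Hook names are taken from the cover letter, signatures are assumed.
 */
#include <linux/pci.h>
#include <linux/io.h>

struct foo_priv {
        void __iomem *regs;             /* mapping of BAR 0 */
};

/* None of this device's BARs needs to stay at a fixed address. */
static bool foo_bar_fixed(struct pci_dev *pdev, int resno)
{
        return false;
}

/* Called before the PCI core may move BARs: quiesce and drop the mapping. */
static void foo_rescan_prepare(struct pci_dev *pdev)
{
        struct foo_priv *priv = pci_get_drvdata(pdev);

        pci_iounmap(pdev, priv->regs);
        priv->regs = NULL;
}

/* Called after the rescan: BAR 0 may now live at a different address. */
static void foo_rescan_done(struct pci_dev *pdev)
{
        struct foo_priv *priv = pci_get_drvdata(pdev);

        priv->regs = pci_iomap(pdev, 0, 0);
}

static struct pci_driver foo_driver = {
        .name           = "foo",
        /* .id_table, .probe, .remove omitted for brevity */
        .bar_fixed      = foo_bar_fixed,
        .rescan_prepare = foo_rescan_prepare,
        .rescan_done    = foo_rescan_done,
};
```

Anything cached from the old mapping becomes stale once the BARs move, which is why, per the cover letter, devices whose drivers lack these hooks are treated like resources with the PCI_FIXED flag and left untouched.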
On Thu, Jan 28, 2021 at 08:53:16AM -0600, Bjorn Helgaas wrote: > On Fri, Dec 18, 2020 at 08:39:45PM +0300, Sergei Miroshnichenko wrote: > > Currently PCI hotplug works on top of resources which are usually reserved: > > by BIOS, bootloader, firmware, or by the kernel (pci=hpmemsize=XM). These > > resources are gaps in the address space where BARs of new devices may fit, > > and extra bus number per port, so bridges can be hot-added. This series aim > > the BARs problem: it shows the kernel how to redistribute them on the run, > > so the hotplug becomes predictable and cross-platform. A follow-up patchset > > will propose a solution for bus numbers. And another one -- for the powerpc > > arch-specific problems. > > I can certainly see scenarios where this functionality will be useful, > but the series currently doesn't mention bug reports that it fixes. I > suspect there *are* some related bug reports, e.g., for Thunderbolt > hotplug. We should dig them up, include pointers to them, and get the > reporters to test the series and provide feedback. In case it helps, an earlier version of the series was referenced in this LWN article more than 2 years ago (scroll down to the "Moving BARs" section at the end of the article): https://lwn.net/Articles/767885/ The article provides some context: Specifically, macOS is capable of resizing and moving BARs, so this series sort of helps us catch up with the competition. With Thunderbolt, this series is particularly useful if (a) PCIe cards are hot-added with large BARs (such as GPUs) and/or (b) the Thunderbolt daisy-chain is very long. Thunderbolt is essentially a cascade of nested hotplug ports, so if more and more devices are added, it's easy to see that the top-level hotplug port's BAR window may run out of space. My understanding is that Sergei's use case doesn't involve Thunderbolt at all but rather hotplugging of GPUs and network cards in PowerPC servers in a datacenter, which may have the same kinds of issues. I intended to review and test this iteration of the series more closely, but haven't been able to carve out the required time. I'm adding some Thunderbolt folks to cc in the hope that they can at least test the series on their development branch. Getting this upstreamed should really be in the best interest of Intel and other promulgators of Thunderbolt. Thanks, Lukas
Hi, On Thu, Jan 28, 2021 at 09:39:29PM +0100, Lukas Wunner wrote: > On Thu, Jan 28, 2021 at 08:53:16AM -0600, Bjorn Helgaas wrote: > > On Fri, Dec 18, 2020 at 08:39:45PM +0300, Sergei Miroshnichenko wrote: > > > Currently PCI hotplug works on top of resources which are usually reserved: > > > by BIOS, bootloader, firmware, or by the kernel (pci=hpmemsize=XM). These > > > resources are gaps in the address space where BARs of new devices may fit, > > > and extra bus number per port, so bridges can be hot-added. This series aim > > > the BARs problem: it shows the kernel how to redistribute them on the run, > > > so the hotplug becomes predictable and cross-platform. A follow-up patchset > > > will propose a solution for bus numbers. And another one -- for the powerpc > > > arch-specific problems. > > > > I can certainly see scenarios where this functionality will be useful, > > but the series currently doesn't mention bug reports that it fixes. I > > suspect there *are* some related bug reports, e.g., for Thunderbolt > > hotplug. We should dig them up, include pointers to them, and get the > > reporters to test the series and provide feedback. > > In case it helps, an earlier version of the series was referenced > in this LWN article more than 2 years ago (scroll down to the > "Moving BARs" section at the end of the article): > > https://lwn.net/Articles/767885/ > > The article provides some context: Specifically, macOS is capable > of resizing and moving BARs, so this series sort of helps us catch > up with the competition. > > With Thunderbolt, this series is particularly useful if > (a) PCIe cards are hot-added with large BARs (such as GPUs) and/or > (b) the Thunderbolt daisy-chain is very long. > > Thunderbolt is essentially a cascade of nested hotplug ports, > so if more and more devices are added, it's easy to see that > the top-level hotplug port's BAR window may run out of space. > > My understanding is that Sergei's use case doesn't involve > Thunderbolt at all but rather hotplugging of GPUs and network > cards in PowerPC servers in a datacenter, which may have the > same kinds of issues. > > I intended to review and test this iteration of the series more > closely, but haven't been able to carve out the required time. > I'm adding some Thunderbolt folks to cc in the hope that they > can at least test the series on their development branch. > Getting this upstreamed should really be in the best interest > of Intel and other promulgators of Thunderbolt. Sure. It seems that this series was submitted in December so probably not applicable to the pci.git/next anymore. Anyways, I can give it a try on a TBT capable system if someone tells me what exactly to test ;-) Probably at least that the existing functionality still works but something else maybe too?
Hi Bjorn, On Thu, 2021-01-28 at 08:53 -0600, Bjorn Helgaas wrote: > I can certainly see scenarios where this functionality will be > useful, > but the series currently doesn't mention bug reports that it > fixes. I > suspect there *are* some related bug reports, e.g., for Thunderbolt > hotplug. We should dig them up, include pointers to them, and get > the > reporters to test the series and provide feedback. It never occurred to me to actually search for reports, thanks. > We had some review traffic on earlier versions, but as far as I can > see, nobody has stepped up with any actual signs of approval, e.g., > Tested-by, Reviewed-by, or Acked-by tags. That's a problem because > this touches a lot of sensitive code and I don't want to be stuck > fixing issues all by myself. When issues crop up, I look to the > author and reviewers to help out. Indeed, I had only a few informal personal responses [from outside of our company]: - some machines have no regressions with the kernel built with Clang- 12+LTO+IAS; - NVMe hotplug started working for AMD Epyc with a PEX switch; - Another system with several PEX switches is under ongoing experiments. Providing an ability to quickly return to the old BARs allocating is the reason why I keep wrapping the most intrusive code with "if (pci_can_move_bars)". Originally I wanted it to be disabled by default, available for those truly desperate. But there was an objection that the code will not be ever tested this way. > Along that line, I'm a little concerned about how we will be able to > reproduce and debug problems. Issues will likely depend on the > topology, the resources of the specific devices involved, and a > specific sequence of hotplug operations. Those may be hard to > reproduce even for the reporter. Maybe this is nothing more than a > grep pattern to extract relevant events from dmesg. Ideal, but maybe > impractical, would be a way to reproduce things in a VM using qemu > and > a script to simulate hotplugs. I haven't yet tried the qemu for PCI debugging, definitely need to take a look. To help with that, the patchset is supplied with additional prints, but CONFIG_PCI_DEBUG=y is still preferred. A simulation will reveal if additional debug messages are needed. > > The feature is usable not only for hotplug, but also for booting > > with a > > complete set of BARs assigned, and for Resizable BARs. > > I'm interested in more details about both of these. What does "a > complete set of BARs assigned" mean? On x86, the BIOS usually > assigns > all the BARs ahead of time, but I know that's not the case for all > arches. Usually yes, but we have at least one x86 PC that skips some BARs (need to check out again its particular model and version, IIRC that was Z370-F) -- not the crucial ones like BAR0, though. That really got me by surprise, so now it is one more case covered. > For Resizable BARs, is the point that this allows more flexibility in > resizing BARs because we can now move things out of the way? Not only things out of the way, but most importantly the bridge windows are now floating as well. So it's more freedom for a new BAR to "choose" a place. 
An awfully synthetic example:

    |              RC Address Space              |
    | orig GPU BAR | fixed BAR |                 |
    |              | fixed BAR | Resized GPU BAR |

> > Tested on a number of x86_64 machines without any special kernel command line arguments (some of them -- with older v8 revision of this patchset):
> > - PC: i7-5930K + ASUS X99-A;
> > - PC: i5-8500 + ASUS Z370-F;
> > - PC: AMD Ryzen 7 3700X + ASRock X570 + AMD 5700 XT (Resizable BARs);
> > - Supermicro Super Server/H11SSL-i: AMD EPYC 7251;
> > - HP ProLiant DL380 G5: Xeon X5460;
> > - Dell Inspiron N5010: i5 M 480;
> > - Dell Precision M6600: i7-2920XM.
>
> Does this testing show no change in behavior and no regressions, or does it show that this series fixes cases that previously did not work? If the latter, some bugzilla reports with before/after dmesg logs would give some insight into how this works and also some good justification.

Both, actually, but the dmesg logs were never published, so they will be the next step -- combined with qemu scripts, if it works.

> > This patchset is a part of our work on adding support for hot-adding chains of chassis full of other bridges, NVME drives, SAS HBAs, GPUs, etc. without special requirements such as Hot-Plug Controller, reservation of bus numbers or memory regions by firmware, etc.
>
> This is a little bit of a chicken and egg situation. I suspect many people would like this functionality, but currently we either avoid it because it's "known not to work" or we hack around it with the firmware reservations you mention.

The current (v9) patchset became more symbiotic with other hotplug facilities: it started to respect pci=hpmemsize etc, and only drops it if such allocation is impossible.

Attention Button for Assisted Hotplug (pciehp driver) was always supported. It is also fully compatible with DPC.

The goal is to squeeze as much as possible even without special hw+fw+sw support, when the only available tool is a manual rescan via sysfs.

Serge
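To make the Resizable BAR point above concrete, here is a rough sketch of the resize flow using helpers that already exist in mainline (pci_release_resource(), pci_resize_resource(), pci_assign_unassigned_bus_resources()). The function name, the choice of BAR 0, and the omission of quiescing and memory-decode handling are simplifications for illustration; the relevance to this series is that the final reassignment step gains room to succeed when neighbouring BARs and bridge windows are allowed to move:

```c
#include <linux/pci.h>

/*
 * Illustrative only: resize BAR 0 of a device that advertises the
 * Resizable BAR capability.  A real driver would also quiesce DMA and
 * disable memory decoding around this sequence.
 */
static int example_resize_bar0(struct pci_dev *pdev, int size_enc)
{
        int ret;

        /* Return the current window to the parent bridge. */
        pci_release_resource(pdev, 0);

        /* size_enc uses the spec encoding: 0 = 1 MB, 1 = 2 MB, ... */
        ret = pci_resize_resource(pdev, 0, size_enc);
        if (ret)
                pci_warn(pdev, "BAR 0 resize failed: %d\n", ret);

        /*
         * Re-run assignment for the bus.  With this series the core may
         * also shuffle other movable BARs and bridge windows so the larger
         * window fits; in mainline it can only use existing gaps.
         */
        pci_assign_unassigned_bus_resources(pdev->bus);

        return ret;
}
```

amdgpu follows essentially this sequence today to expose large VRAM BARs; without movable bridge windows the resize can simply fail when the existing layout has no suitable gap.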
On Thu, 2021-01-28 at 21:39 +0100, Lukas Wunner wrote: > With Thunderbolt, this series is particularly useful if > (a) PCIe cards are hot-added with large BARs (such as GPUs) and/or > (b) the Thunderbolt daisy-chain is very long. > > Thunderbolt is essentially a cascade of nested hotplug ports, > so if more and more devices are added, it's easy to see that > the top-level hotplug port's BAR window may run out of space. > > My understanding is that Sergei's use case doesn't involve > Thunderbolt at all but rather hotplugging of GPUs and network > cards in PowerPC servers in a datacenter, which may have the > same kinds of issues. Hi Lukas, I have yet to find some Thunderbolt hardware and try it. Actually, we are hotplugging not only endpoints, but nested PCIe switches as well: those are PCIe JBOD chassis (with NVMes and SAS drives). But to deal with them we have to use an additional patchset "Movable bus numbers", that was also described during LPC2020 (including its problematic points), here is its RFC, it haven't changed much since then: https://lore.kernel.org/linux-pci/20191024172157.878735-5-s.miroshnichenko@yadro.com/T/ Can Thunderbolt have a chain of switches deep enough to have problems with bus numbers, when reserving is not the best option? Serge > I intended to review and test this iteration of the series more > closely, but haven't been able to carve out the required time. > I'm adding some Thunderbolt folks to cc in the hope that they > can at least test the series on their development branch. > Getting this upstreamed should really be in the best interest > of Intel and other promulgators of Thunderbolt. > > Thanks, > > Lukas
Hi Mika, On Mon, 2021-02-01 at 14:55 +0200, Mika Westerberg wrote: > On Thu, Jan 28, 2021 at 09:39:29PM +0100, Lukas Wunner wrote: > > On Thu, Jan 28, 2021 at 08:53:16AM -0600, Bjorn Helgaas wrote: > > > On Fri, Dec 18, 2020 at 08:39:45PM +0300, Sergei Miroshnichenko > > > wrote: > > > > ... > > > > I intended to review and test this iteration of the series more > > closely, but haven't been able to carve out the required time. > > I'm adding some Thunderbolt folks to cc in the hope that they > > can at least test the series on their development branch. > > Getting this upstreamed should really be in the best interest > > of Intel and other promulgators of Thunderbolt. > > Sure. It seems that this series was submitted in December so probably > not applicable to the pci.git/next anymore. Anyways, I can give it a > try > on a TBT capable system if someone tells me what exactly to test ;-) > Probably at least that the existing functionality still works but > something else maybe too? For setups that worked fine, the only expected change is a possible little different BAR layout (in /proc/iomem), and there should the same quantity (or more) of BARs assigned than before. But if there are any problematic setups, which weren't able to arrange new BARs, this patchset may push a bit further. In a few days I'll provide an updated branch for our mirror of the kernel on Github, with a complete and bumped set of patches, reducing the steps required to test them. Thanks! Serge
Dear, On 2/3/21 8:59 PM, Sergei Miroshnichenko wrote: > Hi Bjorn, > > On Thu, 2021-01-28 at 08:53 -0600, Bjorn Helgaas wrote: >> I can certainly see scenarios where this functionality will be >> useful, >> but the series currently doesn't mention bug reports that it >> fixes. I >> suspect there *are* some related bug reports, e.g., for Thunderbolt >> hotplug. We should dig them up, include pointers to them, and get >> the >> reporters to test the series and provide feedback. > > It never occurred to me to actually search for reports, thanks. > >> We had some review traffic on earlier versions, but as far as I can >> see, nobody has stepped up with any actual signs of approval, e.g., >> Tested-by, Reviewed-by, or Acked-by tags. That's a problem because >> this touches a lot of sensitive code and I don't want to be stuck >> fixing issues all by myself. When issues crop up, I look to the >> author and reviewers to help out. > > Indeed, I had only a few informal personal responses [from outside of > our company]: > - some machines have no regressions with the kernel built with Clang- > 12+LTO+IAS; > - NVMe hotplug started working for AMD Epyc with a PEX switch; > - Another system with several PEX switches is under ongoing > experiments. > I reached out to Sergei a while ago about these patches on systems I work with. They involve a PEX switch on a separate card (MCH). FWIW, I find this patchset really interesting and useful in the modular systems such as microTCA [1]. In those, up to 12 PCIe hotswappable mezzanine cards (AMCs) can be added/removed at anytime. Each can be a PCIe endpoint. The infrastructure usually involves a CPU card with a PCIe root complex. In addition, PCIe switch is located on the carrier hub (MCH) card which handles the PCIe links for AMCs. At this time, if the system is booted without a certain slot being populated at CPU/Linux startup adding a card requires system power cycle. Today there are already (minor?) issues with the hotplug/hotswap of the cards that are present at Linux bootup hence folks that use these systems usually do not rely on any these features; as a result boxes are power cycled. I'm not saying it is all Linux;s fault, as the microTCA is inherently complex due to modularity and versatility of configurations one can have, even without hotplugging. It is also a matter of understanding how to configure and use the hotplugging that can be a challenge on its own. Some preliminary tests of this patchset, I managed to run so far, suggest that newly added cards are nicely being recognized by the Linux and are ready for use without CPU reboot. I currently do not have access to the microTCA boxes I use. I hope to be in a better place in couple of weeks and look into these patches again! Thanks! //hinko [1] https://www.picmg.org/wp-content/uploads/MicroTCA_Short_Form_Sept_2006.pdf > Providing an ability to quickly return to the old BARs allocating is > the reason why I keep wrapping the most intrusive code with "if > (pci_can_move_bars)". Originally I wanted it to be disabled by default, > available for those truly desperate. But there was an objection that > the code will not be ever tested this way. > >> Along that line, I'm a little concerned about how we will be able to >> reproduce and debug problems. Issues will likely depend on the >> topology, the resources of the specific devices involved, and a >> specific sequence of hotplug operations. Those may be hard to >> reproduce even for the reporter. 
Maybe this is nothing more than a >> grep pattern to extract relevant events from dmesg. Ideal, but maybe >> impractical, would be a way to reproduce things in a VM using qemu >> and >> a script to simulate hotplugs. > > I haven't yet tried the qemu for PCI debugging, definitely need to take > a look. To help with that, the patchset is supplied with additional > prints, but CONFIG_PCI_DEBUG=y is still preferred. A simulation will > reveal if additional debug messages are needed. > >>> The feature is usable not only for hotplug, but also for booting >>> with a >>> complete set of BARs assigned, and for Resizable BARs. >> >> I'm interested in more details about both of these. What does "a >> complete set of BARs assigned" mean? On x86, the BIOS usually >> assigns >> all the BARs ahead of time, but I know that's not the case for all >> arches. > > Usually yes, but we have at least one x86 PC that skips some BARs (need > to check out again its particular model and version, IIRC that was > Z370-F) -- not the crucial ones like BAR0, though. That really got me > by surprise, so now it is one more case covered. > >> For Resizable BARs, is the point that this allows more flexibility in >> resizing BARs because we can now move things out of the way? > > Not only things out of the way, but most importantly the bridge windows > are now floating as well. So it's more freedom for a new BAR to > "choose" a place. > > An awfully synthetic example: > > | RC Address Space | > | orig GPU BAR | fixed BAR | | > | | fixed BAR | Resized GPU BAR | | > >>> Tested on a number of x86_64 machines without any special kernel >>> command >>> line arguments (some of them -- with older v8 revision of this >>> patchset): >>> - PC: i7-5930K + ASUS X99-A; >>> - PC: i5-8500 + ASUS Z370-F; >>> - PC: AMD Ryzen 7 3700X + ASRock X570 + AMD 5700 XT (Resizable >>> BARs); >>> - Supermicro Super Server/H11SSL-i: AMD EPYC 7251; >>> - HP ProLiant DL380 G5: Xeon X5460; >>> - Dell Inspiron N5010: i5 M 480; >>> - Dell Precision M6600: i7-2920XM. >> >> Does this testing show no change in behavior and no regressions, or >> does it show that this series fixes cases that previously did not >> work? If the latter, some bugzilla reports with before/after dmesg >> logs would give some insight into how this works and also some good >> justification. > > Both, actually, but the dmesg logs were never published, so they will > be the next step -- combined with qemu scripts, if it works. > >>> This patchset is a part of our work on adding support for hot- >>> adding >>> chains of chassis full of other bridges, NVME drives, SAS HBAs, >>> GPUs, etc. >>> without special requirements such as Hot-Plug Controller, >>> reservation of >>> bus numbers or memory regions by firmware, etc. >> >> This is a little bit of a chicken and egg situation. I suspect many >> people would like this functionality, but currently we either avoid >> it >> because it's "known not to work" or we hack around it with the >> firmware reservations you mention. > > The current (v9) patchset became more symbiotic with other hotplug > facilities: it started to respect pci=hpmemsize etc, and only drops it > if such allocation is impossible. > > Attention Button for Assisted Hotplug (pciehp driver) was always > supported. It is also fully compatible with DPC. > > The goal is to squeeze as much as possible even without special > hw+fw+sw support, when the only available tool is a manual rescan via > sysfs. > > Serge >
> Actually, we are hotplugging not only endpoints, but nested PCIe switches as well: those are PCIe JBOD chassis (with NVMes and SAS drives).

From what I remember of playing with some PCI hot-plug hardware and cardbus extenders to PCI chassis many years ago, bus numbers were actually a big problem. A surprising number of I/O cards contain a bridge - so they need a bus number if hot-plugged. (In spite of the marketing hype, hot-plug basically let you remove a working card and replace it with an identical one! Modern drivers and OSes are likely to handle the errors from faulty cards better.)

The initial allocation of bus numbers could easily spread out the unused bus numbers. Doing that for memory resources may have other problems (you probably don't want to allocate the entire range to the root hub?).

Are the bus numbers exposed as keys/filenames in /sys? In which case changing them after boot is problematic. You'd need a map of virtual bus numbers to physical ones.

As well as your 'suspend/resume' sequence, it should be possible to send a card-remove/insert sequence to an idle driver.

There is another case where BARs might need moving. The PCIe spec doesn't allow very long (200ms?) between reset removal (which might be close to power on) and the requirement for endpoints to answer config cycles. If you have to load a large FPGA from a serial EEPROM, this is actually a real problem. So if the OS does a full probe of the PCI devices, it may find endpoints (or even bridges) that weren't actually there when the BIOS (or equivalent) did its earlier probe. Finding space in the middle of the PCI devices for an endpoint that wants two 1MB BARs is unlikely to succeed!

David

- Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK Registration No: 1397386 (Wales)
On Wed, Feb 03, 2021 at 08:17:14PM +0000, Sergei Miroshnichenko wrote: > Hi Mika, > > On Mon, 2021-02-01 at 14:55 +0200, Mika Westerberg wrote: > > On Thu, Jan 28, 2021 at 09:39:29PM +0100, Lukas Wunner wrote: > > > On Thu, Jan 28, 2021 at 08:53:16AM -0600, Bjorn Helgaas wrote: > > > > On Fri, Dec 18, 2020 at 08:39:45PM +0300, Sergei Miroshnichenko > > > > wrote: > > > > > ... > > > > > > I intended to review and test this iteration of the series more > > > closely, but haven't been able to carve out the required time. > > > I'm adding some Thunderbolt folks to cc in the hope that they > > > can at least test the series on their development branch. > > > Getting this upstreamed should really be in the best interest > > > of Intel and other promulgators of Thunderbolt. > > > > Sure. It seems that this series was submitted in December so probably > > not applicable to the pci.git/next anymore. Anyways, I can give it a > > try > > on a TBT capable system if someone tells me what exactly to test ;-) > > Probably at least that the existing functionality still works but > > something else maybe too? > > For setups that worked fine, the only expected change is a possible > little different BAR layout (in /proc/iomem), and there should the same > quantity (or more) of BARs assigned than before. > > But if there are any problematic setups, which weren't able to arrange > new BARs, this patchset may push a bit further. Got it. > In a few days I'll provide an updated branch for our mirror of the > kernel on Github, with a complete and bumped set of patches, reducing > the steps required to test them. Sounds good, thanks!
On Thu, 2021-02-04 at 12:49 +0200, Mika Westerberg wrote: > On Wed, Feb 03, 2021 at 08:17:14PM +0000, Sergei Miroshnichenko > wrote: > > On Mon, 2021-02-01 at 14:55 +0200, Mika Westerberg wrote: > > > On Thu, Jan 28, 2021 at 09:39:29PM +0100, Lukas Wunner wrote: > > > > On Thu, Jan 28, 2021 at 08:53:16AM -0600, Bjorn Helgaas wrote: > > > > > On Fri, Dec 18, 2020 at 08:39:45PM +0300, Sergei > > > > > Miroshnichenko > > > > > wrote: > > > > > > ... > > > > > > > > I intended to review and test this iteration of the series more > > > > closely, but haven't been able to carve out the required time. > > > > I'm adding some Thunderbolt folks to cc in the hope that they > > > > can at least test the series on their development branch. > > > > Getting this upstreamed should really be in the best interest > > > > of Intel and other promulgators of Thunderbolt. > > > > > > Sure. It seems that this series was submitted in December so > > > probably > > > not applicable to the pci.git/next anymore. Anyways, I can give > > > it a > > > try > > > on a TBT capable system if someone tells me what exactly to test > > > ;-) > > > Probably at least that the existing functionality still works but > > > something else maybe too? > > > > For setups that worked fine, the only expected change is a possible > > little different BAR layout (in /proc/iomem), and there should the > > same > > quantity (or more) of BARs assigned than before. > > > > But if there are any problematic setups, which weren't able to > > arrange > > new BARs, this patchset may push a bit further. > > Got it. > > > In a few days I'll provide an updated branch for our mirror of the > > kernel on Github, with a complete and bumped set of patches, > > reducing > > the steps required to test them. > > Sounds good, thanks! Hi Mika, The branch is finally ready, so if you still have time for that, please take a look: https://github.com/YADRO-KNS/linux/tree/yadro/pcie_hotplug/movable_bars_v9.1 And a bit off topic: I've also updated the branch with a complete set of our hotplug-related patches, including an RFC series of movable bus numbers for hot-adding large and/or nested switches (at the cost of breaking the sysfs+procsf ABI). In case one is ever going to try that, the pci=realloc,movable_buses kernel command line arguments are needed: https://github.com/YADRO-KNS/linux/commits/yadro/pcie_hotplug/movable_buses_v9.1 Thanks! Serge
On Wed, Feb 10, 2021 at 07:40:06PM +0000, Sergei Miroshnichenko wrote: > The branch is finally ready, so if you still have time for that, please > take a look: > > https://github.com/YADRO-KNS/linux/tree/yadro/pcie_hotplug/movable_bars_v9.1 Just a quick drive-by comment, in commit scsi: mpt3sas: Handle a surprise PCI unplug faster you probably want to call pci_dev_is_disconnected() instead of adding a new flag. Thanks, Lukas
Hi,

On Wed, Feb 10, 2021 at 07:40:06PM +0000, Sergei Miroshnichenko wrote:
> On Thu, 2021-02-04 at 12:49 +0200, Mika Westerberg wrote:
> > On Wed, Feb 03, 2021 at 08:17:14PM +0000, Sergei Miroshnichenko wrote:
> > > On Mon, 2021-02-01 at 14:55 +0200, Mika Westerberg wrote:
> > > > On Thu, Jan 28, 2021 at 09:39:29PM +0100, Lukas Wunner wrote:
> > > > > On Thu, Jan 28, 2021 at 08:53:16AM -0600, Bjorn Helgaas wrote:
> > > > > > On Fri, Dec 18, 2020 at 08:39:45PM +0300, Sergei Miroshnichenko wrote:
> > > > > > > ...
> > > > >
> > > > > I intended to review and test this iteration of the series more closely, but haven't been able to carve out the required time. I'm adding some Thunderbolt folks to cc in the hope that they can at least test the series on their development branch. Getting this upstreamed should really be in the best interest of Intel and other promulgators of Thunderbolt.
> > > >
> > > > Sure. It seems that this series was submitted in December so probably not applicable to the pci.git/next anymore. Anyways, I can give it a try on a TBT capable system if someone tells me what exactly to test ;-) Probably at least that the existing functionality still works but something else maybe too?
> > >
> > > For setups that worked fine, the only expected change is a possible little different BAR layout (in /proc/iomem), and there should the same quantity (or more) of BARs assigned than before.
> > >
> > > But if there are any problematic setups, which weren't able to arrange new BARs, this patchset may push a bit further.
> >
> > Got it.
> >
> > > In a few days I'll provide an updated branch for our mirror of the kernel on Github, with a complete and bumped set of patches, reducing the steps required to test them.
> >
> > Sounds good, thanks!
>
> Hi Mika,
>
> The branch is finally ready, so if you still have time for that, please take a look:
>
> https://github.com/YADRO-KNS/linux/tree/yadro/pcie_hotplug/movable_bars_v9.1

Thanks for sharing!

I tried this series on top of v5.11-rc7 on a Dell XPS 9380 so that I have two TBT3 devices connected. Each device includes a PCIe switch and an xHCI endpoint.

What I see is that the hotplug downstream port does not have enough bus numbers (and other resources) allocated, so when attaching the second device it does not fit there anymore. The resulting 'lspci -t' output looks like below:

    -[0000:00]-+-00.0
               +-02.0
               +-04.0
               +-08.0
               +-12.0
               +-14.0
               +-14.2
               +-15.0
               +-15.1
               +-16.0
               +-1c.0-[01]----00.0
               +-1c.6-[02]----00.0
               +-1d.0-[03-3b]----00.0-[04-3b]--+-00.0-[05]----00.0
               |                               +-01.0-[06-1f]----00.0-[07-09]--+-02.0-[08]----00.0
               |                               |                               \-04.0-[09]----00.0-[0a]--
               |                               +-02.0-[20]----00.0
               |                               \-04.0-[21-3b]--
               +-1d.4-[3c]----00.0
               +-1f.0
               +-1f.3
               +-1f.4
               \-1f.5

So the last PCIe switch is not visible anymore, and the xHCI on the second TBT device is not functional either.
On the mainline kernel I get this:

    -[0000:00]-+-00.0
               +-02.0
               +-04.0
               +-08.0
               +-12.0
               +-14.0
               +-14.2
               +-15.0
               +-15.1
               +-16.0
               +-1c.0-[01]----00.0
               +-1c.6-[02]----00.0
               +-1d.0-[03-3b]----00.0-[04-3b]--+-00.0-[05]----00.0
               |                               +-01.0-[06-1f]----00.0-[07-1f]--+-02.0-[08]----00.0
               |                               |                               \-04.0-[09-1f]----00.0-[0a-1f]--+-02.0-[0b]----00.0
               |                               |                                                                \-04.0-[0c-1f]--
               |                               +-02.0-[20]----00.0
               |                               \-04.0-[21-3b]--
               +-1d.4-[3c]----00.0
               +-1f.0
               +-1f.3
               +-1f.4
               \-1f.5

In this topology I can add yet another TBT device and there are still resources available and all the endpoints are functional too.

I can send you the full dmesg and lspci -vv output if needed.
On Thu, 2021-02-11 at 13:39 +0200, mika.westerberg@linux.intel.com wrote: > On Wed, Feb 10, 2021 at 07:40:06PM +0000, Sergei Miroshnichenko > wrote: > > On Thu, 2021-02-04 at 12:49 +0200, Mika Westerberg > > wrote: > > > ... > > The branch is finally ready, so if you still have time for that, > > please > > take a look: > > > > https://github.com/YADRO-KNS/linux/tree/yadro/pcie_hotplug/movable_bars_v9.1 > > Thanks for sharing! > > I tried this series on top of v5.11-rc7 on a Dell XPS 9380 so that I > have two TBT3 devices connected. Each device includes PCIe switch and > a > xHCI endpoint. > > What I see that the hotplug downstream port does not have enough bus > numbers (and other resources allocated) so when attaching the second > device it does not fit there anymore. The resulting 'lspci -t' output > looks like below: > > -[0000:00]-+-00.0 > +-02.0 > +-04.0 > +-08.0 > +-12.0 > +-14.0 > +-14.2 > +-15.0 > +-15.1 > +-16.0 > +-1c.0-[01]----00.0 > +-1c.6-[02]----00.0 > +-1d.0-[03-3b]----00.0-[04-3b]--+-00.0-[05]----00.0 > | +-01.0-[06-1f]----00.0- > [07-09]--+-02.0-[08]----00.0 > | | > \-04.0-[09]----00.0-[0a]-- > | +-02.0-[20]----00.0 > | \-04.0-[21-3b]-- > +-1d.4-[3c]----00.0 > +-1f.0 > +-1f.3 > +-1f.4 > \-1f.5 > > So the last PCIE switch is not visible anymore, and the xHCI on the > second TBT device is not functional either. > > On the mainline kernel I get this: > > -[0000:00]-+-00.0 > +-02.0 > +-04.0 > +-08.0 > +-12.0 > +-14.0 > +-14.2 > +-15.0 > +-15.1 > +-16.0 > +-1c.0-[01]----00.0 > +-1c.6-[02]----00.0 > +-1d.0-[03-3b]----00.0-[04-3b]--+-00.0-[05]----00.0 > | +-01.0-[06-1f]----00.0- > [07-1f]--+-02.0-[08]----00.0 > | | > \-04.0-[09-1f]----00.0-[0a-1f]--+-02.0-[0b]----00.0 > | | > \-04.0-[0c-1f]-- > | +-02.0-[20]----00.0 > | \-04.0-[21-3b]-- > +-1d.4-[3c]----00.0 > +-1f.0 > +-1f.3 > +-1f.4 > \-1f.5 > > In this topology I can add yet another TBT device and there are still > resources available and all the endpoints are functional too. > > I can send you the full dmesg and lspci -vv output if needed. What a pity. Yes, please, I would of course like to take a look why that happened, and compare, what went wrong (before and after the hotplug: lspci -tv, dmesg -t and /proc/iomem with /proc/ioports, if it wouldn't be too much trouble). The first patchset wasn't intended to change the bus numbers behavior, so I also have to check that locally. And despite of lack of bus numbers, new endpoints still should be visible, this is for me to check as well. Thanks a lot for testing! Serge
Hi, On Thu, Feb 11, 2021 at 05:45:20PM +0000, Sergei Miroshnichenko wrote: > What a pity. Yes, please, I would of course like to take a look why > that happened, and compare, what went wrong (before and after the > hotplug: lspci -tv, dmesg -t and /proc/iomem with /proc/ioports, if it > wouldn't be too much trouble). I just sent these logs to you in a separate email. Let me know if you need more. Thanks!
On Fri, 2021-02-12 at 14:52 +0200, mika.westerberg@linux.intel.com wrote:
> Hi,
>
> On Thu, Feb 11, 2021 at 05:45:20PM +0000, Sergei Miroshnichenko wrote:
> > What a pity. Yes, please, I would of course like to take a look why that happened, and compare, what went wrong (before and after the hotplug: lspci -tv, dmesg -t and /proc/iomem with /proc/ioports, if it wouldn't be too much trouble).
>
> I just sent these logs to you in a separate email. Let me know if you need more.

Thanks, from them it's clear that the "full rescan" approach currently doesn't involve pci_bus_distribute_available_resources(), which is why hot-adding a second nested switch breaks: because of non-distributed free bus numbers. The first switch seems to have been hot-added just fine, with BARs being moved a bit.

This is to be fixed in v10, along with the mpt3sas+pci_dev_is_disconnected() issue Lukas found (thanks Lukas!), the CONFIG_DEBUG_LOCK_ALLOC macro, and more useful debug messages.

Serge
On Fri, Feb 12, 2021 at 08:54:18PM +0000, Sergei Miroshnichenko wrote: > On Fri, 2021-02-12 at 14:52 +0200, mika.westerberg@linux.intel.com > wrote: > > Hi, > > > > On Thu, Feb 11, 2021 at 05:45:20PM +0000, Sergei Miroshnichenko > > wrote: > > > What a pity. Yes, please, I would of course like to take a look why > > > that happened, and compare, what went wrong (before and after the > > > hotplug: lspci -tv, dmesg -t and /proc/iomem with /proc/ioports, if > > > it > > > wouldn't be too much trouble). > > > > I just sent these logs to you in a separate email. Let me know if you > > need more. > > Thanks, from them it's clear that the "full rescan" apprach currently > doesn't involve the pci_bus_distribute_available_resources(), that's > why hot-adding a second nested switch breaks: because of non- > distributed free bus numbers. The first switch seems was hot-added just > fine, with BARs being moved a bit. > > This is to be fixed in v10, along with the > mpt3sas+pci_dev_is_disconnected() moment Lukas had found (thanks > Lukas!), CONFIG_DEBUG_LOCK_ALLOC macro, and a more useful debug > messages. Great, thanks! :)