Message ID | 20200801112446.149549-1-refactormyself@gmail.com (mailing list archive) |
---|---|
Series | Drop uses of pci_read_config_*() return value |
On Sat, Aug 01, 2020 at 01:24:29PM +0200, Saheed O. Bolarinwa wrote:
> The return value of pci_read_config_*() may not indicate a device error.
> However, the value read by these functions is more likely to indicate
> this kind of error. This presents two overlapping ways of reporting
> errors and complicates error checking.

So why isn't the *value check done in the pci_read_config_* functions
instead of touching gazillion callers?

For example, pci_conf{1,2}_read() could check whether the u32 *value it
just read depending on the access method, whether that value is ~0 and
return proper PCIBIOS_ error in that case.

The check you're replicating

    if (val32 == (u32)~0)

everywhere, instead, is just ugly and tests a naked value ~0 which
doesn't mean anything...
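A minimal user-space sketch of the status quo described in the cover letter, under stated assumptions: the accessor returns a status code, and a failed read also shows up as all 1s in the value, so a careful caller ends up checking both. `sim_read_config_dword()`, the `SIM_*` codes and the `sim_device_present` flag are hypothetical stand-ins for pci_read_config_dword(), the PCIBIOS_* codes and an absent device; they are not kernel APIs.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for PCIBIOS_SUCCESSFUL and PCIBIOS_BAD_REGISTER_NUMBER. */
enum { SIM_SUCCESSFUL = 0, SIM_BAD_REGISTER_NUMBER = 1 };

/* Hypothetical device model. When the device is absent, the host bridge
 * fabricates all 1s to complete the load, yet the accessor still reports
 * success: the return code says nothing about the device. */
static bool sim_device_present = true;
static uint32_t sim_config[64]; /* 256 bytes of conventional config space */

static int sim_read_config_dword(int where, uint32_t *val)
{
	if (where < 0 || where > 0xfc || (where & 3))
		return SIM_BAD_REGISTER_NUMBER;
	*val = sim_device_present ? sim_config[where / 4] : UINT32_MAX;
	return SIM_SUCCESSFUL;
}

/* A careful caller has to consult both channels.
 * Returns 0 and fills *id on success, -1 otherwise. */
static int caller_read_id(uint32_t *id)
{
	uint32_t val;

	if (sim_read_config_dword(0x00, &val) != SIM_SUCCESSFUL)
		return -1;      /* channel 1: the return code */
	if (val == UINT32_MAX)
		return -1;      /* channel 2: the all-1s value */
	*id = val;
	return 0;
}
```

With the device marked absent, the read still reports success while the value is 0xffffffff, which is why the second check exists at all.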
On 8/1/20 5:56 AM, Borislav Petkov wrote:
> On Sat, Aug 01, 2020 at 01:24:29PM +0200, Saheed O. Bolarinwa wrote:
>> The return value of pci_read_config_*() may not indicate a device error.
>> However, the value read by these functions is more likely to indicate
>> this kind of error. This presents two overlapping ways of reporting
>> errors and complicates error checking.
> So why isn't the *value check done in the pci_read_config_* functions
> instead of touching gazillion callers?
>
> For example, pci_conf{1,2}_read() could check whether the u32 *value it
> just read depending on the access method, whether that value is ~0 and
> return proper PCIBIOS_ error in that case.
>
> The check you're replicating
>
>     if (val32 == (u32)~0)
>
> everywhere, instead, is just ugly and tests a naked value ~0 which
> doesn't mean anything...
>
I agree. If there is a change, it should be in the pci_read_* functions.
Anything returning void should not fail, and future users of the
proposed change will likely not do the extra checks.

Tom
On 8/1/20 2:56 PM, Borislav Petkov wrote:
> On Sat, Aug 01, 2020 at 01:24:29PM +0200, Saheed O. Bolarinwa wrote:
>> The return value of pci_read_config_*() may not indicate a device error.
>> However, the value read by these functions is more likely to indicate
>> this kind of error. This presents two overlapping ways of reporting
>> errors and complicates error checking.
> So why isn't the *value check done in the pci_read_config_* functions
> instead of touching gazillion callers?

Because the value ~0 has a meaning to some drivers and only drivers have
this knowledge. For those cases, more checks will be needed to ensure
that an error has actually happened.

> For example, pci_conf{1,2}_read() could check whether the u32 *value it
> just read depending on the access method, whether that value is ~0 and
> return proper PCIBIOS_ error in that case.

The primary goal is to make pci_config_read*() return void, so that there
is *only* one way to check for an error, i.e. through the obtained value.

Again, only the drivers can determine if ~0 is a valid value. This
information is not available inside pci_config_read*().

- Saheed
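Saheed's stated goal can be sketched in user space, assuming a void accessor whose only error signal is the data itself. `read_config_dword_void()`, `reg_may_be_all_ones()` and the register at offset 0x40 are hypothetical; the extra check re-reads the Vendor ID, relying on the fact that 0xffff is never a valid Vendor ID.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical device model: an absent device reads as all 1s. */
static bool dev_present = true;
static uint32_t dev_config[64];

/* Proposed shape: no return value, so the data is the only error signal. */
static void read_config_dword_void(int where, uint32_t *val)
{
	*val = dev_present ? dev_config[(where & 0xfc) / 4] : UINT32_MAX;
}

/* Hypothetical driver knowledge: offset 0x40 is a vendor-defined register
 * for which 0xffffffff is a legal setting. */
static bool reg_may_be_all_ones(int where)
{
	return where == 0x40;
}

/* Returns true if the driver should treat the read as failed. */
static bool driver_read_failed(int where, uint32_t *val)
{
	uint32_t id;

	read_config_dword_void(where, val);
	if (*val != UINT32_MAX)
		return false;
	if (!reg_may_be_all_ones(where))
		return true;
	/* All 1s may be genuine here, so confirm with a second read:
	 * 0xffff is never a valid Vendor ID. */
	read_config_dword_void(0x00, &id);
	return (id & 0xffff) == 0xffff;
}
```

The driver-specific hook is where the "only drivers have this knowledge" argument lives, and every caller that reads such a register has to remember to use it.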
On Sun, Aug 02, 2020 at 07:28:00PM +0200, Saheed Bolarinwa wrote:
> Because the value ~0 has a meaning to some drivers and only

No, ~0 means that the PCI read failed. For *every* PCI device I know.

Here's me reading from offset 0xf0 of my host bridge:

    # setpci -s 00:00.0 0xf0.l
    01000000

That device doesn't have extended config space, so the last valid
offset is 0xff. Let's read beyond that:

    # setpci -s 00:00.0 0x100.l
    ffffffff

> Again, only the drivers can determine if ~0 is a valid value. This
> information is not available inside pci_config_read*().

Of course it is.

*every* change you've done in 6/17 - this is the only patch I have
received - checks for == ~0. So that check can just as well be moved
inside pci_config_read_*().

Here's how one could do it:

    #define PCI_OP_READ(size, type, len) \
    int noinline pci_bus_read_config_##size \
        (struct pci_bus *bus, unsigned int devfn, int pos, type *value) \
    { \
        int res; \
        unsigned long flags; \
        u32 data = 0; \
        if (PCI_##size##_BAD) return PCIBIOS_BAD_REGISTER_NUMBER; \
        pci_lock_config(flags); \
        res = bus->ops->read(bus, devfn, pos, len, &data); \
        /* Drop the lock before any early return. */ \
        pci_unlock_config(flags); \
        /* Check we actually read something which is not all 1s of */ \
        /* the access width. PCIBIOS_READ_FAILED is a proposed code. */ \
        if ((type)data == (type)~0) \
            return PCIBIOS_READ_FAILED; \
        *value = (type)data; \
        return res; \
    }

Also, I'd prefer a function to *not* return void but return either an
error or success. In the success case, the @value argument can be
consumed by the caller and otherwise not.

In any case, that change is a step in the wrong direction and I don't
like it, sorry.
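One detail matters for any check placed inside the accessor: a failed read returns all 1s of the access width, so a byte read comes back as 0xff and a word read as 0xffff, not 0xffffffff. The following user-space model compares against all 1s of the requested width and, following the preference for returning error or success, writes `*value` only on success. All names are hypothetical; `SIM_READ_FAILED` merely mirrors the proposed PCIBIOS_READ_FAILED.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical return codes; SIM_READ_FAILED mirrors the proposed one. */
enum { SIM_SUCCESSFUL = 0, SIM_BAD_REGISTER_NUMBER = 1, SIM_READ_FAILED = 2 };

static bool sim_present = true;
static uint8_t sim_cfg[256];

/* Model of the raw bus access (little-endian): for an absent device every
 * byte lane reads as 0xff, so a 1-byte access yields 0xff only. */
static void sim_raw_read(int pos, int len, uint32_t *data)
{
	uint32_t v = 0;

	for (int i = 0; i < len; i++) {
		uint32_t byte = sim_present ? sim_cfg[pos + i] : 0xffu;
		v |= byte << (8 * i);
	}
	*data = v;
}

/* Width-aware check; *value is written only on success. */
static int sim_read_config(int pos, int len, uint32_t *value)
{
	uint32_t data, all_ones;

	if ((len != 1 && len != 2 && len != 4) || pos < 0 ||
	    pos + len > 256 || (pos & (len - 1)))
		return SIM_BAD_REGISTER_NUMBER;

	all_ones = (len == 4) ? UINT32_MAX : ((1u << (8 * len)) - 1u);
	sim_raw_read(pos, len, &data);
	if (data == all_ones)
		return SIM_READ_FAILED;
	*value = data;
	return SIM_SUCCESSFUL;
}
```

This model also reports SIM_READ_FAILED for a present device whose register genuinely holds all 1s, which is the ambiguity questioned in the replies that follow.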
On Sun, Aug 02, 2020 at 08:46:48PM +0200, Borislav Petkov wrote:
> On Sun, Aug 02, 2020 at 07:28:00PM +0200, Saheed Bolarinwa wrote:
> > Because the value ~0 has a meaning to some drivers and only
>
> No, ~0 means that the PCI read failed. For *every* PCI device I know.

Wait, I'm not convinced yet. I know that if a PCI read fails, you
normally get ~0 data because the host bridge fabricates it to complete
the CPU load.

But what guarantees that a PCI config register cannot contain ~0?
If there's something about that in the spec I'd love to know where it
is because it would simplify a lot of things.

I don't think we should merge any of these patches as-is. If we *do*
want to go this direction, we at least need some kind of macro or
function that tests for ~0 so we have a clue about what's happening and
can grep for it.

Bjorn
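A greppable helper of the kind Bjorn asks for could be a width-aware macro. `PCI_CFG_POSSIBLE_ERROR()` and `check_vendor_id()` are hypothetical names used only for illustration. The macro uses C11 `_Generic` so the all-1s constant always matches the width of the value being tested, and it only says that the read *may* have failed.

```c
#include <stdint.h>

/* Hypothetical helper: true when a config value is all 1s for its own
 * width, i.e. the read *may* have failed. It cannot prove failure. */
#define PCI_CFG_POSSIBLE_ERROR(val)            \
	_Generic((val),                        \
		uint8_t:  (val) == UINT8_MAX,  \
		uint16_t: (val) == UINT16_MAX, \
		uint32_t: (val) == UINT32_MAX)

/* Example caller (hypothetical): the intent is explicit and greppable. */
static int check_vendor_id(uint16_t vendor)
{
	if (PCI_CFG_POSSIBLE_ERROR(vendor))
		return -1; /* 0xffff is never a valid Vendor ID */
	return 0;
}
```

Because the constant is chosen from the value's type, a u8 read compares against 0xff and a u32 read against 0xffffffff, avoiding the naked `~0` comparison criticized earlier in the thread.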
On Sun, Aug 02, 2020 at 02:14:06PM -0500, Bjorn Helgaas wrote:
> Wait, I'm not convinced yet. I know that if a PCI read fails, you
> normally get ~0 data because the host bridge fabricates it to complete
> the CPU load.
>
> But what guarantees that a PCI config register cannot contain ~0?

Well, I don't think you can differentiate that case, right?

I guess this is where the driver knowledge comes into play: if the read
returns ~0, pci_read_config*() should probably return something like:

    PCIBIOS_READ_MAYBE_FAILED

to denote that it read all 1s, and then the caller should be able to
determine, based on any of domain:bus:slot.func and whatever else the
driver knows about its hardware, whether the 1s are a valid value or an
error.

Hopefully.

Or something better that I cannot think of right now...
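The tri-state idea can be sketched as a read that reports "maybe failed" on all 1s and leaves the decision to a driver hook. `CFG_MAYBE_FAILED`, `struct drv_ops`, `all_ones_valid()` and the `BDF()` encoding (bus << 8 | device << 3 | function, with the PCI domain left out for brevity) are assumptions for illustration, not existing kernel interfaces, and the bus read is modelled by a single variable.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical encoding of bus:device.function into one number. */
#define BDF(bus, dev, fn) \
	(((unsigned int)(bus) << 8) | ((unsigned int)(dev) << 3) | (unsigned int)(fn))

/* Hypothetical tri-state result of a config read. */
enum cfg_status { CFG_OK, CFG_MAYBE_FAILED };

/* Hypothetical per-driver hook: is all 1s legitimate for this register? */
struct drv_ops {
	bool (*all_ones_valid)(unsigned int bdf, int where);
};

/* Model of whatever the bus returns for the next read. */
static uint32_t bus_data = UINT32_MAX;

static enum cfg_status cfg_read_dword(unsigned int bdf, int where, uint32_t *val)
{
	(void)bdf;
	(void)where;
	*val = bus_data;
	return *val == UINT32_MAX ? CFG_MAYBE_FAILED : CFG_OK;
}

/* Only the driver can resolve CFG_MAYBE_FAILED.
 * Returns 0 for a usable value, -1 for a read treated as failed. */
static int drv_read(const struct drv_ops *ops, unsigned int bdf, int where,
		    uint32_t *val)
{
	if (cfg_read_dword(bdf, where, val) == CFG_OK)
		return 0;
	return ops->all_ones_valid(bdf, where) ? 0 : -1;
}

/* Example hook: offset 0x40 of 00:02.0 may legitimately hold all 1s. */
static bool example_all_ones_valid(unsigned int bdf, int where)
{
	return bdf == BDF(0, 2, 0) && where == 0x40;
}
```

The core accessor never claims a definite failure on all 1s; it only flags the ambiguous case and leaves the device-specific judgment to code that knows the hardware.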
On Sun, Aug 02, 2020 at 02:14:06PM -0500, Bjorn Helgaas wrote:
> But what guarantees that a PCI config register cannot contain ~0?
> If there's something about that in the spec I'd love to know where it
> is because it would simplify a lot of things.

There isn't. And we even have cases like the NVMe controller memory
buffer and persistent memory region, which are BARs that store arbitrary
values for later retrieval, so it can't.

(Now those features have a major issue with error detection, but that
is another issue.)
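A toy model of the NVMe case, assuming a BAR backed by plain storage that returns whatever was last written, and assuming a failed read completes with all 1s (the usual behaviour, not a guarantee). All names are hypothetical. It shows that a value-only check cannot tell a genuine 0xffffffff from a failed read.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical BAR backed by plain storage, in the spirit of an NVMe
 * controller memory buffer: a read returns whatever was last written. */
#define CMB_WORDS 16
static uint32_t cmb[CMB_WORDS];
static bool link_up = true;

static void cmb_write(size_t idx, uint32_t v)
{
	cmb[idx % CMB_WORDS] = v;
}

static uint32_t cmb_read(size_t idx)
{
	/* Assumption: a failed read completes with all 1s. */
	return link_up ? cmb[idx % CMB_WORDS] : UINT32_MAX;
}

/* Value-only heuristic, as proposed for config reads. */
static bool looks_failed(uint32_t v)
{
	return v == UINT32_MAX;
}
```

Writing 0xffffffff to the buffer and reading it back makes looks_failed() return true even though nothing failed, which is why a value check alone cannot be authoritative for such BARs.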