Message ID | 3256560.C0cZnIlnAv@wuerfel (mailing list archive) |
---|---|
State | New, archived |
Delegated to: | Bjorn Helgaas |
Headers | show |
On Wed, Oct 01, 2014 at 10:38:45AM +0100, Arnd Bergmann wrote: [...] > pci_mmap_page_range could either get generalized some more in an attempt > to have a __weak default implementation that works on ARM, or it could > be changed to lose the dependency on pci_sys_data instead. In either > case, the change would involve using the generic pci_host_bridge_window > list. On ARM pci_mmap_page_range requires pci_sys_data to retrieve its mem_offset parameter. I had a look, and I do not understand *why* it is required in that function, so I am asking. That function is basically used to map PCI resources to userspace, IIUC, through /proc or /sysfs file mappings. As far as I understand those mappings expect VMA pgoff to be the CPU address when files representing resources are mmapped from /proc and 0 when mmapped from /sys (I mean from userspace, then VMA pgoff should be updated by the kernel to map the resource). Question is: why pci_mmap_page_range() should apply an additional shift to the VMA pgoff based on pci_sys_data.mem_offset, which represents the offset from cpu->bus offset. I do not understand that. PowerPC does not seem to apply that fix-up (in PowerPC __pci_mmap_make_offset there is commented out code which prevents the pci_mem_offset shift to be applied). I think it all boils down to what the userspace interface is expecting when the memory areas are mmapped, if anyone has comments on this that is appreciated. Thanks, Lorenzo -- To unsubscribe from this list: send the line "unsubscribe linux-pci" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Tuesday 07 October 2014 13:06:59 Lorenzo Pieralisi wrote: > On Wed, Oct 01, 2014 at 10:38:45AM +0100, Arnd Bergmann wrote: > > [...] > > > pci_mmap_page_range could either get generalized some more in an attempt > > to have a __weak default implementation that works on ARM, or it could > > be changed to lose the dependency on pci_sys_data instead. In either > > case, the change would involve using the generic pci_host_bridge_window > > list. > > On ARM pci_mmap_page_range requires pci_sys_data to retrieve its > mem_offset parameter. I had a look, and I do not understand *why* > it is required in that function, so I am asking. That function > is basically used to map PCI resources to userspace, IIUC, through > /proc or /sysfs file mappings. As far as I understand those mappings > expect VMA pgoff to be the CPU address when files representing resources > are mmapped from /proc and 0 when mmapped from /sys (I mean from > userspace, then VMA pgoff should be updated by the kernel to map the > resource). Applying the mem_offset is certainly the more intuitive way, since that lets you read the PCI BAR values from a device and access the device with the appropriate offsets. > Question is: why pci_mmap_page_range() should apply an additional > shift to the VMA pgoff based on pci_sys_data.mem_offset, which represents > the offset from cpu->bus offset. I do not understand that. PowerPC > does not seem to apply that fix-up (in PowerPC __pci_mmap_make_offset there > is commented out code which prevents the pci_mem_offset shift to be > applied). I think it all boils down to what the userspace interface is > expecting when the memory areas are mmapped, if anyone has comments on > this that is appreciated. The important part is certainly that whatever transformation is done by pci_resource_to_user() gets undone by __pci_mmap_make_offset(). In case of PowerPC and Microblaze, the mem_offset handling is commented out in both, to work around X11 trying to use the same values on /dev/mem. However, they do have the respective fixup for io_offset. sparc applies the offset in both places for both io_offset and mem_offset. xtensa applies only io_offset in __pci_mmap_make_offset but neither in pci_resource_to_user. This probably works because the mem_offset is always zero there. mips applies a different fixup (for 36-bit addressing), but not the mem_offset. Every other architecture applies no offset here, neither in __pci_mmap_make_offset/pci_mmap_page_range nor in pci_resource_to_user The only hint I could find for how the ARM version came to be is from the historic kernel tree git log for linux-2.5.42, which added the current code as 2002/10/13 11:05:47+01:00 rmk [ARM] Update pcibios_enable_device, supply pci_mmap_page_range() Update pcibios_enable_device to only enable requested resources, mainly for IDE. Supply a pci_mmap_page_range() function to allow user space to mmap PCI regions. At that point, only two platforms had a nonzero mem_offset: footbridge/dc21285 and integrator/pci_v3. Both were using VGA, and presumably used this to make X work. (rmk might remember details). The code at the time matched what powerpc and sparc did, but then both implemented pci_resource_to_user() in order for libpciaccess to work correctly (bcea1db16b for sparc, 463ce0e103f for powerpc), and later powerpc changed it again to not apply the offset in pci_resource_to_user or pci_mmap_page_range in 396a1a5832ae. Arnd -- To unsubscribe from this list: send the line "unsubscribe linux-pci" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Tue, Oct 07, 2014 at 02:52:27PM +0100, Arnd Bergmann wrote: > On Tuesday 07 October 2014 13:06:59 Lorenzo Pieralisi wrote: > > On Wed, Oct 01, 2014 at 10:38:45AM +0100, Arnd Bergmann wrote: > > > > [...] > > > > > pci_mmap_page_range could either get generalized some more in an attempt > > > to have a __weak default implementation that works on ARM, or it could > > > be changed to lose the dependency on pci_sys_data instead. In either > > > case, the change would involve using the generic pci_host_bridge_window > > > list. > > > > On ARM pci_mmap_page_range requires pci_sys_data to retrieve its > > mem_offset parameter. I had a look, and I do not understand *why* > > it is required in that function, so I am asking. That function > > is basically used to map PCI resources to userspace, IIUC, through > > /proc or /sysfs file mappings. As far as I understand those mappings > > expect VMA pgoff to be the CPU address when files representing resources > > are mmapped from /proc and 0 when mmapped from /sys (I mean from > > userspace, then VMA pgoff should be updated by the kernel to map the > > resource). > > Applying the mem_offset is certainly the more intuitive way, since > that lets you read the PCI BAR values from a device and access the > device with the appropriate offsets. Ok, but I am referring to this snippet (drivers/pci/pci-sysfs.c): /* pci_mmap_page_range() expects the same kind of entry as coming * from /proc/bus/pci/ which is a "user visible" value. If this is * different from the resource itself, arch will do necessary fixup. */ pci_resource_to_user(pdev, i, res, &start, &end); --> Here start represents a CPU physical address, if pci_resource_to_user() does not fix it up, correct ? vma->vm_pgoff += start >> PAGE_SHIFT; [...] return pci_mmap_page_range(...); pci_mmap_page_range() applies (mem_offset >> PAGE_SHIFT) to pgoff in the ARM implemention. Is not there a mismatch here on platforms where mem_offset != 0 ? > > Question is: why pci_mmap_page_range() should apply an additional > > shift to the VMA pgoff based on pci_sys_data.mem_offset, which represents > > the offset from cpu->bus offset. I do not understand that. PowerPC > > does not seem to apply that fix-up (in PowerPC __pci_mmap_make_offset there > > is commented out code which prevents the pci_mem_offset shift to be > > applied). I think it all boils down to what the userspace interface is > > expecting when the memory areas are mmapped, if anyone has comments on > > this that is appreciated. > > The important part is certainly that whatever transformation is done > by pci_resource_to_user() gets undone by __pci_mmap_make_offset(). Exactly, it does not seem to be the case above, that's why I asked. > In case of PowerPC and Microblaze, the mem_offset handling is commented > out in both, to work around X11 trying to use the same values on > /dev/mem. However, they do have the respective fixup for io_offset. > > sparc applies the offset in both places for both io_offset and mem_offset. > xtensa applies only io_offset in __pci_mmap_make_offset but neither > in pci_resource_to_user. This probably works because the mem_offset is > always zero there. > mips applies a different fixup (for 36-bit addressing), but not the > mem_offset. > > Every other architecture applies no offset here, neither in __pci_mmap_make_offset/pci_mmap_page_range nor in pci_resource_to_user > > The only hint I could find for how the ARM version came to be is > from the historic kernel tree git log for linux-2.5.42, which added > the current code as > > 2002/10/13 11:05:47+01:00 rmk > [ARM] Update pcibios_enable_device, supply pci_mmap_page_range() > Update pcibios_enable_device to only enable requested resources, > mainly for IDE. Supply a pci_mmap_page_range() function to allow > user space to mmap PCI regions. > > At that point, only two platforms had a nonzero mem_offset: > footbridge/dc21285 and integrator/pci_v3. Both were using VGA, > and presumably used this to make X work. (rmk might remember > details). I think that, as I mentioned, it boils down to what the userspace interface (proc/sys and they seem to differ) is supposed to be passed from userspace processes upon mmap. > The code at the time matched what powerpc and sparc did, but then > both implemented pci_resource_to_user() in order for libpciaccess > to work correctly (bcea1db16b for sparc, 463ce0e103f for powerpc), > and later powerpc changed it again to not apply the offset in > pci_resource_to_user or pci_mmap_page_range in 396a1a5832ae. I will keep investigating, thank you for your help, any further comments appreciated. Lorenzo -- To unsubscribe from this list: send the line "unsubscribe linux-pci" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Tuesday 07 October 2014 15:47:50 Lorenzo Pieralisi wrote: > On Tue, Oct 07, 2014 at 02:52:27PM +0100, Arnd Bergmann wrote: > > On Tuesday 07 October 2014 13:06:59 Lorenzo Pieralisi wrote: > > > On Wed, Oct 01, 2014 at 10:38:45AM +0100, Arnd Bergmann wrote: > > > > > > [...] > > > > > > > pci_mmap_page_range could either get generalized some more in an attempt > > > > to have a __weak default implementation that works on ARM, or it could > > > > be changed to lose the dependency on pci_sys_data instead. In either > > > > case, the change would involve using the generic pci_host_bridge_window > > > > list. > > > > > > On ARM pci_mmap_page_range requires pci_sys_data to retrieve its > > > mem_offset parameter. I had a look, and I do not understand *why* > > > it is required in that function, so I am asking. That function > > > is basically used to map PCI resources to userspace, IIUC, through > > > /proc or /sysfs file mappings. As far as I understand those mappings > > > expect VMA pgoff to be the CPU address when files representing resources > > > are mmapped from /proc and 0 when mmapped from /sys (I mean from > > > userspace, then VMA pgoff should be updated by the kernel to map the > > > resource). > > > > Applying the mem_offset is certainly the more intuitive way, since > > that lets you read the PCI BAR values from a device and access the > > device with the appropriate offsets. > > Ok, but I am referring to this snippet (drivers/pci/pci-sysfs.c): > > /* pci_mmap_page_range() expects the same kind of entry as coming > * from /proc/bus/pci/ which is a "user visible" value. If this is > * different from the resource itself, arch will do necessary fixup. > */ > pci_resource_to_user(pdev, i, res, &start, &end); > > --> Here start represents a CPU physical address, if pci_resource_to_user() > does not fix it up, correct ? > > vma->vm_pgoff += start >> PAGE_SHIFT; > > [...] > > return pci_mmap_page_range(...); > > pci_mmap_page_range() applies (mem_offset >> PAGE_SHIFT) to pgoff in the > ARM implemention. > > Is not there a mismatch here on platforms where mem_offset != 0 ? Yes, I think that's right: ARM never gained its own pci_resource_to_user() implementation, presumably because nobody ran into this problem and debugged it all the way. Arnd -- To unsubscribe from this list: send the line "unsubscribe linux-pci" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Tue, Oct 07, 2014 at 10:39:47PM +0100, Arnd Bergmann wrote: > On Tuesday 07 October 2014 15:47:50 Lorenzo Pieralisi wrote: > > On Tue, Oct 07, 2014 at 02:52:27PM +0100, Arnd Bergmann wrote: > > > On Tuesday 07 October 2014 13:06:59 Lorenzo Pieralisi wrote: > > > > On Wed, Oct 01, 2014 at 10:38:45AM +0100, Arnd Bergmann wrote: > > > > > > > > [...] > > > > > > > > > pci_mmap_page_range could either get generalized some more in an attempt > > > > > to have a __weak default implementation that works on ARM, or it could > > > > > be changed to lose the dependency on pci_sys_data instead. In either > > > > > case, the change would involve using the generic pci_host_bridge_window > > > > > list. > > > > > > > > On ARM pci_mmap_page_range requires pci_sys_data to retrieve its > > > > mem_offset parameter. I had a look, and I do not understand *why* > > > > it is required in that function, so I am asking. That function > > > > is basically used to map PCI resources to userspace, IIUC, through > > > > /proc or /sysfs file mappings. As far as I understand those mappings > > > > expect VMA pgoff to be the CPU address when files representing resources > > > > are mmapped from /proc and 0 when mmapped from /sys (I mean from > > > > userspace, then VMA pgoff should be updated by the kernel to map the > > > > resource). > > > > > > Applying the mem_offset is certainly the more intuitive way, since > > > that lets you read the PCI BAR values from a device and access the > > > device with the appropriate offsets. > > > > Ok, but I am referring to this snippet (drivers/pci/pci-sysfs.c): > > > > /* pci_mmap_page_range() expects the same kind of entry as coming > > * from /proc/bus/pci/ which is a "user visible" value. If this is > > * different from the resource itself, arch will do necessary fixup. > > */ > > pci_resource_to_user(pdev, i, res, &start, &end); > > > > --> Here start represents a CPU physical address, if pci_resource_to_user() > > does not fix it up, correct ? > > > > vma->vm_pgoff += start >> PAGE_SHIFT; > > > > [...] > > > > return pci_mmap_page_range(...); > > > > pci_mmap_page_range() applies (mem_offset >> PAGE_SHIFT) to pgoff in the > > ARM implemention. > > > > Is not there a mismatch here on platforms where mem_offset != 0 ? > > Yes, I think that's right: ARM never gained its own pci_resource_to_user() > implementation, presumably because nobody ran into this problem and > debugged it all the way. Ok. So, unless I am missing something, on platform with mem_offset != 0 /proc and /sys interfaces for remapping PCI resources can't work (IIUC the proc interface expects the user to pass in the resource address as seen from /proc/bus/pci/devices - which are not BAR values. Even if the user passed the BAR value to mmap, pci_mmap_fits() in proc_bus_pci_mmap() would fail since it compares the pgoff to resource values, which are not BAR values). As things stand I think we can safely remove the mem_offset (and pci_sys_data dependency) from pci_mmap_page_range(). I do not think we can break userspace in any way, basically because it can't work at the moment, again, happy to be corrected if I am wrong, please shout. Or we can add mem_offset to the host bridge (after all architectures like PowerPC and Microblaze have a pci_mem_offset variable in their host controllers), but still, this removes pci_sys_data dependency but does not solve the pci_mmap_page_range() issue. Lorenzo -- To unsubscribe from this list: send the line "unsubscribe linux-pci" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Wednesday 08 October 2014 11:19:43 Lorenzo Pieralisi wrote: > > Ok. So, unless I am missing something, on platform with mem_offset != 0 > /proc and /sys interfaces for remapping PCI resources can't work (IIUC > the proc interface expects the user to pass in the resource address as > seen from /proc/bus/pci/devices - which are not BAR values. Even if the > user passed the BAR value to mmap, pci_mmap_fits() in proc_bus_pci_mmap() > would fail since it compares the pgoff to resource values, which are not > BAR values). I think you are right for the sysfs interface, that one can't possibly work because of the incorrect address computation. For the /procfs interface, I think it can work as long as the offsets used there are coming from the config space dump in /proc/bus/pci/* rather than from the /sys/bus/pci/devices/*/resource file. > As things stand I think we can safely remove the mem_offset (and > pci_sys_data dependency) from pci_mmap_page_range(). I do not think > we can break userspace in any way, basically because it can't work at > the moment, again, happy to be corrected if I am wrong, please shout. Please look at the procfs interface again. That one can be defined in two ways (either like sparc and arm, or like powerpc and microblaze) but either one should be able to work with user space that expects that interface and break with user space that expects the other one. > Or we can add mem_offset to the host bridge (after all architectures like > PowerPC and Microblaze have a pci_mem_offset variable in their host > controllers), but still, this removes pci_sys_data dependency but does > not solve the pci_mmap_page_range() issue. The host bridge already stores the mem_offset in terms of the resource list, so we could readily use that, except that it might break the powerpc hack if that is still in use. Arnd -- To unsubscribe from this list: send the line "unsubscribe linux-pci" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Wed, Oct 08, 2014 at 03:47:43PM +0100, Arnd Bergmann wrote: > On Wednesday 08 October 2014 11:19:43 Lorenzo Pieralisi wrote: > > > > Ok. So, unless I am missing something, on platform with mem_offset != 0 > > /proc and /sys interfaces for remapping PCI resources can't work (IIUC > > the proc interface expects the user to pass in the resource address as > > seen from /proc/bus/pci/devices - which are not BAR values. Even if the > > user passed the BAR value to mmap, pci_mmap_fits() in proc_bus_pci_mmap() > > would fail since it compares the pgoff to resource values, which are not > > BAR values). > > I think you are right for the sysfs interface, that one can't possibly > work because of the incorrect address computation. > > For the /procfs interface, I think it can work as long as the offsets > used there are coming from the config space dump in /proc/bus/pci/* > rather than from the /sys/bus/pci/devices/*/resource file. > > > As things stand I think we can safely remove the mem_offset (and > > pci_sys_data dependency) from pci_mmap_page_range(). I do not think > > we can break userspace in any way, basically because it can't work at > > the moment, again, happy to be corrected if I am wrong, please shout. > > Please look at the procfs interface again. That one can be defined > in two ways (either like sparc and arm, or like powerpc and microblaze) > but either one should be able to work with user space that expects > that interface and break with user space that expects the other one. I agree as long as pci_mmap_page_range() is concerned, but I am referring to the pci_mmap_fits() implementation here: start = vma->vm_pgoff; size = ((pci_resource_len(pdev, resno) - 1) >> PAGE_SHIFT) + 1; pci_start = (mmap_api == PCI_MMAP_PROCFS) ? pci_resource_start(pdev, resno) >> PAGE_SHIFT : 0; if (start >= pci_start && start < pci_start + size && start + nr <= pci_start + size) return 1; return 0; pci_mmap_fits(), when mapping from procfs always check the offset against resources, which are fixed-up addresses. If we passed the values dumped from the device config space (as pci_mmap_page_range() expects on arm) IMHO the check above would fail (always referring to platforms where mem_offset != 0). Last changes where introduced by commit 8c05cd08a, whose commit log adds to my confusion: "[...] I think what we want here is for pci_start to be 0 when mmap_api == PCI_MMAP_PROCFS.[...]" But that's not what the code does. I will try to grab an integrator board to give it a go. > > Or we can add mem_offset to the host bridge (after all architectures like > > PowerPC and Microblaze have a pci_mem_offset variable in their host > > controllers), but still, this removes pci_sys_data dependency but does > > not solve the pci_mmap_page_range() issue. > > The host bridge already stores the mem_offset in terms of the resource > list, so we could readily use that, except that it might break the > powerpc hack if that is still in use. Well, yes, I am not saying it can't be done by using the resources list, I am just trying to understand if that's really useful. Thank you ! Lorenzo -- To unsubscribe from this list: send the line "unsubscribe linux-pci" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Thursday 09 October 2014 10:04:20 Lorenzo Pieralisi wrote: > On Wed, Oct 08, 2014 at 03:47:43PM +0100, Arnd Bergmann wrote: > > On Wednesday 08 October 2014 11:19:43 Lorenzo Pieralisi wrote: > > Please look at the procfs interface again. That one can be defined > > in two ways (either like sparc and arm, or like powerpc and microblaze) > > but either one should be able to work with user space that expects > > that interface and break with user space that expects the other one. > > I agree as long as pci_mmap_page_range() is concerned, but I am > referring to the pci_mmap_fits() implementation here: > > start = vma->vm_pgoff; > size = ((pci_resource_len(pdev, resno) - 1) >> PAGE_SHIFT) + 1; > pci_start = (mmap_api == PCI_MMAP_PROCFS) ? > pci_resource_start(pdev, resno) >> PAGE_SHIFT : 0; > if (start >= pci_start && start < pci_start + size && > start + nr <= pci_start + size) > return 1; > return 0; > > pci_mmap_fits(), when mapping from procfs always check the offset against > resources, which are fixed-up addresses. If we passed the values dumped > from the device config space (as pci_mmap_page_range() expects on arm) IMHO > the check above would fail (always referring to platforms where > mem_offset != 0). Ah, I see it now too. > Last changes where introduced by commit 8c05cd08a, whose commit log adds > to my confusion: > > "[...] I think what we want here is for pci_start to be 0 when mmap_api == > PCI_MMAP_PROCFS.[...]" > > But that's not what the code does. My best guess is that this is a typo and that Darrick meant PCI_MMAP_SYSFS in the changelog, which is the same thing that the code does. It's also the sensible thing to do. This probably means that the procfs interface is now also broken on sparc. > I will try to grab an integrator board to give it a go. Ok, good idea. > > > Or we can add mem_offset to the host bridge (after all architectures like > > > PowerPC and Microblaze have a pci_mem_offset variable in their host > > > controllers), but still, this removes pci_sys_data dependency but does > > > not solve the pci_mmap_page_range() issue. > > > > The host bridge already stores the mem_offset in terms of the resource > > list, so we could readily use that, except that it might break the > > powerpc hack if that is still in use. > > Well, yes, I am not saying it can't be done by using the resources list, > I am just trying to understand if that's really useful. The PCI core tries to be ready for PCI host bridges that have multiple discontiguous memory spaces with different offsets, although I don't know of anybody has that. However if we decide to implement a generic pci_mmap_page_range that tries to take the offset into account, we should use the resource list in the host bridge because it can tell us the correct offsets. However, given what you found, the procfs interface being broken since 2010 on both architectures (arm32 and sparc) that try to honor the offset, we should probably go back to your previous suggestion of removing the offset handling, which would make it possible to use the procfs interface and the sysfs interface on all architectures. Would you be able to prepare a patch that does this and circulate that with the sparc, powerpc and microblaze maintainers as well as Darrick and Martin who were involved with the pci_mmap_fits change? Arnd -- To unsubscribe from this list: send the line "unsubscribe linux-pci" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Thu, Oct 09, 2014 at 11:51:43AM +0100, Arnd Bergmann wrote: [...] > > Last changes where introduced by commit 8c05cd08a, whose commit log adds > > to my confusion: > > > > "[...] I think what we want here is for pci_start to be 0 when mmap_api == > > PCI_MMAP_PROCFS.[...]" > > > > But that's not what the code does. > > My best guess is that this is a typo and that Darrick meant PCI_MMAP_SYSFS > in the changelog, which is the same thing that the code does. It's also > the sensible thing to do. > > This probably means that the procfs interface is now also broken on > sparc. > > > I will try to grab an integrator board to give it a go. > > Ok, good idea. Grabbed, tested it, my theory was correct, I can't map PCI resources to userspace. Actually, if I pass resource offset as a fixed-up address, mmap succeeds through proc, but it does not mmap the resource, it maps the resource + mem_offset that happens to be RAM :D for the PCI slot I am using. I am testing on an oldish (3.16) kernel since I am having trouble with mainline PCI and my network adapter on integrator, but I do not see why this is a problem, this bug has been there forever. By removing mem_offset from pci_mmap_page_range() everything works fine, both proc and sys mappings are ok. > > > > Or we can add mem_offset to the host bridge (after all architectures like > > > > PowerPC and Microblaze have a pci_mem_offset variable in their host > > > > controllers), but still, this removes pci_sys_data dependency but does > > > > not solve the pci_mmap_page_range() issue. > > > > > > The host bridge already stores the mem_offset in terms of the resource > > > list, so we could readily use that, except that it might break the > > > powerpc hack if that is still in use. > > > > Well, yes, I am not saying it can't be done by using the resources list, > > I am just trying to understand if that's really useful. > > The PCI core tries to be ready for PCI host bridges that have multiple > discontiguous memory spaces with different offsets, although I don't know > of anybody has that. However if we decide to implement a generic > pci_mmap_page_range that tries to take the offset into account, we should > use the resource list in the host bridge because it can tell us the correct > offsets. > > However, given what you found, the procfs interface being broken since > 2010 on both architectures (arm32 and sparc) that try to honor the offset, > we should probably go back to your previous suggestion of removing > the offset handling, which would make it possible to use the procfs > interface and the sysfs interface on all architectures. > > Would you be able to prepare a patch that does this and circulate that > with the sparc, powerpc and microblaze maintainers as well as Darrick > and Martin who were involved with the pci_mmap_fits change? Yes, but let's step back a second. I think that the proc interface should expect an offset as passed to the user in /proc/bus/pci/devices, and there the resource is exposed through pci_resource_to_user(). Hence, the pci_mmap_fits() should check the offset against the resource filtered through pci_resource_to_user(), job done, patch is trivial, and does what pci_resource_to_user() was meant for IMHO. Then we have to decide what to do with arm32 code: 1) we remove mem_offset from pci_mmap_page_range() (and rely on default pci_resource_to_user()) or 2) we provide pci_resource_to_user() for arm32 which does the CPU->bus conversion for us (and leave mem_offset as-is in pci_mmap_range()) Thoughts ? Thanks, Lorenzo -- To unsubscribe from this list: send the line "unsubscribe linux-pci" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Friday 10 October 2014 14:58:04 Lorenzo Pieralisi wrote: > On Thu, Oct 09, 2014 at 11:51:43AM +0100, Arnd Bergmann wrote: > > > > Last changes where introduced by commit 8c05cd08a, whose commit log adds > > > to my confusion: > > > > > > "[...] I think what we want here is for pci_start to be 0 when mmap_api == > > > PCI_MMAP_PROCFS.[...]" > > > > > > But that's not what the code does. > > > > My best guess is that this is a typo and that Darrick meant PCI_MMAP_SYSFS > > in the changelog, which is the same thing that the code does. It's also > > the sensible thing to do. > > > > This probably means that the procfs interface is now also broken on > > sparc. > > > > > I will try to grab an integrator board to give it a go. > > > > Ok, good idea. > > Grabbed, tested it, my theory was correct, I can't map PCI resources > to userspace. Actually, if I pass resource offset as a fixed-up address, mmap > succeeds through proc, but it does not mmap the resource, it maps > the resource + mem_offset that happens to be RAM :D for the PCI slot I am > using. > > I am testing on an oldish (3.16) kernel since I am having trouble with > mainline PCI and my network adapter on integrator, but I do not see why this > is a problem, this bug has been there forever. I would guess that almost the only users of the sysfs and procfs interfaces are Xorg drivers, you certainly don't need it to get a network adapter working. > By removing mem_offset from pci_mmap_page_range() everything works fine, > both proc and sys mappings are ok. Ok, thanks for confirming! > > However, given what you found, the procfs interface being broken since > > 2010 on both architectures (arm32 and sparc) that try to honor the offset, > > we should probably go back to your previous suggestion of removing > > the offset handling, which would make it possible to use the procfs > > interface and the sysfs interface on all architectures. > > > > Would you be able to prepare a patch that does this and circulate that > > with the sparc, powerpc and microblaze maintainers as well as Darrick > > and Martin who were involved with the pci_mmap_fits change? > > Yes, but let's step back a second. I think that the proc interface > should expect an offset as passed to the user in /proc/bus/pci/devices, > and there the resource is exposed through pci_resource_to_user(). > > Hence, the pci_mmap_fits() should check the offset against the > resource filtered through pci_resource_to_user(), job done, patch > is trivial, and does what pci_resource_to_user() was meant for IMHO. My point was that there is no reason why sparc and powerpc should do this differently. At the moment they do and sparc is broken as you proved. We can either fix sparc to restore the old behavior by adding pci_resource_to_user to pci_mmap_fits, or by making it do what powerpc does, essentially removing the memory space handling from pci_resource_to_user. Whatever we do for sparc is probably what we need to do on ARM as well, except that ARM has been broken for a longer time than sparc. > Then we have to decide what to do with arm32 code: > > 1) we remove mem_offset from pci_mmap_page_range() (and rely on default > pci_resource_to_user()) > > or > > 2) we provide pci_resource_to_user() for arm32 which does the CPU->bus > conversion for us (and leave mem_offset as-is in pci_mmap_range()) I'd vote for 1) to get it in line with the only working architectures that currently use a nonzero offset, but Russell needs to have the final word on this, and I still think we have to involve the sparc and powerpc maintainers as well, hoping to find a common solution for everybody. Making a user space interface behave differently based on the CPU architecture is a bad idea. Arnd -- To unsubscribe from this list: send the line "unsubscribe linux-pci" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Fri, Oct 10, 2014 at 07:31:26PM +0100, Arnd Bergmann wrote: > On Friday 10 October 2014 14:58:04 Lorenzo Pieralisi wrote: > > On Thu, Oct 09, 2014 at 11:51:43AM +0100, Arnd Bergmann wrote: > > > > > > Last changes where introduced by commit 8c05cd08a, whose commit log adds > > > > to my confusion: > > > > > > > > "[...] I think what we want here is for pci_start to be 0 when mmap_api == > > > > PCI_MMAP_PROCFS.[...]" > > > > > > > > But that's not what the code does. > > > > > > My best guess is that this is a typo and that Darrick meant PCI_MMAP_SYSFS > > > in the changelog, which is the same thing that the code does. It's also > > > the sensible thing to do. > > > > > > This probably means that the procfs interface is now also broken on > > > sparc. > > > > > > > I will try to grab an integrator board to give it a go. > > > > > > Ok, good idea. > > > > Grabbed, tested it, my theory was correct, I can't map PCI resources > > to userspace. Actually, if I pass resource offset as a fixed-up address, mmap > > succeeds through proc, but it does not mmap the resource, it maps > > the resource + mem_offset that happens to be RAM :D for the PCI slot I am > > using. > > > > I am testing on an oldish (3.16) kernel since I am having trouble with > > mainline PCI and my network adapter on integrator, but I do not see why this > > is a problem, this bug has been there forever. > > I would guess that almost the only users of the sysfs and procfs > interfaces are Xorg drivers, you certainly don't need it to get > a network adapter working. The issue I am facing is not related to the PCI mmap implementation, that's certainly broken but does not stop me from using the board. [...] > > By removing mem_offset from pci_mmap_page_range() everything works fine, > > both proc and sys mappings are ok. > > Ok, thanks for confirming! > > > > However, given what you found, the procfs interface being broken since > > > 2010 on both architectures (arm32 and sparc) that try to honor the offset, > > > we should probably go back to your previous suggestion of removing > > > the offset handling, which would make it possible to use the procfs > > > interface and the sysfs interface on all architectures. > > > > > > Would you be able to prepare a patch that does this and circulate that > > > with the sparc, powerpc and microblaze maintainers as well as Darrick > > > and Martin who were involved with the pci_mmap_fits change? > > > > Yes, but let's step back a second. I think that the proc interface > > should expect an offset as passed to the user in /proc/bus/pci/devices, > > and there the resource is exposed through pci_resource_to_user(). > > > > Hence, the pci_mmap_fits() should check the offset against the > > resource filtered through pci_resource_to_user(), job done, patch > > is trivial, and does what pci_resource_to_user() was meant for IMHO. > > My point was that there is no reason why sparc and powerpc should > do this differently. At the moment they do and sparc is broken > as you proved. We can either fix sparc to restore the old behavior > by adding pci_resource_to_user to pci_mmap_fits, or by making it > do what powerpc does, essentially removing the memory space handling > from pci_resource_to_user. > > Whatever we do for sparc is probably what we need to do on ARM as well, > except that ARM has been broken for a longer time than sparc. > > > Then we have to decide what to do with arm32 code: > > > > 1) we remove mem_offset from pci_mmap_page_range() (and rely on default > > pci_resource_to_user()) > > > > or > > > > 2) we provide pci_resource_to_user() for arm32 which does the CPU->bus > > conversion for us (and leave mem_offset as-is in pci_mmap_range()) > > I'd vote for 1) to get it in line with the only working architectures > that currently use a nonzero offset, but Russell needs to have the final > word on this, and I still think we have to involve the sparc and powerpc > maintainers as well, hoping to find a common solution for everybody. > > Making a user space interface behave differently based on the CPU > architecture is a bad idea. I agree with you, I will put together a patchset and copy all people who should be involved. Lorenzo -- To unsubscribe from this list: send the line "unsubscribe linux-pci" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Wed, Oct 01, 2014 at 10:38:45AM +0100, Arnd Bergmann wrote: [...] > The arm32 implementations of pci_domain_nr/pci_proc_domain can probably be > removed if we change the arm32 pcibios_init_hw function to call the new > interfaces that set the domain number. I wished, but it is a bit more complicated than I thought unfortunately, mostly because some drivers, eg cns3xxx set the domain numbers statically in pci_sys_data and this sets a chain of dependency that is not easy to untangle. I think cns3xxx is the only legacy driver that "uses" the domain number (in pci_sys_data) in a way that clashes with the generic domain_nr implementation, I need to give it more thought. > pci_mmap_page_range could either get generalized some more in an attempt > to have a __weak default implementation that works on ARM, or it could > be changed to lose the dependency on pci_sys_data instead. In either > case, the change would involve using the generic pci_host_bridge_window > list. I need to repost my series, but I *think* we can consider the dependency on pci_sys_data gone in pci_mmap_page_range(). > pcibios_align_resource should probably be per host, and we could move > that into a pointer in pci_host_bridge, something like this: Yes, and that's likely to be true for add_bus too. I wonder what's the best course of action. Putting together all the bits and pieces required to remove PCI bios dependency from this patch can take a while, I wonder whether we should aim for merging this driver (rebased on top of my port to the new parse ranges API) with the ARM/ARM64 ifdeffery and clean it up later or aim for the whole thing at once, I am just worried it can take us a while. Lorenzo > > diff --git a/drivers/pci/setup-res.c b/drivers/pci/setup-res.c > index b7c3a5ea1fca..d9cb6c916d54 100644 > --- a/drivers/pci/setup-res.c > +++ b/drivers/pci/setup-res.c > @@ -200,11 +200,15 @@ static int pci_revert_fw_address(struct resource *res, struct pci_dev *dev, > static int __pci_assign_resource(struct pci_bus *bus, struct pci_dev *dev, > int resno, resource_size_t size, resource_size_t align) > { > + struct pci_host_bridge *host = find_pci_host_bridge(bus); > + resource_size_t (*alignf)(void *, const struct resource *, > + resource_size_t, resource_size_t), > struct resource *res = dev->resource + resno; > resource_size_t min; > int ret; > > min = (res->flags & IORESOURCE_IO) ? PCIBIOS_MIN_IO : PCIBIOS_MIN_MEM; > + alignf = host->align_resource ?: pcibios_align_resource; > > /* > * First, try exact prefetching match. Even if a 64-bit > @@ -215,7 +219,7 @@ static int __pci_assign_resource(struct pci_bus *bus, struct pci_dev *dev, > */ > ret = pci_bus_alloc_resource(bus, res, size, align, min, > IORESOURCE_PREFETCH | IORESOURCE_MEM_64, > - pcibios_align_resource, dev); > + alignf, dev); > if (ret == 0) > return 0; > > @@ -227,7 +231,7 @@ static int __pci_assign_resource(struct pci_bus *bus, struct pci_dev *dev, > (IORESOURCE_PREFETCH | IORESOURCE_MEM_64)) { > ret = pci_bus_alloc_resource(bus, res, size, align, min, > IORESOURCE_PREFETCH, > - pcibios_align_resource, dev); > + alignf, dev); > if (ret == 0) > return 0; > } > @@ -240,7 +244,7 @@ static int __pci_assign_resource(struct pci_bus *bus, struct pci_dev *dev, > */ > if (res->flags & (IORESOURCE_PREFETCH | IORESOURCE_MEM_64)) > ret = pci_bus_alloc_resource(bus, res, size, align, min, 0, > - pcibios_align_resource, dev); > + alignf, dev); > > return ret; > } > > > If we decide constantly calling find_pci_host_bridge() is too expensive, we can > be more clever about it. > > Arnd > > -- To unsubscribe from this list: send the line "unsubscribe linux-pci" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Wed, Oct 22, 2014 at 9:59 AM, Lorenzo Pieralisi <lorenzo.pieralisi@arm.com> wrote: > ... I wonder what's the > best course of action. Putting together all the bits and pieces required > to remove PCI bios dependency from this patch can take a while, I wonder > whether we should aim for merging this driver (rebased on top of my port to the > new parse ranges API) with the ARM/ARM64 ifdeffery and clean it up later > or aim for the whole thing at once, I am just worried it can take us a while. I haven't looked at your patches, but "the whole thing at once" is never *my* goal. A gradual cleanup is just fine with me. Bjorn -- To unsubscribe from this list: send the line "unsubscribe linux-pci" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
diff --git a/drivers/pci/setup-res.c b/drivers/pci/setup-res.c index b7c3a5ea1fca..d9cb6c916d54 100644 --- a/drivers/pci/setup-res.c +++ b/drivers/pci/setup-res.c @@ -200,11 +200,15 @@ static int pci_revert_fw_address(struct resource *res, struct pci_dev *dev, static int __pci_assign_resource(struct pci_bus *bus, struct pci_dev *dev, int resno, resource_size_t size, resource_size_t align) { + struct pci_host_bridge *host = find_pci_host_bridge(bus); + resource_size_t (*alignf)(void *, const struct resource *, + resource_size_t, resource_size_t), struct resource *res = dev->resource + resno; resource_size_t min; int ret; min = (res->flags & IORESOURCE_IO) ? PCIBIOS_MIN_IO : PCIBIOS_MIN_MEM; + alignf = host->align_resource ?: pcibios_align_resource; /* * First, try exact prefetching match. Even if a 64-bit @@ -215,7 +219,7 @@ static int __pci_assign_resource(struct pci_bus *bus, struct pci_dev *dev, */ ret = pci_bus_alloc_resource(bus, res, size, align, min, IORESOURCE_PREFETCH | IORESOURCE_MEM_64, - pcibios_align_resource, dev); + alignf, dev); if (ret == 0) return 0; @@ -227,7 +231,7 @@ static int __pci_assign_resource(struct pci_bus *bus, struct pci_dev *dev, (IORESOURCE_PREFETCH | IORESOURCE_MEM_64)) { ret = pci_bus_alloc_resource(bus, res, size, align, min, IORESOURCE_PREFETCH, - pcibios_align_resource, dev); + alignf, dev); if (ret == 0) return 0; } @@ -240,7 +244,7 @@ static int __pci_assign_resource(struct pci_bus *bus, struct pci_dev *dev, */ if (res->flags & (IORESOURCE_PREFETCH | IORESOURCE_MEM_64)) ret = pci_bus_alloc_resource(bus, res, size, align, min, 0, - pcibios_align_resource, dev); + alignf, dev); return ret; }