diff mbox series

[v3] PCI: Align pci memory space base address with page size

Message ID 20230606084539.1694441-1-maobibo@loongson.cn (mailing list archive)
State Superseded
Delegated to: Bjorn Helgaas
Headers show
Series [v3] PCI: Align pci memory space base address with page size | expand

Commit Message

bibo mao June 6, 2023, 8:45 a.m. UTC
Some PCI devices have only 4K memory space size, it is normal in general
machines and aligned with page size. However some architectures which
support different page size, default page size on LoongArch is 16K, and
ARM64 supports page size varying from 4K to 64K. On machines where larger
page size is use, memory space region of two different pci devices may be
in one page. It is not safe with mmu protection, also VFIO pci device
driver requires base address of pci memory space page aligned, so that it
can be memory mapped to qemu user space when it is passed-through to vm.

It consumes more pci memory resource with page size alignment requirement,
it should not be a problem on 64 bit system.

Signed-off-by: Bibo Mao <maobibo@loongson.cn>
---
 drivers/pci/setup-res.c | 8 ++++++++
 1 file changed, 8 insertions(+)

Comments

bibo mao June 6, 2023, 10:06 a.m. UTC | #1
Huacai,

Although I post this patch, I think this should be arch specified rather than general problem.
X86 has solved this problem, arm64 with 64K page size is not popular. However LoongArch
has this problem, page size is 16K rather than 4K. It is the problem of LoongArch system
rather than generic issue.

There is such discussion before:
https://patchwork.kernel.org/project/linux-pci/patch/22400b8828ad44ddbccb876cc5ca3b11@FE-MBX1012.de.bosch.com/#19319457

UEFI bios sets pci memory space 4K aligned, however Loongarch kernel rescans the pci
bus and reassigns pci memory resource. So it it strange like this, here is pci memory info on
 my 7A2000 board.
root@user-pc:~# lspci -vvv | grep Region
        Region 5: Memory at e003526e800 (32-bit, non-prefetchable) [size=1K]
        Region 0: Memory at e003526ec00 (64-bit, non-prefetchable) [size=1K]

Regards
Bibo, Mao

在 2023/6/6 16:45, Bibo Mao 写道:
> Some PCI devices have only 4K memory space size, it is normal in general
> machines and aligned with page size. However some architectures which
> support different page size, default page size on LoongArch is 16K, and
> ARM64 supports page size varying from 4K to 64K. On machines where larger
> page size is use, memory space region of two different pci devices may be
> in one page. It is not safe with mmu protection, also VFIO pci device
> driver requires base address of pci memory space page aligned, so that it
> can be memory mapped to qemu user space when it is passed-through to vm.
> 
> It consumes more pci memory resource with page size alignment requirement,
> it should not be a problem on 64 bit system.
> 
> Signed-off-by: Bibo Mao <maobibo@loongson.cn>
> ---
>  drivers/pci/setup-res.c | 8 ++++++++
>  1 file changed, 8 insertions(+)
> 
> diff --git a/drivers/pci/setup-res.c b/drivers/pci/setup-res.c
> index 967f9a758923..55440ae0128d 100644
> --- a/drivers/pci/setup-res.c
> +++ b/drivers/pci/setup-res.c
> @@ -339,6 +339,14 @@ int pci_assign_resource(struct pci_dev *dev, int resno)
>  		return -EINVAL;
>  	}
>  
> +#ifdef CONFIG_64BIT
> +	/*
> +	 * force minimum page alignment for vfio pci usage
> +	 * supposing there is enough pci memory resource on 64bit system
> +	 */
> +	if (res->flags & IORESOURCE_MEM)
> +		align = max_t(resource_size_t, PAGE_SIZE, align);
> +#endif
>  	size = resource_size(res);
>  	ret = _pci_assign_resource(dev, resno, size, align);
>
Bjorn Helgaas June 6, 2023, 8:45 p.m. UTC | #2
On Tue, Jun 06, 2023 at 06:06:04PM +0800, bibo, mao wrote:
> Huacai,
> 
> Although I post this patch, I think this should be arch specified
> rather than general problem.  X86 has solved this problem, arm64
> with 64K page size is not popular. However LoongArch has this
> problem, page size is 16K rather than 4K. It is the problem of
> LoongArch system rather than generic issue.

I think this is only related to the page size, not to any
LoongArch-specific details.  If that's the case, I don't think the
change should be arch-specific.

> There is such discussion before:
> https://patchwork.kernel.org/project/linux-pci/patch/22400b8828ad44ddbccb876cc5ca3b11@FE-MBX1012.de.bosch.com/#19319457
> 
> UEFI bios sets pci memory space 4K aligned, however Loongarch kernel rescans the pci
> bus and reassigns pci memory resource. So it it strange like this, here is pci memory info on
>  my 7A2000 board.
> root@user-pc:~# lspci -vvv | grep Region
>         Region 5: Memory at e003526e800 (32-bit, non-prefetchable) [size=1K]
>         Region 0: Memory at e003526ec00 (64-bit, non-prefetchable) [size=1K]

I guess these BARs are from different devices, and the problem you're
pointing out is that both BARs end up in the same 16K page (actually
even the same 4K page):

  (gdb) p/x 0xe003526e800 >> 12
  $1 = 0xe003526e
  (gdb) p/x 0xe003526ec00 >> 12
  $2 = 0xe003526e

I agree that's a potential issue because a user mmap of the first
device also gets access to the BAR of the second device.  But it
doesn't seem to be a *new* or LoongArch-specific issue.

So I *think* the point of this patch is to ensure that BARs of
different devices never share a page.  Why is that suddenly an issue
for LoongArch?  Is it only because you see more sharing with 16K pages
than other arches do with 4K pages?

> 在 2023/6/6 16:45, Bibo Mao 写道:
> > Some PCI devices have only 4K memory space size, it is normal in general
> > machines and aligned with page size. However some architectures which
> > support different page size, default page size on LoongArch is 16K, and
> > ARM64 supports page size varying from 4K to 64K. On machines where larger
> > page size is use, memory space region of two different pci devices may be
> > in one page. It is not safe with mmu protection, also VFIO pci device
> > driver requires base address of pci memory space page aligned, so that it
> > can be memory mapped to qemu user space when it is passed-through to vm.

The minimum memory BAR per spec is 128 bytes (not 4K).  You just
demonstrated that even with 4K pages, different devices can share a
page.

> > It consumes more pci memory resource with page size alignment requirement,
> > it should not be a problem on 64 bit system.
> > 
> > Signed-off-by: Bibo Mao <maobibo@loongson.cn>
> > ---
> >  drivers/pci/setup-res.c | 8 ++++++++
> >  1 file changed, 8 insertions(+)
> > 
> > diff --git a/drivers/pci/setup-res.c b/drivers/pci/setup-res.c
> > index 967f9a758923..55440ae0128d 100644
> > --- a/drivers/pci/setup-res.c
> > +++ b/drivers/pci/setup-res.c
> > @@ -339,6 +339,14 @@ int pci_assign_resource(struct pci_dev *dev, int resno)
> >  		return -EINVAL;
> >  	}
> >  
> > +#ifdef CONFIG_64BIT
> > +	/*
> > +	 * force minimum page alignment for vfio pci usage
> > +	 * supposing there is enough pci memory resource on 64bit system
> > +	 */
> > +	if (res->flags & IORESOURCE_MEM)
> > +		align = max_t(resource_size_t, PAGE_SIZE, align);
> > +#endif

Why is this under CONFIG_64BIT?  That doesn't have anything to do with
the BAR size.  The only reason I can see is that with CONFIG_64BIT=y,
we *might* have more MMIO space to use for BARs.

> >  	size = resource_size(res);
> >  	ret = _pci_assign_resource(dev, resno, size, align);
bibo mao June 7, 2023, 1:42 a.m. UTC | #3
在 2023/6/7 04:45, Bjorn Helgaas 写道:
> On Tue, Jun 06, 2023 at 06:06:04PM +0800, bibo, mao wrote:
>> Huacai,
>>
>> Although I post this patch, I think this should be arch specified
>> rather than general problem.  X86 has solved this problem, arm64
>> with 64K page size is not popular. However LoongArch has this
>> problem, page size is 16K rather than 4K. It is the problem of
>> LoongArch system rather than generic issue.
> 
> I think this is only related to the page size, not to any
> LoongArch-specific details.  If that's the case, I don't think the
> change should be arch-specific.
> 
>> There is such discussion before:
>> https://patchwork.kernel.org/project/linux-pci/patch/22400b8828ad44ddbccb876cc5ca3b11@FE-MBX1012.de.bosch.com/#19319457
>>
>> UEFI bios sets pci memory space 4K aligned, however Loongarch kernel rescans the pci
>> bus and reassigns pci memory resource. So it it strange like this, here is pci memory info on
>>  my 7A2000 board.
>> root@user-pc:~# lspci -vvv | grep Region
>>         Region 5: Memory at e003526e800 (32-bit, non-prefetchable) [size=1K]
>>         Region 0: Memory at e003526ec00 (64-bit, non-prefetchable) [size=1K]
> 
> I guess these BARs are from different devices, and the problem you're
> pointing out is that both BARs end up in the same 16K page (actually
> even the same 4K page):
> 
>   (gdb) p/x 0xe003526e800 >> 12
>   $1 = 0xe003526e
>   (gdb) p/x 0xe003526ec00 >> 12
>   $2 = 0xe003526e
> 
> I agree that's a potential issue because a user mmap of the first
> device also gets access to the BAR of the second device.  But it
> doesn't seem to be a *new* or LoongArch-specific issue.
> 
> So I *think* the point of this patch is to ensure that BARs of
> different devices never share a page.  Why is that suddenly an issue
> for LoongArch?  Is it only because you see more sharing with 16K pages
> than other arches do with 4K pages?
The UEFI BIOS has assigned the PCI device with minimal 4K aligned, I guest
Linux kernel on X86/ARM uses resource directly rather than reassign resource
again. So there is problem for hot-plug devices, however most hot-plug devices
are used for server system and they are high speed devices, PCI resource size
is larger than 4K. So it is not obvious on x86/ARM system.

However on LoongArch system with page size 16K, the problem is obvious since
pci devices with 4K resource size are popular.
> 
>> 在 2023/6/6 16:45, Bibo Mao 写道:
>>> Some PCI devices have only 4K memory space size, it is normal in general
>>> machines and aligned with page size. However some architectures which
>>> support different page size, default page size on LoongArch is 16K, and
>>> ARM64 supports page size varying from 4K to 64K. On machines where larger
>>> page size is use, memory space region of two different pci devices may be
>>> in one page. It is not safe with mmu protection, also VFIO pci device
>>> driver requires base address of pci memory space page aligned, so that it
>>> can be memory mapped to qemu user space when it is passed-through to vm.
> 
> The minimum memory BAR per spec is 128 bytes (not 4K).  You just
> demonstrated that even with 4K pages, different devices can share a
> page.
> 
>>> It consumes more pci memory resource with page size alignment requirement,
>>> it should not be a problem on 64 bit system.
>>>
>>> Signed-off-by: Bibo Mao <maobibo@loongson.cn>
>>> ---
>>>  drivers/pci/setup-res.c | 8 ++++++++
>>>  1 file changed, 8 insertions(+)
>>>
>>> diff --git a/drivers/pci/setup-res.c b/drivers/pci/setup-res.c
>>> index 967f9a758923..55440ae0128d 100644
>>> --- a/drivers/pci/setup-res.c
>>> +++ b/drivers/pci/setup-res.c
>>> @@ -339,6 +339,14 @@ int pci_assign_resource(struct pci_dev *dev, int resno)
>>>  		return -EINVAL;
>>>  	}
>>>  
>>> +#ifdef CONFIG_64BIT
>>> +	/*
>>> +	 * force minimum page alignment for vfio pci usage
>>> +	 * supposing there is enough pci memory resource on 64bit system
>>> +	 */
>>> +	if (res->flags & IORESOURCE_MEM)
>>> +		align = max_t(resource_size_t, PAGE_SIZE, align);
>>> +#endif
> 
> Why is this under CONFIG_64BIT?  That doesn't have anything to do with
> the BAR size.  The only reason I can see is that with CONFIG_64BIT=y,
> we *might* have more MMIO space to use for BARs.
yes, I assume that there is enough PCI memory resource with 64 bit system,
after all it consumes more memory resource and brings out negative effect.
For UIO and SRIOV requirements, they are most 64-bit server system, they
can suffer from wasting some PCI memory resource.

Regards
Bibo, Mao
> 
>>>  	size = resource_size(res);
>>>  	ret = _pci_assign_resource(dev, resno, size, align);
bibo mao June 7, 2023, 8:26 a.m. UTC | #4
在 2023/6/7 09:42, bibo, mao 写道:
> 
> 
> 在 2023/6/7 04:45, Bjorn Helgaas 写道:
>> On Tue, Jun 06, 2023 at 06:06:04PM +0800, bibo, mao wrote:
>>> Huacai,
>>>
>>> Although I post this patch, I think this should be arch specified
>>> rather than general problem.  X86 has solved this problem, arm64
>>> with 64K page size is not popular. However LoongArch has this
>>> problem, page size is 16K rather than 4K. It is the problem of
>>> LoongArch system rather than generic issue.
>>
>> I think this is only related to the page size, not to any
>> LoongArch-specific details.  If that's the case, I don't think the
>> change should be arch-specific.
>>
>>> There is such discussion before:
>>> https://patchwork.kernel.org/project/linux-pci/patch/22400b8828ad44ddbccb876cc5ca3b11@FE-MBX1012.de.bosch.com/#19319457
>>>
>>> UEFI bios sets pci memory space 4K aligned, however Loongarch kernel rescans the pci
>>> bus and reassigns pci memory resource. So it it strange like this, here is pci memory info on
>>>  my 7A2000 board.
>>> root@user-pc:~# lspci -vvv | grep Region
>>>         Region 5: Memory at e003526e800 (32-bit, non-prefetchable) [size=1K]
>>>         Region 0: Memory at e003526ec00 (64-bit, non-prefetchable) [size=1K]
>>
>> I guess these BARs are from different devices, and the problem you're
>> pointing out is that both BARs end up in the same 16K page (actually
>> even the same 4K page):
>>
>>   (gdb) p/x 0xe003526e800 >> 12
>>   $1 = 0xe003526e
>>   (gdb) p/x 0xe003526ec00 >> 12
>>   $2 = 0xe003526e
>>
>> I agree that's a potential issue because a user mmap of the first
>> device also gets access to the BAR of the second device.  But it
>> doesn't seem to be a *new* or LoongArch-specific issue.
>>
>> So I *think* the point of this patch is to ensure that BARs of
>> different devices never share a page.  Why is that suddenly an issue
>> for LoongArch?  Is it only because you see more sharing with 16K pages
>> than other arches do with 4K pages?
> The UEFI BIOS has assigned the PCI device with minimal 4K aligned, I guest
> Linux kernel on X86/ARM uses resource directly rather than reassign resource
> again. So there is problem for hot-plug devices, however most hot-plug devices
> are used for server system and they are high speed devices, PCI resource size
> is larger than 4K. So it is not obvious on x86/ARM system.
> 
> However on LoongArch system with page size 16K, the problem is obvious since
> pci devices with 4K resource size are popular.
>>
>>> 在 2023/6/6 16:45, Bibo Mao 写道:
>>>> Some PCI devices have only 4K memory space size, it is normal in general
>>>> machines and aligned with page size. However some architectures which
>>>> support different page size, default page size on LoongArch is 16K, and
>>>> ARM64 supports page size varying from 4K to 64K. On machines where larger
>>>> page size is use, memory space region of two different pci devices may be
>>>> in one page. It is not safe with mmu protection, also VFIO pci device
>>>> driver requires base address of pci memory space page aligned, so that it
>>>> can be memory mapped to qemu user space when it is passed-through to vm.
>>
>> The minimum memory BAR per spec is 128 bytes (not 4K).  You just
>> demonstrated that even with 4K pages, different devices can share a
>> page.
>>
>>>> It consumes more pci memory resource with page size alignment requirement,
>>>> it should not be a problem on 64 bit system.
>>>>
>>>> Signed-off-by: Bibo Mao <maobibo@loongson.cn>
>>>> ---
>>>>  drivers/pci/setup-res.c | 8 ++++++++
>>>>  1 file changed, 8 insertions(+)
>>>>
>>>> diff --git a/drivers/pci/setup-res.c b/drivers/pci/setup-res.c
>>>> index 967f9a758923..55440ae0128d 100644
>>>> --- a/drivers/pci/setup-res.c
>>>> +++ b/drivers/pci/setup-res.c
>>>> @@ -339,6 +339,14 @@ int pci_assign_resource(struct pci_dev *dev, int resno)
>>>>  		return -EINVAL;
>>>>  	}
>>>>  
>>>> +#ifdef CONFIG_64BIT
>>>> +	/*
>>>> +	 * force minimum page alignment for vfio pci usage
>>>> +	 * supposing there is enough pci memory resource on 64bit system
>>>> +	 */
>>>> +	if (res->flags & IORESOURCE_MEM)
>>>> +		align = max_t(resource_size_t, PAGE_SIZE, align);
>>>> +#endif
>>
>> Why is this under CONFIG_64BIT?  That doesn't have anything to do with
>> the BAR size.  The only reason I can see is that with CONFIG_64BIT=y,
>> we *might* have more MMIO space to use for BARs.
> yes, I assume that there is enough PCI memory resource with 64 bit system,
> after all it consumes more memory resource and brings out negative effect.
> For UIO and SRIOV requirements, they are most 64-bit server system, they
> can suffer from wasting some PCI memory resource.
Can we add extra config option and let architecture selects whether enable it?
Here is piece of code:

#ifdef CONFIG_PCI_MEMRES_PAGE_ALIGN
        /*
         * force minimum page alignment for vfio pci usage
         */
        if (res->flags & IORESOURCE_MEM)
                align = max_t(resource_size_t, PAGE_SIZE, align);
#endif

Regards
Bibo, Mao
> 
> Regards
> Bibo, Mao
>>
>>>>  	size = resource_size(res);
>>>>  	ret = _pci_assign_resource(dev, resno, size, align);
diff mbox series

Patch

diff --git a/drivers/pci/setup-res.c b/drivers/pci/setup-res.c
index 967f9a758923..55440ae0128d 100644
--- a/drivers/pci/setup-res.c
+++ b/drivers/pci/setup-res.c
@@ -339,6 +339,14 @@  int pci_assign_resource(struct pci_dev *dev, int resno)
 		return -EINVAL;
 	}
 
+#ifdef CONFIG_64BIT
+	/*
+	 * force minimum page alignment for vfio pci usage
+	 * supposing there is enough pci memory resource on 64bit system
+	 */
+	if (res->flags & IORESOURCE_MEM)
+		align = max_t(resource_size_t, PAGE_SIZE, align);
+#endif
 	size = resource_size(res);
 	ret = _pci_assign_resource(dev, resno, size, align);