
[v4,bpf-next,1/2] mm: Enforce VM_IOREMAP flag and range in ioremap_page_range.

Message ID 20240305030516.41519-2-alexei.starovoitov@gmail.com
State Accepted
Commit 3e49a866c9dcbd8173e4f3e491293619a9e81fa4
Series mm: Enforce ioremap address space and introduce sparse vm_area

Commit Message

Alexei Starovoitov March 5, 2024, 3:05 a.m. UTC
From: Alexei Starovoitov <ast@kernel.org>

There are various users of get_vm_area() + ioremap_page_range() APIs.
Enforce that get_vm_area() was requested as VM_IOREMAP type and range
passed to ioremap_page_range() matches created vm_area to avoid
accidentally ioremap-ing into wrong address range.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
---
 mm/vmalloc.c | 13 +++++++++++++
 1 file changed, 13 insertions(+)
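The two checks described in the commit message can be illustrated with a userspace model. This is a sketch, not kernel code: `struct model_vm_struct`, `model_get_vm_area()`, `model_find_vm_area()` and `model_ioremap_checks()` are hypothetical stand-ins for `struct vm_struct`, `get_vm_area()`, `find_vm_area()` and the new prologue of `ioremap_page_range()`, assuming a 4 KiB page and the default trailing guard page.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical userspace model; flag values are illustrative only. */
#define MODEL_PAGE_SIZE   4096UL
#define MODEL_VM_IOREMAP  0x1UL  /* stands in for VM_IOREMAP */
#define MODEL_VM_NO_GUARD 0x2UL  /* stands in for VM_NO_GUARD */
#define MODEL_MAX_AREAS   8

struct model_vm_struct {
	unsigned long addr;   /* start of the reserved range */
	unsigned long size;   /* includes the guard page unless NO_GUARD */
	unsigned long flags;
};

static struct model_vm_struct model_areas[MODEL_MAX_AREAS];
static int model_nr_areas;
static unsigned long model_next_addr = 0x100000UL; /* fake allocator cursor */

/* Mirrors get_vm_area_size(): usable size without the guard page. */
static unsigned long model_get_vm_area_size(const struct model_vm_struct *a)
{
	return (a->flags & MODEL_VM_NO_GUARD) ? a->size : a->size - MODEL_PAGE_SIZE;
}

/* Reserve a range the way get_vm_area(size, flags) does: add a guard page. */
static struct model_vm_struct *model_get_vm_area(unsigned long size,
						 unsigned long flags)
{
	struct model_vm_struct *a;

	assert(model_nr_areas < MODEL_MAX_AREAS);
	a = &model_areas[model_nr_areas++];
	a->addr = model_next_addr;
	a->size = (flags & MODEL_VM_NO_GUARD) ? size : size + MODEL_PAGE_SIZE;
	a->flags = flags;
	model_next_addr += a->size;
	return a;
}

/* Like find_vm_area(): return the area containing addr, or NULL. */
static struct model_vm_struct *model_find_vm_area(unsigned long addr)
{
	for (int i = 0; i < model_nr_areas; i++) {
		struct model_vm_struct *a = &model_areas[i];

		if (addr >= a->addr && addr - a->addr < a->size)
			return a;
	}
	return NULL;
}

/* The checks the patch adds before vmap_range_noflush() is called. */
static int model_ioremap_checks(unsigned long addr, unsigned long end)
{
	struct model_vm_struct *area = model_find_vm_area(addr);

	if (!area || !(area->flags & MODEL_VM_IOREMAP))
		return -EINVAL;  /* not reserved as an ioremap area */
	if (addr != area->addr ||
	    end != area->addr + model_get_vm_area_size(area))
		return -ERANGE;  /* request must cover the whole area exactly */
	return 0;            /* the real function would now map the range */
}
```

In this model, a caller that reserves two pages with the ioremap flag and maps exactly those two pages passes; mapping only the first page, or starting one page into the area, returns `-ERANGE`; and mapping an area reserved without the flag, or an address no area covers, returns `-EINVAL`.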

Comments

Marek Szyprowski March 8, 2024, 5:14 p.m. UTC | #1
On 05.03.2024 04:05, Alexei Starovoitov wrote:
> From: Alexei Starovoitov <ast@kernel.org>
>
> There are various users of get_vm_area() + ioremap_page_range() APIs.
> Enforce that get_vm_area() was requested as VM_IOREMAP type and range
> passed to ioremap_page_range() matches created vm_area to avoid
> accidentally ioremap-ing into wrong address range.
>
> Reviewed-by: Christoph Hellwig <hch@lst.de>
> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
> ---

This patch landed in today's linux-next as commit 3e49a866c9dc ("mm: 
Enforce VM_IOREMAP flag and range in ioremap_page_range."). 
Unfortunately it triggers the following warning on all my test machines 
with PCI bridges. Here is an example reproduced with QEMU and ARM64 
'virt' machine:

pci-host-generic 4010000000.pcie: host bridge /pcie@10000000 ranges:
pci-host-generic 4010000000.pcie:       IO 0x003eff0000..0x003effffff -> 0x0000000000
pci-host-generic 4010000000.pcie:      MEM 0x0010000000..0x003efeffff -> 0x0010000000
pci-host-generic 4010000000.pcie:      MEM 0x8000000000..0xffffffffff -> 0x8000000000
------------[ cut here ]------------
vm_area at addr fffffbfffe800000 is not marked as VM_IOREMAP
WARNING: CPU: 0 PID: 1 at mm/vmalloc.c:315 ioremap_page_range+0x8c/0x174
Modules linked in:
CPU: 0 PID: 1 Comm: swapper/0 Not tainted 6.8.0-rc6+ #14694
Hardware name: linux,dummy-virt (DT)
pstate: 60000005 (nZCv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
pc : ioremap_page_range+0x8c/0x174
lr : ioremap_page_range+0x8c/0x174
sp : ffff800083faba10
...
Call trace:
  ioremap_page_range+0x8c/0x174
  pci_remap_iospace+0x74/0x88
  devm_pci_remap_iospace+0x54/0xac
  devm_of_pci_bridge_init+0x160/0x1fc
  devm_pci_alloc_host_bridge+0xb4/0xd0
  pci_host_common_probe+0x44/0x1a0
  platform_probe+0x68/0xd8
  really_probe+0x148/0x2b4
  __driver_probe_device+0x78/0x12c
  driver_probe_device+0xdc/0x164
  __driver_attach+0x9c/0x1ac
  bus_for_each_dev+0x74/0xd4
  driver_attach+0x24/0x30
  bus_add_driver+0xe4/0x1e8
  driver_register+0x60/0x128
  __platform_driver_register+0x28/0x34
  gen_pci_driver_init+0x1c/0x28
  do_one_initcall+0x74/0x2f4
  kernel_init_freeable+0x28c/0x4dc
  kernel_init+0x24/0x1dc
  ret_from_fork+0x10/0x20
irq event stamp: 74360
hardirqs last  enabled at (74359): [<ffff80008012cb9c>] console_unlock+0x120/0x12c
hardirqs last disabled at (74360): [<ffff80008122daa0>] el1_dbg+0x24/0x8c
softirqs last  enabled at (71258): [<ffff800080010a60>] __do_softirq+0x4a0/0x4e8
softirqs last disabled at (71245): [<ffff8000800169b0>] ____do_softirq+0x10/0x1c
---[ end trace 0000000000000000 ]---
pci-host-generic 4010000000.pcie: error -22: failed to map resource [io  0x0000-0xffff]
pci-host-generic 4010000000.pcie: Memory resource size exceeds max for 32 bits
pci-host-generic 4010000000.pcie: ECAM at [mem 0x4010000000-0x401fffffff] for [bus 00-ff]
pci-host-generic 4010000000.pcie: PCI host bridge to bus 0000:00
pci_bus 0000:00: root bus resource [bus 00-ff]
pci_bus 0000:00: root bus resource [mem 0x10000000-0x3efeffff]
pci_bus 0000:00: root bus resource [mem 0x8000000000-0xffffffffff]
pci 0000:00:00.0: [1b36:0008] type 00 class 0x060000 conventional PCI endpoint

It looks like the PCI-related code needs to be adjusted for this change.
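The `error -22` in the log above is `-EINVAL` (EINVAL is 22 on Linux), the value the new flag check returns. According to the call trace, the mapping request comes from `pci_remap_iospace()`, which maps the I/O window at a fixed virtual address (the PCI I/O base plus the resource start) rather than at an area obtained from `get_vm_area(..., VM_IOREMAP)`, so no VM_IOREMAP area is found at `fffffbfffe800000`. The userspace sketch below models only that path; the `model_` registry and helpers are hypothetical, and the window address is simply copied from the warning.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define MODEL_VM_IOREMAP 0x1UL  /* stands in for VM_IOREMAP */

struct model_vm_struct {
	unsigned long addr;
	unsigned long size;
	unsigned long flags;
};

/* Hypothetical registry: one ioremap area, nothing at the PCI I/O window. */
static struct model_vm_struct model_areas[] = {
	{ 0xffff800080000000UL, 0x2000UL, MODEL_VM_IOREMAP },
};

/* Like find_vm_area(): return the area containing addr, or NULL. */
static struct model_vm_struct *model_find_vm_area(unsigned long addr)
{
	for (size_t i = 0; i < sizeof(model_areas) / sizeof(model_areas[0]); i++) {
		struct model_vm_struct *a = &model_areas[i];

		if (addr >= a->addr && addr - a->addr < a->size)
			return a;
	}
	return NULL;
}

/* First check added to ioremap_page_range() by the patch. */
static int model_ioremap_flag_check(unsigned long addr)
{
	struct model_vm_struct *area = model_find_vm_area(addr);

	if (!area || !(area->flags & MODEL_VM_IOREMAP))
		return -EINVAL;
	return 0;
}

/*
 * Shape of pci_remap_iospace(): the virtual address is a fixed I/O base
 * plus the resource start, not an address returned by get_vm_area().
 */
static int model_pci_remap_iospace(unsigned long io_base, unsigned long res_start)
{
	return model_ioremap_flag_check(io_base + res_start);
}
```

Calling `model_pci_remap_iospace(0xfffffbfffe800000UL, 0)` returns `-EINVAL`, matching the `error -22` that `devm_pci_remap_iospace()` reports for `[io  0x0000-0xffff]`.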

>   mm/vmalloc.c | 13 +++++++++++++
>   1 file changed, 13 insertions(+)
>
> diff --git a/mm/vmalloc.c b/mm/vmalloc.c
> index d12a17fc0c17..f42f98a127d5 100644
> --- a/mm/vmalloc.c
> +++ b/mm/vmalloc.c
> @@ -307,8 +307,21 @@ static int vmap_range_noflush(unsigned long addr, unsigned long end,
>   int ioremap_page_range(unsigned long addr, unsigned long end,
>   		phys_addr_t phys_addr, pgprot_t prot)
>   {
> +	struct vm_struct *area;
>   	int err;
>   
> +	area = find_vm_area((void *)addr);
> +	if (!area || !(area->flags & VM_IOREMAP)) {
> +		WARN_ONCE(1, "vm_area at addr %lx is not marked as VM_IOREMAP\n", addr);
> +		return -EINVAL;
> +	}
> +	if (addr != (unsigned long)area->addr ||
> +	    (void *)end != area->addr + get_vm_area_size(area)) {
> +		WARN_ONCE(1, "ioremap request [%lx,%lx) doesn't match vm_area [%lx, %lx)\n",
> +			  addr, end, (long)area->addr,
> +			  (long)area->addr + get_vm_area_size(area));
> +		return -ERANGE;
> +	}
>   	err = vmap_range_noflush(addr, end, phys_addr, pgprot_nx(prot),
>   				 ioremap_max_page_shift);
>   	flush_cache_vmap(addr, end);

Best regards

Alexei Starovoitov March 8, 2024, 5:21 p.m. UTC | #2
On Fri, Mar 8, 2024 at 9:14 AM Marek Szyprowski
<m.szyprowski@samsung.com> wrote:
>
> On 05.03.2024 04:05, Alexei Starovoitov wrote:
> > From: Alexei Starovoitov <ast@kernel.org>
> >
> > There are various users of get_vm_area() + ioremap_page_range() APIs.
> > Enforce that get_vm_area() was requested as VM_IOREMAP type and range
> > passed to ioremap_page_range() matches created vm_area to avoid
> > accidentally ioremap-ing into wrong address range.
> >
> > Reviewed-by: Christoph Hellwig <hch@lst.de>
> > Signed-off-by: Alexei Starovoitov <ast@kernel.org>
> > ---
>
> This patch landed in today's linux-next as commit 3e49a866c9dc ("mm:
> Enforce VM_IOREMAP flag and range in ioremap_page_range.").
> Unfortunately it triggers the following warning on all my test machines
> with PCI bridges. Here is an example reproduced with QEMU and ARM64
> 'virt' machine:

Sorry about the breakage.
Here is the thread where we're discussing the fix:
https://lore.kernel.org/bpf/CAADnVQLP=dxBb+RiMGXoaCEuRrbK387J6B+pfzWKF_F=aRgCPQ@mail.gmail.com/

Patch

diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index d12a17fc0c17..f42f98a127d5 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -307,8 +307,21 @@ static int vmap_range_noflush(unsigned long addr, unsigned long end,
 int ioremap_page_range(unsigned long addr, unsigned long end,
 		phys_addr_t phys_addr, pgprot_t prot)
 {
+	struct vm_struct *area;
 	int err;
 
+	area = find_vm_area((void *)addr);
+	if (!area || !(area->flags & VM_IOREMAP)) {
+		WARN_ONCE(1, "vm_area at addr %lx is not marked as VM_IOREMAP\n", addr);
+		return -EINVAL;
+	}
+	if (addr != (unsigned long)area->addr ||
+	    (void *)end != area->addr + get_vm_area_size(area)) {
+		WARN_ONCE(1, "ioremap request [%lx,%lx) doesn't match vm_area [%lx, %lx)\n",
+			  addr, end, (long)area->addr,
+			  (long)area->addr + get_vm_area_size(area));
+		return -ERANGE;
+	}
 	err = vmap_range_noflush(addr, end, phys_addr, pgprot_nx(prot),
 				 ioremap_max_page_shift);
 	flush_cache_vmap(addr, end);
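The range check compares `end` against `area->addr + get_vm_area_size(area)`, and `get_vm_area_size()` subtracts the trailing guard page unless the area was created with `VM_NO_GUARD`. A caller therefore has to pass exactly the size it requested from `get_vm_area()`, not the stored `area->size`. The arithmetic is sketched below as a userspace model; the `model_` names and the 4 KiB page size are assumptions for illustration.

```c
#include <assert.h>
#include <errno.h>

#define MODEL_PAGE_SIZE   4096UL
#define MODEL_VM_NO_GUARD 0x1UL  /* stands in for VM_NO_GUARD */

struct model_vm_struct {
	unsigned long addr;
	unsigned long size;   /* as stored by the allocator, guard page included */
	unsigned long flags;
};

/* Mirrors get_vm_area_size(): drop the guard page unless NO_GUARD is set. */
static unsigned long model_get_vm_area_size(const struct model_vm_struct *area)
{
	if (!(area->flags & MODEL_VM_NO_GUARD))
		return area->size - MODEL_PAGE_SIZE;
	return area->size;
}

/* Second check added by the patch: [addr, end) must equal the usable area. */
static int model_range_check(const struct model_vm_struct *area,
			     unsigned long addr, unsigned long end)
{
	if (addr != area->addr ||
	    end != area->addr + model_get_vm_area_size(area))
		return -ERANGE;
	return 0;
}
```

For an area stored as four pages (three usable pages plus the guard page), `[addr, addr + 3 pages)` passes, while `[addr, addr + area->size)` is rejected with `-ERANGE` because it would also cover the guard page.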