Message ID | 20220118235244.540103-1-yury.norov@gmail.com (mailing list archive) |
---|---|
State | New |
Series | vmap(): don't allow invalid pages |
On Tue, Jan 18, 2022 at 03:52:44PM -0800, Yury Norov wrote: > vmap() takes struct page *pages as one of arguments, and user may provide > an invalid pointer which would lead to DABT at address translation later. Could we spell out 'DABT'? Presumably that's an ARM-specific thing. Just like we don't say #PF for Intel page faults, I think this is probably a 'data abort'? > Currently, kernel checks the pages against NULL. In my case, however, the > address was not NULL, and was big enough so that the hardware generated > Address Size Abort on arm64. > > Interestingly, this abort happens even if copy_from_kernel_nofault() is > used, which is quite inconvenient for debugging purposes. > > This patch adds a pfn_valid() check into vmap() path, so that invalid > mapping will not be created. > > RFC: https://lkml.org/lkml/2022/1/18/815 > v1: use pfn_valid() instead of adding an arch-specific > arch_vmap_page_valid(). Thanks to Matthew Wilcox for the hint. > > Signed-off-by: Yury Norov <yury.norov@gmail.com> Suggested-by: Matthew Wilcox (Oracle) <willy@infradead.org> > --- > mm/vmalloc.c | 2 ++ > 1 file changed, 2 insertions(+) > > diff --git a/mm/vmalloc.c b/mm/vmalloc.c > index d2a00ad4e1dd..a4134ee56b10 100644 > --- a/mm/vmalloc.c > +++ b/mm/vmalloc.c > @@ -477,6 +477,8 @@ static int vmap_pages_pte_range(pmd_t *pmd, unsigned long addr, > return -EBUSY; > if (WARN_ON(!page)) > return -ENOMEM; > + if (WARN_ON(!pfn_valid(page_to_pfn(page)))) > + return -EINVAL; > set_pte_at(&init_mm, addr, pte, mk_pte(page, prot)); > (*nr)++; > } while (pte++, addr += PAGE_SIZE, addr != end); > -- > 2.30.2 >
On 1/19/22 6:21 AM, Matthew Wilcox wrote: > On Tue, Jan 18, 2022 at 03:52:44PM -0800, Yury Norov wrote: >> vmap() takes struct page *pages as one of arguments, and user may provide >> an invalid pointer which would lead to DABT at address translation later. > > Could we spell out 'DABT'? Presumably that's an ARM-specific thing. > Just like we don't say #PF for Intel page faults, I think this is > probably a 'data abort'? Right, it is a data abort. > >> Currently, kernel checks the pages against NULL. In my case, however, the >> address was not NULL, and was big enough so that the hardware generated >> Address Size Abort on arm64. Could you please provide the actual abort stack here with details. >> >> Interestingly, this abort happens even if copy_from_kernel_nofault() is >> used, which is quite inconvenient for debugging purposes. >> >> This patch adds a pfn_valid() check into vmap() path, so that invalid >> mapping will not be created. >> >> RFC: https://lkml.org/lkml/2022/1/18/815 >> v1: use pfn_valid() instead of adding an arch-specific >> arch_vmap_page_valid(). Thanks to Matthew Wilcox for the hint. This should be after the '---' below. >> >> Signed-off-by: Yury Norov <yury.norov@gmail.com> > > Suggested-by: Matthew Wilcox (Oracle) <willy@infradead.org> > >> --- >> mm/vmalloc.c | 2 ++ >> 1 file changed, 2 insertions(+) >> >> diff --git a/mm/vmalloc.c b/mm/vmalloc.c >> index d2a00ad4e1dd..a4134ee56b10 100644 >> --- a/mm/vmalloc.c >> +++ b/mm/vmalloc.c >> @@ -477,6 +477,8 @@ static int vmap_pages_pte_range(pmd_t *pmd, unsigned long addr, >> return -EBUSY; >> if (WARN_ON(!page)) >> return -ENOMEM; >> + if (WARN_ON(!pfn_valid(page_to_pfn(page)))) >> + return -EINVAL; >> set_pte_at(&init_mm, addr, pte, mk_pte(page, prot)); >> (*nr)++; >> } while (pte++, addr += PAGE_SIZE, addr != end); >> -- >> 2.30.2 >> Why shouldn't this just scan over the entire user-provided struct page array and make sure that all pages therein are valid via the above method, but in vmap() itself before calling vmap_pages_range()? Because it seems like a single invalid page detected in vmap_pages_pte_range() will abort the entire vmap() anyway. This will also enable us to drop the existing NULL check above.
Hi, I replied to the original RFC before spotting this; duplicating those comments here because I think they apply regardless of the mechanism used to work around this. On Tue, Jan 18, 2022 at 03:52:44PM -0800, Yury Norov wrote: > vmap() takes struct page *pages as one of arguments, and user may provide > an invalid pointer which would lead to DABT at address translation later. > > Currently, kernel checks the pages against NULL. In my case, however, the > address was not NULL, and was big enough so that the hardware generated > Address Size Abort on arm64. Can you give an example of when this might happen? It sounds like you're actually hitting this, so a backtrace would be nice. I'm a bit confused as to why we'd try to vmap() pages that we didn't have a legitimate struct page for -- where did these addresses come from? It sounds like this is going wrong at a higher level, and we're passing entirely bogus struct page pointers around. This seems like the sort of thing DEBUG_VIRTUAL or similar should check when we initially generate the struct page pointer. > Interestingly, this abort happens even if copy_from_kernel_nofault() is > used, which is quite inconvenient for debugging purposes. I can go take a look at this, but TBH we never expect to take an address size fault to begin with, so this is arguably correct -- it's an internal consistency problem. > This patch adds a pfn_valid() check into vmap() path, so that invalid > mapping will not be created. > > RFC: https://lkml.org/lkml/2022/1/18/815 > v1: use pfn_valid() instead of adding an arch-specific > arch_vmap_page_valid(). Thanks to Matthew Wilcox for the hint. > > Signed-off-by: Yury Norov <yury.norov@gmail.com> > --- > mm/vmalloc.c | 2 ++ > 1 file changed, 2 insertions(+) > > diff --git a/mm/vmalloc.c b/mm/vmalloc.c > index d2a00ad4e1dd..a4134ee56b10 100644 > --- a/mm/vmalloc.c > +++ b/mm/vmalloc.c > @@ -477,6 +477,8 @@ static int vmap_pages_pte_range(pmd_t *pmd, unsigned long addr, > return -EBUSY; > if (WARN_ON(!page)) > return -ENOMEM; > + if (WARN_ON(!pfn_valid(page_to_pfn(page)))) > + return -EINVAL; My fear here is that for this to fire, we've already passed a bogus struct page pointer around the intermediate infrastructure, and any of that might try to use it in unsafe ways (in future even if we don't use it today). I think the fundamental issue here is that we generate a bogus struct page pointer at all, and knowing where that came from would help to fix that. Thanks, Mark. > set_pte_at(&init_mm, addr, pte, mk_pte(page, prot)); > (*nr)++; > } while (pte++, addr += PAGE_SIZE, addr != end); > -- > 2.30.2 >
On 2022-01-18 23:52, Yury Norov wrote: > vmap() takes struct page *pages as one of arguments, and user may provide > an invalid pointer which would lead to DABT at address translation later. > > Currently, kernel checks the pages against NULL. In my case, however, the > address was not NULL, and was big enough so that the hardware generated > Address Size Abort on arm64. > > Interestingly, this abort happens even if copy_from_kernel_nofault() is > used, which is quite inconvenient for debugging purposes. > > This patch adds a pfn_valid() check into vmap() path, so that invalid > mapping will not be created. > > RFC: https://lkml.org/lkml/2022/1/18/815 > v1: use pfn_valid() instead of adding an arch-specific > arch_vmap_page_valid(). Thanks to Matthew Wilcox for the hint. > > Signed-off-by: Yury Norov <yury.norov@gmail.com> > --- > mm/vmalloc.c | 2 ++ > 1 file changed, 2 insertions(+) > > diff --git a/mm/vmalloc.c b/mm/vmalloc.c > index d2a00ad4e1dd..a4134ee56b10 100644 > --- a/mm/vmalloc.c > +++ b/mm/vmalloc.c > @@ -477,6 +477,8 @@ static int vmap_pages_pte_range(pmd_t *pmd, unsigned long addr, > return -EBUSY; > if (WARN_ON(!page)) > return -ENOMEM; > + if (WARN_ON(!pfn_valid(page_to_pfn(page)))) Is it page_to_pfn() guaranteed to work without blowing up if page is invalid in the first place? Looking at the CONFIG_SPARSEMEM case I'm not sure that's true... Robin. > + return -EINVAL; > set_pte_at(&init_mm, addr, pte, mk_pte(page, prot)); > (*nr)++; > } while (pte++, addr += PAGE_SIZE, addr != end);
On Wed, Jan 19, 2022 at 01:28:14PM +0000, Robin Murphy wrote: > > + if (WARN_ON(!pfn_valid(page_to_pfn(page)))) > > Is it page_to_pfn() guaranteed to work without blowing up if page is invalid > in the first place? Looking at the CONFIG_SPARSEMEM case I'm not sure that's > true... Even if it does blow up, at least it's blowing up here where someone can start to debug it, rather than blowing up on first access, where we no longer have the invalid struct page pointer. I don't think we have a 'page_valid' function which will tell us whether a random pointer is actually a struct page or not.
On Wed, Jan 19, 2022 at 3:17 AM Mark Rutland <mark.rutland@arm.com> wrote: > > Hi, > > I replied to the original RFC before spotting this; duplicating those comments > here because I think they apply regardless of the mechanism used to work around > this. > > On Tue, Jan 18, 2022 at 03:52:44PM -0800, Yury Norov wrote: > > vmap() takes struct page *pages as one of arguments, and user may provide > > an invalid pointer which would lead to DABT at address translation later. > > > > Currently, kernel checks the pages against NULL. In my case, however, the > > address was not NULL, and was big enough so that the hardware generated > > Address Size Abort on arm64. > > Can you give an example of when this might happen? It sounds like you're > actually hitting this, so a backtrace would be nice. > > I'm a bit confused as to why we'd try to vmap() pages that we > didn't have a legitimate struct page for -- where did these addresses > come from? > > It sounds like this is going wrong at a higher level, and we're passing > entirely bogus struct page pointers around. This seems like the sort of > thing DEBUG_VIRTUAL or similar should check when we initially generate > the struct page pointer. Hi Mark, This is out-of-tree code that does: vaddr1 = dma_alloc_coherent() page = virt_to_page() // Wrong here ... vaddr2 = vmap(page) memset(vaddr2) // Fault here virt_to_page() returns a wrong pointer if vaddr1 is not a linear kernel address. The problem is that vmap() populates pte with bad pfn successfully, and it's much harder to debug at memory access time. > > Interestingly, this abort happens even if copy_from_kernel_nofault() is > > used, which is quite inconvenient for debugging purposes. > > I can go take a look at this, but TBH we never expect to take an address size > fault to begin with, so this is arguably correct -- it's an internal > consistency problem. > > > This patch adds a pfn_valid() check into vmap() path, so that invalid > > mapping will not be created. > > > > RFC: https://lkml.org/lkml/2022/1/18/815 > > v1: use pfn_valid() instead of adding an arch-specific > > arch_vmap_page_valid(). Thanks to Matthew Wilcox for the hint. > > > > Signed-off-by: Yury Norov <yury.norov@gmail.com> > > --- > > mm/vmalloc.c | 2 ++ > > 1 file changed, 2 insertions(+) > > > > diff --git a/mm/vmalloc.c b/mm/vmalloc.c > > index d2a00ad4e1dd..a4134ee56b10 100644 > > --- a/mm/vmalloc.c > > +++ b/mm/vmalloc.c > > @@ -477,6 +477,8 @@ static int vmap_pages_pte_range(pmd_t *pmd, unsigned long addr, > > return -EBUSY; > > if (WARN_ON(!page)) > > return -ENOMEM; > > + if (WARN_ON(!pfn_valid(page_to_pfn(page)))) > > + return -EINVAL; > > My fear here is that for this to fire, we've already passed a bogus struct page > pointer around the intermediate infrastructure, and any of that might try to > use it in unsafe ways (in future even if we don't use it today). > > I think the fundamental issue here is that we generate a bogus struct page > pointer at all, and knowing where that came from would help to fix that. You're right. That's why WARN_ON() is used for the page == null in the code above, I believe, - to let client code know that something goes wrong, and it's not a regular ENOMEM situation. Thanks, Yury > Thanks, > Mark. > > > set_pte_at(&init_mm, addr, pte, mk_pte(page, prot)); > > (*nr)++; > > } while (pte++, addr += PAGE_SIZE, addr != end); > > -- > > 2.30.2 > >
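To make the failure mode concrete, the buggy pattern Yury describes might look like the following minimal sketch (the function name and single-page usage are illustrative, not from the actual out-of-tree driver):

#include <linux/dma-mapping.h>
#include <linux/vmalloc.h>
#include <linux/string.h>
#include <linux/mm.h>

static void *broken_remap(struct device *dev)
{
	dma_addr_t dma_handle;
	void *vaddr1, *vaddr2;
	struct page *page;

	/* May return a non-linear (remapped) kernel address */
	vaddr1 = dma_alloc_coherent(dev, PAGE_SIZE, &dma_handle, GFP_KERNEL);
	if (!vaddr1)
		return NULL;

	/*
	 * BUG: virt_to_page() is only defined for linear-map addresses,
	 * so for a non-linear vaddr1 this yields a bogus struct page.
	 */
	page = virt_to_page(vaddr1);

	/* vmap() happily installs a PTE with the bogus pfn... */
	vaddr2 = vmap(&page, 1, VM_MAP, PAGE_KERNEL);
	if (!vaddr2)
		return NULL;

	/* ...and the abort only fires here, on first access. */
	memset(vaddr2, 0, PAGE_SIZE);
	return vaddr2;
}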
On Tue, Jan 18, 2022 at 10:17 PM Anshuman Khandual <anshuman.khandual@arm.com> wrote: > > > > On 1/19/22 6:21 AM, Matthew Wilcox wrote: > > On Tue, Jan 18, 2022 at 03:52:44PM -0800, Yury Norov wrote: > >> vmap() takes struct page *pages as one of arguments, and user may provide > >> an invalid pointer which would lead to DABT at address translation later. > > > > Could we spell out 'DABT'? Presumably that's an ARM-specific thing. > > Just like we don't say #PF for Intel page faults, I think this is > > probably a 'data abort'? > > Right, it is a data abort. > > > > >> Currently, kernel checks the pages against NULL. In my case, however, the > >> address was not NULL, and was big enough so that the hardware generated > >> Address Size Abort on arm64. > > Could you please provide the actual abort stack here with details. [ 665.484101] Unhandled fault at 0xffff8000252cd000 [ 665.488807] Mem abort info: [ 665.491617] ESR = 0x96000043 [ 665.494675] EC = 0x25: DABT (current EL), IL = 32 bits [ 665.499985] SET = 0, FnV = 0 [ 665.503039] EA = 0, S1PTW = 0 [ 665.506167] Data abort info: [ 665.509047] ISV = 0, ISS = 0x00000043 [ 665.512882] CM = 0, WnR = 1 [ 665.515851] swapper pgtable: 4k pages, 48-bit VAs, pgdp=00000000818cb000 [ 665.522550] [ffff8000252cd000] pgd=000000affcfff003, pud=000000affcffe003, pmd=0000008fad8c3003, pte=00688000a5217713 [ 665.533160] Internal error: level 3 address size fault: 96000043 [#1] SMP [ 665.539936] Modules linked in: [...] [ 665.616212] CPU: 178 PID: 13199 Comm: test Tainted: P OE 5.4.0-84-generic #94~18.04.1-Ubuntu [ 665.626806] Hardware name: HPE Apollo 70 /C01_APACHE_MB , BIOS L50_5.13_1.0.6 07/10/2018 [ 665.636618] pstate: 80400009 (Nzcv daif +PAN -UAO) [ 665.641407] pc : __memset+0x38/0x188 [ 665.645146] lr : test+0xcc/0x3f8 [ 665.650184] sp : ffff8000359bb840 [ 665.653486] x29: ffff8000359bb840 x28: 0000000000000000 [ 665.658785] x27: 0000000000000000 x26: 0000000000231000 [ 665.664083] x25: ffff00ae660f6110 x24: ffff00ae668cb800 [ 665.669382] x23: 0000000000000001 x22: ffff00af533e5000 [ 665.674680] x21: 0000000000001000 x20: 0000000000000000 [ 665.679978] x19: ffff00ae66950000 x18: ffffffffffffffff [ 665.685276] x17: 00000000588636a5 x16: 0000000000000013 [ 665.690574] x15: ffffffffffffffff x14: 000000000007ffff [ 665.695872] x13: 0000000080000000 x12: 0140000000000000 [ 665.701170] x11: 0000000000000041 x10: ffff8000652cd000 [ 665.706468] x9 : ffff8000252cf000 x8 : ffff8000252cd000 [ 665.711767] x7 : 0303030303030303 x6 : 0000000000001000 [ 665.717065] x5 : ffff8000252cd000 x4 : 0000000000000000 [ 665.722363] x3 : ffff8000252cdfff x2 : 0000000000000001 [ 665.727661] x1 : 0000000000000003 x0 : ffff8000252cd000 [ 665.732960] Call trace: [ 665.735395] __memset+0x38/0x188 [...] TCR_EL1 is 34b5503510, and so TCR_EL1.IPS is 0b100. The pfn that caused the address size abort has bit#47 set, which is far above the 16TB that is allowed by IPS == 0b100. > >> > >> Interestingly, this abort happens even if copy_from_kernel_nofault() is > >> used, which is quite inconvenient for debugging purposes. > >> > >> This patch adds a pfn_valid() check into vmap() path, so that invalid > >> mapping will not be created. > >> > >> RFC: https://lkml.org/lkml/2022/1/18/815 > >> v1: use pfn_valid() instead of adding an arch-specific > >> arch_vmap_page_valid(). Thanks to Matthew Wilcox for the hint. > > This should be after the '---' below.
> > >> > >> Signed-off-by: Yury Norov <yury.norov@gmail.com> > > > > Suggested-by: Matthew Wilcox (Oracle) <willy@infradead.org> > > > >> --- > >> mm/vmalloc.c | 2 ++ > >> 1 file changed, 2 insertions(+) > >> > >> diff --git a/mm/vmalloc.c b/mm/vmalloc.c > >> index d2a00ad4e1dd..a4134ee56b10 100644 > >> --- a/mm/vmalloc.c > >> +++ b/mm/vmalloc.c > >> @@ -477,6 +477,8 @@ static int vmap_pages_pte_range(pmd_t *pmd, unsigned long addr, > >> return -EBUSY; > >> if (WARN_ON(!page)) > >> return -ENOMEM; > >> + if (WARN_ON(!pfn_valid(page_to_pfn(page)))) > >> + return -EINVAL; > >> set_pte_at(&init_mm, addr, pte, mk_pte(page, prot)); > >> (*nr)++; > >> } while (pte++, addr += PAGE_SIZE, addr != end); > >> -- > >> 2.30.2 > >> > > Why shouldn't this just scan over the entire user-provided struct page > array and make sure that all pages therein are valid via the above method, > but in vmap() itself before calling vmap_pages_range()? Because it seems > like a single invalid page detected in vmap_pages_pte_range() will > abort the entire vmap() anyway. This will also enable us to drop the > existing NULL check above. I can do this, but why is it any better than the current approach? Thanks, Yury
On Wed, Jan 19, 2022 at 04:27:32PM +0000, Matthew Wilcox wrote: > On Wed, Jan 19, 2022 at 01:28:14PM +0000, Robin Murphy wrote: > > > + if (WARN_ON(!pfn_valid(page_to_pfn(page)))) > > > > Is it page_to_pfn() guaranteed to work without blowing up if page is invalid > > in the first place? Looking at the CONFIG_SPARSEMEM case I'm not sure that's > > true... > > Even if it does blow up, at least it's blowing up here where someone > can start to debug it, rather than blowing up on first access, where > we no longer have the invlid struct page pointer. > > I don't think we have a 'page_valid' function which will tell us whether > a random pointer is actually a struct page or not. Isn't it supposed to be: if (!pfn_valid(pfn)) { handle invalid pfn; } page = pfn_to_page(pfn); Anything else - even trying to convert an invalid page back to a pfn, could well be unreliable (sparsemem or discontigmem).
On Wed, Jan 19, 2022 at 05:54:15PM +0000, Russell King (Oracle) wrote: > On Wed, Jan 19, 2022 at 04:27:32PM +0000, Matthew Wilcox wrote: > > On Wed, Jan 19, 2022 at 01:28:14PM +0000, Robin Murphy wrote: > > > > + if (WARN_ON(!pfn_valid(page_to_pfn(page)))) > > > > > > Is it page_to_pfn() guaranteed to work without blowing up if page is invalid > > > in the first place? Looking at the CONFIG_SPARSEMEM case I'm not sure that's > > > true... > > > > Even if it does blow up, at least it's blowing up here where someone > > can start to debug it, rather than blowing up on first access, where > > we no longer have the invlid struct page pointer. > > > > I don't think we have a 'page_valid' function which will tell us whether > > a random pointer is actually a struct page or not. > > Isn't it supposed to be: > > if (!pfn_valid(pfn)) { > handle invalid pfn; > } > > page = pfn_to_page(pfn); > > Anything else - even trying to convert an invalid page back to a pfn, > could well be unreliable (sparsemem or discontigmem). This function is passed an array of pages. We have no way of doing what you propose.
On Wed, Jan 19, 2022 at 09:00:23AM -0800, Yury Norov wrote: > On Wed, Jan 19, 2022 at 3:17 AM Mark Rutland <mark.rutland@arm.com> wrote: > > > > Hi, > > > > I replied to the original RFC before spotting this; duplicating those comments > > here because I think they apply regardless of the mechanism used to work around > > this. > > > > On Tue, Jan 18, 2022 at 03:52:44PM -0800, Yury Norov wrote: > > > vmap() takes struct page *pages as one of arguments, and user may provide > > > an invalid pointer which would lead to DABT at address translation later. > > > > > > Currently, kernel checks the pages against NULL. In my case, however, the > > > address was not NULL, and was big enough so that the hardware generated > > > Address Size Abort on arm64. > > > > Can you give an example of when this might happen? It sounds like you're > > actually hitting this, so a backtrace would be nice. > > > > I'm a bit confused as to why we'd try to vmap() pages that we > > didn't have a legitimate struct page for -- where did these addresses > > come from? > > > > It sounds like this is going wrong at a higher level, and we're passing > > entirely bogus struct page pointers around. This seems like the sort of > > thing DEBUG_VIRTUAL or similar should check when we initially generate > > the struct page pointer. > > Hi Mark, > > This is out-of-tree code that does: > > vaddr1 = dma_alloc_coherent() > page = virt_to_page() // Wrong here > ... > vaddr2 = vmap(page) > memset(vaddr2) // Fault here > > virt_to_page() returns a wrong pointer if vaddr1 is not a linear kernel address. > The problem is that vmap() populates pte with bad pfn successfully, and it's > much harder to debug at memory access time. Ah, ok. FWIW, that case should be caught by DEBUG_VIRTUAL when that's enabled. It would be nice to put that within the commit message, since that clearly indicates this is about catching a mistake in buggy code, and indicates where to go looking next. > > > Interestingly, this abort happens even if copy_from_kernel_nofault() is > > > used, which is quite inconvenient for debugging purposes. > > > > I can go take a look at this, but TBH we never expect to take an address size > > fault to begin with, so this is arguably correct -- it's an internal > > consistency problem. > > > > > This patch adds a pfn_valid() check into vmap() path, so that invalid > > > mapping will not be created. > > > > > > RFC: https://lkml.org/lkml/2022/1/18/815 > > > v1: use pfn_valid() instead of adding an arch-specific > > > arch_vmap_page_valid(). Thanks to Matthew Wilcox for the hint. > > > > > > Signed-off-by: Yury Norov <yury.norov@gmail.com> > > > --- > > > mm/vmalloc.c | 2 ++ > > > 1 file changed, 2 insertions(+) > > > > > > diff --git a/mm/vmalloc.c b/mm/vmalloc.c > > > index d2a00ad4e1dd..a4134ee56b10 100644 > > > --- a/mm/vmalloc.c > > > +++ b/mm/vmalloc.c > > > @@ -477,6 +477,8 @@ static int vmap_pages_pte_range(pmd_t *pmd, unsigned long addr, > > > return -EBUSY; > > > if (WARN_ON(!page)) > > > return -ENOMEM; > > > + if (WARN_ON(!pfn_valid(page_to_pfn(page)))) > > > + return -EINVAL; > > > > My fear here is that for this to fire, we've already passed a bogus struct page > > pointer around the intermediate infrastructure, and any of that might try to > > use it in unsafe ways (in future even if we don't use it today). > > > > I think the fundamental issue here is that we generate a bogus struct page > > pointer at all, and knowing where that came from would help to fix that. > > You're right.
> That's why WARN_ON() is used for the page == null in the code > above, I believe, - to let client code know that something goes wrong, and it's > not a regular ENOMEM situation. Sure; with all the above in mind I think it's fine for this to be best-effort (e.g. as Robin noted page_to_pfn() might consume parts of the struct page before this), since either way that will indicate roughly where the problem is. As above, it would just be nice for the commit message to be a little more explicit as to that. Thanks, Mark.
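For reference, the CONFIG_DEBUG_VIRTUAL check Mark refers to lives in arch/arm64/mm/physaddr.c; in kernels of this era it looks roughly like the sketch below (quoted from memory, so treat the exact details as approximate). virt_to_page() on a non-linear address funnels through __virt_to_phys() and trips the WARN:

phys_addr_t __virt_to_phys(unsigned long x)
{
	/* Warn if x is not a linear-map ("lm") address */
	WARN(!__is_lm_address(x),
	     "virt_to_phys used for non-linear address: %pK (%pS)\n",
	     (void *)x, (void *)x);

	return __virt_to_phys_nodebug(x);
}
EXPORT_SYMBOL(__virt_to_phys);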
On 2022-01-19 16:27, Matthew Wilcox wrote: > On Wed, Jan 19, 2022 at 01:28:14PM +0000, Robin Murphy wrote: >>> + if (WARN_ON(!pfn_valid(page_to_pfn(page)))) >> >> Is it page_to_pfn() guaranteed to work without blowing up if page is invalid >> in the first place? Looking at the CONFIG_SPARSEMEM case I'm not sure that's >> true... > > Even if it does blow up, at least it's blowing up here where someone > can start to debug it, rather than blowing up on first access, where > we no longer have the invlid struct page pointer. But if that were the case then we'd blow up on the following line when mk_pte(page, prot) ends up calling it same anyway. Indeed it's arguably the best-case scenario since it would also blow up in page_address() if we hit the vmap_range loop rather than going down the vmap_small_pages_range_noflush() path. Furthermore, if you *are* lucky enough to take a fault upon accessing a bogus mapping, then surely a phys_to_page() calculation on whatever ended up in the PTE should get you back the original "pointer" anyway, shouldn't it? Sure it's a bit more work to extract the caller out of the VMA if necessary, but hey, that's debugging! Maybe vmap() failing means you then pass the offending nonsense to __free_pages() and that ruins your day anyway... The implications are infinitely worse if you've mapped something that did happen to be a valid page, but wasn't the *right* page, such that you don't fault but corrupt things or trigger a fatal system error instead. I'd echo Mark's point that if we can't trust a page pointer to be correct then we've already lost. In general the proportion of "wrong" pointers one can viably attempt to detect is so small that it's rarely ever worth trying, and the cases that are so obviously wrong tend to show up well enough in normal operation (although NULL-safety is of course a bit of a special case when it can simplify API usage). > I don't think we have a 'page_valid' function which will tell us whether > a random pointer is actually a struct page or not. Indeed, my impression is that the only legitimate way to get hold of a page pointer without assumed provenance is via pfn_to_page(), which is where pfn_valid() comes in. Thus pfn_valid(page_to_pfn()) really *should* be a tautology. I guess in this instance it would be technically feasible to implement a function which checks "is this a correctly-aligned pointer within memmap bounds", but IMO that would be a massive step in the wrong direction. We're developers; sometimes we introduce bugs when developing code, and it takes effort to debug them, but that still doesn't make it a good idea to optimise normal code paths for the expectation of writing new catastrophically-bad bugs. Plus logically if such a "page_valid()" check could be justified at all then that should rightfully lead to a churn-fest of adding it to pretty much every interface which accepts page pointers - one half of vmap() is hardly special. FWIW, If anything I reckon a DEBUG_VM option that made checks within page_to_x/x_to_page as appropriate would help Yury's issue just as much, while having the potential to be considerably more useful in general. Cheers, Robin.
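For illustration only, the kind of "is this plausibly a struct page pointer" helper Robin argues against might look like the hypothetical FLATMEM-only sketch below; no such helper exists in the kernel, and the point of the argument is that it should stay that way:

#include <linux/mm.h>

/* Hypothetical, FLATMEM only: memmap is the single contiguous mem_map array. */
static bool page_plausible(const struct page *page)
{
	unsigned long off = (unsigned long)page - (unsigned long)mem_map;

	/* Must be a properly aligned element of the memmap array... */
	if (off % sizeof(struct page))
		return false;

	/* ...lying within [mem_map, mem_map + max_mapnr). */
	return page >= mem_map && page < mem_map + max_mapnr;
}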
On Wed, Jan 19, 2022 at 06:01:24PM +0000, Matthew Wilcox wrote: > On Wed, Jan 19, 2022 at 05:54:15PM +0000, Russell King (Oracle) wrote: > > On Wed, Jan 19, 2022 at 04:27:32PM +0000, Matthew Wilcox wrote: > > > On Wed, Jan 19, 2022 at 01:28:14PM +0000, Robin Murphy wrote: > > > > > + if (WARN_ON(!pfn_valid(page_to_pfn(page)))) > > > > > > > > Is it page_to_pfn() guaranteed to work without blowing up if page is invalid > > > > in the first place? Looking at the CONFIG_SPARSEMEM case I'm not sure that's > > > > true... > > > > > > Even if it does blow up, at least it's blowing up here where someone > > > can start to debug it, rather than blowing up on first access, where > > > we no longer have the invlid struct page pointer. > > > > > > I don't think we have a 'page_valid' function which will tell us whether > > > a random pointer is actually a struct page or not. > > > > Isn't it supposed to be: > > > > if (!pfn_valid(pfn)) { > > handle invalid pfn; > > } > > > > page = pfn_to_page(pfn); > > > > Anything else - even trying to convert an invalid page back to a pfn, > > could well be unreliable (sparsemem or discontigmem). > > This function is passed an array of pages. We have no way of doing > what you propose. You can't go from a struct page to "this is valid", it's too late by the time you call vmap() - that's my fundamental point. If the translation from a PFN to a struct page can return pointers to something that isn't a valid struct page, then it can also (with sparsemem) return a pointer to _another_ struct page that could well be valid depending on how the struct page arrays are laid out in memory. To repeat: once you have a struct page, it's too late to determine if the struct page is valid.
On Wed, Jan 19, 2022 at 06:43:10PM +0000, Robin Murphy wrote: > Indeed, my impression is that the only legitimate way to get hold of a page > pointer without assumed provenance is via pfn_to_page(), which is where > pfn_valid() comes in. Thus pfn_valid(page_to_pfn()) really *should* be a > tautology. That can only be true if pfn == page_to_pfn(pfn_to_page(pfn)) for all values of pfn. Given how pfn_to_page() is defined in the sparsemem case: #define __pfn_to_page(pfn) \ ({ unsigned long __pfn = (pfn); \ struct mem_section *__sec = __pfn_to_section(__pfn); \ __section_mem_map_addr(__sec) + __pfn; \ }) #define page_to_pfn __page_to_pfn that isn't the case, especially when looking at page_to_pfn(): #define __page_to_pfn(pg) \ ({ const struct page *__pg = (pg); \ int __sec = page_to_section(__pg); \ (unsigned long)(__pg - __section_mem_map_addr(__nr_to_section(__sec))); \ }) Where: static inline unsigned long page_to_section(const struct page *page) { return (page->flags >> SECTIONS_PGSHIFT) & SECTIONS_MASK; } So if page_to_section() returns something that is, e.g. zero for an invalid page in a non-zero section, you're not going to end up with the right pfn from page_to_pfn(). As I've said now a couple of times, trying to determine of a struct page pointer is valid is the wrong question to be asking.
On Wed, Jan 19, 2022 at 06:57:34PM +0000, Russell King (Oracle) wrote: > On Wed, Jan 19, 2022 at 06:01:24PM +0000, Matthew Wilcox wrote: > > On Wed, Jan 19, 2022 at 05:54:15PM +0000, Russell King (Oracle) wrote: > > > On Wed, Jan 19, 2022 at 04:27:32PM +0000, Matthew Wilcox wrote: > > > > On Wed, Jan 19, 2022 at 01:28:14PM +0000, Robin Murphy wrote: > > > > > > + if (WARN_ON(!pfn_valid(page_to_pfn(page)))) > > > > > > > > > > Is it page_to_pfn() guaranteed to work without blowing up if page is invalid > > > > > in the first place? Looking at the CONFIG_SPARSEMEM case I'm not sure that's > > > > > true... > > > > > > > > Even if it does blow up, at least it's blowing up here where someone > > > > can start to debug it, rather than blowing up on first access, where > > > > we no longer have the invlid struct page pointer. > > > > > > > > I don't think we have a 'page_valid' function which will tell us whether > > > > a random pointer is actually a struct page or not. > > > > > > Isn't it supposed to be: > > > > > > if (!pfn_valid(pfn)) { > > > handle invalid pfn; > > > } > > > > > > page = pfn_to_page(pfn); > > > > > > Anything else - even trying to convert an invalid page back to a pfn, > > > could well be unreliable (sparsemem or discontigmem). > > > > This function is passed an array of pages. We have no way of doing > > what you propose. > > You can't go from a struct page to "this is valid", it's too late by the > time you call vmap() - that's my fundamental point. Yes, and we have debugging code in __virt_to_phys() that would have caught this, had Yury enabled CONFIG_DEBUG_VIRTUAL. My point is that in this instance, page_to_pfn() doesn't crash, which lets vmap() set up a mapping to a completely bogus physical address. We're better off checking pfn_valid() here than not. > If the translation from a PFN to a struct page can return pointers to > something that isn't a valid struct page, then it can also (with > sparsemem) return a pointer to _another_ struct page that could well > be valid depending on how the struct page arrays are laid out in > memory. Sure, it's not going to catch everything. But I don't think that letting perfect be the enemy of the good here is the right approach.
On Wed, Jan 19, 2022 at 07:35:33PM +0000, Matthew Wilcox wrote: > On Wed, Jan 19, 2022 at 06:57:34PM +0000, Russell King (Oracle) wrote: > > On Wed, Jan 19, 2022 at 06:01:24PM +0000, Matthew Wilcox wrote: > > > On Wed, Jan 19, 2022 at 05:54:15PM +0000, Russell King (Oracle) wrote: > > > > On Wed, Jan 19, 2022 at 04:27:32PM +0000, Matthew Wilcox wrote: > > > > > On Wed, Jan 19, 2022 at 01:28:14PM +0000, Robin Murphy wrote: > > > > > > > + if (WARN_ON(!pfn_valid(page_to_pfn(page)))) > > > > > > > > > > > > Is it page_to_pfn() guaranteed to work without blowing up if page is invalid > > > > > > in the first place? Looking at the CONFIG_SPARSEMEM case I'm not sure that's > > > > > > true... > > > > > > > > > > Even if it does blow up, at least it's blowing up here where someone > > > > > can start to debug it, rather than blowing up on first access, where > > > > > we no longer have the invlid struct page pointer. > > > > > > > > > > I don't think we have a 'page_valid' function which will tell us whether > > > > > a random pointer is actually a struct page or not. > > > > > > > > Isn't it supposed to be: > > > > > > > > if (!pfn_valid(pfn)) { > > > > handle invalid pfn; > > > > } > > > > > > > > page = pfn_to_page(pfn); > > > > > > > > Anything else - even trying to convert an invalid page back to a pfn, > > > > could well be unreliable (sparsemem or discontigmem). > > > > > > This function is passed an array of pages. We have no way of doing > > > what you propose. > > > > You can't go from a struct page to "this is valid", it's too late by the > > time you call vmap() - that's my fundamental point. > > Yes, and we have debugging code in __virt_to_phys() that would have > caught this, had Yury enabled CONFIG_DEBUG_VIRTUAL. My point is that > in this instance, page_to_pfn() doesn't crash, which lets vmap() set > up a mapping to a completely bogus physical address. We're better > off checking pfn_valid() here than not. I don't disagree that pfn_valid() will catch _some_, but it should, no, must not be a substitute for ensuring that the proper checks are done when creating e.g. an array of struct pages.
On 1/19/22 10:52 PM, Yury Norov wrote: >> Why shouldn't this just scan over the entire user-provided struct page >> array and make sure that all pages therein are valid via the above method, >> but in vmap() itself before calling vmap_pages_range()? Because it seems >> like a single invalid page detected in vmap_pages_pte_range() will >> abort the entire vmap() anyway. This will also enable us to drop the >> existing NULL check above. > > I can do this, but why is it any better than the current approach? Because it will just return on the first instance where the valid page check fails, saving us some CPU cycles and avoiding an incomplete mapping?
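As a sketch of the proposal above, vmap() would validate the whole array up front, before vmap_pages_range() is called, e.g. via a hypothetical helper like this (untested illustration, not a posted patch):

#include <linux/mm.h>
#include <linux/bug.h>

/* Hypothetical: reject the whole vmap() request on the first bad page. */
static bool vmap_pages_all_valid(struct page **pages, unsigned int count)
{
	unsigned int i;

	for (i = 0; i < count; i++)
		if (WARN_ON(!pages[i] || !pfn_valid(page_to_pfn(pages[i]))))
			return false;
	return true;
}

vmap() could then return NULL early when this fails, which would also allow dropping the per-PTE NULL check in vmap_pages_pte_range().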
On Thu, Jan 20, 2022 at 09:07:11AM +0530, Anshuman Khandual wrote: > > > On 1/19/22 10:52 PM, Yury Norov wrote: > >> Why shouldn't this just scan over the entire user-provided struct page > >> array and make sure that all pages therein are valid via the above method, > >> but in vmap() itself before calling vmap_pages_range()? Because it seems > >> like a single invalid page detected in vmap_pages_pte_range() will > >> abort the entire vmap() anyway. This will also enable us to drop the > >> existing NULL check above. > > > > I can do this, but why is it any better than the current approach? > > Because it will just return on the first instance where the valid page > check fails, saving us some CPU cycles and avoiding an incomplete mapping? The valid page check is never intended to fail! If you're worried about the efficiency of doing this, you have seriously confused your priorities.
On 2022-01-19 19:12, Russell King (Oracle) wrote: > On Wed, Jan 19, 2022 at 06:43:10PM +0000, Robin Murphy wrote: >> Indeed, my impression is that the only legitimate way to get hold of a page >> pointer without assumed provenance is via pfn_to_page(), which is where >> pfn_valid() comes in. Thus pfn_valid(page_to_pfn()) really *should* be a >> tautology. > > That can only be true if pfn == page_to_pfn(pfn_to_page(pfn)) for all > values of pfn. > > Given how pfn_to_page() is defined in the sparsemem case: > > #define __pfn_to_page(pfn) \ > ({ unsigned long __pfn = (pfn); \ > struct mem_section *__sec = __pfn_to_section(__pfn); \ > __section_mem_map_addr(__sec) + __pfn; \ > }) > #define page_to_pfn __page_to_pfn > > that isn't the case, especially when looking at page_to_pfn(): > > #define __page_to_pfn(pg) \ > ({ const struct page *__pg = (pg); \ > int __sec = page_to_section(__pg); \ > (unsigned long)(__pg - __section_mem_map_addr(__nr_to_section(__sec))); \ > }) > > Where: > > static inline unsigned long page_to_section(const struct page *page) > { > return (page->flags >> SECTIONS_PGSHIFT) & SECTIONS_MASK; > } > > So if page_to_section() returns something that is, e.g. zero for an > invalid page in a non-zero section, you're not going to end up with > the right pfn from page_to_pfn(). Right, I emphasised "should" in an attempt to imply "in the absence of serious bugs that have further-reaching consequences anyway". > As I've said now a couple of times, trying to determine of a struct > page pointer is valid is the wrong question to be asking. And doing so in one single place, on the justification of avoiding an incredibly niche symptom, is even more so. Not to mention that an address size fault is one of the best possible outcomes anyway, vs. the untold damage that may stem from accesses actually going through to random parts of the physical memory map. Robin.
On Thu, Jan 20, 2022 at 12:22:35PM +0000, Robin Murphy wrote: > On 2022-01-19 19:12, Russell King (Oracle) wrote: > > On Wed, Jan 19, 2022 at 06:43:10PM +0000, Robin Murphy wrote: > > > Indeed, my impression is that the only legitimate way to get hold of a page > > > pointer without assumed provenance is via pfn_to_page(), which is where > > > pfn_valid() comes in. Thus pfn_valid(page_to_pfn()) really *should* be a > > > tautology. > > > > That can only be true if pfn == page_to_pfn(pfn_to_page(pfn)) for all > > values of pfn. > > > > Given how pfn_to_page() is defined in the sparsemem case: > > > > #define __pfn_to_page(pfn) \ > > ({ unsigned long __pfn = (pfn); \ > > struct mem_section *__sec = __pfn_to_section(__pfn); \ > > __section_mem_map_addr(__sec) + __pfn; \ > > }) > > #define page_to_pfn __page_to_pfn > > > > that isn't the case, especially when looking at page_to_pfn(): > > > > #define __page_to_pfn(pg) \ > > ({ const struct page *__pg = (pg); \ > > int __sec = page_to_section(__pg); \ > > (unsigned long)(__pg - __section_mem_map_addr(__nr_to_section(__sec))); \ > > }) > > > > Where: > > > > static inline unsigned long page_to_section(const struct page *page) > > { > > return (page->flags >> SECTIONS_PGSHIFT) & SECTIONS_MASK; > > } > > > > So if page_to_section() returns something that is, e.g. zero for an > > invalid page in a non-zero section, you're not going to end up with > > the right pfn from page_to_pfn(). > > Right, I emphasised "should" in an attempt to imply "in the absence of > serious bugs that have further-reaching consequences anyway". > > > As I've said now a couple of times, trying to determine of a struct > > page pointer is valid is the wrong question to be asking. > > And doing so in one single place, on the justification of avoiding an > incredibly niche symptom, is even more so. Not to mention that an address > size fault is one of the best possible outcomes anyway, vs. the untold > damage that may stem from accesses actually going through to random parts of > the physical memory map. I don't see it as a "niche" symptom. If we start off with the struct page being invalid, then the result of page_to_pfn() can not be relied upon to produce something that is meaningful - which is exactly why the vmap() issue arises. With a pfn_valid() check, we at least know that the PFN points at memory. However, that memory could be _anything_ in the system - it could be the kernel image, and it could give userspace access to change kernel code. So, while it is useful to do a pfn_valid() check in vmap(), as I said to willy, this must _not_ be the primary check. It should IMHO use WARN_ON() to make it blatantly obvious that it should be something we expect _not_ to trigger under normal circumstances, but is there to catch programming errors elsewhere.
On 2022-01-20 13:03, Russell King (Oracle) wrote: > On Thu, Jan 20, 2022 at 12:22:35PM +0000, Robin Murphy wrote: >> On 2022-01-19 19:12, Russell King (Oracle) wrote: >>> On Wed, Jan 19, 2022 at 06:43:10PM +0000, Robin Murphy wrote: >>>> Indeed, my impression is that the only legitimate way to get hold of a page >>>> pointer without assumed provenance is via pfn_to_page(), which is where >>>> pfn_valid() comes in. Thus pfn_valid(page_to_pfn()) really *should* be a >>>> tautology. >>> >>> That can only be true if pfn == page_to_pfn(pfn_to_page(pfn)) for all >>> values of pfn. >>> >>> Given how pfn_to_page() is defined in the sparsemem case: >>> >>> #define __pfn_to_page(pfn) \ >>> ({ unsigned long __pfn = (pfn); \ >>> struct mem_section *__sec = __pfn_to_section(__pfn); \ >>> __section_mem_map_addr(__sec) + __pfn; \ >>> }) >>> #define page_to_pfn __page_to_pfn >>> >>> that isn't the case, especially when looking at page_to_pfn(): >>> >>> #define __page_to_pfn(pg) \ >>> ({ const struct page *__pg = (pg); \ >>> int __sec = page_to_section(__pg); \ >>> (unsigned long)(__pg - __section_mem_map_addr(__nr_to_section(__sec))); \ >>> }) >>> >>> Where: >>> >>> static inline unsigned long page_to_section(const struct page *page) >>> { >>> return (page->flags >> SECTIONS_PGSHIFT) & SECTIONS_MASK; >>> } >>> >>> So if page_to_section() returns something that is, e.g. zero for an >>> invalid page in a non-zero section, you're not going to end up with >>> the right pfn from page_to_pfn(). >> >> Right, I emphasised "should" in an attempt to imply "in the absence of >> serious bugs that have further-reaching consequences anyway". >> >>> As I've said now a couple of times, trying to determine of a struct >>> page pointer is valid is the wrong question to be asking. >> >> And doing so in one single place, on the justification of avoiding an >> incredibly niche symptom, is even more so. Not to mention that an address >> size fault is one of the best possible outcomes anyway, vs. the untold >> damage that may stem from accesses actually going through to random parts of >> the physical memory map. > > I don't see it as a "niche" symptom. The commit message specifically cites a Data Abort "at address translation later". Broadly speaking, a Data Abort due to an address size fault only occurs if you've been lucky enough that the bogus PA which got mapped is so spectacularly wrong that it's beyond the range configured in TCR.IPS. How many other architectures even have a mechanism like that? > If we start off with the struct page being invalid, then the result of > page_to_pfn() can not be relied upon to produce something that is > meaningful - which is exactly why the vmap() issue arises. > > With a pfn_valid() check, we at least know that the PFN points at > memory. No, we know it points to some PA space which has a struct page to represent it. pfn_valid() only says that pfn_to_page() will yield a valid result. That also includes things like reserved pages covering non-RAM areas, where a kernel VA mapping existing at all could potentially be fatal to the system even if it's never explicitly accessed - for all we know it might be a carveout belonging to overly-aggressive Secure software such that even a speculative prefetch might trigger an instant system reset. > However, that memory could be _anything_ in the system - it > could be the kernel image, and it could give userspace access to > change kernel code.
> > So, while it is useful to do a pfn_valid() check in vmap(), as I said > to willy, this must _not_ be the primary check. It should IMHO use > WARN_ON() to make it blatantly obvious that it should be something we > expect _not_ to trigger under normal circumstances, but is there to > catch programming errors elsewhere. Rather, "to partially catch unrelated programming errors elsewhere, provided the buggy code happens to call vmap() rather than any of the many other functions with a struct page * argument." That's where it stretches my definition of "useful" just a bit too far. It's not about perfect being the enemy of good, it's about why vmap() should be special, and death by a thousand "useful" cuts - if we don't trust the pointer, why not check its alignment for basic plausibility first? If it seems valid, why not check if the page flags look sensible to make sure? How many useful little checks is too many? Every bit of code footprint and execution overhead imposed unconditionally on all end users to theoretically save developers' debugging time still adds up. Although on that note, it looks like arch/arm's pfn_valid() is still a linear scan of the memblock array, so the overhead of adding that for every page in every vmap() might not even be so small... Robin.
On Thu, Jan 20, 2022 at 04:37:01PM +0000, Robin Murphy wrote: > On 2022-01-20 13:03, Russell King (Oracle) wrote: > > On Thu, Jan 20, 2022 at 12:22:35PM +0000, Robin Murphy wrote: > > > On 2022-01-19 19:12, Russell King (Oracle) wrote: > > > > On Wed, Jan 19, 2022 at 06:43:10PM +0000, Robin Murphy wrote: > > > > > Indeed, my impression is that the only legitimate way to get hold of a page > > > > > pointer without assumed provenance is via pfn_to_page(), which is where > > > > > pfn_valid() comes in. Thus pfn_valid(page_to_pfn()) really *should* be a > > > > > tautology. > > > > > > > > That can only be true if pfn == page_to_pfn(pfn_to_page(pfn)) for all > > > > values of pfn. > > > > > > > > Given how pfn_to_page() is defined in the sparsemem case: > > > > > > > > #define __pfn_to_page(pfn) \ > > > > ({ unsigned long __pfn = (pfn); \ > > > > struct mem_section *__sec = __pfn_to_section(__pfn); \ > > > > __section_mem_map_addr(__sec) + __pfn; \ > > > > }) > > > > #define page_to_pfn __page_to_pfn > > > > > > > > that isn't the case, especially when looking at page_to_pfn(): > > > > > > > > #define __page_to_pfn(pg) \ > > > > ({ const struct page *__pg = (pg); \ > > > > int __sec = page_to_section(__pg); \ > > > > (unsigned long)(__pg - __section_mem_map_addr(__nr_to_section(__sec))); \ > > > > }) > > > > > > > > Where: > > > > > > > > static inline unsigned long page_to_section(const struct page *page) > > > > { > > > > return (page->flags >> SECTIONS_PGSHIFT) & SECTIONS_MASK; > > > > } > > > > > > > > So if page_to_section() returns something that is, e.g. zero for an > > > > invalid page in a non-zero section, you're not going to end up with > > > > the right pfn from page_to_pfn(). > > > > > > Right, I emphasised "should" in an attempt to imply "in the absence of > > > serious bugs that have further-reaching consequences anyway". > > > > > > > As I've said now a couple of times, trying to determine of a struct > > > > page pointer is valid is the wrong question to be asking. > > > > > > And doing so in one single place, on the justification of avoiding an > > > incredibly niche symptom, is even more so. Not to mention that an address > > > size fault is one of the best possible outcomes anyway, vs. the untold > > > damage that may stem from accesses actually going through to random parts of > > > the physical memory map. > > > > I don't see it as a "niche" symptom. > > The commit message specifically cites a Data Abort "at address translation > later". Broadly speaking, a Data Abort due to an address size fault only > occurs if you've been lucky enough that the bogus PA which got mapped is so > spectacularly wrong that it's beyond the range configured in TCR.IPS. How > many other architectures even have a mechanism like that? I think we're misinterpreting each other. > > If we start off with the struct page being invalid, then the result of > > page_to_pfn() can not be relied upon to produce something that is > > meaningful - which is exactly why the vmap() issue arises. > > > > With a pfn_valid() check, we at least know that the PFN points at > > memory. > > No, we know it points to some PA space which has a struct page to represent > it. pfn_valid() only says that pfn_to_page() will yield a valid result.
> That also includes things like reserved pages covering non-RAM areas, where a > kernel VA mapping existing at all could potentially be fatal to the system > even if it's never explicitly accessed - for all we know it might be a > carveout belonging to overly-aggressive Secure software such that even a > speculative prefetch might trigger an instant system reset. So are you saying that the "address size fault" can happen because we've mapped something for which pfn_valid() returns true? > > However, that memory could be _anything_ in the system - it > > could be the kernel image, and it could give userspace access to > > change kernel code. > > > > So, while it is useful to do a pfn_valid() check in vmap(), as I said > > to willy, this must _not_ be the primary check. It should IMHO use > > WARN_ON() to make it blatantly obvious that it should be something we > > expect _not_ to trigger under normal circumstances, but is there to > > catch programming errors elsewhere. > > Rather, "to partially catch unrelated programming errors elsewhere, provided > the buggy code happens to call vmap() rather than any of the many other > functions with a struct page * argument." That's where it stretches my > definition of "useful" just a bit too far. It's not about perfect being the > enemy of good, it's about why vmap() should be special, and death by a > thousand "useful" cuts - if we don't trust the pointer, why not check its > alignment for basic plausibility first? If it seems valid, why not check if > the page flags look sensible to make sure? How many useful little checks is > too many? Every bit of code footprint and execution overhead imposed > unconditionally on all end users to theoretically save developers' debugging > time still adds up. Although on that note, it looks like arch/arm's > pfn_valid() is still a linear scan of the memblock array, so the overhead of > adding that for every page in every vmap() might not even be so small... Well, I think I've adequately explained why I believe: pfn_valid(page_to_pfn(page)) being used as the primary check is substandard, and will likely lead to a future CVE. When generating an array of struct page's, I believe that it is the responsibility of the generator to ensure that the array only contains valid pages.
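To illustrate the discipline Russell is describing, here is a hedged sketch of generator-side validation - each pfn is checked before it is ever turned into a struct page (the helper name is made up):

#include <linux/mm.h>

/* Hypothetical example: build a page array from pfns, validating first. */
static int pfns_to_pages(const unsigned long *pfns, struct page **pages,
			 unsigned int count)
{
	unsigned int i;

	for (i = 0; i < count; i++) {
		if (!pfn_valid(pfns[i]))
			return -EINVAL;	/* reject before pfn_to_page() */
		pages[i] = pfn_to_page(pfns[i]);
	}
	return 0;
}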
On Thu, Jan 20, 2022 at 04:54:03PM +0000, Russell King (Oracle) wrote: > Well, I think I've adequately explained why I believe: > > pfn_valid(page_to_pfn(page)) > > being used as the primary check is substandard, and will likely lead to > a future CVE. When generating an array of struct page's, I believe that > it is the responsibility of the generator to ensure that the array > only contains valid pages. So you're saying that virt_to_page() should be made more expensive?
On Wed, Jan 19, 2022 at 7:37 PM Anshuman Khandual <anshuman.khandual@arm.com> wrote: > > > > On 1/19/22 10:52 PM, Yury Norov wrote: > >> Why shouldn't this just scan over the entire user-provided struct page > >> array and make sure that all pages therein are valid via the above method, > >> but in vmap() itself before calling vmap_pages_range()? Because it seems > >> like a single invalid page detected in vmap_pages_pte_range() will > >> abort the entire vmap() anyway. This will also enable us to drop the > >> existing NULL check above. > > > > I can do this, but why is it any better than the current approach? > > Because it will just return on the first instance where the valid page > check fails, saving us some CPU cycles and avoiding an incomplete mapping? This should normally never happen; that's why WARN_ON() is there. If it happens, there is a serious problem, and the code must be fixed. So, no CPU cycle savings in real life.
On Thu, Jan 20, 2022 at 8:37 AM Robin Murphy <robin.murphy@arm.com> wrote: > > On 2022-01-20 13:03, Russell King (Oracle) wrote: > > On Thu, Jan 20, 2022 at 12:22:35PM +0000, Robin Murphy wrote: > >> On 2022-01-19 19:12, Russell King (Oracle) wrote: > >>> On Wed, Jan 19, 2022 at 06:43:10PM +0000, Robin Murphy wrote: > >>>> Indeed, my impression is that the only legitimate way to get hold of a page > >>>> pointer without assumed provenance is via pfn_to_page(), which is where > >>>> pfn_valid() comes in. Thus pfn_valid(page_to_pfn()) really *should* be a > >>>> tautology. > >>> > >>> That can only be true if pfn == page_to_pfn(pfn_to_page(pfn)) for all > >>> values of pfn. > >>> > >>> Given how pfn_to_page() is defined in the sparsemem case: > >>> > >>> #define __pfn_to_page(pfn) \ > >>> ({ unsigned long __pfn = (pfn); \ > >>> struct mem_section *__sec = __pfn_to_section(__pfn); \ > >>> __section_mem_map_addr(__sec) + __pfn; \ > >>> }) > >>> #define page_to_pfn __page_to_pfn > >>> > >>> that isn't the case, especially when looking at page_to_pfn(): > >>> > >>> #define __page_to_pfn(pg) \ > >>> ({ const struct page *__pg = (pg); \ > >>> int __sec = page_to_section(__pg); \ > >>> (unsigned long)(__pg - __section_mem_map_addr(__nr_to_section(__sec))); \ > >>> }) > >>> > >>> Where: > >>> > >>> static inline unsigned long page_to_section(const struct page *page) > >>> { > >>> return (page->flags >> SECTIONS_PGSHIFT) & SECTIONS_MASK; > >>> } > >>> > >>> So if page_to_section() returns something that is, e.g. zero for an > >>> invalid page in a non-zero section, you're not going to end up with > >>> the right pfn from page_to_pfn(). > >> > >> Right, I emphasised "should" in an attempt to imply "in the absence of > >> serious bugs that have further-reaching consequences anyway". > >> > >>> As I've said now a couple of times, trying to determine of a struct > >>> page pointer is valid is the wrong question to be asking. > >> > >> And doing so in one single place, on the justification of avoiding an > >> incredibly niche symptom, is even more so. Not to mention that an address > >> size fault is one of the best possible outcomes anyway, vs. the untold > >> damage that may stem from accesses actually going through to random parts of > >> the physical memory map. It's not a single place. Many exported functions check arguments one way or another. __vunmap() in vfree() path, for example, checks for address alignment, which looks quite similar to me. And later it even does BUG_ON(!page). > > I don't see it as a "niche" symptom. > > The commit message specifically cites a Data Abort "at address > translation later". Broadly speaking, a Data Abort due to an address > size fault only occurs if you've been lucky enough that the bogus PA > which got mapped is so spectacularly wrong that it's beyond the range > configured in TCR.IPS. How many other architectures even have a > mechanism like that? > > > If we start off with the struct page being invalid, then the result of > > page_to_pfn() can not be relied upon to produce something that is > > meaningful - which is exactly why the vmap() issue arises. > > > > With a pfn_valid() check, we at least know that the PFN points at > > memory. > > No, we know it points to some PA space which has a struct page to > represent it. pfn_valid() only says that pfn_to_page() will yield a > valid result.
> That also includes things like reserved pages covering > non-RAM areas, where a kernel VA mapping existing at all could > potentially be fatal to the system even if it's never explicitly > accessed - for all we know it might be a carveout belonging to > overly-aggressive Secure software such that even a speculative prefetch > might trigger an instant system reset. > > > However, that memory could be _anything_ in the system - it > > could be the kernel image, and it could give userspace access to > > change kernel code. > > > > So, while it is useful to do a pfn_valid() check in vmap(), as I said > > to willy, this must _not_ be the primary check. It should IMHO use > > WARN_ON() to make it blatantly obvious that it should be something we > > expect _not_ to trigger under normal circumstances, but is there to > > catch programming errors elsewhere. It actually uses WARN_ON(). > Rather, "to partially catch unrelated programming errors elsewhere, > provided the buggy code happens to call vmap() rather than any of the > many other functions with a struct page * argument." That's where it > stretches my definition of "useful" just a bit too far. It's not about > perfect being the enemy of good, it's about why vmap() should be > special, and death by a thousand "useful" cuts - if we don't trust the > pointer, why not check its alignment for basic plausibility first? Because in that particular case pfn_valid() is enough. If someone else has a real case where IS_ALIGNED() would help - I will be all for adding that check in vmap(). > If it > seems valid, why not check if the page flags look sensible to make sure? > How many useful little checks is too many? I'd put in the 'too many' group those that test for something that has never happened to people. > Every bit of code footprint > and execution overhead imposed unconditionally on all end users to > theoretically save developers' debugging time still adds up. Not theoretically - practically! End users will value kernel stability even when buggy drivers are installed. They will also value developers who fix bugs quickly. It has been noticed that DEBUG_VIRTUAL could catch this bug. But sometimes stopping the production hardware, building a custom kernel with many debug options in the hope that one of them will help, and running a suspicious driver for hours can itself take more than a day. Thanks, Yury > Although on > that note, it looks like arch/arm's pfn_valid() is still a linear scan > of the memblock array, so the overhead of adding that for every page in > every vmap() might not even be so small... > > Robin.
On Wed, Jan 19, 2022 at 01:28:14PM +0000, Robin Murphy wrote: > Is it page_to_pfn() guaranteed to work without blowing up if page is invalid > in the first place? Looking at the CONFIG_SPARSEMEM case I'm not sure that's > true... Something that all the ARM people weighing in on this don't understand is that basically nobody uses SPARSEMEM without SPARSEMEM_VMEMMAP. So all this complicated code to do page_to_pfn() is never tested. Real users all do a simple subtraction and so the simple pfn_valid() works fine.
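For reference, with SPARSEMEM_VMEMMAP the conversions collapse to simple pointer arithmetic over a virtually contiguous memmap (include/asm-generic/memory_model.h):

/* memmap is virtually contiguous. */
#define __pfn_to_page(pfn)	(vmemmap + (pfn))
#define __page_to_pfn(page)	(unsigned long)((page) - vmemmap)

On such configurations page_to_pfn() is a plain subtraction that cannot fault even for a bogus pointer, so the pfn_valid() check in the patch is cheap and safe to reach.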
vmap() takes struct page *pages as one of its arguments, and a user may provide an invalid pointer which would lead to a DABT (data abort) at address translation later. Currently, the kernel checks the pages against NULL. In my case, however, the address was not NULL, and was big enough that the hardware generated an Address Size Abort on arm64. Interestingly, this abort happens even if copy_from_kernel_nofault() is used, which is quite inconvenient for debugging purposes. This patch adds a pfn_valid() check into the vmap() path, so that an invalid mapping will not be created. RFC: https://lkml.org/lkml/2022/1/18/815 v1: use pfn_valid() instead of adding an arch-specific arch_vmap_page_valid(). Thanks to Matthew Wilcox for the hint. Signed-off-by: Yury Norov <yury.norov@gmail.com> --- mm/vmalloc.c | 2 ++ 1 file changed, 2 insertions(+)

diff --git a/mm/vmalloc.c b/mm/vmalloc.c index d2a00ad4e1dd..a4134ee56b10 100644 --- a/mm/vmalloc.c +++ b/mm/vmalloc.c @@ -477,6 +477,8 @@ static int vmap_pages_pte_range(pmd_t *pmd, unsigned long addr, return -EBUSY; if (WARN_ON(!page)) return -ENOMEM; + if (WARN_ON(!pfn_valid(page_to_pfn(page)))) + return -EINVAL; set_pte_at(&init_mm, addr, pte, mk_pte(page, prot)); (*nr)++; } while (pte++, addr += PAGE_SIZE, addr != end);