Message ID | 1560413551-17460-1-git-send-email-anshuman.khandual@arm.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | mm/vmalloc: Check absolute error return from vmap_[p4d|pud|pmd|pte]_range() | expand |
On 2019-06-13 10:12, Anshuman Khandual wrote: > vmap_pte_range() returns an -EBUSY when it encounters a non-empty PTE. > But > currently vmap_pmd_range() unifies both -EBUSY and -ENOMEM return code > as > -ENOMEM and send it up the call chain which is wrong. Interestingly > enough > vmap_page_range_noflush() tests for the absolute error return value > from > vmap_p4d_range() but it does not help because -EBUSY has been merged > with > -ENOMEM. So all it can return is -ENOMEM. Fix this by testing for > absolute > error return from vmap_pmd_range() all the way up to vmap_p4d_range(). I could not find any real external caller of vmap API who really cares about the errno, and frankly why they should? This is allocation path, allocation failed - game over. When you step on -EBUSY case something has gone completely wrong in your kernel, you get a big warning in your dmesg and it is already does not matter what errno you get. -- Roman
On 06/13/2019 03:03 PM, Roman Penyaev wrote: > On 2019-06-13 10:12, Anshuman Khandual wrote: >> vmap_pte_range() returns an -EBUSY when it encounters a non-empty PTE. But >> currently vmap_pmd_range() unifies both -EBUSY and -ENOMEM return code as >> -ENOMEM and send it up the call chain which is wrong. Interestingly enough >> vmap_page_range_noflush() tests for the absolute error return value from >> vmap_p4d_range() but it does not help because -EBUSY has been merged with >> -ENOMEM. So all it can return is -ENOMEM. Fix this by testing for absolute >> error return from vmap_pmd_range() all the way up to vmap_p4d_range(). > > I could not find any real external caller of vmap API who really cares > about the errno, and frankly why they should? This is allocation path, map_vm_area() which is an exported symbol suppose to provide the right error code regardless whether it's current users care for it or not. > allocation failed - game over. When you step on -EBUSY case something > has gone completely wrong in your kernel, you get a big warning in > your dmesg and it is already does not matter what errno you get. Its true that vmap_pte_range() does warn during error conditions. But if we really dont care about error return code then we should just remove specific error details (ENOMEM/EBUSY) and instead replace them with simple boolean false/true or (0/1/-1) return values at each level. Will that be acceptable ? What we have currently is wrong where vmap_pmd_range() could just wrap EBUSY as ENOMEM and send up the call chain.
On Thu, Jun 13, 2019 at 08:51:17PM +0530, Anshuman Khandual wrote: > acceptable ? What we have currently is wrong where vmap_pmd_range() could > just wrap EBUSY as ENOMEM and send up the call chain. It's not wrong. We do it in lots of places. Unless there's a caller which really needs to know the difference, it's often better than returning the "real error".
On 06/13/2019 09:01 PM, Matthew Wilcox wrote: > On Thu, Jun 13, 2019 at 08:51:17PM +0530, Anshuman Khandual wrote: >> acceptable ? What we have currently is wrong where vmap_pmd_range() could >> just wrap EBUSY as ENOMEM and send up the call chain. > > It's not wrong. We do it in lots of places. Unless there's a caller > which really needs to know the difference, it's often better than > returning the "real error". I can understand the fact that because there are no active users of this return code, the current situation has been alright. But then I fail to understand how can EBUSY be made ENOMEM and let the caller to think that vmap_page_rage() failed because of lack of memory when it is clearly not the case. It is really surprising how it can be acceptable inside kernel (init_mm) page table functions which need to be thorough enough.
On Fri, Jun 14, 2019 at 10:57:42AM +0530, Anshuman Khandual wrote: > > > On 06/13/2019 09:01 PM, Matthew Wilcox wrote: > > On Thu, Jun 13, 2019 at 08:51:17PM +0530, Anshuman Khandual wrote: > >> acceptable ? What we have currently is wrong where vmap_pmd_range() could > >> just wrap EBUSY as ENOMEM and send up the call chain. > > > > It's not wrong. We do it in lots of places. Unless there's a caller > > which really needs to know the difference, it's often better than > > returning the "real error". > > I can understand the fact that because there are no active users of this > return code, the current situation has been alright. But then I fail to > understand how can EBUSY be made ENOMEM and let the caller to think that > vmap_page_rage() failed because of lack of memory when it is clearly not > the case. It is really surprising how it can be acceptable inside kernel > (init_mm) page table functions which need to be thorough enough. It's a corollary of Steinbach's Guideline for Systems Programming. There is no possible way to handle this error because this error is never supposed to happen. So we may as well return a different error that will still lead to the caller doing the right thing.
diff --git a/mm/vmalloc.c b/mm/vmalloc.c index 7350a124524b..6c7dd8df23c3 100644 --- a/mm/vmalloc.c +++ b/mm/vmalloc.c @@ -165,14 +165,16 @@ static int vmap_pmd_range(pud_t *pud, unsigned long addr, { pmd_t *pmd; unsigned long next; + int err = 0; pmd = pmd_alloc(&init_mm, pud, addr); if (!pmd) return -ENOMEM; do { next = pmd_addr_end(addr, end); - if (vmap_pte_range(pmd, addr, next, prot, pages, nr)) - return -ENOMEM; + err = vmap_pte_range(pmd, addr, next, prot, pages, nr); + if (err) + return err; } while (pmd++, addr = next, addr != end); return 0; } @@ -182,14 +184,16 @@ static int vmap_pud_range(p4d_t *p4d, unsigned long addr, { pud_t *pud; unsigned long next; + int err = 0; pud = pud_alloc(&init_mm, p4d, addr); if (!pud) return -ENOMEM; do { next = pud_addr_end(addr, end); - if (vmap_pmd_range(pud, addr, next, prot, pages, nr)) - return -ENOMEM; + err = vmap_pmd_range(pud, addr, next, prot, pages, nr); + if (err) + return err; } while (pud++, addr = next, addr != end); return 0; } @@ -199,14 +203,16 @@ static int vmap_p4d_range(pgd_t *pgd, unsigned long addr, { p4d_t *p4d; unsigned long next; + int err = 0; p4d = p4d_alloc(&init_mm, pgd, addr); if (!p4d) return -ENOMEM; do { next = p4d_addr_end(addr, end); - if (vmap_pud_range(p4d, addr, next, prot, pages, nr)) - return -ENOMEM; + err = vmap_pud_range(p4d, addr, next, prot, pages, nr); + if (err) + return err; } while (p4d++, addr = next, addr != end); return 0; }
vmap_pte_range() returns an -EBUSY when it encounters a non-empty PTE. But currently vmap_pmd_range() unifies both -EBUSY and -ENOMEM return code as -ENOMEM and send it up the call chain which is wrong. Interestingly enough vmap_page_range_noflush() tests for the absolute error return value from vmap_p4d_range() but it does not help because -EBUSY has been merged with -ENOMEM. So all it can return is -ENOMEM. Fix this by testing for absolute error return from vmap_pmd_range() all the way up to vmap_p4d_range(). Cc: Rick Edgecombe <rick.p.edgecombe@intel.com> Cc: Andrey Ryabinin <aryabinin@virtuozzo.com> Cc: Mike Rapoport <rppt@linux.ibm.com> Cc: Roman Gushchin <guro@fb.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Roman Penyaev <rpenyaev@suse.de> Cc: "Uladzislau Rezki (Sony)" <urezki@gmail.com> Cc: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com> --- mm/vmalloc.c | 18 ++++++++++++------ 1 file changed, 12 insertions(+), 6 deletions(-)