diff mbox series

mm/vmalloc: Check absolute error return from vmap_[p4d|pud|pmd|pte]_range()

Message ID 1560413551-17460-1-git-send-email-anshuman.khandual@arm.com (mailing list archive)
State New, archived
Headers show
Series mm/vmalloc: Check absolute error return from vmap_[p4d|pud|pmd|pte]_range() | expand

Commit Message

Anshuman Khandual June 13, 2019, 8:12 a.m. UTC
vmap_pte_range() returns an -EBUSY when it encounters a non-empty PTE. But
currently vmap_pmd_range() unifies both -EBUSY and -ENOMEM return code as
-ENOMEM and send it up the call chain which is wrong. Interestingly enough
vmap_page_range_noflush() tests for the absolute error return value from
vmap_p4d_range() but it does not help because -EBUSY has been merged with
-ENOMEM. So all it can return is -ENOMEM. Fix this by testing for absolute
error return from vmap_pmd_range() all the way up to vmap_p4d_range().

Cc: Rick Edgecombe <rick.p.edgecombe@intel.com>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Mike Rapoport <rppt@linux.ibm.com>
Cc: Roman Gushchin <guro@fb.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Roman Penyaev <rpenyaev@suse.de>
Cc: "Uladzislau Rezki (Sony)" <urezki@gmail.com>
Cc: Andrew Morton <akpm@linux-foundation.org>

Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
---
 mm/vmalloc.c | 18 ++++++++++++------
 1 file changed, 12 insertions(+), 6 deletions(-)

Comments

Roman Penyaev June 13, 2019, 9:33 a.m. UTC | #1
On 2019-06-13 10:12, Anshuman Khandual wrote:
> vmap_pte_range() returns an -EBUSY when it encounters a non-empty PTE. 
> But
> currently vmap_pmd_range() unifies both -EBUSY and -ENOMEM return code 
> as
> -ENOMEM and send it up the call chain which is wrong. Interestingly 
> enough
> vmap_page_range_noflush() tests for the absolute error return value 
> from
> vmap_p4d_range() but it does not help because -EBUSY has been merged 
> with
> -ENOMEM. So all it can return is -ENOMEM. Fix this by testing for 
> absolute
> error return from vmap_pmd_range() all the way up to vmap_p4d_range().

I could not find any real external caller of vmap API who really cares
about the errno, and frankly why they should?  This is allocation path,
allocation failed - game over.  When you step on -EBUSY case something
has gone completely wrong in your kernel, you get a big warning in
your dmesg and it is already does not matter what errno you get.

--
Roman
Anshuman Khandual June 13, 2019, 3:21 p.m. UTC | #2
On 06/13/2019 03:03 PM, Roman Penyaev wrote:
> On 2019-06-13 10:12, Anshuman Khandual wrote:
>> vmap_pte_range() returns an -EBUSY when it encounters a non-empty PTE. But
>> currently vmap_pmd_range() unifies both -EBUSY and -ENOMEM return code as
>> -ENOMEM and send it up the call chain which is wrong. Interestingly enough
>> vmap_page_range_noflush() tests for the absolute error return value from
>> vmap_p4d_range() but it does not help because -EBUSY has been merged with
>> -ENOMEM. So all it can return is -ENOMEM. Fix this by testing for absolute
>> error return from vmap_pmd_range() all the way up to vmap_p4d_range().
> 
> I could not find any real external caller of vmap API who really cares
> about the errno, and frankly why they should?  This is allocation path,

map_vm_area() which is an exported symbol suppose to provide the right
error code regardless whether it's current users care for it or not.

> allocation failed - game over.  When you step on -EBUSY case something
> has gone completely wrong in your kernel, you get a big warning in
> your dmesg and it is already does not matter what errno you get.

Its true that vmap_pte_range() does warn during error conditions. But if
we really dont care about error return code then we should just remove
specific error details (ENOMEM/EBUSY) and instead replace them with simple
boolean false/true or (0/1/-1) return values at each level. Will that be
acceptable ? What we have currently is wrong where vmap_pmd_range() could
just wrap EBUSY as ENOMEM and send up the call chain.
Matthew Wilcox June 13, 2019, 3:31 p.m. UTC | #3
On Thu, Jun 13, 2019 at 08:51:17PM +0530, Anshuman Khandual wrote:
> acceptable ? What we have currently is wrong where vmap_pmd_range() could
> just wrap EBUSY as ENOMEM and send up the call chain.

It's not wrong.  We do it in lots of places.  Unless there's a caller
which really needs to know the difference, it's often better than
returning the "real error".
Anshuman Khandual June 14, 2019, 5:27 a.m. UTC | #4
On 06/13/2019 09:01 PM, Matthew Wilcox wrote:
> On Thu, Jun 13, 2019 at 08:51:17PM +0530, Anshuman Khandual wrote:
>> acceptable ? What we have currently is wrong where vmap_pmd_range() could
>> just wrap EBUSY as ENOMEM and send up the call chain.
> 
> It's not wrong.  We do it in lots of places.  Unless there's a caller
> which really needs to know the difference, it's often better than
> returning the "real error".

I can understand the fact that because there are no active users of this
return code, the current situation has been alright. But then I fail to
understand how can EBUSY be made ENOMEM and let the caller to think that
vmap_page_rage() failed because of lack of memory when it is clearly not
the case. It is really surprising how it can be acceptable inside kernel
(init_mm) page table functions which need to be thorough enough.
Matthew Wilcox June 14, 2019, 1:42 p.m. UTC | #5
On Fri, Jun 14, 2019 at 10:57:42AM +0530, Anshuman Khandual wrote:
> 
> 
> On 06/13/2019 09:01 PM, Matthew Wilcox wrote:
> > On Thu, Jun 13, 2019 at 08:51:17PM +0530, Anshuman Khandual wrote:
> >> acceptable ? What we have currently is wrong where vmap_pmd_range() could
> >> just wrap EBUSY as ENOMEM and send up the call chain.
> > 
> > It's not wrong.  We do it in lots of places.  Unless there's a caller
> > which really needs to know the difference, it's often better than
> > returning the "real error".
> 
> I can understand the fact that because there are no active users of this
> return code, the current situation has been alright. But then I fail to
> understand how can EBUSY be made ENOMEM and let the caller to think that
> vmap_page_rage() failed because of lack of memory when it is clearly not
> the case. It is really surprising how it can be acceptable inside kernel
> (init_mm) page table functions which need to be thorough enough.

It's a corollary of Steinbach's Guideline for Systems Programming.
There is no possible way to handle this error because this error is
never supposed to happen.  So we may as well return a different error
that will still lead to the caller doing the right thing.
diff mbox series

Patch

diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index 7350a124524b..6c7dd8df23c3 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -165,14 +165,16 @@  static int vmap_pmd_range(pud_t *pud, unsigned long addr,
 {
 	pmd_t *pmd;
 	unsigned long next;
+	int err = 0;
 
 	pmd = pmd_alloc(&init_mm, pud, addr);
 	if (!pmd)
 		return -ENOMEM;
 	do {
 		next = pmd_addr_end(addr, end);
-		if (vmap_pte_range(pmd, addr, next, prot, pages, nr))
-			return -ENOMEM;
+		err = vmap_pte_range(pmd, addr, next, prot, pages, nr);
+		if (err)
+			return err;
 	} while (pmd++, addr = next, addr != end);
 	return 0;
 }
@@ -182,14 +184,16 @@  static int vmap_pud_range(p4d_t *p4d, unsigned long addr,
 {
 	pud_t *pud;
 	unsigned long next;
+	int err = 0;
 
 	pud = pud_alloc(&init_mm, p4d, addr);
 	if (!pud)
 		return -ENOMEM;
 	do {
 		next = pud_addr_end(addr, end);
-		if (vmap_pmd_range(pud, addr, next, prot, pages, nr))
-			return -ENOMEM;
+		err = vmap_pmd_range(pud, addr, next, prot, pages, nr);
+		if (err)
+			return err;
 	} while (pud++, addr = next, addr != end);
 	return 0;
 }
@@ -199,14 +203,16 @@  static int vmap_p4d_range(pgd_t *pgd, unsigned long addr,
 {
 	p4d_t *p4d;
 	unsigned long next;
+	int err = 0;
 
 	p4d = p4d_alloc(&init_mm, pgd, addr);
 	if (!p4d)
 		return -ENOMEM;
 	do {
 		next = p4d_addr_end(addr, end);
-		if (vmap_pud_range(p4d, addr, next, prot, pages, nr))
-			return -ENOMEM;
+		err = vmap_pud_range(p4d, addr, next, prot, pages, nr);
+		if (err)
+			return err;
 	} while (p4d++, addr = next, addr != end);
 	return 0;
 }