diff mbox series

module: Replace kmap() with kmap_local_page()

Message ID 20220718002645.28817-1-fmdefrancesco@gmail.com (mailing list archive)
State New, archived
Headers show
Series module: Replace kmap() with kmap_local_page() | expand

Commit Message

Fabio M. De Francesco July 18, 2022, 12:26 a.m. UTC
kmap() is being deprecated in favor of kmap_local_page().

Two main problems with kmap(): (1) It comes with an overhead as mapping
space is restricted and protected by a global lock for synchronization and
(2) it also requires global TLB invalidation when the kmap’s pool wraps
and it might block when the mapping space is fully utilized until a slot
becomes available.

With kmap_local_page() the mappings are per thread, CPU local, can take
page faults, and can be called from any context (including interrupts).
It is faster than kmap() in kernels with HIGHMEM enabled. Its use in
module_gzip_decompress() and in module_xz_decompress() is safe and
should be preferred.

Therefore, replace kmap() with kmap_local_page().

Tested on a QEMU/KVM x86_32 VM with 4GB RAM, booting kernels with
HIGHMEM64GB enabled. Modules compressed with XZ or GZIP decompress
properly.

Suggested-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Fabio M. De Francesco <fmdefrancesco@gmail.com>
---
 kernel/module/decompress.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

Comments

Luis Chamberlain July 18, 2022, 10:19 p.m. UTC | #1
On Mon, Jul 18, 2022 at 02:26:45AM +0200, Fabio M. De Francesco wrote:
> kmap() is being deprecated in favor of kmap_local_page().
> 
> Two main problems with kmap(): (1) It comes with an overhead as mapping
> space is restricted and protected by a global lock for synchronization and
> (2) it also requires global TLB invalidation when the kmap’s pool wraps
> and it might block when the mapping space is fully utilized until a slot
> becomes available.

Neat!

> With kmap_local_page() the mappings are per thread, CPU local, can take
> page faults, and can be called from any context (including interrupts).

Yes but the mapping can only be accessed from within this thread and the
thread is bound to the CPU while the mapping is active. So, what if we
get a preemption during decompression here? What happens?

> It is faster than kmap() in kernels with HIGHMEM enabled. Its use in
> module_gzip_decompress() and in module_xz_decompress() is safe and
> should be preferred.
> 
> Therefore, replace kmap() with kmap_local_page().

While this churn is going on everywhere I was wondering why not
go ahead and adopt kmap_local_folio() instead?

  Luis

> Tested on a QEMU/KVM x86_32 VM with 4GB RAM, booting kernels with
> HIGHMEM64GB enabled. Modules compressed with XZ or GZIP decompress
> properly.
> 
> Suggested-by: Ira Weiny <ira.weiny@intel.com>
> Signed-off-by: Fabio M. De Francesco <fmdefrancesco@gmail.com>
> ---
>  kernel/module/decompress.c | 8 ++++----
>  1 file changed, 4 insertions(+), 4 deletions(-)
> 
> diff --git a/kernel/module/decompress.c b/kernel/module/decompress.c
> index 2fc7081dd7c1..4d0bcb3d9e44 100644
> --- a/kernel/module/decompress.c
> +++ b/kernel/module/decompress.c
> @@ -119,10 +119,10 @@ static ssize_t module_gzip_decompress(struct load_info *info,
>  			goto out_inflate_end;
>  		}
>  
> -		s.next_out = kmap(page);
> +		s.next_out = kmap_local_page(page);
>  		s.avail_out = PAGE_SIZE;
>  		rc = zlib_inflate(&s, 0);
> -		kunmap(page);
> +		kunmap_local(s.next_out);
>  
>  		new_size += PAGE_SIZE - s.avail_out;
>  	} while (rc == Z_OK);
> @@ -178,11 +178,11 @@ static ssize_t module_xz_decompress(struct load_info *info,
>  			goto out;
>  		}
>  
> -		xz_buf.out = kmap(page);
> +		xz_buf.out = kmap_local_page(page);
>  		xz_buf.out_pos = 0;
>  		xz_buf.out_size = PAGE_SIZE;
>  		xz_ret = xz_dec_run(xz_dec, &xz_buf);
> -		kunmap(page);
> +		kunmap_local(xz_buf.out);
>  
>  		new_size += xz_buf.out_pos;
>  	} while (xz_buf.out_pos == PAGE_SIZE && xz_ret == XZ_OK);
> -- 
> 2.37.1
>
Fabio M. De Francesco July 19, 2022, 9:19 a.m. UTC | #2
On Tuesday, 19 July 2022 at 00:19:50 CEST Luis Chamberlain wrote:
> On Mon, Jul 18, 2022 at 02:26:45AM +0200, Fabio M. De Francesco wrote:
> > kmap() is being deprecated in favor of kmap_local_page().
> > 
> > Two main problems with kmap(): (1) It comes with an overhead as mapping
> > space is restricted and protected by a global lock for synchronization and
> > (2) it also requires global TLB invalidation when the kmap’s pool wraps
> > and it might block when the mapping space is fully utilized until a slot
> > becomes available.
> 
> Neat!
> 

Thanks!

> > With kmap_local_page() the mappings are per thread, CPU local, can take
> > page faults, and can be called from any context (including interrupts).
> 
> Yes but the mapping can only be accessed from within this thread and the
> thread is bound to the CPU while the mapping is active. So, what if we
> get a preemption during decompression here? What happens?

No problems at all; please read commit f3ba3c710ac5 ("mm/highmem: Provide 
kmap_local*") from Thomas Gleixner:

"[] Now that the kmap atomic index is stored in task struct provide a
preemptible variant. On context switch the maps of an outgoing task are
removed and the map of the incoming task are restored. That's obviously
slow, but highmem is slow anyway.
   
The kmap_local.*() functions can be invoked from both preemptible and
atomic context. kmap local sections disable migration to keep the resulting
virtual mapping address correct, but disable neither pagefaults nor
preemption.".
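
The semantics described in that commit can be illustrated with a short
kernel-style sketch (this is an illustration only, not a standalone program;
`page`, `src`, and the surrounding context are assumed):

```c
/* Sketch: kmap_local_page() under preemption, assuming 'page' and 'src' exist. */
void *addr = kmap_local_page(page);	/* disables migration, not preemption */

/*
 * The task may be preempted here. On the context switch the outgoing
 * task's local mappings are torn down, and they are restored when the
 * task runs again, so 'addr' remains valid for this thread across the
 * reschedule.
 */
memcpy(addr, src, PAGE_SIZE);		/* page faults are allowed here */

kunmap_local(addr);			/* nested mappings unmap in reverse order */
```

This is why zlib_inflate() and xz_dec_run(), which may run for a while per
page, are safe between kmap_local_page() and kunmap_local() even on a
preemptible kernel.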

However, yours is a good question. It made me realize that this specific 
topic is not discussed properly in highmem.rst (which also embeds kdocs 
from highmem.h and highmem-internal.h).

Some weeks ago I reworked highmem.rst and, within a couple of days, I'll 
also send a second round of additions and clean-ups to the documentation of 
kmap_local_page(). 

I'll use this opportunity to explain more clearly what happens to the local 
mappings when a task is preempted and then resumed. Currently this 
documentation is not clear about the fact that tasks which use pointers 
returned by kmap_local_page() _can_ be preempted and that these pointers 
remain valid when the tasks are rescheduled to run again.

In the meantime, I'll submit a v2 of this patch with a slightly extended 
commit message which informs about this specific topic.

Again, thanks for pointing out that the commit message of this patch 
should be clearer about what happens with preemption.

> > It is faster than kmap() in kernels with HIGHMEM enabled. Its use in
> > module_gzip_decompress() and in module_xz_decompress() is safe and
> > should be preferred.
> > 
> > Therefore, replace kmap() with kmap_local_page().
> 
> While this churn is going on everywhere I was wondering why not
> go ahead and adopt kmap_local_folio() instead?

I'm sorry but, due to my lack of knowledge and experience, I'm not sure I 
understand how kmap_local_folio() could help here. My fault. I'm going to 
do some research and ask for help from more experienced developers. 

>   Luis
> 
> > Tested on a QEMU/KVM x86_32 VM with 4GB RAM, booting kernels with
> > HIGHMEM64GB enabled. Modules compressed with XZ or GZIP decompress
> > properly.
> > 
> > Suggested-by: Ira Weiny <ira.weiny@intel.com>
> > Signed-off-by: Fabio M. De Francesco <fmdefrancesco@gmail.com>
> > ---
> >  kernel/module/decompress.c | 8 ++++----
> >  1 file changed, 4 insertions(+), 4 deletions(-)

[snip]

Thanks,

Fabio
Matthew Wilcox (Oracle) July 19, 2022, 8:23 p.m. UTC | #3
On Tue, Jul 19, 2022 at 11:19:24AM +0200, Fabio M. De Francesco wrote:
> On Tuesday, 19 July 2022 at 00:19:50 CEST Luis Chamberlain wrote:
> > > Therefore, replace kmap() with kmap_local_page().
> > 
> > While this churn is going on everywhere I was wondering why not
> > go ahead and adopt kmap_local_folio() instead?
> 
> I'm sorry but, due to my lack of knowledge and experience, I'm not sure I 
> understand how kmap_local_folio() could help here. My fault. I'm going to 
> do some research and ask for help from more experienced developers. 

I haven't made this suggestion to Fabio before for a few reasons.

First, it makes his work harder.  He not only has to understand the
implications of the kmap semantic changes but also the implications of
the folio change.

Then, I'm not sure that I necessarily have enough infrastructure in place
for doing a folio conversion everywhere that he's doing a kmap/kmap_atomic
to kmap_local_page conversion.

What makes it particularly tricky is that you can only kmap a single
page out of a folio at a time; there's no ability to kmap the entire
folio, no matter how large it is.  I've looked at doing the conversion
for ext2 directories, and it's _tricky_.  There's only one 'checked'
flag for the entire folio, but ext2_check_page() needs to take a mapped
page.  So now we have to make a judgement call about whether to support
caching ext2 directories with large folios or whether to restrict them
to single-page folios.

So yes, there's probably a second patch coming for maintainers to look
at that will convert the kmap_local_page() to kmap_local_folio().
However, I think it's actually less of a burden for maintainers if
these two different conversions happen separately because there are very
different considerations to review.  Also, there's no equivalent to kmap()
or kmap_atomic() for folios (deliberately), so the more conversions to
kmap_local_page() Fabio gets done, the easier it will be for a later
folio conversion.
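
For reference, a later folio conversion of these loops would have to map the
folio one page-sized chunk at a time, since kmap_local_folio() takes a byte
offset and maps only the page containing it. A rough sketch (illustrative
only; the loop body and the assumption that the decompressor fills a large
folio are hypothetical):

```c
/* Sketch: walking a (possibly large) folio one PAGE_SIZE chunk at a time. */
size_t offset;

for (offset = 0; offset < folio_size(folio); offset += PAGE_SIZE) {
	void *addr = kmap_local_folio(folio, offset);

	/* ... decompress up to PAGE_SIZE bytes into addr ... */

	kunmap_local(addr);
}
```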
Luis Chamberlain July 19, 2022, 9:35 p.m. UTC | #4
On Tue, Jul 19, 2022 at 09:23:49PM +0100, Matthew Wilcox wrote:
> On Tue, Jul 19, 2022 at 11:19:24AM +0200, Fabio M. De Francesco wrote:
> > On Tuesday, 19 July 2022 at 00:19:50 CEST Luis Chamberlain wrote:
> > > > Therefore, replace kmap() with kmap_local_page().
> > > 
> > > While this churn is going on everywhere I was wondering why not
> > > go ahead and adopt kmap_local_folio() instead?
> > 
> > I'm sorry but, due to my lack of knowledge and experience, I'm not sure I 
> > understand how kmap_local_folio() could help here. My fault. I'm going to 
> > do some research and ask for help from more experienced developers. 
> 
> I haven't made this suggestion to Fabio before for a few reasons.
> 
> First, it makes his work harder.  He not only has to understand the
> implications of the kmap semantic changes but also the implications of
> the folio change.
> 
> Then, I'm not sure that I necessarily have enough infrastructure in place
> for doing a folio conversion everywhere that he's doing a kmap/kmap_atomic
> to kmap_local_page conversion.
> 
> What makes it particularly tricky is that you can only kmap a single
> page out of a folio at a time; there's no ability to kmap the entire
> folio, no matter how large it is.  I've looked at doing the conversion
> for ext2 directories, and it's _tricky_.  There's only one 'checked'
> flag for the entire folio, but ext2_check_page() needs to take a mapped
> page.  So now we have to make a judgement call about whether to support
> caching ext2 directories with large folios or whether to restrict them
> to single-page folios.
> 
> So yes, there's probably a second patch coming for maintainers to look
> at that will convert the kmap_local_page() to kmap_local_folio().
> However, I think it's actually less of a burden for maintainers if
> these two different conversions happen separately because there are very
> different considerations to review.  Also, there's no equivalent to kmap()
> or kmap_atomic() for folios (deliberately), so the more conversions to
> kmap_local_page() Fabio gets done, the easier it will be for a later
> folio conversion.

Makes sense, thanks for the feedback. I'll wrestle with ensuring the first
step to kmap_local_page() doesn't break things where I see them
suggested first.

  Luis

Patch

diff --git a/kernel/module/decompress.c b/kernel/module/decompress.c
index 2fc7081dd7c1..4d0bcb3d9e44 100644
--- a/kernel/module/decompress.c
+++ b/kernel/module/decompress.c
@@ -119,10 +119,10 @@  static ssize_t module_gzip_decompress(struct load_info *info,
 			goto out_inflate_end;
 		}
 
-		s.next_out = kmap(page);
+		s.next_out = kmap_local_page(page);
 		s.avail_out = PAGE_SIZE;
 		rc = zlib_inflate(&s, 0);
-		kunmap(page);
+		kunmap_local(s.next_out);
 
 		new_size += PAGE_SIZE - s.avail_out;
 	} while (rc == Z_OK);
@@ -178,11 +178,11 @@  static ssize_t module_xz_decompress(struct load_info *info,
 			goto out;
 		}
 
-		xz_buf.out = kmap(page);
+		xz_buf.out = kmap_local_page(page);
 		xz_buf.out_pos = 0;
 		xz_buf.out_size = PAGE_SIZE;
 		xz_ret = xz_dec_run(xz_dec, &xz_buf);
-		kunmap(page);
+		kunmap_local(xz_buf.out);
 
 		new_size += xz_buf.out_pos;
 	} while (xz_buf.out_pos == PAGE_SIZE && xz_ret == XZ_OK);