[RFC,2/2] kread: avoid duplicates

Message ID 20230414052840.1994456-3-mcgrof@kernel.org
State New
Series module: fix virtual memory wasted on finit_module()

Commit Message

Luis Chamberlain April 14, 2023, 5:28 a.m. UTC
With this we run into 0 wasted virtual memory bytes.

Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
---
 fs/kernel_read_file.c | 150 ++++++++++++++++++++++++++++++++++++++++++
 kernel/module/main.c  |   6 +-
 2 files changed, 154 insertions(+), 2 deletions(-)

Comments

Greg Kroah-Hartman April 14, 2023, 6:35 a.m. UTC | #1
On Thu, Apr 13, 2023 at 10:28:40PM -0700, Luis Chamberlain wrote:
> With this we run into 0 wasted virtual memory bytes.

This changelog does not make any sense at all, sorry.  What are you
doing here and why?

> 
> Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
> ---
>  fs/kernel_read_file.c | 150 ++++++++++++++++++++++++++++++++++++++++++
>  kernel/module/main.c  |   6 +-
>  2 files changed, 154 insertions(+), 2 deletions(-)
> 
> diff --git a/fs/kernel_read_file.c b/fs/kernel_read_file.c
> index 5d826274570c..209c56764442 100644
> --- a/fs/kernel_read_file.c
> +++ b/fs/kernel_read_file.c
> @@ -4,6 +4,7 @@
>  #include <linux/kernel_read_file.h>
>  #include <linux/security.h>
>  #include <linux/vmalloc.h>
> +#include <linux/fdtable.h>
>  
>  /**
>   * kernel_read_file() - read file contents into a kernel buffer
> @@ -171,17 +172,166 @@ ssize_t kernel_read_file_from_path_initns(const char *path, loff_t offset,
>  }
>  EXPORT_SYMBOL_GPL(kernel_read_file_from_path_initns);
>  
> +DEFINE_MUTEX(kread_dup_mutex);

static?

I stopped reading here :)

thanks,

greg k-h
Luis Chamberlain April 14, 2023, 4:35 p.m. UTC | #2
On Fri, Apr 14, 2023 at 08:35:53AM +0200, Greg KH wrote:
> On Thu, Apr 13, 2023 at 10:28:40PM -0700, Luis Chamberlain wrote:
> > With this we run into 0 wasted virtual memory bytes.
> 
> This changelog does not make any sense at all, sorry.  What are you
> doing here and why?

It's an RFC; the cover letter described the motivation and asked for
ideas for an alternative, which is why I was so terse on the
commit log.

  Luis
Christoph Hellwig April 16, 2023, 6:04 a.m. UTC | #3
On Thu, Apr 13, 2023 at 10:28:40PM -0700, Luis Chamberlain wrote:
> With this we run into 0 wasted virtual memory bytes.

Avoid what duplicates?
Luis Chamberlain April 16, 2023, 6:41 a.m. UTC | #4
On Sat, Apr 15, 2023 at 11:04:12PM -0700, Christoph Hellwig wrote:
> On Thu, Apr 13, 2023 at 10:28:40PM -0700, Luis Chamberlain wrote:
> > With this we run into 0 wasted virtual memory bytes.
> 
> Avoid what duplicates?

David Hildenbrand reported that with over 400 CPUs vmap space
runs out, and it seems it was related to module loading. I took a
look and confirmed it. Module loading ends up requiring in the
worst case 3 vmalloc allocations, so typically at least twice
the module size, and in the worst case add the decompressed
module size on top of that:

a) initial kernel_read*() call
b) optional module decompression
c) the actual module data copy we will keep

Duplicate module requests that come from userspace end up being thrown
in the trash bin, as only one module will be allocated.  Although there
are checks for a module prior to requesting it, udev still doesn't do
the best job of avoiding duplicates, and so we end up with tons of
duplicate module requests. We're talking about gigabytes of vmalloc
bytes just lost because of this for large systems and megabytes for
average systems. So for example with just 255 CPUs we can lose about
13.58 GiB, and for 8 CPUs about 226.53 MiB.

I have patches to curtail 1/2 of that space by doing a check in the
kernel, before we do the allocation in c), for whether the module is
already present. For a) it is harder because userspace just passes a
file descriptor. But since we can get the file path without the vmalloc,
this RFC suggests we add a new kernel_read*() for module loading where
it makes sense to have only one read happen at a time.
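
To make the duplicate cost concrete, here is a minimal userspace
sketch (the module path is made up, any built module will do) that
issues the same finit_module() twice; the second request only fails
with -EEXIST after the kernel has already paid the full kernel_read*()
vmalloc cost:

#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/syscall.h>
#include <unistd.h>

int main(void)
{
	/* hypothetical path, substitute any module on your system */
	const char *path = "/lib/modules/example/test.ko";
	int i;

	for (i = 0; i < 2; i++) {
		int fd = open(path, O_RDONLY);

		if (fd < 0) {
			perror("open");
			return 1;
		}
		/*
		 * Both calls run kernel_read_file_from_fd() in full,
		 * vmalloc included; only afterwards does the duplicate
		 * get tossed.
		 */
		if (syscall(SYS_finit_module, fd, "", 0) < 0)
			printf("load %d: %s\n", i, strerror(errno));
		close(fd);
	}
	return 0;
}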

  Luis
Greg Kroah-Hartman April 16, 2023, 12:50 p.m. UTC | #5
On Sat, Apr 15, 2023 at 11:41:28PM -0700, Luis Chamberlain wrote:
> On Sat, Apr 15, 2023 at 11:04:12PM -0700, Christoph Hellwig wrote:
> > On Thu, Apr 13, 2023 at 10:28:40PM -0700, Luis Chamberlain wrote:
> > > With this we run into 0 wasted virtual memory bytes.
> > 
> > Avoid what duplicates?
> 
> David Hildenbrand reported that with over 400 CPUs vmap space
> runs out, and it seems it was related to module loading. I took a
> look and confirmed it. Module loading ends up requiring in the
> worst case 3 vmalloc allocations, so typically at least twice
> the module size, and in the worst case add the decompressed
> module size on top of that:
> 
> a) initial kernel_read*() call
> b) optional module decompression
> c) the actual module data copy we will keep
> 
> Duplicate module requests that come from userspace end up being thrown
> in the trash bin, as only one module will be allocated.  Although there
> are checks for a module prior to requesting it, udev still doesn't do
> the best job of avoiding duplicates, and so we end up with tons of
> duplicate module requests. We're talking about gigabytes of vmalloc
> bytes just lost because of this for large systems and megabytes for
> average systems. So for example with just 255 CPUs we can lose about
> 13.58 GiB, and for 8 CPUs about 226.53 MiB.

How does the memory get "lost"?  Shouldn't it be properly freed when the
duplicate module load fails?

thanks,

greg k-h
Luis Chamberlain April 16, 2023, 6:46 p.m. UTC | #6
On Sun, Apr 16, 2023 at 02:50:01PM +0200, Greg KH wrote:
> On Sat, Apr 15, 2023 at 11:41:28PM -0700, Luis Chamberlain wrote:
> > On Sat, Apr 15, 2023 at 11:04:12PM -0700, Christoph Hellwig wrote:
> > > On Thu, Apr 13, 2023 at 10:28:40PM -0700, Luis Chamberlain wrote:
> > > > With this we run into 0 wasted virtual memory bytes.
> > > 
> > > Avoid what duplicates?
> > 
> > David Hildenbrand reported that with over 400 CPUs vmap space
> > runs out, and it seems it was related to module loading. I took a
> > look and confirmed it. Module loading ends up requiring in the
> > worst case 3 vmalloc allocations, so typically at least twice
> > the module size, and in the worst case add the decompressed
> > module size on top of that:
> > 
> > a) initial kernel_read*() call
> > b) optional module decompression
> > c) the actual module data copy we will keep
> > 
> > Duplicate module requests that come from userspace end up being thrown
> > in the trash bin, as only one module will be allocated.  Although there
> > are checks for a module prior to requesting it, udev still doesn't do
> > the best job of avoiding duplicates, and so we end up with tons of
> > duplicate module requests. We're talking about gigabytes of vmalloc
> > bytes just lost because of this for large systems and megabytes for
> > average systems. So for example with just 255 CPUs we can lose about
> > 13.58 GiB, and for 8 CPUs about 226.53 MiB.
> 
> How does the memory get "lost"?  Shouldn't it be properly freed when the
> duplicate module load fails?

Yes, the memory gets freed, but since virtual memory space can be
limited, it also means that as you have more CPUs you can eventually
get to the point where -ENOMEMs happen, you cannot use virtual memory
for other things during kernel bootup, and bootup fails. This is
apparently exacerbated with KASAN enabled.
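
If anyone wants to watch this, VmallocUsed in /proc/meminfo is enough
to see the spikes during module load storms; a trivial reader, just
for illustration:

#include <stdio.h>
#include <string.h>

int main(void)
{
	char line[256];
	FILE *f = fopen("/proc/meminfo", "r");

	if (!f)
		return 1;
	/* print only the vmalloc usage line */
	while (fgets(line, sizeof(line), f))
		if (!strncmp(line, "VmallocUsed:", 12))
			fputs(line, stdout);
	fclose(f);
	return 0;
}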

  Luis
Greg Kroah-Hartman April 17, 2023, 6:05 a.m. UTC | #7
On Sun, Apr 16, 2023 at 11:46:44AM -0700, Luis Chamberlain wrote:
> On Sun, Apr 16, 2023 at 02:50:01PM +0200, Greg KH wrote:
> > On Sat, Apr 15, 2023 at 11:41:28PM -0700, Luis Chamberlain wrote:
> > > On Sat, Apr 15, 2023 at 11:04:12PM -0700, Christoph Hellwig wrote:
> > > > On Thu, Apr 13, 2023 at 10:28:40PM -0700, Luis Chamberlain wrote:
> > > > > With this we run into 0 wasted virtual memory bytes.
> > > > 
> > > > Avoid what duplicates?
> > > 
> > > David Hildenbrand reported that with over 400 CPUs vmap space
> > > runs out, and it seems it was related to module loading. I took a
> > > look and confirmed it. Module loading ends up requiring in the
> > > worst case 3 vmalloc allocations, so typically at least twice
> > > the module size, and in the worst case add the decompressed
> > > module size on top of that:
> > > 
> > > a) initial kernel_read*() call
> > > b) optional module decompression
> > > c) the actual module data copy we will keep
> > > 
> > > Duplicate module requests that come from userspace end up being thrown
> > > in the trash bin, as only one module will be allocated.  Although there
> > > are checks for a module prior to requesting it, udev still doesn't do
> > > the best job of avoiding duplicates, and so we end up with tons of
> > > duplicate module requests. We're talking about gigabytes of vmalloc
> > > bytes just lost because of this for large systems and megabytes for
> > > average systems. So for example with just 255 CPUs we can lose about
> > > 13.58 GiB, and for 8 CPUs about 226.53 MiB.
> > 
> > How does the memory get "lost"?  Shouldn't it be properly freed when the
> > duplicate module load fails?
> 
> Yes, the memory gets freed, but since virtual memory space can be
> limited, it also means that as you have more CPUs you can eventually
> get to the point where -ENOMEMs happen, you cannot use virtual memory
> for other things during kernel bootup, and bootup fails. This is
> apparently exacerbated with KASAN enabled.

Then why not just rate-limit the module loader in userspace on such
large systems if that's an issue?  No kernel changes needed to do that.
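
For instance (just a sketch; the lock path and the renamed binary are
hypothetical), a tiny flock() wrapper in front of modprobe would
serialize the loads with no kernel changes at all:

#include <fcntl.h>
#include <stdio.h>
#include <sys/file.h>
#include <unistd.h>

int main(int argc, char *argv[])
{
	/* hold an exclusive lock for the duration of the real modprobe */
	int lock = open("/run/modprobe.lock", O_CREAT | O_RDWR, 0600);

	if (lock < 0 || flock(lock, LOCK_EX) < 0) {
		perror("modprobe wrapper lock");
		return 1;
	}
	/* the lock survives exec and is released when the process exits */
	argv[0] = "/sbin/modprobe.real";	/* hypothetical rename */
	execv(argv[0], argv);
	perror("execv");
	return 1;
}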

thanks,

greg k-h
Edgecombe, Rick P April 17, 2023, 5:33 p.m. UTC | #8
On Sat, 2023-04-15 at 23:41 -0700, Luis Chamberlain wrote:
> On Sat, Apr 15, 2023 at 11:04:12PM -0700, Christoph Hellwig wrote:
> > On Thu, Apr 13, 2023 at 10:28:40PM -0700, Luis Chamberlain wrote:
> > > With this we run into 0 wasted virtual memory bytes.
> > 
> > Avoid what duplicates?
> 
> David Hildenbrand reported that with over 400 CPUs vmap space
> runs out, and it seems it was related to module loading. I took a
> look and confirmed it. Module loading ends up requiring in the
> worst case 3 vmalloc allocations, so typically at least twice
> the module size, and in the worst case add the decompressed
> module size on top of that:
> 
> a) initial kernel_read*() call
> b) optional module decompression
> c) the actual module data copy we will keep
> 
> Duplicate module requests that come from userspace end up being thrown
> in the trash bin, as only one module will be allocated.  Although there
> are checks for a module prior to requesting it, udev still doesn't do
> the best job of avoiding duplicates, and so we end up with tons of
> duplicate module requests. We're talking about gigabytes of vmalloc
> bytes just lost because of this for large systems and megabytes for
> average systems. So for example with just 255 CPUs we can lose about
> 13.58 GiB, and for 8 CPUs about 226.53 MiB.
> 
> I have patches to curtail 1/2 of that space by doing a check in the
> kernel, before we do the allocation in c), for whether the module is
> already present. For a) it is harder because userspace just passes a
> file descriptor. But since we can get the file path without the vmalloc,
> this RFC suggests we add a new kernel_read*() for module loading where
> it makes sense to have only one read happen at a time.

I'm wondering how difficult it would be to just try to remove the
vmallocs in (a) and (b) and operate on a list of pages.

So the operations before module_patient_check_exists() are now:
1. decompressing (vmalloc)
2. signature check (vmalloc)
3. elf_validity_cache_copy()
4. early_mod_check() -> module_patient_check_exists()

Right? Then after that a bunch of arch code and other code outside of
modules operates on the vmalloc, so this other code would take a large
amount of changes to switch to a list of pages.

But did you consider teaching just 1-3 to operate on a list of pages?
And then move module_patient_check_exists() a little up in the order?
After module_patient_check_exists() you could vmap() the pages and hand
it off to the existing code located all over the place.

Then you can catch the duplicate requests before any vmalloc happens.
It also takes (a) and (b) down to one vmalloc even in the normal case,
as a side benefit. The changes to the signature check part might be
tricky though.
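
A rough sketch of that last step (hypothetical helper name, not actual
loader code): once the early checks pass on the page list and
module_patient_check_exists() says we are not a duplicate, a single
vmap() hands the rest of the loader the contiguous mapping it expects:

#include <linux/mm_types.h>
#include <linux/vmalloc.h>

/* map the pages we already read contiguously, only for the winner */
static void *module_map_pages(struct page **pages, unsigned int nr_pages)
{
	/* VM_MAP + PAGE_KERNEL matches what vmalloc() would have set up */
	return vmap(pages, nr_pages, VM_MAP, PAGE_KERNEL);
}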

Sorry if this idea is off, I've gotten a little confused as this series
has split into all these offshoot series.
Luis Chamberlain April 17, 2023, 10:05 p.m. UTC | #9
On Mon, Apr 17, 2023 at 08:05:31AM +0200, Greg KH wrote:
> On Sun, Apr 16, 2023 at 11:46:44AM -0700, Luis Chamberlain wrote:
> > On Sun, Apr 16, 2023 at 02:50:01PM +0200, Greg KH wrote:
> > > On Sat, Apr 15, 2023 at 11:41:28PM -0700, Luis Chamberlain wrote:
> > > > On Sat, Apr 15, 2023 at 11:04:12PM -0700, Christoph Hellwig wrote:
> > > > > On Thu, Apr 13, 2023 at 10:28:40PM -0700, Luis Chamberlain wrote:
> > > > > > With this we run into 0 wasted virtual memory bytes.
> > > > > 
> > > > > Avoid what duplicates?
> > > > 
> > > > David Hildenbrand reported that with over 400 CPUs vmap space
> > > > runs out, and it seems it was related to module loading. I took a
> > > > look and confirmed it. Module loading ends up requiring in the
> > > > worst case 3 vmalloc allocations, so typically at least twice
> > > > the module size, and in the worst case add the decompressed
> > > > module size on top of that:
> > > > 
> > > > a) initial kernel_read*() call
> > > > b) optional module decompression
> > > > c) the actual module data copy we will keep
> > > > 
> > > > Duplicate module requests that come from userspace end up being thrown
> > > > in the trash bin, as only one module will be allocated.  Although there
> > > > are checks for a module prior to requesting it, udev still doesn't do
> > > > the best job of avoiding duplicates, and so we end up with tons of
> > > > duplicate module requests. We're talking about gigabytes of vmalloc
> > > > bytes just lost because of this for large systems and megabytes for
> > > > average systems. So for example with just 255 CPUs we can lose about
> > > > 13.58 GiB, and for 8 CPUs about 226.53 MiB.
> > > 
> > > How does the memory get "lost"?  Shouldn't it be properly freed when the
> > > duplicate module load fails?
> > 
> > Yes, the memory gets freed, but since virtual memory space can be
> > limited, it also means that as you have more CPUs you can eventually
> > get to the point where -ENOMEMs happen, you cannot use virtual memory
> > for other things during kernel bootup, and bootup fails. This is
> > apparently exacerbated with KASAN enabled.
> 
> Then why not just rate-limit the module loader in userspace on such
> large systems if that's an issue?  No kernel changes needed to do that.

We can certainly just take a stance and punt this as a userspace
problem. I thought it would be good to see what a kernel-style
workaround would look like for us to evaluate.

  Luis
Luis Chamberlain April 17, 2023, 10:08 p.m. UTC | #10
On Mon, Apr 17, 2023 at 05:33:49PM +0000, Edgecombe, Rick P wrote:
> On Sat, 2023-04-15 at 23:41 -0700, Luis Chamberlain wrote:
> > On Sat, Apr 15, 2023 at 11:04:12PM -0700, Christoph Hellwig wrote:
> > > On Thu, Apr 13, 2023 at 10:28:40PM -0700, Luis Chamberlain wrote:
> > > > With this we run into 0 wasted virtual memory bytes.
> > > 
> > > Avoid what duplicates?
> > 
> > David Hildenbrand reported that with over 400 CPUs vmap space
> > runs out, and it seems it was related to module loading. I took a
> > look and confirmed it. Module loading ends up requiring in the
> > worst case 3 vmalloc allocations, so typically at least twice
> > the module size, and in the worst case add the decompressed
> > module size on top of that:
> > 
> > a) initial kernel_read*() call
> > b) optional module decompression
> > c) the actual module data copy we will keep
> > 
> > Duplicate module requests that come from userspace end up being thrown
> > in the trash bin, as only one module will be allocated.  Although there
> > are checks for a module prior to requesting it, udev still doesn't do
> > the best job of avoiding duplicates, and so we end up with tons of
> > duplicate module requests. We're talking about gigabytes of vmalloc
> > bytes just lost because of this for large systems and megabytes for
> > average systems. So for example with just 255 CPUs we can lose about
> > 13.58 GiB, and for 8 CPUs about 226.53 MiB.
> > 
> > I have patches to curtail 1/2 of that space by doing a check in the
> > kernel, before we do the allocation in c), for whether the module is
> > already present. For a) it is harder because userspace just passes a
> > file descriptor. But since we can get the file path without the vmalloc,
> > this RFC suggests we add a new kernel_read*() for module loading where
> > it makes sense to have only one read happen at a time.
> 
> I'm wondering how difficult it would be to just try to remove the
> vmallocs in (a) and (b) and operate on a list of pages.

Yes, I think it's worth doing that long term, if possible with
sequential reads.

  Luis
Luis Chamberlain April 18, 2023, 6:46 p.m. UTC | #11
On Mon, Apr 17, 2023 at 03:08:34PM -0700, Luis Chamberlain wrote:
> On Mon, Apr 17, 2023 at 05:33:49PM +0000, Edgecombe, Rick P wrote:
> > On Sat, 2023-04-15 at 23:41 -0700, Luis Chamberlain wrote:
> > > On Sat, Apr 15, 2023 at 11:04:12PM -0700, Christoph Hellwig wrote:
> > > > On Thu, Apr 13, 2023 at 10:28:40PM -0700, Luis Chamberlain wrote:
> > > > > With this we run into 0 wasted virtual memory bytes.
> > > > 
> > > > Avoid what duplicates?
> > > 
> > > David Hildenbrand reported that with over 400 CPUs vmap space
> > > runs out, and it seems it was related to module loading. I took a
> > > look and confirmed it. Module loading ends up requiring in the
> > > worst case 3 vmalloc allocations, so typically at least twice
> > > the module size, and in the worst case add the decompressed
> > > module size on top of that:
> > > 
> > > a) initial kernel_read*() call
> > > b) optional module decompression
> > > c) the actual module data copy we will keep
> > > 
> > > Duplicate module requests that come from userspace end up being thrown
> > > in the trash bin, as only one module will be allocated.  Although there
> > > are checks for a module prior to requesting it, udev still doesn't do
> > > the best job of avoiding duplicates, and so we end up with tons of
> > > duplicate module requests. We're talking about gigabytes of vmalloc
> > > bytes just lost because of this for large systems and megabytes for
> > > average systems. So for example with just 255 CPUs we can lose about
> > > 13.58 GiB, and for 8 CPUs about 226.53 MiB.
> > > 
> > > I have patches to curtail 1/2 of that space by doing a check in the
> > > kernel, before we do the allocation in c), for whether the module is
> > > already present. For a) it is harder because userspace just passes a
> > > file descriptor. But since we can get the file path without the vmalloc,
> > > this RFC suggests we add a new kernel_read*() for module loading where
> > > it makes sense to have only one read happen at a time.
> > 
> > I'm wondering how difficult it would be to just try to remove the
> > vmallocs in (a) and (b) and operate on a list of pages.
> 
> Yes, I think it's worth doing that long term, if possible with
> sequential reads.

OK here's what I suggest we do then:

I'll resubmit the first patch, which allows us to prove / disprove
whether module autoloading is the culprit. With that in place folks can
debug their setup and verify how udev is to blame.

I'll drop the second kernel_read*() patch / effort and punt this as a
userspace problem, as this is also not extremely pressing.

Long term we should evaluate how we can avoid vmalloc for the kread and
module decompression.

If this really becomes a pressing issue we can revisit whether we want
an in-kernel solution, but at this point that would likely be systems
with over 400-500 CPUs with KASAN enabled. Without KASAN the issue
should eventually trigger if you're enabling modules, but it's hard to
say at what point you'd hit it.

  Luis

Patch

diff --git a/fs/kernel_read_file.c b/fs/kernel_read_file.c
index 5d826274570c..209c56764442 100644
--- a/fs/kernel_read_file.c
+++ b/fs/kernel_read_file.c
@@ -4,6 +4,7 @@ 
 #include <linux/kernel_read_file.h>
 #include <linux/security.h>
 #include <linux/vmalloc.h>
+#include <linux/fdtable.h>
 
 /**
  * kernel_read_file() - read file contents into a kernel buffer
@@ -171,17 +172,166 @@  ssize_t kernel_read_file_from_path_initns(const char *path, loff_t offset,
 }
 EXPORT_SYMBOL_GPL(kernel_read_file_from_path_initns);
 
+DEFINE_MUTEX(kread_dup_mutex);
+static LIST_HEAD(kread_dup_reqs);
+
+struct kread_dup_req {
+	struct list_head list;
+	char name[PATH_MAX];
+	struct completion first_req_done;
+	struct work_struct complete_work;
+	struct delayed_work delete_work;
+	int dup_ret;
+};
+
+static struct kread_dup_req *kread_dup_request_lookup(char *name)
+{
+	struct kread_dup_req *kread_req;
+
+	list_for_each_entry_rcu(kread_req, &kread_dup_reqs, list,
+				lockdep_is_held(&kread_dup_mutex)) {
+		if (strlen(kread_req->name) == strlen(name) &&
+		    !memcmp(kread_req->name, name, strlen(name))) {
+			return kread_req;
+                }
+		}
+	}
+	return NULL;
+}
+
+static void kread_dup_request_delete(struct work_struct *work)
+{
+	struct kread_dup_req *kread_req;
+	kread_req = container_of(to_delayed_work(work), struct kread_dup_req, delete_work);
+
+	mutex_lock(&kread_dup_mutex);
+	list_del_rcu(&kread_req->list);
+	synchronize_rcu();
+	mutex_unlock(&kread_dup_mutex);
+	kfree(kread_req);
+}
+
+static void kread_dup_request_complete(struct work_struct *work)
+{
+	struct kread_dup_req *kread_req;
+
+	kread_req = container_of(work, struct kread_dup_req, complete_work);
+
+	complete_all(&kread_req->first_req_done);
+	queue_delayed_work(system_wq, &kread_req->delete_work, 60 * HZ);
+}
+
+static bool kread_dup_request_exists_wait(char *name, int *dup_ret)
+{
+	struct kread_dup_req *kread_req, *new_kread_req;
+	int ret;
+
+	/*
+	 * Pre-allocate the entry in case we have to use it later
+	 * to avoid contention with the mutex.
+	 */
+	new_kread_req = kzalloc(sizeof(*new_kread_req), GFP_KERNEL);
+	if (!new_kread_req)
+		return false;
+
+	memcpy(new_kread_req->name, name, strlen(name));
+	INIT_WORK(&new_kread_req->complete_work, kread_dup_request_complete);
+	INIT_DELAYED_WORK(&new_kread_req->delete_work, kread_dup_request_delete);
+	init_completion(&new_kread_req->first_req_done);
+
+	mutex_lock(&kread_dup_mutex);
+
+	kread_req = kread_dup_request_lookup(name);
+	if (!kread_req) {
+		/*
+		 * There was no duplicate, just add the request so we can
+		 * keep tab on duplicates later.
+		 */
+		//pr_info("New kread request for %s\n", name);
+		list_add_rcu(&new_kread_req->list, &kread_dup_reqs);
+		mutex_unlock(&kread_dup_mutex);
+		return false;
+	}
+	mutex_unlock(&kread_dup_mutex);
+
+	/* We are dealing with a duplicate request now */
+
+	kfree(new_kread_req);
+
+	//pr_warn("kread: duplicate request for file %s\n", name);
+
+	ret = wait_for_completion_state(&kread_req->first_req_done,
+					TASK_UNINTERRUPTIBLE | TASK_KILLABLE);
+	if (ret) {
+		*dup_ret = ret;
+		return true;
+	}
+
+	/* breathe */
+	schedule_timeout_uninterruptible(2 * HZ);
+
+	*dup_ret = kread_req->dup_ret;
+
+	return true;
+}
+
+void kread_dup_request_announce(char *name, int ret)
+{
+	struct kread_dup_req *kread_req;
+
+	mutex_lock(&kread_dup_mutex);
+
+	kread_req = kread_dup_request_lookup(name);
+	if (!kread_req)
+		goto out;
+
+	kread_req->dup_ret = ret;
+
+	/*
+	 * If we complete() here we may allow duplicate threads
+	 * to continue before the first one that submitted the
+	 * request. We're in no rush but avoid boot delays caused
+	 * by these threads waiting too long.
+	 */
+	queue_work(system_wq, &kread_req->complete_work);
+
+out:
+	mutex_unlock(&kread_dup_mutex);
+}
+
 ssize_t kernel_read_file_from_fd(int fd, loff_t offset, void **buf,
 				 size_t buf_size, size_t *file_size,
 				 enum kernel_read_file_id id)
 {
 	struct fd f = fdget(fd);
 	ssize_t ret = -EBADF;
+	char *name, *path;
+	int dup_ret;
 
 	if (!f.file || !(f.file->f_mode & FMODE_READ))
 		goto out;
 
+	path = kzalloc(PATH_MAX, GFP_KERNEL);
+	if (!path) {
+		ret = -ENOMEM;
+		goto out;
+	}
+
+	name = file_path(f.file, path, PATH_MAX);
+	if (IS_ERR(name)) {
+		ret = PTR_ERR(name);
+		goto out_mem;
+	}
+
+	if (kread_dup_request_exists_wait(name, &dup_ret)) {
+		ret = -EBUSY;
+		goto out_mem;
+	}
+
 	ret = kernel_read_file(f.file, offset, buf, buf_size, file_size, id);
+
+	kread_dup_request_announce(name, ret);
+
+out_mem:
+	kfree(path);
 out:
 	fdput(f);
 	return ret;
diff --git a/kernel/module/main.c b/kernel/module/main.c
index 1ed373145278..e99419b4d85c 100644
--- a/kernel/module/main.c
+++ b/kernel/module/main.c
@@ -3080,8 +3080,10 @@  SYSCALL_DEFINE3(finit_module, int, fd, const char __user *, uargs, int, flags)
 	len = kernel_read_file_from_fd(fd, 0, &buf, INT_MAX, NULL,
 				       READING_MODULE);
 	if (len < 0) {
-		mod_stat_inc(&failed_kreads);
-		mod_stat_add_long(len, &invalid_kread_bytes);
+		if (len != -EBUSY) {
+			mod_stat_inc(&failed_kreads);
+			mod_stat_add_long(len, &invalid_kread_bytes);
+		}
 		return len;
 	}