diff mbox

[-V3,0/8] migrate_pages(): fix several bugs in error path

Message ID 20220817081408.513338-1-ying.huang@intel.com (mailing list archive)
State New
Headers show

Commit Message

Huang, Ying Aug. 17, 2022, 8:14 a.m. UTC
From: "Huang, Ying" <ying.huang@intel.com>

During review the code of migrate_pages() and build a test program for
it.  Several bugs in error path are identified and fixed in this
series.

Most patches are tested via

- Apply error-inject.patch in Linux kernel
- Compile test-migrate.c (with -lnuma)
- Test with test-migrate.sh

error-inject.patch, test-migrate.c, and test-migrate.sh are as below.
It turns out that error injection is an important tool to fix bugs in
error path.

Changes:

v3:

- Rebased on mm-unstable (20220816)

- Added Baolin's patch to avoid retry 10 times for fail to migrate THP subpages

v2:

- Rebased on v5.19-rc5

- Addressed some comments from Baolin, Thanks!

- Added reviewed-by tags

Best Regards,
Huang, Ying

------------------------- error-inject.patch -------------------------
From 295ea21204f3f025a041fe39c68a2eaec8313c68 Mon Sep 17 00:00:00 2001
From: Huang Ying <ying.huang@intel.com>
Date: Tue, 21 Jun 2022 11:08:30 +0800
Subject: [PATCH] migrate_pages: error inject

---
 mm/migrate.c | 58 +++++++++++++++++++++++++++++++++++++++++++++++++---
 1 file changed, 55 insertions(+), 3 deletions(-)

Comments

Andrew Morton Aug. 17, 2022, 3:48 p.m. UTC | #1
On Wed, 17 Aug 2022 16:14:00 +0800 Huang Ying <ying.huang@intel.com> wrote:

> error-inject.patch, test-migrate.c, and test-migrate.sh are as below.
> It turns out that error injection is an important tool to fix bugs in
> error path.

Indeed, and thanks for doing this.

Did you consider lib/*inject*.c?  If so, was it unsuitable?
Huang, Ying Aug. 18, 2022, 12:51 a.m. UTC | #2
Andrew Morton <akpm@linux-foundation.org> writes:

> On Wed, 17 Aug 2022 16:14:00 +0800 Huang Ying <ying.huang@intel.com> wrote:
>
>> error-inject.patch, test-migrate.c, and test-migrate.sh are as below.
>> It turns out that error injection is an important tool to fix bugs in
>> error path.
>
> Indeed, and thanks for doing this.
>
> Did you consider lib/*inject*.c?  If so, was it unsuitable?

I haven't take a deep look at that.  Will do that.

Best Regards,
Huang, Ying
Huang, Ying Sept. 20, 2022, 2:43 a.m. UTC | #3
Andrew Morton <akpm@linux-foundation.org> writes:

> On Wed, 17 Aug 2022 16:14:00 +0800 Huang Ying <ying.huang@intel.com> wrote:
>
>> error-inject.patch, test-migrate.c, and test-migrate.sh are as below.
>> It turns out that error injection is an important tool to fix bugs in
>> error path.
>
> Indeed, and thanks for doing this.
>
> Did you consider lib/*inject*.c?  If so, was it unsuitable?

I have done some experiments to use some existing error injection
mechanisms in kernel to test the error path of migrate_pages().  After
some googling, I found that the BPF based error injection described in
the following URL is most suitable for my purpose.

https://lwn.net/Articles/740146/

Because the BPF seems quite flexible to satisfy various requirements of
error injection.  With it, the execution of some functions can be
skipped and some specified error code can be returned instead.

Works out of box
================

Some error injection functionality just works out of box.  For example,
inject some page allocation error in some path.  Firstly,
CONFIG_BPF_KPROBE_OVERRIDE needs to be enabled in kernel config.  Then,
a simple bpftrace script as follows can be used to inject page
allocation error during migrate_pages().

--------------------ENOMEM-----------------------
kprobe:migrate_pages { @in_migrate_pages++; }
kretprobe:migrate_pages  { @in_migrate_pages--; }
kprobe:should_fail_alloc_page / @in_migrate_pages > 0 / {
	override(1);
}
-------------------------------------------------

The call chain of error injection is specified via the first 2 lines.  I
copied the methods used in BCC inject script.  Is there any better
method to specify the call chain?

We can inject error only for THP page allocation too.

--------------------ENOMEM THP-------------------
kprobe:migrate_pages { @in_migrate_pages++; }
kretprobe:migrate_pages  { @in_migrate_pages--; }
kprobe:should_fail_alloc_page / @in_migrate_pages > 0 && arg1 == 9 / {
	override(1);
}
-------------------------------------------------

Use some hack to override any function
======================================

The in-kernel BPF based error injection mechanism can only override
function return value for the functions in the whitelist, that is,
functions marked with ALLOW_ERROR_INJECTION().  That is quite limited.
The thorough error path testing needs to override the return value of
arbitrary function.  So, I use a simple hack patch as follows for that.

-----------------------8<---------------------------------
From 3bcd401a3731bc7316d222501070a2a71fdf7170 Mon Sep 17 00:00:00 2001
From: Huang Ying <ying.huang@intel.com>
Date: Tue, 20 Sep 2022 09:08:25 +0800
Subject: [PATCH] dbg: allow override any function with bpf_error_injection

---
 lib/error-inject.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/lib/error-inject.c b/lib/error-inject.c
index 1afca1b1cdea..82a402e0f15c 100644
--- a/lib/error-inject.c
+++ b/lib/error-inject.c
@@ -21,6 +21,7 @@ struct ei_entry {
 	void *priv;
 };
 
+#if 0
 bool within_error_injection_list(unsigned long addr)
 {
 	struct ei_entry *ent;
@@ -36,6 +37,12 @@ bool within_error_injection_list(unsigned long addr)
 	mutex_unlock(&ei_mutex);
 	return ret;
 }
+#else
+bool within_error_injection_list(unsigned long addr)
+{
+	return true;
+}
+#endif
 
 int get_injectable_error_type(unsigned long addr)
 {
Yang Shi Sept. 21, 2022, 12:52 a.m. UTC | #4
On Mon, Sep 19, 2022 at 7:44 PM Huang, Ying <ying.huang@intel.com> wrote:
>
> Andrew Morton <akpm@linux-foundation.org> writes:
>
> > On Wed, 17 Aug 2022 16:14:00 +0800 Huang Ying <ying.huang@intel.com> wrote:
> >
> >> error-inject.patch, test-migrate.c, and test-migrate.sh are as below.
> >> It turns out that error injection is an important tool to fix bugs in
> >> error path.
> >
> > Indeed, and thanks for doing this.
> >
> > Did you consider lib/*inject*.c?  If so, was it unsuitable?
>

Thanks, Ying. Great tips!

> I have done some experiments to use some existing error injection
> mechanisms in kernel to test the error path of migrate_pages().  After
> some googling, I found that the BPF based error injection described in
> the following URL is most suitable for my purpose.
>
> https://lwn.net/Articles/740146/
>
> Because the BPF seems quite flexible to satisfy various requirements of
> error injection.  With it, the execution of some functions can be
> skipped and some specified error code can be returned instead.
>
> Works out of box
> ================
>
> Some error injection functionality just works out of box.  For example,
> inject some page allocation error in some path.  Firstly,
> CONFIG_BPF_KPROBE_OVERRIDE needs to be enabled in kernel config.  Then,
> a simple bpftrace script as follows can be used to inject page
> allocation error during migrate_pages().
>
> --------------------ENOMEM-----------------------
> kprobe:migrate_pages { @in_migrate_pages++; }
> kretprobe:migrate_pages  { @in_migrate_pages--; }
> kprobe:should_fail_alloc_page / @in_migrate_pages > 0 / {
>         override(1);
> }
> -------------------------------------------------
>
> The call chain of error injection is specified via the first 2 lines.  I
> copied the methods used in BCC inject script.  Is there any better
> method to specify the call chain?
>
> We can inject error only for THP page allocation too.
>
> --------------------ENOMEM THP-------------------
> kprobe:migrate_pages { @in_migrate_pages++; }
> kretprobe:migrate_pages  { @in_migrate_pages--; }
> kprobe:should_fail_alloc_page / @in_migrate_pages > 0 && arg1 == 9 / {
>         override(1);
> }
> -------------------------------------------------
>
> Use some hack to override any function
> ======================================
>
> The in-kernel BPF based error injection mechanism can only override
> function return value for the functions in the whitelist, that is,
> functions marked with ALLOW_ERROR_INJECTION().  That is quite limited.
> The thorough error path testing needs to override the return value of
> arbitrary function.  So, I use a simple hack patch as follows for that.
>
> -----------------------8<---------------------------------
> From 3bcd401a3731bc7316d222501070a2a71fdf7170 Mon Sep 17 00:00:00 2001
> From: Huang Ying <ying.huang@intel.com>
> Date: Tue, 20 Sep 2022 09:08:25 +0800
> Subject: [PATCH] dbg: allow override any function with bpf_error_injection
>
> ---
>  lib/error-inject.c | 7 +++++++
>  1 file changed, 7 insertions(+)
>
> diff --git a/lib/error-inject.c b/lib/error-inject.c
> index 1afca1b1cdea..82a402e0f15c 100644
> --- a/lib/error-inject.c
> +++ b/lib/error-inject.c
> @@ -21,6 +21,7 @@ struct ei_entry {
>         void *priv;
>  };
>
> +#if 0
>  bool within_error_injection_list(unsigned long addr)
>  {
>         struct ei_entry *ent;
> @@ -36,6 +37,12 @@ bool within_error_injection_list(unsigned long addr)
>         mutex_unlock(&ei_mutex);
>         return ret;
>  }
> +#else
> +bool within_error_injection_list(unsigned long addr)
> +{
> +       return true;
> +}
> +#endif
>
>  int get_injectable_error_type(unsigned long addr)
>  {
> --
> 2.35.1
> ----------------------------------------------------------
>
> With this debug patch, most error path can be tested.  For example,
>
> --------------------ENOSYS THP + EAGAIN----------
> #include <linux/mm.h>
> kprobe:migrate_pages { @in_migrate_pages++; }
> kretprobe:migrate_pages  { @in_migrate_pages--; }
> kprobe:unmap_and_move / @in_migrate_pages > 0 / {
>         if (((struct page *)arg3)->flags & (1 << PG_head)) {
>                 override(-38);
>         } else {
>                 override(-11);
>         }
> }
> -------------------------------------------------
>
> With this, unmap_and_move() will return -ENOSYS (-38) for THP, and
> -EAGAIN (-11) for normal page.  This can be used to test the
> corresponding error path in migrate_pages().
>
> I think that it's quite common for developers to inject error for
> arbitrary function to test the error path.  Is it a good idea to turn on
> the arbitrary error injection if a special kernel configuration
> (e.g. CONFIG_BPF_KPROBE_OVERRIDE_ANY_FUNCTION) is enabled for debugging
> purpose only?
>
> Some hacks are still necessary for complete coverage
> ====================================================
>
> Even if we can override the return value of any function.  Some hacks
> are still necessary for complete coverage.  For example, some functions
> may be inlined, if we want to override its return value, we need to mark
> it with "noinline".  And some error cannot be injected with return value
> overridden directly.  For example, if we want to test when THP split
> isn't allowed condition in migrate_pages().  Then, some hack patch need
> to be used to do that.  For example, the below patch can do that.
>
> -----------------------8<---------------------------------
> From ca371806dc7f96148cbdf03fdf8f92309306a325 Mon Sep 17 00:00:00 2001
> From: Huang Ying <ying.huang@intel.com>
> Date: Tue, 20 Sep 2022 09:37:53 +0800
> Subject: [PATCH] dbg: inject nosplit
>
> ---
>  mm/migrate.c | 8 ++++++++
>  1 file changed, 8 insertions(+)
>
> diff --git a/mm/migrate.c b/mm/migrate.c
> index 571d8c9fd5bc..d4ee76c285b2 100644
> --- a/mm/migrate.c
> +++ b/mm/migrate.c
> @@ -57,6 +57,11 @@
>
>  #include "internal.h"
>
> +static noinline bool error_inject_nosplit(void)
> +{
> +       return false;
> +}
> +
>  int isolate_movable_page(struct page *page, isolate_mode_t mode)
>  {
>         const struct movable_operations *mops;
> @@ -1412,6 +1417,9 @@ int migrate_pages(struct list_head *from, new_page_t get_new_page,
>         bool nosplit = (reason == MR_NUMA_MISPLACED);
>         bool no_subpage_counting = false;
>
> +       if (error_inject_nosplit())
> +               nosplit = true;
> +
>         trace_mm_migrate_pages_start(mode, reason);
>
>  thp_subpage_migration:
> --
> 2.35.1
> ----------------------------------------------------------
>
> With the help of the above patch, the following bpftrace script can
> inject the expected error,
>
> --------------------ENOMEM THP + nosplit---------
> kprobe:migrate_pages { @in_migrate_pages++; }
> kretprobe:migrate_pages  { @in_migrate_pages--; }
> kprobe:should_fail_alloc_page / @in_migrate_pages > 0 && arg1 == 9 / {
>         override(1);
> }
> kprobe:error_inject_nosplit / @in_migrate_pages > 0 / {
>         override(1);
> }
> -------------------------------------------------
>
> Although some hack patches are needed.  This is still simpler than my
> original hand-made error injection solution.  So I will recommend
> developers to use it in the error path testing in the future.
>
> Best Regards,
> Huang, Ying
diff mbox

Patch

diff --git a/mm/migrate.c b/mm/migrate.c
index 399904015d23..87d47064ec6c 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -337,6 +337,42 @@  void pmd_migration_entry_wait(struct mm_struct *mm, pmd_t *pmd)
 }
 #endif
 
+#define EI_MP_ENOSYS		0x0001
+#define EI_MP_THP_ENOMEM	0x0002
+#define EI_MP_NP_ENOMEM		0x0004
+#define EI_MP_EAGAIN		0x0008
+#define EI_MP_EOTHER		0x0010
+#define EI_MP_NOSPLIT		0x0020
+#define EI_MP_SPLIT_FAIL	0x0040
+#define EI_MP_EAGAIN_PERM	0x0080
+#define EI_MP_EBUSY		0x0100
+
+static unsigned int ei_migrate_pages;
+
+module_param(ei_migrate_pages, uint, 0644);
+
+static bool ei_thp_migration_supported(void)
+{
+	if (ei_migrate_pages & EI_MP_ENOSYS)
+		return false;
+	else
+		return thp_migration_supported();
+}
+
+static int ei_trylock_page(struct page *page)
+{
+	if (ei_migrate_pages & EI_MP_EAGAIN)
+		return 0;
+	return trylock_page(page);
+}
+
+static int ei_split_huge_page_to_list(struct page *page, struct list_head *list)
+{
+	if (ei_migrate_pages & EI_MP_SPLIT_FAIL)
+		return -EBUSY;
+	return split_huge_page_to_list(page, list);
+}
+
 static int expected_page_refs(struct address_space *mapping, struct page *page)
 {
 	int expected_count = 1;
@@ -368,6 +404,9 @@  int folio_migrate_mapping(struct address_space *mapping,
 		if (folio_ref_count(folio) != expected_count)
 			return -EAGAIN;
 
+		if (ei_migrate_pages & EI_MP_EAGAIN_PERM)
+			return -EAGAIN;
+
 		/* No turning back from here */
 		newfolio->index = folio->index;
 		newfolio->mapping = folio->mapping;
@@ -929,7 +968,7 @@  static int __unmap_and_move(struct page *page, struct page *newpage,
 	struct anon_vma *anon_vma = NULL;
 	bool is_lru = !__PageMovable(page);
 
-	if (!trylock_page(page)) {
+	if (!ei_trylock_page(page)) {
 		if (!force || mode == MIGRATE_ASYNC)
 			goto out;
 
@@ -952,6 +991,11 @@  static int __unmap_and_move(struct page *page, struct page *newpage,
 		lock_page(page);
 	}
 
+	if (ei_migrate_pages & EI_MP_EBUSY) {
+		rc = -EBUSY;
+		goto out_unlock;
+	}
+
 	if (PageWriteback(page)) {
 		/*
 		 * Only in the case of a full synchronous migration is it
@@ -1086,7 +1130,7 @@  static int unmap_and_move(new_page_t get_new_page,
 	int rc = MIGRATEPAGE_SUCCESS;
 	struct page *newpage = NULL;
 
-	if (!thp_migration_supported() && PageTransHuge(page))
+	if (!ei_thp_migration_supported() && PageTransHuge(page))
 		return -ENOSYS;
 
 	if (page_count(page) == 1) {
@@ -1102,6 +1146,11 @@  static int unmap_and_move(new_page_t get_new_page,
 		goto out;
 	}
 
+	if ((ei_migrate_pages & EI_MP_THP_ENOMEM) && PageTransHuge(page))
+		return -ENOMEM;
+	if ((ei_migrate_pages & EI_MP_NP_ENOMEM) && !PageTransHuge(page))
+		return -ENOMEM;
+
 	newpage = get_new_page(page, private);
 	if (!newpage)
 		return -ENOMEM;
@@ -1305,7 +1354,7 @@  static inline int try_split_thp(struct page *page, struct list_head *split_pages
 	int rc;
 
 	lock_page(page);
-	rc = split_huge_page_to_list(page, split_pages);
+	rc = ei_split_huge_page_to_list(page, split_pages);
 	unlock_page(page);
 
 	return rc;
@@ -1358,6 +1407,9 @@  int migrate_pages(struct list_head *from, new_page_t get_new_page,
 	bool nosplit = (reason == MR_NUMA_MISPLACED);
 	bool no_subpage_counting = false;
 
+	if (ei_migrate_pages & EI_MP_NOSPLIT)
+		nosplit = true;
+
 	trace_mm_migrate_pages_start(mode, reason);
 
 thp_subpage_migration: