
mm: fix excessive kswapd pressure after wrong condition conversion

Message ID 20180531193420.26087-1-ikalvachev@gmail.com (mailing list archive)
State New, archived

Commit Message

Ivan Kalvachev May 31, 2018, 7:34 p.m. UTC
Fixes commit 69d763fc6d3aee787a3e8c8c35092b4f4960fa5d
(mm: pin address_space before dereferencing it while isolating an LRU page)

working code:

    mapping = page_mapping(page);
    if (mapping && !mapping->a_ops->migratepage)
        return ret;

buggy code:

    if (!trylock_page(page))
        return ret;

    mapping = page_mapping(page);
    migrate_dirty = mapping && mapping->a_ops->migratepage;
    unlock_page(page);
    if (!migrate_dirty)
        return ret;

The problem is that the new code skips the page when !(a && b),
i.e. (!a || !b), whereas the old code skipped it only when (a && !b).
The commit message of the buggy commit explains the need for
locking/unlocking around the check but gives no reason for changing
the condition itself. It seems to be an unintended change.
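
To make the inversion concrete, here is a minimal standalone C sketch
(plain booleans stand in for the real struct page and address_space;
an illustration, not kernel code):

    #include <stdbool.h>
    #include <stdio.h>

    /* mapping: the page has an address_space; has_migratepage:
     * a_ops->migratepage is non-NULL. Each helper returns true
     * when the page would be skipped (not isolated). */
    static bool old_skips(bool mapping, bool has_migratepage)
    {
        return mapping && !has_migratepage;     /* a && !b */
    }

    static bool buggy_skips(bool mapping, bool has_migratepage)
    {
        bool migrate_dirty = mapping && has_migratepage;
        return !migrate_dirty;                  /* !a || !b */
    }

    int main(void)
    {
        /* A page with no mapping at all, e.g. an ex-swap page. */
        printf("old=%d buggy=%d\n",
               old_skips(false, false), buggy_skips(false, false));
        return 0;
    }

This prints "old=0 buggy=1": the old code would isolate such a page,
while the buggy code refuses to, so async-mode compaction can no
longer migrate exactly the kind of pages a swap-heavy workload
produces.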

The result of that change is noticeable under swap pressure.
Big memory consumers like browsers would have a lot of pages swapped
out, even pages that are being actively used, causing the process to
repeatedly block for a second or longer. At the same time there would
be gigabytes of unused free memory (sometimes half of the total RAM),
and the buffers/cache would shrink to their minimum size.

Fixes: 69d763fc6d3a ("mm: pin address_space before dereferencing it while isolating an LRU page")
Signed-off-by: Ivan Kalvachev <ikalvachev@gmail.com>
---
 mm/vmscan.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

Comments

Greg Thelen May 31, 2018, 7:51 p.m. UTC | #1
On Thu, May 31, 2018 at 12:34 PM Ivan Kalvachev <ikalvachev@gmail.com> wrote:
>
> [...]

This looks like yesterday's https://lkml.org/lkml/2018/5/30/1158
Ivan Kalvachev May 31, 2018, 9:39 p.m. UTC | #2
On 5/31/18, Greg Thelen <gthelen@google.com> wrote:
> On Thu, May 31, 2018 at 12:34 PM Ivan Kalvachev <ikalvachev@gmail.com>
> wrote:
>> [...]
>
> This looks like yesterday's https://lkml.org/lkml/2018/5/30/1158
>

Yes, it seems to be the same problem.
It also has a better technical description.

Such a letdown.
It took me so much time to bisect the issue...

Well, I hope that the fix will get into the 4.17 release in time.


Hugh Dickins May 31, 2018, 11:30 p.m. UTC | #3
On Fri, 1 Jun 2018, Ivan Kalvachev wrote:
> On 5/31/18, Greg Thelen <gthelen@google.com> wrote:
> > On Thu, May 31, 2018 at 12:34 PM Ivan Kalvachev <ikalvachev@gmail.com>
> > wrote:
> >>
> >> [...]
> >>
> >> The result of that change is noticeable under swap pressure.
> >> Big memory consumers like browsers would have a lot of pages swapped
> >> out, even pages that are being actively used, causing the process to
> >> repeatedly block for a second or longer. At the same time there would
> >> be gigabytes of unused free memory (sometimes half of the total RAM),
> >> and the buffers/cache would shrink to their minimum size.
> >>
> >> [...]
> >
> > This looks like yesterday's https://lkml.org/lkml/2018/5/30/1158
> >
> 
> Yes, it seems to be the same problem.
> It also has a better technical description.

Well, your paragraph above on "Big memory consumers" gives a much
better user viewpoint, and a more urgent case for the patch to go in,
to stable if it does not make 4.17.0.

But I am surprised: the change is in a block of code only used in
one of the modes of compaction (not in reclaim itself), and I thought
it was a mode which gives up quite easily, rather than visibly blocking.

So I wonder if there's another issue to be improved here,
and the mistreatment of the ex-swap pages just exposed it somehow.
Cc'ing Vlastimil and David in case it triggers any insight from them.

> 
> Such a letdown.
> It took me so much time to bisect the issue...

Thank you for all your work on it, odd how we found it at the same
time: I was just porting Mel's patch into another tree, had to make
a change near there, and suddenly noticed that the test was wrong.

Hugh

> 
> Well, I hope that the fix will get into the 4.17 release in time.
Vlastimil Babka June 1, 2018, 8:49 a.m. UTC | #4
On 06/01/2018 01:30 AM, Hugh Dickins wrote:
> On Fri, 1 Jun 2018, Ivan Kalvachev wrote:
>> On 5/31/18, Greg Thelen <gthelen@google.com> wrote:
>>>
>>> This looks like yesterday's https://lkml.org/lkml/2018/5/30/1158
>>>
>>
>> Yes, it seems to be the same problem.
>> It also has a better technical description.
> 
> Well, your paragraph above on "Big memory consumers" gives a much
> better user viewpoint, and a more urgent case for the patch to go in,
> to stable if it does not make 4.17.0.
> 
> But I am surprised: the change is in a block of code only used in
> one of the modes of compaction (not in reclaim itself), and I thought
> it was a mode which gives up quite easily, rather than visibly blocking. 
> 
> So I wonder if there's another issue to be improved here,
> and the mistreatment of the ex-swap pages just exposed it somehow.
> Cc'ing Vlastimil and David in case it triggers any insight from them.

My guess is that the problem is that compaction fails because of the
isolation failures, causing further reclaim/compaction attempts at
higher priority, in the context of non-costly and thus non-failing
allocations. Initially I thought that the increased priority would
eventually make compaction synchronous and thus no longer go through
this block of code. But (see isolate_migratepages()) only the
MIGRATE_SYNC compaction mode drops the ISOLATE_ASYNC_MIGRATE
isolate_mode flag, and MIGRATE_SYNC is only used for compaction
triggered via /proc - direct compaction stops at MIGRATE_SYNC_LIGHT.
Maybe that could be changed? Mel had reasons to limit it to
SYNC_LIGHT, I guess...

If the above is correct, it means that even with gigabytes of free
memory you can fail an order-3 (max non-costly order) allocation if
compaction doesn't work properly. That's a bit surprising, but not
impossible, I guess...
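
To illustrate the mode gating (a standalone model, not kernel source;
the enum mirrors the kernel's migrate_mode names, but the helper
function is hypothetical):

    #include <stdio.h>

    /* Escalation order used by compaction. */
    enum migrate_mode { MIGRATE_ASYNC, MIGRATE_SYNC_LIGHT, MIGRATE_SYNC };

    /* Models the isolate_mode choice in isolate_migratepages():
     * ISOLATE_ASYNC_MIGRATE is dropped only for full MIGRATE_SYNC. */
    static int takes_async_migrate_path(enum migrate_mode mode)
    {
        return mode != MIGRATE_SYNC;
    }

    int main(void)
    {
        /* Direct compaction stops at SYNC_LIGHT, so it always takes
         * the path with the inverted check; only /proc-triggered
         * compaction reaches MIGRATE_SYNC. */
        printf("ASYNC=%d SYNC_LIGHT=%d SYNC=%d\n",
               takes_async_migrate_path(MIGRATE_ASYNC),
               takes_async_migrate_path(MIGRATE_SYNC_LIGHT),
               takes_async_migrate_path(MIGRATE_SYNC));
        return 0;
    }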

Vlastimil

>>
>> Such a letdown.
>> It took me so much time to bisect the issue...
> 
> Thank you for all your work on it, odd how we found it at the same
> time: I was just porting Mel's patch into another tree, had to make
> a change near there, and suddenly noticed that the test was wrong.
> 
> Hugh
> 
>>
>> Well, I hope that the fix will get into the 4.17 release in time.
>
Ivan Kalvachev June 11, 2018, 3:38 p.m. UTC | #5
On 6/1/18, Vlastimil Babka <vbabka@suse.cz> wrote:
> On 06/01/2018 01:30 AM, Hugh Dickins wrote:
>> On Fri, 1 Jun 2018, Ivan Kalvachev wrote:
>>> On 5/31/18, Greg Thelen <gthelen@google.com> wrote:
>>>>
>>>> This looks like yesterday's https://lkml.org/lkml/2018/5/30/1158
>>>>
>>>
>>> Yes, it seems to be the same problem.
>>> It also has a better technical description.
>>
>> Well, your paragraph above on "Big memory consumers" gives a much
>> better user viewpoint, and a more urgent case for the patch to go in,
>> to stable if it does not make 4.17.0.
>>
>> But I am surprised: the change is in a block of code only used in
>> one of the modes of compaction (not in reclaim itself), and I thought
>> it was a mode which gives up quite easily, rather than visibly blocking.
>>
>> So I wonder if there's another issue to be improved here,
>> and the mistreatment of the ex-swap pages just exposed it somehow.
>> Cc'ing Vlastimil and David in case it triggers any insight from them.
>
> My guess is that the problem is that compaction fails because of the
> isolation failures, causing further reclaim/compaction attempts at
> higher priority, in the context of non-costly and thus non-failing
> allocations. Initially I thought that the increased priority would
> eventually make compaction synchronous and thus no longer go through
> this block of code. But (see isolate_migratepages()) only the
> MIGRATE_SYNC compaction mode drops the ISOLATE_ASYNC_MIGRATE
> isolate_mode flag, and MIGRATE_SYNC is only used for compaction
> triggered via /proc - direct compaction stops at MIGRATE_SYNC_LIGHT.
> Maybe that could be changed? Mel had reasons to limit it to
> SYNC_LIGHT, I guess...
>
> If the above is correct, it means that even with gigabytes of free
> memory you can fail an order-3 (max non-costly order) allocation if
> compaction doesn't work properly. That's a bit surprising, but not
> impossible, I guess...

Is somebody working on testing this guess?

I don't fully understand this explanation; however, I cannot imagine
a non-costly allocation failing when there are gigabytes of free
(unused) memory.

That's why I still think it is possible that this bug is triggering
some underlying issue. So I did a little bit more poking around.

For clarity, I'll be referring to the commits as:
- the bug: 69d763fc6d3a ("mm: pin address_space before dereferencing
  it while isolating an LRU page")
- the fix: 145e1a71e090 ("mm: fix the NULL mapping case in __isolate_lru_page()")

The following results might be interesting to you:

1. I've discovered that 4.14.41 does not exhibit any problems, despite
having "the bug" backported into it. I used it again for a while, to
make sure I hadn't overlooked anything. No issues at all.

2. The 4.15 kernels were only briefly supported, so I backported "the
bug" myself and ran the kernel (first 4.15.18, later 4.15.0). At first
I thought that they were not affected, because I was not getting any
blocking during use. However, `top` showed that they also tend to
accumulate gigabytes of "free" RAM. Likely they were just better at
swapping out unused pages.

3. I've tried the original 4.16.13, which has "the bug" but not "the
fix"; however, this time I disabled "Transparent Hugepage Support"
in `make menuconfig`.
I ran that kernel for a while without any sign of issues.

So, before I start another round of bisecting:
does anybody have an educated guess as to what commit might have
introduced this behavior?

Do you think it is unintended behavior that should be investigated?

Any other hints?

Best Regards
   Ivan Kalvachev



Patch

diff --git a/mm/vmscan.c b/mm/vmscan.c
index 9b697323a88c..83df26078d13 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -1418,9 +1418,9 @@ int __isolate_lru_page(struct page *page, isolate_mode_t mode)
 				return ret;
 
 			mapping = page_mapping(page);
-			migrate_dirty = mapping && mapping->a_ops->migratepage;
+			migrate_dirty = mapping && !mapping->a_ops->migratepage;
 			unlock_page(page);
-			if (!migrate_dirty)
+			if (migrate_dirty)
 				return ret;
 		}
 	}