Message ID | 20240426190253.541419-1-zi.yan@sent.com (mailing list archive)
---|---
State | New
Series | [v5] mm/rmap: do not add fully unmapped large folio to deferred split list
On 26.04.24 21:02, Zi Yan wrote: > From: Zi Yan <ziy@nvidia.com> > > In __folio_remove_rmap(), a large folio is added to deferred split list > if any page in a folio loses its final mapping. But it is possible that > the folio is fully unmapped and adding it to deferred split list is > unnecessary. > > For PMD-mapped THPs, that was not really an issue, because removing the > last PMD mapping in the absence of PTE mappings would not have added the > folio to the deferred split queue. > > However, for PTE-mapped THPs, which are now more prominent due to mTHP, > they are always added to the deferred split queue. One side effect > is that the THP_DEFERRED_SPLIT_PAGE stat for a PTE-mapped folio can be > unintentionally increased, making it look like there are many partially > mapped folios -- although the whole folio is fully unmapped stepwise. > > Core-mm now tries batch-unmapping consecutive PTEs of PTE-mapped THPs > where possible starting from commit b06dc281aa99 ("mm/rmap: introduce > folio_remove_rmap_[pte|ptes|pmd]()"). When it happens, a whole PTE-mapped > folio is unmapped in one go and can avoid being added to deferred split > list, reducing the THP_DEFERRED_SPLIT_PAGE noise. But there will still be > noise when we cannot batch-unmap a complete PTE-mapped folio in one go > -- or where this type of batching is not implemented yet, e.g., migration. > > To avoid the unnecessary addition, folio->_nr_pages_mapped is checked > to tell if the whole folio is unmapped. If the folio is already on > deferred split list, it will be skipped, too. > > Note: commit 98046944a159 ("mm: huge_memory: add the missing > folio_test_pmd_mappable() for THP split statistics") tried to exclude > mTHP deferred split stats from THP_DEFERRED_SPLIT_PAGE, but it does not > fix the above issue. A fully unmapped PTE-mapped order-9 THP was still > added to deferred split list and counted as THP_DEFERRED_SPLIT_PAGE, > since nr is 512 (non zero), level is RMAP_LEVEL_PTE, and inside > deferred_split_folio() the order-9 folio is folio_test_pmd_mappable(). > > Suggested-by: David Hildenbrand <david@redhat.com> > Signed-off-by: Zi Yan <ziy@nvidia.com> > --- > mm/rmap.c | 12 +++++++++--- > 1 file changed, 9 insertions(+), 3 deletions(-) > > diff --git a/mm/rmap.c b/mm/rmap.c > index 2608c40dffad..a9bd64ebdd9a 100644 > --- a/mm/rmap.c > +++ b/mm/rmap.c > @@ -1495,6 +1495,7 @@ static __always_inline void __folio_remove_rmap(struct folio *folio, > { > atomic_t *mapped = &folio->_nr_pages_mapped; > int last, nr = 0, nr_pmdmapped = 0; > + bool partially_mapped = false; > enum node_stat_item idx; > > __folio_rmap_sanity_checks(folio, page, nr_pages, level); > @@ -1515,6 +1516,8 @@ static __always_inline void __folio_remove_rmap(struct folio *folio, > nr++; > } > } while (page++, --nr_pages > 0); > + > + partially_mapped = !!nr && !!atomic_read(mapped); Nit: The && should remove the need for both !!. Reviewed-by: David Hildenbrand <david@redhat.com>
On 26 Apr 2024, at 15:08, David Hildenbrand wrote: > On 26.04.24 21:02, Zi Yan wrote: >> From: Zi Yan <ziy@nvidia.com> >> >> In __folio_remove_rmap(), a large folio is added to deferred split list >> if any page in a folio loses its final mapping. But it is possible that >> the folio is fully unmapped and adding it to deferred split list is >> unnecessary. >> >> For PMD-mapped THPs, that was not really an issue, because removing the >> last PMD mapping in the absence of PTE mappings would not have added the >> folio to the deferred split queue. >> >> However, for PTE-mapped THPs, which are now more prominent due to mTHP, >> they are always added to the deferred split queue. One side effect >> is that the THP_DEFERRED_SPLIT_PAGE stat for a PTE-mapped folio can be >> unintentionally increased, making it look like there are many partially >> mapped folios -- although the whole folio is fully unmapped stepwise. >> >> Core-mm now tries batch-unmapping consecutive PTEs of PTE-mapped THPs >> where possible starting from commit b06dc281aa99 ("mm/rmap: introduce >> folio_remove_rmap_[pte|ptes|pmd]()"). When it happens, a whole PTE-mapped >> folio is unmapped in one go and can avoid being added to deferred split >> list, reducing the THP_DEFERRED_SPLIT_PAGE noise. But there will still be >> noise when we cannot batch-unmap a complete PTE-mapped folio in one go >> -- or where this type of batching is not implemented yet, e.g., migration. >> >> To avoid the unnecessary addition, folio->_nr_pages_mapped is checked >> to tell if the whole folio is unmapped. If the folio is already on >> deferred split list, it will be skipped, too. >> >> Note: commit 98046944a159 ("mm: huge_memory: add the missing >> folio_test_pmd_mappable() for THP split statistics") tried to exclude >> mTHP deferred split stats from THP_DEFERRED_SPLIT_PAGE, but it does not >> fix the above issue. A fully unmapped PTE-mapped order-9 THP was still >> added to deferred split list and counted as THP_DEFERRED_SPLIT_PAGE, >> since nr is 512 (non zero), level is RMAP_LEVEL_PTE, and inside >> deferred_split_folio() the order-9 folio is folio_test_pmd_mappable(). >> >> Suggested-by: David Hildenbrand <david@redhat.com> >> Signed-off-by: Zi Yan <ziy@nvidia.com> >> --- >> mm/rmap.c | 12 +++++++++--- >> 1 file changed, 9 insertions(+), 3 deletions(-) >> >> diff --git a/mm/rmap.c b/mm/rmap.c >> index 2608c40dffad..a9bd64ebdd9a 100644 >> --- a/mm/rmap.c >> +++ b/mm/rmap.c >> @@ -1495,6 +1495,7 @@ static __always_inline void __folio_remove_rmap(struct folio *folio, >> { >> atomic_t *mapped = &folio->_nr_pages_mapped; >> int last, nr = 0, nr_pmdmapped = 0; >> + bool partially_mapped = false; >> enum node_stat_item idx; >> __folio_rmap_sanity_checks(folio, page, nr_pages, level); >> @@ -1515,6 +1516,8 @@ static __always_inline void __folio_remove_rmap(struct folio *folio, >> nr++; >> } >> } while (page++, --nr_pages > 0); >> + >> + partially_mapped = !!nr && !!atomic_read(mapped); > > Nit: The && should remove the need for both !!. My impression was that !! is needed to convert from int to bool and I do find "!!int && !!int" use in the kernel. If this is unnecessary, Andrew can apply the fixup below. I can send a new version if it is really needed. 
diff --git a/mm/rmap.c b/mm/rmap.c
index a9bd64ebdd9a..c1fd5828409b 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -1517,7 +1517,7 @@ static __always_inline void __folio_remove_rmap(struct folio *folio,
 			}
 		} while (page++, --nr_pages > 0);
 
-		partially_mapped = !!nr && !!atomic_read(mapped);
+		partially_mapped = nr && atomic_read(mapped);
 		break;
 	case RMAP_LEVEL_PMD:
 		atomic_dec(&folio->_large_mapcount);

>
> Reviewed-by: David Hildenbrand <david@redhat.com>

Thanks.

--
Best Regards,
Yan, Zi
On 26.04.24 21:20, Zi Yan wrote: > On 26 Apr 2024, at 15:08, David Hildenbrand wrote: > >> On 26.04.24 21:02, Zi Yan wrote: >>> From: Zi Yan <ziy@nvidia.com> >>> >>> In __folio_remove_rmap(), a large folio is added to deferred split list >>> if any page in a folio loses its final mapping. But it is possible that >>> the folio is fully unmapped and adding it to deferred split list is >>> unnecessary. >>> >>> For PMD-mapped THPs, that was not really an issue, because removing the >>> last PMD mapping in the absence of PTE mappings would not have added the >>> folio to the deferred split queue. >>> >>> However, for PTE-mapped THPs, which are now more prominent due to mTHP, >>> they are always added to the deferred split queue. One side effect >>> is that the THP_DEFERRED_SPLIT_PAGE stat for a PTE-mapped folio can be >>> unintentionally increased, making it look like there are many partially >>> mapped folios -- although the whole folio is fully unmapped stepwise. >>> >>> Core-mm now tries batch-unmapping consecutive PTEs of PTE-mapped THPs >>> where possible starting from commit b06dc281aa99 ("mm/rmap: introduce >>> folio_remove_rmap_[pte|ptes|pmd]()"). When it happens, a whole PTE-mapped >>> folio is unmapped in one go and can avoid being added to deferred split >>> list, reducing the THP_DEFERRED_SPLIT_PAGE noise. But there will still be >>> noise when we cannot batch-unmap a complete PTE-mapped folio in one go >>> -- or where this type of batching is not implemented yet, e.g., migration. >>> >>> To avoid the unnecessary addition, folio->_nr_pages_mapped is checked >>> to tell if the whole folio is unmapped. If the folio is already on >>> deferred split list, it will be skipped, too. >>> >>> Note: commit 98046944a159 ("mm: huge_memory: add the missing >>> folio_test_pmd_mappable() for THP split statistics") tried to exclude >>> mTHP deferred split stats from THP_DEFERRED_SPLIT_PAGE, but it does not >>> fix the above issue. A fully unmapped PTE-mapped order-9 THP was still >>> added to deferred split list and counted as THP_DEFERRED_SPLIT_PAGE, >>> since nr is 512 (non zero), level is RMAP_LEVEL_PTE, and inside >>> deferred_split_folio() the order-9 folio is folio_test_pmd_mappable(). >>> >>> Suggested-by: David Hildenbrand <david@redhat.com> >>> Signed-off-by: Zi Yan <ziy@nvidia.com> >>> --- >>> mm/rmap.c | 12 +++++++++--- >>> 1 file changed, 9 insertions(+), 3 deletions(-) >>> >>> diff --git a/mm/rmap.c b/mm/rmap.c >>> index 2608c40dffad..a9bd64ebdd9a 100644 >>> --- a/mm/rmap.c >>> +++ b/mm/rmap.c >>> @@ -1495,6 +1495,7 @@ static __always_inline void __folio_remove_rmap(struct folio *folio, >>> { >>> atomic_t *mapped = &folio->_nr_pages_mapped; >>> int last, nr = 0, nr_pmdmapped = 0; >>> + bool partially_mapped = false; >>> enum node_stat_item idx; >>> __folio_rmap_sanity_checks(folio, page, nr_pages, level); >>> @@ -1515,6 +1516,8 @@ static __always_inline void __folio_remove_rmap(struct folio *folio, >>> nr++; >>> } >>> } while (page++, --nr_pages > 0); >>> + >>> + partially_mapped = !!nr && !!atomic_read(mapped); >> >> Nit: The && should remove the need for both !!. > > My impression was that !! is needed to convert from int to bool and I do > find "!!int && !!int" use in the kernel. I might be wrong about this, but if you wouldn't write if (!!nr && !!atomic_read(mapped)) then bool partially_mapped = nr && atomic_read(mapped); is sufficient. 
&& would make sure that the result is either 0 or 1, which
you can store safely in a bool, no matter which underlying type
is used to store that value.

But I *think* nowdays, the compiler will always handle that
correctly, even without the "&&" (ever since C99 added _Bool).

Likely, also

	bool partially_mapped = nr & atomic_read(mapped);

Would nowadays work, but looks stupid.


Related: https://lkml.org/lkml/2013/8/31/138

---
#include <stdio.h>
#include <stdbool.h>
#include <stdint.h>
#include <inttypes.h>

volatile uint64_t a = 0x8000000000000000ull;

void main (void) {
	printf("uint64_t a = a: 0x%" PRIx64 "\n", a);

	int i1 = a;
	printf("int i1 = a: %d\n", i1);

	int i2 = !!a;
	printf("int i2 = !!a: %d\n", i2);

	bool b1 = a;
	printf("bool b1 = a: %d\n", b1);

	bool b2 = !!a;
	printf("bool b2 = !!a: %d\n", b2);
}
---
$ ./test
uint64_t a = a: 0x8000000000000000
int i1 = a: 0
int i2 = !!a: 1
bool b1 = a: 1
bool b2 = !!a: 1
---

Note that if bool would be defined as "int", you would need the !!, otherwise you
would lose information.

But even for b1, the gcc generates now:

  40118c:	48 8b 05 7d 2e 00 00	mov    0x2e7d(%rip),%rax        # 404010 <a>
  401193:	48 85 c0            	test   %rax,%rax
  401196:	0f 95 c0            	setne  %al


My stdbool.h contains

#define bool _Bool

And I think C99 added _Bool that makes that work.

But I didn't read the standard, and it's time for the weekend :)
On 26 Apr 2024, at 16:15, David Hildenbrand wrote: > On 26.04.24 21:20, Zi Yan wrote: >> On 26 Apr 2024, at 15:08, David Hildenbrand wrote: >> >>> On 26.04.24 21:02, Zi Yan wrote: >>>> From: Zi Yan <ziy@nvidia.com> >>>> >>>> In __folio_remove_rmap(), a large folio is added to deferred split list >>>> if any page in a folio loses its final mapping. But it is possible that >>>> the folio is fully unmapped and adding it to deferred split list is >>>> unnecessary. >>>> >>>> For PMD-mapped THPs, that was not really an issue, because removing the >>>> last PMD mapping in the absence of PTE mappings would not have added the >>>> folio to the deferred split queue. >>>> >>>> However, for PTE-mapped THPs, which are now more prominent due to mTHP, >>>> they are always added to the deferred split queue. One side effect >>>> is that the THP_DEFERRED_SPLIT_PAGE stat for a PTE-mapped folio can be >>>> unintentionally increased, making it look like there are many partially >>>> mapped folios -- although the whole folio is fully unmapped stepwise. >>>> >>>> Core-mm now tries batch-unmapping consecutive PTEs of PTE-mapped THPs >>>> where possible starting from commit b06dc281aa99 ("mm/rmap: introduce >>>> folio_remove_rmap_[pte|ptes|pmd]()"). When it happens, a whole PTE-mapped >>>> folio is unmapped in one go and can avoid being added to deferred split >>>> list, reducing the THP_DEFERRED_SPLIT_PAGE noise. But there will still be >>>> noise when we cannot batch-unmap a complete PTE-mapped folio in one go >>>> -- or where this type of batching is not implemented yet, e.g., migration. >>>> >>>> To avoid the unnecessary addition, folio->_nr_pages_mapped is checked >>>> to tell if the whole folio is unmapped. If the folio is already on >>>> deferred split list, it will be skipped, too. >>>> >>>> Note: commit 98046944a159 ("mm: huge_memory: add the missing >>>> folio_test_pmd_mappable() for THP split statistics") tried to exclude >>>> mTHP deferred split stats from THP_DEFERRED_SPLIT_PAGE, but it does not >>>> fix the above issue. A fully unmapped PTE-mapped order-9 THP was still >>>> added to deferred split list and counted as THP_DEFERRED_SPLIT_PAGE, >>>> since nr is 512 (non zero), level is RMAP_LEVEL_PTE, and inside >>>> deferred_split_folio() the order-9 folio is folio_test_pmd_mappable(). >>>> >>>> Suggested-by: David Hildenbrand <david@redhat.com> >>>> Signed-off-by: Zi Yan <ziy@nvidia.com> >>>> --- >>>> mm/rmap.c | 12 +++++++++--- >>>> 1 file changed, 9 insertions(+), 3 deletions(-) >>>> >>>> diff --git a/mm/rmap.c b/mm/rmap.c >>>> index 2608c40dffad..a9bd64ebdd9a 100644 >>>> --- a/mm/rmap.c >>>> +++ b/mm/rmap.c >>>> @@ -1495,6 +1495,7 @@ static __always_inline void __folio_remove_rmap(struct folio *folio, >>>> { >>>> atomic_t *mapped = &folio->_nr_pages_mapped; >>>> int last, nr = 0, nr_pmdmapped = 0; >>>> + bool partially_mapped = false; >>>> enum node_stat_item idx; >>>> __folio_rmap_sanity_checks(folio, page, nr_pages, level); >>>> @@ -1515,6 +1516,8 @@ static __always_inline void __folio_remove_rmap(struct folio *folio, >>>> nr++; >>>> } >>>> } while (page++, --nr_pages > 0); >>>> + >>>> + partially_mapped = !!nr && !!atomic_read(mapped); >>> >>> Nit: The && should remove the need for both !!. >> >> My impression was that !! is needed to convert from int to bool and I do >> find "!!int && !!int" use in the kernel. > > I might be wrong about this, but if you wouldn't write > > if (!!nr && !!atomic_read(mapped)) > > then > > bool partially_mapped = nr && atomic_read(mapped); > > is sufficient. 
> > && would make sure that the result is either 0 or 1, which > you can store safely in a bool, no matter which underlying type > is used to store that value. > > But I *think* nowdays, the compiler will always handle that > correctly, even without the "&&" (ever since C99 added _Bool). > > Likely, also > > bool partially_mapped = nr & atomic_read(mapped); > > Would nowadays work, but looks stupid. > > > Related: https://lkml.org/lkml/2013/8/31/138 > > --- > #include <stdio.h> > #include <stdbool.h> > #include <stdint.h> > #include <inttypes.h> > > volatile uint64_t a = 0x8000000000000000ull; > > void main (void) { > printf("uint64_t a = a: 0x%" PRIx64 "\n", a); > > int i1 = a; > printf("int i1 = a: %d\n", i1); > > int i2 = !!a; > printf("int i2 = !!a: %d\n", i2); > > bool b1 = a; > printf("bool b1 = a: %d\n", b1); > > bool b2 = !!a; > printf("bool b2 = !!a: %d\n", b2); > } > --- > $ ./test > uint64_t a = a: 0x8000000000000000 > int i1 = a: 0 > int i2 = !!a: 1 > bool b1 = a: 1 > bool b2 = !!a: 1 > --- > > Note that if bool would be defined as "int", you would need the !!, otherwise you > would lose information. Thank you for all above. Really appreciate it! And you are right about && and !!. My memory on !! must be from the old days and now is refreshed. :) > > But even for b1, the gcc generates now: > > 40118c: 48 8b 05 7d 2e 00 00 mov 0x2e7d(%rip),%rax # 404010 <a> > 401193: 48 85 c0 test %rax,%rax > 401196: 0f 95 c0 setne %al > > > My stdbool.h contains > > #define bool _Bool > > And I think C99 added _Bool that makes that work. > > But I didn't read the standard, and it's time for the weekend :) Have a good weekend! -- Best Regards, Yan, Zi
On Fri, Apr 26, 2024 at 12:02 PM Zi Yan <zi.yan@sent.com> wrote: > > From: Zi Yan <ziy@nvidia.com> > > In __folio_remove_rmap(), a large folio is added to deferred split list > if any page in a folio loses its final mapping. But it is possible that > the folio is fully unmapped and adding it to deferred split list is > unnecessary. > > For PMD-mapped THPs, that was not really an issue, because removing the > last PMD mapping in the absence of PTE mappings would not have added the > folio to the deferred split queue. > > However, for PTE-mapped THPs, which are now more prominent due to mTHP, > they are always added to the deferred split queue. One side effect > is that the THP_DEFERRED_SPLIT_PAGE stat for a PTE-mapped folio can be > unintentionally increased, making it look like there are many partially > mapped folios -- although the whole folio is fully unmapped stepwise. > > Core-mm now tries batch-unmapping consecutive PTEs of PTE-mapped THPs > where possible starting from commit b06dc281aa99 ("mm/rmap: introduce > folio_remove_rmap_[pte|ptes|pmd]()"). When it happens, a whole PTE-mapped > folio is unmapped in one go and can avoid being added to deferred split > list, reducing the THP_DEFERRED_SPLIT_PAGE noise. But there will still be > noise when we cannot batch-unmap a complete PTE-mapped folio in one go > -- or where this type of batching is not implemented yet, e.g., migration. > > To avoid the unnecessary addition, folio->_nr_pages_mapped is checked > to tell if the whole folio is unmapped. If the folio is already on > deferred split list, it will be skipped, too. > > Note: commit 98046944a159 ("mm: huge_memory: add the missing > folio_test_pmd_mappable() for THP split statistics") tried to exclude > mTHP deferred split stats from THP_DEFERRED_SPLIT_PAGE, but it does not > fix the above issue. A fully unmapped PTE-mapped order-9 THP was still > added to deferred split list and counted as THP_DEFERRED_SPLIT_PAGE, > since nr is 512 (non zero), level is RMAP_LEVEL_PTE, and inside > deferred_split_folio() the order-9 folio is folio_test_pmd_mappable(). > > Suggested-by: David Hildenbrand <david@redhat.com> > Signed-off-by: Zi Yan <ziy@nvidia.com> Reviewed-by: Yang Shi <shy828301@gmail.com> > --- > mm/rmap.c | 12 +++++++++--- > 1 file changed, 9 insertions(+), 3 deletions(-) > > diff --git a/mm/rmap.c b/mm/rmap.c > index 2608c40dffad..a9bd64ebdd9a 100644 > --- a/mm/rmap.c > +++ b/mm/rmap.c > @@ -1495,6 +1495,7 @@ static __always_inline void __folio_remove_rmap(struct folio *folio, > { > atomic_t *mapped = &folio->_nr_pages_mapped; > int last, nr = 0, nr_pmdmapped = 0; > + bool partially_mapped = false; > enum node_stat_item idx; > > __folio_rmap_sanity_checks(folio, page, nr_pages, level); > @@ -1515,6 +1516,8 @@ static __always_inline void __folio_remove_rmap(struct folio *folio, > nr++; > } > } while (page++, --nr_pages > 0); > + > + partially_mapped = !!nr && !!atomic_read(mapped); > break; > case RMAP_LEVEL_PMD: > atomic_dec(&folio->_large_mapcount); > @@ -1532,6 +1535,8 @@ static __always_inline void __folio_remove_rmap(struct folio *folio, > nr = 0; > } > } > + > + partially_mapped = nr < nr_pmdmapped; > break; > } > > @@ -1553,9 +1558,10 @@ static __always_inline void __folio_remove_rmap(struct folio *folio, > * page of the folio is unmapped and at least one page > * is still mapped. 
> */ > - if (folio_test_large(folio) && folio_test_anon(folio)) > - if (level == RMAP_LEVEL_PTE || nr < nr_pmdmapped) > - deferred_split_folio(folio); > + if (folio_test_anon(folio) && > + list_empty(&folio->_deferred_list) && > + partially_mapped) > + deferred_split_folio(folio); > } > > /* > > base-commit: 3dba658670af22074cc6f26dc92efe0013ac3359 > -- > 2.43.0 >
On Sat, Apr 27, 2024 at 4:16 AM David Hildenbrand <david@redhat.com> wrote: > > On 26.04.24 21:20, Zi Yan wrote: > > On 26 Apr 2024, at 15:08, David Hildenbrand wrote: [...] > >>> + bool partially_mapped = false; [...] > >>> + > >>> + partially_mapped = !!nr && !!atomic_read(mapped); > >> > >> Nit: The && should remove the need for both !!. > > > > My impression was that !! is needed to convert from int to bool and I do > > find "!!int && !!int" use in the kernel. > > I might be wrong about this, but if you wouldn't write I think you're correct. > > if (!!nr && !!atomic_read(mapped)) > > then > > bool partially_mapped = nr && atomic_read(mapped); > > is sufficient. +1 > > && would make sure that the result is either 0 or 1, which > you can store safely in a bool, no matter which underlying type > is used to store that value. > > But I *think* nowdays, the compiler will always handle that > correctly, even without the "&&" (ever since C99 added _Bool). > > Likely, also > > bool partially_mapped = nr & atomic_read(mapped); > > Would nowadays work, but looks stupid. > > > Related: https://lkml.org/lkml/2013/8/31/138 > > --- > #include <stdio.h> > #include <stdbool.h> > #include <stdint.h> > #include <inttypes.h> > > volatile uint64_t a = 0x8000000000000000ull; > > void main (void) { > printf("uint64_t a = a: 0x%" PRIx64 "\n", a); > > int i1 = a; > printf("int i1 = a: %d\n", i1); > > int i2 = !!a; > printf("int i2 = !!a: %d\n", i2); > > bool b1 = a; > printf("bool b1 = a: %d\n", b1); > > bool b2 = !!a; > printf("bool b2 = !!a: %d\n", b2); > } > --- > $ ./test > uint64_t a = a: 0x8000000000000000 > int i1 = a: 0 > int i2 = !!a: 1 > bool b1 = a: 1 > bool b2 = !!a: 1 > --- > > Note that if bool would be defined as "int", you would need the !!, otherwise you > would lose information. Agreed. We need to be careful in this case. > > But even for b1, the gcc generates now: > > 40118c: 48 8b 05 7d 2e 00 00 mov 0x2e7d(%rip),%rax # 404010 <a> > 401193: 48 85 c0 test %rax,%rax > 401196: 0f 95 c0 setne %al > > > My stdbool.h contains > > #define bool _Bool > > And I think C99 added _Bool that makes that work. > > But I didn't read the standard, and it's time for the weekend :) I just read the C99 and found some interesting information as follows: 6.3.1.2 Boolean type When any *scalar value* is converted to _Bool, the result is 0 if the value compares equal to 0; otherwise, the result is 1. 6.2.5 Types 21. Arithmetic types and pointer types are collectively called *scalar types*. Array and structure types are collectively called aggregate types. 6.5.13 Logical AND operator Semantics The && operator shall yield 1 if both of its operands compare unequal to 0; otherwise, it yields 0. The result has type int. 6.5.10 Bitwise AND operator Constraints Each of the operands shall have integer type. Semantics The result of the binary & operator is the bitwise AND of the operands (that is, each bit in the result is set if and only if each of the corresponding bits in the converted operands is set). && would ensure that the result is either 0 or 1, as David said, so no worries. We defined partially_mapped as a bool(_Bool). IIUC, "partially_mapped = int & int;" would work correctly as well. However, "partially_mapped = long & int;" might not. Using && would be nicer as David suggested :p Thanks, Lance > > -- > Cheers, > > David / dhildenb >
On Sat, Apr 27, 2024 at 3:02 AM Zi Yan <zi.yan@sent.com> wrote: > > From: Zi Yan <ziy@nvidia.com> > > In __folio_remove_rmap(), a large folio is added to deferred split list > if any page in a folio loses its final mapping. But it is possible that > the folio is fully unmapped and adding it to deferred split list is > unnecessary. > > For PMD-mapped THPs, that was not really an issue, because removing the > last PMD mapping in the absence of PTE mappings would not have added the > folio to the deferred split queue. > > However, for PTE-mapped THPs, which are now more prominent due to mTHP, > they are always added to the deferred split queue. One side effect > is that the THP_DEFERRED_SPLIT_PAGE stat for a PTE-mapped folio can be > unintentionally increased, making it look like there are many partially > mapped folios -- although the whole folio is fully unmapped stepwise. > > Core-mm now tries batch-unmapping consecutive PTEs of PTE-mapped THPs > where possible starting from commit b06dc281aa99 ("mm/rmap: introduce > folio_remove_rmap_[pte|ptes|pmd]()"). When it happens, a whole PTE-mapped > folio is unmapped in one go and can avoid being added to deferred split > list, reducing the THP_DEFERRED_SPLIT_PAGE noise. But there will still be > noise when we cannot batch-unmap a complete PTE-mapped folio in one go > -- or where this type of batching is not implemented yet, e.g., migration. > > To avoid the unnecessary addition, folio->_nr_pages_mapped is checked > to tell if the whole folio is unmapped. If the folio is already on > deferred split list, it will be skipped, too. > > Note: commit 98046944a159 ("mm: huge_memory: add the missing > folio_test_pmd_mappable() for THP split statistics") tried to exclude > mTHP deferred split stats from THP_DEFERRED_SPLIT_PAGE, but it does not > fix the above issue. A fully unmapped PTE-mapped order-9 THP was still > added to deferred split list and counted as THP_DEFERRED_SPLIT_PAGE, > since nr is 512 (non zero), level is RMAP_LEVEL_PTE, and inside > deferred_split_folio() the order-9 folio is folio_test_pmd_mappable(). > > Suggested-by: David Hildenbrand <david@redhat.com> > Signed-off-by: Zi Yan <ziy@nvidia.com> Reviewed-by: Lance Yang <ioworker0@gmail.com> Thanks, Lance > --- > mm/rmap.c | 12 +++++++++--- > 1 file changed, 9 insertions(+), 3 deletions(-) > > diff --git a/mm/rmap.c b/mm/rmap.c > index 2608c40dffad..a9bd64ebdd9a 100644 > --- a/mm/rmap.c > +++ b/mm/rmap.c > @@ -1495,6 +1495,7 @@ static __always_inline void __folio_remove_rmap(struct folio *folio, > { > atomic_t *mapped = &folio->_nr_pages_mapped; > int last, nr = 0, nr_pmdmapped = 0; > + bool partially_mapped = false; > enum node_stat_item idx; > > __folio_rmap_sanity_checks(folio, page, nr_pages, level); > @@ -1515,6 +1516,8 @@ static __always_inline void __folio_remove_rmap(struct folio *folio, > nr++; > } > } while (page++, --nr_pages > 0); > + > + partially_mapped = !!nr && !!atomic_read(mapped); > break; > case RMAP_LEVEL_PMD: > atomic_dec(&folio->_large_mapcount); > @@ -1532,6 +1535,8 @@ static __always_inline void __folio_remove_rmap(struct folio *folio, > nr = 0; > } > } > + > + partially_mapped = nr < nr_pmdmapped; > break; > } > > @@ -1553,9 +1558,10 @@ static __always_inline void __folio_remove_rmap(struct folio *folio, > * page of the folio is unmapped and at least one page > * is still mapped. 
> */ > - if (folio_test_large(folio) && folio_test_anon(folio)) > - if (level == RMAP_LEVEL_PTE || nr < nr_pmdmapped) > - deferred_split_folio(folio); > + if (folio_test_anon(folio) && > + list_empty(&folio->_deferred_list) && > + partially_mapped) > + deferred_split_folio(folio); > } > > /* > > base-commit: 3dba658670af22074cc6f26dc92efe0013ac3359 > -- > 2.43.0 >
On 27.04.24 06:06, Lance Yang wrote: > On Sat, Apr 27, 2024 at 4:16 AM David Hildenbrand <david@redhat.com> wrote: >> >> On 26.04.24 21:20, Zi Yan wrote: >>> On 26 Apr 2024, at 15:08, David Hildenbrand wrote: > [...] >>>>> + bool partially_mapped = false; > [...] >>>>> + >>>>> + partially_mapped = !!nr && !!atomic_read(mapped); >>>> >>>> Nit: The && should remove the need for both !!. >>> >>> My impression was that !! is needed to convert from int to bool and I do >>> find "!!int && !!int" use in the kernel. >> >> I might be wrong about this, but if you wouldn't write > > I think you're correct. > >> >> if (!!nr && !!atomic_read(mapped)) >> >> then >> >> bool partially_mapped = nr && atomic_read(mapped); >> >> is sufficient. > > +1 > >> >> && would make sure that the result is either 0 or 1, which >> you can store safely in a bool, no matter which underlying type >> is used to store that value. >> >> But I *think* nowdays, the compiler will always handle that >> correctly, even without the "&&" (ever since C99 added _Bool). >> >> Likely, also >> >> bool partially_mapped = nr & atomic_read(mapped); >> >> Would nowadays work, but looks stupid. >> >> >> Related: https://lkml.org/lkml/2013/8/31/138 >> >> --- >> #include <stdio.h> >> #include <stdbool.h> >> #include <stdint.h> >> #include <inttypes.h> >> >> volatile uint64_t a = 0x8000000000000000ull; >> >> void main (void) { >> printf("uint64_t a = a: 0x%" PRIx64 "\n", a); >> >> int i1 = a; >> printf("int i1 = a: %d\n", i1); >> >> int i2 = !!a; >> printf("int i2 = !!a: %d\n", i2); >> >> bool b1 = a; >> printf("bool b1 = a: %d\n", b1); >> >> bool b2 = !!a; >> printf("bool b2 = !!a: %d\n", b2); >> } >> --- >> $ ./test >> uint64_t a = a: 0x8000000000000000 >> int i1 = a: 0 >> int i2 = !!a: 1 >> bool b1 = a: 1 >> bool b2 = !!a: 1 >> --- >> >> Note that if bool would be defined as "int", you would need the !!, otherwise you >> would lose information. > > Agreed. We need to be careful in this case. > >> >> But even for b1, the gcc generates now: >> >> 40118c: 48 8b 05 7d 2e 00 00 mov 0x2e7d(%rip),%rax # 404010 <a> >> 401193: 48 85 c0 test %rax,%rax >> 401196: 0f 95 c0 setne %al >> >> >> My stdbool.h contains >> >> #define bool _Bool >> >> And I think C99 added _Bool that makes that work. >> >> But I didn't read the standard, and it's time for the weekend :) > > I just read the C99 and found some interesting information as follows: > > 6.3.1.2 Boolean type > When any *scalar value* is converted to _Bool, the result is 0 if the > value compares equal to 0; otherwise, the result is 1. > > 6.2.5 Types > 21. Arithmetic types and pointer types are collectively called *scalar > types*. Array and structure types are collectively called aggregate types. > > 6.5.13 Logical AND operator > Semantics > The && operator shall yield 1 if both of its operands compare unequal to > 0; otherwise, it yields 0. The result has type int. > > 6.5.10 Bitwise AND operator > Constraints > Each of the operands shall have integer type. > Semantics > The result of the binary & operator is the bitwise AND of the operands > (that is, each bit in the result is set if and only if each of the > corresponding > bits in the converted operands is set). > > && would ensure that the result is either 0 or 1, as David said, so no worries. > My example was flawed: I wanted to express that "if any bit is set, the bool value will be 1. That works for "| vs ||" but not for "& vs &&", obviously :) > We defined partially_mapped as a bool(_Bool). 
IIUC, "partially_mapped > = int & int;" > would work correctly as well. However, "partially_mapped = long & > int;" might not. Implicit type conversion would convert "long & int" to "long & long" first, which should work just fine. I think really most concerns regarding the bool type are due to < C99 not supporting _Bool. Great weekend everybody!
On Sat, Apr 27, 2024 at 3:20 AM Zi Yan <ziy@nvidia.com> wrote: > > On 26 Apr 2024, at 15:08, David Hildenbrand wrote: > > > On 26.04.24 21:02, Zi Yan wrote: > >> From: Zi Yan <ziy@nvidia.com> > >> > >> In __folio_remove_rmap(), a large folio is added to deferred split list > >> if any page in a folio loses its final mapping. But it is possible that > >> the folio is fully unmapped and adding it to deferred split list is > >> unnecessary. > >> > >> For PMD-mapped THPs, that was not really an issue, because removing the > >> last PMD mapping in the absence of PTE mappings would not have added the > >> folio to the deferred split queue. > >> > >> However, for PTE-mapped THPs, which are now more prominent due to mTHP, > >> they are always added to the deferred split queue. One side effect > >> is that the THP_DEFERRED_SPLIT_PAGE stat for a PTE-mapped folio can be > >> unintentionally increased, making it look like there are many partially > >> mapped folios -- although the whole folio is fully unmapped stepwise. > >> > >> Core-mm now tries batch-unmapping consecutive PTEs of PTE-mapped THPs > >> where possible starting from commit b06dc281aa99 ("mm/rmap: introduce > >> folio_remove_rmap_[pte|ptes|pmd]()"). When it happens, a whole PTE-mapped > >> folio is unmapped in one go and can avoid being added to deferred split > >> list, reducing the THP_DEFERRED_SPLIT_PAGE noise. But there will still be > >> noise when we cannot batch-unmap a complete PTE-mapped folio in one go > >> -- or where this type of batching is not implemented yet, e.g., migration. > >> > >> To avoid the unnecessary addition, folio->_nr_pages_mapped is checked > >> to tell if the whole folio is unmapped. If the folio is already on > >> deferred split list, it will be skipped, too. > >> > >> Note: commit 98046944a159 ("mm: huge_memory: add the missing > >> folio_test_pmd_mappable() for THP split statistics") tried to exclude > >> mTHP deferred split stats from THP_DEFERRED_SPLIT_PAGE, but it does not > >> fix the above issue. A fully unmapped PTE-mapped order-9 THP was still > >> added to deferred split list and counted as THP_DEFERRED_SPLIT_PAGE, > >> since nr is 512 (non zero), level is RMAP_LEVEL_PTE, and inside > >> deferred_split_folio() the order-9 folio is folio_test_pmd_mappable(). > >> > >> Suggested-by: David Hildenbrand <david@redhat.com> > >> Signed-off-by: Zi Yan <ziy@nvidia.com> > >> --- > >> mm/rmap.c | 12 +++++++++--- > >> 1 file changed, 9 insertions(+), 3 deletions(-) > >> > >> diff --git a/mm/rmap.c b/mm/rmap.c > >> index 2608c40dffad..a9bd64ebdd9a 100644 > >> --- a/mm/rmap.c > >> +++ b/mm/rmap.c > >> @@ -1495,6 +1495,7 @@ static __always_inline void __folio_remove_rmap(struct folio *folio, > >> { > >> atomic_t *mapped = &folio->_nr_pages_mapped; > >> int last, nr = 0, nr_pmdmapped = 0; > >> + bool partially_mapped = false; > >> enum node_stat_item idx; > >> __folio_rmap_sanity_checks(folio, page, nr_pages, level); > >> @@ -1515,6 +1516,8 @@ static __always_inline void __folio_remove_rmap(struct folio *folio, > >> nr++; > >> } > >> } while (page++, --nr_pages > 0); > >> + > >> + partially_mapped = !!nr && !!atomic_read(mapped); > > > > Nit: The && should remove the need for both !!. > > My impression was that !! is needed to convert from int to bool and I do > find "!!int && !!int" use in the kernel. If this is unnecessary, Andrew > can apply the fixup below. I can send a new version if it is really needed. 
>
> diff --git a/mm/rmap.c b/mm/rmap.c
> index a9bd64ebdd9a..c1fd5828409b 100644
> --- a/mm/rmap.c
> +++ b/mm/rmap.c
> @@ -1517,7 +1517,7 @@ static __always_inline void __folio_remove_rmap(struct folio *folio,
>  			}
>  		} while (page++, --nr_pages > 0);
>
> -		partially_mapped = !!nr && !!atomic_read(mapped);
> +		partially_mapped = nr && atomic_read(mapped);

Reviewed-by: Barry Song <baohua@kernel.org>

>  		break;
>  	case RMAP_LEVEL_PMD:
>  		atomic_dec(&folio->_large_mapcount);
>
>
> >
> > Reviewed-by: David Hildenbrand <david@redhat.com>
>
> Thanks.
>
> --
> Best Regards,
> Yan, Zi
On Fri, Apr 26, 2024 at 03:02:53PM -0400, Zi Yan wrote: Hi Zi, It increasingly looks like this commit is crashing on s390 since 2024-04-30 in linux-next. If I do not miss something - since it was included in mm-everything. > @@ -1553,9 +1558,10 @@ static __always_inline void __folio_remove_rmap(struct folio *folio, > * page of the folio is unmapped and at least one page > * is still mapped. > */ > - if (folio_test_large(folio) && folio_test_anon(folio)) > - if (level == RMAP_LEVEL_PTE || nr < nr_pmdmapped) > - deferred_split_folio(folio); > + if (folio_test_anon(folio) && > + list_empty(&folio->_deferred_list) && An attempt to reference folio->_deferred_list causes the crash below. > + partially_mapped) > + deferred_split_folio(folio); > } > > /* [ 507.227423] Unable to handle kernel pointer dereference in virtual kernel address space [ 507.227432] Failing address: 000001d689000000 TEID: 000001d689000803 [ 507.227435] Fault in home space mode while using kernel ASCE. [ 507.227439] AS:0000000180788007 R3:00000001fe2cc007 S:0000000000000020 [ 507.227492] Oops: 0010 ilc:3 [#1] SMP [ 507.227497] Modules linked in: vmur(E) kvm(E) algif_hash(E) af_alg(E) binfmt_misc(E) nft_fib_inet(E) nft_fib_ipv4(E) nft_fib_ipv6(E) nft_fib(E) nft_reject_inet(E) nf_reject_ipv4(E) nf_reject_ipv6(E) nft_reject(E) nft_ct(E) nft_chain_nat(E) nf_nat(E) nf_conntrack(E) nf_defrag_ipv6(E) nf_defrag_ipv4(E) ip_set(E) nf_tables(E) nfnetlink(E) dm_service_time(E) s390_trng(E) vfio_ccw(E) mdev(E) vfio_iommu_type1(E) vfio(E) sch_fq_codel(E) loop(E) configfs(E) lcs(E) ctcm(E) fsm(E) zfcp(E) scsi_transport_fc(E) ghash_s390(E) prng(E) chacha_s390(E) libchacha(E) aes_s390(E) des_s390(E) libdes(E) sha3_512_s390(E) sha3_256_s390(E) sha512_s390(E) sha256_s390(E) sha1_s390(E) sha_common(E) scsi_dh_rdac(E) scsi_dh_emc(E) scsi_dh_alua(E) pkey(E) zcrypt(E) rng_core(E) dm_multipath(E) autofs4(E) [ 507.227546] Unloaded tainted modules: dcssblk(E):2 [last unloaded: dcssblk(E)] [ 507.230569] CPU: 0 PID: 36783 Comm: pahole Tainted: G E 6.9.0-20240430.rc6.git237.d04466706db5.300.fc39.s390x+next #1 [ 507.230574] Hardware name: IBM 3931 A01 703 (z/VM 7.3.0) [ 507.230576] Krnl PSW : 0704f00180000000 0000025e1092a430 (folio_remove_rmap_ptes+0xe0/0x140) [ 507.230588] R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:3 CC:3 PM:0 RI:0 EA:3 [ 507.230592] Krnl GPRS: ffffffffffffe377 0000000000000000 0000025e122075b8 0000000000000000 [ 507.230595] ffffffffffffffff 0000025d8f613288 8800000000000000 00000157a38b8700 [ 507.230598] 000000023fffe13f 0000000000000000 000001579ccd75c0 000001d688ffff80 [ 507.230602] 000003ffb9cacf98 000001d688ffff80 0000025e1092a428 000001de11fab878 [ 507.230610] Krnl Code: 0000025e1092a422: c0e500039f47 brasl %r14,0000025e1099e2b0 [ 507.230610] 0000025e1092a428: 9101b01f tm 31(%r11),1 [ 507.230610] #0000025e1092a42c: a784ffb9 brc 8,0000025e1092a39e [ 507.230610] >0000025e1092a430: e340b0900004 lg %r4,144(%r11) [ 507.230610] 0000025e1092a436: 4150b090 la %r5,144(%r11) [ 507.230610] 0000025e1092a43a: ec45ffb26064 cgrj %r4,%r5,6,0000025e1092a39e [ 507.230610] 0000025e1092a440: a7910001 tmll %r9,1 [ 507.230610] 0000025e1092a444: a784ffad brc 8,0000025e1092a39e [ 507.230672] Call Trace: [ 507.230678] [<0000025e1092a430>] folio_remove_rmap_ptes+0xe0/0x140 [ 507.230682] ([<0000025e1092a428>] folio_remove_rmap_ptes+0xd8/0x140) [ 507.230685] [<0000025e1090d76a>] zap_present_ptes.isra.0+0x222/0x918 [ 507.230689] [<0000025e1090e008>] zap_pte_range+0x1a8/0x4e8 [ 507.230692] [<0000025e1090e58c>] zap_p4d_range+0x244/0x480 [ 507.230695] 
[<0000025e1090eb22>] unmap_page_range+0xea/0x2c0 [ 507.230698] [<0000025e1090ed92>] unmap_single_vma.isra.0+0x9a/0xf0 [ 507.230701] [<0000025e1090ee9e>] unmap_vmas+0xb6/0x1a0 [ 507.230705] [<0000025e1091e0d4>] exit_mmap+0xc4/0x3d0 [ 507.230709] [<0000025e10675c64>] __mmput+0x54/0x150 [ 507.230714] [<0000025e1067f3ba>] exit_mm+0xca/0x138 [ 507.230717] [<0000025e1067f690>] do_exit+0x268/0x520 [ 507.230721] [<0000025e1067fb38>] do_group_exit+0x40/0xb8 [ 507.230725] [<0000025e1067fc0e>] __s390x_sys_exit_group+0x2e/0x30 [ 507.230729] [<0000025e1136ba4e>] __do_syscall+0x216/0x2d0 [ 507.230736] [<0000025e1137c848>] system_call+0x70/0x98 [ 507.230780] Last Breaking-Event-Address: [ 507.230783] [<0000025e1099e32a>] __lruvec_stat_mod_folio+0x7a/0xb0 [ 507.230789] Kernel panic - not syncing: Fatal exception: panic_on_oops 00: HCPGIR450W CP entered; disabled wait PSW 00020001 80000000 0000025E 10630B56 Thanks!
On 1 May 2024, at 9:24, Alexander Gordeev wrote:

> On Fri, Apr 26, 2024 at 03:02:53PM -0400, Zi Yan wrote:
>
> Hi Zi,
>
> It increasingly looks like this commit is crashing on s390 since
> 2024-04-30 in linux-next. If I do not miss something - since it
> was included in mm-everything.
>
>> @@ -1553,9 +1558,10 @@ static __always_inline void __folio_remove_rmap(struct folio *folio,
>>  	 * page of the folio is unmapped and at least one page
>>  	 * is still mapped.
>>  	 */
>> -	if (folio_test_large(folio) && folio_test_anon(folio))
>> -		if (level == RMAP_LEVEL_PTE || nr < nr_pmdmapped)
>> -			deferred_split_folio(folio);
>> +	if (folio_test_anon(folio) &&
>> +	    list_empty(&folio->_deferred_list) &&
>
> An attempt to reference folio->_deferred_list causes the crash below.

So if you remove this line, the crash no longer happens? It looks strange to
me that referencing a anonymous folio's _deferred_list would cause a crash.
Hmm, unless the folio is order-0.

Can you try the patch below and see if it fixes the crash? It moves partially_mapped
ahead to exclude order-0 folios.

diff --git a/mm/rmap.c b/mm/rmap.c
index 087a79f1f611..2d27c92bb6d5 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -1557,9 +1557,8 @@ static __always_inline void __folio_remove_rmap(struct folio *folio,
 	 * page of the folio is unmapped and at least one page
 	 * is still mapped.
 	 */
-	if (folio_test_anon(folio) &&
-	    list_empty(&folio->_deferred_list) &&
-	    partially_mapped)
+	if (folio_test_anon(folio) && partially_mapped &&
+	    list_empty(&folio->_deferred_list))
 		deferred_split_folio(folio);
 }

>
>> +	    partially_mapped)
>> +		deferred_split_folio(folio);
>> }
>>
>> /*
>
> [ 507.227423] Unable to handle kernel pointer dereference in virtual kernel address space
> [ 507.227432] Failing address: 000001d689000000 TEID: 000001d689000803
> [ 507.227435] Fault in home space mode while using kernel ASCE.
> [ 507.227439] AS:0000000180788007 R3:00000001fe2cc007 S:0000000000000020 > [ 507.227492] Oops: 0010 ilc:3 [#1] SMP > [ 507.227497] Modules linked in: vmur(E) kvm(E) algif_hash(E) af_alg(E) binfmt_misc(E) nft_fib_inet(E) nft_fib_ipv4(E) nft_fib_ipv6(E) nft_fib(E) nft_reject_inet(E) nf_reject_ipv4(E) nf_reject_ipv6(E) nft_reject(E) nft_ct(E) nft_chain_nat(E) nf_nat(E) nf_conntrack(E) nf_defrag_ipv6(E) nf_defrag_ipv4(E) ip_set(E) nf_tables(E) nfnetlink(E) dm_service_time(E) s390_trng(E) vfio_ccw(E) mdev(E) vfio_iommu_type1(E) vfio(E) sch_fq_codel(E) loop(E) configfs(E) lcs(E) ctcm(E) fsm(E) zfcp(E) scsi_transport_fc(E) ghash_s390(E) prng(E) chacha_s390(E) libchacha(E) aes_s390(E) des_s390(E) libdes(E) sha3_512_s390(E) sha3_256_s390(E) sha512_s390(E) sha256_s390(E) sha1_s390(E) sha_common(E) scsi_dh_rdac(E) scsi_dh_emc(E) scsi_dh_alua(E) pkey(E) zcrypt(E) rng_core(E) dm_multipath(E) autofs4(E) > [ 507.227546] Unloaded tainted modules: dcssblk(E):2 [last unloaded: dcssblk(E)] > [ 507.230569] CPU: 0 PID: 36783 Comm: pahole Tainted: G E 6.9.0-20240430.rc6.git237.d04466706db5.300.fc39.s390x+next #1 > [ 507.230574] Hardware name: IBM 3931 A01 703 (z/VM 7.3.0) > [ 507.230576] Krnl PSW : 0704f00180000000 0000025e1092a430 (folio_remove_rmap_ptes+0xe0/0x140) > [ 507.230588] R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:3 CC:3 PM:0 RI:0 EA:3 > [ 507.230592] Krnl GPRS: ffffffffffffe377 0000000000000000 0000025e122075b8 0000000000000000 > [ 507.230595] ffffffffffffffff 0000025d8f613288 8800000000000000 00000157a38b8700 > [ 507.230598] 000000023fffe13f 0000000000000000 000001579ccd75c0 000001d688ffff80 > [ 507.230602] 000003ffb9cacf98 000001d688ffff80 0000025e1092a428 000001de11fab878 > [ 507.230610] Krnl Code: 0000025e1092a422: c0e500039f47 brasl %r14,0000025e1099e2b0 > [ 507.230610] 0000025e1092a428: 9101b01f tm 31(%r11),1 > [ 507.230610] #0000025e1092a42c: a784ffb9 brc 8,0000025e1092a39e > [ 507.230610] >0000025e1092a430: e340b0900004 lg %r4,144(%r11) > [ 507.230610] 0000025e1092a436: 4150b090 la %r5,144(%r11) > [ 507.230610] 0000025e1092a43a: ec45ffb26064 cgrj %r4,%r5,6,0000025e1092a39e > [ 507.230610] 0000025e1092a440: a7910001 tmll %r9,1 > [ 507.230610] 0000025e1092a444: a784ffad brc 8,0000025e1092a39e > [ 507.230672] Call Trace: > [ 507.230678] [<0000025e1092a430>] folio_remove_rmap_ptes+0xe0/0x140 > [ 507.230682] ([<0000025e1092a428>] folio_remove_rmap_ptes+0xd8/0x140) > [ 507.230685] [<0000025e1090d76a>] zap_present_ptes.isra.0+0x222/0x918 > [ 507.230689] [<0000025e1090e008>] zap_pte_range+0x1a8/0x4e8 > [ 507.230692] [<0000025e1090e58c>] zap_p4d_range+0x244/0x480 > [ 507.230695] [<0000025e1090eb22>] unmap_page_range+0xea/0x2c0 > [ 507.230698] [<0000025e1090ed92>] unmap_single_vma.isra.0+0x9a/0xf0 > [ 507.230701] [<0000025e1090ee9e>] unmap_vmas+0xb6/0x1a0 > [ 507.230705] [<0000025e1091e0d4>] exit_mmap+0xc4/0x3d0 > [ 507.230709] [<0000025e10675c64>] __mmput+0x54/0x150 > [ 507.230714] [<0000025e1067f3ba>] exit_mm+0xca/0x138 > [ 507.230717] [<0000025e1067f690>] do_exit+0x268/0x520 > [ 507.230721] [<0000025e1067fb38>] do_group_exit+0x40/0xb8 > [ 507.230725] [<0000025e1067fc0e>] __s390x_sys_exit_group+0x2e/0x30 > [ 507.230729] [<0000025e1136ba4e>] __do_syscall+0x216/0x2d0 > [ 507.230736] [<0000025e1137c848>] system_call+0x70/0x98 > [ 507.230780] Last Breaking-Event-Address: > [ 507.230783] [<0000025e1099e32a>] __lruvec_stat_mod_folio+0x7a/0xb0 > [ 507.230789] Kernel panic - not syncing: Fatal exception: panic_on_oops > 00: HCPGIR450W CP entered; disabled wait PSW 00020001 80000000 0000025E 10630B56 > > 
Thanks! -- Best Regards, Yan, Zi
On 01.05.24 15:38, Zi Yan wrote: > On 1 May 2024, at 9:24, Alexander Gordeev wrote: > >> On Fri, Apr 26, 2024 at 03:02:53PM -0400, Zi Yan wrote: >> >> Hi Zi, >> >> It increasingly looks like this commit is crashing on s390 since >> 2024-04-30 in linux-next. If I do not miss something - since it >> was included in mm-everything. >> >>> @@ -1553,9 +1558,10 @@ static __always_inline void __folio_remove_rmap(struct folio *folio, >>> * page of the folio is unmapped and at least one page >>> * is still mapped. >>> */ >>> - if (folio_test_large(folio) && folio_test_anon(folio)) >>> - if (level == RMAP_LEVEL_PTE || nr < nr_pmdmapped) >>> - deferred_split_folio(folio); >>> + if (folio_test_anon(folio) && >>> + list_empty(&folio->_deferred_list) && >> >> An attempt to reference folio->_deferred_list causes the crash below. > > So if you remove this line, the crash no longer happens? It looks strange to > me that referencing a anonymous folio's _deferred_list would cause a crash. > Hmm, unless the folio is order-0. > > Can you try the patch below and see if it fixes the crash? It moves partially_mapped > ahead to exclude order-0 folios. > > diff --git a/mm/rmap.c b/mm/rmap.c > index 087a79f1f611..2d27c92bb6d5 100644 > --- a/mm/rmap.c > +++ b/mm/rmap.c > @@ -1557,9 +1557,8 @@ static __always_inline void __folio_remove_rmap(struct folio *folio, > * page of the folio is unmapped and at least one page > * is still mapped. > */ > - if (folio_test_anon(folio) && > - list_empty(&folio->_deferred_list) && > - partially_mapped) > + if (folio_test_anon(folio) && partially_mapped && > + list_empty(&folio->_deferred_list)) > deferred_split_folio(folio); Yes, that should fix it and is the right thing to do. For small folios, partially_mapped will always be false.
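For readers following the fix: the reordering matters because && short-circuits left to right. With partially_mapped evaluated first -- and it is always false for order-0 folios and for fully unmapped folios -- folio->_deferred_list is never dereferenced for a small folio, whose struct page has no such field, which is what faulted on s390. The sketch below only illustrates that short-circuit ordering with simplified, made-up types; it is not the kernel code and cannot reproduce the crash.

---
#include <stdbool.h>
#include <stdio.h>

/* Simplified stand-ins for the kernel structures -- illustration only. */
struct fake_list_head { struct fake_list_head *next, *prev; };

struct fake_folio {
	bool anon;
	bool large;			/* order > 0 */
	/* Only meaningful when large is true; in the kernel the real field
	 * lives in a tail struct page that an order-0 folio does not have. */
	struct fake_list_head _deferred_list;
};

static bool fake_list_empty(const struct fake_list_head *head)
{
	/* In the kernel, the corresponding read is what faulted on s390 for
	 * order-0 folios; here it is harmless -- the point is only whether
	 * it gets evaluated at all. */
	return head->next == head;
}

static void maybe_defer_split(struct fake_folio *folio, bool partially_mapped)
{
	/*
	 * partially_mapped comes before the list check, so && short-circuits
	 * and fake_list_empty() is never called for small or fully unmapped
	 * folios -- mirroring the reordered condition in the fix above.
	 */
	if (folio->anon && partially_mapped &&
	    fake_list_empty(&folio->_deferred_list))
		printf("add folio to the deferred split list\n");
}

int main(void)
{
	struct fake_folio small = { .anon = true, .large = false };

	/* Safe: the second operand is false, so _deferred_list is untouched. */
	maybe_defer_split(&small, false);
	return 0;
}
---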
On Wed, May 01, 2024 at 09:38:24AM -0400, Zi Yan wrote: Hi Zi, > @@ -1557,9 +1557,8 @@ static __always_inline void __folio_remove_rmap(struct folio *folio, > * page of the folio is unmapped and at least one page > * is still mapped. > */ > - if (folio_test_anon(folio) && > - list_empty(&folio->_deferred_list) && > - partially_mapped) > + if (folio_test_anon(folio) && partially_mapped && > + list_empty(&folio->_deferred_list)) > deferred_split_folio(folio); > } That helps. > Best Regards, > Yan, Zi Thanks!
On 2 May 2024, at 9:18, Alexander Gordeev wrote: > On Wed, May 01, 2024 at 09:38:24AM -0400, Zi Yan wrote: > Hi Zi, >> @@ -1557,9 +1557,8 @@ static __always_inline void __folio_remove_rmap(struct folio *folio, >> * page of the folio is unmapped and at least one page >> * is still mapped. >> */ >> - if (folio_test_anon(folio) && >> - list_empty(&folio->_deferred_list) && >> - partially_mapped) >> + if (folio_test_anon(folio) && partially_mapped && >> + list_empty(&folio->_deferred_list)) >> deferred_split_folio(folio); >> } > > That helps. > >> Best Regards, >> Yan, Zi > > Thanks! Great! I will send a v6. -- Best Regards, Yan, Zi
diff --git a/mm/rmap.c b/mm/rmap.c
index 2608c40dffad..a9bd64ebdd9a 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -1495,6 +1495,7 @@ static __always_inline void __folio_remove_rmap(struct folio *folio,
 {
 	atomic_t *mapped = &folio->_nr_pages_mapped;
 	int last, nr = 0, nr_pmdmapped = 0;
+	bool partially_mapped = false;
 	enum node_stat_item idx;
 
 	__folio_rmap_sanity_checks(folio, page, nr_pages, level);
@@ -1515,6 +1516,8 @@ static __always_inline void __folio_remove_rmap(struct folio *folio,
 				nr++;
 			}
 		} while (page++, --nr_pages > 0);
+
+		partially_mapped = !!nr && !!atomic_read(mapped);
 		break;
 	case RMAP_LEVEL_PMD:
 		atomic_dec(&folio->_large_mapcount);
@@ -1532,6 +1535,8 @@ static __always_inline void __folio_remove_rmap(struct folio *folio,
 				nr = 0;
 			}
 		}
+
+		partially_mapped = nr < nr_pmdmapped;
 		break;
 	}
 
@@ -1553,9 +1558,10 @@ static __always_inline void __folio_remove_rmap(struct folio *folio,
 	 * page of the folio is unmapped and at least one page
 	 * is still mapped.
 	 */
-	if (folio_test_large(folio) && folio_test_anon(folio))
-		if (level == RMAP_LEVEL_PTE || nr < nr_pmdmapped)
-			deferred_split_folio(folio);
+	if (folio_test_anon(folio) &&
+	    list_empty(&folio->_deferred_list) &&
+	    partially_mapped)
+		deferred_split_folio(folio);
 }
 
 /*