Message ID | 20240311150058.1122862-6-ryan.roberts@arm.com (mailing list archive)
---|---
State | New
Series | Swap-out mTHP without splitting
On Mon, Mar 11, 2024 at 11:01 PM Ryan Roberts <ryan.roberts@arm.com> wrote: > > Now that swap supports storing all mTHP sizes, avoid splitting large > folios before swap-out. This benefits performance of the swap-out path > by eliding split_folio_to_list(), which is expensive, and also sets us > up for swapping in large folios in a future series. > > If the folio is partially mapped, we continue to split it since we want > to avoid the extra IO overhead and storage of writing out pages > uneccessarily. > > Signed-off-by: Ryan Roberts <ryan.roberts@arm.com> > --- > mm/vmscan.c | 9 +++++---- > 1 file changed, 5 insertions(+), 4 deletions(-) > > diff --git a/mm/vmscan.c b/mm/vmscan.c > index cf7d4cf47f1a..0ebec99e04c6 100644 > --- a/mm/vmscan.c > +++ b/mm/vmscan.c > @@ -1222,11 +1222,12 @@ static unsigned int shrink_folio_list(struct list_head *folio_list, > if (!can_split_folio(folio, NULL)) > goto activate_locked; > /* > - * Split folios without a PMD map right > - * away. Chances are some or all of the > - * tail pages can be freed without IO. > + * Split partially mapped folios map > + * right away. Chances are some or all > + * of the tail pages can be freed > + * without IO. > */ > - if (!folio_entire_mapcount(folio) && > + if (!list_empty(&folio->_deferred_list) && Hi Ryan, After reconsidering our previous discussion about PMD-mapped large folios, I've pondered the possibility of PMD-mapped Transparent Huge Pages (THPs) being mapped by multiple processes. In such a scenario, if one process decides to unmap a portion of the folio while others retain the entire mapping, it raises questions about how the system should handle this situation. Would the large folio be placed in a deferred list? If so, splitting it might not yield benefits, as neither I/O nor swap slots would increase in this case by not splitting it. 
Regarding PTE-mapped large folios, the absence of an indicator like "entire_map" makes it challenging to identify cases where the entire folio is mapped. Thus, splitting seems to be the only viable solution in such circumstances. > split_folio_to_list(folio, > folio_list)) > goto activate_locked; > -- > 2.25.1 Thanks Barry
On 11/03/2024 22:30, Barry Song wrote: > On Mon, Mar 11, 2024 at 11:01 PM Ryan Roberts <ryan.roberts@arm.com> wrote: >> >> Now that swap supports storing all mTHP sizes, avoid splitting large >> folios before swap-out. This benefits performance of the swap-out path >> by eliding split_folio_to_list(), which is expensive, and also sets us >> up for swapping in large folios in a future series. >> >> If the folio is partially mapped, we continue to split it since we want >> to avoid the extra IO overhead and storage of writing out pages >> uneccessarily. >> >> Signed-off-by: Ryan Roberts <ryan.roberts@arm.com> >> --- >> mm/vmscan.c | 9 +++++---- >> 1 file changed, 5 insertions(+), 4 deletions(-) >> >> diff --git a/mm/vmscan.c b/mm/vmscan.c >> index cf7d4cf47f1a..0ebec99e04c6 100644 >> --- a/mm/vmscan.c >> +++ b/mm/vmscan.c >> @@ -1222,11 +1222,12 @@ static unsigned int shrink_folio_list(struct list_head *folio_list, >> if (!can_split_folio(folio, NULL)) >> goto activate_locked; >> /* >> - * Split folios without a PMD map right >> - * away. Chances are some or all of the >> - * tail pages can be freed without IO. >> + * Split partially mapped folios map >> + * right away. Chances are some or all >> + * of the tail pages can be freed >> + * without IO. >> */ >> - if (!folio_entire_mapcount(folio) && >> + if (!list_empty(&folio->_deferred_list) && > > Hi Ryan, > After reconsidering our previous discussion about PMD-mapped large > folios, I've pondered > the possibility of PMD-mapped Transparent Huge Pages (THPs) being > mapped by multiple > processes. In such a scenario, if one process decides to unmap a > portion of the folio while > others retain the entire mapping, it raises questions about how the > system should handle > this situation. Would the large folio be placed in a deferred list? No - if the large folio is entirely mapped (via PMD), then the folio will not be put on the deferred split list in the first place. 
See __folio_remove_rmap(): last = (last < ENTIRELY_MAPPED); means that nr will never be incremented above 0. (_nr_pages_mapped is incremented by ENTIRELY_MAPPED for every PMD map). > If > so, splitting it might not > yield benefits, as neither I/O nor swap slots would increase in this > case by not splitting it. > > Regarding PTE-mapped large folios, the absence of an indicator like > "entire_map" makes it > challenging to identify cases where the entire folio is mapped. Thus, > splitting seems to be > the only viable solution in such circumstances. > >> split_folio_to_list(folio, >> folio_list)) >> goto activate_locked; >> -- >> 2.25.1 > > Thanks > Barry
On Tue, Mar 12, 2024 at 9:12 PM Ryan Roberts <ryan.roberts@arm.com> wrote: > > On 11/03/2024 22:30, Barry Song wrote: > > On Mon, Mar 11, 2024 at 11:01 PM Ryan Roberts <ryan.roberts@arm.com> wrote: > >> > >> Now that swap supports storing all mTHP sizes, avoid splitting large > >> folios before swap-out. This benefits performance of the swap-out path > >> by eliding split_folio_to_list(), which is expensive, and also sets us > >> up for swapping in large folios in a future series. > >> > >> If the folio is partially mapped, we continue to split it since we want > >> to avoid the extra IO overhead and storage of writing out pages > >> uneccessarily. > >> > >> Signed-off-by: Ryan Roberts <ryan.roberts@arm.com> > >> --- > >> mm/vmscan.c | 9 +++++---- > >> 1 file changed, 5 insertions(+), 4 deletions(-) > >> > >> diff --git a/mm/vmscan.c b/mm/vmscan.c > >> index cf7d4cf47f1a..0ebec99e04c6 100644 > >> --- a/mm/vmscan.c > >> +++ b/mm/vmscan.c > >> @@ -1222,11 +1222,12 @@ static unsigned int shrink_folio_list(struct list_head *folio_list, > >> if (!can_split_folio(folio, NULL)) > >> goto activate_locked; > >> /* > >> - * Split folios without a PMD map right > >> - * away. Chances are some or all of the > >> - * tail pages can be freed without IO. > >> + * Split partially mapped folios map > >> + * right away. Chances are some or all > >> + * of the tail pages can be freed > >> + * without IO. > >> */ > >> - if (!folio_entire_mapcount(folio) && > >> + if (!list_empty(&folio->_deferred_list) && > > > > Hi Ryan, > > After reconsidering our previous discussion about PMD-mapped large > > folios, I've pondered > > the possibility of PMD-mapped Transparent Huge Pages (THPs) being > > mapped by multiple > > processes. In such a scenario, if one process decides to unmap a > > portion of the folio while > > others retain the entire mapping, it raises questions about how the > > system should handle > > this situation. Would the large folio be placed in a deferred list? 
> > No - if the large folio is entirely mapped (via PMD), then the folio will not be > put on the deferred split list in the first place. See __folio_remove_rmap(): > > last = (last < ENTIRELY_MAPPED); > > means that nr will never be incremented above 0. (_nr_pages_mapped is > incremented by ENTIRELY_MAPPED for every PMD map). you are right, I missed this part, we are breaking early in RMAP_LEVEL_PTE. so we won't get to if (nr). Thanks for your clarification. now we get unified code for both pmd-mapped and pte-mapped large folios. feel free to add, Reviewed-by: Barry Song <v-songbaohua@oppo.com> > > > If > > so, splitting it might not > > yield benefits, as neither I/O nor swap slots would increase in this > > case by not splitting it. > > > > Regarding PTE-mapped large folios, the absence of an indicator like > > "entire_map" makes it > > challenging to identify cases where the entire folio is mapped. Thus, > > splitting seems to be > > the only viable solution in such circumstances. > > > >> split_folio_to_list(folio, > >> folio_list)) > >> goto activate_locked; > >> -- > >> 2.25.1 > Thanks Barry
On 11.03.24 16:00, Ryan Roberts wrote: > Now that swap supports storing all mTHP sizes, avoid splitting large > folios before swap-out. This benefits performance of the swap-out path > by eliding split_folio_to_list(), which is expensive, and also sets us > up for swapping in large folios in a future series. > > If the folio is partially mapped, we continue to split it since we want > to avoid the extra IO overhead and storage of writing out pages > uneccessarily. > > Signed-off-by: Ryan Roberts <ryan.roberts@arm.com> > --- > mm/vmscan.c | 9 +++++---- > 1 file changed, 5 insertions(+), 4 deletions(-) > > diff --git a/mm/vmscan.c b/mm/vmscan.c > index cf7d4cf47f1a..0ebec99e04c6 100644 > --- a/mm/vmscan.c > +++ b/mm/vmscan.c > @@ -1222,11 +1222,12 @@ static unsigned int shrink_folio_list(struct list_head *folio_list, > if (!can_split_folio(folio, NULL)) > goto activate_locked; > /* > - * Split folios without a PMD map right > - * away. Chances are some or all of the > - * tail pages can be freed without IO. > + * Split partially mapped folios map > + * right away. Chances are some or all > + * of the tail pages can be freed > + * without IO. > */ > - if (!folio_entire_mapcount(folio) && > + if (!list_empty(&folio->_deferred_list) && > split_folio_to_list(folio, > folio_list)) > goto activate_locked; Not sure if we might have to annotate that with data_race(). Reviewed-by: David Hildenbrand <david@redhat.com>
On 15/03/2024 10:43, David Hildenbrand wrote: > On 11.03.24 16:00, Ryan Roberts wrote: >> Now that swap supports storing all mTHP sizes, avoid splitting large >> folios before swap-out. This benefits performance of the swap-out path >> by eliding split_folio_to_list(), which is expensive, and also sets us >> up for swapping in large folios in a future series. >> >> If the folio is partially mapped, we continue to split it since we want >> to avoid the extra IO overhead and storage of writing out pages >> uneccessarily. >> >> Signed-off-by: Ryan Roberts <ryan.roberts@arm.com> >> --- >> mm/vmscan.c | 9 +++++---- >> 1 file changed, 5 insertions(+), 4 deletions(-) >> >> diff --git a/mm/vmscan.c b/mm/vmscan.c >> index cf7d4cf47f1a..0ebec99e04c6 100644 >> --- a/mm/vmscan.c >> +++ b/mm/vmscan.c >> @@ -1222,11 +1222,12 @@ static unsigned int shrink_folio_list(struct list_head >> *folio_list, >> if (!can_split_folio(folio, NULL)) >> goto activate_locked; >> /* >> - * Split folios without a PMD map right >> - * away. Chances are some or all of the >> - * tail pages can be freed without IO. >> + * Split partially mapped folios map >> + * right away. Chances are some or all >> + * of the tail pages can be freed >> + * without IO. >> */ >> - if (!folio_entire_mapcount(folio) && >> + if (!list_empty(&folio->_deferred_list) && >> split_folio_to_list(folio, >> folio_list)) >> goto activate_locked; > > Not sure if we might have to annotate that with data_race(). I asked that exact question to Matthew in another context but didn't get a response. There are examples of checking if the deferred list is empty with and without data_race() in the code base. But list_empty() is implemented like this: static inline int list_empty(const struct list_head *head) { return READ_ONCE(head->next) == head; } So I assumed the READ_ONCE() makes everything safe without a lock? Perhaps not sufficient for KCSAN? > > Reviewed-by: David Hildenbrand <david@redhat.com> > Thanks!
On 15.03.24 11:49, Ryan Roberts wrote: > On 15/03/2024 10:43, David Hildenbrand wrote: >> On 11.03.24 16:00, Ryan Roberts wrote: >>> Now that swap supports storing all mTHP sizes, avoid splitting large >>> folios before swap-out. This benefits performance of the swap-out path >>> by eliding split_folio_to_list(), which is expensive, and also sets us >>> up for swapping in large folios in a future series. >>> >>> If the folio is partially mapped, we continue to split it since we want >>> to avoid the extra IO overhead and storage of writing out pages >>> uneccessarily. >>> >>> Signed-off-by: Ryan Roberts <ryan.roberts@arm.com> >>> --- >>> mm/vmscan.c | 9 +++++---- >>> 1 file changed, 5 insertions(+), 4 deletions(-) >>> >>> diff --git a/mm/vmscan.c b/mm/vmscan.c >>> index cf7d4cf47f1a..0ebec99e04c6 100644 >>> --- a/mm/vmscan.c >>> +++ b/mm/vmscan.c >>> @@ -1222,11 +1222,12 @@ static unsigned int shrink_folio_list(struct list_head >>> *folio_list, >>> if (!can_split_folio(folio, NULL)) >>> goto activate_locked; >>> /* >>> - * Split folios without a PMD map right >>> - * away. Chances are some or all of the >>> - * tail pages can be freed without IO. >>> + * Split partially mapped folios map >>> + * right away. Chances are some or all >>> + * of the tail pages can be freed >>> + * without IO. >>> */ >>> - if (!folio_entire_mapcount(folio) && >>> + if (!list_empty(&folio->_deferred_list) && >>> split_folio_to_list(folio, >>> folio_list)) >>> goto activate_locked; >> >> Not sure if we might have to annotate that with data_race(). > > I asked that exact question to Matthew in another context bt didn't get a > response. There are examples of checking if the deferred list is empty with and > without data_race() in the code base. But list_empty() is implemented like this: > > static inline int list_empty(const struct list_head *head) > { > return READ_ONCE(head->next) == head; > } > > So I assumed the READ_ONCE() makes everything safe without a lock? 
Perhaps not > sufficient for KCSAN? Yeah, there is only one use of data_race with that list. It was added in f3ebdf042df4 ("THP: avoid lock when check whether THP is in deferred list"). Looks like that was added right in v1 of that change [1], so my best guess is that it is not actually required. If not required, likely we should just cleanup the single user. [1] https://lore.kernel.org/linux-mm/20230417075643.3287513-2-fengwei.yin@intel.com/
Hi Yin Fengwei, On 15/03/2024 11:12, David Hildenbrand wrote: > On 15.03.24 11:49, Ryan Roberts wrote: >> On 15/03/2024 10:43, David Hildenbrand wrote: >>> On 11.03.24 16:00, Ryan Roberts wrote: >>>> Now that swap supports storing all mTHP sizes, avoid splitting large >>>> folios before swap-out. This benefits performance of the swap-out path >>>> by eliding split_folio_to_list(), which is expensive, and also sets us >>>> up for swapping in large folios in a future series. >>>> >>>> If the folio is partially mapped, we continue to split it since we want >>>> to avoid the extra IO overhead and storage of writing out pages >>>> uneccessarily. >>>> >>>> Signed-off-by: Ryan Roberts <ryan.roberts@arm.com> >>>> --- >>>> mm/vmscan.c | 9 +++++---- >>>> 1 file changed, 5 insertions(+), 4 deletions(-) >>>> >>>> diff --git a/mm/vmscan.c b/mm/vmscan.c >>>> index cf7d4cf47f1a..0ebec99e04c6 100644 >>>> --- a/mm/vmscan.c >>>> +++ b/mm/vmscan.c >>>> @@ -1222,11 +1222,12 @@ static unsigned int shrink_folio_list(struct list_head >>>> *folio_list, >>>> if (!can_split_folio(folio, NULL)) >>>> goto activate_locked; >>>> /* >>>> - * Split folios without a PMD map right >>>> - * away. Chances are some or all of the >>>> - * tail pages can be freed without IO. >>>> + * Split partially mapped folios map >>>> + * right away. Chances are some or all >>>> + * of the tail pages can be freed >>>> + * without IO. >>>> */ >>>> - if (!folio_entire_mapcount(folio) && >>>> + if (!list_empty(&folio->_deferred_list) && >>>> split_folio_to_list(folio, >>>> folio_list)) >>>> goto activate_locked; >>> >>> Not sure if we might have to annotate that with data_race(). >> >> I asked that exact question to Matthew in another context bt didn't get a >> response. There are examples of checking if the deferred list is empty with and >> without data_race() in the code base. 
But list_empty() is implemented like this: >> >> static inline int list_empty(const struct list_head *head) >> { >> return READ_ONCE(head->next) == head; >> } >> >> So I assumed the READ_ONCE() makes everything safe without a lock? Perhaps not >> sufficient for KCSAN? > > Yeah, there is only one use of data_race with that list. > > It was added in f3ebdf042df4 ("THP: avoid lock when check whether THP is in > deferred list"). > > Looks like that was added right in v1 of that change [1], so my best guess is > that it is not actually required. > > If not required, likely we should just cleanup the single user. > > [1] > https://lore.kernel.org/linux-mm/20230417075643.3287513-2-fengwei.yin@intel.com/ Do you have any recollection of why you added the data_race() markup? Thanks, Ryan >
Ryan Roberts <ryan.roberts@arm.com> writes: > Hi Yin Fengwei, > > On 15/03/2024 11:12, David Hildenbrand wrote: >> On 15.03.24 11:49, Ryan Roberts wrote: >>> On 15/03/2024 10:43, David Hildenbrand wrote: >>>> On 11.03.24 16:00, Ryan Roberts wrote: >>>>> Now that swap supports storing all mTHP sizes, avoid splitting large >>>>> folios before swap-out. This benefits performance of the swap-out path >>>>> by eliding split_folio_to_list(), which is expensive, and also sets us >>>>> up for swapping in large folios in a future series. >>>>> >>>>> If the folio is partially mapped, we continue to split it since we want >>>>> to avoid the extra IO overhead and storage of writing out pages >>>>> uneccessarily. >>>>> >>>>> Signed-off-by: Ryan Roberts <ryan.roberts@arm.com> >>>>> --- >>>>> mm/vmscan.c | 9 +++++---- >>>>> 1 file changed, 5 insertions(+), 4 deletions(-) >>>>> >>>>> diff --git a/mm/vmscan.c b/mm/vmscan.c >>>>> index cf7d4cf47f1a..0ebec99e04c6 100644 >>>>> --- a/mm/vmscan.c >>>>> +++ b/mm/vmscan.c >>>>> @@ -1222,11 +1222,12 @@ static unsigned int shrink_folio_list(struct list_head >>>>> *folio_list, >>>>> if (!can_split_folio(folio, NULL)) >>>>> goto activate_locked; >>>>> /* >>>>> - * Split folios without a PMD map right >>>>> - * away. Chances are some or all of the >>>>> - * tail pages can be freed without IO. >>>>> + * Split partially mapped folios map >>>>> + * right away. Chances are some or all >>>>> + * of the tail pages can be freed >>>>> + * without IO. >>>>> */ >>>>> - if (!folio_entire_mapcount(folio) && >>>>> + if (!list_empty(&folio->_deferred_list) && >>>>> split_folio_to_list(folio, >>>>> folio_list)) >>>>> goto activate_locked; >>>> >>>> Not sure if we might have to annotate that with data_race(). >>> >>> I asked that exact question to Matthew in another context bt didn't get a >>> response. There are examples of checking if the deferred list is empty with and >>> without data_race() in the code base. 
But list_empty() is implemented like this: >>> >>> static inline int list_empty(const struct list_head *head) >>> { >>> return READ_ONCE(head->next) == head; >>> } >>> >>> So I assumed the READ_ONCE() makes everything safe without a lock? Perhaps not >>> sufficient for KCSAN? >> >> Yeah, there is only one use of data_race with that list. >> >> It was added in f3ebdf042df4 ("THP: avoid lock when check whether THP is in >> deferred list"). >> >> Looks like that was added right in v1 of that change [1], so my best guess is >> that it is not actually required. >> >> If not required, likely we should just cleanup the single user. >> >> [1] >> https://lore.kernel.org/linux-mm/20230417075643.3287513-2-fengwei.yin@intel.com/ > > Do you have any recollection of why you added the data_race() markup? Per my understanding, this is used to mark that the code accesses folio->_deferred_list without lock intentionally, while folio->_deferred_list may be changed in parallel. IIUC, this is what data_race() is used for. Or, my understanding is wrong? -- Best Regards, Huang, Ying
On 3/18/2024 10:16 AM, Huang, Ying wrote: > Ryan Roberts <ryan.roberts@arm.com> writes: > >> Hi Yin Fengwei, >> >> On 15/03/2024 11:12, David Hildenbrand wrote: >>> On 15.03.24 11:49, Ryan Roberts wrote: >>>> On 15/03/2024 10:43, David Hildenbrand wrote: >>>>> On 11.03.24 16:00, Ryan Roberts wrote: >>>>>> Now that swap supports storing all mTHP sizes, avoid splitting large >>>>>> folios before swap-out. This benefits performance of the swap-out path >>>>>> by eliding split_folio_to_list(), which is expensive, and also sets us >>>>>> up for swapping in large folios in a future series. >>>>>> >>>>>> If the folio is partially mapped, we continue to split it since we want >>>>>> to avoid the extra IO overhead and storage of writing out pages >>>>>> uneccessarily. >>>>>> >>>>>> Signed-off-by: Ryan Roberts <ryan.roberts@arm.com> >>>>>> --- >>>>>> mm/vmscan.c | 9 +++++---- >>>>>> 1 file changed, 5 insertions(+), 4 deletions(-) >>>>>> >>>>>> diff --git a/mm/vmscan.c b/mm/vmscan.c >>>>>> index cf7d4cf47f1a..0ebec99e04c6 100644 >>>>>> --- a/mm/vmscan.c >>>>>> +++ b/mm/vmscan.c >>>>>> @@ -1222,11 +1222,12 @@ static unsigned int shrink_folio_list(struct list_head >>>>>> *folio_list, >>>>>> if (!can_split_folio(folio, NULL)) >>>>>> goto activate_locked; >>>>>> /* >>>>>> - * Split folios without a PMD map right >>>>>> - * away. Chances are some or all of the >>>>>> - * tail pages can be freed without IO. >>>>>> + * Split partially mapped folios map >>>>>> + * right away. Chances are some or all >>>>>> + * of the tail pages can be freed >>>>>> + * without IO. >>>>>> */ >>>>>> - if (!folio_entire_mapcount(folio) && >>>>>> + if (!list_empty(&folio->_deferred_list) && >>>>>> split_folio_to_list(folio, >>>>>> folio_list)) >>>>>> goto activate_locked; >>>>> >>>>> Not sure if we might have to annotate that with data_race(). >>>> >>>> I asked that exact question to Matthew in another context bt didn't get a >>>> response. 
There are examples of checking if the deferred list is empty with and >>>> without data_race() in the code base. But list_empty() is implemented like this: >>>> >>>> static inline int list_empty(const struct list_head *head) >>>> { >>>> return READ_ONCE(head->next) == head; >>>> } >>>> >>>> So I assumed the READ_ONCE() makes everything safe without a lock? Perhaps not >>>> sufficient for KCSAN? I don't think READ_ONCE() can replace the lock. >>> >>> Yeah, there is only one use of data_race with that list. >>> >>> It was added in f3ebdf042df4 ("THP: avoid lock when check whether THP is in >>> deferred list"). >>> >>> Looks like that was added right in v1 of that change [1], so my best guess is >>> that it is not actually required. >>> >>> If not required, likely we should just cleanup the single user. >>> >>> [1] >>> https://lore.kernel.org/linux-mm/20230417075643.3287513-2-fengwei.yin@intel.com/ >> >> Do you have any recollection of why you added the data_race() markup? > > Per my understanding, this is used to mark that the code accesses > folio->_deferred_list without lock intentionally, while > folio->_deferred_list may be changed in parallel. IIUC, this is what > data_race() is used for. Or, my understanding is wrong? Yes. This is my understanding also. Regards Yin, Fengwei > > -- > Best Regards, > Huang, Ying
On 18.03.24 11:00, Yin, Fengwei wrote: > > > On 3/18/2024 10:16 AM, Huang, Ying wrote: >> Ryan Roberts <ryan.roberts@arm.com> writes: >> >>> Hi Yin Fengwei, >>> >>> On 15/03/2024 11:12, David Hildenbrand wrote: >>>> On 15.03.24 11:49, Ryan Roberts wrote: >>>>> On 15/03/2024 10:43, David Hildenbrand wrote: >>>>>> On 11.03.24 16:00, Ryan Roberts wrote: >>>>>>> Now that swap supports storing all mTHP sizes, avoid splitting large >>>>>>> folios before swap-out. This benefits performance of the swap-out path >>>>>>> by eliding split_folio_to_list(), which is expensive, and also sets us >>>>>>> up for swapping in large folios in a future series. >>>>>>> >>>>>>> If the folio is partially mapped, we continue to split it since we want >>>>>>> to avoid the extra IO overhead and storage of writing out pages >>>>>>> uneccessarily. >>>>>>> >>>>>>> Signed-off-by: Ryan Roberts <ryan.roberts@arm.com> >>>>>>> --- >>>>>>> mm/vmscan.c | 9 +++++---- >>>>>>> 1 file changed, 5 insertions(+), 4 deletions(-) >>>>>>> >>>>>>> diff --git a/mm/vmscan.c b/mm/vmscan.c >>>>>>> index cf7d4cf47f1a..0ebec99e04c6 100644 >>>>>>> --- a/mm/vmscan.c >>>>>>> +++ b/mm/vmscan.c >>>>>>> @@ -1222,11 +1222,12 @@ static unsigned int shrink_folio_list(struct list_head >>>>>>> *folio_list, >>>>>>> if (!can_split_folio(folio, NULL)) >>>>>>> goto activate_locked; >>>>>>> /* >>>>>>> - * Split folios without a PMD map right >>>>>>> - * away. Chances are some or all of the >>>>>>> - * tail pages can be freed without IO. >>>>>>> + * Split partially mapped folios map >>>>>>> + * right away. Chances are some or all >>>>>>> + * of the tail pages can be freed >>>>>>> + * without IO. >>>>>>> */ >>>>>>> - if (!folio_entire_mapcount(folio) && >>>>>>> + if (!list_empty(&folio->_deferred_list) && >>>>>>> split_folio_to_list(folio, >>>>>>> folio_list)) >>>>>>> goto activate_locked; >>>>>> >>>>>> Not sure if we might have to annotate that with data_race(). 
>>>>> >>>>> I asked that exact question to Matthew in another context bt didn't get a >>>>> response. There are examples of checking if the deferred list is empty with and >>>>> without data_race() in the code base. But list_empty() is implemented like this: >>>>> >>>>> static inline int list_empty(const struct list_head *head) >>>>> { >>>>> return READ_ONCE(head->next) == head; >>>>> } >>>>> >>>>> So I assumed the READ_ONCE() makes everything safe without a lock? Perhaps not >>>>> sufficient for KCSAN? > I don't think READ_ONCE() can replace the lock. > >>>> >>>> Yeah, there is only one use of data_race with that list. >>>> >>>> It was added in f3ebdf042df4 ("THP: avoid lock when check whether THP is in >>>> deferred list"). >>>> >>>> Looks like that was added right in v1 of that change [1], so my best guess is >>>> that it is not actually required. >>>> >>>> If not required, likely we should just cleanup the single user. >>>> >>>> [1] >>>> https://lore.kernel.org/linux-mm/20230417075643.3287513-2-fengwei.yin@intel.com/ >>> >>> Do you have any recollection of why you added the data_race() markup? >> >> Per my understanding, this is used to mark that the code accesses >> folio->_deferred_list without lock intentionally, while >> folio->_deferred_list may be changed in parallel. IIUC, this is what >> data_race() is used for. Or, my understanding is wrong? > Yes. This is my understanding also. Why don't we have a data_race() in deferred_split_folio() then, before taking the lock? It's used a bit inconsistently here.
On 18/03/2024 10:05, David Hildenbrand wrote: > On 18.03.24 11:00, Yin, Fengwei wrote: >> >> >> On 3/18/2024 10:16 AM, Huang, Ying wrote: >>> Ryan Roberts <ryan.roberts@arm.com> writes: >>> >>>> Hi Yin Fengwei, >>>> >>>> On 15/03/2024 11:12, David Hildenbrand wrote: >>>>> On 15.03.24 11:49, Ryan Roberts wrote: >>>>>> On 15/03/2024 10:43, David Hildenbrand wrote: >>>>>>> On 11.03.24 16:00, Ryan Roberts wrote: >>>>>>>> Now that swap supports storing all mTHP sizes, avoid splitting large >>>>>>>> folios before swap-out. This benefits performance of the swap-out path >>>>>>>> by eliding split_folio_to_list(), which is expensive, and also sets us >>>>>>>> up for swapping in large folios in a future series. >>>>>>>> >>>>>>>> If the folio is partially mapped, we continue to split it since we want >>>>>>>> to avoid the extra IO overhead and storage of writing out pages >>>>>>>> uneccessarily. >>>>>>>> >>>>>>>> Signed-off-by: Ryan Roberts <ryan.roberts@arm.com> >>>>>>>> --- >>>>>>>> mm/vmscan.c | 9 +++++---- >>>>>>>> 1 file changed, 5 insertions(+), 4 deletions(-) >>>>>>>> >>>>>>>> diff --git a/mm/vmscan.c b/mm/vmscan.c >>>>>>>> index cf7d4cf47f1a..0ebec99e04c6 100644 >>>>>>>> --- a/mm/vmscan.c >>>>>>>> +++ b/mm/vmscan.c >>>>>>>> @@ -1222,11 +1222,12 @@ static unsigned int shrink_folio_list(struct >>>>>>>> list_head >>>>>>>> *folio_list, >>>>>>>> if (!can_split_folio(folio, NULL)) >>>>>>>> goto activate_locked; >>>>>>>> /* >>>>>>>> - * Split folios without a PMD map right >>>>>>>> - * away. Chances are some or all of the >>>>>>>> - * tail pages can be freed without IO. >>>>>>>> + * Split partially mapped folios map >>>>>>>> + * right away. Chances are some or all >>>>>>>> + * of the tail pages can be freed >>>>>>>> + * without IO. 
>>>>>>>> */ >>>>>>>> - if (!folio_entire_mapcount(folio) && >>>>>>>> + if (!list_empty(&folio->_deferred_list) && >>>>>>>> split_folio_to_list(folio, >>>>>>>> folio_list)) >>>>>>>> goto activate_locked; >>>>>>> >>>>>>> Not sure if we might have to annotate that with data_race(). >>>>>> >>>>>> I asked that exact question to Matthew in another context bt didn't get a >>>>>> response. There are examples of checking if the deferred list is empty >>>>>> with and >>>>>> without data_race() in the code base. But list_empty() is implemented like >>>>>> this: >>>>>> >>>>>> static inline int list_empty(const struct list_head *head) >>>>>> { >>>>>> return READ_ONCE(head->next) == head; >>>>>> } >>>>>> >>>>>> So I assumed the READ_ONCE() makes everything safe without a lock? Perhaps >>>>>> not >>>>>> sufficient for KCSAN? >> I don't think READ_ONCE() can replace the lock. But it doesn't ensure we get a consistent value and that the compiler orders the load correctly. There are lots of patterns in the kernel that use READ_ONCE() without a lock and they don't use data_race() - e.g. ptep_get_lockless(). It sounds like none of us really understand what data_race() is for, so I guess I'll just do a KCSAN build and invoke the code path to see if it complains. >> >>>>> >>>>> Yeah, there is only one use of data_race with that list. >>>>> >>>>> It was added in f3ebdf042df4 ("THP: avoid lock when check whether THP is in >>>>> deferred list"). >>>>> >>>>> Looks like that was added right in v1 of that change [1], so my best guess is >>>>> that it is not actually required. >>>>> >>>>> If not required, likely we should just cleanup the single user. >>>>> >>>>> [1] >>>>> https://lore.kernel.org/linux-mm/20230417075643.3287513-2-fengwei.yin@intel.com/ >>>> >>>> Do you have any recollection of why you added the data_race() markup? 
>>> >>> Per my understanding, this is used to mark that the code accesses >>> folio->_deferred_list without lock intentionally, while >>> folio->_deferred_list may be changed in parallel. IIUC, this is what >>> data_race() is used for. Or, my understanding is wrong? >> Yes. This is my understanding also. > > Why don't we have a data_race() in deferred_split_folio() then, before taking > the lock? > > It's used a bit inconsistently here. >
On 18/03/2024 15:35, Ryan Roberts wrote: > On 18/03/2024 10:05, David Hildenbrand wrote: >> On 18.03.24 11:00, Yin, Fengwei wrote: >>> >>> >>> On 3/18/2024 10:16 AM, Huang, Ying wrote: >>>> Ryan Roberts <ryan.roberts@arm.com> writes: >>>> >>>>> Hi Yin Fengwei, >>>>> >>>>> On 15/03/2024 11:12, David Hildenbrand wrote: >>>>>> On 15.03.24 11:49, Ryan Roberts wrote: >>>>>>> On 15/03/2024 10:43, David Hildenbrand wrote: >>>>>>>> On 11.03.24 16:00, Ryan Roberts wrote: >>>>>>>>> Now that swap supports storing all mTHP sizes, avoid splitting large >>>>>>>>> folios before swap-out. This benefits performance of the swap-out path >>>>>>>>> by eliding split_folio_to_list(), which is expensive, and also sets us >>>>>>>>> up for swapping in large folios in a future series. >>>>>>>>> >>>>>>>>> If the folio is partially mapped, we continue to split it since we want >>>>>>>>> to avoid the extra IO overhead and storage of writing out pages >>>>>>>>> uneccessarily. >>>>>>>>> >>>>>>>>> Signed-off-by: Ryan Roberts <ryan.roberts@arm.com> >>>>>>>>> --- >>>>>>>>> mm/vmscan.c | 9 +++++---- >>>>>>>>> 1 file changed, 5 insertions(+), 4 deletions(-) >>>>>>>>> >>>>>>>>> diff --git a/mm/vmscan.c b/mm/vmscan.c >>>>>>>>> index cf7d4cf47f1a..0ebec99e04c6 100644 >>>>>>>>> --- a/mm/vmscan.c >>>>>>>>> +++ b/mm/vmscan.c >>>>>>>>> @@ -1222,11 +1222,12 @@ static unsigned int shrink_folio_list(struct >>>>>>>>> list_head >>>>>>>>> *folio_list, >>>>>>>>> if (!can_split_folio(folio, NULL)) >>>>>>>>> goto activate_locked; >>>>>>>>> /* >>>>>>>>> - * Split folios without a PMD map right >>>>>>>>> - * away. Chances are some or all of the >>>>>>>>> - * tail pages can be freed without IO. >>>>>>>>> + * Split partially mapped folios map >>>>>>>>> + * right away. Chances are some or all >>>>>>>>> + * of the tail pages can be freed >>>>>>>>> + * without IO. 
>>>>>>>>> */ >>>>>>>>> - if (!folio_entire_mapcount(folio) && >>>>>>>>> + if (!list_empty(&folio->_deferred_list) && >>>>>>>>> split_folio_to_list(folio, >>>>>>>>> folio_list)) >>>>>>>>> goto activate_locked; >>>>>>>> >>>>>>>> Not sure if we might have to annotate that with data_race(). >>>>>>> >>>>>>> I asked that exact question to Matthew in another context bt didn't get a >>>>>>> response. There are examples of checking if the deferred list is empty >>>>>>> with and >>>>>>> without data_race() in the code base. But list_empty() is implemented like >>>>>>> this: >>>>>>> >>>>>>> static inline int list_empty(const struct list_head *head) >>>>>>> { >>>>>>> return READ_ONCE(head->next) == head; >>>>>>> } >>>>>>> >>>>>>> So I assumed the READ_ONCE() makes everything safe without a lock? Perhaps >>>>>>> not >>>>>>> sufficient for KCSAN? >>> I don't think READ_ONCE() can replace the lock. > > But it doesn't ensure we get a consistent value and that the compiler orders the Sorry - fat fingers... I meant it *does* ensure we get a consistent value (i.e. untorn) > load correctly. There are lots of patterns in the kernel that use READ_ONCE() > without a lock and they don't use data_race() - e.g. ptep_get_lockless(). > > It sounds like none of us really understand what data_race() is for, so I guess > I'll just do a KCSAN build and invoke the code path to see if it complains. > > >>> >>>>>> >>>>>> Yeah, there is only one use of data_race with that list. >>>>>> >>>>>> It was added in f3ebdf042df4 ("THP: avoid lock when check whether THP is in >>>>>> deferred list"). >>>>>> >>>>>> Looks like that was added right in v1 of that change [1], so my best guess is >>>>>> that it is not actually required. >>>>>> >>>>>> If not required, likely we should just cleanup the single user. >>>>>> >>>>>> [1] >>>>>> https://lore.kernel.org/linux-mm/20230417075643.3287513-2-fengwei.yin@intel.com/ >>>>> >>>>> Do you have any recollection of why you added the data_race() markup? 
>>>> >>>> Per my understanding, this is used to mark that the code accesses >>>> folio->_deferred_list without lock intentionally, while >>>> folio->_deferred_list may be changed in parallel. IIUC, this is what >>>> data_race() is used for. Or, my understanding is wrong? >>> Yes. This is my understanding also. >> >> Why don't we have a data_race() in deferred_split_folio() then, before taking >> the lock? >> >> It's used a bit inconsistently here. >> >
On 3/18/24 23:35, Ryan Roberts wrote: > On 18/03/2024 10:05, David Hildenbrand wrote: >> On 18.03.24 11:00, Yin, Fengwei wrote: >>> >>> >>> On 3/18/2024 10:16 AM, Huang, Ying wrote: >>>> Ryan Roberts <ryan.roberts@arm.com> writes: >>>> >>>>> Hi Yin Fengwei, >>>>> >>>>> On 15/03/2024 11:12, David Hildenbrand wrote: >>>>>> On 15.03.24 11:49, Ryan Roberts wrote: >>>>>>> On 15/03/2024 10:43, David Hildenbrand wrote: >>>>>>>> On 11.03.24 16:00, Ryan Roberts wrote: >>>>>>>>> Now that swap supports storing all mTHP sizes, avoid splitting large >>>>>>>>> folios before swap-out. This benefits performance of the swap-out path >>>>>>>>> by eliding split_folio_to_list(), which is expensive, and also sets us >>>>>>>>> up for swapping in large folios in a future series. >>>>>>>>> >>>>>>>>> If the folio is partially mapped, we continue to split it since we want >>>>>>>>> to avoid the extra IO overhead and storage of writing out pages >>>>>>>>> uneccessarily. >>>>>>>>> >>>>>>>>> Signed-off-by: Ryan Roberts <ryan.roberts@arm.com> >>>>>>>>> --- >>>>>>>>> mm/vmscan.c | 9 +++++---- >>>>>>>>> 1 file changed, 5 insertions(+), 4 deletions(-) >>>>>>>>> >>>>>>>>> diff --git a/mm/vmscan.c b/mm/vmscan.c >>>>>>>>> index cf7d4cf47f1a..0ebec99e04c6 100644 >>>>>>>>> --- a/mm/vmscan.c >>>>>>>>> +++ b/mm/vmscan.c >>>>>>>>> @@ -1222,11 +1222,12 @@ static unsigned int shrink_folio_list(struct >>>>>>>>> list_head >>>>>>>>> *folio_list, >>>>>>>>> if (!can_split_folio(folio, NULL)) >>>>>>>>> goto activate_locked; >>>>>>>>> /* >>>>>>>>> - * Split folios without a PMD map right >>>>>>>>> - * away. Chances are some or all of the >>>>>>>>> - * tail pages can be freed without IO. >>>>>>>>> + * Split partially mapped folios map >>>>>>>>> + * right away. Chances are some or all >>>>>>>>> + * of the tail pages can be freed >>>>>>>>> + * without IO. 
>>>>>>>>> */ >>>>>>>>> - if (!folio_entire_mapcount(folio) && >>>>>>>>> + if (!list_empty(&folio->_deferred_list) && >>>>>>>>> split_folio_to_list(folio, >>>>>>>>> folio_list)) >>>>>>>>> goto activate_locked; >>>>>>>> >>>>>>>> Not sure if we might have to annotate that with data_race(). >>>>>>> >>>>>>> I asked that exact question to Matthew in another context bt didn't get a >>>>>>> response. There are examples of checking if the deferred list is empty >>>>>>> with and >>>>>>> without data_race() in the code base. But list_empty() is implemented like >>>>>>> this: >>>>>>> >>>>>>> static inline int list_empty(const struct list_head *head) >>>>>>> { >>>>>>> return READ_ONCE(head->next) == head; >>>>>>> } >>>>>>> >>>>>>> So I assumed the READ_ONCE() makes everything safe without a lock? Perhaps >>>>>>> not >>>>>>> sufficient for KCSAN? >>> I don't think READ_ONCE() can replace the lock. > > But it doesn't ensure we get a consistent value and that the compiler orders the > load correctly. There are lots of patterns in the kernel that use READ_ONCE() > without a lock and they don't use data_race() - e.g. ptep_get_lockless(). They (ptep_get_lockless() and deferred_list) have different access pattern (or race pattern) here. I don't think they are comparable. > > It sounds like none of us really understand what data_race() is for, so I guess > I'll just do a KCSAN build and invoke the code path to see if it complains. READ_ONCE() in list_empty will shutdown the KCSAN also. > > >>> >>>>>> >>>>>> Yeah, there is only one use of data_race with that list. >>>>>> >>>>>> It was added in f3ebdf042df4 ("THP: avoid lock when check whether THP is in >>>>>> deferred list"). >>>>>> >>>>>> Looks like that was added right in v1 of that change [1], so my best guess is >>>>>> that it is not actually required. >>>>>> >>>>>> If not required, likely we should just cleanup the single user. 
>>>>>> >>>>>> [1] >>>>>> https://lore.kernel.org/linux-mm/20230417075643.3287513-2-fengwei.yin@intel.com/ >>>>> >>>>> Do you have any recollection of why you added the data_race() markup? >>>> >>>> Per my understanding, this is used to mark that the code accesses >>>> folio->_deferred_list without lock intentionally, while >>>> folio->_deferred_list may be changed in parallel. IIUC, this is what >>>> data_race() is used for. Or, my understanding is wrong? >>> Yes. This is my understanding also. >> >> Why don't we have a data_race() in deferred_split_folio() then, before taking >> the lock? >> >> It's used a bit inconsistently here. >> >
On 3/18/24 18:05, David Hildenbrand wrote: > On 18.03.24 11:00, Yin, Fengwei wrote: >> >> >> On 3/18/2024 10:16 AM, Huang, Ying wrote: >>> Ryan Roberts <ryan.roberts@arm.com> writes: >>> >>>> Hi Yin Fengwei, >>>> >>>> On 15/03/2024 11:12, David Hildenbrand wrote: >>>>> On 15.03.24 11:49, Ryan Roberts wrote: >>>>>> On 15/03/2024 10:43, David Hildenbrand wrote: >>>>>>> On 11.03.24 16:00, Ryan Roberts wrote: >>>>>>>> Now that swap supports storing all mTHP sizes, avoid splitting large >>>>>>>> folios before swap-out. This benefits performance of the swap-out path >>>>>>>> by eliding split_folio_to_list(), which is expensive, and also sets us >>>>>>>> up for swapping in large folios in a future series. >>>>>>>> >>>>>>>> If the folio is partially mapped, we continue to split it since we want >>>>>>>> to avoid the extra IO overhead and storage of writing out pages >>>>>>>> uneccessarily. >>>>>>>> >>>>>>>> Signed-off-by: Ryan Roberts <ryan.roberts@arm.com> >>>>>>>> --- >>>>>>>> mm/vmscan.c | 9 +++++---- >>>>>>>> 1 file changed, 5 insertions(+), 4 deletions(-) >>>>>>>> >>>>>>>> diff --git a/mm/vmscan.c b/mm/vmscan.c >>>>>>>> index cf7d4cf47f1a..0ebec99e04c6 100644 >>>>>>>> --- a/mm/vmscan.c >>>>>>>> +++ b/mm/vmscan.c >>>>>>>> @@ -1222,11 +1222,12 @@ static unsigned int shrink_folio_list(struct list_head >>>>>>>> *folio_list, >>>>>>>> if (!can_split_folio(folio, NULL)) >>>>>>>> goto activate_locked; >>>>>>>> /* >>>>>>>> - * Split folios without a PMD map right >>>>>>>> - * away. Chances are some or all of the >>>>>>>> - * tail pages can be freed without IO. >>>>>>>> + * Split partially mapped folios map >>>>>>>> + * right away. Chances are some or all >>>>>>>> + * of the tail pages can be freed >>>>>>>> + * without IO. 
>>>>>>>> */ >>>>>>>> - if (!folio_entire_mapcount(folio) && >>>>>>>> + if (!list_empty(&folio->_deferred_list) && >>>>>>>> split_folio_to_list(folio, >>>>>>>> folio_list)) >>>>>>>> goto activate_locked; >>>>>>> >>>>>>> Not sure if we might have to annotate that with data_race(). >>>>>> >>>>>> I asked that exact question to Matthew in another context bt didn't get a >>>>>> response. There are examples of checking if the deferred list is empty with and >>>>>> without data_race() in the code base. But list_empty() is implemented like this: >>>>>> >>>>>> static inline int list_empty(const struct list_head *head) >>>>>> { >>>>>> return READ_ONCE(head->next) == head; >>>>>> } >>>>>> >>>>>> So I assumed the READ_ONCE() makes everything safe without a lock? Perhaps not >>>>>> sufficient for KCSAN? >> I don't think READ_ONCE() can replace the lock. >> >>>>> >>>>> Yeah, there is only one use of data_race with that list. >>>>> >>>>> It was added in f3ebdf042df4 ("THP: avoid lock when check whether THP is in >>>>> deferred list"). >>>>> >>>>> Looks like that was added right in v1 of that change [1], so my best guess is >>>>> that it is not actually required. >>>>> >>>>> If not required, likely we should just cleanup the single user. >>>>> >>>>> [1] >>>>> https://lore.kernel.org/linux-mm/20230417075643.3287513-2-fengwei.yin@intel.com/ >>>> >>>> Do you have any recollection of why you added the data_race() markup? >>> >>> Per my understanding, this is used to mark that the code accesses >>> folio->_deferred_list without lock intentionally, while >>> folio->_deferred_list may be changed in parallel. IIUC, this is what >>> data_race() is used for. Or, my understanding is wrong? >> Yes. This is my understanding also. > > Why don't we have a data_race() in deferred_split_folio() then, before taking the lock? No idea why there is no data_race() added. But I think we should add data_race(). Regards Yin, Fengwei > > It's used a bit inconsistently here. >
On 19/03/2024 02:20, Yin Fengwei wrote: > > > On 3/18/24 23:35, Ryan Roberts wrote: >> On 18/03/2024 10:05, David Hildenbrand wrote: >>> On 18.03.24 11:00, Yin, Fengwei wrote: >>>> >>>> >>>> On 3/18/2024 10:16 AM, Huang, Ying wrote: >>>>> Ryan Roberts <ryan.roberts@arm.com> writes: >>>>> >>>>>> Hi Yin Fengwei, >>>>>> >>>>>> On 15/03/2024 11:12, David Hildenbrand wrote: >>>>>>> On 15.03.24 11:49, Ryan Roberts wrote: >>>>>>>> On 15/03/2024 10:43, David Hildenbrand wrote: >>>>>>>>> On 11.03.24 16:00, Ryan Roberts wrote: >>>>>>>>>> Now that swap supports storing all mTHP sizes, avoid splitting large >>>>>>>>>> folios before swap-out. This benefits performance of the swap-out path >>>>>>>>>> by eliding split_folio_to_list(), which is expensive, and also sets us >>>>>>>>>> up for swapping in large folios in a future series. >>>>>>>>>> >>>>>>>>>> If the folio is partially mapped, we continue to split it since we want >>>>>>>>>> to avoid the extra IO overhead and storage of writing out pages >>>>>>>>>> uneccessarily. >>>>>>>>>> >>>>>>>>>> Signed-off-by: Ryan Roberts <ryan.roberts@arm.com> >>>>>>>>>> --- >>>>>>>>>> mm/vmscan.c | 9 +++++---- >>>>>>>>>> 1 file changed, 5 insertions(+), 4 deletions(-) >>>>>>>>>> >>>>>>>>>> diff --git a/mm/vmscan.c b/mm/vmscan.c >>>>>>>>>> index cf7d4cf47f1a..0ebec99e04c6 100644 >>>>>>>>>> --- a/mm/vmscan.c >>>>>>>>>> +++ b/mm/vmscan.c >>>>>>>>>> @@ -1222,11 +1222,12 @@ static unsigned int shrink_folio_list(struct >>>>>>>>>> list_head >>>>>>>>>> *folio_list, >>>>>>>>>> if (!can_split_folio(folio, NULL)) >>>>>>>>>> goto activate_locked; >>>>>>>>>> /* >>>>>>>>>> - * Split folios without a PMD map right >>>>>>>>>> - * away. Chances are some or all of the >>>>>>>>>> - * tail pages can be freed without IO. >>>>>>>>>> + * Split partially mapped folios map >>>>>>>>>> + * right away. Chances are some or all >>>>>>>>>> + * of the tail pages can be freed >>>>>>>>>> + * without IO. 
>>>>>>>>>> */ >>>>>>>>>> - if (!folio_entire_mapcount(folio) && >>>>>>>>>> + if (!list_empty(&folio->_deferred_list) && >>>>>>>>>> split_folio_to_list(folio, >>>>>>>>>> folio_list)) >>>>>>>>>> goto activate_locked; >>>>>>>>> >>>>>>>>> Not sure if we might have to annotate that with data_race(). >>>>>>>> >>>>>>>> I asked that exact question to Matthew in another context bt didn't get a >>>>>>>> response. There are examples of checking if the deferred list is empty >>>>>>>> with and >>>>>>>> without data_race() in the code base. But list_empty() is implemented like >>>>>>>> this: >>>>>>>> >>>>>>>> static inline int list_empty(const struct list_head *head) >>>>>>>> { >>>>>>>> return READ_ONCE(head->next) == head; >>>>>>>> } >>>>>>>> >>>>>>>> So I assumed the READ_ONCE() makes everything safe without a lock? Perhaps >>>>>>>> not >>>>>>>> sufficient for KCSAN? >>>> I don't think READ_ONCE() can replace the lock. >> >> But it doesn't ensure we get a consistent value and that the compiler orders the >> load correctly. There are lots of patterns in the kernel that use READ_ONCE() >> without a lock and they don't use data_race() - e.g. ptep_get_lockless(). > They (ptep_get_lockless() and deferred_list) have different access pattern > (or race pattern) here. I don't think they are comparable. > >> >> It sounds like none of us really understand what data_race() is for, so I guess >> I'll just do a KCSAN build and invoke the code path to see if it complains. > READ_ONCE() in list_empty will shutdown the KCSAN also. OK, I found some time to run the test with KCSAN; nothing fires. But then I read the docs and looked at the code a bit. Documentation/dev-tools/kcsan.rst states: In an execution, two memory accesses form a *data race* if they *conflict*, they happen concurrently in different threads, and at least one of them is a *plain access*; they *conflict* if both access the same memory location, and at least one is a write. 
It also clarifies that READ_ONCE() is a "marked access". So we would have a
data race if there was a concurrent, *plain* write to
folio->_deferred_list.next. This can occur in a couple of places I believe,
for example:

  deferred_split_folio()
    list_add_tail()
      __list_add()
        new->next = next;

  deferred_split_scan()
    list_move()
      list_add()
        __list_add()
          new->next = next;

So if either deferred_split_folio() or deferred_split_scan() can run
concurrently with shrink_folio_list(), for the same folio (I believe both
can), then we have a race, and this list_empty() check needs to be protected
with data_race(). The race is safe/by design, but it does need to be marked.
I'll fix this in my next version.

Thanks,
Ryan

> 
>> 
>> 
>>>> 
>>>>>>> 
>>>>>>> Yeah, there is only one use of data_race with that list.
>>>>>>> 
>>>>>>> It was added in f3ebdf042df4 ("THP: avoid lock when check whether THP is in
>>>>>>> deferred list").
>>>>>>> 
>>>>>>> Looks like that was added right in v1 of that change [1], so my best guess is
>>>>>>> that it is not actually required.
>>>>>>> 
>>>>>>> If not required, likely we should just cleanup the single user.
>>>>>>> 
>>>>>>> [1]
>>>>>>> https://lore.kernel.org/linux-mm/20230417075643.3287513-2-fengwei.yin@intel.com/
>>>>>> 
>>>>>> Do you have any recollection of why you added the data_race() markup?
>>>>> 
>>>>> Per my understanding, this is used to mark that the code accesses
>>>>> folio->_deferred_list without lock intentionally, while
>>>>> folio->_deferred_list may be changed in parallel. IIUC, this is what
>>>>> data_race() is used for. Or, my understanding is wrong?
>>>> Yes. This is my understanding also.
>>> 
>>> Why don't we have a data_race() in deferred_split_folio() then, before taking
>>> the lock?
>>> 
>>> It's used a bit inconsistently here.
>>> 
>>
diff --git a/mm/vmscan.c b/mm/vmscan.c
index cf7d4cf47f1a..0ebec99e04c6 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -1222,11 +1222,12 @@ static unsigned int shrink_folio_list(struct list_head *folio_list,
 					if (!can_split_folio(folio, NULL))
 						goto activate_locked;
 					/*
-					 * Split folios without a PMD map right
-					 * away. Chances are some or all of the
-					 * tail pages can be freed without IO.
+					 * Split partially mapped folios map
+					 * right away. Chances are some or all
+					 * of the tail pages can be freed
+					 * without IO.
 					 */
-					if (!folio_entire_mapcount(folio) &&
+					if (!list_empty(&folio->_deferred_list) &&
 					    split_folio_to_list(folio,
 								folio_list))
 						goto activate_locked;
Now that swap supports storing all mTHP sizes, avoid splitting large
folios before swap-out. This benefits performance of the swap-out path
by eliding split_folio_to_list(), which is expensive, and also sets us
up for swapping in large folios in a future series.

If the folio is partially mapped, we continue to split it since we want
to avoid the extra IO overhead and storage of writing out pages
unnecessarily.

Signed-off-by: Ryan Roberts <ryan.roberts@arm.com>
---
 mm/vmscan.c | 9 +++++----
 1 file changed, 5 insertions(+), 4 deletions(-)