Message ID | 20220510211743.95831-1-minchan@kernel.org (mailing list archive)
State | New
Series | [v4] mm: fix is_pinnable_page against on cma page
On 5/10/22 14:17, Minchan Kim wrote: > Pages on CMA area could have MIGRATE_ISOLATE as well as MIGRATE_CMA > so current is_pinnable_page could miss CMA pages which has MIGRATE_ > ISOLATE. It ends up pinning CMA pages as longterm at pin_user_pages > APIs so CMA allocation keep failed until the pin is released. > > > CPU 0 CPU 1 - Task B > > cma_alloc > alloc_contig_range > pin_user_pages_fast(FOLL_LONGTERM) > change pageblock as MIGRATE_ISOLATE > internal_get_user_pages_fast > lockless_pages_from_mm > gup_pte_range > try_grab_folio > is_pinnable_page > return true; > So, pinned the page successfully. > page migration failure with pinned page > .. > .. After 30 sec > unpin_user_page(page) > > CMA allocation succeeded after 30 sec. Hi Minchan, Thanks for spelling out how this works, that really speeds up the review and helps others quickly learn what is going on with the code. For my own information, mainly: where is CMA blocking, so that it waits (apparently) for the during of the pin, before proceeding? (Or is the caller retrying?) I noticed a few minor points but was too slow to reply, notes below: > > The CMA allocation path protects the migration type change race > using zone->lock but what GUP path need to know is just whether the > page is on CMA area or not rather than exact migration type. > Thus, we don't need zone->lock but just checks migration type in > either of (MIGRATE_ISOLATE and MIGRATE_CMA). > > Adding the MIGRATE_ISOLATE check in is_pinnable_page could cause > rejecting of pinning pages on MIGRATE_ISOLATE pageblocks even > though it's neither CMA nor movable zone if the page is temporarily > unmovable. However, such a migration failure by unexpected temporal > refcount holding is general issue, not only come from MIGRATE_ISOLATE > and the MIGRATE_ISOLATE is also transient state like other temporal > elevated refcount problem. > > Cc: David Hildenbrand <david@redhat.com> > Signed-off-by: Minchan Kim <minchan@kernel.org> > --- > * from v3 - https://lore.kernel.org/all/20220509153430.4125710-1-minchan@kernel.org/ > * Fix typo and adding more description - akpm > > * from v2 - https://lore.kernel.org/all/20220505064429.2818496-1-minchan@kernel.org/ > * Use __READ_ONCE instead of volatile - akpm > > * from v1 - https://lore.kernel.org/all/20220502173558.2510641-1-minchan@kernel.org/ > * fix build warning - lkp > * fix refetching issue of migration type > * add side effect on !ZONE_MOVABLE and !MIGRATE_CMA in description - david > > include/linux/mm.h | 15 +++++++++++++-- > 1 file changed, 13 insertions(+), 2 deletions(-) > > diff --git a/include/linux/mm.h b/include/linux/mm.h > index 6acca5cecbc5..cbf79eb790e0 100644 > --- a/include/linux/mm.h > +++ b/include/linux/mm.h > @@ -1625,8 +1625,19 @@ static inline bool page_needs_cow_for_dma(struct vm_area_struct *vma, > #ifdef CONFIG_MIGRATION > static inline bool is_pinnable_page(struct page *page) > { > - return !(is_zone_movable_page(page) || is_migrate_cma_page(page)) || > - is_zero_pfn(page_to_pfn(page)); > +#ifdef CONFIG_CMA > + /* > + * use volatile to use local variable mt instead of > + * refetching mt value. > + */ This comment is stale and should therefore be deleted. > + int __mt = get_pageblock_migratetype(page); > + int mt = __READ_ONCE(__mt); Although I saw the email discussion about this in v2, that discussion didn't go far enough. It started with "don't use volatile", and went on to "try __READ_ONCE() instead", but it should have continued on to "you don't need this at all". Because you don't. 
There is nothing you are racing with, and adding __READ_ONCE() in order to avoid a completely not-going-to-happen compiler re-invocation of a significant code block is just very wrong. So let's just let it go entirely. :) > + > + if (mt == MIGRATE_CMA || mt == MIGRATE_ISOLATE) MIGRATE_ISOLATE is not always defined, and must therefore be protected with a check on CONFIG_MEMORY_ISOLATION...oh never mind, I see in mm/Kconfig: config CMA ... select MEMORY_ISOLATION ...so that's OK. What a tangled web, I wonder if enum migratetype really needs to be sliced up like that, but that's a separate topic. > + return false; > +#endif > + > + return !(is_zone_movable_page(page) || is_zero_pfn(page_to_pfn(page))); And actually this area is getting rather nested with the various ifdefs, and it is probably time to refactor them a bit, considering the above point about MIGRATE_ISOLATE. I had something in mind (which is why I delayed my feedback), along the lines of merging _ISOLATE and _CMA and the ifdefs. But it's just a fine point and not critical of course, just a thought. > } > #else > static inline bool is_pinnable_page(struct page *page) thanks,
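Reassembled from the diff quoted above, the CONFIG_MIGRATION variant of the function under review reads roughly as follows (the comment is the one flagged as stale, and __READ_ONCE() is the part questioned above):

static inline bool is_pinnable_page(struct page *page)
{
#ifdef CONFIG_CMA
        /*
         * use volatile to use local variable mt instead of
         * refetching mt value.
         */
        int __mt = get_pageblock_migratetype(page);
        int mt = __READ_ONCE(__mt);

        if (mt == MIGRATE_CMA || mt == MIGRATE_ISOLATE)
                return false;
#endif

        return !(is_zone_movable_page(page) || is_zero_pfn(page_to_pfn(page)));
}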
Hi John, On Tue, May 10, 2022 at 03:56:37PM -0700, John Hubbard wrote: > On 5/10/22 14:17, Minchan Kim wrote: > > Pages on CMA area could have MIGRATE_ISOLATE as well as MIGRATE_CMA > > so current is_pinnable_page could miss CMA pages which has MIGRATE_ > > ISOLATE. It ends up pinning CMA pages as longterm at pin_user_pages > > APIs so CMA allocation keep failed until the pin is released. > > > > > > CPU 0 CPU 1 - Task B > > > > cma_alloc > > alloc_contig_range > > pin_user_pages_fast(FOLL_LONGTERM) > > change pageblock as MIGRATE_ISOLATE > > internal_get_user_pages_fast > > lockless_pages_from_mm > > gup_pte_range > > try_grab_folio > > is_pinnable_page > > return true; > > So, pinned the page successfully. > > page migration failure with pinned page > > .. > > .. After 30 sec > > unpin_user_page(page) > > > > CMA allocation succeeded after 30 sec. > > Hi Minchan, > > Thanks for spelling out how this works, that really speeds up the > review and helps others quickly learn what is going on with the code. > > For my own information, mainly: where is CMA blocking, so that > it waits (apparently) for the during of the pin, before proceeding? > (Or is the caller retrying?) cma_alloc would fail in the first place, since it couldn't migrate the page out due to the elevated refcount; cma_alloc would proceed to the next pageblocks and keep migrating pages out, but it ends up failing the allocation because the user tries to allocate entire CMA pageblocks, not just part of them, so a single pinned page makes the whole CMA allocation fail. From then on, the user (e.g., dmabuf) could retry a few more times, but it keeps failing until Task B releases the refcount on the page. > > I noticed a few minor points but was too slow to reply, notes below: > > > > > The CMA allocation path protects the migration type change race > > using zone->lock but what GUP path need to know is just whether the > > page is on CMA area or not rather than exact migration type. > > Thus, we don't need zone->lock but just checks migration type in > > either of (MIGRATE_ISOLATE and MIGRATE_CMA). > > > > Adding the MIGRATE_ISOLATE check in is_pinnable_page could cause > > rejecting of pinning pages on MIGRATE_ISOLATE pageblocks even > > though it's neither CMA nor movable zone if the page is temporarily > > unmovable. However, such a migration failure by unexpected temporal > > refcount holding is general issue, not only come from MIGRATE_ISOLATE > > and the MIGRATE_ISOLATE is also transient state like other temporal > > elevated refcount problem.
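A minimal sketch of the retry pattern described here, assuming a hypothetical CMA user; the helper name, retry count, and alignment argument are illustrative and not taken from dmabuf or any real driver:

#include <linux/cma.h>

#define NR_CMA_RETRIES  5       /* hypothetical retry policy */

/*
 * cma_alloc() -> alloc_contig_range() must migrate away every page in the
 * requested range, so one long-term pinned page fails the whole attempt;
 * retrying only succeeds once the GUP user drops its pin (30 sec later in
 * the trace above).
 */
static struct page *try_cma_alloc(struct cma *cma, unsigned long count)
{
        struct page *page;
        int i;

        for (i = 0; i < NR_CMA_RETRIES; i++) {
                page = cma_alloc(cma, count, 0, false);
                if (page)
                        return page;
        }
        return NULL;
}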
> > > > Cc: David Hildenbrand <david@redhat.com> > > Signed-off-by: Minchan Kim <minchan@kernel.org> > > --- > > * from v3 - https://lore.kernel.org/all/20220509153430.4125710-1-minchan@kernel.org/ > > * Fix typo and adding more description - akpm > > > > * from v2 - https://lore.kernel.org/all/20220505064429.2818496-1-minchan@kernel.org/ > > * Use __READ_ONCE instead of volatile - akpm > > > > * from v1 - https://lore.kernel.org/all/20220502173558.2510641-1-minchan@kernel.org/ > > * fix build warning - lkp > > * fix refetching issue of migration type > > * add side effect on !ZONE_MOVABLE and !MIGRATE_CMA in description - david > > > > include/linux/mm.h | 15 +++++++++++++-- > > 1 file changed, 13 insertions(+), 2 deletions(-) > > > > diff --git a/include/linux/mm.h b/include/linux/mm.h > > index 6acca5cecbc5..cbf79eb790e0 100644 > > --- a/include/linux/mm.h > > +++ b/include/linux/mm.h > > @@ -1625,8 +1625,19 @@ static inline bool page_needs_cow_for_dma(struct vm_area_struct *vma, > > #ifdef CONFIG_MIGRATION > > static inline bool is_pinnable_page(struct page *page) > > { > > - return !(is_zone_movable_page(page) || is_migrate_cma_page(page)) || > > - is_zero_pfn(page_to_pfn(page)); > > +#ifdef CONFIG_CMA > > + /* > > + * use volatile to use local variable mt instead of > > + * refetching mt value. > > + */ > > This comment is stale and should therefore be deleted. Yeah. > > > + int __mt = get_pageblock_migratetype(page); > > + int mt = __READ_ONCE(__mt); > > Although I saw the email discussion about this in v2, that discussion > didn't go far enough. It started with "don't use volatile", and went > on to "try __READ_ONCE() instead", but it should have continued on > to "you don't need this at all". That's really what I want to hear from experts so wanted to learn "Why". How could we prevent refetching of the mt if we don't use __READ_ONCE or volatile there? > > Because you don't. There is nothing you are racing with, and adding > __READ_ONCE() in order to avoid a completely not-going-to-happen > compiler re-invocation of a significant code block is just very wrong. > > So let's just let it go entirely. :) Yeah, once it's clear for everyone, I am happy to remove the unnecessary lines. > > > + > > + if (mt == MIGRATE_CMA || mt == MIGRATE_ISOLATE) > > MIGRATE_ISOLATE is not always defined, and must therefore be protected > with a check on CONFIG_MEMORY_ISOLATION...oh never mind, I see in > mm/Kconfig: > > config CMA > ... > select MEMORY_ISOLATION > > ...so that's OK. What a tangled web, I wonder if enum migratetype > really needs to be sliced up like that, but that's a separate topic. > > > + return false; > > +#endif > > + > > + return !(is_zone_movable_page(page) || is_zero_pfn(page_to_pfn(page))); > > And actually this area is getting rather nested with the various ifdefs, > and it is probably time to refactor them a bit, considering the above > point about MIGRATE_ISOLATE. I had something in mind (which is why I > delayed my feedback), along the lines of merging _ISOLATE and _CMA and > the ifdefs. But it's just a fine point and not critical of course, just > a thought. Glad to hear someone is looking that.
On 5/10/22 4:31 PM, Minchan Kim wrote: >>> + int __mt = get_pageblock_migratetype(page); >>> + int mt = __READ_ONCE(__mt); >> >> Although I saw the email discussion about this in v2, that discussion >> didn't go far enough. It started with "don't use volatile", and went >> on to "try __READ_ONCE() instead", but it should have continued on >> to "you don't need this at all". > > That's really what I want to hear from experts so wanted to learn > "Why". How could we prevent refetching of the mt if we don't use > __READ_ONCE or volatile there? > >> >> Because you don't. There is nothing you are racing with, and adding >> __READ_ONCE() in order to avoid a completely not-going-to-happen >> compiler re-invocation of a significant code block is just very wrong. >> >> So let's just let it go entirely. :) > > Yeah, once it's clear for everyone, I am happy to remove the > unnecessary lines. > >> >>> + >>> + if (mt == MIGRATE_CMA || mt == MIGRATE_ISOLATE) >> With or without __READ_ONCE() or volatile or anything else, this code will do what you want. Which is: loosely check for either of the above. What functional problem do you think you are preventing with __READ_ONCE()? Because I don't see one. thanks,
On Tue, May 10, 2022 at 04:58:13PM -0700, John Hubbard wrote: > On 5/10/22 4:31 PM, Minchan Kim wrote: > > > > + int __mt = get_pageblock_migratetype(page); > > > > + int mt = __READ_ONCE(__mt); > > > > > > Although I saw the email discussion about this in v2, that discussion > > > didn't go far enough. It started with "don't use volatile", and went > > > on to "try __READ_ONCE() instead", but it should have continued on > > > to "you don't need this at all". > > > > That's really what I want to hear from experts so wanted to learn > > "Why". How could we prevent refetching of the mt if we don't use > > __READ_ONCE or volatile there? > > > > > > > > Because you don't. There is nothing you are racing with, and adding > > > __READ_ONCE() in order to avoid a completely not-going-to-happen > > > compiler re-invocation of a significant code block is just very wrong. > > > > > > So let's just let it go entirely. :) > > > > Yeah, once it's clear for everyone, I am happy to remove the > > unnecessary lines. > > > > > > > > > + > > > > + if (mt == MIGRATE_CMA || mt == MIGRATE_ISOLATE) > > > > > With or without __READ_ONCE() or volatile or anything else, > this code will do what you want. Which is: loosely check > for either of the above. > > What functional problem do you think you are preventing > with __READ_ONCE()? Because I don't see one. I discussed the issue at v1 so please take a look. https://lore.kernel.org/all/YnFvmc+eMoXvLCWf@google.com/
On 5/10/22 17:09, Minchan Kim wrote: > On Tue, May 10, 2022 at 04:58:13PM -0700, John Hubbard wrote: >> On 5/10/22 4:31 PM, Minchan Kim wrote: >>>>> + int __mt = get_pageblock_migratetype(page); >>>>> + int mt = __READ_ONCE(__mt); >>>> >>>> Although I saw the email discussion about this in v2, that discussion >>>> didn't go far enough. It started with "don't use volatile", and went >>>> on to "try __READ_ONCE() instead", but it should have continued on >>>> to "you don't need this at all". >>> >>> That's really what I want to hear from experts so wanted to learn >>> "Why". How could we prevent refetching of the mt if we don't use >>> __READ_ONCE or volatile there? >>> >>>> >>>> Because you don't. There is nothing you are racing with, and adding >>>> __READ_ONCE() in order to avoid a completely not-going-to-happen >>>> compiler re-invocation of a significant code block is just very wrong. >>>> >>>> So let's just let it go entirely. :) >>> >>> Yeah, once it's clear for everyone, I am happy to remove the >>> unnecessary lines. >>> >>>> >>>>> + >>>>> + if (mt == MIGRATE_CMA || mt == MIGRATE_ISOLATE) >>>> >> >> With or without __READ_ONCE() or volatile or anything else, >> this code will do what you want. Which is: loosely check >> for either of the above. >> >> What functional problem do you think you are preventing >> with __READ_ONCE()? Because I don't see one. > > I discussed the issue at v1 so please take a look. > > https://lore.kernel.org/all/YnFvmc+eMoXvLCWf@google.com/ I read that, but there was never any real justification there for needing to prevent a re-read of mt, just a preference: "I'd like to keep use the local variable mt's value in folloing conditions checks instead of refetching the value from get_pageblock_migratetype." But I don't believe that there is any combination of values of mt that will cause a problem here. I also think that once we pull in experts, they will tell us that the compiler is not going to re-run a non-trivial function to re-fetch a value, but I'm not one of those experts, so that's still arguable. But imagine what the kernel code would look like if every time we call a large function, we have to consider if it actually gets called some arbitrary number of times, due to (anti-) optimizations by the compiler. This seems like something that is not really happening. thanks,
On Tue, May 10, 2022 at 09:32:05PM -0700, John Hubbard wrote: > On 5/10/22 17:09, Minchan Kim wrote: > > On Tue, May 10, 2022 at 04:58:13PM -0700, John Hubbard wrote: > > > On 5/10/22 4:31 PM, Minchan Kim wrote: > > > > > > + int __mt = get_pageblock_migratetype(page); > > > > > > + int mt = __READ_ONCE(__mt); > > > > > > > > > > Although I saw the email discussion about this in v2, that discussion > > > > > didn't go far enough. It started with "don't use volatile", and went > > > > > on to "try __READ_ONCE() instead", but it should have continued on > > > > > to "you don't need this at all". > > > > > > > > That's really what I want to hear from experts so wanted to learn > > > > "Why". How could we prevent refetching of the mt if we don't use > > > > __READ_ONCE or volatile there? > > > > > > > > > > > > > > Because you don't. There is nothing you are racing with, and adding > > > > > __READ_ONCE() in order to avoid a completely not-going-to-happen > > > > > compiler re-invocation of a significant code block is just very wrong. > > > > > > > > > > So let's just let it go entirely. :) > > > > > > > > Yeah, once it's clear for everyone, I am happy to remove the > > > > unnecessary lines. > > > > > > > > > > > > > > > + > > > > > > + if (mt == MIGRATE_CMA || mt == MIGRATE_ISOLATE) > > > > > > > > > > > With or without __READ_ONCE() or volatile or anything else, > > > this code will do what you want. Which is: loosely check > > > for either of the above. > > > > > > What functional problem do you think you are preventing > > > with __READ_ONCE()? Because I don't see one. > > > > I discussed the issue at v1 so please take a look. > > > > https://lore.kernel.org/all/YnFvmc+eMoXvLCWf@google.com/ > > I read that, but there was never any real justification there for needing > to prevent a re-read of mt, just a preference: "I'd like to keep use the local > variable mt's value in folloing conditions checks instead of refetching > the value from get_pageblock_migratetype." > > But I don't believe that there is any combination of values of mt that > will cause a problem here. > > I also think that once we pull in experts, they will tell us that the > compiler is not going to re-run a non-trivial function to re-fetch a > value, but I'm not one of those experts, so that's still arguable. But > imagine what the kernel code would look like if every time we call > a large function, we have to consider if it actually gets called some > arbitrary number of times, due to (anti-) optimizations by the compiler. > This seems like something that is not really happening. Maybe I am being paranoid, since I have heard of too many subtle things the compiler can do to high level language code, so I wanted to be careful, especially when we do lockless stuff. And who knows whether the large(?) function gets changed into a small(?) one later on? I'd like to hear from the experts to decide it.
On 5/11/22 2:46 PM, Minchan Kim wrote: >> I read that, but there was never any real justification there for needing >> to prevent a re-read of mt, just a preference: "I'd like to keep use the local >> variable mt's value in folloing conditions checks instead of refetching >> the value from get_pageblock_migratetype." >> >> But I don't believe that there is any combination of values of mt that >> will cause a problem here. >> >> I also think that once we pull in experts, they will tell us that the >> compiler is not going to re-run a non-trivial function to re-fetch a >> value, but I'm not one of those experts, so that's still arguable. But >> imagine what the kernel code would look like if every time we call >> a large function, we have to consider if it actually gets called some >> arbitrary number of times, due to (anti-) optimizations by the compiler. >> This seems like something that is not really happening. > > Maybe, I might be paranoid since I have heard too subtle things > about how compiler could changes high level language code so wanted > be careful especially when we do lockless-stuff. > > Who cares when we change the large(?) function to small(?) function > later on? I'd like to hear from experts to decide it. > Yes. But one thing that is still unanswered, that I think you can answer, is: even if the compiler *did* re-read the mt variable, what problems could that cause? I claim "no problems", because there is no combination of 0, _CMA, _ISOLATE, _CMA|ISOLATE that will cause problems here. Any if that's true, then we can leave the experts alone, because the answer is there without knowing what happens exactly to mt. thanks,
On Wed, May 11, 2022 at 03:25:49PM -0700, John Hubbard wrote: > On 5/11/22 2:46 PM, Minchan Kim wrote: > > > I read that, but there was never any real justification there for needing > > > to prevent a re-read of mt, just a preference: "I'd like to keep use the local > > > variable mt's value in folloing conditions checks instead of refetching > > > the value from get_pageblock_migratetype." > > > > > > But I don't believe that there is any combination of values of mt that > > > will cause a problem here. > > > > > > I also think that once we pull in experts, they will tell us that the > > > compiler is not going to re-run a non-trivial function to re-fetch a > > > value, but I'm not one of those experts, so that's still arguable. But > > > imagine what the kernel code would look like if every time we call > > > a large function, we have to consider if it actually gets called some > > > arbitrary number of times, due to (anti-) optimizations by the compiler. > > > This seems like something that is not really happening. > > > > Maybe, I might be paranoid since I have heard too subtle things > > about how compiler could changes high level language code so wanted > > be careful especially when we do lockless-stuff. > > > > Who cares when we change the large(?) function to small(?) function > > later on? I'd like to hear from experts to decide it. > > > > Yes. But one thing that is still unanswered, that I think you can > answer, is: even if the compiler *did* re-read the mt variable, what > problems could that cause? I claim "no problems", because there is > no combination of 0, _CMA, _ISOLATE, _CMA|ISOLATE that will cause > problems here. What scenario I am concerning with __READ_ONCE so compiler inlining get_pageblock_migratetype two times are CPU 0 CPU 1 alloc_contig_range is_pinnable_page start_isolate_page_range set_pageblock_migratetype(MIGRATE_ISOLATE) if (get_pageeblock_migratetype(page) == MIGRATE_CMA) so it's false undo: set_pageblock_migratetype(MIGRATE_CMA) if (get_pageeblock_migratetype(page) == MIGRATE_ISOLATE) so it's false In the end, CMA memory would be pinned by CPU 0 process so CMA allocation keep failed until the process release the refcount. > > Any if that's true, then we can leave the experts alone, because > the answer is there without knowing what happens exactly to mt. > > thanks, > > -- > John Hubbard > NVIDIA >
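Written out as straight-line code, with the helper deliberately re-read for each comparison, the interleaving described above looks roughly like this (illustrative only; the enum values and helper are stand-ins, not the kernel's):

#include <stdbool.h>

enum { MIGRATE_CMA = 4, MIGRATE_ISOLATE = 5 };  /* stand-ins, not the kernel's values */

static int pageblock_mt;        /* the pageblock bits CPU 1 flips back and forth */

static int read_migratetype(void)
{
        return pageblock_mt;    /* each call is a fresh read of shared state */
}

/* CPU 0's check, as if the compiler re-expanded the helper per comparison. */
static bool is_pinnable_racy(void)
{
        /* CPU 1 has just set the pageblock to MIGRATE_ISOLATE... */
        if (read_migratetype() == MIGRATE_CMA)          /* sees ISOLATE: false */
                return false;
        /* ...CPU 1's undo path now sets it back to MIGRATE_CMA... */
        if (read_migratetype() == MIGRATE_ISOLATE)      /* sees CMA: false */
                return false;
        return true;    /* the CMA page is wrongly treated as pinnable */
}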
On 5/11/22 15:37, Minchan Kim wrote: >> Yes. But one thing that is still unanswered, that I think you can >> answer, is: even if the compiler *did* re-read the mt variable, what >> problems could that cause? I claim "no problems", because there is >> no combination of 0, _CMA, _ISOLATE, _CMA|ISOLATE that will cause >> problems here. > > What scenario I am concerning with __READ_ONCE so compiler > inlining get_pageblock_migratetype two times are > > CPU 0 CPU 1 > alloc_contig_range > is_pinnable_page start_isolate_page_range > set_pageblock_migratetype(MIGRATE_ISOLATE) > if (get_pageeblock_migratetype(page) == MIGRATE_CMA) > so it's false > undo: > set_pageblock_migratetype(MIGRATE_CMA) > > if (get_pageeblock_migratetype(page) == MIGRATE_ISOLATE) > so it's false > > In the end, CMA memory would be pinned by CPU 0 process > so CMA allocation keep failed until the process release the > refcount. > OK, so the code checks the wrong item each time. But the code really only needs to know "is either _CMA or _ISOLATE set?". And so you can just sidestep the entire question by writing it like this: int mt = get_pageblock_migratetype(page); if (mt & (MIGRATE_ISOLATE | MIGRATE_CMA)) return false; ...yes? thanks,
On Wed, May 11, 2022 at 03:49:06PM -0700, John Hubbard wrote: > On 5/11/22 15:37, Minchan Kim wrote: > > > Yes. But one thing that is still unanswered, that I think you can > > > answer, is: even if the compiler *did* re-read the mt variable, what > > > problems could that cause? I claim "no problems", because there is > > > no combination of 0, _CMA, _ISOLATE, _CMA|ISOLATE that will cause > > > problems here. > > > > What scenario I am concerning with __READ_ONCE so compiler > > inlining get_pageblock_migratetype two times are > > > > CPU 0 CPU 1 > > alloc_contig_range > > is_pinnable_page start_isolate_page_range > > set_pageblock_migratetype(MIGRATE_ISOLATE) > > if (get_pageeblock_migratetype(page) == MIGRATE_CMA) > > so it's false > > undo: > > set_pageblock_migratetype(MIGRATE_CMA) > > if (get_pageeblock_migratetype(page) == MIGRATE_ISOLATE) > > so it's false > > > > In the end, CMA memory would be pinned by CPU 0 process > > so CMA allocation keep failed until the process release the > > refcount. > > > > OK, so the code checks the wrong item each time. But the code really > only needs to know "is either _CMA or _ISOLATE set?". And so you Yes. > can just sidestep the entire question by writing it like this: > > int mt = get_pageblock_migratetype(page); > > if (mt & (MIGRATE_ISOLATE | MIGRATE_CMA)) > return false; I am confused. Isn't it same question? set_pageblock_migratetype(MIGRATE_ISOLATE) if (get_pageblock_migrate(page) & MIGRATE_CMA) set_pageblock_migratetype(MIGRATE_CMA) if (get_pageblock_migrate(page) & MIGRATE_ISOLATE)
On 5/11/22 16:08, Minchan Kim wrote: >> OK, so the code checks the wrong item each time. But the code really >> only needs to know "is either _CMA or _ISOLATE set?". And so you > > Yes. > >> can just sidestep the entire question by writing it like this: >> >> int mt = get_pageblock_migratetype(page); >> >> if (mt & (MIGRATE_ISOLATE | MIGRATE_CMA)) >> return false; > > I am confused. Isn't it same question? > > set_pageblock_migratetype(MIGRATE_ISOLATE) > if (get_pageblock_migrate(page) & MIGRATE_CMA) > > set_pageblock_migratetype(MIGRATE_CMA) > > if (get_pageblock_migrate(page) & MIGRATE_ISOLATE) Well no, because the "&" operation is a single operation on the CPU, and isn't going to get split up like that. thanks,
On Wed, May 11, 2022 at 04:13:10PM -0700, John Hubbard wrote: > On 5/11/22 16:08, Minchan Kim wrote: > > > OK, so the code checks the wrong item each time. But the code really > > > only needs to know "is either _CMA or _ISOLATE set?". And so you > > > > Yes. > > > > > can just sidestep the entire question by writing it like this: > > > > > > int mt = get_pageblock_migratetype(page); > > > > > > if (mt & (MIGRATE_ISOLATE | MIGRATE_CMA)) > > > return false; > > > > I am confused. Isn't it same question? > > > > set_pageblock_migratetype(MIGRATE_ISOLATE) > > if (get_pageblock_migrate(page) & MIGRATE_CMA) > > > > set_pageblock_migratetype(MIGRATE_CMA) > > > > if (get_pageblock_migrate(page) & MIGRATE_ISOLATE) > > Well no, because the "&" operation is a single operation on the CPU, and > isn't going to get split up like that. Oh, if that's true, yeah, I could live with it. Thanks, let me post next revision with commenting about that.
On Wed, May 11, 2022 at 04:15:17PM -0700, Minchan Kim wrote: > On Wed, May 11, 2022 at 04:13:10PM -0700, John Hubbard wrote: > > On 5/11/22 16:08, Minchan Kim wrote: > > > > OK, so the code checks the wrong item each time. But the code really > > > > only needs to know "is either _CMA or _ISOLATE set?". And so you > > > > > > Yes. > > > > > > > can just sidestep the entire question by writing it like this: > > > > > > > > int mt = get_pageblock_migratetype(page); > > > > > > > > if (mt & (MIGRATE_ISOLATE | MIGRATE_CMA)) > > > > return false; > > > > > > I am confused. Isn't it same question? > > > > > > set_pageblock_migratetype(MIGRATE_ISOLATE) > > > if (get_pageblock_migrate(page) & MIGRATE_CMA) > > > > > > set_pageblock_migratetype(MIGRATE_CMA) > > > > > > if (get_pageblock_migrate(page) & MIGRATE_ISOLATE) > > > > Well no, because the "&" operation is a single operation on the CPU, and > > isn't going to get split up like that. > > Oh, if that's true, yeah, I could live with it. > > Thanks, let me post next revision with commenting about that. This is delta to confirm before posting next revision. Are you okay with this one? diff --git a/include/linux/mm.h b/include/linux/mm.h index cbf79eb790e0..7b2df6780552 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -1626,14 +1626,14 @@ static inline bool page_needs_cow_for_dma(struct vm_area_struct *vma, static inline bool is_pinnable_page(struct page *page) { #ifdef CONFIG_CMA + int mt = get_pageblock_migratetype(page); + /* - * use volatile to use local variable mt instead of - * refetching mt value. + * "&" operation would prevent compiler split up + * get_pageblock_migratetype two times for each + * condition check: refetching mt value two times. */ - int __mt = get_pageblock_migratetype(page); - int mt = __READ_ONCE(__mt); - - if (mt == MIGRATE_CMA || mt == MIGRATE_ISOLATE) + if (mt & (MIGRATE_ISOLATE | MIGRATE_CMA)) return false; #endif
On 5/11/22 16:28, Minchan Kim wrote: > This is delta to confirm before posting next revision. > > Are you okay with this one? Yes, just maybe delete the comment: > > diff --git a/include/linux/mm.h b/include/linux/mm.h > index cbf79eb790e0..7b2df6780552 100644 > --- a/include/linux/mm.h > +++ b/include/linux/mm.h > @@ -1626,14 +1626,14 @@ static inline bool page_needs_cow_for_dma(struct vm_area_struct *vma, > static inline bool is_pinnable_page(struct page *page) > { > #ifdef CONFIG_CMA > + int mt = get_pageblock_migratetype(page); > + > /* > - * use volatile to use local variable mt instead of > - * refetching mt value. > + * "&" operation would prevent compiler split up > + * get_pageblock_migratetype two times for each > + * condition check: refetching mt value two times. > */ If you wanted a useful comment here, I think a description of what is running at the same time would be good. But otherwise, I'd just delete the entire comment, because a micro-comment about "&" is not really worth the vertical space here IMHO. > - int __mt = get_pageblock_migratetype(page); > - int mt = __READ_ONCE(__mt); > - > - if (mt == MIGRATE_CMA || mt == MIGRATE_ISOLATE) > + if (mt & (MIGRATE_ISOLATE | MIGRATE_CMA)) > return false; > #endif > Anyway, I'm comfortable with this now: Reviewed-by: John Hubbard <jhubbard@nvidia.com> thanks,
On Wed, May 11, 2022 at 04:13:10PM -0700, John Hubbard wrote: > On 5/11/22 16:08, Minchan Kim wrote: > > > OK, so the code checks the wrong item each time. But the code really > > > only needs to know "is either _CMA or _ISOLATE set?". And so you > > > > Yes. > > > > > can just sidestep the entire question by writing it like this: > > > > > > int mt = get_pageblock_migratetype(page); > > > > > > if (mt & (MIGRATE_ISOLATE | MIGRATE_CMA)) > > > return false; > > > > I am confused. Isn't it same question? > > > > set_pageblock_migratetype(MIGRATE_ISOLATE) > > if (get_pageblock_migrate(page) & MIGRATE_CMA) > > > > set_pageblock_migratetype(MIGRATE_CMA) > > > > if (get_pageblock_migrate(page) & MIGRATE_ISOLATE) > > Well no, because the "&" operation is a single operation on the CPU, and > isn't going to get split up like that. Chiming in a bit late... The usual way that this sort of thing causes trouble is if there is a single store instruction that changes the value from MIGRATE_ISOLATE to MIGRATE_CMA, and if the compiler decides to fetch twice, AND twice, and then combine the results. This could give a zero outcome where the underlying variable never had the value zero. Is this sort of thing low probability? Definitely. Isn't this sort of thing prohibited? Definitely not. So what you have will likely work for at least a while longer, but it is not guaranteed and it forces you to think a lot harder about what the current implementations of the compiler can and cannot do to you. The following LWN article goes through some of the possible optimizations (vandalisms?) in this area: https://lwn.net/Articles/793253/ In the end, it is your code, so you get to decide how much you would like to keep track of what compilers get up to over time. ;-) Thanx, Paul
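For concreteness, here is the shape of the transformation Paul describes, using hypothetical single-bit flags rather than the real migratetype encoding (a sketch of what a compiler is permitted to emit, not what any particular compiler does today):

#include <stdbool.h>

#define FLAG_A  0x1     /* hypothetical: plays the role of MIGRATE_ISOLATE */
#define FLAG_B  0x2     /* hypothetical: plays the role of MIGRATE_CMA */

int shared_word;        /* another CPU changes this from FLAG_A to FLAG_B */

/* What the source says: one read, one mask test. */
bool check_as_written(void)
{
        return shared_word & (FLAG_A | FLAG_B);
}

/*
 * What the compiler may legally turn that into: two reads, two masks,
 * results combined.  If the concurrent FLAG_A -> FLAG_B store lands
 * between the two reads (read #1 sees FLAG_A, read #2 sees FLAG_B),
 * this returns false even though shared_word never held a value
 * outside the mask -- the "zero outcome" Paul mentions.
 */
bool check_as_transformed(void)
{
        bool saw_b = shared_word & FLAG_B;      /* read #1: FLAG_A & FLAG_B == 0 */
        bool saw_a = shared_word & FLAG_A;      /* read #2: FLAG_B & FLAG_A == 0 */

        return saw_a || saw_b;
}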
On 5/11/22 16:45, Paul E. McKenney wrote: >> >> Well no, because the "&" operation is a single operation on the CPU, and >> isn't going to get split up like that. > > Chiming in a bit late... Much appreciated! > > The usual way that this sort of thing causes trouble is if there is a > single store instruction that changes the value from MIGRATE_ISOLATE > to MIGRATE_CMA, and if the compiler decides to fetch twice, AND twice, Doing an AND twice for "x & constant" this definitely blows my mind. Is nothing sacred? :) > and then combine the results. This could give a zero outcome where the > underlying variable never had the value zero. > > Is this sort of thing low probability? > > Definitely. > > Isn't this sort of thing prohibited? > > Definitely not. > > So what you have will likely work for at least a while longer, but it > is not guaranteed and it forces you to think a lot harder about what > the current implementations of the compiler can and cannot do to you. > > The following LWN article goes through some of the possible optimizations > (vandalisms?) in this area: https://lwn.net/Articles/793253/ > hmm, I don't think we hit any of those cases, do we? Because here, the "write" side is via a non-inline function that I just don't believe the compiler is allowed to call twice. Or is it? Minchan's earlier summary: CPU 0 CPU1 set_pageblock_migratetype(MIGRATE_ISOLATE) if (get_pageblock_migrate(page) & MIGRATE_CMA) set_pageblock_migratetype(MIGRATE_CMA) if (get_pageblock_migrate(page) & MIGRATE_ISOLATE) ...where set_pageblock_migratetype() is not inline. thanks,
On Wed, May 11, 2022 at 04:57:04PM -0700, John Hubbard wrote: > On 5/11/22 16:45, Paul E. McKenney wrote: > > > > > > Well no, because the "&" operation is a single operation on the CPU, and > > > isn't going to get split up like that. > > > > Chiming in a bit late... > > Much appreciated! > > > The usual way that this sort of thing causes trouble is if there is a > > single store instruction that changes the value from MIGRATE_ISOLATE > > to MIGRATE_CMA, and if the compiler decides to fetch twice, AND twice, > > Doing an AND twice for "x & constant" this definitely blows my mind. Is > nothing sacred? :) Apparently there is not much sacred to compiler writers in search of additional optimizations. :-/ > > and then combine the results. This could give a zero outcome where the > > underlying variable never had the value zero. > > > > Is this sort of thing low probability? > > > > Definitely. > > > > Isn't this sort of thing prohibited? > > > > Definitely not. > > > > So what you have will likely work for at least a while longer, but it > > is not guaranteed and it forces you to think a lot harder about what > > the current implementations of the compiler can and cannot do to you. > > > > The following LWN article goes through some of the possible optimizations > > (vandalisms?) in this area: https://lwn.net/Articles/793253/ > > hmm, I don't think we hit any of those cases, do we? Because here, the > "write" side is via a non-inline function that I just don't believe the > compiler is allowed to call twice. Or is it? Not yet. But if link-time optimizations (LTO) continue their march, I wouldn't feel safe ruling it out... > Minchan's earlier summary: > > CPU 0 CPU1 > > > set_pageblock_migratetype(MIGRATE_ISOLATE) > > if (get_pageblock_migrate(page) & MIGRATE_CMA) > > set_pageblock_migratetype(MIGRATE_CMA) > > if (get_pageblock_migrate(page) & MIGRATE_ISOLATE) > > ...where set_pageblock_migratetype() is not inline. ...especially if the code is reorganized for whatever reason. > thanks, > -- > John Hubbard > NVIDIA But again: > > In the end, it is your code, so you get to decide how much you would > > like to keep track of what compilers get up to over time. ;-) Thanx, Paul
On 5/11/22 16:57, John Hubbard wrote: > On 5/11/22 16:45, Paul E. McKenney wrote: >>> >>> Well no, because the "&" operation is a single operation on the CPU, and >>> isn't going to get split up like that. >> >> Chiming in a bit late... > > Much appreciated! > >> >> The usual way that this sort of thing causes trouble is if there is a >> single store instruction that changes the value from MIGRATE_ISOLATE >> to MIGRATE_CMA, and if the compiler decides to fetch twice, AND twice, > > Doing an AND twice for "x & constant" this definitely blows my mind. Is > nothing sacred? :) > >> and then combine the results. This could give a zero outcome where the >> underlying variable never had the value zero. >> >> Is this sort of thing low probability? >> >> Definitely. >> >> Isn't this sort of thing prohibited? >> >> Definitely not. >> >> So what you have will likely work for at least a while longer, but it >> is not guaranteed and it forces you to think a lot harder about what >> the current implementations of the compiler can and cannot do to you. >> >> The following LWN article goes through some of the possible optimizations >> (vandalisms?) in this area: https://lwn.net/Articles/793253/ >> > > hmm, I don't think we hit any of those cases, do we? Because here, the > "write" side is via a non-inline function that I just don't believe the > compiler is allowed to call twice. Or is it? > > Minchan's earlier summary: > > CPU 0                        CPU1 > > >                              set_pageblock_migratetype(MIGRATE_ISOLATE) > > if (get_pageblock_migrate(page) & MIGRATE_CMA) > >                              set_pageblock_migratetype(MIGRATE_CMA) > > if (get_pageblock_migrate(page) & MIGRATE_ISOLATE) > > ...where set_pageblock_migratetype() is not inline. > > thanks, Let me try to say this more clearly: I don't think that the following __READ_ONCE() statement can actually help anything, given that get_pageblock_migratetype() is non-inlined: + int __mt = get_pageblock_migratetype(page); + int mt = __READ_ONCE(__mt); + + if (mt & (MIGRATE_CMA | MIGRATE_ISOLATE)) + return false; Am I missing anything here? thanks,
On Wed, May 11, 2022 at 05:12:32PM -0700, John Hubbard wrote: > On 5/11/22 16:57, John Hubbard wrote: > > On 5/11/22 16:45, Paul E. McKenney wrote: > > > > > > > > Well no, because the "&" operation is a single operation on the CPU, and > > > > isn't going to get split up like that. > > > > > > Chiming in a bit late... > > > > Much appreciated! > > > > > > > > The usual way that this sort of thing causes trouble is if there is a > > > single store instruction that changes the value from MIGRATE_ISOLATE > > > to MIGRATE_CMA, and if the compiler decides to fetch twice, AND twice, > > > > Doing an AND twice for "x & constant" this definitely blows my mind. Is > > nothing sacred? :) > > > > > and then combine the results. This could give a zero outcome where the > > > underlying variable never had the value zero. > > > > > > Is this sort of thing low probability? > > > > > > Definitely. > > > > > > Isn't this sort of thing prohibited? > > > > > > Definitely not. > > > > > > So what you have will likely work for at least a while longer, but it > > > is not guaranteed and it forces you to think a lot harder about what > > > the current implementations of the compiler can and cannot do to you. > > > > > > The following LWN article goes through some of the possible optimizations > > > (vandalisms?) in this area: https://lwn.net/Articles/793253/ > > > > > > > hmm, I don't think we hit any of those cases, do we? Because here, the > > "write" side is via a non-inline function that I just don't believe the > > compiler is allowed to call twice. Or is it? > > > > Minchan's earlier summary: > > > > CPU 0                        CPU1 > > > > > >                              set_pageblock_migratetype(MIGRATE_ISOLATE) > > > > if (get_pageblock_migrate(page) & MIGRATE_CMA) > > > >                              set_pageblock_migratetype(MIGRATE_CMA) > > > > if (get_pageblock_migrate(page) & MIGRATE_ISOLATE) > > > > ...where set_pageblock_migratetype() is not inline. > > > > thanks, > > Let me try to say this more clearly: I don't think that the following > __READ_ONCE() statement can actually help anything, given that > get_pageblock_migratetype() is non-inlined: > > + int __mt = get_pageblock_migratetype(page); > + int mt = __READ_ONCE(__mt); > + > + if (mt & (MIGRATE_CMA | MIGRATE_ISOLATE)) > + return false; > > > Am I missing anything here? In the absence of future aggression from link-time optimizations (LTO), you are missing nothing. Thanx, Paul
On Wed, May 11, 2022 at 05:22:07PM -0700, Paul E. McKenney wrote: > On Wed, May 11, 2022 at 05:12:32PM -0700, John Hubbard wrote: > > On 5/11/22 16:57, John Hubbard wrote: > > > On 5/11/22 16:45, Paul E. McKenney wrote: > > > > > > > > > > Well no, because the "&" operation is a single operation on the CPU, and > > > > > isn't going to get split up like that. > > > > > > > > Chiming in a bit late... > > > > > > Much appreciated! > > > > > > > > > > > The usual way that this sort of thing causes trouble is if there is a > > > > single store instruction that changes the value from MIGRATE_ISOLATE > > > > to MIGRATE_CMA, and if the compiler decides to fetch twice, AND twice, > > > > > > Doing an AND twice for "x & constant" this definitely blows my mind. Is > > > nothing sacred? :) > > > > > > > and then combine the results. This could give a zero outcome where the > > > > underlying variable never had the value zero. > > > > > > > > Is this sort of thing low probability? > > > > > > > > Definitely. > > > > > > > > Isn't this sort of thing prohibited? > > > > > > > > Definitely not. > > > > > > > > So what you have will likely work for at least a while longer, but it > > > > is not guaranteed and it forces you to think a lot harder about what > > > > the current implementations of the compiler can and cannot do to you. > > > > > > > > The following LWN article goes through some of the possible optimizations > > > > (vandalisms?) in this area: https://lwn.net/Articles/793253/ > > > > > > > > > > hmm, I don't think we hit any of those cases, do we? Because here, the > > > "write" side is via a non-inline function that I just don't believe the > > > compiler is allowed to call twice. Or is it? > > > > > > Minchan's earlier summary: > > > > > > CPU 0                        CPU1 > > > > > > > > >                              set_pageblock_migratetype(MIGRATE_ISOLATE) > > > > > > if (get_pageblock_migrate(page) & MIGRATE_CMA) > > > > > >                              set_pageblock_migratetype(MIGRATE_CMA) > > > > > > if (get_pageblock_migrate(page) & MIGRATE_ISOLATE) > > > > > > ...where set_pageblock_migratetype() is not inline. > > > > > > thanks, > > > > Let me try to say this more clearly: I don't think that the following > > __READ_ONCE() statement can actually help anything, given that > > get_pageblock_migratetype() is non-inlined: > > > > + int __mt = get_pageblock_migratetype(page); > > + int mt = __READ_ONCE(__mt); > > + > > + if (mt & (MIGRATE_CMA | MIGRATE_ISOLATE)) > > + return false; > > > > > > Am I missing anything here? > > In the absence of future aggression from link-time optimizations (LTO), > you are missing nothing. A thing I want to note is Android kernel uses LTO full mode.
On 5/11/22 17:26, Minchan Kim wrote: >>> Let me try to say this more clearly: I don't think that the following >>> __READ_ONCE() statement can actually help anything, given that >>> get_pageblock_migratetype() is non-inlined: >>> >>> + int __mt = get_pageblock_migratetype(page); >>> + int mt = __READ_ONCE(__mt); >>> + >>> + if (mt & (MIGRATE_CMA | MIGRATE_ISOLATE)) >>> + return false; >>> >>> >>> Am I missing anything here? >> >> In the absence of future aggression from link-time optimizations (LTO), >> you are missing nothing. > > A thing I want to note is Android kernel uses LTO full mode. Thanks Paul for explaining the state of things. Minchan, how about something like very close to your original draft, then, but with a little note, and the "&" as well: int __mt = get_pageblock_migratetype(page); /* * Defend against future compiler LTO features, or code refactoring * that inlines the above function, by forcing a single read. Because, this * routine races with set_pageblock_migratetype(), and we want to avoid * reading zero, when actually one or the other flags was set. */ int mt = __READ_ONCE(__mt); if (mt & (MIGRATE_CMA | MIGRATE_ISOLATE)) return false; ...which should make everyone comfortable and protected from the future sins of the compiler and linker teams? :) thanks,
On Wed, May 11, 2022 at 05:26:55PM -0700, Minchan Kim wrote: > On Wed, May 11, 2022 at 05:22:07PM -0700, Paul E. McKenney wrote: > > On Wed, May 11, 2022 at 05:12:32PM -0700, John Hubbard wrote: > > > On 5/11/22 16:57, John Hubbard wrote: > > > > On 5/11/22 16:45, Paul E. McKenney wrote: > > > > > > > > > > > > Well no, because the "&" operation is a single operation on the CPU, and > > > > > > isn't going to get split up like that. > > > > > > > > > > Chiming in a bit late... > > > > > > > > Much appreciated! > > > > > > > > > > > > > > The usual way that this sort of thing causes trouble is if there is a > > > > > single store instruction that changes the value from MIGRATE_ISOLATE > > > > > to MIGRATE_CMA, and if the compiler decides to fetch twice, AND twice, > > > > > > > > Doing an AND twice for "x & constant" this definitely blows my mind. Is > > > > nothing sacred? :) > > > > > > > > > and then combine the results. This could give a zero outcome where the > > > > > underlying variable never had the value zero. > > > > > > > > > > Is this sort of thing low probability? > > > > > > > > > > Definitely. > > > > > > > > > > Isn't this sort of thing prohibited? > > > > > > > > > > Definitely not. > > > > > > > > > > So what you have will likely work for at least a while longer, but it > > > > > is not guaranteed and it forces you to think a lot harder about what > > > > > the current implementations of the compiler can and cannot do to you. > > > > > > > > > > The following LWN article goes through some of the possible optimizations > > > > > (vandalisms?) in this area: https://lwn.net/Articles/793253/ > > > > > > > > > > > > > hmm, I don't think we hit any of those cases, do we? Because here, the > > > > "write" side is via a non-inline function that I just don't believe the > > > > compiler is allowed to call twice. Or is it? > > > > > > > > Minchan's earlier summary: > > > > > > > > CPU 0                        CPU1 > > > > > > > > > > > >                              set_pageblock_migratetype(MIGRATE_ISOLATE) > > > > > > > > if (get_pageblock_migrate(page) & MIGRATE_CMA) > > > > > > > >                              set_pageblock_migratetype(MIGRATE_CMA) > > > > > > > > if (get_pageblock_migrate(page) & MIGRATE_ISOLATE) > > > > > > > > ...where set_pageblock_migratetype() is not inline. > > > > > > > > thanks, > > > > > > Let me try to say this more clearly: I don't think that the following > > > __READ_ONCE() statement can actually help anything, given that > > > get_pageblock_migratetype() is non-inlined: > > > > > > + int __mt = get_pageblock_migratetype(page); > > > + int mt = __READ_ONCE(__mt); > > > + > > > + if (mt & (MIGRATE_CMA | MIGRATE_ISOLATE)) > > > + return false; > > > > > > > > > Am I missing anything here? > > > > In the absence of future aggression from link-time optimizations (LTO), > > you are missing nothing. > > A thing I want to note is Android kernel uses LTO full mode. I doubt that current LTO can do this sort of optimized inlining, at least, not yet. Thanx, Paul
On Wed, May 11, 2022 at 05:34:52PM -0700, John Hubbard wrote: > On 5/11/22 17:26, Minchan Kim wrote: > > > > Let me try to say this more clearly: I don't think that the following > > > > __READ_ONCE() statement can actually help anything, given that > > > > get_pageblock_migratetype() is non-inlined: > > > > > > > > + int __mt = get_pageblock_migratetype(page); > > > > + int mt = __READ_ONCE(__mt); > > > > + > > > > + if (mt & (MIGRATE_CMA | MIGRATE_ISOLATE)) > > > > + return false; > > > > > > > > > > > > Am I missing anything here? > > > > > > In the absence of future aggression from link-time optimizations (LTO), > > > you are missing nothing. > > > > A thing I want to note is Android kernel uses LTO full mode. > > Thanks Paul for explaining the state of things. > > Minchan, how about something like very close to your original draft, > then, but with a little note, and the "&" as well: > > int __mt = get_pageblock_migratetype(page); > > /* > * Defend against future compiler LTO features, or code refactoring > * that inlines the above function, by forcing a single read. Because, this > * routine races with set_pageblock_migratetype(), and we want to avoid > * reading zero, when actually one or the other flags was set. > */ > int mt = __READ_ONCE(__mt); > > if (mt & (MIGRATE_CMA | MIGRATE_ISOLATE)) > return false; > > > ...which should make everyone comfortable and protected from the > future sins of the compiler and linker teams? :) This would work, but it would force a store to the stack and an immediate reload. Which might be OK on this code path. But using READ_ONCE() in (I think?) __get_pfnblock_flags_mask() would likely generate the same code that is produced today. word = READ_ONCE(bitmap[word_bitidx]); But I could easily have missed a turn in that cascade of functions. ;-) Or there might be some code path that really hates a READ_ONCE() in that place. Thanx, Paul
On 5/11/22 17:49, Paul E. McKenney wrote: >> Thanks Paul for explaining the state of things. >> >> Minchan, how about something like very close to your original draft, >> then, but with a little note, and the "&" as well: >> >> int __mt = get_pageblock_migratetype(page); >> >> /* >> * Defend against future compiler LTO features, or code refactoring >> * that inlines the above function, by forcing a single read. Because, this >> * routine races with set_pageblock_migratetype(), and we want to avoid >> * reading zero, when actually one or the other flags was set. >> */ >> int mt = __READ_ONCE(__mt); >> >> if (mt & (MIGRATE_CMA | MIGRATE_ISOLATE)) >> return false; >> >> >> ...which should make everyone comfortable and protected from the >> future sins of the compiler and linker teams? :) > > This would work, but it would force a store to the stack and an immediate > reload. Which might be OK on this code path. > > But using READ_ONCE() in (I think?) __get_pfnblock_flags_mask() > would likely generate the same code that is produced today. > > word = READ_ONCE(bitmap[word_bitidx]); Ah right, I like that much, much better. The READ_ONCE is placed where it actually clearly matters, rather than several layers removed. > > But I could easily have missed a turn in that cascade of functions. ;-) > > Or there might be some code path that really hates a READ_ONCE() in > that place. I certainly hope not. I see free_one_page(), among other things, calls this. But having the correct READ_ONCE() in a central place seems worth it, unless we know that this will cause a measurable slowdown somewhere. thanks,
On Wed, May 11, 2022 at 05:49:49PM -0700, Paul E. McKenney wrote: > On Wed, May 11, 2022 at 05:34:52PM -0700, John Hubbard wrote: > > On 5/11/22 17:26, Minchan Kim wrote: > > > > > Let me try to say this more clearly: I don't think that the following > > > > > __READ_ONCE() statement can actually help anything, given that > > > > > get_pageblock_migratetype() is non-inlined: > > > > > > > > > > + int __mt = get_pageblock_migratetype(page); > > > > > + int mt = __READ_ONCE(__mt); > > > > > + > > > > > + if (mt & (MIGRATE_CMA | MIGRATE_ISOLATE)) > > > > > + return false; > > > > > > > > > > > > > > > Am I missing anything here? > > > > > > > > In the absence of future aggression from link-time optimizations (LTO), > > > > you are missing nothing. > > > > > > A thing I want to note is Android kernel uses LTO full mode. > > > > Thanks Paul for explaining the state of things. > > > > Minchan, how about something like very close to your original draft, > > then, but with a little note, and the "&" as well: > > > > int __mt = get_pageblock_migratetype(page); > > > > /* > > * Defend against future compiler LTO features, or code refactoring > > * that inlines the above function, by forcing a single read. Because, this > > * routine races with set_pageblock_migratetype(), and we want to avoid > > * reading zero, when actually one or the other flags was set. > > */ > > int mt = __READ_ONCE(__mt); > > > > if (mt & (MIGRATE_CMA | MIGRATE_ISOLATE)) > > return false; > > > > > > ...which should make everyone comfortable and protected from the > > future sins of the compiler and linker teams? :) > > This would work, but it would force a store to the stack and an immediate > reload. Which might be OK on this code path. > > But using READ_ONCE() in (I think?) __get_pfnblock_flags_mask() > would likely generate the same code that is produced today. > > word = READ_ONCE(bitmap[word_bitidx]); > > But I could easily have missed a turn in that cascade of functions. ;-) > > Or there might be some code path that really hates a READ_ONCE() in > that place. My worry about chaning __get_pfnblock_flags_mask is it's called multiple hot places in mm codes so I didn't want to add overhead to them.
On Wed, May 11, 2022 at 05:34:52PM -0700, John Hubbard wrote: > On 5/11/22 17:26, Minchan Kim wrote: > > > > Let me try to say this more clearly: I don't think that the following > > > > __READ_ONCE() statement can actually help anything, given that > > > > get_pageblock_migratetype() is non-inlined: > > > > > > > > + int __mt = get_pageblock_migratetype(page); > > > > + int mt = __READ_ONCE(__mt); > > > > + > > > > + if (mt & (MIGRATE_CMA | MIGRATE_ISOLATE)) > > > > + return false; > > > > > > > > > > > > Am I missing anything here? > > > > > > In the absence of future aggression from link-time optimizations (LTO), > > > you are missing nothing. > > > > A thing I want to note is Android kernel uses LTO full mode. > > Thanks Paul for explaining the state of things. > > Minchan, how about something like very close to your original draft, > then, but with a little note, and the "&" as well: > > int __mt = get_pageblock_migratetype(page); > > /* > * Defend against future compiler LTO features, or code refactoring > * that inlines the above function, by forcing a single read. Because, this > * routine races with set_pageblock_migratetype(), and we want to avoid > * reading zero, when actually one or the other flags was set. > */ > int mt = __READ_ONCE(__mt); > > if (mt & (MIGRATE_CMA | MIGRATE_ISOLATE)) > return false; > > > ...which should make everyone comfortable and protected from the > future sins of the compiler and linker teams? :) Will take. Thanks.
On 5/11/22 18:03, Minchan Kim wrote: >> >> Or there might be some code path that really hates a READ_ONCE() in >> that place. > > My worry about chaning __get_pfnblock_flags_mask is it's called > multiple hot places in mm codes so I didn't want to add overhead > to them. ...unless it really does generate the same code as is already there, right? Let me check that real quick. thanks,
On 5/11/22 18:08, John Hubbard wrote: > On 5/11/22 18:03, Minchan Kim wrote: >>> >>> Or there might be some code path that really hates a READ_ONCE() in >>> that place. >> >> My worry about chaning __get_pfnblock_flags_mask is it's called >> multiple hot places in mm codes so I didn't want to add overhead >> to them. > > ...unless it really does generate the same code as is already there, > right? Let me check that real quick. > It does change the generated code slightly. I don't know if this will affect performance here or not. But just for completeness, here you go: free_one_page() originally has this (just showing the changed parts): mov 0x8(%rdx,%rax,8),%rbx and $0x3f,%ecx shr %cl,%rbx and $0x7, And after applying this diff: diff --git a/mm/page_alloc.c b/mm/page_alloc.c index 0e42038382c1..df1f8e9a294f 100644 --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -482,7 +482,7 @@ unsigned long __get_pfnblock_flags_mask(const struct page *page, word_bitidx = bitidx / BITS_PER_LONG; bitidx &= (BITS_PER_LONG-1); - word = bitmap[word_bitidx]; + word = READ_ONCE(bitmap[word_bitidx]); return (word >> bitidx) & mask; } ...it now does an extra memory dereference: lea 0x8(%rdx,%rax,8),%rax and $0x3f,%ecx mov (%rax),%rbx shr %cl,%rbx and $0x7,%ebx thanks,
On Wed, May 11, 2022 at 07:18:56PM -0700, John Hubbard wrote: > On 5/11/22 18:08, John Hubbard wrote: > > On 5/11/22 18:03, Minchan Kim wrote: > > > > > > > > Or there might be some code path that really hates a READ_ONCE() in > > > > that place. > > > > > > My worry about chaning __get_pfnblock_flags_mask is it's called > > > multiple hot places in mm codes so I didn't want to add overhead > > > to them. > > > > ...unless it really does generate the same code as is already there, > > right? Let me check that real quick. > > > > It does change the generated code slightly. I don't know if this will > affect performance here or not. But just for completeness, here you go: > > free_one_page() originally has this (just showing the changed parts): > > mov 0x8(%rdx,%rax,8),%rbx > and $0x3f,%ecx > shr %cl,%rbx > and $0x7, > > > And after applying this diff: > > diff --git a/mm/page_alloc.c b/mm/page_alloc.c > index 0e42038382c1..df1f8e9a294f 100644 > --- a/mm/page_alloc.c > +++ b/mm/page_alloc.c > @@ -482,7 +482,7 @@ unsigned long __get_pfnblock_flags_mask(const struct > page *page, > word_bitidx = bitidx / BITS_PER_LONG; > bitidx &= (BITS_PER_LONG-1); > > - word = bitmap[word_bitidx]; > + word = READ_ONCE(bitmap[word_bitidx]); > return (word >> bitidx) & mask; > } > > > ...it now does an extra memory dereference: > > lea 0x8(%rdx,%rax,8),%rax > and $0x3f,%ecx > mov (%rax),%rbx > shr %cl,%rbx > and $0x7,%ebx Thanks for checking, John. I don't want to have the READ_ONCE in __get_pfnblock_flags_mask at the moment, even though it's only an extra memory dereference for a specific architecture and a specific compiler, unless other callsites *do* need it. We made that choice (i.e., having __READ_ONCE in is_pinnable_page) for *potential* cases (e.g., aggressive LTO in the future), and what is an extra memory dereference now could *potentially* become multiple instructions with further changes or a different compiler and/or arch, now or later. It's out of scope for this patch.
On Wed, May 11, 2022 at 07:18:56PM -0700, John Hubbard wrote: > On 5/11/22 18:08, John Hubbard wrote: > > On 5/11/22 18:03, Minchan Kim wrote: > > > > > > > > Or there might be some code path that really hates a READ_ONCE() in > > > > that place. > > > > > > My worry about chaning __get_pfnblock_flags_mask is it's called > > > multiple hot places in mm codes so I didn't want to add overhead > > > to them. > > > > ...unless it really does generate the same code as is already there, > > right? Let me check that real quick. > > > > It does change the generated code slightly. I don't know if this will > affect performance here or not. But just for completeness, here you go: > > free_one_page() originally has this (just showing the changed parts): > > mov 0x8(%rdx,%rax,8),%rbx > and $0x3f,%ecx > shr %cl,%rbx > and $0x7, > > > And after applying this diff: > > diff --git a/mm/page_alloc.c b/mm/page_alloc.c > index 0e42038382c1..df1f8e9a294f 100644 > --- a/mm/page_alloc.c > +++ b/mm/page_alloc.c > @@ -482,7 +482,7 @@ unsigned long __get_pfnblock_flags_mask(const struct > page *page, > word_bitidx = bitidx / BITS_PER_LONG; > bitidx &= (BITS_PER_LONG-1); > > - word = bitmap[word_bitidx]; > + word = READ_ONCE(bitmap[word_bitidx]); > return (word >> bitidx) & mask; > } > > > ...it now does an extra memory dereference: > > lea 0x8(%rdx,%rax,8),%rax > and $0x3f,%ecx > mov (%rax),%rbx > shr %cl,%rbx > and $0x7,%ebx That could indeed be a bad thing on a fastpath. Thanx, Paul
On 5/11/22 20:44, Minchan Kim wrote: > Thanks for checking, John. > > I don't want to have the READ_ONCE in __get_pfnblock_flags_mask > atm even though it's an extra memory dereference for specific > architecutre and specific compiler unless other callsites *do* > need it. > > We choose the choice(i.e., having __READ_ONCE in is_pinanble_ > page) for *potential* cases(e.g., aggressive LTO for future) > and if it's an extra memory dereference atm, it would be multiple > instructions *potentailly* as having more change or different > compiler and/or arches now or later. It's out of scope in > this patch. Agreed! thanks,
On Wed, May 11, 2022 at 08:44:43PM -0700, Minchan Kim wrote: > On Wed, May 11, 2022 at 07:18:56PM -0700, John Hubbard wrote: > > On 5/11/22 18:08, John Hubbard wrote: > > > On 5/11/22 18:03, Minchan Kim wrote: > > > > > > > > > > Or there might be some code path that really hates a READ_ONCE() in > > > > > that place. > > > > > > > > My worry about chaning __get_pfnblock_flags_mask is it's called > > > > multiple hot places in mm codes so I didn't want to add overhead > > > > to them. > > > > > > ...unless it really does generate the same code as is already there, > > > right? Let me check that real quick. > > > > > > > It does change the generated code slightly. I don't know if this will > > affect performance here or not. But just for completeness, here you go: > > > > free_one_page() originally has this (just showing the changed parts): > > > > mov 0x8(%rdx,%rax,8),%rbx > > and $0x3f,%ecx > > shr %cl,%rbx > > and $0x7, > > > > > > And after applying this diff: > > > > diff --git a/mm/page_alloc.c b/mm/page_alloc.c > > index 0e42038382c1..df1f8e9a294f 100644 > > +++ b/mm/page_alloc.c > > @@ -482,7 +482,7 @@ unsigned long __get_pfnblock_flags_mask(const struct > > page *page, > > word_bitidx = bitidx / BITS_PER_LONG; > > bitidx &= (BITS_PER_LONG-1); > > > > - word = bitmap[word_bitidx]; > > + word = READ_ONCE(bitmap[word_bitidx]); > > return (word >> bitidx) & mask; > > } > > > > > > ...it now does an extra memory dereference: > > > > lea 0x8(%rdx,%rax,8),%rax > > and $0x3f,%ecx > > mov (%rax),%rbx > > shr %cl,%rbx > > and $0x7,%ebx Where is the extra memory reference? 'lea' is not a memory reference, it is just some maths? > Thanks for checking, John. > > I don't want to have the READ_ONCE in __get_pfnblock_flags_mask > atm even though it's an extra memory dereference for specific > architecutre and specific compiler unless other callsites *do* > need it. If a callpath is called under locking or not under locking then I would expect to have two call chains clearly marked what their locking conditions are. ie __get_pfn_block_flags_mask_unlocked() - and obviously clearly document and check what the locking requirements are of the locked path. IMHO putting a READ_ONCE on something that is not a memory load from shared data is nonsense - if a simple == has a stability risk then so does the '(word >> bitidx) & mask'. Jason
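For readers following along, a minimal sketch of the locked/unlocked split Jason describes might look something like the code below. This is only an illustration layered on top of the existing __get_pfnblock_flags_mask(): the function names, the lockdep assertion, and the helper calls are assumptions written from memory of mm/page_alloc.c, not code from this series.

static unsigned long __get_pfnblock_flags_mask_locked(const struct page *page,
						      unsigned long pfn,
						      unsigned long mask)
{
	unsigned long *bitmap = get_pageblock_bitmap(page, pfn);
	unsigned long bitidx = pfn_to_bitidx(page, pfn);
	unsigned long word_bitidx = bitidx / BITS_PER_LONG;

	/* Assumed contract: callers on this path hold zone->lock. */
	lockdep_assert_held(&page_zone(page)->lock);
	bitidx &= (BITS_PER_LONG - 1);
	return (bitmap[word_bitidx] >> bitidx) & mask;
}

static unsigned long __get_pfnblock_flags_mask_unlocked(const struct page *page,
							unsigned long pfn,
							unsigned long mask)
{
	unsigned long *bitmap = get_pageblock_bitmap(page, pfn);
	unsigned long bitidx = pfn_to_bitidx(page, pfn);
	unsigned long word_bitidx = bitidx / BITS_PER_LONG;
	unsigned long word;

	bitidx &= (BITS_PER_LONG - 1);
	/* Lockless caller (e.g. GUP): sample the shared word exactly once. */
	word = READ_ONCE(bitmap[word_bitidx]);
	return (word >> bitidx) & mask;
}

Splitting the names this way would let the hot, zone->lock-protected callers keep the plain load while documenting exactly which caller relies on the lockless read.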
On 5/17/22 07:00, Jason Gunthorpe wrote: >>> It does change the generated code slightly. I don't know if this will >>> affect performance here or not. But just for completeness, here you go: >>> >>> free_one_page() originally has this (just showing the changed parts): >>> >>> mov 0x8(%rdx,%rax,8),%rbx >>> and $0x3f,%ecx >>> shr %cl,%rbx >>> and $0x7, >>> >>> >>> And after applying this diff: >>> >>> diff --git a/mm/page_alloc.c b/mm/page_alloc.c >>> index 0e42038382c1..df1f8e9a294f 100644 >>> +++ b/mm/page_alloc.c >>> @@ -482,7 +482,7 @@ unsigned long __get_pfnblock_flags_mask(const struct >>> page *page, >>> word_bitidx = bitidx / BITS_PER_LONG; >>> bitidx &= (BITS_PER_LONG-1); >>> >>> - word = bitmap[word_bitidx]; >>> + word = READ_ONCE(bitmap[word_bitidx]); >>> return (word >> bitidx) & mask; >>> } >>> >>> >>> ...it now does an extra memory dereference: >>> >>> lea 0x8(%rdx,%rax,8),%rax >>> and $0x3f,%ecx >>> mov (%rax),%rbx >>> shr %cl,%rbx >>> and $0x7,%ebx > > Where is the extra memory reference? 'lea' is not a memory reference, > it is just some maths? If you compare this to the snippet above, you'll see that there is an extra mov statement, and that one dereferences a pointer from %rax: mov (%rax),%rbx > >> Thanks for checking, John. >> >> I don't want to have the READ_ONCE in __get_pfnblock_flags_mask >> atm even though it's an extra memory dereference for specific >> architecutre and specific compiler unless other callsites *do* >> need it. > > If a callpath is called under locking or not under locking then I > would expect to have two call chains clearly marked what their locking > conditions are. ie __get_pfn_block_flags_mask_unlocked() - and __get_pfn_block_flags_mask_unlocked() would definitely clarify things, and allow some clear documentation, good idea. I haven't checked to see if some code could keep using the normal __get_pfn_block_flags_mask(), but if it could, that would help with the problem of keeping the fast path fast. > obviously clearly document and check what the locking requirements are > of the locked path. > > IMHO putting a READ_ONCE on something that is not a memory load from > shared data is nonsense - if a simple == has a stability risk then so > does the '(word >> bitidx) & mask'. > > Jason Doing something like this: int __x = y(); int x = READ_ONCE(__x); is just awful! I agree. Really, y() should handle any barriers, because otherwise it really does look pointless, and people reading the code need something that is clearer. My first reaction was that this was pointless and wrong, and it turns out that that's only about 80% true: as long as LTO-of-the-future doesn't arrive, and as long as no one refactors y() to be inline. thanks,
On Tue, May 17, 2022 at 11:12:37AM -0700, John Hubbard wrote: > On 5/17/22 07:00, Jason Gunthorpe wrote: > > > > It does change the generated code slightly. I don't know if this will > > > > affect performance here or not. But just for completeness, here you go: > > > > > > > > free_one_page() originally has this (just showing the changed parts): > > > > > > > > mov 0x8(%rdx,%rax,8),%rbx > > > > and $0x3f,%ecx > > > > shr %cl,%rbx > > > > and $0x7, > > > > > > > > > > > > And after applying this diff: > > > > > > > > diff --git a/mm/page_alloc.c b/mm/page_alloc.c > > > > index 0e42038382c1..df1f8e9a294f 100644 > > > > +++ b/mm/page_alloc.c > > > > @@ -482,7 +482,7 @@ unsigned long __get_pfnblock_flags_mask(const struct > > > > page *page, > > > > word_bitidx = bitidx / BITS_PER_LONG; > > > > bitidx &= (BITS_PER_LONG-1); > > > > > > > > - word = bitmap[word_bitidx]; > > > > + word = READ_ONCE(bitmap[word_bitidx]); > > > > return (word >> bitidx) & mask; > > > > } > > > > > > > > > > > > ...it now does an extra memory dereference: > > > > > > > > lea 0x8(%rdx,%rax,8),%rax > > > > and $0x3f,%ecx > > > > mov (%rax),%rbx > > > > shr %cl,%rbx > > > > and $0x7,%ebx > > > > Where is the extra memory reference? 'lea' is not a memory reference, > > it is just some maths? > > If you compare this to the snippet above, you'll see that there is > an extra mov statement, and that one dereferences a pointer from > %rax: > > mov (%rax),%rbx That is the same move as: mov 0x8(%rdx,%rax,8),%rbx Except that the EA calculation was done in advance and stored in rax. lea isn't a memory reference, it is just computing the pointer value that 0x8(%rdx,%rax,8) represents. ie the lea computes %rax = %rdx + %rax*8 + 8 Which is then fed into the mov. Maybe it is an optimization to allow one pipe to do the shr and another to do the EA - IDK, seems like a random thing for the compiler to do. > > IMHO putting a READ_ONCE on something that is not a memory load from > > shared data is nonsense - if a simple == has a stability risk then so > > does the '(word >> bitidx) & mask'. > > Doing something like this: > > int __x = y(); > int x = READ_ONCE(__x); > > is just awful! I agree. Really, y() should handle any barriers, because > otherwise it really does look pointless, and people reading the code > need something that is clearer. My first reaction was that this was > pointless and wrong, and it turns out that that's only about 80% true: > as long as LTO-of-the-future doesn't arrive, and as long as no one > refactors y() to be inline. It is always pointless, because look at the whole flow: y(): word = bitmap[word_bitidx]; return (word >> bitidx) & mask; int __x = y() int x = READ_ONCE(__x) The requirement here is for a coherent read of bitmap[word_bitidx] that is not mangled by multi-load, load tearing or other theoretical compiler problems. Stated more clearly, if bitmap[] has values {A,B,C} then the result of all this in x should always be {A',B',C'} according to the given maths, never, ever an impossible value D. The nature of the compiler problems we are trying to prevent is that the compiler may take a computation that seems deterministic and corrupt it somehow by making an assumption that the memory is stable when it is not. For instance == may not work right if the compiler decides to do: if (READ_ONCE(a) == 1 && READ_ONCE(a) == 1) This also means that >> and & could malfunction.
So when LTO expands and inlines everything and you get this: if (READ_ONCE((bitmap[word_bitidx] >> bitidx) & mask) == MIGRATE_CMA) It is still not safe because the >> and & could have multi-loaded or torn the read in some way that it is no longer a sane calculation. (for instance imagine that the compiler decides to read the value byte-at-a-time without READ_ONCE) ie the memory may have had A racing into C, but x was computed into D, not A'. Paul can correct me, but I understand we do not have a list of allowed operations that are exempted from the READ_ONCE() requirement. ie it is not just conditional branching that requires READ_ONCE(). This is why READ_ONCE() must always be on the memory load, because the point is to sanitize away the uncertainty that comes with an unlocked read of unstable memory contents. READ_ONCE() samples the value in memory, and removes all tearing, multiload, etc "instability" that may affect downstream computations. In this way downstream computations become reliable. Jason
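To make the placement point concrete, here is a small sketch (not literal kernel code; the parameters just mirror the names in the page_alloc.c snippet quoted above) contrasting the pattern being objected to with the one being recommended:

/* READ_ONCE() on a private local: the shared read itself stays naked,
 * so nothing stops the compiler from tearing or re-fetching it. */
static int migratetype_laundered(unsigned long *bitmap, unsigned long word_bitidx,
				 unsigned long bitidx, unsigned long mask)
{
	int __mt = (bitmap[word_bitidx] >> bitidx) & mask;

	return READ_ONCE(__mt);
}

/* READ_ONCE() on the memory load: one stable sample feeds all the maths. */
static int migratetype_sampled(unsigned long *bitmap, unsigned long word_bitidx,
			       unsigned long bitidx, unsigned long mask)
{
	unsigned long word = READ_ONCE(bitmap[word_bitidx]);

	return (word >> bitidx) & mask;
}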
On 5/17/22 12:28, Jason Gunthorpe wrote: >> If you compare this to the snippet above, you'll see that there is >> an extra mov statement, and that one dereferences a pointer from >> %rax: >> >> mov (%rax),%rbx > > That is the same move as: > > mov 0x8(%rdx,%rax,8),%rbx > > Except that the EA calculation was done in advance and stored in rax. > > lea isn't a memory reference, it is just computing the pointer value > that 0x8(%rdx,%rax,8) represents. ie the lea computes > > %rax = %rdx + %rax*8 + 6 > > Which is then fed into the mov. Maybe it is an optimization to allow > one pipe to do the shr and an other to the EA - IDK, seems like a > random thing for the compiler to do. Apologies for getting that wrong, and thanks for walking me through the asm. [...] > > Paul can correct me, but I understand we do not have a list of allowed > operations that are exempted from the READ_ONCE() requirement. ie it > is not just conditional branching that requires READ_ONCE(). > > This is why READ_ONCE() must always be on the memory load, because the > point is to sanitize away the uncertainty that comes with an unlocked > read of unstable memory contents. READ_ONCE() samples the value in > memory, and removes all tearing, multiload, etc "instability" that may > effect down stream computations. In this way down stream compulations > become reliable. > > Jason So then: diff --git a/mm/page_alloc.c b/mm/page_alloc.c index 0e42038382c1..b404f87e2682 100644 --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -482,7 +482,12 @@ unsigned long __get_pfnblock_flags_mask(const struct page *page, word_bitidx = bitidx / BITS_PER_LONG; bitidx &= (BITS_PER_LONG-1); - word = bitmap[word_bitidx]; + /* + * This races, without locks, with set_pageblock_migratetype(). Ensure + * a consistent (non-tearing) read of the memory array, so that results, + * even though racy, are not corrupted. + */ + word = READ_ONCE(bitmap[word_bitidx]); return (word >> bitidx) & mask; } thanks,
On Tue, May 17, 2022 at 01:12:02PM -0700, John Hubbard wrote: > On 5/17/22 12:28, Jason Gunthorpe wrote: > > > If you compare this to the snippet above, you'll see that there is > > > an extra mov statement, and that one dereferences a pointer from > > > %rax: > > > > > > mov (%rax),%rbx > > > > That is the same move as: > > > > mov 0x8(%rdx,%rax,8),%rbx > > > > Except that the EA calculation was done in advance and stored in rax. > > > > lea isn't a memory reference, it is just computing the pointer value > > that 0x8(%rdx,%rax,8) represents. ie the lea computes > > > > %rax = %rdx + %rax*8 + 6 > > > > Which is then fed into the mov. Maybe it is an optimization to allow > > one pipe to do the shr and an other to the EA - IDK, seems like a > > random thing for the compiler to do. Maybe an optimization suppressed due to the volatile nature of the load? If so, perhaps it might be considered a compiler bug. Though it is quite difficult to get optimization bugs involving volatile to be taken seriously. > Apologies for getting that wrong, and thanks for walking me through the > asm. > > [...] > > > > Paul can correct me, but I understand we do not have a list of allowed > > operations that are exempted from the READ_ONCE() requirement. ie it > > is not just conditional branching that requires READ_ONCE(). > > > > This is why READ_ONCE() must always be on the memory load, because the > > point is to sanitize away the uncertainty that comes with an unlocked > > read of unstable memory contents. READ_ONCE() samples the value in > > memory, and removes all tearing, multiload, etc "instability" that may > > effect down stream computations. In this way down stream compulations > > become reliable. > > > > Jason > > So then: Works for me! Acked-by: Paul E. McKenney <paulmck@kernel.org> Thanx, Paul > diff --git a/mm/page_alloc.c b/mm/page_alloc.c > index 0e42038382c1..b404f87e2682 100644 > --- a/mm/page_alloc.c > +++ b/mm/page_alloc.c > @@ -482,7 +482,12 @@ unsigned long __get_pfnblock_flags_mask(const struct page *page, > word_bitidx = bitidx / BITS_PER_LONG; > bitidx &= (BITS_PER_LONG-1); > > - word = bitmap[word_bitidx]; > + /* > + * This races, without locks, with set_pageblock_migratetype(). Ensure > + * a consistent (non-tearing) read of the memory array, so that results, > + * even though racy, are not corrupted. > + */ > + word = READ_ONCE(bitmap[word_bitidx]); > return (word >> bitidx) & mask; > } > > > thanks, > -- > John Hubbard > NVIDIA
On Tue, May 17, 2022 at 01:12:02PM -0700, John Hubbard wrote: > On 5/17/22 12:28, Jason Gunthorpe wrote: > > > If you compare this to the snippet above, you'll see that there is > > > an extra mov statement, and that one dereferences a pointer from > > > %rax: > > > > > > mov (%rax),%rbx > > > > That is the same move as: > > > > mov 0x8(%rdx,%rax,8),%rbx > > > > Except that the EA calculation was done in advance and stored in rax. > > > > lea isn't a memory reference, it is just computing the pointer value > > that 0x8(%rdx,%rax,8) represents. ie the lea computes > > > > %rax = %rdx + %rax*8 + 6 > > > > Which is then fed into the mov. Maybe it is an optimization to allow > > one pipe to do the shr and an other to the EA - IDK, seems like a > > random thing for the compiler to do. > > Apologies for getting that wrong, and thanks for walking me through the > asm. > > [...] > > > > Paul can correct me, but I understand we do not have a list of allowed > > operations that are exempted from the READ_ONCE() requirement. ie it > > is not just conditional branching that requires READ_ONCE(). > > > > This is why READ_ONCE() must always be on the memory load, because the > > point is to sanitize away the uncertainty that comes with an unlocked > > read of unstable memory contents. READ_ONCE() samples the value in > > memory, and removes all tearing, multiload, etc "instability" that may > > effect down stream computations. In this way down stream compulations > > become reliable. > > > > Jason > > So then: > > diff --git a/mm/page_alloc.c b/mm/page_alloc.c > index 0e42038382c1..b404f87e2682 100644 > --- a/mm/page_alloc.c > +++ b/mm/page_alloc.c > @@ -482,7 +482,12 @@ unsigned long __get_pfnblock_flags_mask(const struct page *page, > word_bitidx = bitidx / BITS_PER_LONG; > bitidx &= (BITS_PER_LONG-1); > > - word = bitmap[word_bitidx]; > + /* > + * This races, without locks, with set_pageblock_migratetype(). Ensure set_pfnblock_flags_mask would be better? > + * a consistent (non-tearing) read of the memory array, so that results, Thanks for proceeding and suggestion, John. IIUC, the load tearing wouldn't be an issue since [1] fixed the issue. The concern in our dicussion was aggressive compiler(e.g., LTO) or code refactoring to make the code inline in *future* could potentially cause forcing refetching(i.e., re-read) tie bitmap[word_bitidx]. If so, shouldn't the comment be the one you helped before? /* * Defend against future compiler LTO features, or code refactoring * that inlines the above function, by forcing a single read. Because, * re-reads of bitmap[word_bitidx] by inlining could cause trouble * for whom believe they use a local variable for the value. */ [1] e58469bafd05, mm: page_alloc: use word-based accesses for get/set pageblock bitmaps > + * even though racy, are not corrupted. > + */ > + word = READ_ONCE(bitmap[word_bitidx]); > return (word >> bitidx) & mask; > } > > > thanks, > -- > John Hubbard > NVIDIA
On 5/23/22 09:33, Minchan Kim wrote: ... >> So then: >> >> diff --git a/mm/page_alloc.c b/mm/page_alloc.c >> index 0e42038382c1..b404f87e2682 100644 >> --- a/mm/page_alloc.c >> +++ b/mm/page_alloc.c >> @@ -482,7 +482,12 @@ unsigned long __get_pfnblock_flags_mask(const struct page *page, >> word_bitidx = bitidx / BITS_PER_LONG; >> bitidx &= (BITS_PER_LONG-1); >> >> - word = bitmap[word_bitidx]; >> + /* >> + * This races, without locks, with set_pageblock_migratetype(). Ensure > > set_pfnblock_flags_mask would be better? > >> + * a consistent (non-tearing) read of the memory array, so that results, > > Thanks for proceeding and suggestion, John. > > IIUC, the load tearing wouldn't be an issue since [1] fixed the issue. Did it? [1] fixed something, but I'm not sure we can claim that that code is now safe against tearing in all possible cases, especially given the recent discussion here. Specifically, having this code do a read, then follow that up with calculations, seems correct. Anything else is sketchy... > > The concern in our dicussion was aggressive compiler(e.g., LTO) or code refactoring > to make the code inline in *future* could potentially cause forcing refetching(i.e., > re-read) tie bitmap[word_bitidx]. > > If so, shouldn't the comment be the one you helped before? Well, maybe updated to something like this? /* * This races, without locks, with set_pageblock_migratetype(). Ensure * a consistent (non-tearing) read of the memory array, so that results, * even though racy, are not corrupted--even if this function is * refactored and/or inlined. */ thanks,
On Mon, May 23, 2022 at 07:55:25PM -0700, John Hubbard wrote: > On 5/23/22 09:33, Minchan Kim wrote: > ... > > > So then: > > > > > > diff --git a/mm/page_alloc.c b/mm/page_alloc.c > > > index 0e42038382c1..b404f87e2682 100644 > > > --- a/mm/page_alloc.c > > > +++ b/mm/page_alloc.c > > > @@ -482,7 +482,12 @@ unsigned long __get_pfnblock_flags_mask(const struct page *page, > > > word_bitidx = bitidx / BITS_PER_LONG; > > > bitidx &= (BITS_PER_LONG-1); > > > > > > - word = bitmap[word_bitidx]; > > > + /* > > > + * This races, without locks, with set_pageblock_migratetype(). Ensure > > set_pfnblock_flags_mask would be better? > > > + * a consistent (non-tearing) read of the memory array, so that results, > > > > Thanks for proceeding and suggestion, John. > > > > IIUC, the load tearing wouldn't be an issue since [1] fixed the issue. > > Did it? [1] fixed something, but I'm not sure we can claim that that > code is now safe against tearing in all possible cases, especially given > the recent discussion here. Specifically, having this code do a read, > then follow that up with calculations, seems correct. Anything else is The load tearing you are trying to explain in the comment would be solved by [1] since the bits will always align on a word and accessing word size based on word aligned address is always atomic so there is no load tearing problem IIUC. Instead of the tearing problem, what we are trying to solve with READ_ONCE is to prevent refetching when the function would be inlined in the future. > sketchy... > > > > > The concern in our dicussion was aggressive compiler(e.g., LTO) or code refactoring > > to make the code inline in *future* could potentially cause forcing refetching(i.e., > > re-read) tie bitmap[word_bitidx]. > > > > If so, shouldn't the comment be the one you helped before? > > Well, maybe updated to something like this? > > /* > * This races, without locks, with set_pageblock_migratetype(). Ensure set_pageblock_migratetype is more upper level function so it would be better fit to say set_pfnblock_flags_mask. > * a consistent (non-tearing) read of the memory array, so that results, So tearing problem should't already happen by [1] so I am trying to explain refetching(or re-read) problem in the comment. > * even though racy, are not corrupted--even if this function is The value is already atomic so I don't think it could be corrupted even though it would be inlined in the future. Please correct me if I miss something. > * refactored and/or inlined. > */
On 5/23/22 10:16 PM, Minchan Kim wrote: > On Mon, May 23, 2022 at 07:55:25PM -0700, John Hubbard wrote: >> On 5/23/22 09:33, Minchan Kim wrote: >> ... >>>> So then: >>>> >>>> diff --git a/mm/page_alloc.c b/mm/page_alloc.c >>>> index 0e42038382c1..b404f87e2682 100644 >>>> --- a/mm/page_alloc.c >>>> +++ b/mm/page_alloc.c >>>> @@ -482,7 +482,12 @@ unsigned long __get_pfnblock_flags_mask(const struct page *page, >>>> word_bitidx = bitidx / BITS_PER_LONG; >>>> bitidx &= (BITS_PER_LONG-1); >>>> >>>> - word = bitmap[word_bitidx]; >>>> + /* >>>> + * This races, without locks, with set_pageblock_migratetype(). Ensure >>> set_pfnblock_flags_mask would be better? >>>> + * a consistent (non-tearing) read of the memory array, so that results, >>> >>> Thanks for proceeding and suggestion, John. >>> >>> IIUC, the load tearing wouldn't be an issue since [1] fixed the issue. >> >> Did it? [1] fixed something, but I'm not sure we can claim that that >> code is now safe against tearing in all possible cases, especially given >> the recent discussion here. Specifically, having this code do a read, >> then follow that up with calculations, seems correct. Anything else is > > The load tearing you are trying to explain in the comment would be > solved by [1] since the bits will always align on a word and accessing > word size based on word aligned address is always atomic so there is > no load tearing problem IIUC. > > Instead of the tearing problem, what we are trying to solve with > READ_ONCE is to prevent refetching when the function would be > inlined in the future. > I'm perhaps using "tearing" as too broad of a term, maybe just removing the "(non-tearing)" part would fix up the comment. >> sketchy... >> >>> >>> The concern in our dicussion was aggressive compiler(e.g., LTO) or code refactoring >>> to make the code inline in *future* could potentially cause forcing refetching(i.e., >>> re-read) tie bitmap[word_bitidx]. >>> >>> If so, shouldn't the comment be the one you helped before? >> >> Well, maybe updated to something like this? >> >> /* >> * This races, without locks, with set_pageblock_migratetype(). Ensure > > set_pageblock_migratetype is more upper level function so it would > be better fit to say set_pfnblock_flags_mask. OK > >> * a consistent (non-tearing) read of the memory array, so that results, > > So tearing problem should't already happen by [1] so I am trying to > explain refetching(or re-read) problem in the comment. > >> * even though racy, are not corrupted--even if this function is > > The value is already atomic so I don't think it could be corrupted > even though it would be inlined in the future. > > Please correct me if I miss something. > >> * refactored and/or inlined. >> */ > thanks,
On Mon, May 23, 2022 at 10:16:58PM -0700, Minchan Kim wrote: > On Mon, May 23, 2022 at 07:55:25PM -0700, John Hubbard wrote: > > On 5/23/22 09:33, Minchan Kim wrote: > > ... > > > > So then: > > > > > > > > diff --git a/mm/page_alloc.c b/mm/page_alloc.c > > > > index 0e42038382c1..b404f87e2682 100644 > > > > +++ b/mm/page_alloc.c > > > > @@ -482,7 +482,12 @@ unsigned long __get_pfnblock_flags_mask(const struct page *page, > > > > word_bitidx = bitidx / BITS_PER_LONG; > > > > bitidx &= (BITS_PER_LONG-1); > > > > > > > > - word = bitmap[word_bitidx]; > > > > + /* > > > > + * This races, without locks, with set_pageblock_migratetype(). Ensure > > > set_pfnblock_flags_mask would be better? > > > > + * a consistent (non-tearing) read of the memory array, so that results, > > > > > > Thanks for proceeding and suggestion, John. > > > > > > IIUC, the load tearing wouldn't be an issue since [1] fixed the issue. > > > > Did it? [1] fixed something, but I'm not sure we can claim that that > > code is now safe against tearing in all possible cases, especially given > > the recent discussion here. Specifically, having this code do a read, > > then follow that up with calculations, seems correct. Anything else is > > The load tearing you are trying to explain in the comment would be > solved by [1] since the bits will always align on a word and accessing > word size based on word aligned address is always atomic so there is > no load tearing problem IIUC. That is not technically true. It is exactly the sort of thing READ_ONCE is intended to guard against. > Instead of the tearing problem, what we are trying to solve with > READ_ONCE is to prevent refetching when the function would be > inlined in the future. It is the same problem, who is to say it doesn't refetch while doing the maths? Jason
On Tue, May 24, 2022 at 11:19:37AM -0300, Jason Gunthorpe wrote: > On Mon, May 23, 2022 at 10:16:58PM -0700, Minchan Kim wrote: > > On Mon, May 23, 2022 at 07:55:25PM -0700, John Hubbard wrote: > > > On 5/23/22 09:33, Minchan Kim wrote: > > > ... > > > > > So then: > > > > > > > > > > diff --git a/mm/page_alloc.c b/mm/page_alloc.c > > > > > index 0e42038382c1..b404f87e2682 100644 > > > > > +++ b/mm/page_alloc.c > > > > > @@ -482,7 +482,12 @@ unsigned long __get_pfnblock_flags_mask(const struct page *page, > > > > > word_bitidx = bitidx / BITS_PER_LONG; > > > > > bitidx &= (BITS_PER_LONG-1); > > > > > > > > > > - word = bitmap[word_bitidx]; > > > > > + /* > > > > > + * This races, without locks, with set_pageblock_migratetype(). Ensure > > > > set_pfnblock_flags_mask would be better? > > > > > + * a consistent (non-tearing) read of the memory array, so that results, > > > > > > > > Thanks for proceeding and suggestion, John. > > > > > > > > IIUC, the load tearing wouldn't be an issue since [1] fixed the issue. > > > > > > Did it? [1] fixed something, but I'm not sure we can claim that that > > > code is now safe against tearing in all possible cases, especially given > > > the recent discussion here. Specifically, having this code do a read, > > > then follow that up with calculations, seems correct. Anything else is > > > > The load tearing you are trying to explain in the comment would be > > solved by [1] since the bits will always align on a word and accessing > > word size based on word aligned address is always atomic so there is > > no load tearing problem IIUC. > > That is not technically true. It is exactly the sort of thing > READ_ONCE is intended to guard against. Oh, does word access based on the aligned address still happen load tearing? I just referred to https://elixir.bootlin.com/linux/latest/source/Documentation/memory-barriers.txt#L1759 > > > Instead of the tearing problem, what we are trying to solve with > > READ_ONCE is to prevent refetching when the function would be > > inlined in the future. > > It is the same problem, who is to say it doesn't refetch while doing > the maths? I didn't say it doesn't refetch the value without the READ_ONCE. What I am saying is READ_ONCE(bitmap_word_bitidx] prevents "refetching" issue rather than "tearing" issue in specific __get_pfnblock_flags_mask context because I though there is no load-tearing issue there since bitmap is word-aligned/accessed. No? If the load tearing would still happens in the bitmap, it would be better to keep last suggestion from John. + /* + * This races, without locks, with set_pfnblock_flags_mask(). Ensure + * a consistent read of the memory array, so that results, even though + * racy, are not corrupted. + */
On Tue, May 24, 2022 at 08:43:27AM -0700, Minchan Kim wrote: > On Tue, May 24, 2022 at 11:19:37AM -0300, Jason Gunthorpe wrote: > > On Mon, May 23, 2022 at 10:16:58PM -0700, Minchan Kim wrote: > > > On Mon, May 23, 2022 at 07:55:25PM -0700, John Hubbard wrote: > > > > On 5/23/22 09:33, Minchan Kim wrote: > > > > ... > > > > > > So then: > > > > > > > > > > > > diff --git a/mm/page_alloc.c b/mm/page_alloc.c > > > > > > index 0e42038382c1..b404f87e2682 100644 > > > > > > +++ b/mm/page_alloc.c > > > > > > @@ -482,7 +482,12 @@ unsigned long __get_pfnblock_flags_mask(const struct page *page, > > > > > > word_bitidx = bitidx / BITS_PER_LONG; > > > > > > bitidx &= (BITS_PER_LONG-1); > > > > > > > > > > > > - word = bitmap[word_bitidx]; > > > > > > + /* > > > > > > + * This races, without locks, with set_pageblock_migratetype(). Ensure > > > > > set_pfnblock_flags_mask would be better? > > > > > > + * a consistent (non-tearing) read of the memory array, so that results, > > > > > > > > > > Thanks for proceeding and suggestion, John. > > > > > > > > > > IIUC, the load tearing wouldn't be an issue since [1] fixed the issue. > > > > > > > > Did it? [1] fixed something, but I'm not sure we can claim that that > > > > code is now safe against tearing in all possible cases, especially given > > > > the recent discussion here. Specifically, having this code do a read, > > > > then follow that up with calculations, seems correct. Anything else is > > > > > > The load tearing you are trying to explain in the comment would be > > > solved by [1] since the bits will always align on a word and accessing > > > word size based on word aligned address is always atomic so there is > > > no load tearing problem IIUC. > > > > That is not technically true. It is exactly the sort of thing > > READ_ONCE is intended to guard against. > > Oh, does word access based on the aligned address still happen > load tearing? > > I just referred to > https://elixir.bootlin.com/linux/latest/source/Documentation/memory-barriers.txt#L1759 I read that as saying load tearing is technically allowed but doesn't happen in gcc, and so must use the _ONCE macros. > I didn't say it doesn't refetch the value without the READ_ONCE. > > What I am saying is READ_ONCE(bitmap_word_bitidx] prevents "refetching" > issue rather than "tearing" issue in specific __get_pfnblock_flags_mask > context because I though there is no load-tearing issue there since > bitmap is word-aligned/accessed. No? It does both. AFAIK our memory model has no guarentees on what naked C statements will do. Tearing, multi-load, etc - it is all technically permitted. Use the proper accessors. Jason
On Tue, May 24, 2022 at 12:48:31PM -0300, Jason Gunthorpe wrote: > On Tue, May 24, 2022 at 08:43:27AM -0700, Minchan Kim wrote: > > On Tue, May 24, 2022 at 11:19:37AM -0300, Jason Gunthorpe wrote: > > > On Mon, May 23, 2022 at 10:16:58PM -0700, Minchan Kim wrote: > > > > On Mon, May 23, 2022 at 07:55:25PM -0700, John Hubbard wrote: > > > > > On 5/23/22 09:33, Minchan Kim wrote: > > > > > ... > > > > > > > So then: > > > > > > > > > > > > > > diff --git a/mm/page_alloc.c b/mm/page_alloc.c > > > > > > > index 0e42038382c1..b404f87e2682 100644 > > > > > > > +++ b/mm/page_alloc.c > > > > > > > @@ -482,7 +482,12 @@ unsigned long __get_pfnblock_flags_mask(const struct page *page, > > > > > > > word_bitidx = bitidx / BITS_PER_LONG; > > > > > > > bitidx &= (BITS_PER_LONG-1); > > > > > > > > > > > > > > - word = bitmap[word_bitidx]; > > > > > > > + /* > > > > > > > + * This races, without locks, with set_pageblock_migratetype(). Ensure > > > > > > set_pfnblock_flags_mask would be better? > > > > > > > + * a consistent (non-tearing) read of the memory array, so that results, > > > > > > > > > > > > Thanks for proceeding and suggestion, John. > > > > > > > > > > > > IIUC, the load tearing wouldn't be an issue since [1] fixed the issue. > > > > > > > > > > Did it? [1] fixed something, but I'm not sure we can claim that that > > > > > code is now safe against tearing in all possible cases, especially given > > > > > the recent discussion here. Specifically, having this code do a read, > > > > > then follow that up with calculations, seems correct. Anything else is > > > > > > > > The load tearing you are trying to explain in the comment would be > > > > solved by [1] since the bits will always align on a word and accessing > > > > word size based on word aligned address is always atomic so there is > > > > no load tearing problem IIUC. > > > > > > That is not technically true. It is exactly the sort of thing > > > READ_ONCE is intended to guard against. > > > > Oh, does word access based on the aligned address still happen > > load tearing? > > > > I just referred to > > https://elixir.bootlin.com/linux/latest/source/Documentation/memory-barriers.txt#L1759 > > I read that as saying load tearing is technically allowed but doesn't > happen in gcc, and so must use the _ONCE macros. This is in fact the intent, except... And as that passage goes on to state, there really are compilers (such as GCC) that tear stores of constants to machine aligned/sized locations. In short, use of the _ONCE() macros can save you a lot of pain. > > I didn't say it doesn't refetch the value without the READ_ONCE. > > > > What I am saying is READ_ONCE(bitmap_word_bitidx] prevents "refetching" > > issue rather than "tearing" issue in specific __get_pfnblock_flags_mask > > context because I though there is no load-tearing issue there since > > bitmap is word-aligned/accessed. No? > > It does both. AFAIK our memory model has no guarentees on what naked C > statements will do. Tearing, multi-load, etc - it is all technically > permitted. Use the proper accessors. I am with Jason on this one. In fact, I believe that any naked C-language access to mutable shared variables should have a comment stating why the compiler cannot mangle that access. Thanx, Paul
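As a concrete illustration of the store tearing Paul mentions, consider a sketch like the following (the asm in the comment is an example of what a compiler is permitted to emit, not output captured from this kernel):

void set_flag_word(unsigned long *p)
{
	/* A plain store of a 64-bit constant through a non-volatile pointer... */
	*p = 0x0123456789abcdefUL;
	/*
	 * ...may legally be split into two 32-bit immediate stores, e.g. on
	 * x86-64 something like:
	 *
	 *	movl $0x89abcdef, (%rdi)
	 *	movl $0x01234567, 4(%rdi)
	 *
	 * A lockless reader can then observe a half-updated value. Using
	 * WRITE_ONCE(*p, ...) on the store and READ_ONCE(*p) on the read
	 * rules that out by making both accesses volatile-qualified.
	 */
}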
On Tue, May 24, 2022 at 09:37:28AM -0700, Paul E. McKenney wrote: > On Tue, May 24, 2022 at 12:48:31PM -0300, Jason Gunthorpe wrote: > > On Tue, May 24, 2022 at 08:43:27AM -0700, Minchan Kim wrote: > > > On Tue, May 24, 2022 at 11:19:37AM -0300, Jason Gunthorpe wrote: > > > > On Mon, May 23, 2022 at 10:16:58PM -0700, Minchan Kim wrote: > > > > > On Mon, May 23, 2022 at 07:55:25PM -0700, John Hubbard wrote: > > > > > > On 5/23/22 09:33, Minchan Kim wrote: > > > > > > ... > > > > > > > > So then: > > > > > > > > > > > > > > > > diff --git a/mm/page_alloc.c b/mm/page_alloc.c > > > > > > > > index 0e42038382c1..b404f87e2682 100644 > > > > > > > > +++ b/mm/page_alloc.c > > > > > > > > @@ -482,7 +482,12 @@ unsigned long __get_pfnblock_flags_mask(const struct page *page, > > > > > > > > word_bitidx = bitidx / BITS_PER_LONG; > > > > > > > > bitidx &= (BITS_PER_LONG-1); > > > > > > > > > > > > > > > > - word = bitmap[word_bitidx]; > > > > > > > > + /* > > > > > > > > + * This races, without locks, with set_pageblock_migratetype(). Ensure > > > > > > > set_pfnblock_flags_mask would be better? > > > > > > > > + * a consistent (non-tearing) read of the memory array, so that results, > > > > > > > > > > > > > > Thanks for proceeding and suggestion, John. > > > > > > > > > > > > > > IIUC, the load tearing wouldn't be an issue since [1] fixed the issue. > > > > > > > > > > > > Did it? [1] fixed something, but I'm not sure we can claim that that > > > > > > code is now safe against tearing in all possible cases, especially given > > > > > > the recent discussion here. Specifically, having this code do a read, > > > > > > then follow that up with calculations, seems correct. Anything else is > > > > > > > > > > The load tearing you are trying to explain in the comment would be > > > > > solved by [1] since the bits will always align on a word and accessing > > > > > word size based on word aligned address is always atomic so there is > > > > > no load tearing problem IIUC. > > > > > > > > That is not technically true. It is exactly the sort of thing > > > > READ_ONCE is intended to guard against. > > > > > > Oh, does word access based on the aligned address still happen > > > load tearing? > > > > > > I just referred to > > > https://elixir.bootlin.com/linux/latest/source/Documentation/memory-barriers.txt#L1759 > > > > I read that as saying load tearing is technically allowed but doesn't > > happen in gcc, and so must use the _ONCE macros. > > This is in fact the intent, except... > > And as that passage goes on to state, there really are compilers (such > as GCC) that tear stores of constants to machine aligned/sized locations. > > In short, use of the _ONCE() macros can save you a lot of pain. Thanks for the correction, Jason and Paul > > > > I didn't say it doesn't refetch the value without the READ_ONCE. > > > > > > What I am saying is READ_ONCE(bitmap_word_bitidx] prevents "refetching" > > > issue rather than "tearing" issue in specific __get_pfnblock_flags_mask > > > context because I though there is no load-tearing issue there since > > > bitmap is word-aligned/accessed. No? > > > > It does both. AFAIK our memory model has no guarentees on what naked C > > statements will do. Tearing, multi-load, etc - it is all technically > > permitted. Use the proper accessors. Seems like there was some misunderstanding here. I didn't mean not to use READ_ONCE for the bitmap but wanted to have more concrete comment. 
Since you guys corrected me that "even a word-aligned access could be wrong without READ_ONCE", I would keep the comment John suggested. > > I am with Jason on this one. > > In fact, I believe that any naked C-language access to mutable shared variables should have a comment stating why the compiler cannot mangle that access. Agreed. Thanks!
diff --git a/include/linux/mm.h b/include/linux/mm.h index 6acca5cecbc5..cbf79eb790e0 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -1625,8 +1625,19 @@ static inline bool page_needs_cow_for_dma(struct vm_area_struct *vma, #ifdef CONFIG_MIGRATION static inline bool is_pinnable_page(struct page *page) { - return !(is_zone_movable_page(page) || is_migrate_cma_page(page)) || - is_zero_pfn(page_to_pfn(page)); +#ifdef CONFIG_CMA + /* + * use volatile to use local variable mt instead of + * refetching mt value. + */ + int __mt = get_pageblock_migratetype(page); + int mt = __READ_ONCE(__mt); + + if (mt == MIGRATE_CMA || mt == MIGRATE_ISOLATE) + return false; +#endif + + return !(is_zone_movable_page(page) || is_zero_pfn(page_to_pfn(page))); } #else static inline bool is_pinnable_page(struct page *page)
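Given where the discussion above appears to land (a READ_ONCE() on the bitmap load inside __get_pfnblock_flags_mask(), with no __READ_ONCE() on a local), a simplified follow-up version of the hunk above might look like the sketch below. This is a guess at the direction of the next revision, not the hunk that was merged:

#ifdef CONFIG_MIGRATION
static inline bool is_pinnable_page(struct page *page)
{
#ifdef CONFIG_CMA
	/*
	 * get_pageblock_migratetype() funnels into __get_pfnblock_flags_mask(),
	 * whose READ_ONCE() of the pageblock bitmap word gives one
	 * racy-but-stable migratetype value here; no local-variable
	 * __READ_ONCE() dance is needed.
	 */
	int mt = get_pageblock_migratetype(page);

	if (mt == MIGRATE_CMA || mt == MIGRATE_ISOLATE)
		return false;
#endif
	return !(is_zone_movable_page(page) || is_zero_pfn(page_to_pfn(page)));
}
#endif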
Pages on CMA area could have MIGRATE_ISOLATE as well as MIGRATE_CMA so current is_pinnable_page could miss CMA pages which has MIGRATE_ ISOLATE. It ends up pinning CMA pages as longterm at pin_user_pages APIs so CMA allocation keep failed until the pin is released. CPU 0 CPU 1 - Task B cma_alloc alloc_contig_range pin_user_pages_fast(FOLL_LONGTERM) change pageblock as MIGRATE_ISOLATE internal_get_user_pages_fast lockless_pages_from_mm gup_pte_range try_grab_folio is_pinnable_page return true; So, pinned the page successfully. page migration failure with pinned page .. .. After 30 sec unpin_user_page(page) CMA allocation succeeded after 30 sec. The CMA allocation path protects the migration type change race using zone->lock but what GUP path need to know is just whether the page is on CMA area or not rather than exact migration type. Thus, we don't need zone->lock but just checks migration type in either of (MIGRATE_ISOLATE and MIGRATE_CMA). Adding the MIGRATE_ISOLATE check in is_pinnable_page could cause rejecting of pinning pages on MIGRATE_ISOLATE pageblocks even though it's neither CMA nor movable zone if the page is temporarily unmovable. However, such a migration failure by unexpected temporal refcount holding is general issue, not only come from MIGRATE_ISOLATE and the MIGRATE_ISOLATE is also transient state like other temporal elevated refcount problem. Cc: David Hildenbrand <david@redhat.com> Signed-off-by: Minchan Kim <minchan@kernel.org> --- * from v3 - https://lore.kernel.org/all/20220509153430.4125710-1-minchan@kernel.org/ * Fix typo and adding more description - akpm * from v2 - https://lore.kernel.org/all/20220505064429.2818496-1-minchan@kernel.org/ * Use __READ_ONCE instead of volatile - akpm * from v1 - https://lore.kernel.org/all/20220502173558.2510641-1-minchan@kernel.org/ * fix build warning - lkp * fix refetching issue of migration type * add side effect on !ZONE_MOVABLE and !MIGRATE_CMA in description - david include/linux/mm.h | 15 +++++++++++++-- 1 file changed, 13 insertions(+), 2 deletions(-)