Message ID | 20210913115125.33617-1-linmiaohe@huawei.com (mailing list archive) |
---|---|
State | New |
Headers | show |
Series | [v2] mm/page_isolation: fix potential missing call to unset_migratetype_isolate() | expand |
On Mon 13-09-21 19:51:25, Miaohe Lin wrote: > In start_isolate_page_range() undo path, pfn_to_online_page() just checks > the first pfn in a pageblock while __first_valid_page() will traverse the > pageblock until the first online pfn is found. So we may miss the call to > unset_migratetype_isolate() in undo path and pages will remain isolated > unexpectedly. Fix this by calling undo_isolate_page_range() and this will > also help to simplify the code further. I like the clean up part but is this a real problem that requires CC stable? Have you ever seen this to be a real problem? It looks like something based on reading the code. > Fixes: 2ce13640b3f4 ("mm: __first_valid_page skip over offline pages") > Signed-off-by: Miaohe Lin <linmiaohe@huawei.com> > Cc: <stable@vger.kernel.org> > --- > v1->v2: > Simplify the code further per David Hildenbrand. > --- > mm/page_isolation.c | 20 +++----------------- > 1 file changed, 3 insertions(+), 17 deletions(-) > > diff --git a/mm/page_isolation.c b/mm/page_isolation.c > index a95c2c6562d0..f93cc63d8fa1 100644 > --- a/mm/page_isolation.c > +++ b/mm/page_isolation.c > @@ -183,7 +183,6 @@ int start_isolate_page_range(unsigned long start_pfn, unsigned long end_pfn, > unsigned migratetype, int flags) > { > unsigned long pfn; > - unsigned long undo_pfn; > struct page *page; > > BUG_ON(!IS_ALIGNED(start_pfn, pageblock_nr_pages)); > @@ -193,25 +192,12 @@ int start_isolate_page_range(unsigned long start_pfn, unsigned long end_pfn, > pfn < end_pfn; > pfn += pageblock_nr_pages) { > page = __first_valid_page(pfn, pageblock_nr_pages); > - if (page) { > - if (set_migratetype_isolate(page, migratetype, flags)) { > - undo_pfn = pfn; > - goto undo; > - } > + if (page && set_migratetype_isolate(page, migratetype, flags)) { > + undo_isolate_page_range(start_pfn, pfn, migratetype); > + return -EBUSY; > } > } > return 0; > -undo: > - for (pfn = start_pfn; > - pfn < undo_pfn; > - pfn += pageblock_nr_pages) { > - struct page *page = pfn_to_online_page(pfn); > - if (!page) > - continue; > - unset_migratetype_isolate(page, migratetype); > - } > - > - return -EBUSY; > } > > /* > -- > 2.23.0
On 13.09.21 14:12, Michal Hocko wrote: > On Mon 13-09-21 19:51:25, Miaohe Lin wrote: >> In start_isolate_page_range() undo path, pfn_to_online_page() just checks >> the first pfn in a pageblock while __first_valid_page() will traverse the >> pageblock until the first online pfn is found. So we may miss the call to >> unset_migratetype_isolate() in undo path and pages will remain isolated >> unexpectedly. Fix this by calling undo_isolate_page_range() and this will >> also help to simplify the code further. > > I like the clean up part but is this a real problem that requires CC > stable? Have you ever seen this to be a real problem? It looks like > something based on reading the code. We discussed that it isn't an issue anymore (we never call it on memory holes), but might have been an issue on older kernels, back when we didn't have the "memory holes" check in the memory offlining path in place. Agreed, these details belong into this description. > >> Fixes: 2ce13640b3f4 ("mm: __first_valid_page skip over offline pages") >> Signed-off-by: Miaohe Lin <linmiaohe@huawei.com> >> Cc: <stable@vger.kernel.org> >> --- >> v1->v2: >> Simplify the code further per David Hildenbrand. >> --- >> mm/page_isolation.c | 20 +++----------------- >> 1 file changed, 3 insertions(+), 17 deletions(-) >> >> diff --git a/mm/page_isolation.c b/mm/page_isolation.c >> index a95c2c6562d0..f93cc63d8fa1 100644 >> --- a/mm/page_isolation.c >> +++ b/mm/page_isolation.c >> @@ -183,7 +183,6 @@ int start_isolate_page_range(unsigned long start_pfn, unsigned long end_pfn, >> unsigned migratetype, int flags) >> { >> unsigned long pfn; >> - unsigned long undo_pfn; >> struct page *page; >> >> BUG_ON(!IS_ALIGNED(start_pfn, pageblock_nr_pages)); >> @@ -193,25 +192,12 @@ int start_isolate_page_range(unsigned long start_pfn, unsigned long end_pfn, >> pfn < end_pfn; >> pfn += pageblock_nr_pages) { >> page = __first_valid_page(pfn, pageblock_nr_pages); >> - if (page) { >> - if (set_migratetype_isolate(page, migratetype, flags)) { >> - undo_pfn = pfn; >> - goto undo; >> - } >> + if (page && set_migratetype_isolate(page, migratetype, flags)) { >> + undo_isolate_page_range(start_pfn, pfn, migratetype); >> + return -EBUSY; >> } >> } >> return 0; >> -undo: >> - for (pfn = start_pfn; >> - pfn < undo_pfn; >> - pfn += pageblock_nr_pages) { >> - struct page *page = pfn_to_online_page(pfn); >> - if (!page) >> - continue; >> - unset_migratetype_isolate(page, migratetype); >> - } >> - >> - return -EBUSY; >> } >> >> /* >> -- >> 2.23.0 >
On 2021/9/13 20:20, David Hildenbrand wrote: > On 13.09.21 14:12, Michal Hocko wrote: >> On Mon 13-09-21 19:51:25, Miaohe Lin wrote: >>> In start_isolate_page_range() undo path, pfn_to_online_page() just checks >>> the first pfn in a pageblock while __first_valid_page() will traverse the >>> pageblock until the first online pfn is found. So we may miss the call to >>> unset_migratetype_isolate() in undo path and pages will remain isolated >>> unexpectedly. Fix this by calling undo_isolate_page_range() and this will >>> also help to simplify the code further. >> >> I like the clean up part but is this a real problem that requires CC >> stable? Have you ever seen this to be a real problem? It looks like >> something based on reading the code. I'm sorry but I haven't seen this to be a real problem. It's a theoretical bug. > > We discussed that it isn't an issue anymore (we never call it on memory holes), but might have been an issue on older kernels, back when we didn't have the "memory holes" check in the memory offlining path in place. So is the Cc:stable needed in this case? Many thanks for both of you. > > Agreed, these details belong into this description. > >> >>> Fixes: 2ce13640b3f4 ("mm: __first_valid_page skip over offline pages") >>> Signed-off-by: Miaohe Lin <linmiaohe@huawei.com> >>> Cc: <stable@vger.kernel.org> >>> --- >>> v1->v2: >>> Simplify the code further per David Hildenbrand. >>> --- >>> mm/page_isolation.c | 20 +++----------------- >>> 1 file changed, 3 insertions(+), 17 deletions(-) >>> >>> diff --git a/mm/page_isolation.c b/mm/page_isolation.c >>> index a95c2c6562d0..f93cc63d8fa1 100644 >>> --- a/mm/page_isolation.c >>> +++ b/mm/page_isolation.c >>> @@ -183,7 +183,6 @@ int start_isolate_page_range(unsigned long start_pfn, unsigned long end_pfn, >>> unsigned migratetype, int flags) >>> { >>> unsigned long pfn; >>> - unsigned long undo_pfn; >>> struct page *page; >>> BUG_ON(!IS_ALIGNED(start_pfn, pageblock_nr_pages)); >>> @@ -193,25 +192,12 @@ int start_isolate_page_range(unsigned long start_pfn, unsigned long end_pfn, >>> pfn < end_pfn; >>> pfn += pageblock_nr_pages) { >>> page = __first_valid_page(pfn, pageblock_nr_pages); >>> - if (page) { >>> - if (set_migratetype_isolate(page, migratetype, flags)) { >>> - undo_pfn = pfn; >>> - goto undo; >>> - } >>> + if (page && set_migratetype_isolate(page, migratetype, flags)) { >>> + undo_isolate_page_range(start_pfn, pfn, migratetype); >>> + return -EBUSY; >>> } >>> } >>> return 0; >>> -undo: >>> - for (pfn = start_pfn; >>> - pfn < undo_pfn; >>> - pfn += pageblock_nr_pages) { >>> - struct page *page = pfn_to_online_page(pfn); >>> - if (!page) >>> - continue; >>> - unset_migratetype_isolate(page, migratetype); >>> - } >>> - >>> - return -EBUSY; >>> } >>> /* >>> -- >>> 2.23.0 >> > >
On Mon 13-09-21 20:43:35, Miaohe Lin wrote: > On 2021/9/13 20:20, David Hildenbrand wrote: > > On 13.09.21 14:12, Michal Hocko wrote: > >> On Mon 13-09-21 19:51:25, Miaohe Lin wrote: > >>> In start_isolate_page_range() undo path, pfn_to_online_page() just checks > >>> the first pfn in a pageblock while __first_valid_page() will traverse the > >>> pageblock until the first online pfn is found. So we may miss the call to > >>> unset_migratetype_isolate() in undo path and pages will remain isolated > >>> unexpectedly. Fix this by calling undo_isolate_page_range() and this will > >>> also help to simplify the code further. > >> > >> I like the clean up part but is this a real problem that requires CC > >> stable? Have you ever seen this to be a real problem? It looks like > >> something based on reading the code. > > I'm sorry but I haven't seen this to be a real problem. It's a theoretical bug. Make it clear in the changelog > > We discussed that it isn't an issue anymore (we never call it on > > memory holes), but might have been an issue on older kernels, back > > when we didn't have the "memory holes" check in the memory offlining > > path in place. > > So is the Cc:stable needed in this case? I do not think so. Even if this was happening in the practice then the practical consequences would be pretty minor, right? (few pageblocks stay isolated thus unavailable). I do realize that the stable tree is in a hoarding mode for quite some years but my general approach has been (in line with the documentation) to mark and backport only fixes that really do matter.
On Mon, 13 Sep 2021 14:59:42 +0200 Michal Hocko <mhocko@suse.com> wrote: > I do realize that the stable tree is in a hoarding mode for quite some > years but my general approach has been (in line with the documentation) > to mark and backport only fixes that really do matter. Me2. There has to be some risk-vs-reward test to be passed...
On 2021/9/13 20:59, Michal Hocko wrote: > On Mon 13-09-21 20:43:35, Miaohe Lin wrote: >> On 2021/9/13 20:20, David Hildenbrand wrote: >>> On 13.09.21 14:12, Michal Hocko wrote: >>>> On Mon 13-09-21 19:51:25, Miaohe Lin wrote: >>>>> In start_isolate_page_range() undo path, pfn_to_online_page() just checks >>>>> the first pfn in a pageblock while __first_valid_page() will traverse the >>>>> pageblock until the first online pfn is found. So we may miss the call to >>>>> unset_migratetype_isolate() in undo path and pages will remain isolated >>>>> unexpectedly. Fix this by calling undo_isolate_page_range() and this will >>>>> also help to simplify the code further. >>>> >>>> I like the clean up part but is this a real problem that requires CC >>>> stable? Have you ever seen this to be a real problem? It looks like >>>> something based on reading the code. >> >> I'm sorry but I haven't seen this to be a real problem. It's a theoretical bug. > > Make it clear in the changelog Will do. > >>> We discussed that it isn't an issue anymore (we never call it on >>> memory holes), but might have been an issue on older kernels, back >>> when we didn't have the "memory holes" check in the memory offlining >>> path in place. >> >> So is the Cc:stable needed in this case? > > I do not think so. Even if this was happening in the practice then the > practical consequences would be pretty minor, right? (few pageblocks > stay isolated thus unavailable). > > I do realize that the stable tree is in a hoarding mode for quite some > years but my general approach has been (in line with the documentation) > to mark and backport only fixes that really do matter. So even the Fixes tag should be removed ? Many thanks. >
On 2021/9/14 10:51, Andrew Morton wrote: > On Mon, 13 Sep 2021 14:59:42 +0200 Michal Hocko <mhocko@suse.com> wrote: > >> I do realize that the stable tree is in a hoarding mode for quite some >> years but my general approach has been (in line with the documentation) >> to mark and backport only fixes that really do matter. > > Me2. There has to be some risk-vs-reward test to be passed... Will keep this in mind when I try to cc stable. Thanks. > . >
On Tue 14-09-21 11:09:47, Miaohe Lin wrote:
[...]
> So even the Fixes tag should be removed ?
I would keep that one there. Fixes tag is useful to frame the scope of
the fix. For example when somebody is backporting the commit mentioned
in the Fixes tag then a) a lot of follow up patches with Fixes can tell
you this won't be an easy ride and you might want to reconsider risks
vs. benefit b) it helps to collect follow up fixes more easily.
That is a different story from cc: stable which just collects patches
and push them to all consumers of the stable branch if they apply.
To conclude, the Fixes tag is a generaly useful tag to bind patches
together and let people evaluate how important that is while Cc stable
is an indication that a fix is serious enough to push to all stable
users.
On 2021/9/14 15:06, Michal Hocko wrote: > On Tue 14-09-21 11:09:47, Miaohe Lin wrote: > [...] >> So even the Fixes tag should be removed ? > > I would keep that one there. Fixes tag is useful to frame the scope of > the fix. For example when somebody is backporting the commit mentioned > in the Fixes tag then a) a lot of follow up patches with Fixes can tell > you this won't be an easy ride and you might want to reconsider risks > vs. benefit b) it helps to collect follow up fixes more easily. > > That is a different story from cc: stable which just collects patches > and push them to all consumers of the stable branch if they apply. > > To conclude, the Fixes tag is a generaly useful tag to bind patches > together and let people evaluate how important that is while Cc stable > is an indication that a fix is serious enough to push to all stable > users. > I see. Many thanks for your detailed explanation! :)
diff --git a/mm/page_isolation.c b/mm/page_isolation.c index a95c2c6562d0..f93cc63d8fa1 100644 --- a/mm/page_isolation.c +++ b/mm/page_isolation.c @@ -183,7 +183,6 @@ int start_isolate_page_range(unsigned long start_pfn, unsigned long end_pfn, unsigned migratetype, int flags) { unsigned long pfn; - unsigned long undo_pfn; struct page *page; BUG_ON(!IS_ALIGNED(start_pfn, pageblock_nr_pages)); @@ -193,25 +192,12 @@ int start_isolate_page_range(unsigned long start_pfn, unsigned long end_pfn, pfn < end_pfn; pfn += pageblock_nr_pages) { page = __first_valid_page(pfn, pageblock_nr_pages); - if (page) { - if (set_migratetype_isolate(page, migratetype, flags)) { - undo_pfn = pfn; - goto undo; - } + if (page && set_migratetype_isolate(page, migratetype, flags)) { + undo_isolate_page_range(start_pfn, pfn, migratetype); + return -EBUSY; } } return 0; -undo: - for (pfn = start_pfn; - pfn < undo_pfn; - pfn += pageblock_nr_pages) { - struct page *page = pfn_to_online_page(pfn); - if (!page) - continue; - unset_migratetype_isolate(page, migratetype); - } - - return -EBUSY; } /*
In start_isolate_page_range() undo path, pfn_to_online_page() just checks the first pfn in a pageblock while __first_valid_page() will traverse the pageblock until the first online pfn is found. So we may miss the call to unset_migratetype_isolate() in undo path and pages will remain isolated unexpectedly. Fix this by calling undo_isolate_page_range() and this will also help to simplify the code further. Fixes: 2ce13640b3f4 ("mm: __first_valid_page skip over offline pages") Signed-off-by: Miaohe Lin <linmiaohe@huawei.com> Cc: <stable@vger.kernel.org> --- v1->v2: Simplify the code further per David Hildenbrand. --- mm/page_isolation.c | 20 +++----------------- 1 file changed, 3 insertions(+), 17 deletions(-)