Message ID | alpine.LSU.2.11.2006101342250.4607@eggly.anvils (mailing list archive) |
---|---|
State | New, archived |
Series | mm, page_alloc: capture page in task context only |
On Wed, Jun 10, 2020 at 01:48:59PM -0700, Hugh Dickins wrote:
> While stressing compaction, one run oopsed on NULL capc->cc in
> __free_one_page()'s task_capc(zone): compact_zone_order() had been
> interrupted, and a page was being freed in the return from interrupt.
>
> Though you would not expect it from the source, both gccs I was using
> (a 4.8.1 and a 7.5.0) had chosen to compile compact_zone_order() with
> the ".cc = &cc" implemented by mov %rbx,-0xb0(%rbp) immediately before
> callq compact_zone - long after the "current->capture_control = &capc".
> An interrupt in between those finds capc->cc NULL (zeroed by an earlier
> rep stos).
>
> This could presumably be fixed by a barrier() before setting
> current->capture_control in compact_zone_order(); but would also need
> more care on return from compact_zone(), in order not to risk leaking
> a page captured by interrupt just before capture_control is reset.
>
> Maybe that is the preferable fix, but I felt safer for task_capc() to
> exclude the rather surprising possibility of capture at interrupt time.
>
> Fixes: 5e1f0f098b46 ("mm, compaction: capture a page under direct compaction")
> Cc: stable@vger.kernel.org # 5.1+
> Signed-off-by: Hugh Dickins <hughd@google.com>

Acked-by: Mel Gorman <mgorman@techsingularity.net>
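To make the window concrete: the pattern at issue is publishing a pointer to an on-stack structure before the compiler has finished storing its initializers. Below is a minimal userspace C sketch of that pattern - an assumed simplification for illustration, not kernel code; `capture_control` here stands in for current->capture_control, and barrier() is the same compiler-only barrier the kernel macro expands to:

```c
#include <stdio.h>
#include <stddef.h>

struct compact_control { int order; };

struct capture_control {
	struct compact_control *cc;
	void *page;
};

/* Stand-in for current->capture_control, which an interrupt
 * handler would read via task_capc(). */
static struct capture_control *capture_control;

/* Compiler-only barrier, as the kernel's barrier() expands to. */
#define barrier() __asm__ __volatile__("" ::: "memory")

static void compact_zone_order_sketch(int order)
{
	struct compact_control cc = { .order = order };
	struct capture_control capc = {
		.cc = &cc,	/* the compiler may sink this store... */
		.page = NULL,
	};

	barrier();	/* ...so force the initializers to complete before
			 * the publish below; without this, an interrupt
			 * arriving between the publish and a late
			 * ".cc = &cc" store would see capc.cc == NULL. */
	capture_control = &capc;

	/* ... compact_zone(&cc, &capc) would run here ... */

	capture_control = NULL;
}

int main(void)
{
	compact_zone_order_sketch(3);
	printf("done\n");
	return 0;
}
```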
On 6/10/20 10:48 PM, Hugh Dickins wrote:
> While stressing compaction, one run oopsed on NULL capc->cc in
> __free_one_page()'s task_capc(zone): compact_zone_order() had been
> interrupted, and a page was being freed in the return from interrupt.
>
> Though you would not expect it from the source, both gccs I was using
> (a 4.8.1 and a 7.5.0) had chosen to compile compact_zone_order() with
> the ".cc = &cc" implemented by mov %rbx,-0xb0(%rbp) immediately before
> callq compact_zone - long after the "current->capture_control = &capc".
> An interrupt in between those finds capc->cc NULL (zeroed by an earlier
> rep stos).

Ugh, nasty. Same here with gcc 10.

> This could presumably be fixed by a barrier() before setting
> current->capture_control in compact_zone_order(); but would also need
> more care on return from compact_zone(), in order not to risk leaking
> a page captured by interrupt just before capture_control is reset.

I was hoping a WRITE_ONCE(current->capture_control) would be enough,
but apparently it's not (I tried).

> Maybe that is the preferable fix, but I felt safer for task_capc() to
> exclude the rather surprising possibility of capture at interrupt time.

> Fixes: 5e1f0f098b46 ("mm, compaction: capture a page under direct compaction")
> Cc: stable@vger.kernel.org # 5.1+
> Signed-off-by: Hugh Dickins <hughd@google.com>

Acked-by: Vlastimil Babka <vbabka@suse.cz>

But perhaps I would also make sure that we don't expose the half initialized
capture_control and run into this problem again later. It's not like this is a
fast path where barriers hurt. Something like this then? (with added comments)

diff --git a/mm/compaction.c b/mm/compaction.c
index fd988b7e5f2b..c89e26817278 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -2316,15 +2316,17 @@ static enum compact_result compact_zone_order(struct zone *zone, int order,
 		.page = NULL,
 	};
 
-	current->capture_control = &capc;
+	barrier();
+
+	WRITE_ONCE(current->capture_control, &capc);
 
 	ret = compact_zone(&cc, &capc);
 
 	VM_BUG_ON(!list_empty(&cc.freepages));
 	VM_BUG_ON(!list_empty(&cc.migratepages));
 
-	*capture = capc.page;
-	current->capture_control = NULL;
+	WRITE_ONCE(current->capture_control, NULL);
+	*capture = READ_ONCE(capc.page);
 
 	return ret;
 }
On Fri, 12 Jun 2020, Vlastimil Babka wrote:
> On 6/10/20 10:48 PM, Hugh Dickins wrote:
> > While stressing compaction, one run oopsed on NULL capc->cc in
> > __free_one_page()'s task_capc(zone): compact_zone_order() had been
> > interrupted, and a page was being freed in the return from interrupt.
> >
> > Though you would not expect it from the source, both gccs I was using
> > (a 4.8.1 and a 7.5.0) had chosen to compile compact_zone_order() with
> > the ".cc = &cc" implemented by mov %rbx,-0xb0(%rbp) immediately before
> > callq compact_zone - long after the "current->capture_control = &capc".
> > An interrupt in between those finds capc->cc NULL (zeroed by an earlier
> > rep stos).
>
> Ugh, nasty. Same here with gcc 10.

Thanks for checking, nice to know that I'm in good company :)

> > This could presumably be fixed by a barrier() before setting
> > current->capture_control in compact_zone_order(); but would also need
> > more care on return from compact_zone(), in order not to risk leaking
> > a page captured by interrupt just before capture_control is reset.
>
> I was hoping a WRITE_ONCE(current->capture_control) would be enough,
> but apparently it's not (I tried).

Right, I don't think volatiles themselves actually constitute barriers;
but I'd better keep quiet, I notice the READ_ONCE/WRITE_ONCE/data_race
industry has been busy recently, and I'm likely out-of-date and mistaken.

> > Maybe that is the preferable fix, but I felt safer for task_capc() to
> > exclude the rather surprising possibility of capture at interrupt time.
> >
> > Fixes: 5e1f0f098b46 ("mm, compaction: capture a page under direct compaction")
> > Cc: stable@vger.kernel.org # 5.1+
> > Signed-off-by: Hugh Dickins <hughd@google.com>
>
> Acked-by: Vlastimil Babka <vbabka@suse.cz>

Thanks, and to Mel for his.

> But perhaps I would also make sure that we don't expose the half initialized
> capture_control and run into this problem again later. It's not like this is a
> fast path where barriers hurt. Something like this then? (with added comments)

Would it be very rude if I leave that to you and to Mel? to add, or
to replace mine if you wish - go ahead. I can easily see that more
sophistication at the compact_zone_order() end may be preferable to
another test and branch inside __free_one_page() (and would task_capc()
be better with an "unlikely" in it?).

But it seems unnecessary to have a fix at both ends, and I'm rather too
wound up in other things at the moment, to want to read up on the current
state of such barriers, and sign off on the Vlastipatch below myself (but
I do notice that READ_ONCE seems to have more in it today than I remember,
which probably accounts for why you did not put the barrier() I expected
to see on the way out).

Hugh

>
> diff --git a/mm/compaction.c b/mm/compaction.c
> index fd988b7e5f2b..c89e26817278 100644
> --- a/mm/compaction.c
> +++ b/mm/compaction.c
> @@ -2316,15 +2316,17 @@ static enum compact_result compact_zone_order(struct zone *zone, int order,
>  		.page = NULL,
>  	};
>  
> -	current->capture_control = &capc;
> +	barrier();
> +
> +	WRITE_ONCE(current->capture_control, &capc);
>  
>  	ret = compact_zone(&cc, &capc);
>  
>  	VM_BUG_ON(!list_empty(&cc.freepages));
>  	VM_BUG_ON(!list_empty(&cc.migratepages));
>  
> -	*capture = capc.page;
> -	current->capture_control = NULL;
> +	WRITE_ONCE(current->capture_control, NULL);
> +	*capture = READ_ONCE(capc.page);
>  
> 	return ret;
> }
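Hugh's caution about volatiles matches what the macros actually do. As a rough approximation (the real WRITE_ONCE()/READ_ONCE() in kernels of this era go through __write_once_size()/__read_once_size(), so treat this as a simplified sketch, not the kernel's definition):

```c
/* Simplified sketch of what WRITE_ONCE()/READ_ONCE() reduce to for an
 * aligned, word-sized value: a volatile store and a volatile load. */
#define WRITE_ONCE_sketch(x, val)	(*(volatile __typeof__(x) *)&(x) = (val))
#define READ_ONCE_sketch(x)		(*(volatile __typeof__(x) *)&(x))
```

The compiler keeps volatile accesses ordered against each other, but may still move plain stores - such as the cc/capc initializers - across them, which is why WRITE_ONCE() alone did not close the window.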
On 6/15/20 11:03 PM, Hugh Dickins wrote:
> On Fri, 12 Jun 2020, Vlastimil Babka wrote:
>> > This could presumably be fixed by a barrier() before setting
>> > current->capture_control in compact_zone_order(); but would also need
>> > more care on return from compact_zone(), in order not to risk leaking
>> > a page captured by interrupt just before capture_control is reset.
>>
>> I was hoping a WRITE_ONCE(current->capture_control) would be enough,
>> but apparently it's not (I tried).
>
> Right, I don't think volatiles themselves actually constitute barriers;
> but I'd better keep quiet, I notice the READ_ONCE/WRITE_ONCE/data_race
> industry has been busy recently, and I'm likely out-of-date and mistaken.

Same here, but from what I've read, volatiles should enforce order against
other volatiles, but not non-volatiles (which is the struct initialization).
So barrier() is indeed necessary, and WRITE_ONCE just to prevent (very
hypothetical, hopefully) store tearing.

>>
>> > Maybe that is the preferable fix, but I felt safer for task_capc() to
>> > exclude the rather surprising possibility of capture at interrupt time.
>>
>> > Fixes: 5e1f0f098b46 ("mm, compaction: capture a page under direct compaction")
>> > Cc: stable@vger.kernel.org # 5.1+
>> > Signed-off-by: Hugh Dickins <hughd@google.com>
>>
>> Acked-by: Vlastimil Babka <vbabka@suse.cz>
>
> Thanks, and to Mel for his.
>
>>
>> But perhaps I would also make sure that we don't expose the half initialized
>> capture_control and run into this problem again later. It's not like this is a
>> fast path where barriers hurt. Something like this then? (with added comments)
>
> Would it be very rude if I leave that to you and to Mel? to add, or

No problem.

> to replace mine if you wish - go ahead. I can easily see that more
> sophistication at the compact_zone_order() end may be preferable to
> another test and branch inside __free_one_page()

Right, I think so, and will also generally sleep better if we don't put
pointers to uninitialized structures to current.

> (and would task_capc()
> be better with an "unlikely" in it?).

I'll try and see if it generates better code. We should be also able to
remove the "capc->cc->direct_compaction" check, as the only place where
we set capc is compact_zone_order() which sets direct_compaction true
unconditionally.

> But it seems unnecessary to have a fix at both ends, and I'm rather too
> wound up in other things at the moment, to want to read up on the current
> state of such barriers, and sign off on the Vlastipatch below myself (but
> I do notice that READ_ONCE seems to have more in it today than I remember,
> which probably accounts for why you did not put the barrier() I expected
> to see on the way out).

Right, minimally it's a volatile cast (I've checked 5.1 too, for stable
reasons) which should be enough. So I'll send the proper patch. Thanks!

Vlastimil

> Hugh
>
>>
>> diff --git a/mm/compaction.c b/mm/compaction.c
>> index fd988b7e5f2b..c89e26817278 100644
>> --- a/mm/compaction.c
>> +++ b/mm/compaction.c
>> @@ -2316,15 +2316,17 @@ static enum compact_result compact_zone_order(struct zone *zone, int order,
>>  		.page = NULL,
>>  	};
>>  
>> -	current->capture_control = &capc;
>> +	barrier();
>> +
>> +	WRITE_ONCE(current->capture_control, &capc);
>>  
>>  	ret = compact_zone(&cc, &capc);
>>  
>>  	VM_BUG_ON(!list_empty(&cc.freepages));
>>  	VM_BUG_ON(!list_empty(&cc.migratepages));
>>  
>> -	*capture = capc.page;
>> -	current->capture_control = NULL;
>> +	WRITE_ONCE(current->capture_control, NULL);
>> +	*capture = READ_ONCE(capc.page);
>>  
>>  	return ret;
>>  }
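The simplification Vlastimil describes would leave task_capc() looking roughly like the sketch below - reconstructed against the 5.8-rc source shown in the patch that follows, not copied from any eventual commit, and omitting the in_task() test from that patch on the assumption that the compact_zone_order() barriers supersede it:

```c
static inline struct capture_control *task_capc(struct zone *zone)
{
	struct capture_control *capc = current->capture_control;

	/* Capture is only armed during direct compaction, so let the
	 * common free path predict this test false. */
	return unlikely(capc) &&
		!(current->flags & PF_KTHREAD) &&
		!capc->page &&
		capc->cc->zone == zone ? capc : NULL;
}
```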
--- 5.8-rc0/mm/page_alloc.c	2020-06-08 14:38:47.298625588 -0700
+++ linux/mm/page_alloc.c	2020-06-10 12:12:34.982950441 -0700
@@ -814,6 +814,7 @@ static inline struct capture_control *ta
 	struct capture_control *capc = current->capture_control;
 
 	return capc &&
+		in_task() &&
 		!(current->flags & PF_KTHREAD) &&
 		!capc->page &&
 		capc->cc->zone == zone &&
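For reference on the check this hunk adds: in_task() comes from include/linux/preempt.h and tests the preempt_count() context bits, so it is false in NMI, hardirq and softirq context - the interrupt-time frees the oops came from - whereas !(current->flags & PF_KTHREAD) only excludes kernel threads. Its definition in kernels of this era is roughly as follows (quoted from memory, so verify against the source):

```c
/* include/linux/preempt.h (approximate) */
#define in_task()	(!(preempt_count() & \
			   (NMI_MASK | HARDIRQ_MASK | SOFTIRQ_OFFSET)))
```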
While stressing compaction, one run oopsed on NULL capc->cc in
__free_one_page()'s task_capc(zone): compact_zone_order() had been
interrupted, and a page was being freed in the return from interrupt.

Though you would not expect it from the source, both gccs I was using
(a 4.8.1 and a 7.5.0) had chosen to compile compact_zone_order() with
the ".cc = &cc" implemented by mov %rbx,-0xb0(%rbp) immediately before
callq compact_zone - long after the "current->capture_control = &capc".
An interrupt in between those finds capc->cc NULL (zeroed by an earlier
rep stos).

This could presumably be fixed by a barrier() before setting
current->capture_control in compact_zone_order(); but would also need
more care on return from compact_zone(), in order not to risk leaking
a page captured by interrupt just before capture_control is reset.

Maybe that is the preferable fix, but I felt safer for task_capc() to
exclude the rather surprising possibility of capture at interrupt time.

Fixes: 5e1f0f098b46 ("mm, compaction: capture a page under direct compaction")
Cc: stable@vger.kernel.org # 5.1+
Signed-off-by: Hugh Dickins <hughd@google.com>
---
 mm/page_alloc.c | 1 +
 1 file changed, 1 insertion(+)