| Message ID | 16eca862-5fa6-2333-8a81-94a2c2692758@i-love.sakura.ne.jp (mailing list archive) |
|---|---|
| State | New, archived |
On Thu 31-05-18 19:10:48, Tetsuo Handa wrote:
> On 2018/05/30 8:07, Andrew Morton wrote:
> > On Tue, 29 May 2018 09:17:41 +0200 Michal Hocko <mhocko@kernel.org> wrote:
> >
> >>> I suggest applying this patch first, and then fixing the
> >>> "mm, oom: cgroup-aware OOM killer" patch.
> >>
> >> Well, I hope the whole pile gets merged in the upcoming merge window
> >> rather than stall even more.
> >
> > I'm more inclined to drop it all. David has identified significant
> > shortcomings and I'm not seeing a way of addressing those shortcomings
> > in a backward-compatible fashion. Therefore there is no way forward
> > at present.
>
> Can we apply my patch as-is first?

No. As already explained before. Sprinkling new sleeps without a strong
reason is not acceptable. The issue you are seeing is pretty artificial
and as such doesn't really warrant an immediate fix. We should rather go
with a well-thought-through fix. In other words, we should simply drop
the sleep inside the oom_lock for starters, unless doing so causes some
really unexpected behavior change.
On 2018/05/31 19:44, Michal Hocko wrote:
> On Thu 31-05-18 19:10:48, Tetsuo Handa wrote:
>> On 2018/05/30 8:07, Andrew Morton wrote:
>>> On Tue, 29 May 2018 09:17:41 +0200 Michal Hocko <mhocko@kernel.org> wrote:
>>>
>>>>> I suggest applying this patch first, and then fixing the
>>>>> "mm, oom: cgroup-aware OOM killer" patch.
>>>>
>>>> Well, I hope the whole pile gets merged in the upcoming merge window
>>>> rather than stall even more.
>>>
>>> I'm more inclined to drop it all. David has identified significant
>>> shortcomings and I'm not seeing a way of addressing those shortcomings
>>> in a backward-compatible fashion. Therefore there is no way forward
>>> at present.
>>
>> Can we apply my patch as-is first?
>
> No. As already explained before. Sprinkling new sleeps without a strong
> reason is not acceptable. The issue you are seeing is pretty artificial
> and as such doesn't really warrant an immediate fix. We should rather go
> with a well-thought-through fix. In other words, we should simply drop
> the sleep inside the oom_lock for starters, unless doing so causes some
> really unexpected behavior change.

The OOM killer did not require schedule_timeout_killable(1) to return
as long as the OOM victim could call __mmput(). But now the OOM killer
requires schedule_timeout_killable(1) to return in order to allow the
OOM victim to call __oom_reap_task_mm(). Thus, this is a regression.

Being artificial cannot be a reason to postpone my patch. If we did not
care about artificial/malicious workloads, we would not need to care
about the Spectre/Meltdown bugs.

I'm not sprinkling new sleeps. I'm just merging the existing sleeps
(i.e. the mutex_trylock() case and the !mutex_trylock() case) and
updating the outdated comments.
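[Editorial note: the contention pattern being argued about above can be
reproduced in plain userspace. Below is a minimal, hypothetical pthread
sketch, not kernel code: thread and variable names, timings, and retry
counts are illustrative assumptions. One thread sleeps while holding a
mutex (like the sleep under oom_lock), while another repeatedly fails a
trylock (like __alloc_pages_may_oom() failing mutex_trylock(&oom_lock)).]

```c
#include <pthread.h>
#include <stdio.h>
#include <unistd.h>

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

/* Stand-in for the task that sleeps while still holding oom_lock. */
static void *holder(void *arg)
{
	pthread_mutex_lock(&lock);
	sleep(2);			/* "sleep inside the lock" */
	pthread_mutex_unlock(&lock);
	return NULL;
}

int main(void)
{
	pthread_t t;
	int failed = 0;

	pthread_create(&t, NULL, holder, NULL);
	usleep(100000);			/* let the holder grab the lock */

	/* Contender: trylock in a loop, like __alloc_pages_may_oom(). */
	for (int i = 0; i < 30; i++) {
		if (pthread_mutex_trylock(&lock) == 0) {
			pthread_mutex_unlock(&lock);
			printf("trylock succeeded after %d failures\n", failed);
			break;
		}
		failed++;
		usleep(100000);
	}
	pthread_join(t, NULL);
	return 0;
}
```

In the kernel case the contenders busy-retry the allocation instead of
counting failures, which is why a holder sleeping under oom_lock can
stall OOM progress for every other allocating thread.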
On Fri 01-06-18 00:23:57, Tetsuo Handa wrote:
> On 2018/05/31 19:44, Michal Hocko wrote:
> > On Thu 31-05-18 19:10:48, Tetsuo Handa wrote:
> >> On 2018/05/30 8:07, Andrew Morton wrote:
> >>> On Tue, 29 May 2018 09:17:41 +0200 Michal Hocko <mhocko@kernel.org> wrote:
> >>>
> >>>>> I suggest applying this patch first, and then fixing the
> >>>>> "mm, oom: cgroup-aware OOM killer" patch.
> >>>>
> >>>> Well, I hope the whole pile gets merged in the upcoming merge window
> >>>> rather than stall even more.
> >>>
> >>> I'm more inclined to drop it all. David has identified significant
> >>> shortcomings and I'm not seeing a way of addressing those shortcomings
> >>> in a backward-compatible fashion. Therefore there is no way forward
> >>> at present.
> >>
> >> Can we apply my patch as-is first?
> >
> > No. As already explained before. Sprinkling new sleeps without a strong
> > reason is not acceptable. The issue you are seeing is pretty artificial
> > and as such doesn't really warrant an immediate fix. We should rather go
> > with a well-thought-through fix. In other words, we should simply drop
> > the sleep inside the oom_lock for starters, unless doing so causes some
> > really unexpected behavior change.
>
> The OOM killer did not require schedule_timeout_killable(1) to return
> as long as the OOM victim could call __mmput(). But now the OOM killer
> requires schedule_timeout_killable(1) to return in order to allow the
> OOM victim to call __oom_reap_task_mm(). Thus, this is a regression.
>
> Being artificial cannot be a reason to postpone my patch. If we did not
> care about artificial/malicious workloads, we would not need to care
> about the Spectre/Meltdown bugs.
>
> I'm not sprinkling new sleeps. I'm just merging the existing sleeps
> (i.e. the mutex_trylock() case and the !mutex_trylock() case) and
> updating the outdated comments.

Sigh. So what exactly is wrong with going simple and doing
http://lkml.kernel.org/r/20180528124313.GC27180@dhcp22.suse.cz ?
```diff
diff --git a/mm/oom_kill.c b/mm/oom_kill.c
index 8ba6cb8..23ce67f 100644
--- a/mm/oom_kill.c
+++ b/mm/oom_kill.c
@@ -479,6 +479,21 @@ bool process_shares_mm(struct task_struct *p, struct mm_struct *mm)
 static struct task_struct *oom_reaper_list;
 static DEFINE_SPINLOCK(oom_reaper_lock);
 
+/*
+ * We have to make sure not to cause premature new oom victim selection.
+ *
+ * __alloc_pages_may_oom()     oom_reap_task_mm()/exit_mmap()
+ *   mutex_trylock(&oom_lock)
+ *   get_page_from_freelist(ALLOC_WMARK_HIGH) # fails
+ *                               unmap_page_range() # frees some memory
+ *                               set_bit(MMF_OOM_SKIP)
+ *   out_of_memory()
+ *     select_bad_process()
+ *       test_bit(MMF_OOM_SKIP) # selects new oom victim
+ *   mutex_unlock(&oom_lock)
+ *
+ * Therefore, the callers hold oom_lock when calling this function.
+ */
 void __oom_reap_task_mm(struct mm_struct *mm)
 {
 	struct vm_area_struct *vma;
@@ -523,20 +538,6 @@ static bool oom_reap_task_mm(struct task_struct *tsk, struct mm_struct *mm)
 {
 	bool ret = true;
 
-	/*
-	 * We have to make sure to not race with the victim exit path
-	 * and cause premature new oom victim selection:
-	 * oom_reap_task_mm		exit_mm
-	 *   mmget_not_zero
-	 *				  mmput
-	 *				    atomic_dec_and_test
-	 *				  exit_oom_victim
-	 *				[...]
-	 *				out_of_memory
-	 *				  select_bad_process
-	 *				    # no TIF_MEMDIE task selects new victim
-	 *   unmap_page_range # frees some memory
-	 */
 	mutex_lock(&oom_lock);
 
 	if (!down_read_trylock(&mm->mmap_sem)) {
@@ -1077,15 +1078,9 @@ bool out_of_memory(struct oom_control *oc)
 		dump_header(oc, NULL);
 		panic("Out of memory and no killable processes...\n");
 	}
-	if (oc->chosen && oc->chosen != (void *)-1UL) {
+	if (oc->chosen && oc->chosen != (void *)-1UL)
 		oom_kill_process(oc, !is_memcg_oom(oc) ? "Out of memory" :
 				 "Memory cgroup out of memory");
-		/*
-		 * Give the killed process a good chance to exit before trying
-		 * to allocate memory again.
-		 */
-		schedule_timeout_killable(1);
-	}
 	return !!oc->chosen;
 }
 
@@ -1111,4 +1106,5 @@ void pagefault_out_of_memory(void)
 		return;
 	out_of_memory(&oc);
 	mutex_unlock(&oom_lock);
+	schedule_timeout_killable(1);
 }
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 905db9d..458ed32 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -3478,7 +3478,6 @@ void warn_alloc(gfp_t gfp_mask, nodemask_t *nodemask, const char *fmt, ...)
 	 */
 	if (!mutex_trylock(&oom_lock)) {
 		*did_some_progress = 1;
-		schedule_timeout_uninterruptible(1);
 		return NULL;
 	}
 
@@ -4241,6 +4240,12 @@ bool gfp_pfmemalloc_allowed(gfp_t gfp_mask)
 	/* Retry as long as the OOM killer is making progress */
 	if (did_some_progress) {
 		no_progress_loops = 0;
+		/*
+		 * This schedule_timeout_*() serves as a guaranteed sleep for
+		 * PF_WQ_WORKER threads when __zone_watermark_ok() == false.
+		 */
+		if (!tsk_is_oom_victim(current))
+			schedule_timeout_uninterruptible(1);
 		goto retry;
 	}
```
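[Editorial note: the page_alloc.c hunk above moves the post-OOM-kill
sleep out from under oom_lock into the allocator retry path, and skips
it for the OOM victim itself so the victim can exit sooner. Below is a
minimal userspace sketch of that retry-path shape; it is hypothetical,
not kernel code: is_oom_victim and try_allocate() are illustrative
stand-ins for tsk_is_oom_victim(current) and the real allocation retry.]

```c
#include <stdbool.h>
#include <stdio.h>
#include <unistd.h>

static bool is_oom_victim;	/* stand-in for tsk_is_oom_victim(current) */

/* Pretend memory frees up on the fourth attempt. */
static bool try_allocate(int attempt)
{
	return attempt >= 3;
}

int main(void)
{
	for (int attempt = 0; ; attempt++) {
		if (try_allocate(attempt)) {
			printf("allocation succeeded on attempt %d\n", attempt);
			break;
		}
		/* Guaranteed backoff for non-victims, mirroring the patch;
		 * the victim retries immediately so it can make progress. */
		if (!is_oom_victim)
			usleep(1000);
	}
	return 0;
}
```

The design point is that the backoff still happens on every failed
retry, but no longer while oom_lock is held, so trylock-based
contenders and the OOM reaper are not stalled by a sleeping holder.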