From patchwork Tue Mar 3 00:26:38 2020
X-Patchwork-Submitter: Minchan Kim
X-Patchwork-Id: 11416913
From: Minchan Kim
To: Andrew Morton
Cc: linux-mm, LKML, Jan Kara, Matthew Wilcox, Josef Bacik,
 Johannes Weiner, Minchan Kim
Subject: [PATCH v2] mm: fix long time stall from mm_populate
Date: Mon, 2 Mar 2020 16:26:38 -0800
Message-Id: <20200303002638.206421-1-minchan@kernel.org>

Basically, the fault handler releases mmap_sem before requesting readahead
and is then supposed to retry the page cache lookup with FAULT_FLAG_TRIED
so that it avoids the livelock of infinite retry. However, what happens if
the fault handler finds a page in the page cache which has the readahead
marker but is waiting under writeback? Add one more condition: it happens
under mm_populate, which repeats faulting unless it encounters an error.
Let's assemble the conditions below.

CPU 1                                           CPU 2

- first loop
mm_populate
  for ()
    ..
    ret = populate_vma_page_range
      __get_user_pages
        faultin_page
          handle_mm_fault
            filemap_fault
              do_async_mmap_readahead
              if (PageReadahead(pageA))
                maybe_unlock_mmap_for_io
                up_read(mmap_sem)
                                                shrink_page_list
                                                  pageout
                                                    SetPageReclaim(=SetPageReadahead)(pageA)
                                                    writepage
                                                      SetPageWriteback(pageA)
                page_cache_async_readahead()
                  ClearPageReadahead(pageA)
              do_async_mmap_readahead
              lock_page_maybe_drop_mmap
                goto out_retry

The pageA is reclaimed and a new pageB is populated at the same file
offset, which finally becomes PG_readahead.

- second loop
    __get_user_pages
      faultin_page
        handle_mm_fault
          filemap_fault
            do_async_mmap_readahead
            if (PageReadahead(pageB))
              maybe_unlock_mmap_for_io
              up_read(mmap_sem)
                                                shrink_page_list
                                                  pageout
                                                    SetPageReclaim(=SetPageReadahead)(pageB)
                                                    writepage
                                                      SetPageWriteback(pageB)
              page_cache_async_readahead()
                ClearPageReadahead(pageB)
            do_async_mmap_readahead
            lock_page_maybe_drop_mmap
              goto out_retry

This can repeat forever, so it is a livelock. Even without involving
reclaim, it can happen if ra_pages becomes zero via fadvise from another
thread sharing the same fd (one doing random access while the other is
sequential), because page_cache_async_readahead has the following
condition check, and PageWriteback/ra_pages are never synchronized with
fadvise and shrink_readahead_size_eio from other threads.

page_cache_async_readahead(struct address_space *mapping,
                           unsigned long req_size)
{
        /* no read-ahead */
        if (!ra->ra_pages)
                return;

Thus, we need to limit fault retry from mm_populate like the page fault
handler does.

Fixes: 6b4c9f446981 ("filemap: drop the mmap_sem for all blocking operations")
Reviewed-by: Jan Kara
Signed-off-by: Minchan Kim
---
 mm/gup.c | 11 ++++++++---
 1 file changed, 8 insertions(+), 3 deletions(-)

diff --git a/mm/gup.c b/mm/gup.c
index 1b521e0ac1de..6f6548c63ad5 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -1133,7 +1133,7 @@ static __always_inline long __get_user_pages_locked(struct task_struct *tsk,
  *
  * This takes care of mlocking the pages too if VM_LOCKED is set.
  *
- * return 0 on success, negative error code on error.
+ * return number of pages pinned on success, negative error code on error.
  *
  * vma->vm_mm->mmap_sem must be held.
  *
@@ -1196,6 +1196,7 @@ int __mm_populate(unsigned long start, unsigned long len, int ignore_errors)
 	struct vm_area_struct *vma = NULL;
 	int locked = 0;
 	long ret = 0;
+	bool tried = false;
 
 	end = start + len;
 
@@ -1226,14 +1227,18 @@ int __mm_populate(unsigned long start, unsigned long len, int ignore_errors)
 		 * double checks the vma flags, so that it won't mlock pages
 		 * if the vma was already munlocked.
 		 */
-		ret = populate_vma_page_range(vma, nstart, nend, &locked);
+		ret = populate_vma_page_range(vma, nstart, nend,
+					tried ? NULL : &locked);
 		if (ret < 0) {
 			if (ignore_errors) {
 				ret = 0;
 				continue;	/* continue at next VMA */
 			}
			break;
-		}
+		} else if (ret == 0)
+			tried = true;
+		else
+			tried = false;
 		nend = nstart + ret * PAGE_SIZE;
 		ret = 0;
 	}