From patchwork Fri Aug 29 03:26:51 2014
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Junxiao Bi
X-Patchwork-Id: 4808601
Message-ID: <53FFF2FB.20706@oracle.com>
Date: Fri, 29 Aug 2014 11:26:51 +0800
From: Junxiao Bi
To: xuejiufei@huawei.com, Andrew Morton, ocfs2-devel@oss.oracle.com
References: <53F41CAE.2040204@huawei.com> <53F6FFB7.1090305@huawei.com>
 <53FA9673.20205@oracle.com> <53FEE543.10407@huawei.com>
In-Reply-To: <53FEE543.10407@huawei.com>
Subject: Re: [Ocfs2-devel] A deadlock when system do not has sufficient memory

On 08/28/2014 04:16 PM, Xue jiufei wrote:
> Hi Junxiao,
> On 2014/8/25 9:50, Junxiao Bi wrote:
>> Hi Jiufei,
>>
>> Maybe you can consider using PF_FSTRANS flag, set this flag before
>> allocating memory with GFP_KERNEL flag and unset after the allocation.
>> Checking this flag in ocfs2 when trying to free some pages during memory
>> direct reclaim. See an example from upstream commit
>> 5cf02d09b50b1ee1c2d536c9cf64af5a7d433f56 (nfs: skip commit in
>> releasepage if we're freeing memory for fs-related reasons).
>>
>> Thanks,
>> Junxiao.
>>
> Thank you very much for your suggestion. But in our situation,
> o2net_wq is evicting inode during memory direct reclaim, which cannot
> return error or do nothing because vfs would destroy_inode after evict,
> but we haven't dropped the inode lock yet.

How about checking the flag in vfs like this? You can then set the
PF_FSTRANS flag in the o2net_wq context, where the GFP_NOFS flag can't be set.

commit 8d27fdec5ce234d2f02e4582d340d231396b92af
Author: Junxiao Bi
Date:   Fri Aug 29 11:05:25 2014 +0800

    super: stop shrinker for processes with PF_FSTRANS flag

    For some cluster fs, like ocfs2, it may be impossible to set GFP_NOFS
    for some memory allocations, as the allocation is in common network
    code, like sock_alloc(). In this case, the shrinker will call back
    into the fs and cause a deadlock when available memory is not enough.

    Signed-off-by: Junxiao Bi

Thanks,
Junxiao.

>
> Thanks
> Xuejiufei
>
>> On 08/22/2014 04:30 PM, Xue jiufei wrote:
>>> On 2014/8/20 11:57, Xue jiufei wrote:
>>>> Hi all,
>>>> We found there may exist a deadlock when system has not sufficient
>>>> memory. Here's the situation:
>>>> N1                                          N2
>>>>                                             send message to N1
>>>> o2net_wq(kworker)
>>>> receiving message and call corresponding
>>>> handler to handle this message. It may
>>>> need to alloc some memory (use GFP_NOFS or GFP_KERNEL),
>>>> but there's no sufficient memory, lower than
>>>> min watermark. So it wakes up kswapd to reclaim memory
>>>> and itself may also call
>>>> __alloc_pages_direct_reclaim(), trying to
>>>> free some pages.
>>>>
>>>> It tries to free ocfs2 inode
>>>> cache and calls ocfs2_drop_lock()->dlmunlock()
>>>> to drop inode lock, sending unlock message to master,
>>>> say N2. When reply comes, queue sc_rx_work and
>>>> wait o2net_wq to handle this work. However,
>>>> o2net_wq is still handling last message, so can not
>>>> process the reply message. It will wait
>>>> o2net_nsw_completed() in o2net_send_message_vec()
>>>> forever.
>>>> Kswapd thread encounters the same situation.
>>>>
>>>>
>>>> So is there any advice to solve this deadlock?
>>>> And what is the probability that kmalloc returns ENOMEM when using the GFP_ATOMIC flag?
>>>>
>>>> Thanks.
>>>>
>>> To avoid this deadlock, we want to alloc memory with flag GFP_ATOMIC
>>> in all handlers and return ENOMEM to peer when failed. The peer will
>>> try to resend the message again, so o2net_wq can handle other messages.
>>> However, it can not solve all problems. For example, if o2net_wq is
>>> processing sc_connect_work, which would call sock_alloc_inode() to alloc
>>> socket_alloc with GFP_KERNEL flag when memory is insufficient and enter
>>> the reclaim process, it also triggers the deadlock. We can not change this
>>> alloc flag.
>>> We have no idea about it. Are there any better ideas?
>>> Thanks very much.
>>> xuejiufei
>>>> _______________________________________________
>>>> Ocfs2-devel mailing list
>>>> Ocfs2-devel@oss.oracle.com
>>>> https://oss.oracle.com/mailman/listinfo/ocfs2-devel
>>>>
>>>
>>
>> .
>>
>
>

diff --git a/fs/super.c b/fs/super.c
index b9a214d..c4a8dc1 100644
--- a/fs/super.c
+++ b/fs/super.c
@@ -71,6 +71,9 @@ static unsigned long super_cache_scan(struct shrinker *shrink,
 	if (!(sc->gfp_mask & __GFP_FS))
 		return SHRINK_STOP;
 
+	if (current->flags & PF_FSTRANS)
+		return SHRINK_STOP;
+
 	if (!grab_super_passive(sb))
 		return SHRINK_STOP;
 
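
To illustrate how the check in the diff above is meant to be used from the o2net_wq side, here is a minimal user-space sketch. It is not kernel code and does not build against kernel headers: every sim_* name and SIM_* constant is a hypothetical stand-in for the real symbol named in its comment, and the values are illustrative only. It only models the order of the checks in super_cache_scan() and the save/set/restore pattern around an allocation whose GFP mask cannot be changed.

```c
#include <assert.h>

/* Hypothetical stand-ins; values are illustrative, not the kernel's. */
#define SIM_GFP_FS      0x1u    /* stands in for __GFP_FS */
#define SIM_PF_FSTRANS  0x2u    /* stands in for PF_FSTRANS */
#define SIM_SHRINK_STOP (~0UL)  /* stands in for SHRINK_STOP */

/* Stand-in for the part of task_struct that "current->flags" reads. */
struct sim_task {
	unsigned int flags;
};

/* Models the checks at the top of the patched super_cache_scan(). */
static unsigned long sim_super_cache_scan(const struct sim_task *cur,
					  unsigned int gfp_mask,
					  unsigned long nr_to_scan)
{
	if (!(gfp_mask & SIM_GFP_FS))
		return SIM_SHRINK_STOP;

	/* New check from the diff: the caller asked not to re-enter the fs. */
	if (cur->flags & SIM_PF_FSTRANS)
		return SIM_SHRINK_STOP;

	return nr_to_scan;	/* pretend every scanned object was freed */
}

/* Set the flag and return its previous state so nested users can restore. */
static unsigned int sim_fstrans_enter(struct sim_task *t)
{
	unsigned int old = t->flags & SIM_PF_FSTRANS;

	t->flags |= SIM_PF_FSTRANS;
	return old;
}

/* Restore the flag to the state saved by sim_fstrans_enter(). */
static void sim_fstrans_exit(struct sim_task *t, unsigned int old)
{
	assert((old & ~SIM_PF_FSTRANS) == 0);
	t->flags = (t->flags & ~SIM_PF_FSTRANS) | old;
}

/*
 * Models an o2net_wq handler that hits an allocation it cannot switch to
 * GFP_NOFS (for example one made inside common network code), so direct
 * reclaim runs with the fs bit set in the mask.
 */
static unsigned long sim_o2net_handler(struct sim_task *t)
{
	unsigned int old = sim_fstrans_enter(t);
	unsigned long freed = sim_super_cache_scan(t, SIM_GFP_FS, 128);

	sim_fstrans_exit(t, old);
	return freed;
}
```

With the flag clear, a scan with the fs bit set proceeds; inside sim_o2net_handler() the same scan stops early, which is the behaviour that keeps reclaim from calling back into ocfs2 while o2net_wq is busy. Saving and restoring the old flag value, rather than clearing it unconditionally, is a design assumption of this sketch so that nested sections do not drop a flag set by an outer caller.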