From patchwork Fri Aug 29 03:26:51 2014
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Junxiao Bi <junxiao.bi@oracle.com>
X-Patchwork-Id: 4808601
Return-Path: <ocfs2-devel-bounces@oss.oracle.com>
X-Original-To: patchwork-ocfs2-devel@patchwork.kernel.org
Delivered-To: patchwork-parsemail@patchwork2.web.kernel.org
Received: from mail.kernel.org (mail.kernel.org [198.145.19.201])
	by patchwork2.web.kernel.org (Postfix) with ESMTP id 980D0C0338
	for <patchwork-ocfs2-devel@patchwork.kernel.org>;
	Fri, 29 Aug 2014 03:28:33 +0000 (UTC)
Received: from mail.kernel.org (localhost [127.0.0.1])
	by mail.kernel.org (Postfix) with ESMTP id 9B3A12010E
	for <patchwork-ocfs2-devel@patchwork.kernel.org>;
	Fri, 29 Aug 2014 03:28:32 +0000 (UTC)
Received: from aserp1040.oracle.com (aserp1040.oracle.com [141.146.126.69])
	(using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits))
	(No client certificate requested)
	by mail.kernel.org (Postfix) with ESMTPS id 7DC8A2010B
	for <patchwork-ocfs2-devel@patchwork.kernel.org>;
	Fri, 29 Aug 2014 03:28:31 +0000 (UTC)
Received: from acsinet21.oracle.com (acsinet21.oracle.com [141.146.126.237])
	by aserp1040.oracle.com (Sentrion-MTA-4.3.2/Sentrion-MTA-4.3.2)
	with ESMTP id s7T3RpDZ012249
	(version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=OK);
	Fri, 29 Aug 2014 03:27:51 GMT
Received: from oss.oracle.com (oss-external.oracle.com [137.254.96.51])
	by acsinet21.oracle.com (8.14.4+Sun/8.14.4) with ESMTP id
	s7T3RhAg018141
	(version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO);
	Fri, 29 Aug 2014 03:27:44 GMT
Received: from localhost ([127.0.0.1] helo=oss.oracle.com)
	by oss.oracle.com with esmtp (Exim 4.63)
	(envelope-from <ocfs2-devel-bounces@oss.oracle.com>)
	id 1XNCql-0006yq-JC; Thu, 28 Aug 2014 20:27:43 -0700
Received: from ucsinet21.oracle.com ([156.151.31.93])
	by oss.oracle.com with esmtp (Exim 4.63)
	(envelope-from <junxiao.bi@oracle.com>) id 1XNCqA-0006y8-Ui
	for ocfs2-devel@oss.oracle.com; Thu, 28 Aug 2014 20:27:07 -0700
Received: from userz7022.oracle.com (userz7022.oracle.com [156.151.31.86])
	by ucsinet21.oracle.com (8.14.4+Sun/8.14.4) with ESMTP id
	s7T3R65j008369
	(version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO);
	Fri, 29 Aug 2014 03:27:06 GMT
Received: from abhmp0020.oracle.com (abhmp0020.oracle.com [141.146.116.26])
	by userz7022.oracle.com (8.14.5+Sun/8.14.4) with ESMTP id
	s7T3R5sh004830; Fri, 29 Aug 2014 03:27:05 GMT
Received: from [10.182.39.153] (/10.182.39.153)
	by default (Oracle Beehive Gateway v4.0)
	with ESMTP ; Thu, 28 Aug 2014 20:27:04 -0700
Message-ID: <53FFF2FB.20706@oracle.com>
Date: Fri, 29 Aug 2014 11:26:51 +0800
From: Junxiao Bi <junxiao.bi@oracle.com>
User-Agent: Mozilla/5.0 (X11; Linux i686;
	rv:24.0) Gecko/20100101 Thunderbird/24.6.0
MIME-Version: 1.0
To: xuejiufei@huawei.com, Andrew Morton <akpm@linux-foundation.org>,
	ocfs2-devel@oss.oracle.com
References: <53F41CAE.2040204@huawei.com> <53F6FFB7.1090305@huawei.com>
	<53FA9673.20205@oracle.com> <53FEE543.10407@huawei.com>
In-Reply-To: <53FEE543.10407@huawei.com>
Subject: Re: [Ocfs2-devel] A deadlock when system do not has sufficient
	memory
X-BeenThere: ocfs2-devel@oss.oracle.com
X-Mailman-Version: 2.1.9
Precedence: list
List-Id: <ocfs2-devel.oss.oracle.com>
List-Unsubscribe: <https://oss.oracle.com/mailman/listinfo/ocfs2-devel>,
	<mailto:ocfs2-devel-request@oss.oracle.com?subject=unsubscribe>
List-Archive: <http://oss.oracle.com/pipermail/ocfs2-devel>
List-Post: <mailto:ocfs2-devel@oss.oracle.com>
List-Help: <mailto:ocfs2-devel-request@oss.oracle.com?subject=help>
List-Subscribe: <https://oss.oracle.com/mailman/listinfo/ocfs2-devel>,
	<mailto:ocfs2-devel-request@oss.oracle.com?subject=subscribe>
Sender: ocfs2-devel-bounces@oss.oracle.com
Errors-To: ocfs2-devel-bounces@oss.oracle.com
X-Source-IP: acsinet21.oracle.com [141.146.126.237]
X-Spam-Status: No, score=-4.2 required=5.0 tests=BAYES_00, RCVD_IN_DNSWL_MED,
	RP_MATCHES_RCVD, UNPARSEABLE_RELAY autolearn=unavailable version=3.3.1
X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on mail.kernel.org
X-Virus-Scanned: ClamAV using ClamSMTP

On 08/28/2014 04:16 PM, Xue jiufei wrote:
> Hi Junxiao,
> On 2014/8/25 9:50, Junxiao Bi wrote:
>> Hi Jiufei,
>>
>> Maybe you can consider using the PF_FSTRANS flag: set it before
>> allocating memory with GFP_KERNEL and clear it after the allocation.
>> Then check this flag in ocfs2 when trying to free pages during
>> direct memory reclaim. See an example in upstream commit
>> 5cf02d09b50b1ee1c2d536c9cf64af5a7d433f56 (nfs: skip commit in
>> releasepage if we're freeing memory for fs-related reasons).
>>
>> Thanks,
>> Junxiao.
>>
> Thank you very much for your suggestion. But in our situation,
> o2net_wq is evicting an inode during direct memory reclaim. The evict
> cannot return an error or do nothing, because the VFS calls
> destroy_inode() after evict while we haven't dropped the inode lock yet.
How about checking the flag in the VFS, as in the patch below? You can
then set the PF_FSTRANS flag in the o2net_wq context wherever GFP_NOFS
can't be passed to the allocation.
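
For illustration only, here is a rough userspace model of the caller side
of this idea. Nothing in it is kernel code: struct task, current_task,
model_super_cache_scan() and alloc_with_fstrans() are hypothetical
stand-ins, malloc() stands in for a GFP_KERNEL allocation that cannot be
changed, and the PF_FSTRANS value is an assumption. It only shows the
save/set/restore pattern around the allocation and how the shrinker check
would see the flag.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Userspace stand-ins for kernel definitions (assumptions for illustration). */
#define PF_FSTRANS  0x00020000u
#define SHRINK_STOP (~0UL)

struct task { unsigned int flags; };
static struct task current_task;
#define current (&current_task)

/* Models the proposed check: skip the superblock shrinker for PF_FSTRANS tasks. */
static unsigned long model_super_cache_scan(void)
{
	if (current->flags & PF_FSTRANS)
		return SHRINK_STOP;
	return 1;	/* pretend one cached object was freed */
}

/*
 * Hypothetical helper: mark the task with PF_FSTRANS around an allocation
 * whose GFP flags we cannot control, then restore the previous flag state.
 * The call to model_super_cache_scan() stands in for direct reclaim being
 * entered from inside the allocation.
 */
static void *alloc_with_fstrans(size_t size, unsigned long *scan_result)
{
	unsigned int saved = current->flags & PF_FSTRANS;
	void *p;

	assert(scan_result != NULL);
	current->flags |= PF_FSTRANS;
	*scan_result = model_super_cache_scan();
	p = malloc(size);
	/* Clear the flag only if it was not already set on entry. */
	current->flags = (current->flags & ~PF_FSTRANS) | saved;
	return p;
}
```

Saving the old flag state matters if the helper can be reached from a
context that already has PF_FSTRANS set; a plain clear on exit would drop
the outer caller's flag.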


commit 8d27fdec5ce234d2f02e4582d340d231396b92af
Author: Junxiao Bi <junxiao.bi@oracle.com>
Date:   Fri Aug 29 11:05:25 2014 +0800

    super: stop shrinker for processes with PF_FSTRANS flag

    For some cluster filesystems, such as ocfs2, it may be impossible
    to pass GFP_NOFS to some memory allocations because they happen in
    common network code, such as sock_alloc(). In that case, when
    memory is low, the shrinker can call back into the filesystem and
    cause a deadlock. Return SHRINK_STOP from super_cache_scan() when
    the current task has PF_FSTRANS set.

    Signed-off-by: Junxiao Bi <junxiao.bi@oracle.com>


Thanks,
Junxiao.

> 
> Thanks
> Xuejiufei
> 
>> On 08/22/2014 04:30 PM, Xue jiufei wrote:
>>> On 2014/8/20 11:57, Xue jiufei wrote:
>>>> Hi all,
>>>> We found that a deadlock can occur when the system does not have
>>>> sufficient memory. Here's the situation:
>>>>             N1                                      N2
>>>>                                              send message to N1
>>>>       o2net_wq(kworker)
>>>> receives the message and calls the
>>>> corresponding handler. The handler may
>>>> need to allocate memory (using GFP_NOFS or GFP_KERNEL),
>>>> but free memory is below the min
>>>> watermark. So it wakes up kswapd to reclaim memory
>>>> and may itself also call
>>>> __alloc_pages_direct_reclaim(), trying to
>>>> free some pages.
>>>>
>>>> It tries to free the ocfs2 inode
>>>> cache and calls ocfs2_drop_lock()->dlmunlock()
>>>> to drop the inode lock, sending an unlock message to the
>>>> master, say N2. When the reply comes, sc_rx_work is queued
>>>> and waits for o2net_wq to handle it. However,
>>>> o2net_wq is still handling the previous message, so it
>>>> cannot process the reply. It will wait for
>>>> o2net_nsw_completed() in o2net_send_message_vec()
>>>> forever.
>>>> The kswapd thread encounters the same situation.
>>>>
>>>>
>>>> So is there any advice on how to solve this deadlock?
>>>> And how likely is kmalloc to fail with ENOMEM when the GFP_ATOMIC flag is used?
>>>>
>>>> Thanks.
>>>>
>>> To avoid this deadlock, we want to allocate memory with GFP_ATOMIC
>>> in all handlers and return ENOMEM to the peer on failure. The peer will
>>> then resend the message, and o2net_wq can handle other messages meanwhile.
>>> However, this cannot solve every case. For example, if o2net_wq is
>>> processing sc_connect_work, which calls sock_alloc_inode() to allocate
>>> a socket_alloc with GFP_KERNEL, and memory is insufficient so that it
>>> enters reclaim, the deadlock is triggered as well. We cannot change this
>>> allocation flag.
>>> We have no idea how to handle this. Are there any better ideas?
>>> Thanks very much.
>>> xuejiufei
>>>> _______________________________________________
>>>> Ocfs2-devel mailing list
>>>> Ocfs2-devel@oss.oracle.com
>>>> https://oss.oracle.com/mailman/listinfo/ocfs2-devel
>>>>
>>>
>>>
>>>
>>>
>>
>> .
>>
> 
>

diff --git a/fs/super.c b/fs/super.c
index b9a214d..c4a8dc1 100644
--- a/fs/super.c
+++ b/fs/super.c
@@ -71,6 +71,9 @@ static unsigned long super_cache_scan(struct shrinker *shrink,
        if (!(sc->gfp_mask & __GFP_FS))
                return SHRINK_STOP;

+       if (current->flags & PF_FSTRANS)
+               return SHRINK_STOP;
+
        if (!grab_super_passive(sb))
                return SHRINK_STOP;