From patchwork Sat Jan 6 02:46:06 2018
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Gang He
X-Patchwork-Id: 10147563
Message-Id: <5A50386E020000F9001C7156@prv-mh.provo.novell.com>
Date: Fri, 05 Jan 2018 19:46:06 -0700
From: "Gang He"
References: <1514447305-30814-1-git-send-email-ghe@suse.com>
 <5A4F8C30020000F9000A16FE@prv-mh.provo.novell.com>
 <20180105125034.762581d16dc8572990a9e5b6@linux-foundation.org>
In-Reply-To: <20180105125034.762581d16dc8572990a9e5b6@linux-foundation.org>
Cc: ocfs2-devel@oss.oracle.com, linux-kernel@vger.kernel.org
Subject: Re: [Ocfs2-devel] [PATCH v2] ocfs2: try a blocking lock before
 return AOP_TRUNCATED_PAGE

Hi Andrew,

>>> Andrew Morton 01/06/18 4:50 AM >>>
On Thu, 04 Jan 2018 23:31:12 -0700 "Gang He" wrote:

> Happy new year.
> Could you help pick up this patch, which fixes an old patch, 1cce4df04f37?
> Without this patch, some multi-node test cases trigger softlockup problems
> and make the HA communication daemon (e.g. corosync) time out, so the node
> has to be fenced.

I have the below queued for 4.16-rc1.  Is the problem serious enough to
push this into 4.15?

If possible, please do that, since it can cause a system crash or fencing
in some test cases.

Should the fix be backported into -stable kernels?

Yes, I feel it can be considered a regression.

Thanks a lot.
Gang

From: Gang He
Subject: ocfs2: try a blocking lock before return AOP_TRUNCATED_PAGE

If we can't get the inode lock immediately in
ocfs2_inode_lock_with_page() when reading a page, we should not return
directly, since this leads to a softlockup problem when the kernel is
built without CONFIG_PREEMPT.  Instead, take a blocking lock and unlock
it immediately before returning.  This avoids wasting CPU on repeated
retries, improves fairness in acquiring the lock among multiple nodes,
and increases efficiency when the same file is modified frequently from
multiple nodes.
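
As a rough user-space analogy of the approach described above (not the
kernel code itself), the sketch below applies the same pattern to a
pthread mutex. The names lock_or_wait_then_retry() and RETRY_LATER are
hypothetical stand-ins for ocfs2_inode_lock_with_page() and
AOP_TRUNCATED_PAGE; a local mutex is of course much simpler than a
cluster-wide DLM lock.

```c
#include <errno.h>
#include <pthread.h>

/* Hypothetical stand-in for AOP_TRUNCATED_PAGE: "not locked, please retry". */
#define RETRY_LATER 1

/*
 * Try to take the lock without blocking.  On contention, do not return
 * straight away (the caller would just spin through retries).  Instead,
 * wait once with a blocking acquire, drop the lock again immediately,
 * and only then ask the caller to retry.
 *
 * Returns 0 with the lock held, RETRY_LATER with the lock not held,
 * or another pthread error code with the lock not held.
 */
int lock_or_wait_then_retry(pthread_mutex_t *lock)
{
	int ret = pthread_mutex_trylock(lock);

	if (ret == EBUSY) {
		/* Sleep until the current holder releases the lock. */
		if (pthread_mutex_lock(lock) == 0)
			pthread_mutex_unlock(lock);
		ret = RETRY_LATER;
	}

	return ret;
}
```

A caller would loop while the function returns RETRY_LATER. Because each
failed attempt sleeps in the blocking acquire, the retry loop no longer
burns CPU while another holder keeps the lock.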

The softlockup crash (with /proc/sys/kernel/softlockup_panic set to 1)
looks like:

Kernel panic - not syncing: softlockup: hung tasks
CPU: 0 PID: 885 Comm: multi_mmap Tainted: G L 4.12.14-6.1-default #1
Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
Call Trace:
 dump_stack+0x5c/0x82
 panic+0xd5/0x21e
 watchdog_timer_fn+0x208/0x210
 ? watchdog_park_threads+0x70/0x70
 __hrtimer_run_queues+0xcc/0x200
 hrtimer_interrupt+0xa6/0x1f0
 smp_apic_timer_interrupt+0x34/0x50
 apic_timer_interrupt+0x96/0xa0
RIP: 0010:unlock_page+0x17/0x30
RSP: 0000:ffffaf154080bc88 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff10
RAX: dead000000000100 RBX: fffff21e009f5300 RCX: 0000000000000004
RDX: dead0000000000ff RSI: 0000000000000202 RDI: fffff21e009f5300
RBP: 0000000000000000 R08: 0000000000000000 R09: ffffaf154080bb00
R10: ffffaf154080bc30 R11: 0000000000000040 R12: ffff993749a39518
R13: 0000000000000000 R14: fffff21e009f5300 R15: fffff21e009f5300
 ocfs2_inode_lock_with_page+0x25/0x30 [ocfs2]
 ocfs2_readpage+0x41/0x2d0 [ocfs2]
 ? pagecache_get_page+0x30/0x200
 filemap_fault+0x12b/0x5c0
 ? recalc_sigpending+0x17/0x50
 ? __set_task_blocked+0x28/0x70
 ? __set_current_blocked+0x3d/0x60
 ocfs2_fault+0x29/0xb0 [ocfs2]
 __do_fault+0x1a/0xa0
 __handle_mm_fault+0xbe8/0x1090
 handle_mm_fault+0xaa/0x1f0
 __do_page_fault+0x235/0x4b0
 trace_do_page_fault+0x3c/0x110
 async_page_fault+0x28/0x30
RIP: 0033:0x7fa75ded638e
RSP: 002b:00007ffd6657db18 EFLAGS: 00010287
RAX: 000055c7662fb700 RBX: 0000000000000001 RCX: 000055c7662fb700
RDX: 0000000000001770 RSI: 00007fa75e909000 RDI: 000055c7662fb700
RBP: 0000000000000003 R08: 000000000000000e R09: 0000000000000000
R10: 0000000000000483 R11: 00007fa75ded61b0 R12: 00007fa75e90a770
R13: 000000000000000e R14: 0000000000001770 R15: 0000000000000000

As for performance, both the test runtime and the CPU utilization
decrease; the detailed data follows.  I ran the multi_mmap test case
from the ocfs2-test package on a three-node cluster.

Before applying this patch:
  PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
 2754 ocfs2te+  20   0  170248   6980   4856 D 80.73 0.341   0:18.71 multi_mmap
 1505 root      rt   0  222236 123060  97224 S 2.658 6.015   0:01.44 corosync
    5 root      20   0       0      0      0 S 1.329 0.000   0:00.19 kworker/u8:0
   95 root      20   0       0      0      0 S 1.329 0.000   0:00.25 kworker/u8:1
 2728 root      20   0       0      0      0 S 0.997 0.000   0:00.24 jbd2/sda1-33
 2721 root      20   0       0      0      0 S 0.664 0.000   0:00.07 ocfs2dc-3C8CFD4
 2750 ocfs2te+  20   0  142976   4652   3532 S 0.664 0.227   0:00.28 mpirun

ocfs2test@tb-node2:~>multiple_run.sh -i ens3 -k ~/linux-4.4.21-69.tar.gz -o ~/ocfs2mullog -C hacluster -s pcmk -n tb-node2,tb-node1,tb-node3 -d /dev/sda1 -b 4096 -c 32768 -t multi_mmap /mnt/shared
Tests with "-b 4096 -C 32768"
Thu Dec 28 14:44:52 CST 2017
multi_mmap..................................................Passed.
Runtime 783 seconds.

After applying this patch:
  PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
 2508 ocfs2te+  20   0  170248   6804   4680 R 54.00 0.333   0:55.37 multi_mmap
  155 root      20   0       0      0      0 S 2.667 0.000   0:01.20 kworker/u8:3
   95 root      20   0       0      0      0 S 2.000 0.000   0:01.58 kworker/u8:1
 2504 ocfs2te+  20   0  142976   4604   3480 R 1.667 0.225   0:01.65 mpirun
    5 root      20   0       0      0      0 S 1.000 0.000   0:01.36 kworker/u8:0
 2482 root      20   0       0      0      0 S 1.000 0.000   0:00.86 jbd2/sda1-33
  299 root       0 -20       0      0      0 S 0.333 0.000   0:00.13 kworker/2:1H
  335 root       0 -20       0      0      0 S 0.333 0.000   0:00.17 kworker/1:1H
  535 root      20   0   12140   7268   1456 S 0.333 0.355   0:00.34 haveged
 1282 root      rt   0  222284 123108  97224 S 0.333 6.017   0:01.33 corosync

ocfs2test@tb-node2:~>multiple_run.sh -i ens3 -k ~/linux-4.4.21-69.tar.gz -o ~/ocfs2mullog -C hacluster -s pcmk -n tb-node2,tb-node1,tb-node3 -d /dev/sda1 -b 4096 -c 32768 -t multi_mmap /mnt/shared
Tests with "-b 4096 -C 32768"
Thu Dec 28 15:04:12 CST 2017
multi_mmap..................................................Passed.
Runtime 487 seconds.

Link: http://lkml.kernel.org/r/1514447305-30814-1-git-send-email-ghe@suse.com
Fixes: 1cce4df04f37 ("ocfs2: do not lock/unlock() inode DLM lock")
Signed-off-by: Gang He
Reviewed-by: Eric Ren
Acked-by: alex chen
Acked-by: piaojun
Cc: Mark Fasheh
Cc: Joel Becker
Cc: Junxiao Bi
Cc: Joseph Qi
Cc: Changwei Ge
Signed-off-by: Andrew Morton
---

 fs/ocfs2/dlmglue.c |    9 +++++++++
 1 file changed, 9 insertions(+)

diff -puN fs/ocfs2/dlmglue.c~ocfs2-try-a-blocking-lock-before-return-aop_truncated_page fs/ocfs2/dlmglue.c
--- a/fs/ocfs2/dlmglue.c~ocfs2-try-a-blocking-lock-before-return-aop_truncated_page
+++ a/fs/ocfs2/dlmglue.c
@@ -2529,6 +2529,15 @@ int ocfs2_inode_lock_with_page(struct in
 	ret = ocfs2_inode_lock_full(inode, ret_bh, ex, OCFS2_LOCK_NONBLOCK);
 	if (ret == -EAGAIN) {
 		unlock_page(page);
+		/*
+		 * If we can't get inode lock immediately, we should not return
+		 * directly here, since this will lead to a softlockup problem.
+		 * The method is to get a blocking lock and immediately unlock
+		 * before returning, this can avoid CPU resource waste due to
+		 * lots of retries, and benefits fairness in getting lock.
+		 */
+		if (ocfs2_inode_lock(inode, ret_bh, ex) == 0)
+			ocfs2_inode_unlock(inode, ex);
 		ret = AOP_TRUNCATED_PAGE;
 	}