From patchwork Thu Nov 17 19:24:20 2016
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Doug Anderson
X-Patchwork-Id: 9435323
X-Patchwork-Delegate: snitzer@redhat.com
From: Douglas Anderson
To: Alasdair Kergon, Mike Snitzer
Date: Thu, 17 Nov 2016 11:24:20 -0800
Message-Id: <1479410660-31408-1-git-send-email-dianders@chromium.org>
Cc: shli@kernel.org, Dmitry Torokhov, Douglas Anderson,
 linux-kernel@vger.kernel.org, linux-raid@vger.kernel.org,
 dm-devel@redhat.com, David Rientjes, Sonny Rao, linux@roeck-us.net
Subject: [dm-devel] [PATCH] dm: Avoid sleeping while holding the dm_bufio lock
List-Id: device-mapper development

We've seen in-field reports showing _lots_ (18 in one case, 41 in
another) of tasks all sitting there blocked on:

  mutex_lock+0x4c/0x68
  dm_bufio_shrink_count+0x38/0x78
  shrink_slab.part.54.constprop.65+0x100/0x464
  shrink_zone+0xa8/0x198

In the two cases analyzed, we see one task that looks like this:

  Workqueue: kverityd verity_prefetch_io

  __switch_to+0x9c/0xa8
  __schedule+0x440/0x6d8
  schedule+0x94/0xb4
  schedule_timeout+0x204/0x27c
  schedule_timeout_uninterruptible+0x44/0x50
  wait_iff_congested+0x9c/0x1f0
  shrink_inactive_list+0x3a0/0x4cc
  shrink_lruvec+0x418/0x5cc
  shrink_zone+0x88/0x198
  try_to_free_pages+0x51c/0x588
  __alloc_pages_nodemask+0x648/0xa88
  __get_free_pages+0x34/0x7c
  alloc_buffer+0xa4/0x144
  __bufio_new+0x84/0x278
  dm_bufio_prefetch+0x9c/0x154
  verity_prefetch_io+0xe8/0x10c
  process_one_work+0x240/0x424
  worker_thread+0x2fc/0x424
  kthread+0x10c/0x114

...and that looks to be the one holding the mutex.

The problem has been reproduced fairly easily:
0. Be running Chrome OS w/ verity enabled on the root filesystem
1. Pick test patch: http://crosreview.com/412360
2. Install launchBalloons.sh and balloon.arm
   from http://crbug.com/468342
   ...that's just a memory stress test app.
3. On a 4GB rk3399 machine, run
     nice ./launchBalloons.sh 4 900 100000
   ...that tries to eat 4 * 900 MB of memory and keep accessing.
4. Log in to the Chrome web browser and restore many tabs

With that, I've seen printouts like:
  DOUG: long bufio 90758 ms
...and the stack trace always shows we're in dm_bufio_prefetch().

The problem is that we try to allocate memory with GFP_NOIO while
we're holding the dm_bufio lock.  Instead we should be using
GFP_NOWAIT.  Using GFP_NOIO can cause us to sleep while holding the
lock and that causes the above problems.

The current behavior explained by David Rientjes:

  It will still try reclaim initially because __GFP_WAIT (or
  __GFP_KSWAPD_RECLAIM) is set by GFP_NOIO.  This is the cause of
  contention on dm_bufio_lock() that the thread holds.  You want to
  pass GFP_NOWAIT instead of GFP_NOIO to alloc_buffer() when holding a
  mutex that can be contended by a concurrent slab shrinker (if
  count_objects didn't use a trylock, this pattern would trivially
  deadlock).

Suggested-by: David Rientjes
Signed-off-by: Douglas Anderson
Reviewed-by: Guenter Roeck
---
Note that this change was developed and tested against the Chrome OS
4.4 kernel tree, not mainline.  Due to slight differences in verity
between mainline and Chrome OS it became too difficult to reproduce my
testing setup on mainline.  This patch still seems correct and
relevant to upstream, so I'm posting it.  If this is not acceptable to
you then please ignore this patch.
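
As a rough illustration of the GFP_NOIO vs. GFP_NOWAIT distinction
described in the commit message, here is a small userspace model.  The
SKETCH_* names and bit values are invented for this example and are not
the kernel's definitions.  The composition shown (the NOIO-style set
includes direct reclaim, the NOWAIT-style set only wakes kswapd) is my
understanding of the 4.4-era flags and should be treated as an
assumption.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative bit values only; these are NOT the kernel's real definitions. */
#define SKETCH_DIRECT_RECLAIM 0x1u /* caller may enter reclaim itself and sleep */
#define SKETCH_KSWAPD_RECLAIM 0x2u /* caller may wake kswapd, but does not wait */

/* Assumed composition, modelled on the flags discussed in the commit message. */
#define SKETCH_GFP_NOIO   (SKETCH_DIRECT_RECLAIM | SKETCH_KSWAPD_RECLAIM)
#define SKETCH_GFP_NOWAIT (SKETCH_KSWAPD_RECLAIM)

/* Returns true if an allocation with these flags is allowed to sleep. */
static bool sketch_may_sleep(unsigned int flags)
{
	return (flags & SKETCH_DIRECT_RECLAIM) != 0;
}

/* Hypothetical debug check: flags used under a contended lock must not sleep. */
static void sketch_assert_safe_under_lock(unsigned int flags)
{
	assert(!sketch_may_sleep(flags));
}
```

In this model the NOIO-style flags would trip the check while the
NOWAIT-style flags pass, which is the property the patch relies on.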
Also note that when I tested the Chrome OS 3.14 kernel tree I couldn't
reproduce the long delays described in the patch.  Presumably
something changed in either the kernel config or the memory management
code between the two kernel versions that made this crop up.  In a
similar vein, it is possible that problems described in this patch are
no longer reproducible upstream.  However, the arguments made in this
patch (that we don't want to block while holding the mutex) still
apply so I think the patch may still have merit.

 drivers/md/dm-bufio.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/drivers/md/dm-bufio.c b/drivers/md/dm-bufio.c
index b3ba142e59a4..3c767399cc59 100644
--- a/drivers/md/dm-bufio.c
+++ b/drivers/md/dm-bufio.c
@@ -827,7 +827,8 @@ static struct dm_buffer *__alloc_buffer_wait_no_callback(struct dm_bufio_client
 	 * dm-bufio is resistant to allocation failures (it just keeps
 	 * one buffer reserved in cases all the allocations fail).
 	 * So set flags to not try too hard:
-	 *	GFP_NOIO: don't recurse into the I/O layer
+	 *	GFP_NOWAIT: don't wait; if we need to sleep we'll release our
+	 *		    mutex and wait ourselves.
 	 *	__GFP_NORETRY: don't retry and rather return failure
 	 *	__GFP_NOMEMALLOC: don't use emergency reserves
 	 *	__GFP_NOWARN: don't print a warning in case of failure
@@ -837,7 +838,8 @@ static struct dm_buffer *__alloc_buffer_wait_no_callback(struct dm_bufio_client
 	 */
 	while (1) {
 		if (dm_bufio_cache_size_latch != 1) {
-			b = alloc_buffer(c, GFP_NOIO | __GFP_NORETRY | __GFP_NOMEMALLOC | __GFP_NOWARN);
+			b = alloc_buffer(c, GFP_NOWAIT | __GFP_NORETRY |
+					 __GFP_NOMEMALLOC | __GFP_NOWARN);
 			if (b)
 				return b;
 		}
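
For readers who want the locking pattern in isolation, the following is
a minimal userspace sketch of "try without sleeping while the lock is
held, and only block after releasing it".  It uses pthreads rather than
kernel primitives, and every sketch_* name is hypothetical.  It is not
the dm-bufio implementation: the new comment in the patch says the code
releases its mutex and waits itself when it needs to sleep, and this
sketch swaps that wait for a plain blocking malloc().

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical cache lock standing in for the dm_bufio lock. */
static pthread_mutex_t sketch_lock = PTHREAD_MUTEX_INITIALIZER;

/* Counts how often the slow path had to drop the lock (illustration only). */
static int sketch_lock_drops;

/*
 * Hypothetical non-blocking allocation: returns NULL instead of sleeping.
 * "under_pressure" simulates the allocator having no memory readily free.
 */
static void *sketch_alloc_nowait(size_t size, bool under_pressure)
{
	return under_pressure ? NULL : malloc(size);
}

/*
 * Must be called with sketch_lock held; returns with it held.
 * Fast path: try without sleeping while the lock is held.
 * Slow path: release the lock before doing anything that may block.
 */
static void *sketch_get_buffer(size_t size, bool under_pressure)
{
	void *buf;

	assert(size > 0);

	buf = sketch_alloc_nowait(size, under_pressure);
	if (buf)
		return buf;

	pthread_mutex_unlock(&sketch_lock);
	buf = malloc(size);	/* stands in for an allocation that may sleep */
	pthread_mutex_lock(&sketch_lock);
	sketch_lock_drops++;	/* updated only while the lock is held */
	return buf;
}
```

The point of the shape is that other threads contending for the lock
(the shrinker, in the reports above) are never stuck behind a holder
that is asleep inside the allocator.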