From patchwork Fri Jun 29 02:43:29 2018
X-Patchwork-Submitter: Mikulas Patocka
X-Patchwork-Id: 10495557
Date: Thu, 28 Jun 2018 22:43:29 -0400 (EDT)
From: Mikulas Patocka
To: Michal Hocko
cc: jing xia, Mike Snitzer, agk@redhat.com, dm-devel@redhat.com,
    linux-kernel@vger.kernel.org, linux-mm@kvack.org
Subject: Re: dm bufio: Reduce dm_bufio_lock contention
In-Reply-To: <20180625145733.GP28965@dhcp22.suse.cz>

On Mon, 25 Jun 2018, Michal Hocko wrote:

> On Mon 25-06-18 10:42:30, Mikulas Patocka wrote:
> >
> >
> > On Mon, 25 Jun 2018, Michal Hocko wrote:
> >
> > > > And the throttling in dm-bufio prevents kswapd from making forward
> > > > progress, causing this situation...
> > >
> > > Which is what we have PF_LESS_THROTTLE for. Geez, do we have to go in
> > > circles like that? Are you even listening?
> > >
> > > [...]
> > >
> > > > And so what do you want to do to prevent block drivers from sleeping?
> > >
> > > use the existing means we have.
> > > --
> > > Michal Hocko
> > > SUSE Labs
> >
> > So - do you want this patch?
> >
> > There is no behavior difference between changing the allocator (so that it
> > implies PF_LESS_THROTTLE for block drivers) and changing all the block
> > drivers to explicitly set PF_LESS_THROTTLE.
>
> As long as you can reliably detect those users. And using gfp_mask is

You can detect them if __GFP_IO is not set and __GFP_NORETRY is set. You
can grep the kernel for __GFP_NORETRY to find all the users.

> about the worst way to achieve that because users tend to be creative
> when it comes to using gfp mask. PF_LESS_THROTTLE in general is a
> way to tell the allocator that _you_ are the one to help the reclaim by
> cleaning data.

But using PF_LESS_THROTTLE explicitly adds more lines of code than
implying PF_LESS_THROTTLE in the allocator.

> > But if you insist that the allocator can't be changed, we have to repeat
> > the same code over and over again in the block drivers.
>
> I am not familiar with the patched code but mempool change at least
> makes sense (bvec_alloc seems to fall back to mempool which then makes
> sense as well). If others in md/ do the same thing
>
> I would just use current_restore_flags rather than open code it.
>
> Thanks!
> --
> Michal Hocko
> SUSE Labs

So, do you accept this patch?

Mikulas



From: Mikulas Patocka
Subject: [PATCH] mm: set PF_LESS_THROTTLE when allocating memory for i/o

When doing a __GFP_NORETRY allocation, the system may sleep in
wait_iff_congested if there are too many dirty pages. Unfortunately, this
sleeping may slow down kswapd, preventing it from doing writeback and
resolving the congestion.

This patch fixes it by setting PF_LESS_THROTTLE when allocating memory for
block device drivers.

Signed-off-by: Mikulas Patocka
Cc: stable@vger.kernel.org
Acked-by: Michal Hocko # mempool_alloc and bvec_alloc

---
 block/bio.c                   |    4 ++++
 drivers/md/dm-bufio.c         |   14 +++++++++++---
 drivers/md/dm-crypt.c         |    8 ++++++++
 drivers/md/dm-integrity.c     |    4 ++++
 drivers/md/dm-kcopyd.c        |    3 +++
 drivers/md/dm-verity-target.c |    4 ++++
 drivers/md/dm-writecache.c    |    4 ++++
 mm/mempool.c                  |    4 ++++
 8 files changed, 42 insertions(+), 3 deletions(-)

Index: linux-2.6/mm/mempool.c
===================================================================
--- linux-2.6.orig/mm/mempool.c	2018-06-29 03:47:16.290000000 +0200
+++ linux-2.6/mm/mempool.c	2018-06-29 03:47:16.270000000 +0200
@@ -369,6 +369,7 @@ void *mempool_alloc(mempool_t *pool, gfp
 	unsigned long flags;
 	wait_queue_entry_t wait;
 	gfp_t gfp_temp;
+	unsigned old_flags;
 
 	VM_WARN_ON_ONCE(gfp_mask & __GFP_ZERO);
 	might_sleep_if(gfp_mask & __GFP_DIRECT_RECLAIM);
@@ -381,7 +382,10 @@ void *mempool_alloc(mempool_t *pool, gfp
 
 repeat_alloc:
 
+	old_flags = current->flags & PF_LESS_THROTTLE;
+	current->flags |= PF_LESS_THROTTLE;
 	element = pool->alloc(gfp_temp, pool->pool_data);
+	current_restore_flags(old_flags, PF_LESS_THROTTLE);
 	if (likely(element != NULL))
 		return element;
 
Index: linux-2.6/block/bio.c
===================================================================
--- linux-2.6.orig/block/bio.c	2018-06-29 03:47:16.290000000 +0200
+++ linux-2.6/block/bio.c	2018-06-29 03:47:16.270000000 +0200
@@ -217,6 +217,7 @@ fallback:
 	} else {
 		struct biovec_slab *bvs = bvec_slabs + *idx;
 		gfp_t __gfp_mask = gfp_mask & ~(__GFP_DIRECT_RECLAIM | __GFP_IO);
+		unsigned old_flags;
 
 		/*
 		 * Make this allocation restricted and don't dump info on
@@ -229,7 +230,10 @@ fallback:
 		 * Try a slab allocation. If this fails and __GFP_DIRECT_RECLAIM
 		 * is set, retry with the 1-entry mempool
 		 */
+		old_flags = current->flags & PF_LESS_THROTTLE;
+		current->flags |= PF_LESS_THROTTLE;
 		bvl = kmem_cache_alloc(bvs->slab, __gfp_mask);
+		current_restore_flags(old_flags, PF_LESS_THROTTLE);
 		if (unlikely(!bvl && (gfp_mask & __GFP_DIRECT_RECLAIM))) {
 			*idx = BVEC_POOL_MAX;
 			goto fallback;
Index: linux-2.6/drivers/md/dm-bufio.c
===================================================================
--- linux-2.6.orig/drivers/md/dm-bufio.c	2018-06-29 03:47:16.290000000 +0200
+++ linux-2.6/drivers/md/dm-bufio.c	2018-06-29 03:47:16.270000000 +0200
@@ -356,6 +356,7 @@ static void __cache_size_refresh(void)
 static void *alloc_buffer_data(struct dm_bufio_client *c, gfp_t gfp_mask,
 			       unsigned char *data_mode)
 {
+	void *ptr;
 	if (unlikely(c->slab_cache != NULL)) {
 		*data_mode = DATA_MODE_SLAB;
 		return kmem_cache_alloc(c->slab_cache, gfp_mask);
@@ -363,9 +364,14 @@ static void *alloc_buffer_data(struct dm
 
 	if (c->block_size <= KMALLOC_MAX_SIZE &&
 	    gfp_mask & __GFP_NORETRY) {
+		unsigned old_flags;
 		*data_mode = DATA_MODE_GET_FREE_PAGES;
-		return (void *)__get_free_pages(gfp_mask,
+		old_flags = current->flags & PF_LESS_THROTTLE;
+		current->flags |= PF_LESS_THROTTLE;
+		ptr = (void *)__get_free_pages(gfp_mask,
 						c->sectors_per_block_bits - (PAGE_SHIFT - SECTOR_SHIFT));
+		current_restore_flags(old_flags, PF_LESS_THROTTLE);
+		return ptr;
 	}
 
 	*data_mode = DATA_MODE_VMALLOC;
@@ -381,8 +387,10 @@ static void *alloc_buffer_data(struct dm
 	 */
 	if (gfp_mask & __GFP_NORETRY) {
 		unsigned noio_flag = memalloc_noio_save();
-		void *ptr = __vmalloc(c->block_size, gfp_mask, PAGE_KERNEL);
-
+		unsigned old_flags = current->flags & PF_LESS_THROTTLE;
+		current->flags |= PF_LESS_THROTTLE;
+		ptr = __vmalloc(c->block_size, gfp_mask, PAGE_KERNEL);
+		current_restore_flags(old_flags, PF_LESS_THROTTLE);
 		memalloc_noio_restore(noio_flag);
 		return ptr;
 	}
Index: linux-2.6/drivers/md/dm-integrity.c
===================================================================
--- linux-2.6.orig/drivers/md/dm-integrity.c	2018-06-29 03:47:16.290000000 +0200
+++ linux-2.6/drivers/md/dm-integrity.c	2018-06-29 03:47:16.270000000 +0200
@@ -1318,6 +1318,7 @@ static void integrity_metadata(struct wo
 	int r;
 
 	if (ic->internal_hash) {
+		unsigned old_flags;
 		struct bvec_iter iter;
 		struct bio_vec bv;
 		unsigned digest_size = crypto_shash_digestsize(ic->internal_hash);
@@ -1331,8 +1332,11 @@ static void integrity_metadata(struct wo
 		if (unlikely(ic->mode == 'R'))
 			goto skip_io;
 
+		old_flags = current->flags & PF_LESS_THROTTLE;
+		current->flags |= PF_LESS_THROTTLE;
 		checksums = kmalloc((PAGE_SIZE >> SECTOR_SHIFT >> ic->sb->log2_sectors_per_block) * ic->tag_size + extra_space,
 				    GFP_NOIO | __GFP_NORETRY | __GFP_NOWARN);
+		current_restore_flags(old_flags, PF_LESS_THROTTLE);
 		if (!checksums)
 			checksums = checksums_onstack;
 
Index: linux-2.6/drivers/md/dm-kcopyd.c
===================================================================
--- linux-2.6.orig/drivers/md/dm-kcopyd.c	2018-06-29 03:47:16.290000000 +0200
+++ linux-2.6/drivers/md/dm-kcopyd.c	2018-06-29 03:47:16.270000000 +0200
@@ -245,7 +245,10 @@ static int kcopyd_get_pages(struct dm_kc
 	*pages = NULL;
 
 	do {
+		unsigned old_flags = current->flags & PF_LESS_THROTTLE;
+		current->flags |= PF_LESS_THROTTLE;
 		pl = alloc_pl(__GFP_NOWARN | __GFP_NORETRY | __GFP_KSWAPD_RECLAIM);
+		current_restore_flags(old_flags, PF_LESS_THROTTLE);
 		if (unlikely(!pl)) {
 			/* Use reserved pages */
 			pl = kc->pages;
Index: linux-2.6/drivers/md/dm-verity-target.c
===================================================================
--- linux-2.6.orig/drivers/md/dm-verity-target.c	2018-06-29 03:47:16.290000000 +0200
+++ linux-2.6/drivers/md/dm-verity-target.c	2018-06-29 03:47:16.280000000 +0200
@@ -596,9 +596,13 @@ no_prefetch_cluster:
 static void verity_submit_prefetch(struct dm_verity *v, struct dm_verity_io *io)
 {
 	struct dm_verity_prefetch_work *pw;
+	unsigned old_flags;
 
+	old_flags = current->flags & PF_LESS_THROTTLE;
+	current->flags |= PF_LESS_THROTTLE;
 	pw = kmalloc(sizeof(struct dm_verity_prefetch_work),
 		GFP_NOIO | __GFP_NORETRY | __GFP_NOMEMALLOC | __GFP_NOWARN);
+	current_restore_flags(old_flags, PF_LESS_THROTTLE);
 
 	if (!pw)
 		return;
Index: linux-2.6/drivers/md/dm-writecache.c
===================================================================
--- linux-2.6.orig/drivers/md/dm-writecache.c	2018-06-29 03:47:16.290000000 +0200
+++ linux-2.6/drivers/md/dm-writecache.c	2018-06-29 03:47:16.280000000 +0200
@@ -1473,6 +1473,7 @@ static void __writecache_writeback_pmem(
 	unsigned max_pages;
 
 	while (wbl->size) {
+		unsigned old_flags;
 		wbl->size--;
 		e = container_of(wbl->list.prev, struct wc_entry, lru);
 		list_del(&e->lru);
@@ -1486,6 +1487,8 @@ static void __writecache_writeback_pmem(
 		bio_set_dev(&wb->bio, wc->dev->bdev);
 		wb->bio.bi_iter.bi_sector = read_original_sector(wc, e);
 		wb->page_offset = PAGE_SIZE;
+		old_flags = current->flags & PF_LESS_THROTTLE;
+		current->flags |= PF_LESS_THROTTLE;
 		if (max_pages <= WB_LIST_INLINE ||
 		    unlikely(!(wb->wc_list = kmalloc(max_pages * sizeof(struct wc_entry *),
 						     GFP_NOIO | __GFP_NORETRY |
@@ -1493,6 +1496,7 @@ static void __writecache_writeback_pmem(
 			wb->wc_list = wb->wc_list_inline;
 			max_pages = WB_LIST_INLINE;
 		}
+		current_restore_flags(old_flags, PF_LESS_THROTTLE);
 
 		BUG_ON(!wc_add_block(wb, e, GFP_NOIO));
 
Index: linux-2.6/drivers/md/dm-crypt.c
===================================================================
--- linux-2.6.orig/drivers/md/dm-crypt.c	2018-06-29 03:47:16.290000000 +0200
+++ linux-2.6/drivers/md/dm-crypt.c	2018-06-29 03:47:16.280000000 +0200
@@ -2181,12 +2181,16 @@ static void *crypt_page_alloc(gfp_t gfp_
 {
 	struct crypt_config *cc = pool_data;
 	struct page *page;
+	unsigned old_flags;
 
 	if (unlikely(percpu_counter_compare(&cc->n_allocated_pages, dm_crypt_pages_per_client) >= 0) &&
 	    likely(gfp_mask & __GFP_NORETRY))
 		return NULL;
 
+	old_flags = current->flags & PF_LESS_THROTTLE;
+	current->flags |= PF_LESS_THROTTLE;
 	page = alloc_page(gfp_mask);
+	current_restore_flags(old_flags, PF_LESS_THROTTLE);
 	if (likely(page != NULL))
 		percpu_counter_add(&cc->n_allocated_pages, 1);
 
@@ -2893,7 +2897,10 @@ static int crypt_map(struct dm_target *t
 
 	if (cc->on_disk_tag_size) {
 		unsigned tag_len = cc->on_disk_tag_size * (bio_sectors(bio) >> cc->sector_shift);
+		unsigned old_flags;
 
+		old_flags = current->flags & PF_LESS_THROTTLE;
+		current->flags |= PF_LESS_THROTTLE;
 		if (unlikely(tag_len > KMALLOC_MAX_SIZE) ||
 		    unlikely(!(io->integrity_metadata = kmalloc(tag_len,
 				GFP_NOIO | __GFP_NORETRY | __GFP_NOMEMALLOC | __GFP_NOWARN)))) {
@@ -2902,6 +2909,7 @@ static int crypt_map(struct dm_target *t
 			io->integrity_metadata = mempool_alloc(&cc->tag_pool, GFP_NOIO);
 			io->integrity_metadata_from_pool = true;
 		}
+		current_restore_flags(old_flags, PF_LESS_THROTTLE);
 	}
 
 	if (crypt_integrity_aead(cc))
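
Every hunk in this patch repeats the same three steps around an allocation: save the current state of the PF_LESS_THROTTLE bit, set the bit, and restore the saved state afterwards. The sketch below is a userspace illustration of that pattern only, not kernel code and not part of the patch. The names mock_current, MOCK_PF_LESS_THROTTLE, mock_restore_flags, alloc_less_throttled and probe_alloc are all made up for this example, and the flag value is arbitrary. mock_restore_flags is modeled on what current_restore_flags is understood to do (clear the masked bits, then put back the saved ones). That behaviour matters when a caller already runs with the flag set, because a plain "clear the bit afterwards" would silently drop the outer caller's setting.

```c
#include <assert.h>

/* Stand-ins for illustration only; these are not the kernel definitions. */
#define MOCK_PF_LESS_THROTTLE 0x00100000u	/* arbitrary bit for this sketch */

struct mock_task {
	unsigned int flags;
};

static struct mock_task mock_current;	/* plays the role of "current" */

/* Clear the bits named in mask, then put back whatever was saved for them. */
static void mock_restore_flags(unsigned int saved, unsigned int mask)
{
	mock_current.flags &= ~mask;
	mock_current.flags |= saved & mask;
}

typedef void *(*mock_alloc_fn)(void *arg);	/* hypothetical allocator callback */

/* Run fn(arg) with the flag set, then restore the caller's flag state. */
static void *alloc_less_throttled(mock_alloc_fn fn, void *arg)
{
	unsigned int old_flags = mock_current.flags & MOCK_PF_LESS_THROTTLE;
	void *p;

	mock_current.flags |= MOCK_PF_LESS_THROTTLE;
	p = fn(arg);
	mock_restore_flags(old_flags, MOCK_PF_LESS_THROTTLE);
	return p;
}

static unsigned int flags_seen_by_alloc;

/* Fake allocator: records the flags it ran under and hands back arg. */
static void *probe_alloc(void *arg)
{
	flags_seen_by_alloc = mock_current.flags;
	return arg;
}

/* Exercise both cases: flag initially clear, and flag already set. */
static void check_restore_semantics(void)
{
	int token;

	/* Case 1: flag clear on entry, set during the call, clear again after. */
	mock_current.flags = 0;
	assert(alloc_less_throttled(probe_alloc, &token) == &token);
	assert(flags_seen_by_alloc & MOCK_PF_LESS_THROTTLE);
	assert(mock_current.flags == 0);

	/* Case 2: flag already set by an outer caller; it must stay set. */
	mock_current.flags = MOCK_PF_LESS_THROTTLE | 0x1u;
	alloc_less_throttled(probe_alloc, &token);
	assert(mock_current.flags == (MOCK_PF_LESS_THROTTLE | 0x1u));
}
```

Calling check_restore_semantics() from a small main() runs both cases. Whether a wrapper of this shape would be acceptable in the kernel is a separate question that this sketch does not try to answer; it only shows why the save and restore steps belong together.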