From patchwork Wed Apr 27 20:59:15 2016
X-Patchwork-Submitter: Jens Axboe
X-Patchwork-Id: 8962691
Date: Wed, 27 Apr 2016 14:59:15 -0600
From: Jens Axboe
To: Jan Kara
Cc: linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org,
	linux-block@vger.kernel.org, dchinner@redhat.com, sedat.dilek@gmail.com
Subject: Re: [PATCHSET v5] Make background writeback great again for the first time
Message-ID: <20160427205915.GC25397@kernel.dk>
References: <1461686131-22999-1-git-send-email-axboe@fb.com>
	<20160427180105.GA17362@quack2.suse.cz> <5721021E.8060006@fb.com>
	<20160427203708.GA25397@kernel.dk>
In-Reply-To: <20160427203708.GA25397@kernel.dk>
X-Mailing-List: linux-block@vger.kernel.org

On Wed, Apr 27 2016, Jens Axboe wrote:
> On Wed, Apr 27 2016, Jens Axboe wrote:
> > On 04/27/2016 12:01 PM, Jan Kara wrote:
> > >Hi,
> > >
> > >On Tue 26-04-16 09:55:23, Jens Axboe wrote:
> > >>Since the dawn of time, our background buffered writeback has sucked.
> > >>When we do background buffered writeback, it should have little impact
> > >>on foreground activity. That's the definition of background activity...
> > >>But for as long as I can remember, heavy buffered writers have not
> > >>behaved like that. For instance, if I do something like this:
> > >>
> > >>$ dd if=/dev/zero of=foo bs=1M count=10k
> > >>
> > >>on my laptop and then try to start chrome, it basically won't start
> > >>until the buffered writeback is done. The same goes for server
> > >>oriented workloads, where installation of a big RPM (or similar)
> > >>adversely impacts database reads or sync writes. When that happens,
> > >>I get people yelling at me.
> > >>
> > >>I have posted plenty of results previously, so I'll keep it shorter
> > >>this time. Here's a run on my laptop, using read-to-pipe-async to
> > >>read a 5g file and rewrite it. You can find this test program in the
> > >>fio git repo.
> > >
> > >I have tested your patchset on my test system. Generally I have
> > >observed a noticeable drop in average throughput for heavy background
> > >writes without any other disk activity, and also somewhat increased
> > >variance in the runtimes. It is most visible in these simple test
> > >cases:
> > >
> > >dd if=/dev/zero of=/mnt/file bs=1M count=10000
> > >
> > >and
> > >
> > >dd if=/dev/zero of=/mnt/file bs=1M count=10000 conv=fsync
> > >
> > >The machine has 4GB of ram, /mnt is an ext3 filesystem that is freshly
> > >created before each dd run on a dedicated disk.
> > >
> > >Without your patches I get pretty stable dd runtimes for both cases:
> > >
> > >dd if=/dev/zero of=/mnt/file bs=1M count=10000
> > >Runtimes: 87.9611 87.3279 87.2554
> > >
> > >dd if=/dev/zero of=/mnt/file bs=1M count=10000 conv=fsync
> > >Runtimes: 93.3502 93.2086 93.541
> > >
> > >With your patches the numbers look like:
> > >
> > >dd if=/dev/zero of=/mnt/file bs=1M count=10000
> > >Runtimes: 108.183 97.184 99.9587
> > >
> > >dd if=/dev/zero of=/mnt/file bs=1M count=10000 conv=fsync
> > >Runtimes: 104.9 102.775 102.892
> > >
> > >I have checked whether the variance is due to some interaction with
> > >CFQ, which is used for the disk. When I switched the disk to deadline,
> > >I still get some variance, although the throughput is still ~10% lower:
> > >
> > >dd if=/dev/zero of=/mnt/file bs=1M count=10000
> > >Runtimes: 100.417 100.643 100.866
> > >
> > >dd if=/dev/zero of=/mnt/file bs=1M count=10000 conv=fsync
> > >Runtimes: 104.208 106.341 105.483
> > >
> > >The disk is a rotational SATA drive with a writeback cache; the queue
> > >depth reported in /sys/block/sdb/device/queue_depth is 1.
> > >
> > >So I think we still need some tweaking on the low end of the storage
> > >spectrum so that we don't lose 10% of throughput for simple cases like
> > >this.
> >
> > Thanks for testing, Jan! I haven't tried old QD=1 SATA. I wonder if
> > you are seeing smaller requests, and that is why it both varies and
> > you get lower throughput?
> > I'll try and set up a test here similar to yours.
>
> Jan, care to try the below patch? I can't fully reproduce your issue on
> a SCSI disk limited to QD=1, but I have a feeling this might help. It's
> a bit of a hack, but the general idea is to allow one more request to
> build up for QD=1 devices. That eliminates wait time between one request
> finishing, and the next being submitted.

That one accidentally added a potential stall; this one is both cleaner
and should have that fixed.

diff --git a/lib/wbt.c b/lib/wbt.c
index 650da911f24f..322f5e04e994 100644
--- a/lib/wbt.c
+++ b/lib/wbt.c
@@ -98,18 +98,23 @@ void __wbt_done(struct rq_wb *rwb)
 	else
 		limit = rwb->wb_normal;
 
+	inflight = atomic_dec_return(&rwb->inflight);
+
 	/*
-	 * Don't wake anyone up if we are above the normal limit. If
-	 * throttling got disabled (limit == 0) with waiters, ensure
-	 * that we wake them up.
+	 * wbt got disabled with IO in flight. Wake up any potential
+	 * waiters, we don't have to do more than that.
 	 */
-	inflight = atomic_dec_return(&rwb->inflight);
-	if (limit && inflight >= limit) {
-		if (!rwb->wb_max)
-			wake_up_all(&rwb->wait);
+	if (!rwb_enabled(rwb)) {
+		wake_up_all(&rwb->wait);
 		return;
 	}
 
+	/*
+	 * Don't wake anyone up if we are above the normal limit.
+	 */
+	if (inflight && inflight >= limit)
+		return;
+
 	if (waitqueue_active(&rwb->wait)) {
 		int diff = limit - inflight;
 
@@ -150,14 +155,26 @@ static void calc_wb_limits(struct rq_wb *rwb)
 		return;
 	}
 
-	depth = min_t(unsigned int, RWB_MAX_DEPTH, rwb->queue_depth);
-
 	/*
-	 * Reduce max depth by 50%, and re-calculate normal/bg based on that
+	 * For QD=1 devices, this is a special case. It's important for those
+	 * to have one request ready when one completes, so force a depth of
+	 * 2 for those devices. On the backend, it'll be a depth of 1 anyway,
+	 * since the device can't have more than that in flight.
 	 */
-	rwb->wb_max = 1 + ((depth - 1) >> min(31U, rwb->scale_step));
-	rwb->wb_normal = (rwb->wb_max + 1) / 2;
-	rwb->wb_background = (rwb->wb_max + 3) / 4;
+	if (rwb->queue_depth == 1) {
+		rwb->wb_max = rwb->wb_normal = 2;
+		rwb->wb_background = 1;
+	} else {
+		depth = min_t(unsigned int, RWB_MAX_DEPTH, rwb->queue_depth);
+
+		/*
+		 * Reduce max depth by 50%, and re-calculate normal/bg based on
+		 * that.
+		 */
+		rwb->wb_max = 1 + ((depth - 1) >> min(31U, rwb->scale_step));
+		rwb->wb_normal = (rwb->wb_max + 1) / 2;
+		rwb->wb_background = (rwb->wb_max + 3) / 4;
+	}
 }
 
 static bool inline stat_sample_valid(struct blk_rq_stat *stat)
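
For anyone who wants to eyeball what limits the patched calc_wb_limits()
produces without building a kernel, here's a small standalone userspace
sketch of the same scaling logic. It is illustrative only: the
RWB_MAX_DEPTH value of 64 and the min_uint() helper are assumptions made
for this sketch; the real definitions live in lib/wbt.c.

/*
 * Standalone sketch of the depth scaling in the patched calc_wb_limits().
 * NOTE: RWB_MAX_DEPTH = 64 is an assumed value for illustration; the
 * real constant (and the full struct rq_wb) live in lib/wbt.c.
 */
#include <stdio.h>

#define RWB_MAX_DEPTH	64

struct rq_wb {
	unsigned int queue_depth;	/* depth reported by the device */
	unsigned int scale_step;	/* current throttling scale step */
	unsigned int wb_max;
	unsigned int wb_normal;
	unsigned int wb_background;
};

static unsigned int min_uint(unsigned int a, unsigned int b)
{
	return a < b ? a : b;
}

static void calc_wb_limits(struct rq_wb *rwb)
{
	unsigned int depth;

	if (rwb->queue_depth == 1) {
		/*
		 * QD=1 special case: allow a max/normal depth of 2 so the
		 * next request is already queued when the active one
		 * completes.
		 */
		rwb->wb_max = rwb->wb_normal = 2;
		rwb->wb_background = 1;
	} else {
		depth = min_uint(RWB_MAX_DEPTH, rwb->queue_depth);

		/* Scale max down by scale_step, then derive normal/bg */
		rwb->wb_max = 1 + ((depth - 1) >> min_uint(31, rwb->scale_step));
		rwb->wb_normal = (rwb->wb_max + 1) / 2;
		rwb->wb_background = (rwb->wb_max + 3) / 4;
	}
}

int main(void)
{
	unsigned int depths[] = { 1, 2, 32, 128 };
	unsigned int i, step;

	for (i = 0; i < sizeof(depths) / sizeof(depths[0]); i++) {
		for (step = 0; step <= 2; step++) {
			struct rq_wb rwb = {
				.queue_depth = depths[i],
				.scale_step = step,
			};

			calc_wb_limits(&rwb);
			printf("QD=%-3u step=%u -> max=%2u normal=%2u bg=%2u\n",
			       rwb.queue_depth, rwb.scale_step, rwb.wb_max,
			       rwb.wb_normal, rwb.wb_background);
		}
	}
	return 0;
}

For a QD=1 device this prints max=2 normal=2 bg=1 regardless of
scale_step, which is the behavior the patch is after: the device still
only has one request in flight, but one more is allowed to build up so
there is no gap between a completion and the next submission.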